Most problems I see in industry that have to do with data science teams are about humans, not algorithms. Getting large teams to work together is tricky, and a large part of the issue is that knowledge does not scale automatically.
When the team is small, people know the results of experiments and understand why certain design choices were made. Typically, when teams get larger, that knowledge does not get shared; it gets scattered across people. Next thing you know, some senior people leave and suddenly you start seeing gaps in the team's knowledge.
I don't claim to know the solution to all these growing pains, but it has started to occur to me that documentation helps a lot here. Finding good documentation tools seemed tricky though. You want something that is dead simple to set up, contribute to and maintain. I've tried JIRA and I've played with GitHub's tooling, but nothing really felt simple to use and easy to share with anybody in the organisation.
A good habit: mkdocs
It took a while before I came across a tool that meets these three requirements well, but I seem to have found it: mkdocs. It is a simple Python command line tool that very rapidly takes markdown files and turns them into a knowledge repository. What pelican did for blogs, mkdocs will hopefully do for documentation.
To get an idea of what to expect, here is a screenshot of what the docs could look like:
All you need to generate these docs is a command line tool and a folder structure like this:
.
├── docs
│   ├── airflow
│   │   └── airflow101.md
│   ├── ansible
│   │   ├── ansible101.md
│   │   ├── ansible201.md
│   │   └── ansible301.md
│   ├── deployment
│   │   ├── amazon
│   │   │   ├── api-gateway.md
│   │   │   ├── cdn.md
│   │   │   ├── parameters.md
│   │   │   └── security.md
│   │   └── gitlab
│   │       ├── building.md
│   │       └── testing.md
│   ├── index.md
│   ├── recommender
│   │   ├── algorithm.md
│   │   └── chaching.md
│   └── spark-cluster
│       └── kerberos.md
└── mkdocs.yml
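If you would rather script the layout than create the folders by hand, a minimal sketch could look like the one below. The page list is a small subset of the tree above, and the `scaffold` helper and the default site name are my own illustrative assumptions, not something mkdocs ships. The only thing it relies on is that mkdocs reads pages from a `docs` folder next to a `mkdocs.yml` that sets `site_name`.

```python
import tempfile
from pathlib import Path
from typing import List

# Illustrative subset of the pages in the tree above.
PAGES = [
    "index.md",
    "airflow/airflow101.md",
    "deployment/amazon/cdn.md",
    "deployment/gitlab/testing.md",
]


def scaffold(root: Path, site_name: str = "Team Documentation") -> List[Path]:
    """Create docs/ with placeholder pages and a minimal mkdocs.yml under root."""
    created = []
    for page in PAGES:
        path = root / "docs" / page
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            # Use the file name as a placeholder top-level heading.
            path.write_text(f"# {path.stem}\n", encoding="utf-8")
        created.append(path)
    config = root / "mkdocs.yml"
    if not config.exists():
        # Hypothetical minimal config: only the site name is set.
        config.write_text(f"site_name: {site_name}\n", encoding="utf-8")
    created.append(config)
    return created


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    for path in scaffold(Path(tmp)):
        print(path.relative_to(tmp))
```

Existing files are left untouched, so the helper can be re-run safely on a folder that already contains documentation.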
Features
Mkdocs has nice features, including but not limited to:
- To get started you only need to type mkdocs new followed by a project directory name, and it makes a starting template for you. You can preview it locally via mkdocs serve and you can build it into static files via mkdocs build. These static files are standalone; if you want to host them you can do that from S3.
- The static files can easily be hosted in git, meaning that you can add version control to your documentation.
- Note that mkdocs adds a simple bit of structure to your docs. It recognises the folder structure and uses it to show how the documentation is nested. Headings in markdown files are translated into clickable tables of contents (see the screenshot example).
- Mkdocs offers search functionality on the generated pages via lunr.js. This means that you can host the docs statically and still have search work. The trick is that the search is performed on the frontend.
- There are popular themes you can use. My favorite is material, but there are also others that feel like something you might read on readthedocs.org.
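To make the frontend search idea a bit more concrete: the general pattern is to build an index once, ship it as a static file, and let the browser do the lookups. The sketch below shows that pattern in plain Python. It is a deliberately simplified illustration and not the index format or ranking that lunr.js or mkdocs actually use; the page contents and the `build_index` and `search` helpers are made up for the example.

```python
import json
import re
from collections import defaultdict
from typing import Dict, List


def build_index(pages: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each lowercase word to the sorted list of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}


def search(index: Dict[str, List[str]], query: str) -> List[str]:
    """Return the pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Hypothetical page contents, for illustration only.
pages = {
    "airflow/airflow101.md": "# Airflow 101\nScheduling jobs with Airflow.",
    "spark-cluster/kerberos.md": "# Kerberos\nAuthenticating jobs on the Spark cluster.",
}
index = build_index(pages)

# The index is plain JSON, so it can be served as a static file
# and queried without any backend.
payload = json.dumps(index)
print(search(json.loads(payload), "jobs"))
print(search(json.loads(payload), "spark jobs"))
```

Because the whole index travels with the site, this approach works on any static host, at the cost of a download that grows with the amount of documentation.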
My Setup
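If you want to be able to write LaTeX in your docs you will need a bit of extra configuration. Note that the material theme is a separate package (mkdocs-material) that has to be installed next to mkdocs. This is the mkdocs.yml I use: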
site_name: Team Documentation Page Name
theme: material
markdown_extensions:
- extra
- tables
- fenced_code
extra_javascript:
- https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML
- mathjaxhelper.js
The mathjaxhelper.js file referenced in the config above contains:
MathJax.Hub.Config({
  "tex2jax": { inlineMath: [ [ '$', '$' ] ] }
});
MathJax.Hub.Config({
  config: ["MMLorHTML.js"],
  jax: ["input/TeX", "output/HTML-CSS", "output/NativeMML"],
  extensions: ["MathMenu.js", "MathZoom.js"]
});
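Once the configuration is in place, the build step is easy to script, for example as part of a deployment job. The sketch below wraps the mkdocs build command; the `build_command` and `build_docs` helpers and their defaults are my own assumptions, while `--config-file` and `--site-dir` are existing options of mkdocs build.

```python
import shutil
import subprocess
from typing import List


def build_command(config_file: str = "mkdocs.yml", site_dir: str = "site") -> List[str]:
    """Assemble the mkdocs build call as an argument list."""
    return ["mkdocs", "build", "--config-file", config_file, "--site-dir", site_dir]


def build_docs(config_file: str = "mkdocs.yml", site_dir: str = "site") -> bool:
    """Run the build if mkdocs is installed; return False if it is not."""
    if shutil.which("mkdocs") is None:
        print("mkdocs was not found on PATH, skipping the build")
        return False
    # check=True raises CalledProcessError when the build fails.
    subprocess.run(build_command(config_file, site_dir), check=True)
    return True


# Show the command that build_docs() would run with the defaults.
print(" ".join(build_command()))
```

Calling `build_docs()` from the folder that holds mkdocs.yml then produces the static site in the chosen output folder.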
Conclusion
Good documentation is a good habit that goes further than just writing a proper readme file. Mkdocs seems like the easiest way to get started with the least amount of headaches. Please give it a try and start building a good habit in your team.