Most of the problems I see in industry that have to do with data science teams are about humans, not algorithms. Getting large teams to work together is tricky stuff, and a large part of the issue is that knowledge does not scale automatically.

When the team is small, people know the results of experiments and they understand why certain design choices were made. Typically, as teams grow, that knowledge does not get shared; it gets fragmented across people. Next thing you know, some senior people leave the team and suddenly you start seeing gaps in knowledge within teams.

I don't claim to know the solution to all these growing pains, but it has started to occur to me that documentation helps a lot here. Finding good documentation tools seemed tricky, though. You want something that is dead simple to set up, contribute to and maintain. I've tried JIRA and I've played with GitHub's tooling, but nothing really felt simple to use and easy to share with anybody in the organisation.

A good habit: mkdocs

It took a while before I came across a tool that ticks these three boxes, but I seem to have found it: mkdocs. It is a simple Python command-line tool that very quickly turns a folder of markdown files into a browsable knowledge repository. What Pelican did for blogs, mkdocs will hopefully do for documentation.

To get an idea of what to expect, here is a screenshot of what the docs could look like:

All you need to generate these docs is the command-line tool and a folder structure like this one:

.
├── docs
│   ├── airflow
│   │   └── airflow101.md
│   ├── ansible
│   │   ├── ansible101.md
│   │   ├── ansible201.md
│   │   └── ansible301.md
│   ├── deployment
│   │   ├── amazon
│   │   │   ├── api-gateway.md
│   │   │   ├── cdn.md
│   │   │   ├── parameters.md
│   │   │   └── security.md
│   │   └── gitlab
│   │       ├── building.md
│   │       └── testing.md
│   ├── index.md
│   ├── recommender
│   │   ├── algorithm.md
│   │   └── caching.md
│   └── spark-cluster
│       └── kerberos.md
└── mkdocs.yml
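To make the folder-to-navigation idea concrete, here is a rough Python sketch that turns a tree like the one above into nested sections, assuming the simple rule that folders become sections and markdown files become pages. It is an illustration only, not mkdocs' own implementation; build_nav and print_nav are hypothetical helper names, and mkdocs also lets you spell out the navigation explicitly under a nav key in mkdocs.yml.

```python
from pathlib import Path
from typing import List, Optional, Union

# A nav entry is either a page path or a {section_name: [entries]} mapping.
NavEntry = Union[str, dict]


def build_nav(docs_dir: Path, folder: Optional[Path] = None) -> List[NavEntry]:
    """Turn a docs folder into a nested nav: folders become sections, .md files become pages.

    Hypothetical helper for illustration only; mkdocs' own ordering rules may differ.
    """
    if folder is None:
        folder = docs_dir
    nav: List[NavEntry] = []
    for entry in sorted(folder.iterdir()):  # alphabetical, folders and files interleaved
        if entry.is_dir():
            children = build_nav(docs_dir, entry)
            if children:  # skip folders that contain no markdown at all
                nav.append({entry.name: children})
        elif entry.suffix == ".md":
            # Store paths relative to docs/, the same way the tree above lists them.
            nav.append(entry.relative_to(docs_dir).as_posix())
    return nav


def print_nav(nav: List[NavEntry], depth: int = 0) -> None:
    """Print the nested nav as an indented outline."""
    for item in nav:
        if isinstance(item, dict):
            for section, children in item.items():
                print("  " * depth + section + "/")
                print_nav(children, depth + 1)
        else:
            print("  " * depth + item)
```

Calling print_nav(build_nav(Path("docs"))) next to mkdocs.yml prints an indented outline of the sections and pages.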

Features

Mkdocs has some nice features, including but not limited to:

  • To get started, you only need to type mkdocs new followed by a project folder name, and it makes a starting template for you. You can preview it with mkdocs serve, which runs a local development server that reloads as you edit, and you can build it into static files with mkdocs build, which writes them to a site/ folder. These static files are standalone, so if you want to host them you can do that from S3 or any other static file host.
  • Because the docs are plain markdown files, they can live in git next to your code, which means you get version control on your documentation.
  • Mkdocs adds a simple bit of structure to your documentation. It recognises the folder structure and uses it to show how the documentation is nested. Headings in markdown files are turned into a clickable table of contents (see the screenshot example).
  • Mkdocs offers search on the generated pages via lunr.js. The search is performed on the frontend, in the reader's browser, which means you can host the docs statically and still have working search.
  • There are popular themes you can use. My favorite is material, but there are also others, such as the built-in readthedocs theme, that feel like something you might read on readthedocs.org.
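Because the output of mkdocs build is nothing but static files, any static file server can host them. A minimal local check using only Python's standard library might look like the sketch below; make_static_server is a hypothetical helper, and the folder and port defaults are assumptions.

```python
import functools
import http.server


def make_static_server(folder: str = "site", port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve a folder of pre-built files as-is; nothing is rendered on the server side.

    Hypothetical helper: `folder` is assumed to be the output of `mkdocs build`.
    Pass port=0 to let the operating system pick a free port.
    """
    # SimpleHTTPRequestHandler only reads files from disk, which is all a static site needs.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=folder)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

After mkdocs build, make_static_server().serve_forever() serves the site at http://127.0.0.1:8000 until you interrupt it.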
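The heading-to-table-of-contents step can be pictured with a small sketch. Mkdocs itself relies on the toc extension of Python-Markdown for this; the code below is a simplified, hypothetical stand-in that only recognises #-style headings, skips fenced code blocks, and uses its own slug rule for the anchors.

```python
import re
from typing import List, Tuple

FENCE = "`" * 3  # three backticks, written this way so the snippet can sit inside markdown
# ATX heading: 1-6 '#', whitespace, the title, optionally closing '#'s after whitespace.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)(?:\s+#+)?\s*$")


def slugify(title: str) -> str:
    """Simplified anchor rule: drop punctuation, lowercase, join words with hyphens."""
    cleaned = re.sub(r"[^\w\s-]", "", title).strip().lower()
    return re.sub(r"[\s-]+", "-", cleaned)


def table_of_contents(markdown: str) -> List[Tuple[int, str, str]]:
    """Return (level, title, anchor link) for every heading outside fenced code blocks."""
    entries = []
    in_code = False
    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_code = not in_code  # toggle on opening and closing fences
            continue
        match = HEADING.match(line)
        if match and not in_code:
            title = match.group(2)
            entries.append((len(match.group(1)), title, "#" + slugify(title)))
    return entries
```

The level in each tuple is what lets a theme indent the entries into a nested, clickable outline.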
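Roughly speaking, search works without a server because the build writes the page text to a JSON file next to the pages, and lunr.js indexes and queries it in the browser. The Python sketch below mimics that split purely as an illustration: it is a plain inverted index with AND semantics and no ranking, not lunr.js, and the function names and JSON layout are assumptions.

```python
import json
import re
from collections import defaultdict
from typing import Dict, List

TOKEN = re.compile(r"\w+")


def export_search_docs(pages: Dict[str, str]) -> str:
    """'Build time': dump each page's location and text to a JSON string."""
    return json.dumps({"docs": [{"location": loc, "text": text} for loc, text in pages.items()]})


def search(index_json: str, query: str) -> List[str]:
    """'Browser side': load the JSON, build an inverted index, return matching locations.

    A page matches only if it contains every query term (case-insensitive).
    """
    docs = json.loads(index_json)["docs"]
    inverted = defaultdict(set)  # token -> set of page locations containing it
    for doc in docs:
        for token in TOKEN.findall(doc["text"].lower()):
            inverted[token].add(doc["location"])
    terms = TOKEN.findall(query.lower())
    if not terms:
        return []
    hits = set.intersection(*(inverted.get(term, set()) for term in terms))
    return sorted(hits)
```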

My Setup

If you want to be able to write LaTeX, you may want to add some extra sauce to your mkdocs.yml:

site_name: Team Documentation Page Name
theme: material
markdown_extensions:
  - extra
  - tables
  - fenced_code
extra_javascript:
  - https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML
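  - mathjaxhelper.js

The helper file itself lives at docs/mathjaxhelper.js (extra_javascript paths are relative to the docs folder) and contains:

MathJax.Hub.Config({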
  "tex2jax": { inlineMath: [ [ '$', '$' ] ] }
});
MathJax.Hub.Config({
  config: ["MMLorHTML.js"],
  jax: ["input/TeX", "output/HTML-CSS", "output/NativeMML"],
  extensions: ["MathMenu.js", "MathZoom.js"]
});

Conclusion

Good documentation is a good habit that goes further than just writing a proper README file. Mkdocs seems like the easiest way to get started with the least amount of headaches. Please give it a try and start working on this habit in your team.