Documentation Formats

There are many ways to create documentation. Microsoft Word is the standard in the business world. Developers often use Markdown in README files. Online systems like Google Docs can be handy for collaborating on a document. Content management systems like WordPress fit well for blogs and websites. Sometimes a raw text file makes sense.

This document explores the use of a documentation system in an organization when more than one person needs to maintain or review a document.

Requirements

What are the requirements?

  1. Relatively easy to use
  2. Encourage collaboration
  3. Easy to track changes and review
  4. Support any organization structure
  5. Rich formatting

Options

We are going to examine three options in this document:

  1. Microsoft Word
  2. Google Docs
  3. Markdown stored in Git

Microsoft Word

Most people probably reach for MS Word these days, as we've been conditioned to do so. It is the "expected" document format in the business world. It is a powerful tool and delivers impressive results with very little work. However, there are some drawbacks to using Word:

  • Many software developers don't run Windows. This may come as a surprise to many, but if you observe what people use during presentations at tech conferences and wander through the offices of leading tech companies, you will see mostly Macs and a smaller number of Linux notebooks and workstations. This is not an arbitrary choice or fad -- there are good reasons for it, which are beyond the scope of this document.
  • Even though Word can track changes, it is tedious to review them. It requires downloading the file, opening the file in Word, and looking through them. History can be reviewed, but it is usually too much effort to bother.
  • When someone commits a Word document to a Git repo, the changes are not easily visible in the commit. Again, if you want to see what changed, you need to check out and open the file. And, the history in Word is not necessarily tied to the Git commit history.
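  • Multiple people can't work on the same document at the same time when the file is passed around by email or stored in a Git repo (Word's co-authoring requires the file to live in OneDrive or SharePoint).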

Google Docs

Google Docs is an impressive tool and excels in scenarios where a small number of trusted people want to collaborate on a document. The ability for multiple people to type into the same document and see each other's edits in real time is very handy, for example when composing notes during a meeting. Edits are tracked, and it has the usual features, such as comments. However, Google Docs also has drawbacks: a connection to the cloud is required to use it, and changes can't be managed outside the normal sequence of revisions.

Markdown

Markdown is a fairly simple markup language that lets you create nicely formatted documents from a simple, readable text file. As with everything, there are trade-offs. The simple, readable text format also means there are limits to what can be done with it. However, the functionality included in Markdown is adequate for most documentation tasks.

Some of Markdown's drawbacks:

  • Is not WYSIWYG, although tools like Typora reduce this concern.
  • Is a markup language so you need to know the syntax.
  • Requires a little work in setting up a good editor, automatic formatters, etc.
  • Can't do all the advanced formatting that other solutions can.
  • Image files cannot practically be embedded in the document source file -- they must live in files next to the Markdown source.

However, there are also some significant advantages to Markdown stored in Git:

  • Because the source file syntax is simple, it is very easy to review changes in a commit diff.
  • The document/markup is very readable in a text editor -- developers don't have to leave their primary editor to work on documentation.
  • Syntax is easy to learn and use, and tools like Typora give you a more classic word processor experience if you want one.
  • Encourages collaboration.
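
To illustrate the first point above, here is a minimal sketch that uses Python's standard difflib module to produce the same kind of unified diff that a Git portal shows for a commit. The handbook text and the file names are made up for illustration; the sample also shows a few common Markdown elements (a heading, bold text, and a bullet list).

```python
import difflib

# Two hypothetical versions of a Markdown handbook section (illustrative text only).
old = """\
# Vacation Policy

Employees receive **10 days** of paid vacation per year.

- Requests go to your manager
- Give two weeks notice
"""

new = """\
# Vacation Policy

Employees receive **15 days** of paid vacation per year.

- Requests go to your manager
- Give two weeks notice
- Unused days carry over for one year
"""

# Build a unified diff, the line-based format used in commit views.
diff_lines = list(
    difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="a/vacation.md",  # hypothetical file name
        tofile="b/vacation.md",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

In the printed diff, removed lines start with "-" and added lines start with "+", so a reviewer can see at a glance that the vacation allowance changed and one bullet was added. A Word file is a binary format, so a plain commit diff cannot show its changes this way.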

Example Scenario

Consider a scenario in which we examine how things might go if the company handbook is maintained in Word, Google Docs, or Markdown.

If the document were maintained entirely in Word, the flow for an employee to suggest a change might go as follows:

  1. employee emails the suggested change to someone in HR in the form of: "consider changing sentence 'abc' on page 23 to 'xyz'"
  2. the maintainer of the handbook would have to make sure she had the latest version of the document -- perhaps she asks another person in the department for the latest version of the file, and that person emails the file to her
  3. she opens the Word document and makes the change with change tracking enabled
  4. a copy of the document is emailed to the CEO for review
  5. the CEO would email back some suggestions
  6. the maintainer would have to coordinate with the original author
  7. at this point, since everyone is busy, this task gets lost in the black hole of our overflowing inboxes and is never completed.

Consider the alternative flow if the document is stored in Google Docs:

  1. employee has a suggestion, but cannot be given access to change the document in Google Docs directly, as there is no formal review process
  2. so, he emails the suggested change, similar to the Word scenario
  3. maintainer makes the change and notifies CEO to review
  4. review process is much easier than with Word, as both CEO and maintainer have write access to the document

Consider the alternative flow in Markdown stored in Git:

  1. employee checks out the handbook source
  2. he makes the change in a text editor (or tool like Typora) and pushes to a forked repo in his own namespace
  3. in the Git portal (GitHub, GitLab, Gitea, etc.), a pull request is opened
  4. the maintainer of the handbook reviews the pull request and tags the CEO in the pull request to see what he thinks
  5. the CEO gets an email and reviews the pull request, making a few suggestions
  6. the original author makes the changes and submits an updated pull request
  7. the handbook maintainer approves the pull request and the change is instantly merged into the master branch
  8. the CI (continuous integration) system sees a change in the master branch, generates an updated copy of the handbook, and deploys it to the company web server (could be internal, or in the case of GitLab, external)
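
For readers unfamiliar with Git, steps 1 and 2 above correspond to a handful of commands. The sketch below only builds and prints them; it does not run Git. The repository URL, branch name, file name, and commit message are hypothetical placeholders, and the exact flow will vary with the Git portal in use.

```python
import shlex


def contribution_commands(fork_url, branch, path, message, workdir="handbook"):
    """Return, in order, the git commands a contributor might run.

    Every argument is a placeholder chosen by the contributor; nothing
    here is specific to a particular Git portal.
    """
    git = ["git", "-C", workdir]  # -C runs git inside the cloned directory
    return [
        ["git", "clone", fork_url, workdir],  # step 1: get the handbook source
        git + ["checkout", "-b", branch],     # work on a separate branch
        # ... edit `path` in a text editor or a tool like Typora ...
        git + ["add", path],                  # stage the edited file
        git + ["commit", "-m", message],      # record the change locally
        git + ["push", "origin", branch],     # step 2: publish to the fork
    ]


commands = contribution_commands(
    fork_url="https://git.example.com/employee/handbook.git",  # hypothetical URL
    branch="clarify-vacation-policy",                          # hypothetical name
    path="vacation.md",                                        # hypothetical file
    message="Clarify vacation carry-over",
)
for argv in commands:
    print(shlex.join(argv))
```

Step 3, opening the pull request, usually happens in the portal's web interface after the push, which is why it does not appear as a command here.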

Even though Markdown and Git may be less convenient than Word or Google Docs for actually making an edit, collaborating on the change is much easier. Even if the maintainer takes a two-week vacation in the middle of the process, when she gets back the pull request is still open, reminding everyone of the pending change that needs to be completed. There are now a process and tools that facilitate review and collaboration. The editing step is a little harder, but there is much less friction in the overall process of contributing a change to a shared document, so more people will contribute.

The difference between the push and pull models

The Markdown/Git flow described above is a "pull" model in which a person makes a change, publishes the change, and then the maintainer of the upstream document or project can easily pull the change in with the click of a button. This is different from the "push" model, where a person might try to push a change into a document by emailing a suggested change, or might make the change directly in a Google Doc. The big advantage of the pull model is that it enables process, tools, and workflow. A few more notes on the pull model:

  • The change is considered in the context of just that change and not mixed in with other changes. It can be easily reviewed in the pull request. Tools like Gitea, GitHub, etc. allow discussions to happen in the pull request. Pull requests can be updated after comments are processed. Once everyone agrees on the change, it is easy to merge with a single button click.
  • There is a clear record of all changes and history that is easily reviewed. Additionally, with git blame, you can easily examine the history of any line in a file.
  • Multiple changes can happen in parallel, and each proceeds at its own pace. In Google Docs, a change is made and then recorded. You can revert to an earlier version of a document, but it is not easy to isolate a single change, merge it in when desired, or pull it back out of the history later. In Git, undoing a single change is easy with the revert command.
  • The pull model allows any organizational structure. The maintainer is free to merge or reject any proposed change. Direct access to the master copy is never required by contributors. In Git systems like GitHub, GitLab, and Gitea, permissions can be assigned to repositories and also to branches within a repository.
  • Multiple levels can also be organized. For instance, an engineering manager could be given responsibility for the engineering section in the handbook. He might collect contributions from members of that department, and submit one pull request to the company maintainer of the handbook. The changes can be staged in separate repositories, or branches within a repository.
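
The isolation described in the notes above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: each change replaces one whole line, the names and handbook text are invented, and it is not how Git stores history internally. It shows two independent changes being merged and one of them later being undone without disturbing the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One self-contained edit: replace the line `old` with the line `new`."""
    author: str
    old: str
    new: str

    def inverse(self):
        # Undoing a change is itself a change that swaps old and new.
        return Change(self.author, self.new, self.old)


def apply_change(lines, change):
    """Return a new list of lines with the change applied."""
    if change.old not in lines:
        raise ValueError(f"cannot apply change by {change.author}: line not found")
    return [change.new if line == change.old else line for line in lines]


# Hypothetical handbook content and two unrelated proposals.
handbook = ["Vacation: 10 days", "Dress code: formal"]
vacation = Change("alice", "Vacation: 10 days", "Vacation: 15 days")
dress = Change("bob", "Dress code: formal", "Dress code: casual")

# The maintainer merges both proposals, each at its own pace.
merged = apply_change(apply_change(handbook, vacation), dress)

# Later, only the vacation change is undone; the dress code change stays.
after_revert = apply_change(merged, vacation.inverse())
print(after_revert)
```

Because each change is kept as its own unit, undoing one is just another small change, which is roughly what the Git revert command does by adding a new commit that reverses an earlier one.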

The pull model is one reason open source projects have been so successful. The organizational structure of the Linux kernel is rather complex, with many levels. Many subsystem maintainers collect changes from contributors and then pass them upstream through multiple levels. This process is all enabled by the Git distributed version-control system.

The difference between the push and pull models is that the pull model scales.

The Choice

The choice of a documentation tool comes down to the following questions:

  1. Do we want to optimize for ease of editing (the short term)?
  2. Or do we want to optimize for collaboration and the spread of ideas and information (the long term)?

A casual study of various successful companies and open source projects suggests that reducing the friction of collaboration is critical to success -- especially in technology companies developing cutting-edge, complex systems. GitLab's handbook is maintained in a Git repo and anyone can open a pull request (GitLab calls it a merge request). While most of the contributions come from within the company, there have been outside contributions. For more information on GitLab's approach consider the following:

But Markdown/Git is too hard

Can non-programmers learn Markdown and Git? This is a good question, and I am not sure I have the answer yet, but I think they can. In many ways, learning a dozen elements of Markdown syntax is simpler than navigating the complex menu trees of Word. Git does have a learning curve, but the essence of it is fairly simple. Once you understand the basic concepts and operations, it is as natural as anything else.

Humans are inherently lazy, so most continue to just do what they have always done. Additionally, not all people are intrinsically motivated to share and collaborate. Probably the most important thing is to establish an organizational culture where the people at the top set the example, and ask others to follow. As much as we don't like to compare ourselves to sheep, most of us resemble them more than we like to admit -- we don't like to be driven, but will gladly follow the lead if it makes sense to us. Telling people to collaborate and use certain tools will not work if the people at the top of an organization are not doing the same.