Documentation Formats
There are many ways to create documentation. Microsoft Word is the standard in
the business world. Developers often use Markdown in README files. Online
systems like Google Docs can be handy for collaboration on a document. Content
management systems like WordPress fit well for blogs and websites. Sometimes a
raw text file makes sense.
This document explores the use of a documentation system in an organization when
more than one person needs to maintain or review a document.
Requirements
What are the requirements?
- Relatively easy to use
- Encourage collaboration
- Easy to track changes and review
- Support any organization structure
- Rich formatting
We are going to examine three options in this document:
- Microsoft Word
- Google Docs
- Markdown stored in Git
Microsoft Word
Most people probably reach for MS Word these days, as we've been conditioned to
do so. It is the "expected" document format in the business world. It is a
powerful tool and delivers impressive results with very little work. However,
there are some drawbacks to using Word:
- Most software developers these days don't run Windows. This may come as a
surprise to many, but if you observe what people are using at tech conferences
during presentations, and wander through the offices of leading tech
companies, you will see mostly Macs and a smaller number of Linux
notebooks/workstations. This is not an arbitrary choice or fad -- there are
very good reasons for this which are beyond the scope of this document.
- Even though Word can track changes, it is tedious to review them. It requires
downloading the file, opening the file in Word, and looking through them.
History can be reviewed, but it is usually too much effort to bother.
- When someone commits a Word document to a Git repo, the changes are not easily
visible in the commit. Again, if you want to see what changed, you need to
check out and open the file. And, the history in Word is not necessarily tied
to the Git commit history.
- Multiple people can't work on the same document at the same time.
Google Docs
Google Docs is a very impressive tool and excels in scenarios where a small
number of trusted people want to collaborate on a document. The ability for
multiple people to type into the same document and see each other's edits in
real time is very neat (and very handy for composing notes during a meeting).
Edits are tracked, and it has all the normal features, such as comments, that
can be useful. However, Google Docs also has drawbacks: a connection to the
cloud is required to use it, and changes can't be managed outside the normal
sequence of revisions.
Markdown
Markdown is a fairly simple markup language that lets you create nicely
formatted documents from a simple and readable text file. As with
everything, there are trade-offs. The simple, readable text format also means
there are limits on what can be done with it. However, the
functionality included in Markdown is adequate for most documentation tasks.
Some of Markdown's drawbacks:
- Is not WYSIWYG, although tools like Typora reduce this concern.
- Is a markup language so you need to know the syntax.
- Requires a little work in setting up a good editor, automatic formatters, etc.
- Can't do all the advanced formatting that other solutions can.
- Image files cannot be embedded in the document source file -- they must live
in files next to the Markdown source.
However, there are also some significant advantages to Markdown stored in Git:
- Because the source file syntax is simple, it is very easy to review changes in
a commit diff.
- The document/markup is very readable in a text editor -- developers don't have
to leave their primary editor to work on documentation.
- Syntax is easy to learn and use, and there are tools like Typora that give you
a more classic word processor experience.
- Encourages collaboration.
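To make the first advantage concrete, here is a minimal sketch of what a reviewer sees when a Markdown file changes. It uses Python's standard difflib module to produce a unified diff, the same style of output a commit or pull request view typically shows. The handbook text and the file names are invented for illustration.

```python
import difflib

# Hypothetical handbook excerpt before and after an edit (invented for illustration).
before = """\
# Vacation Policy

- Employees receive 10 days of vacation.
- Requests go to your manager.
""".splitlines(keepends=True)

after = """\
# Vacation Policy

- Employees receive 15 days of vacation.
- Requests go to your manager.
""".splitlines(keepends=True)

# unified_diff marks removed lines with "-" and added lines with "+".
diff = list(
    difflib.unified_diff(
        before, after, fromfile="handbook.md (old)", tofile="handbook.md (new)"
    )
)
print("".join(diff))
```

Because the source is plain text, the one changed sentence stands out on its own, with the surrounding lines shown only as context. A binary word processor file offers no comparable line-by-line view in a commit.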
Example Scenario
Consider this scenario where we examine how things might go if the company
handbook is maintained in Word, Google Docs, and Markdown.
If the document was entirely in Word, the flow for an employee to suggest a
change might go as follows:
- employee emails the suggested change to someone in HR in the form of:
"consider changing sentence 'abc' on page 23 to 'xyz'"
- the maintainer of the handbook would have to make sure she had the latest
version of the document -- perhaps she asks another person in the department
for the latest version of the file, and that person emails the file to her
- she opens the Word document and makes the change with change tracking enabled
- a copy of the document is emailed to the CEO for review
- the CEO would email back some suggestions
- the maintainer would have to coordinate with the original author
- at this point, since everyone is busy, this task gets lost in the black hole
of our overflowing in-boxes and is never completed.
Consider the alternative flow if the document is stored in Google Docs:
- employee has a suggestion, but cannot be given access to change the document
in Google Docs directly, as there is no formal review process
- so, he emails the suggested change, similar to the Word scenario
- maintainer makes the change and notifies CEO to review
- review process is much easier than Word as both CEO and maintainer have write
access to the document
Consider the alternative flow in Markdown stored in Git:
- employee checks out the handbook source
- he makes the change in a text editor (or tool like Typora) and pushes to a
forked repo in his own namespace
- in the Git portal (GitHub, GitLab, Gitea, etc.), a pull request is opened
- the maintainer of the handbook reviews the pull request and tags the CEO in
the pull request to see what he thinks
- the CEO gets an email and reviews the pull request, making a few suggestions
- the original author makes the changes and submits an updated pull request
- the handbook maintainer approves the pull request and the change is instantly
merged into the master branch
- the CI (continuous integration) system sees a change in the master branch,
generates an updated copy of the handbook, and deploys it to the company
web server (could be internal, or in the case of GitLab, external)
Even though Markdown and Git may be less convenient than Word or Google Docs
for actually making an edit, collaborating on the change is much easier.
Even if the maintainer takes a two-week vacation in the middle of the process,
when she gets back, the pull request is still open, reminding everyone of the
pending change that needs to be completed. There are now a process and tools
that facilitate review and collaboration. Even though the editing process is a
little harder, there is much less friction in the overall process of
contributing a change to a shared document. Thus, more people will contribute,
as there is a low-friction process to do so.
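The reason a pull request survives a two-week vacation is that it is a persistent record with an explicit state, not a message in someone's inbox. The sketch below is a toy model of that idea. The PullRequest class, its fields, and its method names are hypothetical and are not the data model or API of GitHub, GitLab, or Gitea.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    OPEN = "open"
    MERGED = "merged"


@dataclass
class PullRequest:
    """Toy model of a pull request; not the data model of any real Git portal."""

    author: str
    title: str
    state: State = State.OPEN
    comments: list = field(default_factory=list)
    revisions: int = 1

    def _require_open(self):
        if self.state is not State.OPEN:
            raise ValueError(f"pull request is {self.state.value}")

    def comment(self, reviewer, text):
        # Review discussion is attached to the request itself.
        self._require_open()
        self.comments.append((reviewer, text))

    def update(self):
        # The author pushes a new revision in response to review comments.
        self._require_open()
        self.revisions += 1

    def merge(self):
        # The maintainer accepts the change; the request is then finished.
        self._require_open()
        self.state = State.MERGED


pr = PullRequest(author="employee", title="Clarify vacation policy")
pr.comment("ceo", "Please also mention carry-over days.")
# Nothing happens for two weeks; the request simply stays open.
pr.update()
pr.merge()
print(pr.state.value, pr.revisions, len(pr.comments))
```

The point of the sketch is only that the pending change, its discussion, and its status live in one place that anyone can look up later.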
The difference between the push and pull model
The Markdown/Git flow described above is a "pull" model in which a person makes
a change, publishes the change, and then the maintainer of the upstream document
or project can easily pull the change in with a click of a button. This is
different from the "push" model where a person might try to push a change into a
document by emailing a suggested change, or might make the change directly in a
Google Doc. The big advantage of the pull model is that it enables process,
tools, and workflow. A few more notes on the pull model:
- The change is considered in the context of just that change and not mixed in
with other changes. It can be easily reviewed in the pull request. Tools like
Gitea, GitHub, etc. allow discussions to happen in the pull request. Pull
requests can be updated after comments are processed. Once everyone agrees on
the change, it is easy to merge with a single button click.
- There is a clear record of all changes and history that is easily reviewed.
Additionally, with git blame, you can easily examine the history of any line
in a file.
- Multiple changes can be happening in parallel and each proceeds at its own
pace. In Google Docs, a change is made and then recorded. You can revert to a
version of a document, but it is not easy to isolate a change by itself and
merge it in when desired. If you want to revert a single change, you can't go
back in history and pull out a single change -- this is easy to do in Git with
the revert command.
- The pull model allows any organizational structure. The maintainer is free to
merge or reject any proposed change. Direct access to the master copy is never
required by contributors. In Git systems like GitHub, GitLab, and Gitea,
permissions can be assigned to repositories, but also branches within a
repository.
- Multiple levels can also be organized. For instance, an engineering manager
could be given responsibility for the engineering section in the handbook. He
might collect contributions from members of that department, and submit one
pull request to the company maintainer of the handbook. The changes can be
staged in separate repositories, or branches within a repository.
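The note above about reverting a single change can be illustrated with a small model. This is a conceptual sketch only: each change is treated as an isolated one-line replacement, and undoing it means applying its inverse on top of the current document, which is roughly the idea behind a revert in Git (a new change that undoes an earlier one). The Change class and apply function are hypothetical helpers, and real Git works on patches and merges rather than this simplified line matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One isolated edit: replace the line `old` with the line `new`."""

    old: str
    new: str

    def inverse(self):
        return Change(old=self.new, new=self.old)


def apply(lines, change):
    """Return a copy of `lines` with the change applied; fail if it no longer fits."""
    if change.old not in lines:
        raise ValueError(f"cannot apply: {change.old!r} not found")
    i = lines.index(change.old)
    return lines[:i] + [change.new] + lines[i + 1:]


doc = ["Vacation: 10 days", "Dress code: formal", "Hours: 9 to 5"]
history = [
    Change("Vacation: 10 days", "Vacation: 15 days"),
    Change("Dress code: formal", "Dress code: casual"),
    Change("Hours: 9 to 5", "Hours: flexible"),
]
for change in history:
    doc = apply(doc, change)

# Undo only the second change; the first and third stay in place.
doc = apply(doc, history[1].inverse())
print(doc)
```

A linear revision history can only roll the whole document back to the state before the second edit, which would also discard the third. Keeping each change as its own unit is what makes the selective undo possible.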
The pull model is one reason open source projects have been so successful. The
organizational structure of the Linux kernel is rather complex (with many
levels). There are many subsystem maintainers who collect changes from
contributors and then pass them upstream through multiple levels. This process
is all enabled by the Git distributed version-control system.
The difference between the push and pull models is that the pull model scales.
The Choice
The choice of a documentation tool comes down to the following
questions:
- Do we want to optimize for ease of editing (the short term)?
- Or do we want to optimize for collaboration and the spread of ideas and
information (the long term)?
A casual study of various successful companies and open source projects suggests
that reducing the friction of collaboration is critical to success -- especially
in technology companies developing cutting-edge, complex systems. GitLab's
handbook is maintained in a Git repo and anyone can open a pull request. While
most of the contributions come from within the company, there have been outside
contributions. For more information on GitLab's approach consider the following:
But Markdown/Git is too hard
Can non-programmers learn Markdown and Git? This is a good question, and I am
not sure I have the answer yet, but I think they can. In many ways, learning a dozen
elements of the Markdown syntax is simpler than navigating the complex menu
trees of Word trying to figure things out. Git does have a learning curve, but
the essence of it is fairly simple. Once you understand the basic concepts and
operations, it is as natural as anything else.
Humans are inherently lazy, so most continue to just do what they have always
done. Additionally, not all people are intrinsically motivated to share and
collaborate. Probably the most important thing is to establish an organizational
culture where the people at the top set the example, and ask others to follow.
As much as we don't like to compare ourselves to sheep, most of us resemble them
more than we like to admit -- we don't like to be driven, but will gladly follow
the lead if it makes sense to us. Telling people to collaborate and use certain
tools will not work if the people at the top of an organization are not doing
the same.