Documenting Database Design


Documenting database design just seems to be one of those things that web developers tend to suck at.

I have some theories as to why:

If done before coding, it becomes quickly out of date. If done after coding, it probably never gets done at all.

The odds are further stacked against web developers since conventions and techniques for documenting databases rarely gets brought up, ever.

All the focus goes into the relationships, modeling, and queries.

Garbage In, Garbage Out

The first step to having good documentation is having good database design in the first place.

If there is a convoluted design, expect convoluted documentation. In fact, expect convoluted data models and business logic in the application itself, too.

Be ruthless about:

Where the Pristine Runs Afoul

The design always seems to start off solid, but as development progresses, corners get cut, and before you know it, those pristine tables are full of little hacks that were thrown in last minute.

Sometimes it’s the removal of no-longer-needed columns, rendering a once-useful table into something no longer needed itself.

You then end up with this leftover.. thing, that keeps getting populated by the application. It gets even worse when its sole purpose in life becomes holding up foreign key constraints between other tables.

Other times, ultra-specific, awkwardly-named columns get added to an existing table due to convenience, which often turns into an eyebrow raiser for the next developer that has to figure out what the column does, and why it’s in that table.

In worst cases, flag or enum-type columns get added to allow rows to have their own state, effectively denormalizing the table.

And somewhere in the spacetime continuum, or perhaps outside of it, the gods weep.

It’s these sort of last minute, weird, stupid things that fouls up pristine database design, and makes documentation that more difficult and verbose. To this, I say: always refactor, no matter the risk.

What to Document

Moving along, I find those following the most important to document:

Sometimes you will be stuck with weird columns or tables that are not self-evident as to what they are for. This is what should be documented.

Redundantly documenting anything that is already self-evident adds noise and buries the tasty morsels that readers should be drooling over.

This applies to code, for that matter.

Table Descriptions

Expanding on the table descriptions part, provide a summary for each table:

Shut up About the Application

I don’t believe in going into detail about how the data may power things on the user interface, or what controllers use the data models, and so forth.

The database design should stand on its own, regardless of what application, services, or views are using it.

Diagrams

ER diagrams really helps one to see how everything fits together at a glance.

If there are a lot of tables, I find that it helps to break up the tables into clusters which share foreign key relationships, rather than provide a single overview that is not even readable.

Always generate ER diagrams, don’t even consider making one by hand for a second. For MySQL, there is the free MySQL Workbench.

comments powered by Disqus