Nothing in Excess: Deciphering Data Architectures by James Serra
In my last post, I talked about the lessons learned from the ancient Greeks. The second inscription on the temple of Apollo in Delphi was “Nothing in Excess.” Yet, inspired to be modern and embracing vendor’s propaganda, it’s my experience that companies tend to overdo their data architectures. I talked about this in this newsletter. Another excellent resource that could help you strike a balance is the James Serra’s latest book “Deciphering Data Architectures.” I was privileged to know James Serra for a while, have him multiple times present to our Atlanta BI Group, and be an early reviewer of this book.
Besides consolidating the wealth of information from his excellent blog, James took a vendor-neutral approach to various patterns. Each pattern describes the architecture followed by pros and cons analysis.
After all, the right architecture and “best practice” is what works for you given your specific circumstances and restrictions. Nothing more, nothing less. And you’ll still be “modern”, whatever that means.
How do I personally avoid excess in my data architectures? By keeping them simple. Although there are exceptions when it makes sense, the general tenants for most implementations are:
- Stage only the data I need instead of staging all source tables. Yes, this requires joining tables at the source, which is what the relational databases are designed to do (usually that’s the source).
- If the data is hosted in relational databases, stage directly to the cloud data warehouse and bypass intermittent staging to a lake. In general, I hate working with files.
- If the data is provided as files, such as reference data or source data downloaded or uploaded as files, stage it to a data lake.
- Avoid REST APIs of cloud vendors like the plague unless there is no other way.
- Do incremental data extraction when necessary (e.g. above a few million rows) and possible.
- Embrace the ELT pattern.