When to Use a Database (as a Spreadsheet)
Table of Contents
When it comes to storing and retrieving information, paper notes precede the ledger. The ledger precedes the spreadsheet. The spreadsheet precedes the database. Search the internet and you will find numerous comparisons between each of these tools, but which one is the best for you? That's the wrong question to ask.
There is one thing that all of these tools have in common: Keeping track of information (data). In other words, these tools help you store and retrieve information that you use in a variety of cases. Specialists complicate the selection of these tools by using extreme vocabulary to justify the existence of their payroll. It's really not that complicated...
Don't copy data that represents the same thing. Remember this rule. It's that simple. It being the answer to which tool you need to use to manage your information. "But wait, a rule is not a tool!" You are right. The rule isn't a tool. However, the tool you need to use will become obvious once you follow it. Let's see it in action.
The Brain
Your brain contains as much as 2.5 petabytes (2560 terabytes) of storage space. However, there is an arbitrary cost associated with storing information and no guarantee that information is stored correctly. This is fine for the experiences you encounter throughout your life which are rarely — if ever — the same. It's less fine for remembering swathes of structured information. As always, use the right tool for the job.
The Digital Notepad
This tool is the digital equivalent of paper. It allows you to store as many bytes — in as many files as necessary — on the hard drive of your computer. With this tool, you are free to type characters of information without any given structure. However, most people end up using lines and paragraphs to organize their information (data). While this is fine for one-off to-do lists and single submissions, it falls flat once you need to update a character in two places at once.
Example
You are keeping a list of people's names and items they own with Notepad. While most people only maintain a single entry, "Bob" is written on 5 different lines. This becomes problematic when "Bob" decides to change his name to "Jim". Now, you must update the values of all 5 lines manually. Unfortunately, there isn't a simple way to change a word and update it in multiple places at once (without potential issues). It's time to upgrade tools.
The Spreadsheet
A spreadsheet is structured paper. In fact, most spreadsheets can be managed from paper using the Comma Separated Values (CSV) format which specifies how the paper is split into cells. People use these cells to store information by organizing the cells as tables in each "sheet" (of paper). The development of applications such as Excel and Google Sheets even allow you to reference one cell from another. This is in addition to a host of other features such as queries and graphs.
Example
In the previous example, you kept track of a person's name and items the person owned using lines of text. Transferring this information — as is — to a spreadsheet yields the following result.
There are a lot of benefits to storing information in this way. The interface is simple to understand. The structured format makes working with others a piece of cake. Most important is the fact that you CAN reference a cell to a value in another cell. What does that mean? Using another table (for a person’s name), you can update a single "reference" cell to change all of the "value" cells in the ownership table. All it took was a few LOOKUPS!
The Problem
The above format works until you need to add a "reference" to another "referenced" cell. In this example, we are referencing the Ownership Person Name column (which references the People Table) from the Items Query Table. Here is what that looks like.
At this point, you are using a query (i.e QUERY) to reference a formula (i.e VLOOKUP) to reference data “values” (that represent the same things) and synchronize them across tables. Well… Almost. In a spreadsheet, a “reference” simply places values (that are being referenced) in the cells you specify. In other words, the data is simply copied over. So when you decide to use a QUERY “reference” on a VLOOKUP “reference”, the query is being performed on copied data (but not the actual reference). This subtle difference means that you are copying unique data. You are breaking the rule.
In a similar manner to the Notepad, problems don’t reveal themselves until you attempt to modify the data. As a reminder, that example encountered an issue when you needed to change multiple "values" at once. Since there was no way to create a character that referenced a position in the file (also known as another character), updating a person’s name would require multiple manual changes (harming the consistency of your data). The spreadsheet fixed this by providing the option to "reference" a cell to another cell using a formula or query. This worked great up to the point where you started to create indirect references; specifically a QUERY (on the Ownership Table) to a VLOOKUP (on the People Table) from the Item Query Table.
A formula or query (reference) cannot be referenced in a spreadsheet. Sure. You can copy the VLOOKUP formula into the QUERY formula. However, when you change one parameter in one of the VLOOKUP formulas, then you will have to update the other manually. This results in the same issue you experienced as the Notepad, but with formulas instead of characters. That’s because indirect references in a spreadsheet (formula to formula) are always “references” to copied “values”; not the actual reference. As a result, your data can become inconsistent in multiple ways.
How can you see this concept in practice? When you change the Ownership Table formula (reference) in any manner, the underlying "values" will change, but NOT the queries that reference the formula. While this is fine for a number of cases, it falls flat when you modify the VLOOKUP formula and want it to stay consistent with other references (to that formula). In that case, simply moving the location of the original referenced cells (from VLOOKUP) results in the following.
Values that represent the same thing are being copied. It's time for another upgrade.
The Database
The typical upgrade for information stored in a spreadsheet is the relational database. A relational database is common since most people store information that is related to other information. As an example, this article has focused on the relationship (ownership) between an item and a person. However, it should be noted that there is more than one type of database; each with their own advantages and disadvantages. Nonetheless, the rule still applies.
SQL (Structured Query Language) is a language that works with a number of database engines (MySQL, PSQL, etc) to provide structured data storage. Instead of creating tables on sheets, you create tables (that represent objects) with columns (that represent properties; with constraints). Each row represents a single object (from the respective table) which leads to Object Oriented Programming. Let's not get ahead of ourselves…
The issue with the spreadsheet was that references could not be copied. A relational database solves this with the use of foreign keys which can reference specific columns from a specific table. This not only improves your data consistency, but also allows you to adhere to “the rule” at all times. In fact, the rule becomes a real concept known as Database Normalization.
A relational database can store billions of rows without issue. It also helps that these tools are optimized for queries. To put this in perspective, databases are a first class citizen when it comes to web development; powering almost every web application you use. The only downside is that the database "program" is controlled using its language from a command line interface (or using input from another application).
Using a Database as a Spreadsheet
Developments have been made to optimize every aspect of modern databases further; especially the user interface. This has led to the creation of Backend-as-a-Service applications such as Supabase, which streamline the management of a database through a web application (instead of a terminal). Unfortunately, these applications still rely on storing and retrieving data through the most common cases (code). However, recent developments have changed this...
At first glance, the above image looks like a spreadsheet. It is a spreadsheet... but it is powered by a database. That's right. You can use a database as if you were working on a spreadsheet. Thus the title reveals itself: "When do I use a database as a spreadsheet?" Again, wrong question. Instead, ask "Why not both?" Then follow the rule. You will thank me later.