Size Requirement of SQL Database

Posted on

Question :

I have been asked this question in an interview that how much would be the DB size for a standalone SQL server if 650MB of data is processed every month. I couldn’t give any answer to this question in my interview and later on didn’t find any answer either. Any help would be highly appreciated.

Answer :

This is an open-ended interview question that doesn’t have a precise answer. The interviewer might expect you to ask additional questions for clarity.

Actual space requirements will depend on what is meant by “processed” (inserted?), data retention (are data ever deleted?), column data types, indexes, storage overhead, use of compression features (including columnstore), plus additional free space for routine maintenance.

These kind of questions are stupid interview questions in my opinion, and definitely need more context. Perhaps they just want to see your thinking and what follow up questions you’d ask for clarity.

If by processed they mean 650 MB of data is added every month (and even here would be an assumption that the added 650 MB is data + all indexes + any other supplemental objects consuming space), and there’s no other context given so we have to assume they mean purely the database’s total data file sizes (not the Transaction Log files or anything else – since those would depend on unknown variables like recovery models and backup frequency) then the size of the database is 650 MB times the number of months it’s been online.

As discussed in the comments: In the simplest example, if a brand new database is created with Full Recovery Model, it has no indexes (or anything else that would cause duplication of data), and 650 MB of data is written to it. At that moment in time time about 1.3 GB of disk space is currently being consumed, 650 MB for the data and 650 MB for the transaction that inserted the data into the database and is now sitting in the Transaction Log. Because they didn’t provide any of those additional details in my most simplest of examples, you can only assume they mean the raw data at best, which is 650 MB at this point in time.

But you can see how variable it can get fast, even in a simple example. Even in that example I assume by the word processed they mean added. But processed could mean 350 MB were added and then 300 MB were deleted. Now the database size (for just the raw data) is only 50 MB. In my previous example, the total disk space consumed would be about 700 MB because there is 650 MB of transactions that occurred and sitting in the transaction log plus an additional 50 MB of actual data in the database at this point in time.

If we knew more information such as how often the Transaction Log is backed up, then that might change the answer above as well. For example, if Transaction Log backups occurred every 5 minutes, and if 350 MB of data was inserted into that same example database above then 7 minutes later 300 MB was deleted, the total space consumed on disk is currently 350 MB. 50 MB of actual data + 300 MB currently sitting in the Transaction Log for the deletion operation. The initial insert operation’s transaction is gone at this point in time because after a backup of the Transaction Log occurs, the transactions get flushed from it, and the space they were previously consumed gets re-used by new transactions.

So unfortunately there’s a lot of ways this question can be interpreted, and it’s best to show your knowledge in the interview by asking the right follow up questions such as “Are we talking only database file data or Transaction Log data as well?”, “What kind of Recovery Model is in place?” (Simple vs Full Recovery Model will result in a different amount of data being stored in the Transaction Log), “What type of backups are scheduled and how often are they ran?”, “Does the database have indexes, and what sizes do they consume for 650 MB of database data?” etc. If the interviewer knows what they’re doing then those kind of follow up questions will show them it’s not the actual answer to the question that matters, because you demonstrated your competency by asking what is necessary to know to be able to solve that kind of question.

To your last question on increasing the available Disk Space and Memory on the fly: yes it’s definitely possible to add more Disk Space and Memory on the fly, and even while keeping the database online. The way to actually do it will depend on your server host, whether it’s an on-prem server or in the cloud, a physical server or a VM, and the providers of each when applicable.

Finally here are some blogs I find are really helpful (in no particular order):

  1. Josh Darnell’s Blog
  2. Brent Ozar Team’s Blog
  3. Erik Darling’s Blog
  4. Aaron Bertrand’s Blog on
  5. Aaron Bertrand’s Other SQL Blog
  6. Kendra Little’s Blog
  7. RedGate’s Simple Talk Blog

There is no single right answer to a vague question like this, and the interviewer isn’t expecting one (if they know anything themselves of course). They are looking to test how you realise the question is too vague and how you might start trying to answer such a question in a real situation – to see if you understand what extra details would be needed and how you might go about asking for or finding those details from the documentation, dev team, or by inspecting an existing database.

The only wrong answers are no answer (including a straight “I don’t know”) or the naive one (650MB * months) without any caveats.

I would probably answer a question like this with something like

The naive answer would be to just multiply that 650Mb by the number of months we expect the service to run, assuming that amount includes index growth & other side costs, and that we are keeping all the incoming data each month, but to give a proper estimate we need to know how much of the processed data is new, or updates for existing data, or to be discarded as it is already present but being resent by an external system, and we’d need to know more detail of how the data is stored and indexed which would have significant effects on the storage requirements. Also how the data is processed will add other temporary storage needs due to work tempdb and transaction log growth.

That tells them you know the question is vague, that you know what the naive and probably incorrect answer is (so you’d recognise such an answer coming from elsewhere), and that you know the sort of detail needed in order to progress towards a useful estimate. They may then ask you to expand on some of those points, or ask you more specific questions branching of from what you’ve said.

Leave a Reply

Your email address will not be published. Required fields are marked *