Suggestions on database model design/ performance

Posted on

Question :

I’m currently building an ‘online examination’ type of web app using Ruby on Rails and PostgreSQL (or MySQL), which is going to be part of a bigger application once pushed to production. Since I just recently got into RD design I’d appreciate some feedback on the following design.

App Specifications

  • There are 1000+ different questions. Each question can have 2-5
    different options, only 1 option is correct.
  • The user is presented with 10 questions at a time.
  • If a user answers a question correctly 2 times, the question will not
    be shown to him again (let’s call those questions now completed).
  • If a user answers a question incorrectly then he needs to answer the
    question 2 MORE times correctly.
  • The user passes the exam when there are no more questions left to

My current database schema is:

-name etc

-correct_id (or option_id, stores the correct option id)



So in order to create a ’10 question test’ we could:

  1. create a list of every completed question querying the user_answer table

  2. create a list of possible questions (all questions – completed questions)

  3. finally serve the user with 10 random questions from the last list.

Is there a better approach to represent “the remaining times a question needs to be answered” inside the database? Please feel free to suggest even a completely different design.

What concerns me is that for each single user there would be more than 2×1000 user_answer rows. This quickly becomes a huge number, so I guess there must be a better way.

Would using a json array be okay in this case? But as far as I understand it’d quickly become a hassle to add more questions etc.

Answer :

As you are at the DBA site, it is likely we will give you database solutions 🙂

As far as I can see, you are stuck with your two options – a row for every users answer or a large list. Best bet in terms of database design is to go for a row for every users answer and to make the columns as narrow as possible.

My solution to this would be to change your field “is_correct” in the “user_answer” table to “answer_status” and make it a tinyint. If the user gets the answer wrong, this is set to 0. If the user gets the answer right, it is incremented by 1. For displaying non correctly answered questions, you would scan for all questions where “answer_status” is < 2. Now the column would represent:

  • 0 = question not yet answered or answered incorrectly
  • 1 = question answered correctly once
  • 2 = question answered correctly

When the user starts their exam, you will make an entry for every question in the question table for that user, with the default answer_status = 0.

If your “question_id” column and “user_id” column are both ints, you will store 9 bytes per row – which means if you use 100MB for this table, you would be storing ~119 million rows. If 10,000 rows per user, you will be storing data for ~12,000 users. You would most likely want to cluster this table on “user_id”.

In addition, you could clean up the “user_answer” table by setting a flag in a “user_exam” table. It is likely you will need this table to encapsulate the questions into an exam and to list which exams users are enrolled for. This table could have a “status” field which could be set to “Completed” when a user has passed the exam. Subsequently all of the rows in the “user_answer” table can be deleted for that exam, for that user, once they have completed all the answers (save possibly getting some statistics like length of time taken). This would reclaim space in your database when the tables clustered index is rebuilt.

You can also see below web page on online examination database design help .

Leave a Reply

Your email address will not be published. Required fields are marked *