Which database to use? [closed]

Posted on

Question :

Hey there. I am looking for a free to use for commercial website database that should have these design features, in the given order:

  1. Auto-Sharding (getting more free space by adding more server nodes)
  2. high read performance
  3. data larger than available memory (so, persistent of course)
  4. New inserts should be available to read/update in real time.

Features that would be very handy, but not necessary:

  1. In-Place-Updates like in MongoDB because the columns 1 to 7 never change in size. And the column 0 never changes at all.
  2. Handy would be something where I can query slave nodes too. Not just masters. To achieve more concurrent operations (especially reads).

Non-relational, relational, does not matter.

I am going to have some billions of entries that look like this:

    0: (64 or 32, not decided yet) byte string,
    1: 64 bit integer,
    2: 64 bit integer,
    3: 64 bit integer,
    4: 64 bit integer,
    5: 64 bit integer,
    6: 64 bit integer,
    7: 64 bit integer

The only key/index will be “0” (primary index in SQL, or the key in NoSQL databases).

Queries will look something like these:

  • SELECT * WHERE 0=’string’
  • UPDATE 3=int, 6=int, 7=int WHERE 0=’string’
  • DELETE WHERE 0=’string’
  • INSERT {0: ‘string’, 1=int, 2=int, 3=int, 4=int, 5=int, 6=int, 7=int}

No other queries (like range, etc).

For every 1,000 operations there will be:

  • 600 reads (not sequential)
  • 200 updates (not sequential)
  • 150 inserts (bulk if possible)
  • 50 deletes (bulk if possible)

Thank you for your answers!

Answer :

A few db’s that come close:

1: Postgres-XC, meets most areas except that sharding takes some design effort

2: VoltDB, meets most areas except for data larger than memory

The key question though is why do you need the autosharding? Anything automatic adds complexity and if you are big enough to require sharding, you can probably afford to have the experts look carefully at how best to do this (i.e. via Postgres-XC).

Honestly, I think the best option for most people in your position is to start with PostgreSQL and plan on scaling up and out with bigger servers and Postgres-XC when you get to the point that you need this. But there are a lot of big systems out there that run on PostgreSQL, including Skype and Instagram, so it certainly can do web scale even without sharding.

Leave a Reply

Your email address will not be published. Required fields are marked *