We run Slony 2.0 with Postgres 8.4 on two CentOS 6 servers: one master and one slave. Our database is about 30GB in size, which isn’t unusual, but we do have a couple of tables that are more than 5GB each.
Recently, we needed to rebuild our Slony cluster. I turned off Slony, restored identical database snapshots on the master and the slave, set up my `slony.conf` and `slon_tools.conf`, started the slons, and ran:

    slonik_init_cluster      | slonik
    slonik_create_set 1      | slonik   # we only have one replication set
    slonik_subscribe_set 1 2 | slonik

Everything looked good, and I was able to watch subscription progress in the logs.
Then the server stopped responding. I rebooted it, and saw “Kernel panic – not syncing: Out of memory and no killable processes” after it had killed everything it could.
What I’ve tried:
First I blew away the database completely, re-ran `initdb`, and restored the identical snapshots again. Same kernel panic. Then I blew it away again, uninstalled Postgres and Slony, and reinstalled them. I double-checked all of our memory-related settings in `postgresql.conf`, and they are all at stock/recommended levels (e.g. `shared_buffers` at 1/4 of RAM). I ran a `VACUUM FULL ANALYZE` on the database before initializing the Slony cluster. Same result every time: kernel panic, out of memory.
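As a sanity check on those stock values, here is a minimal sketch (my own, not from our Puppet config; it assumes a Linux box with `/proc/meminfo`) of what "`shared_buffers` at 1/4 of RAM" works out to on a given machine:

```shell
# Sketch: compute the "stock" shared_buffers value described above,
# i.e. one quarter of physical RAM, from /proc/meminfo (Linux only).
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)  # total RAM in kB
shared_buffers_mb=$(( mem_kb / 4 / 1024 ))             # 25% of RAM, in MB
echo "shared_buffers = ${shared_buffers_mb}MB"
```

Comparing that number against the running server's setting is a quick way to confirm nothing drifted from the recommended levels.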
There’s no chance of random/manual config changes having caused this: all of our Postgres and Slony configuration is managed by Puppet, and hasn’t changed for months.
Why is this happening?
Our database has grown fairly linearly over the past few months (at the beginning of the year it was about 23GB, now it’s 30), and every other time I have had to re-initialize the Slony cluster on these same servers, it has worked fine.
The problem turned out to be unrelated to Slony: in `/etc/sysctl.conf`, the `kernel.shmmax` value was set to an amount greater than the available RAM. Setting it to 60% of RAM (our DB consultant’s recommendation) solved the problem.
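For reference, a minimal sketch (mine, assuming Linux and the 60% figure above) of deriving that `shmmax` value:

```shell
# Sketch: kernel.shmmax at 60% of physical RAM, in bytes.
# Reads MemTotal from /proc/meminfo (Linux). To apply it, add the
# printed line to /etc/sysctl.conf and run `sysctl -p` as root.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)  # total RAM in kB
shmmax_bytes=$(( mem_kb * 1024 * 60 / 100 ))           # 60% of RAM, in bytes
echo "kernel.shmmax = ${shmmax_bytes}"
```

The key property is that the result is guaranteed to be below physical RAM, which is exactly the invariant our old setting violated.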
Why this issue didn’t crop up before is a mystery to me.