Manni Wood
Senior Software Engineer
Black Duck Software
mwood@blackducksoftware.com
Budgets / business requirements often leave you with a setup that is not as "by the book" as you'd like.
The ideal:
Our reality (a couple of years ago):
Pain points resulting from our reality
WAL file shipping to our mirror kept up under regular load...
...but it always got bogged down when autovacuuming began on one of our massive partitions.
One of our tables was going to surpass 1.2 billion rows.
Integer primary key would not be enough. Needed to change schema to bigint...
...and all software that was using int had to be converted to use long.
The looming 2.1 billion row limit
Not enough disk to completely rewrite the table using ALTER TABLE.
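For scale, a minimal sketch of the straightforward fix, with hypothetical table and column names; this form of ALTER TABLE rewrites the whole table and its indexes, which is exactly the disk we did not have:

    -- Hypothetical names: widen the key column from int to bigint.
    -- This rewrites the entire table, so it needs roughly the table's
    -- size again in free disk while it runs.
    ALTER TABLE events ALTER COLUMN id TYPE bigint;

    -- Columns in other tables that hold the key must be widened too,
    -- and application code must move from int to long.
    ALTER TABLE event_details ALTER COLUMN event_id TYPE bigint;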
What to do?
Followed a similar pattern for upgrading from 8.3 to 9.0
(Perhaps we should have used pg_upgrade? We were unsure of its readiness for prime time.)
Full dump and restore. (The approved way to do database upgrades.)
Only a weekend to do the dump and restore; the restore itself would be OK, but rebuilding indexes and re-enabling foreign key constraints takes FOREVER.
Phased dump/restore.
Begin dumping large, static partitions early, while db still in operation
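A rough sketch of the idea, with made-up partition names and paths (per-table pg_dump runs accomplish the same thing):

    -- Export the large, static partitions ahead of the upgrade weekend,
    -- while the database is still in service; their contents no longer change.
    COPY events_2009_q1 TO '/backup/events_2009_q1.copy';
    COPY events_2009_q2 TO '/backup/events_2009_q2.copy';
    -- Over the weekend, only the remaining active partitions need dumping;
    -- the exported files are loaded back afterwards with COPY ... FROM.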
After Game Weekend
For continued growth of the db, we needed to master partitioning and vacuuming.
How should we partition?
Decided not to go this route: the ideal number of partitions now could turn out not to be the ideal number of partitions later.
This is the route we took, because it better matched our workload.
Abandoned clever triggers to auto-create new partitions as needed: they were too slow. Our largest tables are our most active tables.
Instead, we create empty partitions out in front of our business processes, and have cron jobs check how many empty partitions we have left, warning us when too few remain (see the sketch below).
Clever cron jobs to auto-create partitions ahead of time also had to be abandoned: the DDL would cause odd race conditions and other issues.
Only choice: pick a time, shut down business processes, create more empty partitions, restart business processes.
Upside: this is very quick.
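A hypothetical sketch of both halves of this approach (the partition created ahead of time during a maintenance window, and the cron check for remaining empty partitions), using the inheritance-style partitioning available in 8.3/9.0 and made-up table names:

    -- During the maintenance window: create an empty partition out ahead
    -- of the rows that will eventually fill it.
    CREATE TABLE events_p0042 (
        CHECK (id >= 420000000 AND id < 430000000)
    ) INHERITS (events);
    CREATE INDEX events_p0042_id_idx ON events_p0042 (id);

    -- Cron-driven check (approximate; uses the statistics collector):
    -- how many pre-created partitions are still empty? Warn when too few remain.
    SELECT count(*)
      FROM pg_stat_user_tables
     WHERE relname LIKE 'events_p%'
       AND n_tup_ins = 0;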
Chances are, you know your busiest tables better than the autovacuumer does.
We have one table that is so active (and so large) that we vacuum it every day.
These vacuums take a few minutes because we do them every night; they used to take hours, and they would grind some things to a halt.
Another, more general-purpose cron job runs every three hours, looks at the partitions of our huge tables, picks the partition that is closest to being autovacuumed, and vacuums it manually instead (see the sketch below).
Now that we are more proactive (and aggressive) about vacuuming, and now that each of our partitions is small enough to vacuum in a reasonable time, our database no longer has those times when it grinds to a halt.
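A hypothetical version of that three-hourly check, assuming the default autovacuum trigger (autovacuum_vacuum_threshold of 50 dead rows plus autovacuum_vacuum_scale_factor of 0.2 times the table size) and a made-up partition naming scheme:

    -- Rank partitions by how close they are to autovacuum's trigger point
    -- (threshold + scale_factor * reltuples, i.e. 50 + 0.2 * reltuples
    -- by default) and pick the one nearest to it.
    SELECT s.relname,
           s.n_dead_tup / (50 + 0.2 * c.reltuples) AS fraction_of_trigger
      FROM pg_stat_user_tables s
      JOIN pg_class c ON c.oid = s.relid
     WHERE s.relname LIKE 'events_p%'
     ORDER BY fraction_of_trigger DESC
     LIMIT 1;

    -- The cron job then vacuums the winner manually, e.g.:
    -- VACUUM ANALYZE events_p0042;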
The business need for an 8-hour lag on WAL shipping is removed, and we are on 9.0, so we switch to streaming replication to a live mirror!
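A minimal 9.0 streaming-replication sketch; the host names, replication user, and addresses here are made up:

    # On the primary, postgresql.conf:
    wal_level = hot_standby
    max_wal_senders = 3

    # On the primary, pg_hba.conf (hypothetical standby address):
    host  replication  replicator  192.168.1.20/32  md5

    # On the standby, recovery.conf:
    standby_mode = 'on'
    primary_conninfo = 'host=primary.example.com port=5432 user=replicator'

    # And in the standby's postgresql.conf, to allow read-only queries:
    hot_standby = on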
The weekly snapshot server is changed to all SSDs with loads of RAM! It is now often pressed into service for the many read-only workloads that do not need up-to-the-minute data.