PostgreSQL Vacuum Test
28 Dec 2015; updated 20 Aug 2017
Autovacuuming definitely needs to be tuned on any heavily updated PostgreSQL database. 2nd Quadrant has written two great blog entries about autovacuum tuning, and I had to put their advice to the test.
Based on prior experience, I had never found autovacuuming aggressive enough,
so I would just set up cron jobs to manually vacuum our most heavily updated
tables. However, it seems I may not have been giving autovacuum_vacuum_cost_limit
a fair shake.
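For the curious, such a cron job might have looked something like the following. This is a reconstruction, not my actual crontab, and the table and database names are placeholders; vacuumdb is the command-line vacuum wrapper that ships with PostgreSQL.
# postgres user's crontab: manually vacuum and analyze a hot table every 15 minutes
*/15 * * * * /usr/local/pgsql-9.2.14/bin/vacuumdb --analyze --table=things postgres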
I encountered a work situation with a table that was always getting behind on vacuuming. The table had about 30 million rows, and was updated very heavily. I decided to mock up and test the situation on my local laptop.
I had a custom-compiled PostgreSQL already on my laptop, similar to this.
I did not want to
wear out my internal SSD, so for my experiment, I used two external USB
spinning drives: one for WAL, and one for tables. (Note how I mount the drives
with noatime
so that every read is not also a write.)
root@mwood-ThinkPad-X250:~# mount /dev/sdc1 /mnt/wal -o noatime
root@mwood-ThinkPad-X250:~# mount /dev/sdb1 /mnt/data -o noatime
root@mwood-ThinkPad-X250:~# mkdir -p /mnt/wal/wal/pg_xlog
root@mwood-ThinkPad-X250:~# mkdir /mnt/data/data
root@mwood-ThinkPad-X250:~# chown -R postgres:postgres /mnt/wal/wal
root@mwood-ThinkPad-X250:~# chown -R postgres:postgres /mnt/data/data
root@mwood-ThinkPad-X250:~# chmod 700 /mnt/wal/wal
root@mwood-ThinkPad-X250:~# chmod 700 /mnt/wal/wal/pg_xlog
root@mwood-ThinkPad-X250:~# chmod 700 /mnt/data/data
I also set the external HDDs to write to their platters right away on fsync so that I would have durability similar to a real server environment:
root@mwood-ThinkPad-X250:~# hdparm -W0 /dev/sdb
root@mwood-ThinkPad-X250:~# hdparm -W0 /dev/sdc
I then stopped my postgres and set up a new cluster on my external data hard drive, putting the WAL directory on my other external hard drive:
root@mwood-ThinkPad-X250:~# systemctl stop postgresql.service
root@mwood-ThinkPad-X250:~# mkdir /var/run/postgresql
root@mwood-ThinkPad-X250:~# chown postgres:postgres /var/run/postgresql
root@mwood-ThinkPad-X250:~# su - postgres
postgres@mwood-ThinkPad-X250:~$ /usr/local/pgsql-9.2.14/bin/initdb \
    --pgdata=/mnt/data/data \
    --xlogdir=/mnt/wal/wal/pg_xlog \
    --encoding=UTF8 \
    --no-locale
...
Success. You can now start the database server using:

    /usr/local/pgsql-9.2.14/bin/postgres -D /mnt/data/data

or

    /usr/local/pgsql-9.2.14/bin/pg_ctl -D /mnt/data/data -l logfile start
I then edited my postgresql.conf
...
postgres@mwood-ThinkPad-X250:~$ cd /mnt/data/data/
postgres@mwood-ThinkPad-X250:/mnt/data/data$ vim postgresql.conf
...and changed autovacuum_vacuum_cost_limit
to its
maximum allowable setting:
autovacuum_vacuum_cost_limit = 10000
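To put that number in perspective: vacuum accrues cost points as it works (by default 1 per page found in shared buffers, 10 per page read from disk, 20 per page dirtied), and autovacuum sleeps for autovacuum_vacuum_cost_delay (20ms by default) each time it accumulates cost_limit points. At a limit of 10000, that allows roughly 10000 / 0.02s = 500,000 cost points per second of vacuum work; the stock limit of 200 allows only 10,000 per second, fifty times less.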
Now it was time to run PostgreSQL:
postgres@mwood-ThinkPad-X250:/mnt/data/data$ /usr/local/pgsql-9.2.14/bin/postgres -D /mnt/data/data
Meanwhile, in another terminal, I created a table and filled it with 30 million rows:
mwood@mwood-ThinkPad-X250:~$ psql -U postgres -d postgres
postgres@[local]:5432/postgres# create table things (id bigint not null, data text not null);
postgres@[local]:5432/postgres*# commit;
postgres@[local]:5432/postgres# insert into things (id, data) select generate_series(1, 10000000) as id, md5(random()::text) as data;
commit;
postgres@[local]:5432/postgres# insert into things (id, data) select generate_series(10000001, 20000000) as id, md5(random()::text) as data;
commit;
postgres@[local]:5432/postgres# insert into things (id, data) select generate_series(20000001, 30000000) as id, md5(random()::text) as data;
commit;
I saw that my 30 million row table was spread across 3 files (PostgreSQL splits large tables into 1GB segment files; the _fsm file is the table's free space map):
root@mwood-ThinkPad-X250:~# ls -l /mnt/data/data/base/12040/16384*
-rw------- 1 postgres postgres 1073741824 Aug 16 12:24 /mnt/data/data/base/12040/16384
-rw------- 1 postgres postgres 1073741824 Aug 16 12:26 /mnt/data/data/base/12040/16384.1
-rw------- 1 postgres postgres  149340160 Aug 16 12:26 /mnt/data/data/base/12040/16384.2
-rw------- 1 postgres postgres     581632 Aug 16 12:26 /mnt/data/data/base/12040/16384_fsm
I built a primary key index so that updates, which look up rows by id, would go quickly:
postgres@[local]:5432/postgres# set maintenance_work_mem = '8GB'; commit;
postgres@[local]:5432/postgres# alter table things add constraint things_pk primary key (id); commit;
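As a sanity check (a standard catalog query, not something I captured in my original notes), you can ask PostgreSQL how big the new index turned out to be:
postgres@[local]:5432/postgres# select pg_size_pretty(pg_relation_size('things_pk')); rollback;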
And then I tuned the table to be vacuumed very aggressively. I basically told autovacuum to vacuum (and analyze) this table as soon as it had 5000 or more dead tuples:
postgres@[local]:5432/postgres# alter table things set (autovacuum_vacuum_scale_factor = 0.0);
postgres@[local]:5432/postgres*# alter table things set (autovacuum_vacuum_threshold = 5000);
postgres@[local]:5432/postgres*# alter table things set (autovacuum_analyze_scale_factor = 0.0);
postgres@[local]:5432/postgres*# alter table things set (autovacuum_analyze_threshold = 5000);
postgres@[local]:5432/postgres*# commit;
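These per-table overrides are stored as reloptions on the table's pg_class row, so a quick catalog query (again standard, not from my original session) confirms they took effect:
postgres@[local]:5432/postgres# select reloptions from pg_class where relname = 'things'; rollback;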
For good measure, I also cranked up the statistics kept on the id column. This is outside the scope of "can autovacuum keep up?", but it mimics a real-world situation where one would want more statistics on such a large table:
postgres@[local]:5432/postgres# alter table things alter column id set statistics 10000;
postgres@[local]:5432/postgres*# commit;
I double-checked the autovacuum settings from within my psql
session:
postgres@[local]:5432/postgres# show autovacuum_naptime; rollback;
┌────────────────────┐
│ autovacuum_naptime │
├────────────────────┤
│ 1min               │
└────────────────────┘
(1 row)

postgres@[local]:5432/postgres# show autovacuum_vacuum_cost_limit; rollback;
┌──────────────────────────────┐
│ autovacuum_vacuum_cost_limit │
├──────────────────────────────┤
│ 10000                        │
└──────────────────────────────┘
(1 row)
I remembered that I had set maintenance_work_mem higher in my particular session to speed up the index build, so I reset it to its default to see what the server thought the configured value was:
postgres@[local]:5432/postgres# set maintenance_work_mem to default; commit;
postgres@[local]:5432/postgres# show maintenance_work_mem; rollback;
┌──────────────────────┐
│ maintenance_work_mem │
├──────────────────────┤
│ 1GB                  │
└──────────────────────┘
(1 row)
This would be a good time to note that the laptop I ran this experiment on
had 16GB of RAM, so I was tuning maintenance_work_mem quite aggressively,
knowing that autovacuum could use the extra RAM while doing its housekeeping.
(In 9.2 there is no separate autovacuum_work_mem setting; each autovacuum
worker can use up to maintenance_work_mem.)
I now had all settings primed for a large, heavily-updated table. Would PostgreSQL be able to keep up?
Here was the starting state:
postgres@[local]:5432/postgres# \x
Expanded display is on.
postgres@[local]:5432/postgres# select reltuples as rough_row_count, pg_stat_get_live_tuples(c.oid) as n_live_tup, pg_stat_get_dead_tuples(c.oid) as n_dead_tup from pg_class as c where relname = 'things'; rollback;
┌─[ RECORD 1 ]────┬──────────┐
│ rough_row_count │ 3e+07    │
│ n_live_tup      │ 29387222 │
│ n_dead_tup      │ 0        │
└─────────────────┴──────────┘
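As an aside, the same counters are exposed through the pg_stat_user_tables view; the following equivalent query is not from my original session, but either form works:
postgres@[local]:5432/postgres# select n_live_tup, n_dead_tup from pg_stat_user_tables where relname = 'things'; rollback;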
Here is a little Go program I named pg-pummel, which updates rows of my
30-million-row test table, chosen at random, in as tight a loop as possible.
The purpose is to create lots of dead tuples.
package main

import (
	"fmt"
	"math/rand"
	"os"

	"github.com/jackc/pgx"
)

func main() {
	pummel()
}

func pummel() {
	var id int

	var runtimeParams map[string]string
	runtimeParams = make(map[string]string)
	runtimeParams["application_name"] = "pummel"

	connConfig := pgx.ConnConfig{
		User:              "postgres",
		Password:          "postgres",
		Host:              "localhost",
		Port:              5432,
		Database:          "postgres",
		TLSConfig:         nil,
		UseFallbackTLS:    false,
		FallbackTLSConfig: nil,
		RuntimeParams:     runtimeParams,
	}

	conn := connect(connConfig)
	defer conn.Close()

	prep(conn)

	// just throw updates at the database in as tight a loop as possible
	for {
		// random int, 1..30,000,000
		// Our test table has 30 million rows,
		// so randomly update any of those.
		id = rand.Intn(30000000) + 1
		update(conn, id)
	}
}

func connect(connConfig pgx.ConnConfig) (conn *pgx.Conn) {
	conn, err := pgx.Connect(connConfig)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Unable to establish connection: %v", err)
		os.Exit(1)
	}
	return conn
}

func prep(conn *pgx.Conn) (ps *pgx.PreparedStatement) {
	sql := "update things set data = md5(random()::text) where id = $1"
	ps, err := conn.Prepare("foo", sql)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Unable to prepare statement: %v", err)
		os.Exit(1)
	}
	return ps
}

func update(conn *pgx.Conn, id int) {
	_, err := conn.Exec("foo", id)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Unable to update: %v", err)
		os.Exit(1)
	}
}
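Building and running it went something like this (from memory; I'm assuming the source file was named pummel.go and that the pgx package had already been fetched):
mwood@mwood-ThinkPad-X250:~$ go get github.com/jackc/pgx
mwood@mwood-ThinkPad-X250:~$ go build -o pg-pummel pummel.go
mwood@mwood-ThinkPad-X250:~$ ./pg-pummel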
I started running pg-pummel
!
top
and vmstat
showed the system was IO busy,
not CPU busy, while running the updates.
In my psql
session, I kept re-running this query,
to see if n_dead_tup
would keep rising forever,
or if the autovacuumer could keep up.
postgres@[local]:5432/postgres# select reltuples as rough_row_count, pg_stat_get_live_tuples(c.oid) as n_live_tup, pg_stat_get_dead_tuples(c.oid) as n_dead_tup from pg_class as c where relname = 'things'; rollback;
┌─[ RECORD 1 ]────┬─────────────┐
│ rough_row_count │ 2.99997e+07 │
│ n_live_tup      │ 29999727    │
│ n_dead_tup      │ 9320        │
└─────────────────┴─────────────┘
Every so often, I saw a vacuum kick in:
postgres@[local]:5432/postgres# select reltuples as rough_row_count, pg_stat_get_live_tuples(c.oid) as n_live_tup, pg_stat_get_dead_tuples(c.oid) as n_dead_tup from pg_class as c where relname = 'things'; rollback;
┌─[ RECORD 1 ]────┬─────────────┐
│ rough_row_count │ 2.99964e+07 │
│ n_live_tup      │ 29996438    │
│ n_dead_tup      │ 41          │
└─────────────────┴─────────────┘
That's exactly the result I was hoping for!
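You can also catch the autovacuum worker in the act. This pg_stat_activity query was not part of my original session, but in 9.2 an autovacuum worker shows up with a query value like "autovacuum: VACUUM public.things":
postgres@[local]:5432/postgres# select pid, query from pg_stat_activity where query like 'autovacuum:%'; rollback;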
Of course, then I reworked pg-pummel to throw 4 updating goroutines at
PostgreSQL, producing updates even more quickly; the concurrent variant
looked roughly like the sketch below.
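I did not preserve the exact concurrent code, so this is a reconstruction under two assumptions: the function name pummelConcurrently is mine, and each worker opens its own connection, because a single pgx.Conn is not safe for concurrent use. It reuses connect, prep, and update from the listing above and needs "sync" added to the imports.

func pummelConcurrently(connConfig pgx.ConnConfig, workers int) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// each goroutine gets its own connection and its own
			// prepared statement; a *pgx.Conn must not be shared
			conn := connect(connConfig)
			defer conn.Close()
			prep(conn)
			for {
				// same tight update loop as before; the top-level
				// math/rand functions are safe for concurrent use
				update(conn, rand.Intn(30000000)+1)
			}
		}()
	}
	wg.Wait() // never returns; the workers loop forever
}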
This started to appear in the PostgreSQL logs:
LOG: using stale statistics instead of current ones because stats collector is not responding
And long story short, I overwhelmed my PostgreSQL. But vacuuming wasn't the issue here: it was the whole system not keeping up with the onslaught.
The bottom line, though, is that autovacuum_vacuum_cost_limit is an
incredibly useful tunable for heavily updated systems! It is not enough to just
tune autovacuum settings on individual, heavily updated tables; one also has
to tell PostgreSQL's autovacuumer to do more work in the first place.