class: center, middle # IIDY: Is It Done Yet? --- IIDY is a TODO list that assumes - lists can grow large (millions of items) - lists items are heavily mutated - list items not necessarily completed on first try (number of attemepts must be tracked) --- class: center, middle # DEMO! --- ### Operations on single items: ``` $ curl localhost:8080/lists/downloads/a.txt Not found. $ curl -X PUT localhost:8080/lists/downloads/a.txt $ curl localhost:8080/lists/downloads/a.txt 0 $ curl -X INCREMENT localhost:8080/lists/downloads/a.txt INCREMENTED 1 $ curl localhost:8080/lists/downloads/a.txt 1 $ curl -X DELETE localhost:8080/lists/downloads/a.txt DELETED 1 $ curl localhost:8080/lists/downloads/a.txt Not found. ``` --- ## Performance of using one-at-a-time API: - Network round trip for each mutation - If DB is on a different host, 2 network hops --- class: center, middle # We can do better! --- ### Batch insert ``` $ curl -X BULKPUT --data-binary @- localhost:8080/lists/downloads b.txt c.txt d.txt e.txt f.txt g.txt h.txt i.txt ^D ADDED 8 ``` Perhaps a process decides items to be worked on and does the above. --- ### Batch fetch ``` $ curl -X BULKGET -H "X-IIDY-Count: 2" localhost:8080/lists/downloads b.txt 0 c.txt 0 $ curl -X BULKGET -H "X-IIDY-Count: 2" -H "X-IIDY-After-Item: c.txt" \ localhost:8080/lists/downloads d.txt 0 e.txt 0 $ curl -X BULKGET -H "X-IIDY-Count: 4" -H "X-IIDY-After-Item: e.txt" \ localhost:8080/lists/downloads f.txt 0 g.txt 0 h.txt 0 i.txt 0 ``` Perhaps a process reads batches of items and farms them out to other processes to complete on different hosts. --- ### Batch increment attempts ``` $ curl -X BULKINCREMENT --data-binary @- localhost:8080/lists/downloads b.txt c.txt d.txt e.txt ^D INCREMENTED 4 $ curl -X BULKGET -H "X-IIDY-Count: 100" localhost:8080/lists/downloads b.txt 1 c.txt 1 d.txt 1 e.txt 1 f.txt 0 g.txt 0 h.txt 0 i.txt 0 ``` Perhaps a worker job tries a bunch of items and, in one batch, updates those that have failed. --- ### Batch delete list items ``` $ curl -X BULKDELETE --data-binary @- localhost:8080/lists/downloads d.txt e.txt f.txt g.txt ^D DELETED 4 $ curl -X BULKGET -H "X-IIDY-Count: 100" localhost:8080/lists/downloads b.txt 1 c.txt 1 h.txt 0 i.txt 0 ``` Perhaps a worker complets a bunch of items and, in one batch, removes them from the list. --- class: center, middle # IIDY's API allows batching, but what's happening in the database? --- ## Bulk inserts ### One way: ``` insert into lists (list, item, attempts) values ('downloads', 'a.txt', 0); insert into lists (list, item, attempts) values ('downloads', 'b.txt', 0); insert into lists (list, item, attempts) values ('downloads', 'c.txt', 0); ``` ### Better way Leverage defaults! ``` insert into lists (list, item) values ('downloads', 'a.txt'); insert into lists (list, item) values ('downloads', 'b.txt'); insert into lists (list, item) values ('downloads', 'c.txt'); ``` --- ## Bulk inserts ### Better better way: Leverage repeated values clauses! ``` insert into lists (list, item) values ('downloads', 'a.txt'), ('downloads', 'b.txt'), ('downloads', 'c.txt'); ``` see `func (p *PgStore) Inc(list string, item string) (int64, error)` --- ## Bulk range selects (assume items `a.txt` through `j.txt` exist for 'downloads') ### One way: ``` explain analyze select item, attempts from lists where list = 'downloads' order by item offset 5 fetch next 5 rows only; Limit (cost=0.28..0.40 rows=5 width=10) (actual time=0.039..0.044 rows=5 loops=1) Index Scan using list_pk on lists (cost=0.15..41.90 rows=1700 width=10) (actual time=0.030..0.039 rows=10 loops=1) <= still reads 10 rows! Index Cond: (list = 'downloads'::text) ``` --- ## Bulk range selects ### Better way: ``` explain alalyze select item, attempts from lists where list = 'downloads' and item > 'e.txt' order by item limit 5; Limit (cost=0.15..2.17 rows=1 width=10) (actual time=0.052..0.058 rows=5 loops=1) Index Scan using list_pk on lists (cost=0.15..2.17 rows=1 width=10) (actual time=0.049..0.054 rows=5 loops=1) <= seeks and then reads 5 rows! Index Cond: ((list = 'downloads'::text) AND (item > 'e.txt'::text)) ``` see `func (p *PgStore) BulkGet(list string, startID string, count int) ([]ListEntry, error)` --- ## Bulk updates ### One way: ``` update lists set attempts = 1 where list = 'downloads' and item = 'a.txt'; update lists set attempts = 1 where list = 'downloads' and item = 'b.txt'; update lists set attempts = 1 where list = 'downloads' and item = 'c.txt'; ``` ### Better way: Leverage the fact that we increment existing value by one! ``` update lists set attempts = attempts + 1 where list = 'downloads' and item in ('a.txt', 'b.txt', 'c.txt'); ``` --- ## Bulk updates ### Better better way ``` update lists set attempts = attempts + 1 where list = 'downloads' and item in (select unnest(array['a.txt', 'b.txt', 'c.txt'])); ``` `unnest` turns an array 90 degrees into a 1 column table. If we select that whole "table", PostgreSQL can "join" to that table, which for larger numbers of values, is more efficient. see `func (p *PgStore) BulkInc(list string, items []string) (int64, error)` --- ## Bulk deletes Let's skip to the... ### Better better way ``` delete from lists where list = $1 and item in (select unnest(array['a.txt', 'b.txt', 'c.txt'])); ``` see `func (p *PgStore) BulkDel(list string, items []string) (int64, error)` --- ## Nice side effects of using PostgreSQL this way: - One network round trip per batch - One query for the query planner to plan - Even if AUTOCOMMIT is enabled, only one transaction --- class: center, middle # OTHER DESIGN DECISIONS --- ## Why text/plain? Pros: - nice for demos :-) - usable with `curl` and shell - columns, instead of lists of maps, avoid repeated key listings Cons: - not JSON (widely expected) - not protobuf (more efficient over the wire?) --- ## Why REST? Pros: - nice for demos :-) - usable with `curl` and other HTTP clients - well understood Cons: - REST is not a standard - gRPC better / more efficient? --- ## Future enhancements - Make PostgreSQL connection *way* more configurable - Make listen port configurable - 12-factor app config through env vars - What about items with spaces in their names? --- class: center, middle # QUESTIONS?