Diffing Tables in PostgreSQL
I sometimes have to diff two tables in PostgreSQL. My time-honored way is
to use except
:
First, I'll create some temporary tables to play with. Note the
use of table
as a command that's syntactic sugar for select * from
.
create temporary table t1 (a int not null, b int not null); create temporary table t2 (a int not null, b int not null); insert into t1 (a, b) values (1, 2), (3, 4), (5, 6); insert into t2 (a, b) values (3, 4), (5, 6), (7, 8); table t1; ┌───┬───┐ │ a │ b │ ├───┼───┤ │ 1 │ 2 │ │ 3 │ 4 │ │ 5 │ 6 │ └───┴───┘ (3 rows) table t2; ┌───┬───┐ │ a │ b │ ├───┼───┤ │ 3 │ 4 │ │ 5 │ 6 │ │ 7 │ 8 │ └───┴───┘ (3 rows)
Here's how I would usually diff the tables: use except
to get the rows in t1
that were not in t2, and then again to get the rows in t2 that were not in t1.
except
.
table t1 except table t2; ┌───┬───┐ │ a │ b │ ├───┼───┤ │ 1 │ 2 │ └───┴───┘ (1 row) table t2 except table t1; ┌───┬───┐ │ a │ b │ ├───┼───┤ │ 7 │ 8 │ └───┴───┘ (1 row)
Some tips at The jooq blog
go way further. First, why not combine our efforts using union
?
(table t1 except table t2) union (table t2 except table t1); ┌───┬───┐ │ a │ b │ ├───┼───┤ │ 7 │ 8 │ │ 1 │ 2 │ └───┴───┘ (2 rows)
Getting fancier, it might be nice to order the returned rows:
(table t1 except table t2) union (table t2 except table t1) order by a, b; ┌───┬───┐ │ a │ b │ ├───┼───┤ │ 1 │ 2 │ │ 7 │ 8 │ └───┴───┘ (2 rows)
The jooq blog shows us something fancier using natural full join
.
It's fancier in two ways: 1) it uses natural full join
to join on any
identically-named columns from both tables (in this case, all of the columns, because
our test tables have the same definitions), and 2) it only does a full table scan of
t1 once and t2 once, rather than twice for each table as with the previous methods
we were using.
First, let's look at the result of a natural full join where we purposefully add a column specific to each table we are comparing:
select * from (select t1.*, 't1' as tbl_t1 from t1) as t1 natural full join (select t2.*, 't2' as tbl_t2 from t2) as t2; ┌───┬───┬────────┬────────┐ │ a │ b │ tbl_t1 │ tbl_t2 │ ├───┼───┼────────┼────────┤ │ 1 │ 2 │ t1 │ [NULL] │ │ 3 │ 4 │ t1 │ t2 │ │ 5 │ 6 │ t1 │ t2 │ │ 7 │ 8 │ [NULL] │ t2 │ └───┴───┴────────┴────────┘ (4 rows)
Nifty. Of course, we only want the rows that were different, so we would do:
select * from (select t1.*, 't1' as tbl_t1 from t1) as t1 natural full join (select t2.*, 't2' as tbl_t2 from t2) as t2 where tbl_t1 is null or tbl_t2 is null order by a, b; ┌───┬───┬────────┬────────┐ │ a │ b │ tbl_t1 │ tbl_t2 │ ├───┼───┼────────┼────────┤ │ 1 │ 2 │ t1 │ [NULL] │ │ 7 │ 8 │ [NULL] │ t2 │ └───┴───┴────────┴────────┘ (4 rows)
Nice. But BE WARE: once your tables have nulls in them, except
will
work more reliably, because natural full join
will not join
identical rows in both tables when those identical rows contain nulls; because null does
not equal null. except
will still work, though.