Rationale
Postgres databases can be backed up using the bundled pg_dump utility. It can create text-based dumps or binary dumps. Text-based dumps are easier to read and change, but they are rather big, and they don’t allow for reordering data or objects at restore time. This can be a real show stopper when restoring databases that contain triggers or foreign keys. In short, binary dumps are the way to go.
If the contents of a database do not change, successive text-based dumps are byte-for-byte identical. This allows for conditional backups (see below). However, successive binary dumps differ, even if nothing has changed. The Postgres developers told me there is a timestamp in the binary dumps, and pg_dump cannot be configured to leave it out. If I knew exactly where this timestamp was located in the file, I could exclude that part and do a diff on the rest of the file.
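The idea of excluding the timestamp and comparing the rest can be sketched in a few lines of Python. This is only an illustration, not the script used for the actual backups: the function names are made up, and `skip_start` and `skip_len` are placeholders for wherever the timestamp turns out to be in the file.

```python
import hashlib


def digest_without_range(path, skip_start, skip_len):
    """SHA-256 of a file, ignoring skip_len bytes starting at skip_start.

    skip_start and skip_len are hypothetical parameters: they stand for
    the position and size of the timestamp in the dump header.
    """
    with open(path, "rb") as fh:
        data = fh.read()  # reads the whole dump into memory; fine for a sketch
    kept = data[:skip_start] + data[skip_start + skip_len:]
    return hashlib.sha256(kept).hexdigest()


def dumps_match(old_path, new_path, skip_start, skip_len):
    """True if both dumps are identical outside the skipped byte range."""
    return (digest_without_range(old_path, skip_start, skip_len)
            == digest_without_range(new_path, skip_start, skip_len))
```

A conditional backup would then keep the new dump only when `dumps_match()` returns False for the previous dump and the fresh one.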
Dissecting a binary dump
By combining a hex editor, the pg_restore utility, and its C source code, I was able to make a nice map of the pg_dump file structure and locate the timestamp exactly. I took a real-world Postgres dump as an example, ‘listmanager.pgdump’. A summary can be printed like this:
    pg_restore -l listmanager.pgdump | head -12
    ;
    ; Archive created at Thu Aug 23 12:12:03 2007
    ; dbname: listmanager
    ; TOC Entries: 71
    ; Compression: 9
    ; Dump Version: 1.10-0
    ; Format: CUSTOM
    ; Integer: 4 bytes
    ; Offset: 8 bytes
    ; Dumped from database version: 8.2.4
    ; Dumped by pg_dump version: 8.2.4
The following table is a byte-by-byte representation of the first part (header) of the dump file. Each row holds 8 bytes, and each byte is shown with its hexadecimal, decimal, and ASCII value.
If you take a close look, you should be able to spot most of the fields:
    field  month      pad        year       year       year       year       pad        dst
    hex    00         00         6B         00         00         00         00         01
    dec    0          0          107        0          0          0          0          1
    ascii  -          -          -          -          -          -          -          -

    field  dst        dst        dst        pad        nxt        nxt        nxt        nxt
    hex    00         00         00         00         0B         00         00         00
    dec    0          0          0          0          11         0          0          0
    ascii  -          -          -          -          -          -          -          -

    field  conn       conn       conn       conn       conn       conn       conn       conn
    hex    6C         69         73         74         6D         61         6E         61
    dec    108        105        115        116        109        97         110        97
    ascii  l          i          s          t          m          a          n          a

    field  conn       conn       conn       pad        nxt        nxt        nxt        nxt
    hex    67         65         72         00         05         00         00         00
    dec    103        101        114        0          5          0          0          0
    ascii  g          e          r          -          -          -          -          -

    field  pg_remote  pg_remote  pg_remote  pg_remote  pg_remote  pad        nxt        nxt
    hex    38         2E         32         2E         34         00         05         00
    dec    56         46         50         46         52         0          5          0
    ascii  8          .          2          .          4          -          -          -

    field  nxt        nxt        pg_local   pg_local   pg_local   pg_local   pg_local   pad
    hex    00         00         38         2E         32         2E         34         00
    dec    0          0          56         46         50         46         52         0
    ascii  -          -          8          .          2          .          4          -
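To check the map against the pg_restore summary, the bytes from the table can be decoded with a short Python sketch. It starts at the pad byte before the year field and stops after the second version string. The helper names are made up for illustration. The layout it assumes (one pad byte, then a 4-byte little-endian integer, with each string preceded by its length as such an integer) is read off the table and the "Integer: 4 bytes" line, not taken from the pg_dump source. Adding 1900 to the year is the usual C `struct tm` convention, and it agrees with the 2007 in the summary.

```python
INT_SIZE = 4  # "Integer: 4 bytes" in the pg_restore summary

# Bytes copied from the table, from the pad before "year" up to the end
# of the pg_local string (the leading month byte and trailing pad are omitted).
HEADER_TAIL = bytes.fromhex(
    "00 6B000000"              # pad, year
    "00 01000000"              # pad, dst
    "00 0B000000"              # pad, nxt: length of the next string (11)
    "6C6973746D616E61676572"   # conn: "listmanager"
    "00 05000000"              # pad, nxt: length 5
    "382E322E34"               # pg_remote: "8.2.4"
    "00 05000000"              # pad, nxt: length 5
    "382E322E34"               # pg_local: "8.2.4"
)


def read_int(buf, pos):
    """Skip one pad byte, read a little-endian integer, return (value, new_pos)."""
    start = pos + 1  # assumption: the pad byte carries no data we need here
    value = int.from_bytes(buf[start:start + INT_SIZE], "little")
    return value, start + INT_SIZE


def read_str(buf, pos):
    """Read a length-prefixed ASCII string, return (text, new_pos)."""
    length, pos = read_int(buf, pos)
    return buf[pos:pos + length].decode("ascii"), pos + length


def decode_tail(buf):
    pos = 0
    year, pos = read_int(buf, pos)
    dst, pos = read_int(buf, pos)
    dbname, pos = read_str(buf, pos)
    pg_remote, pos = read_str(buf, pos)
    pg_local, pos = read_str(buf, pos)
    return {"year": 1900 + year, "dst": dst, "dbname": dbname,
            "pg_remote": pg_remote, "pg_local": pg_local}


print(decode_tail(HEADER_TAIL))
# {'year': 2007, 'dst': 1, 'dbname': 'listmanager', 'pg_remote': '8.2.4', 'pg_local': '8.2.4'}
```

The decoded values line up with the "Archive created", "dbname", and the two version lines printed by pg_restore.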