Rationale

Postgres databases can be backed up using the bundled pg_dump utility. This can create text based dumps, or binary dumps. Text based dumps are easier to read and change, but they are rather big, and don't allow for reordering data or objects. This can be a real show stopper with restoring databases that contain triggers or foreign keys. In short, binary dump are the way to go.

If the contents of a database do not change, subsequent text based dumps are 100% the same. This allows for conditional backups (see below). However, subsequent binary dumps are different, even if nothing has changed. The Postgres developers told me there is a timestamp in the binary dumps. It is not possible to configure pg_dump to not add this timestamp. If I knew where exactly this timestamp was located in the file, I could exclude that part and do a diff on the rest of the file.

Dissecting a binary dump

By combining a hex editor, the pg_restore utility, and its C sourcecode I was able to make a nice map of the pg_dump file structure and exactly locate the timestamp. I took a real world postgres dump as an example, 'listmanager.pgdump'. A summary can be printed like this:

    pg_restore -l listmanager.pgdump | head -12
    ;
    ; Archive created at Thu Aug 23 12:12:03 2007
    ;     dbname: listmanager
    ;     TOC Entries: 71
    ;     Compression: 9
    ;     Dump Version: 1.10-0
    ;     Format: CUSTOM
    ;     Integer: 4 bytes
    ;     Offset: 8 bytes
    ;     Dumped from database version: 8.2.4
    ;     Dumped by pg_dump version: 8.2.4
    

The following table is a bit representation of the first part (header) of the dump file. Each row is 8 bytes, and each byte is represented by its hexadecimal, decimal, and ASCII value.

If you take a close look, you should be able to spot most of the fields:

50
80
P
47
71
G
44
68
D
4D
77
M
50
80
P
01
1
-
0A
10
-
00
00
-
04
4
-
08
8
-
01
1
-
00
0
-
09
9
-
00
0
-
00
0
-
00
0
-
00
0
-
03
3
-
00
0
-
00
0
-
00
0
-
00
0
-
0C
12
-
00
0
-
00
0
-
00
0
-
00
0
-
0C
12
-
00
0
-
00
0
-
00
0
-
00
0
-
17
23
-
00
0
-
00
0
-
00
0
-
00
0
-
07
7
-
00
0
-
00
0
-
00
0
-
00
0
-
6B
107
-
00
0
-
00
0
-
00
0
-
00
0
-
01
1
-
00
0
-
00
0
-
00
0
-
00
0
-
0B
11
-
00
0
-
00
0
-
00
0
-
6C
108
l
69
105
i
73
115
s
74
116
t
6D
109
m
61
97
a
6E
110
n
61
97
a
67
103
g
65
105
e
72
114
r
00
0
-
05
5
-
00
0
-
00
0
-
00
0
-
38
56
8
2E
46
.
32
50
2
2E
46
.
34
34
4
00
0
-
05
5
-
00
0
-
00
0
-
00
0
-
38
56
8
2E
46
.
32
50
2
2E
46
.
34
34
4
00
0
-
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

From this table you can see that the timestamp is scattered over several fields, but its place is fixed to bytes 18-51 (at least for the architectures that I've checked: i386 and PPC). By skipping this byterange, you can still compare binary postgres dumps.

The script

I've created a shellscript to do conditional backups. This means that a newly created backup is compared to the previous backup. If the contents are the same, the new file is deleted. If they are different, the new file is written over the old one.

Chances are that your file level backup scenario only backs up changed files; if you combine that with this Postgres backup script, then you'll have the most recent backup of each database, but without wasting space. Another added benefit is that you can very quickly see if a database has not been changed for a while, by looking at the modification date of the backups:

    postgres@huge:~$ ls -lth backups/ | cut -d" " -f 5-30
    
    1.6K 2007-09-26 14:28 testing_001.pgdump
    1.1K 2007-09-26 14:28 test.pgdump
     37K 2007-09-26 14:28 gallery2.pgdump
     71M 2007-09-26 14:28 g2_dick.pgdump
    6.2M 2007-09-26 14:22 edna.pgdump
    1.4K 2007-09-26 14:21 globals.dump
    179K 2007-09-25 21:05 listmanager.pgdump
    145K 2007-09-18 21:06 g2_jeroen.pgdump
    1.2K 2007-08-23 21:05 testtesttesttestt.pgdump
     51K 2007-08-22 21:05 postgres.pgdump
    290K 2007-08-22 21:05 listmanager_demo.pgdump
     21K 2007-08-17 21:12 sqlists.pgdump

Download

The script is a simple bash shellscript, available for everyone to use, you can download it here:

fulldump.sh

Comments are welcome!

Todo / misc

I am writing a similar script for MySQL databases.

Conditional Postgres backups

Leave a Reply

Your email address will not be published. Required fields are marked *

Time limit is exhausted. Please reload CAPTCHA.

This site uses Akismet to reduce spam. Learn how your comment data is processed.