Introduction to pt-table-sync and pt-table-checksum

In this post I would like to introduce you to a couple of must-have tools from Percona which enable you to validate the table contents between two MySQL databases. This is especially useful in replication setups and as a way to ensure that both master and slave have the exact same data, and if not, to sync them accordingly.

pt-table-checksum: This tool can be used to detect any differences between master and slave, however it is not able to tell which are the differences - only where are they.
Usage example: pt-table-checksum -uusername -ppassword --create-replicate-table --replicate percona.checksums -h master1

pt-table-sync: This tool can be used both to detect differences between master and slave and to fix the differences. Unlike pt-table-checksum, this tool is able to actually understand the difference between the master and slave.
Usage example: pt-table-sync -uusername -ppassword --print --sync-to-master --replicate percona.checksums -h slave1 --databases database1, database2

One common question is "why do you normally run pt-table-checksum before pt-table-sync?". The answer is simply so pt-table-sync does less work! In fact, pt-table-checksum will record in a table (percona.checksums) where there are difference (but it doesn't know what are the differences) and pt-table-sync will analyze and correct only these chunks.

For example: If you have a table with 1M rows and you run pt-table-checksum using chunks of 1k rows each, pt-table-checksum detects that in 3 chunks there are differences. When you then run pt-table-sync it will analyze and correct only these 3 chunks (3k rows) instead of the whole table.

No comments: