From c2dff1d2272f3da94406dcb7ae17d6b5f903e765 Mon Sep 17 00:00:00 2001
From: Daniel Nichter
Date: Mon, 2 Nov 2015 14:00:01 -0800
Subject: [PATCH] Document problem LP 1389041: pt-table-checksum has high
 likelyhood to skip a table when row count is around chunk-size *
 chunk-size-limit.

---
 bin/pt-table-checksum | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/bin/pt-table-checksum b/bin/pt-table-checksum
index 5adf6e93..6386a55c 100755
--- a/bin/pt-table-checksum
+++ b/bin/pt-table-checksum
@@ -12703,11 +12703,22 @@ exists on all replicas, printing L<"--progress"> messages while it waits.
 =item 3. Single chunk size
 
 If a table can be checksummed in a single chunk on the master,
-pt-table-checksum will check that the table size on all replicas is
-approximately the same. This prevents a rare problem where the table
-on the master is empty or small, but on a replica it is much larger.
-In this case, the single chunk checksum on the master would overload
-the replica. This check cannot be disabled.
+pt-table-checksum will check that the table size on all replicas is less than
+L<"--chunk-size"> * L<"--chunk-size-limit">. This prevents a rare problem
+where the table on the master is empty or small, but on a replica it is much
+larger. In this case, the single chunk checksum on the master would overload
+the replica.
+
+Another rare problem occurs when the table size on a replica is close to
+L<"--chunk-size"> * L<"--chunk-size-limit">. In such cases, the table is more
+likely to be skipped even though it's safe to checksum in a single chunk.
+This happens because table sizes are estimates. When those estimates and
+L<"--chunk-size"> * L<"--chunk-size-limit"> are almost equal, this check
+becomes more sensitive to the estimates' margin of error rather than actual
+significant differences in table sizes. Specifying a larger value for
+L<"--chunk-size-limit"> helps avoid this problem.
+
+This check cannot be disabled.
 
 =item 4. Lag
 
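
Reviewer note: the sensitivity described in the new POD text can be illustrated with a small sketch. This is Python, not the tool's Perl, and it is not the tool's actual code. The function names and the 5% estimate error are hypothetical, and the values 1000 and 2.0 are assumed here to stand in for L<"--chunk-size"> and L<"--chunk-size-limit">. It only models the documented rule that a replica's estimated table size must be less than chunk-size * chunk-size-limit.

```python
def single_chunk_is_safe(replica_row_estimate, chunk_size=1000, chunk_size_limit=2.0):
    """Model of the documented rule: the replica's estimated row count
    must be less than chunk_size * chunk_size_limit.
    (Hypothetical helper, not part of pt-table-checksum.)"""
    return replica_row_estimate < chunk_size * chunk_size_limit


def outcomes_for_estimates(actual_rows, relative_error,
                           chunk_size=1000, chunk_size_limit=2.0):
    """Evaluate the rule at the low and high ends of an assumed
    estimate error band around the actual row count."""
    low = int(actual_rows * (1 - relative_error))
    high = int(actual_rows * (1 + relative_error))
    return (
        single_chunk_is_safe(low, chunk_size, chunk_size_limit),
        single_chunk_is_safe(high, chunk_size, chunk_size_limit),
    )


if __name__ == "__main__":
    # Actual size just under the threshold of 1000 * 2.0 rows:
    # the outcome depends on which way the estimate errs.
    print(outcomes_for_estimates(1950, 0.05))
    # Same table with a larger limit: both ends of the band pass.
    print(outcomes_for_estimates(1950, 0.05, chunk_size_limit=4.0))
```

With the assumed 5% error band, a table of 1950 rows passes or is skipped depending only on the direction of the estimate's error, while raising the limit moves the threshold well clear of the band. That matches the advice in the patch to specify a larger L<"--chunk-size-limit">.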