
.. program:: pt-table-checksum

==============================
 :program:`pt-table-checksum`
==============================

.. highlight:: perl


NAME
====

 :program:`pt-table-checksum` - Perform an online replication consistency check, or checksum |MySQL| tables efficiently on one or many servers.


SYNOPSIS
========


Usage
-----

::

   pt-table-checksum [OPTION...] DSN [DSN...]

:program:`pt-table-checksum` checksums |MySQL| tables efficiently on one or more hosts.
Each host is specified as a DSN and missing values are inherited from the
first host.  If you specify multiple hosts, the first is assumed to be the
master.

\ **STOP!**\   Are you checksumming slaves against a master?  Then be sure to learn
what :option:`--replicate` does.  It is probably the option you want to use.

Checksum all slaves against the master:


.. code-block:: perl

    pt-table-checksum             \
       h=master-host              \
       --replicate mydb.checksums

    # Wait for first command to complete and replication to catch up
    # on all slaves, then...

    pt-table-checksum            \
       h=master-host             \
       --replicate mydb.checksums \
       --replicate-check 2


Checksum all databases and tables on two servers and print the differences:


.. code-block:: perl

    pt-table-checksum h=host1,u=user h=host2 | pt-checksum-filter


See "SPECIFYING HOSTS" for more on the syntax of the host arguments.


RISKS
=====


The following section is included to inform users about the potential risks,
whether known or unknown, of using this tool.  The two main categories of risks
are those created by the nature of the tool (e.g. read-only tools vs. read-write
tools) and those created by bugs.

:program:`pt-table-checksum` executes queries that cause the |MySQL| server to checksum its data.  This can cause significant server load.  It is read-only unless you use
the :option:`--replicate` option, in which case it inserts a small amount of data
into the specified table.

At the time of this release, we know of no bugs that could cause serious harm to
users.  There are miscellaneous bugs that might be annoying.

The authoritative source for updated information is always the online issue
tracking system.  Issues that affect this tool will be marked as such.  You can
see a list of such issues at the following URL:
`http://www.percona.com/bugs/pt-table-checksum <http://www.percona.com/bugs/pt-table-checksum>`_.

See also :ref:`bugs` for more information on filing bugs and getting help.


DESCRIPTION
===========

:program:`pt-table-checksum` generates table checksums for |MySQL| tables, typically
useful for verifying your slaves are in sync with the master.  The checksums
are generated by a query on the server, and there is very little network
traffic as a result.

Checksums typically take about twice as long as COUNT(\*) on very large |InnoDB|
tables in my tests.  For smaller tables, COUNT(\*) is a good bit faster than
the checksums.  See :option:`--algorithm` for more details on performance.

If you specify more than one server, :program:`pt-table-checksum` assumes the first
server is the master and others are slaves.  Checksums are parallelized for
speed, forking off a child process for each table.  Duplicate server names are
ignored, but if you want to checksum a server against itself you can use two
different forms of the hostname (for example, "localhost 127.0.0.1", or
"h=localhost,P=3306 h=localhost,P=3307").

If you want to compare the tables in one database to those in another database
on the same server, just checksum both databases:


.. code-block:: perl

    pt-table-checksum --databases db1,db2


You can then use pt-checksum-filter to compare the results in both databases
easily.

:program:`pt-table-checksum` examines table structure only on the first host specified,
so if anything differs on the others, it won't notice.  It ignores views.

The checksums work on |MySQL| version 3.23.58 through 6.0-alpha.  They will not
necessarily produce the same values on all versions.  Differences in
formatting and/or space-padding between 4.1 and 5.0, for example, will cause
the checksums to be different.


SPECIFYING HOSTS
================


Each host is specified on the command line as a DSN.  A DSN is a comma-separated
list of \ ``option=value``\  pairs.  The most basic DSN is \ ``h=host``\  to specify
the hostname of the server and use the defaults for everything else (port, etc.).
See "DSN OPTIONS" for more information.

DSN options that are listed as \ ``copy: yes``\  are copied from the first DSN
to subsequent DSNs that do not specify the DSN option.  For example,
\ ``h=host1,P=12345 h=host2``\  is equivalent to \ ``h=host1,P=12345 h=host2,P=12345``\ .
This allows you to avoid repeating DSN options that have the same value
for all DSNs.

Connection-related command-line options like :option:`--user` and :option:`--password`
provide default DSN values for the corresponding DSN options indicated by
the short form of each option.  For example, the short form of :option:`--user`
is \ ``-u``\  which corresponds to the \ ``u``\  DSN option, so \ ``--user bob h=host``\
is equivalent to \ ``h=host,u=bob``\ .  These defaults apply to all DSNs that
do not specify the DSN option.

The DSN option value precedence from highest to lowest is:


.. code-block:: perl

    * explicit values in each DSN on the command-line
    * copied values from the first DSN
    * default values from connection-related command-line options


If you are confused about how :program:`pt-table-checksum` will connect to your servers,
use the :option:`--explain-hosts` option and it will tell you.
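
The precedence rules above can be sketched in a few lines of Python.  This is
an illustration only, not the tool's own implementation: the names
``parse_dsn``, ``resolve_dsns`` and ``COPYABLE`` are hypothetical, the set of
copyable keys is an assumption (see "DSN OPTIONS" for the options that are
really marked \ ``copy: yes``\ ), and treating a bare word as a hostname is
likewise assumed.


.. code-block:: python

    # Hypothetical set of DSN keys treated as "copy: yes"; the real list
    # is given in "DSN OPTIONS".
    COPYABLE = {"P", "u", "p"}


    def parse_dsn(text):
        """Split 'h=host1,P=12345' into {'h': 'host1', 'P': '12345'}."""
        dsn = {}
        for item in text.split(","):
            if not item:
                continue
            if "=" in item:
                key, value = item.split("=", 1)
            else:
                key, value = "h", item  # assumption: bare word is a hostname
            dsn[key] = value
        return dsn


    def resolve_dsns(dsn_strings, defaults=None):
        """Apply precedence: explicit value, then first DSN, then defaults."""
        defaults = defaults or {}
        parsed = [parse_dsn(s) for s in dsn_strings]
        first = parsed[0] if parsed else {}
        resolved = []
        for dsn in parsed:
            merged = dict(defaults)  # lowest precedence
            merged.update({k: v for k, v in first.items() if k in COPYABLE})
            merged.update(dsn)       # highest precedence
            resolved.append(merged)
        return resolved


    # Mirrors "--user bob h=host1,P=12345 h=host2".
    hosts = resolve_dsns(["h=host1,P=12345", "h=host2"], defaults={"u": "bob"})
    print(hosts)


In this sketch the second host ends up with port 12345 copied from the first
DSN and user ``bob`` taken from the command-line default.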


HOW FAST IS IT?
===============


Speed and efficiency are important, because the typical use case is checksumming
large amounts of data.

:program:`pt-table-checksum` is designed to do very little work itself, and generates
very little network traffic aside from inspecting table structures with ``SHOW
CREATE TABLE``.  The results of checksum queries are typically 40-character or
shorter strings.

The |MySQL| server does the bulk of the work, in the form of the checksum queries.
The following benchmarks show the checksum query times for various checksum
algorithms.  The first two results are simply running \ ``COUNT(col8)``\  and
\ ``CHECKSUM TABLE``\  on the table.  \ ``CHECKSUM TABLE``\  is just \ ``CRC32``\  under the hood, but it's implemented inside the storage engine layer instead of at the
|MySQL| layer.


.. code-block:: perl

  ALGORITHM       HASH FUNCTION  EXTRA           TIME
  ==============  =============  ==============  =====
  COUNT(col8)                                    2.3
  CHECKSUM TABLE                                 5.3
  BIT_XOR         FNV_64                         12.7
  ACCUM           FNV_64                         42.4
  BIT_XOR         MD5            --optimize-xor  80.0
  ACCUM           MD5                            87.4
  BIT_XOR         SHA1           --optimize-xor  90.1
  ACCUM           SHA1                           101.3
  BIT_XOR         MD5                            172.0
  BIT_XOR         SHA1                           197.3


The tests are entirely CPU-bound.  The sample data is an |InnoDB| table with the
following structure:


.. code-block:: perl

  CREATE TABLE test (
    col1 int NOT NULL,
    col2 date NOT NULL,
    col3 int NOT NULL,
    col4 int NOT NULL,
    col5 int,
    col6 decimal(3,1),
    col7 smallint unsigned NOT NULL,
    col8 timestamp NOT NULL,
    PRIMARY KEY  (col2, col1),
    KEY (col7),
    KEY (col1)
  ) ENGINE=InnoDB


The table has 4303585 rows, 365969408 bytes of data and 173457408 bytes of
indexes.  The server is a Dell PowerEdge 1800 with dual 32-bit Xeon 2.8GHz
processors and 2GB of RAM.  The tests are fully CPU-bound, and the server is
otherwise idle.  The results are generally consistent to within a tenth of a
second on repeated runs.

\ ``CRC32``\  is the default checksum function to use, and should be enough for most
cases.  If you need stronger guarantees that your data is identical, you should
use one of the other functions.


ALGORITHM SELECTION
===================


The :option:`--algorithm` option allows you to specify which algorithm you would
like to use, but it does not guarantee that :program:`pt-table-checksum` will use this
algorithm.  :program:`pt-table-checksum` will ultimately select the best algorithm possible
given various factors such as the |MySQL| version and other command line options.

The three basic algorithms in descending order of preference are CHECKSUM,
BIT_XOR and ACCUM.  CHECKSUM cannot be used if any one of these criteria
is true:


.. code-block:: perl

   * --where is used
   * --since is used
   * --chunk-size is used
   * --replicate is used
   * --count is used
   * MySQL version less than 4.1.1


The BIT_XOR algorithm also requires |MySQL| version 4.1.1 or later.

After checking these criteria, if the requested :option:`--algorithm` remains then it
is used, otherwise the first remaining algorithm with the highest preference
is used.
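
The selection rules in this section can be summarized as a small Python
function.  It is a sketch of the documented criteria only, not the tool's own
code; ``choose_algorithm`` and its parameters are hypothetical names.


.. code-block:: python

    PREFERENCE = ["CHECKSUM", "BIT_XOR", "ACCUM"]  # descending preference

    # Options that rule out CHECKSUM, as listed above.
    BLOCKS_CHECKSUM = {"where", "since", "chunk-size", "replicate", "count"}


    def choose_algorithm(requested, mysql_version, options):
        """Return the algorithm left after applying the documented criteria.

        mysql_version is a tuple such as (5, 1, 0); options is the set of
        option names given on the command line, e.g. {"where", "replicate"}.
        """
        candidates = list(PREFERENCE)
        if mysql_version < (4, 1, 1):
            # Neither CHECKSUM nor BIT_XOR is available before 4.1.1.
            candidates.remove("CHECKSUM")
            candidates.remove("BIT_XOR")
        elif BLOCKS_CHECKSUM & set(options):
            candidates.remove("CHECKSUM")
        if requested in candidates:
            return requested
        return candidates[0]  # highest remaining preference


    print(choose_algorithm("CHECKSUM", (5, 0, 0), {"replicate"}))


With :option:`--replicate` in effect the requested CHECKSUM is ruled out, so
the sketch falls through to BIT_XOR.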


CONSISTENT CHECKSUMS
====================


If you are using this tool to verify your slaves still have the same data as the
master, which is why I wrote it, you should read this section.

The best way to do this with replication is to use the :option:`--replicate` option.
When the queries are finished running on the master and its slaves, you can go
to the slaves and issue SQL queries to see if any tables are different from the
master.  Try the following:


.. code-block:: perl

   SELECT db, tbl, chunk, this_cnt-master_cnt AS cnt_diff,
      this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc)
         AS crc_diff
   FROM checksum
   WHERE master_cnt <> this_cnt OR master_crc <> this_crc
      OR ISNULL(master_crc) <> ISNULL(this_crc);


The :option:`--replicate-check` option can do this query for you.  If you can't use
this method, try the following:

  * If your servers are not being written to, you can just run the tool with no
 further ado:


 .. code-block:: perl

   pt-table-checksum server1 server2 ... serverN


  * If the servers are being written to, you need some way to make sure they are
 consistent at the moment you run the checksums.  For situations other than
 master-slave replication, you will have to figure this out yourself.  You may be
 able to use the :option:`--where` option with a date or time column to only checksum
 data that's not recent.

  * If you are checksumming a master and slaves, you can do a fast parallel
 checksum and assume the slaves are caught up to the master.  In practice, this
 tends to work well except for tables which are constantly updated.  You can
 use the :option:`--slave-lag` option to see how far behind each slave was when it
 checksummed a given table.  This can help you decide whether to investigate
 further.

  * The next most disruptive technique is to lock the table on the master, then take
 checksums.  This should prevent changes from propagating to the slaves.  You can
 just lock on the master (with :option:`--lock`), or you can both lock on the master
 and wait on the slaves till they reach that point in the master's binlog
 (:option:`--wait`).  Which is better depends on your workload; only you know that.


  * If you decide to make the checksums on the slaves wait until they're guaranteed
 to be caught up to the master, the algorithm looks like this:


 .. code-block:: perl

   For each table,
     Master: lock table
     Master: get pos
     In parallel,
       Master: checksum
       Slave(s): wait for pos, then checksum
     End
     Master: unlock table
   End


What I typically do when I'm not using the :option:`--replicate` option is simply run
the tool on all servers with no further options.  This runs fast, parallel,
non-blocking checksums simultaneously.  If there are tables that look different,
I re-run with :option:`--wait` 600 on the tables in question.  This makes the tool
lock on the master as explained above.


OUTPUT
======

Output is to ``STDOUT``, one line per server and table, with header lines for each
database.  I tried to make the output easy to process with awk.  For this reason
columns are always present.  If there's no value, :program:`pt-table-checksum` prints
'NULL'.

The default is column-aligned output for human readability, but you can change
it to tab-separated if you want.  Use the :option:`--tab` option for this.

Output is unsorted, though all lines for one table should be output together.
For speed, all checksums are done in parallel (as much as possible) and may
complete out of the order in which they were started.  You might want to run
them through another script or command-line utility to make sure they are in the
order you want.  If you pipe the output through pt-checksum-filter, you
can sort the output and/or avoid seeing output about tables that have no
differences.

The columns in the output are as follows.  The database, table, and chunk come
first so you can sort by them easily (they are the "primary key").

Output from :option:`--replicate-check` and :option:`--checksum` is different from what is described here.


  * ``DATABASE``

 The database the table is in.


  * ``TABLE``

 The table name.


  * ``CHUNK``

 The chunk (see :option:`--chunk-size`).  Zero if you are not doing chunked checksums.


  * ``HOST``

 The server's hostname.


  * ``ENGINE``

 The table's storage engine.


  * ``COUNT``

 The table's row count, unless you specified to skip it.  If \ ``OVERSIZE``\  is
 printed, the chunk was skipped because the actual number of rows was greater
 than :option:`--chunk-size` times :option:`--chunk-size-limit`.


  * ``CHECKSUM``

 The table's checksum, unless you specified to skip it or the table has no rows.
 Some types of checksums will be 0 if there are no rows; others will print NULL.

  * ``TIME``

 How long it took to checksum the \ ``CHUNK``\ , not including \ ``WAIT``\  time.
 Total checksum time is \ ``WAIT + TIME``\ .


  * ``WAIT``

 How long the slave waited to catch up to its master before beginning to
 checksum.  \ ``WAIT``\  is always 0 for the master.  See :option:`--wait`.


  * ``STAT``

 The return value of MASTER_POS_WAIT().  \ ``STAT``\  is always \ ``NULL``\  for the
 master.


  * ``LAG``

 How far the slave lags the master, as reported by SHOW SLAVE STATUS.
 \ ``LAG``\  is always \ ``NULL``\  for the master.
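
Because every column is always present, the default output can be split on
whitespace and grouped by its "primary key".  The following Python sketch
reports which chunks differ between hosts.  It assumes the columns appear in
the order listed above; the sample lines are invented for illustration, and
``find_differences`` is a hypothetical helper, not part of the tool.


.. code-block:: python

    from collections import defaultdict

    # Column order as listed above; assumed to match the printed header.
    COLUMNS = ["DATABASE", "TABLE", "CHUNK", "HOST", "ENGINE", "COUNT",
               "CHECKSUM", "TIME", "WAIT", "STAT", "LAG"]


    def find_differences(lines):
        """Return (db, table, chunk) keys whose hosts disagree."""
        seen = defaultdict(set)
        for line in lines:
            fields = line.split()
            if len(fields) != len(COLUMNS) or fields[0] == "DATABASE":
                continue  # skip header lines and anything unexpected
            row = dict(zip(COLUMNS, fields))
            key = (row["DATABASE"], row["TABLE"], int(row["CHUNK"]))
            seen[key].add((row["COUNT"], row["CHECKSUM"]))
        return sorted(key for key, values in seen.items() if len(values) > 1)


    # Invented sample output, for illustration only.
    sample = """\
    DATABASE TABLE CHUNK HOST  ENGINE COUNT CHECKSUM TIME WAIT STAT LAG
    db1      t1    0     host1 InnoDB 10    3c1a9f0e 0    0    NULL NULL
    db1      t1    0     host2 InnoDB 10    3c1a9f0e 0    0    0    0
    db1      t2    0     host1 InnoDB 5     77aa0102 0    0    NULL NULL
    db1      t2    0     host2 InnoDB 4     1b2c3d4e 0    0    0    0
    """.splitlines()

    print(find_differences(sample))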


REPLICATE TABLE MAINTENANCE
===========================


If you use :option:`--replicate` to store and replicate checksums, you may need to
perform maintenance on the replicate table from time to time to remove old
checksums.  This section describes when checksums in the replicate table are
deleted automatically by :program:`pt-table-checksum` and when you must manually delete
them.

Before starting, :program:`pt-table-checksum` calculates chunks for each table, even
if :option:`--chunk-size` is not specified (in that case there is one chunk: "1=1").
Then, before checksumming each table, the tool deletes checksum chunks in the
replicate table greater than the current number of chunks.  For example,
if a table is chunked into 100 chunks, 0-99, then :program:`pt-table-checksum` does:


.. code-block:: perl

   DELETE FROM replicate table WHERE db=? AND tbl=? AND chunk > 99


That removes any high-end chunks from previous runs which no longer exist.
Currently, this operation cannot be disabled.
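
The effect of that ``DELETE`` can be tried out on a throwaway table.  The
Python sketch below uses an in-memory SQLite database as a stand-in for the
replicate table; the table name ``checksum`` and the reduced column list are
assumptions made only for this illustration.


.. code-block:: python

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Stand-in for the replicate table; only the columns used by the DELETE.
    conn.execute("CREATE TABLE checksum (db TEXT, tbl TEXT, chunk INTEGER)")
    # Pretend a previous run split test.t1 into 150 chunks (0-149).
    conn.executemany("INSERT INTO checksum VALUES (?, ?, ?)",
                     [("test", "t1", n) for n in range(150)])

    current_chunks = 100  # this run produces chunks 0-99
    conn.execute("DELETE FROM checksum WHERE db=? AND tbl=? AND chunk > ?",
                 ("test", "t1", current_chunks - 1))

    remaining = conn.execute(
        "SELECT COUNT(*), MAX(chunk) FROM checksum WHERE db=? AND tbl=?",
        ("test", "t1")).fetchone()
    print(remaining)  # (100, 99)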

If you use :option:`--resume`, :option:`--resume-replicate`, or :option:`--modulo`, then
you need to be careful that the number of rows in a table does not decrease
so much that the number of chunks decreases too, else some checksum chunks may
be deleted.  The one exception is if only rows at the high end of the range
are deleted.  In that case, the high-end chunks are deleted and lower chunks
remain unchanged.  An increasing number of rows or chunks should not cause
any adverse effects.

Changing the :option:`--chunk-size` between runs with :option:`--resume`,
:option:`--resume-replicate`, or :option:`--modulo` can cause odd or invalid checksums.
You should not do this.  It won't work with the resume options.  With
:option:`--modulo`, the safest thing to do is manually delete all the rows in
the replicate table for the table in question and start over.

If the replicate table becomes cluttered with old or invalid checksums
and the auto-delete operation is not deleting them, then you will need to
manually clean up the replicate table.  Alternatively, if you specify
:option:`--empty-replicate-table`, then the tool deletes every row in the
replicate table.


EXIT STATUS
===========

An exit status of 0 (sometimes also called a return value or return code)
indicates success.  If there is an error checksumming any table, the exit status
is 1.

When running :option:`--replicate-check`, if any slave has chunks that differ from
the master, the exit status is 1.


QUERIES
=======


If you are using innotop (see `http://code.google.com/p/innotop <http://code.google.com/p/innotop>`_),
mytop, or another tool to watch currently running |MySQL| queries, you may see
the checksum queries.  They look similar to this:


.. code-block:: perl

   REPLACE /*test.test_tbl:'2'/'5'*/ INTO test.checksum(db, ...


Since :program:`pt-table-checksum`'s queries run for a long time and tend to be
textually very long, and thus won't fit on one screen of these monitoring
tools, I've been careful to place a comment at the beginning of the query so
you can see what it is and what it's doing.  The comment contains the name of
the table that's being checksummed, the chunk it is currently checksumming,
and how many chunks will be checksummed.  In the case above, it is
checksumming chunk 2 of 5 in table test.test_tbl.
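
If you collect these queries from a monitoring tool, the comment can be
picked apart with a regular expression.  The Python sketch below assumes the
comment always has the shape shown in the example above; ``describe`` is a
hypothetical helper.


.. code-block:: python

    import re

    # Matches comments shaped like /*db.tbl:'chunk'/'total'*/.
    COMMENT = re.compile(
        r"/\*(?P<table>[^:*]+):'(?P<chunk>\d+)'/'(?P<total>\d+)'\*/")


    def describe(query):
        """Return (table, chunk, total) from a checksum query, or None."""
        match = COMMENT.search(query)
        if not match:
            return None
        return match["table"], int(match["chunk"]), int(match["total"])


    print(describe("REPLACE /*test.test_tbl:'2'/'5'*/ INTO test.checksum(db, ..."))
    # ('test.test_tbl', 2, 5)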


OPTIONS
=======


:option:`--schema` is restricted to option groups Connection, Filter, Output, Help, Config, Safety.

:option:`--empty-replicate-table`, :option:`--resume` and :option:`--resume-replicate` are mutually exclusive.

This tool accepts additional command-line arguments.  Refer to the "SYNOPSIS" and usage information for details.


.. option:: --algorithm

 type: string

 Checksum algorithm (ACCUM|CHECKSUM|BIT_XOR).

 Specifies which checksum algorithm to use.  Valid arguments are CHECKSUM,
 BIT_XOR and ACCUM.  The latter two do cryptographic hash checksums.
 See also "ALGORITHM SELECTION".

 CHECKSUM is built into |MySQL|, but has some disadvantages.  BIT_XOR and ACCUM are
 implemented by SQL queries.  They use a cryptographic hash of all columns
 concatenated together with a separator, followed by a bitmap of each nullable
 column that is NULL (necessary because CONCAT_WS() skips NULL columns).

 CHECKSUM is the default.  This method uses |MySQL|'s built-in CHECKSUM TABLE
 command, which is a CRC32 behind the scenes.  It cannot be used before |MySQL|
 4.1.1, and various options disable it as well.  It does not simultaneously count
 rows; that requires an extra COUNT(\*) query.  This is a good option when you are
 using |MyISAM| tables with live checksums enabled; in this case both the COUNT(\*)
 and CHECKSUM queries will run very quickly.

 The BIT_XOR algorithm is available for |MySQL| 4.1.1 and newer.  It uses
 BIT_XOR(), which is order-independent, to reduce all the rows to a single
 checksum.

 ACCUM uses a user variable as an accumulator.  It reduces each row to a single
 checksum, which is concatenated with the accumulator and re-checksummed.  This
 technique is order-dependent.  If the table has a primary key, it will be used
 to order the results for consistency; otherwise it's up to chance.

 The pathological worst case is where identical rows will cancel each other out
 in the BIT_XOR.  In this case you will not be able to distinguish a table full
 of one value from a table full of another value.  The ACCUM algorithm will
 distinguish them.

 However, the ACCUM algorithm is order-dependent, so if you have two tables
 with identical data but the rows are out of order, you'll get different
 checksums with ACCUM.
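
 The trade-off between the two algorithms can be reproduced outside |MySQL|.
 The following Python sketch mimics both reductions, with ``zlib.crc32``
 standing in for the hash function.  It illustrates the idea only and is not
 the SQL that the tool generates; ``row_crc``, ``bit_xor`` and ``accum`` are
 hypothetical names.


 .. code-block:: python

    import zlib
    from functools import reduce


    def row_crc(row):
        """Checksum one row; CRC32 stands in for the configured function."""
        return zlib.crc32(row.encode())


    def bit_xor(rows):
        """XOR the per-row checksums together (order-independent)."""
        return reduce(lambda acc, row: acc ^ row_crc(row), rows, 0)


    def accum(rows):
        """Fold each row checksum into an accumulator (order-dependent)."""
        acc = 0
        for row in rows:
            acc = zlib.crc32(f"{acc}{row_crc(row)}".encode())
        return acc


    # An even number of identical rows cancels out under XOR.
    same_a = ["a"] * 4
    same_b = ["b"] * 4
    print(bit_xor(same_a), bit_xor(same_b))      # 0 0
    print(accum(same_a) == accum(same_b))

    # The same rows in a different order.
    ordered = ["x", "y", "z"]
    shuffled = ["z", "x", "y"]
    print(bit_xor(ordered) == bit_xor(shuffled))  # True
    print(accum(ordered) == accum(shuffled))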

 If a given algorithm won't work for some reason, :program:`pt-table-checksum` falls back to
 another.  The least common denominator is ACCUM, which works on |MySQL| 3.23.2 and
 newer.


.. option:: --arg-table

 type: string

 The database.table with arguments for each table to checksum.

 This table may be named anything you wish.  It must contain at least the
 following columns:


 .. code-block:: perl

    CREATE TABLE checksum_args (
       db         char(64)     NOT NULL,
       tbl        char(64)     NOT NULL,
       -- other columns as desired
       PRIMARY KEY (db, tbl)
    );


 In addition to the columns shown, it may contain any of the other columns listed
 here (Note: this list is used by the code, MAGIC_overridable_args):


 .. code-block:: perl

    algorithm chunk-column chunk-index chunk-size columns count crc function lock
    modulo use-index offset optimize-xor chunk-size-limit probability separator
    save-since single-chunk since since-column sleep sleep-coef trim wait where


 Each of these columns corresponds to the long form of a command-line option.
 Each column should be NULL-able.  Column names with hyphens should be enclosed
 in backticks (e.g. \`chunk-size\`) when the table is created.  The data type does
 not matter, but it's suggested you use a sensible data type to prevent garbage
 data.

 When :program:`pt-table-checksum` checksums a table, it will look for a matching entry
 in this table.  Any column that has a defined value will override the
 corresponding command-line argument for the table being currently processed.
 In this way it is possible to specify custom command-line arguments for any
 table.

 If you add columns to the table that aren't in the above list of allowable
 columns, it's an error.  The exceptions are \ ``db``\ , \ ``tbl``\ , and \ ``ts``\ .  The \ ``ts``\
 column can be used as a timestamp for easy visibility into the last time the
 \ ``since``\  column was updated with :option:`--save-since`.

 This table is assumed to be located on the first server given on the
 command-line.


.. option:: --ask-pass

 group: Connection

 Prompt for a password when connecting to |MySQL|.


.. option:: --check-interval

 type: time; group: Throttle; default: 1s

 How often to check for slave lag if :option:`--check-slave-lag` is given.


.. option:: --[no]check-replication-filters

 default: yes; group: Safety

 Do not :option:`--replicate` if any replication filters are set.  When
 --replicate is specified, :program:`pt-table-checksum` tries to detect slaves and look
 for options that filter replication, such as binlog_ignore_db and
 replicate_do_db.  If it finds any such filters, it aborts with an error.
 Replication filtering makes it impossible to be sure that the checksum
 queries won't break replication or simply fail to replicate.  If you are sure
 that it's OK to run the checksum queries, you can negate this option to
 disable the checks.  See also :option:`--replicate-database`.


.. option:: --check-slave-lag

 type: DSN; group: Throttle

 Pause checksumming until the specified slave's lag is less than :option:`--max-lag`.

 If this option is specified and :option:`--throttle-method` is set to \ ``slavelag``\
 then :option:`--throttle-method` only checks this slave.


.. option:: --checksum

 group: Output

 Print checksums and table names in the style of md5sum (disables
 :option:`--[no]count`).

 Makes the output behave more like the output of \ ``md5sum``\ .  The checksum is
 first on the line, followed by the host, database, table, and chunk number,
 concatenated with dots.


.. option:: --chunk-column

 type: string

 Prefer this column for dividing tables into chunks.  By default,
 :program:`pt-table-checksum` chooses the first suitable column for each table, preferring
 to use the primary key.  This option lets you specify a preferred column, which
 :program:`pt-table-checksum` uses if it exists in the table and is chunkable.  If not, then
 :program:`pt-table-checksum` will revert to its default behavior.  Be careful when using
 this option; a poor choice could cause bad performance.  This is probably best
 to use when you are checksumming only a single table, not an entire server.  See
 also :option:`--chunk-index`.


.. option:: --chunk-index

 type: string

 Prefer this index for chunking tables.  By default, :program:`pt-table-checksum` chooses an appropriate index for the :option:`--chunk-column` (even if it chooses the chunk
 column automatically).  This option lets you specify the index you prefer.  If
 the index doesn't exist, then :program:`pt-table-checksum` will fall back to its default
 behavior.  :program:`pt-table-checksum` adds the index to the checksum SQL statements in a \ ``FORCE INDEX``\  clause.  Be careful when using this option; a poor choice of
 index could cause bad performance.  This is probably best to use when you are
 checksumming only a single table, not an entire server.


.. option:: --chunk-range

 type: string; default: open

 Set which ends of the chunk range are open or closed.  Possible values are
 one of MAGIC_chunk_range:


 .. code-block:: perl

     VALUE       OPENS/CLOSES
     ==========  ======================
     open        Both ends are open
     openclosed  Low end open, high end closed


 By default :program:`pt-table-checksum` uses an open range of chunks like:


 .. code-block:: perl

    `id` <  '10'
    `id` >= '10' AND `id` < '20'
    `id` >= '20'


 That range is open because the last chunk selects any row with id greater than
 (or equal to) 20.  An open range can be a problem in cases where a lot of new
 rows are inserted with IDs greater than 20 while :program:`pt-table-checksum` is
 running because the final open-ended chunk will select all the newly inserted
 rows.  (The less common case of inserting rows with IDs less than 10 would
 require a \ ``closedopen``\  range but that is not currently implemented.)
 Specifying \ ``openclosed``\  will cause the final chunk to be closed like:


 .. code-block:: perl

    `id` >= '20' AND `id` <= N


 N is the \ ``MAX(\`id\`)``\  that :program:`pt-table-checksum` used when it first chunked
 the rows.  Therefore, it will only chunk the range of rows that existed when
 the tool started and not any newly inserted rows (unless those rows happen
 to be inserted with IDs less than N).
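
 The difference between the two range types can be sketched as a function
 that builds the chunk conditions.  This Python code is an illustration under
 assumed names (``chunk_clauses`` and its parameters); the quoting of the
 upper bound is also an assumption, and the tool's real SQL may differ.


 .. code-block:: python

    def chunk_clauses(column, boundaries, chunk_range="open", max_value=None):
        """Build one WHERE condition per chunk from sorted boundaries."""
        col = f"`{column}`"
        clauses = [f"{col} < '{boundaries[0]}'"]  # low end is always open
        for low, high in zip(boundaries, boundaries[1:]):
            clauses.append(f"{col} >= '{low}' AND {col} < '{high}'")
        last = f"{col} >= '{boundaries[-1]}'"
        if chunk_range == "openclosed":
            # Close the high end at the maximum seen when chunking started.
            last += f" AND {col} <= '{max_value}'"
        clauses.append(last)
        return clauses


    for clause in chunk_clauses("id", [10, 20], "openclosed", max_value=57):
        print(clause)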

 See also :option:`--chunk-size-limit`.


.. option:: --chunk-size

 type: string

 Approximate number of rows or size of data to checksum at a time.  Allowable
 suffixes are k, M, G. Disallows \ ``--algorithm CHECKSUM``\ .

 If you specify a chunk size, :program:`pt-table-checksum` will try to find an index that
 will let it split the table into ranges of approximately :option:`--chunk-size`
 rows, based on the table's index statistics.  Currently only numeric and date
 types can be chunked.

 If the table is chunkable, :program:`pt-table-checksum` will checksum each range separately
 with parameters in the checksum query's WHERE clause.  If :program:`pt-table-checksum`
 cannot find a suitable index, it will do the entire table in one chunk as though
 you had not specified :option:`--chunk-size` at all.  Each table is handled
 individually, so some tables may be chunked and others not.

 The chunks will be approximately sized, and depending on the distribution of
 values in the indexed column, some chunks may be larger than the value you
 specify.

 If you specify a suffix (one of k, M or G), the parameter is treated as a data
 size rather than a number of rows.  The output of SHOW TABLE STATUS is then used
 to estimate the amount of data the table contains, and convert that to a number
 of rows.


.. option:: --chunk-size-limit

 type: float; default: 2.0; group: Safety

 Do not checksum chunks with this many times more rows than :option:`--chunk-size`.

 When :option:`--chunk-size` is given it specifies an ideal size for each chunk
 of a chunkable table (in rows; size values are converted to rows).  Before
 checksumming each chunk, :program:`pt-table-checksum` checks how many rows are in the
 chunk with EXPLAIN.  If the number of rows reported by EXPLAIN is this many
 times greater than :option:`--chunk-size`, then the chunk is skipped and \ ``OVERSIZE``\
 is printed for the \ ``COUNT``\  column of the "OUTPUT".

 For example, if you specify :option:`--chunk-size` 100 and a chunk has 150 rows,
 then it is checksummed with the default :option:`--chunk-size-limit` value 2.0
 because 150 is less than 100 \* 2.0.  But if the chunk has 205 rows, then it
 is not checksummed because 205 is greater than 100 \* 2.0.

 The minimum value for this option is 1 which means that no chunk can be any
 larger than :option:`--chunk-size`.  You probably don't want to specify 1 because
 rows reported by EXPLAIN are estimates which can be greater than or less than
 the real number of rows in the chunk.  If too many chunks are skipped because
 they are oversize, you might want to specify a value larger than 2.

 You can disable oversize chunk checking by specifying :option:`--chunk-size-limit` 0.
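
 The oversize test described above amounts to a single comparison.  A minimal
 Python sketch, with the hypothetical name ``is_oversize``:


 .. code-block:: python

    def is_oversize(explain_rows, chunk_size, chunk_size_limit=2.0):
        """Return True if the chunk would be skipped as OVERSIZE."""
        if chunk_size_limit == 0:  # 0 disables oversize checking
            return False
        return explain_rows > chunk_size * chunk_size_limit


    print(is_oversize(150, 100))     # False: 150 is less than 100 * 2.0
    print(is_oversize(205, 100))     # True: 205 is greater than 100 * 2.0
    print(is_oversize(205, 100, 0))  # False: checking is disabled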

 See also :option:`--unchunkable-tables`.


.. option:: --columns

 short form: -c; type: array; group: Filter

 Checksum only this comma-separated list of columns.


.. option:: --config

 type: Array; group: Config

 Read this comma-separated list of config files; if specified, this must be the
 first option on the command line.


.. option:: --[no]count

 Count rows in tables.  This is built into ACCUM and BIT_XOR, but requires an
 extra query for CHECKSUM.

 This is disabled by default to avoid an extra COUNT(\*) query when
 :option:`--algorithm` is CHECKSUM.  If you have only |MyISAM| tables and live checksums
 are enabled, both CHECKSUM and COUNT will be very fast, but otherwise you may
 want to use one of the other algorithms.


.. option:: --[no]crc

 default: yes

 Do a CRC (checksum) of tables.

 Take the checksum of the rows as well as their count.  This is enabled by
 default.  If you disable it, you'll just get COUNT(\*) queries.


.. option:: --create-replicate-table

 Create the replicate table given by :option:`--replicate` if it does not exist.

 Normally, if the replicate table given by :option:`--replicate` does not exist,
 :program:`pt-table-checksum` will die. With this option, however, :program:`pt-table-checksum`
 will create the replicate table for you, using the database.table name given to
 :option:`--replicate`.

 The structure of the replicate table is the same as the suggested table
 mentioned in :option:`--replicate`. Note that since ENGINE is not specified, the
 replicate table will use the server's default storage engine.  If you want to
 use a different engine, you need to create the table yourself.


.. option:: --databases

 short form: -d; type: hash; group: Filter

 Only checksum this comma-separated list of databases.


.. option:: --databases-regex

 type: string

 Only checksum databases whose names match this *Perl*  regex.


.. option:: --defaults-file

 short form: -F; type: string; group: Connection

 Only read mysql options from the given file.  You must give an absolute
 pathname.


.. option:: --empty-replicate-table

 DELETE all rows in the :option:`--replicate` table before starting.

 Issues a DELETE against the table given by :option:`--replicate` before beginning
 work.  Ignored if :option:`--replicate` is not specified.  This can be useful to
 remove entries related to tables that no longer exist, or just to clean out the
 results of a previous run.

 If you want to delete entries for specific databases or tables you must
 do this manually.


.. option:: --engines

 short form: -e; type: hash; group: Filter

 Do only this comma-separated list of storage engines.


.. option:: --explain

 group: Output

 Show, but do not execute, checksum queries (disables :option:`--empty-replicate-table`).


.. option:: --explain-hosts

 group: Help

 Print full DSNs for each host and exit.  This option allows you to see how
 :program:`pt-table-checksum` parses DSNs from the command-line and how it will connect
 to those hosts.  See "SPECIFYING HOSTS".


.. option:: --float-precision

 type: int

 Precision for \ ``FLOAT``\  and \ ``DOUBLE``\  number-to-string conversion.  Causes FLOAT
 and DOUBLE values to be rounded to the specified number of digits after the
 decimal point, with the ROUND() function in |MySQL|.  This can help avoid
 checksum mismatches due to different floating-point representations of the same
 values on different |MySQL| versions and hardware.  The default is no rounding;
 the values are converted to strings by the CONCAT() function, and |MySQL| chooses
 the string representation.  If you specify a value of 2, for example, then the
 values 1.008 and 1.009 will be rounded to 1.01, and will checksum as equal.
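
 As a rough illustration of why rounding helps, the following *Python* sketch
 mimics the idea outside of |MySQL|.  The name ``checksum_float`` is
 hypothetical, string formatting stands in for ``ROUND()``, and ``zlib.crc32``
 stands in for the hash function, so treat it as a model of the behavior rather
 than what the tool actually executes:

 .. code-block:: python

    import zlib

    def checksum_float(value, precision=None):
        """Checksum the string form of a float, optionally rounding it first."""
        if precision is None:
            text = repr(value)  # no rounding: the representation is used as-is
        else:
            text = f"{value:.{precision}f}"  # stand-in for MySQL's ROUND()
        return zlib.crc32(text.encode("ascii"))

    # Without rounding, the two values produce different checksums...
    assert checksum_float(1.008) != checksum_float(1.009)
    # ...but with a precision of 2 both become "1.01" and checksum as equal.
    assert checksum_float(1.008, 2) == checksum_float(1.009, 2)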


.. option:: --function

 type: string

 Hash function for checksums (FNV1A_64, MURMUR_HASH, SHA1, MD5, CRC32, etc).

 You can use this option to choose the cryptographic hash function used for
 :option:`--algorithm=ACCUM` or :option:`--algorithm=BIT_XOR`.  The default is to use
 \ ``CRC32``\ , but \ ``MD5``\  and \ ``SHA1``\  also work, and you can use your own function, such as a compiled UDF, if you wish.  Whatever function you specify is run in
 SQL, not in *Perl*, so it must be available to |MySQL|.

 The \ ``FNV1A_64``\  UDF mentioned in the benchmarks is much faster than \ ``MD5``\ .  The C++ source code is distributed with Maatkit.  It is very simple to compile and
 install; look at the header in the source code for instructions.  If it is
 installed, it is preferred over \ ``MD5``\ .  You can also use the MURMUR_HASH
 function if you compile and install that as a UDF; the source is also
 distributed with Maatkit, and it is faster and has better distribution
 than FNV1A_64.


.. option:: --help

 group: Help

 Show help and exit.


.. option:: --ignore-columns

 type: Hash; group: Filter

 Ignore this comma-separated list of columns when calculating the checksum.

 This option only affects the checksum when using the ACCUM or BIT_XOR
 :option:`--algorithm`.


.. option:: --ignore-databases

 type: Hash; group: Filter

 Ignore this comma-separated list of databases.


.. option:: --ignore-databases-regex

 type: string

 Ignore databases whose names match this *Perl*  regex.


.. option:: --ignore-engines

 type: Hash; default: FEDERATED,MRG_MyISAM; group: Filter

 Ignore this comma-separated list of storage engines.


.. option:: --ignore-tables

 type: Hash; group: Filter

 Ignore this comma-separated list of tables.

 Table names may be qualified with the database name.


.. option:: --ignore-tables-regex

 type: string

 Ignore tables whose names match the *Perl*  regex.


.. option:: --lock

 Lock on master until done on slaves (implies :option:`--slave-lag`).

 This option can help you to get a consistent read on a master and many slaves.
 If you specify this option, :program:`pt-table-checksum` will lock the table on the
 first server on the command line, which it assumes to be the master.  It will
 keep this lock until the checksums complete on the other servers.

 This option isn't very useful by itself, so you probably want to use :option:`--wait`
 instead.

 Note: if you're checksumming a slave against its master, you should use
 :option:`--replicate`.  In that case, there's no need for locking, waiting, or any of
 that.


.. option:: --max-lag

 type: time; group: Throttle; default: 1s

 Suspend checksumming if the slave given by :option:`--check-slave-lag` lags.

 This option causes :program:`pt-table-checksum` to look at the slave every time it's about
 to checksum a chunk.  If the slave's lag is greater than the option's value, or
 if the slave isn't running (so its lag is NULL), :program:`pt-table-checksum` sleeps for
 :option:`--check-interval` seconds and then looks at the lag again.  It repeats until
 the slave is caught up, then proceeds to checksum the chunk.

 This option is useful to let you checksum data as fast as the slaves can handle
 it, assuming the slave you directed :program:`pt-table-checksum` to monitor is
 representative of all the slaves that may be replicating from this server.  It
 should eliminate the need for :option:`--sleep` or :option:`--sleep-coef`.
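
 The check-and-sleep loop described above can be sketched in *Python* as
 follows.  This is only a model under stated assumptions: ``wait_for_slave``
 and ``get_lag`` are hypothetical names, and ``get_lag`` is assumed to return
 the lag in seconds, or ``None`` when the slave is not running:

 .. code-block:: python

    import time

    def wait_for_slave(get_lag, max_lag, check_interval, sleep=time.sleep):
        """Sleep until the slave reports a lag no greater than max_lag.

        Returns the number of times the lag was checked.
        """
        checks = 0
        while True:
            checks += 1
            lag = get_lag()
            # A NULL lag (slave not running) is treated like excessive lag.
            if lag is not None and lag <= max_lag:
                return checks
            sleep(check_interval)

    # Simulated readings: slave stopped, then lagging, then caught up.
    readings = iter([None, 5, 0])
    naps = []
    assert wait_for_slave(lambda: next(readings), 1, 1, sleep=naps.append) == 3
    assert naps == [1, 1]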


.. option:: --modulo

 type: int

 Do only every Nth chunk on chunked tables.

 This option lets you checksum only some chunks of the table.  This is a useful
 alternative to :option:`--probability` when you want to be sure you get full coverage
 in some specified number of runs; for example, you can do only every 7th chunk,
 and then use :option:`--offset` to rotate the modulo every day of the week.

 Just like with :option:`--probability`, a table that cannot be chunked is done every
 time.


.. option:: --offset

 type: string; default: 0

 Modulo offset expression for use with :option:`--modulo`.

 The argument may be an SQL expression, such as \ ``WEEKDAY(NOW())``\  (which returns
 a number from 0 through 6).  The argument is evaluated by |MySQL|.  The result is
 used as follows: if chunk_num % :option:`--modulo` == :option:`--offset`, the chunk will
 be checksummed.
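
 To see how the modulo and offset combine, here is a small *Python* sketch of
 the selection rule.  The function name is hypothetical, and numbering chunks
 from 0 is an assumption made for the illustration:

 .. code-block:: python

    def chunks_to_checksum(num_chunks, modulo, offset):
        """Return the chunk numbers for which chunk_num % modulo == offset."""
        return [n for n in range(num_chunks) if n % modulo == offset]

    # With a modulo of 7, each weekday offset (0 through 6) selects a different
    # seventh of the chunks, and a full week covers every chunk exactly once.
    week = [chunks_to_checksum(20, 7, day) for day in range(7)]
    assert week[0] == [0, 7, 14]
    assert sorted(n for day in week for n in day) == list(range(20))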


.. option:: --[no]optimize-xor

 default: yes

 Optimize BIT_XOR with user variables.

 This option specifies to use user variables to reduce the number of times each
 row must be passed through the cryptographic hash function when you are using
 the BIT_XOR algorithm.

 With the optimization, the queries look like this in pseudo-code:


 .. code-block:: perl

    SELECT CONCAT(
       BIT_XOR(SLICE_OF(@user_variable)),
       BIT_XOR(SLICE_OF(@user_variable)),
       ...
       BIT_XOR(SLICE_OF(@user_variable := HASH(col1, col2... colN))));


 The exact positioning of user variables and calls to the hash function is
 determined dynamically, and will vary between |MySQL| versions.  Without the
 optimization, it looks like this:


 .. code-block:: perl

    SELECT CONCAT(
       BIT_XOR(SLICE_OF(MD5(col1, col2... colN))),
       BIT_XOR(SLICE_OF(MD5(col1, col2... colN))),
       ...
       BIT_XOR(SLICE_OF(MD5(col1, col2... colN))));


 The difference is the number of times all the columns must be mashed together
 and fed through the hash function.  If you are checksumming really large
 columns, such as BLOB or TEXT columns, this might make a big difference.


.. option:: --password

 short form: -p; type: string; group: Connection

 Password to use when connecting.


.. option:: --pid

 type: string

 Create the given PID file.  The file contains the process ID of the script.
 The PID file is removed when the script exits.  Before starting, the script
 checks if the PID file already exists.  If it does not, then the script creates
 and writes its own PID to it.  If it does, then the script checks the following:
 if the file contains a PID and a process is running with that PID, then
 the script dies; or, if there is no process running with that PID, then the
 script overwrites the file with its own PID and starts; else, if the file
 contains no PID, then the script dies.
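
 The decision rules above can be summarized with a short *Python* sketch.  It
 is a model of the documented behavior, not the tool's own code: the function
 name is hypothetical, it assumes a POSIX system where signal 0 can be used to
 test whether a process exists, and it treats unreadable or non-numeric
 content as "no PID":

 .. code-block:: python

    import os

    def pid_file_action(path):
        """Return "create", "overwrite", or "die" for the given PID file."""
        if not os.path.exists(path):
            return "create"
        with open(path) as fh:
            content = fh.read().strip()
        if not content.isdigit():
            return "die"  # the file exists but contains no PID
        try:
            os.kill(int(content), 0)  # signal 0 only tests for existence
        except ProcessLookupError:
            return "overwrite"  # stale file: no process has that PID
        except PermissionError:
            return "die"  # the process exists but belongs to another user
        return "die"  # a process with that PID is running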


.. option:: --port

 short form: -P; type: int; group: Connection

 Port number to use for connection.


.. option:: --probability

 type: int; default: 100

 Checksums will be run with this percent probability.

 This is an integer between 1 and 100.  If 100, every chunk of every table will
 certainly be checksummed.  If less than that, there is a chance that some chunks
 of some tables will be skipped.  This is useful for routine jobs designed to
 randomly sample bits of tables without checksumming the whole server.  By
 default, if a table is not chunkable, it will be checksummed every time even
 when the probability is less than 100.  You can override this with
 :option:`--single-chunk`.
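
 A minimal *Python* sketch of this sampling rule follows.  The function name
 and the use of a random integer from 1 to 100 are assumptions made for
 illustration; the tool's own random draw may be implemented differently:

 .. code-block:: python

    import random

    def should_checksum(probability, chunkable, single_chunk=False, rng=random):
        """Decide whether to checksum one chunk (or one unchunkable table)."""
        if not chunkable and not single_chunk:
            return True  # unchunkable tables are always done by default
        return rng.randint(1, 100) <= probability

    # A probability of 100 never skips anything.
    assert all(should_checksum(100, chunkable=True) for _ in range(1000))
    # An unchunkable table is checksummed even at a probability of 1.
    assert should_checksum(1, chunkable=False)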

 See also :option:`--modulo`.


.. option:: --progress

 type: array; default: time,30

 Print progress reports to ``STDERR``.  Currently, this feature is only for when
 :option:`--throttle-method` waits for slaves to catch up.

 The value is a comma-separated list with two parts.  The first part can be
 percentage, time, or iterations; the second part specifies how often an update
 should be printed, in percentage, seconds, or number of iterations.


.. option:: --quiet

 short form: -q; group: Output

 Do not print checksum results.


.. option:: --recheck

 Re-checksum chunks that :option:`--replicate-check` found to be different.


.. option:: --recurse

 type: int; group: Throttle

 Number of levels to recurse in the hierarchy when discovering slaves.
 Default is infinite.

 See :option:`--recursion-method`.


.. option:: --recursion-method

 type: string

 Preferred recursion method for discovering slaves.

 Possible methods are:


 .. code-block:: perl

    METHOD       USES
    ===========  ================
    processlist  SHOW PROCESSLIST
    hosts        SHOW SLAVE HOSTS


 The processlist method is preferred because SHOW SLAVE HOSTS is not reliable.
 However, the hosts method is required if the server uses a non-standard
 port (not 3306).  Usually :program:`pt-table-checksum` does the right thing and finds
 the slaves, but you may give a preferred method and it will be used first.
 If it doesn't find any slaves, the other methods will be tried.


.. option:: --replicate

 type: string

 Replicate checksums to slaves (disallows --algorithm CHECKSUM).

 This option enables a completely different checksum strategy for a consistent,
 lock-free checksum across a master and its slaves.  Instead of running the
 checksum queries on each server, you run them only on the master.  You specify a
 table, fully qualified in db.table format, to insert the results into.  The
 checksum queries will insert directly into the table, so they will be replicated
 through the binlog to the slaves.

 When the queries are finished replicating, you can run a simple query on each
 slave to see which tables have differences from the master.  With the
 :option:`--replicate-check` option, :program:`pt-table-checksum` can run the query for you to make it even easier.  See "CONSISTENT CHECKSUMS" for details.

 If you find tables that have differences, you can use the chunk boundaries in a
 WHERE clause with pt-table-sync to help repair them more efficiently.  See
 pt-table-sync for details.

 The table must have at least these columns: db, tbl, chunk, boundaries,
 this_crc, master_crc, this_cnt, master_cnt.  The table may be named anything you
 wish.  Here is a suggested table structure, which is automatically used for
 :option:`--create-replicate-table` (MAGIC_create_replicate):


 .. code-block:: perl

    CREATE TABLE checksum (
       db         char(64)     NOT NULL,
       tbl        char(64)     NOT NULL,
       chunk      int          NOT NULL,
       boundaries char(100)    NOT NULL,
       this_crc   char(40)     NOT NULL,
       this_cnt   int          NOT NULL,
       master_crc char(40)         NULL,
       master_cnt int              NULL,
       ts         timestamp    NOT NULL,
       PRIMARY KEY (db, tbl, chunk)
    );


 Be sure to choose an appropriate storage engine for the checksum table.  If you
 are checksumming |InnoDB| tables, for instance, a deadlock will break replication
 if the checksum table is non-transactional, because the transaction will still
 be written to the binlog.  It will then replay without a deadlock on the
 slave and break replication with "different error on master and slave."  This
 is not a problem with :program:`pt-table-checksum`, it's a problem with |MySQL|
 replication, and you can read more about it in the |MySQL| manual.

 This works only with statement-based replication (:program:`pt-table-checksum` will switch
 the binlog format to STATEMENT for the duration of the session if your server
 uses row-based replication).

 In contrast to running the tool against multiple servers at once, using this
 option eliminates the complexities of synchronizing checksum queries across
 multiple servers, which normally requires locking and unlocking, waiting for
 master binlog positions, and so on.  Thus, it disables :option:`--lock`, :option:`--wait`, and :option:`--slave-lag` (but not :option:`--check-slave-lag`, which is a way to throttle the execution speed).

 The checksum queries actually do a REPLACE into this table, so existing rows
 need not be removed before running.  However, you may wish to do this anyway to
 remove rows related to tables that don't exist anymore.  The
 :option:`--empty-replicate-table` option does this for you.

 Since the table must be qualified with a database (e.g. \ ``db.checksums``\ ),
 :program:`pt-table-checksum` will only USE this database.  This may be important if any
 replication options are set because it could affect whether or not changes
 to the table are replicated.

 If the slaves have any --replicate-do-X or --replicate-ignore-X options, you
 should be careful not to checksum any databases or tables that exist on the
 master and not the slaves.  Changes to such tables may not normally be executed
 on the slaves because of the --replicate-do-X or --replicate-ignore-X options, but the checksum queries
 modify the contents of the table that stores the checksums, not the tables whose
 data you are checksumming.  Therefore, these queries will be executed on the
 slave, and if the table or database you're checksumming does not exist, the
 queries will cause replication to fail.  For more information on replication
 rules, see `http://dev.mysql.com/doc/en/replication-rules.html <http://dev.mysql.com/doc/en/replication-rules.html>`_.

 The table specified by :option:`--replicate` will never be checksummed itself.


.. option:: --replicate-check

 type: int

 Check results in :option:`--replicate` table, to the specified depth.  You must use
 this after you run the tool normally; it skips the checksum step and only checks
 results.

 It recursively finds differences recorded in the table given by
 :option:`--replicate`.  It recurses to the depth you specify: 0 is no recursion
 (check only the server you specify), 1 is check the server and its slaves, 2 is
 check the slaves of its slaves, and so on.

 It finds differences by running the query shown in "CONSISTENT CHECKSUMS",
 and prints results, then exits after printing.  This is just a convenient way of
 running the query so you don't have to do it manually.

 The output is one informational line per slave host, followed by the results
 of the query, if any.  If :option:`--quiet` is specified, there is no output.  If
 there are no differences between the master and any slave, there is no output.
 If any slave has chunks that differ from the master, :program:`pt-table-checksum`'s
 exit status is 1; otherwise it is 0.
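
 The comparison and the exit status can be modeled with a few lines of
 *Python*.  The authoritative query is the one shown in "CONSISTENT
 CHECKSUMS"; this sketch only assumes the column names listed under
 :option:`--replicate`, and the function names and sample rows are made up for
 illustration:

 .. code-block:: python

    def differing_chunks(rows):
        """Return (db, tbl, chunk) for rows whose slave and master values differ."""
        return [
            (row["db"], row["tbl"], row["chunk"])
            for row in rows
            if row["this_crc"] != row["master_crc"]
            or row["this_cnt"] != row["master_cnt"]
        ]

    def exit_status(rows):
        """Return 1 if any chunk differs from the master, otherwise 0."""
        return 1 if differing_chunks(rows) else 0

    # Two rows as they might appear in the replicate table on a slave.
    rows = [
        {"db": "app", "tbl": "users", "chunk": 0,
         "this_crc": "aa11", "master_crc": "aa11", "this_cnt": 10, "master_cnt": 10},
        {"db": "app", "tbl": "users", "chunk": 1,
         "this_crc": "bb22", "master_crc": "cc33", "this_cnt": 10, "master_cnt": 10},
    ]
    assert differing_chunks(rows) == [("app", "users", 1)]
    assert exit_status(rows) == 1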

 This option makes :program:`pt-table-checksum` look for slaves by running \ ``SHOW PROCESSLIST``\ .  If it finds connections that appear to be from slaves, it derives
 connection information for each slave with the same default-and-override method
 described in "SPECIFYING HOSTS".

 If \ ``SHOW PROCESSLIST``\  doesn't return any rows, :program:`pt-table-checksum` looks at  \ ``SHOW SLAVE HOSTS``\  instead.  The host and port, and user and password if
 available, from \ ``SHOW SLAVE HOSTS``\  are combined into a DSN and used as the
 argument.  This requires slaves to be configured with \ ``report-host``\ ,
 \ ``report-port``\  and so on.

 This requires the @@SERVER_ID system variable, so it works only on |MySQL|
 3.23.26 or newer.


.. option:: --replicate-database

 type: string

 \ ``USE``\  only this database with :option:`--replicate`.  By default, :program:`pt-table-checksum`  executes USE to set its default database to the database that contains the table it's currently working on.  It changes its default database as it works on different tables.  This is a best effort to avoid problems with replication
 filters such as binlog_ignore_db and replicate_ignore_db.  However, replication
 filters can create a situation where there simply is no one right way to do
 things.  Some statements might not be replicated, and others might cause
 replication to fail on the slaves.  In such cases, it is up to the user to
 specify a safe default database.  This option specifies a default database that
 :program:`pt-table-checksum` selects with USE, and never changes afterwards.  See also
 :option:`--[no]check-replication-filters`.


.. option:: --resume

 type: string

 Resume checksum using given output file from a previously interrupted run.

 The given output file should be the literal output from a previous run of
 :program:`pt-table-checksum`.  For example:


 .. code-block:: perl

     pt-table-checksum host1 host2 -C 100 > checksum_results.txt
     pt-table-checksum host1 host2 -C 100 --resume checksum_results.txt


 The command line options given to the first run and the resumed run must
 be identical (except, of course, for :option:`--resume`).  If they are not, the result
 will be unpredictable and probably wrong.

 :option:`--resume` does not work with :option:`--replicate`; for that, use :option:`--resume-replicate`.


.. option:: --resume-replicate

 Resume :option:`--replicate`.

 This option resumes a previous checksum operation using :option:`--replicate`.
 It is like :option:`--resume` but does not require an output file.  Instead,
 it uses the checksum table given to :option:`--replicate` to determine where to
 resume the checksum operation.


.. option:: --save-since

 When :option:`--arg-table` and :option:`--since` are given, save the current :option:`--since` value into that table's \ ``since``\  column after checksumming.  In this way you can incrementally checksum tables by starting where the last one finished.

 The value to be saved could be the current timestamp, or it could be the maximum
 existing value of the column given by :option:`--since-column`.  It depends on what
 options are in effect.  See the description of :option:`--since` to see how
 timestamps are different from ordinary values.


.. option:: --schema

 Checksum \ ``SHOW CREATE TABLE``\  instead of table data.


.. option:: --separator

 type: string; default: #

 The separator character used for CONCAT_WS().

 This character is used to join the values of columns when checksumming with
 :option:`--algorithm` of BIT_XOR or ACCUM.


.. option:: --set-vars

 type: string; default: wait_timeout=10000; group: Connection

 Set these |MySQL| variables.  Immediately after connecting to |MySQL|, this
 string will be appended to SET and executed.


.. option:: --since

 type: string

 Checksum only data newer than this value.

 If the table is chunk-able or nibble-able, this value will apply to the first
 column of the chunked or nibbled index.

 This is not too different to :option:`--where`, but instead of universally applying a
 WHERE clause to every table, it selectively finds the right column to use and
 applies it only if such a column is found.  See also :option:`--since-column`.

 The argument may be an expression, which is evaluated by |MySQL|.  For example,
 you can specify \ ``CURRENT_DATE - INTERVAL 7 DAY``\  to get the date of one week
 ago.

 A special bit of extra magic: if the value is temporal (looks like a date or
 datetime), then the table is checksummed only if the create time (or last
 modified time, for tables that report the last modified time, such as |MyISAM|
 tables) is newer than the value.  In this sense it's not applied as a WHERE
 clause at all.


.. option:: --since-column

 type: string

 The column name to be used for :option:`--since`.

 The default is for the tool to choose the best one automatically.  If you
 specify a value, that will be used if possible; otherwise the best
 auto-determined one; otherwise none.  If the column doesn't exist in the table,
 it is just ignored.


.. option:: --single-chunk

 Permit skipping with :option:`--probability` if there is only one chunk.

 Normally, if a table isn't split into many chunks, it will always be
 checksummed regardless of :option:`--probability`.  This setting lets the
 probabilistic behavior apply to tables that aren't divided into chunks.


.. option:: --slave-lag

 group: Output

 Report replication delay on the slaves.

 If this option is enabled, the output will show how many seconds behind the
 master each slave is.  This can be useful when you want a fast, parallel,
 non-blocking checksum, and you know your slaves might be delayed relative to the
 master.  You can inspect the results and make an educated guess whether any
 discrepancies on the slave are due to replication delay instead of corrupt data.

 If you're using :option:`--replicate`, a slave that is delayed relative to the master
 does not invalidate the correctness of the results, so this option is disabled.


.. option:: --sleep

 type: int; group: Throttle

 Sleep time between checksums.

 If this option is specified, :program:`pt-table-checksum` will sleep the specified
 number of seconds between checksums.  That is, it will sleep between every
 table, and if you specify :option:`--chunk-size`, it will also sleep between chunks.

 This is a very crude way to throttle checksumming; see :option:`--sleep-coef` and
 :option:`--check-slave-lag` for techniques that permit greater control.


.. option:: --sleep-coef

 type: float; group: Throttle

 Calculate :option:`--sleep` as a multiple of the last checksum time.

 If this option is specified, :program:`pt-table-checksum` will sleep the amount of
 time elapsed during the previous checksum, multiplied by the specified
 coefficient.  This option is ignored if :option:`--sleep` is specified.

 This is a slightly more sophisticated way to throttle checksum speed: sleep a
 varying amount of time between chunks, depending on how long the chunks are
 taking.  Even better is to use :option:`--check-slave-lag` if you're checksumming
 master/slave replication.
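
 The relationship between the two sleep options can be sketched in *Python*.
 The function name is hypothetical, and the sketch only encodes what is stated
 above: a fixed :option:`--sleep` takes precedence, otherwise the sleep time is
 the previous checksum time multiplied by the coefficient:

 .. code-block:: python

    def sleep_seconds(last_checksum_time, sleep=None, sleep_coef=None):
        """Return how long to sleep before the next checksum, in seconds."""
        if sleep is not None:
            return float(sleep)  # a fixed sleep wins; the coefficient is ignored
        if sleep_coef is not None:
            return last_checksum_time * sleep_coef
        return 0.0  # neither option given: no throttling by sleeping

    # A chunk that took 2 seconds, with a coefficient of 0.5, sleeps 1 second.
    assert sleep_seconds(2.0, sleep_coef=0.5) == 1.0
    # When both are given, the fixed sleep of 3 seconds is used.
    assert sleep_seconds(2.0, sleep=3, sleep_coef=0.5) == 3.0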


.. option:: --socket

 short form: -S; type: string; group: Connection

 Socket file to use for connection.


.. option:: --tab

 group: Output

 Print tab-separated output, not column-aligned output.


.. option:: --tables

 short form: -t; type: hash; group: Filter

 Do only this comma-separated list of tables.

 Table names may be qualified with the database name.


.. option:: --tables-regex

 type: string

 Only checksum tables whose names match this *Perl*  regex.


.. option:: --throttle-method

 type: string; default: none; group: Throttle

 Throttle checksumming when doing :option:`--replicate`.

 At present there is only one method: \ ``slavelag``\ .  When :option:`--replicate` is
 used, :program:`pt-table-checksum` automatically sets :option:`--throttle-method` to
 \ ``slavelag``\  and discovers every slave and throttles checksumming if any slave
 lags more than :option:`--max-lag`.  Specify \ ``--throttle-method none``\  to disable
 this behavior completely, or specify :option:`--check-slave-lag` and
 :program:`pt-table-checksum` will only check that slave.

 See also :option:`--recurse` and :option:`--recursion-method`.


.. option:: --trim

 Trim \ ``VARCHAR``\  columns (helps when comparing 4.1 to >= 5.0).

 This option adds a \ ``TRIM()``\  to \ ``VARCHAR``\  columns in \ ``BIT_XOR``\  and \ ``ACCUM``\
 modes.

 This is useful when you don't care about the trailing space differences between
 |MySQL| versions which vary in their handling of trailing spaces. |MySQL| 5.0 and
 later all retain trailing spaces in \ ``VARCHAR``\ , while previous versions would
 remove them.


.. option:: --unchunkable-tables

 group: Safety

 Checksum tables that cannot be chunked when :option:`--chunk-size` is specified.

 By default :program:`pt-table-checksum` will not checksum a table that cannot be chunked
 when :option:`--chunk-size` is specified because this might result in a huge,
 non-chunkable table being checksummed in one huge, memory-intensive chunk.

 Specifying this option allows checksumming tables that cannot be chunked.
 Be careful when using this option!  Make sure any non-chunkable tables
 are not so large that they will cause the tool to consume too much memory
 or CPU.

 See also :option:`--chunk-size-limit`.


.. option:: --[no]use-index

 default: yes

 Add FORCE INDEX hints to SQL statements.

 By default :program:`pt-table-checksum` adds an index hint (\ ``FORCE INDEX``\  for |MySQL| v4.0.9 and newer, \ ``USE INDEX``\  for older |MySQL| versions) to each SQL statement to coerce |MySQL| into using the :option:`--chunk-index` (whether the index is
 specified by the option or auto-detected).  Specifying \ ``--no-use-index``\  causes
 :program:`pt-table-checksum` to omit index hints.


.. option:: --user

 short form: -u; type: string; group: Connection

 User for login if not current user.


.. option:: --[no]verify

 default: yes

 Verify checksum compatibility across servers.

 This option runs a trivial checksum on all servers to ensure they have
 compatible CONCAT_WS() and cryptographic hash functions.

 Versions of |MySQL| before 4.0.14 will skip empty strings and NULLs in
 CONCAT_WS, and others will only skip NULLs.  The two kinds of behavior will
 produce different results if you have any columns containing the empty string
 in your table.  If you know you don't (for instance, all columns are
 integers), you can safely disable this check and you will get a reliable
 checksum even on servers with different behavior.
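
 To make the incompatibility concrete, this *Python* sketch models the two
 ``CONCAT_WS()`` behaviors described above.  The function and its
 ``skip_empty`` flag are hypothetical stand-ins for the server behavior, and
 ``#`` is used because it is the default :option:`--separator`:

 .. code-block:: python

    def concat_ws(separator, values, skip_empty=False):
        """Join values, skipping NULLs and, optionally, empty strings too."""
        kept = [
            v for v in values
            if v is not None and not (skip_empty and v == "")
        ]
        return separator.join(kept)

    row = ["a", "", None, "b"]
    # Behavior that skips only NULLs keeps the empty string.
    assert concat_ws("#", row) == "a##b"
    # Behavior that also skips empty strings yields a different input to the
    # hash function, so the same row would checksum differently.
    assert concat_ws("#", row, skip_empty=True) == "a#b"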


.. option:: --version

 group: Help

 Show version and exit.


.. option:: --wait

 short form: -w; type: time

 Wait this long for slaves to catch up to their master (implies :option:`--lock`
 and :option:`--slave-lag`).

 Note: the best way to verify that a slave is in sync with its master is to use
 :option:`--replicate` instead.  The :option:`--wait` option is really only useful if
 you're trying to compare masters and slaves without using :option:`--replicate`,
 which is possible but complex and less efficient in some ways.

 This option helps you get a consistent checksum across a master server and its
 slaves.  It combines locking and waiting to accomplish this.  First it locks the
 table on the master (the first server on the command line).  Then it finds the
 master's binlog position.  Checksums on slaves will be deferred until they reach
 the same binlog position.

 The argument to the option is the number of seconds to wait for the slaves to
 catch up to the master.  It is actually the argument to MASTER_POS_WAIT().  If
 the slaves don't catch up to the master within this time, they will unblock
 and go ahead with the checksum.  You can tell whether this happened by
 examining the STAT column in the output, which is the return value of
 MASTER_POS_WAIT().


.. option:: --where

 type: string

 Do only rows matching this \ ``WHERE``\  clause (disallows :option:`--algorithm` CHECKSUM).

 You can use this option to limit the checksum to only part of the table.  This
 is particularly useful if you have append-only tables and don't want to
 constantly re-check all rows; you could run a daily job to just check
 yesterday's rows, for instance.

 This option is much like the -w option to mysqldump.  Do not specify the WHERE
 keyword.  You may need to quote the value.  Here is an example:


 .. code-block:: perl

    pt-table-checksum --where "foo=bar"


.. option:: --[no]zero-chunk

 default: yes

 Add a chunk for rows with zero or zero-equivalent values.  This only has an
 effect when :option:`--chunk-size` is specified.  The purpose of the zero chunk
 is to capture a potentially large number of zero values that would imbalance
 the size of the first chunk.  For example, if a lot of negative numbers were
 inserted into an unsigned integer column causing them to be stored as zeros,
 then these zero values are captured by the zero chunk instead of the first
 chunk and all its non-zero values.


DSN OPTIONS
===========


These DSN options are used to create a DSN.  Each option is given like
\ ``option=value``\ .  The options are case-sensitive, so P and p are not the
same option.  There cannot be whitespace before or after the \ ``=``\  and
if the value contains whitespace it must be quoted.  DSN options are
comma-separated.  See the percona-toolkit manpage for full details.


  * ``A``

 dsn: charset; copy: yes

 Default character set.


  * ``D``

 dsn: database; copy: yes

 Default database.


  * ``F``

 dsn: mysql_read_default_file; copy: yes

 Only read default options from the given file.


  * ``h``

 dsn: host; copy: yes

 Connect to host.


  * ``p``

 dsn: password; copy: yes

 Password to use when connecting.


  * ``P``

 dsn: port; copy: yes

 Port number to use for connection.


  * ``S``

 dsn: mysql_socket; copy: yes

 Socket file to use for connection.


  * ``u``

 dsn: user; copy: yes

 User for login if not current user.
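
The ``option=value`` syntax, and the way later hosts inherit missing values
from the first one, can be illustrated with a short *Python* sketch.  It is a
simplification under stated assumptions: ``parse_dsn`` is a hypothetical name,
and the sketch handles neither quoted values nor bare host names:

.. code-block:: python

    def parse_dsn(text, defaults=None):
        """Parse a comma-separated list of option=value pairs into a dict."""
        dsn = dict(defaults or {})  # start from the inherited values, if any
        for part in text.split(","):
            key, _, value = part.partition("=")
            dsn[key] = value  # keys are case-sensitive: "P" is not "p"
        return dsn

    first = parse_dsn("h=host1,u=user,P=3306")
    second = parse_dsn("h=host2", defaults=first)
    assert second == {"h": "host2", "u": "user", "P": "3306"}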


ENVIRONMENT
===========


The environment variable \ ``PTDEBUG``\  enables verbose debugging output to ``STDERR``.
To enable debugging and capture all output to a file, run the tool like:


.. code-block:: perl

    PTDEBUG=1 pt-table-checksum ... > FILE 2>&1


Be careful: debugging output is voluminous and can generate several megabytes
of output.


SYSTEM REQUIREMENTS
===================


You need *Perl*, ``DBI``, ``DBD::mysql``, and some core packages that ought to be
installed in any reasonably new version of *Perl*.


BUGS
====


For a list of known bugs, see `http://www.percona.com/bugs/pt-table-checksum <http://www.percona.com/bugs/pt-table-checksum>`_.

Please report bugs at `https://bugs.launchpad.net/percona-toolkit <https://bugs.launchpad.net/percona-toolkit>`_.


AUTHORS
=======

*Baron Schwartz*


ACKNOWLEDGMENTS
===============

*Claus Jeppesen*, *Francois Saint-Jacques*, *Giuseppe Maxia*, *Heikki Tuuri*,
*James Briggs*, *Martin Friebe*, and *Sergey Zhuravlev*


COPYRIGHT, LICENSE, AND WARRANTY
================================


This program is copyright 2007-2011 *Baron Schwartz*, 2011 Percona Inc.
Feedback and improvements are welcome.


VERSION
=======

:program:`pt-table-checksum` 1.0.1