First finished draft of new pt-upgrade docs/specs/vision.

2026-04-17 01:01:39 +08:00 · 2013-01-05 15:57:51 -07:00
parent eb77847702
commit a511c7e995
1 changed files with 273 additions and 51 deletions
--- a/bin/pt-upgrade
+++ b/bin/pt-upgrade
@@ -12510,38 +12510,69 @@ and reports any negative or signficant changes.  The two servers are
 typically development servers, one running the current production
 version of MySQL and the other running the new version of MySQL.
 =head1 USE CASES
 The tool has two use cases.  The first, canonical case is running "host
 to host".  A log file and two DSN are given on the command line, one for
 each MySQL server.  See the first example in the L<"SYNOPSIS">.  Queries
 are executed and compared on each server as the tool runs.  Any queries
 with differences are saved and reported when the tool finishes.  Unless
 interrupted, nothing is saved except the final report. -- This use case
-requiers less hard disk space, but the queries must be executed on both
+requires less hard disk space, but the queries must be executed on both
-servers if the tool is ran again.  If there are a lot of queries, or
+servers if the tool is run again, even if one of the servers hasn't
-executing them all takes a long time, and one server doesn't change,
+changed.  If there are a lot of queries or executing them all takes a
-you may want to use the second use case.
+long time, and one server doesn't change, you may want to use the second
 use case.
-The second use case is running "host to reference results".  Reference results
+The second use case is running "host to reference results".  Reference
-are the complete results from a single MySQL server, saved to hard disk.
+results are the complete results from a single MySQL server, saved to
-In this case, you must first generate the reference results, then run
+hard disk.  In this case, you must first generate the reference results,
-the tool a second time with the reference results and the other MySQL
+then run the tool a second time to compare another MySQL server to the
-server.  See the 2nd example in the L<"SYNOPSIS">.  Reference results
+reference results.  See the second example in the L<"SYNOPSIS">.  Reference
-are typically generated for the current version of MySQL which doesn't
+results are typically generated for the current version of MySQL which
-change. -- This use case can require I<a lot> of hard disk space because
+doesn't change. -- This use case can require I<a lot> of hard disk space
-the results (i.e. data rows) from all unique queries must be saved, plus
+because the results (i.e. data rows) from all unique queries must be saved,
-other data about the queries.  If you plan to do many comparisons against
+plus other data about the queries.  If you plan to do many comparisons
-a fixed version of MySQL, this use case is more efficient.  Or if you don't
+against a fixed version of MySQL, this use case is more efficient.  Or if
-have access to both servers at the same time, this use case allows you to
+you don't have access to both servers at the same time, this use case
-"execute now, compare later".
+allows you to "execute now, compare later".
-=head1 IMPORTANT THINGS TO CONSIDER
+=head1 IMPORTANT CONSIDERATIONS
 =head2 CONSISTENCY
 Consistent environments and consistent data are crucial for obtaining
 an accurate report.  pt-upgrade should never be run on a production
 server or any active server because there is no easy way to ensure
 a synchronous read for each query.  If data is changing on either server
 while pt-upgrade is running, the report could contain more false-positives
 than legitimate changes.  B<pt-upgrade assumes that both MySQL servers
 are static, unchanging (except for any changes made by the tool if run
 with C<--no-read-only>).>  A read-only workload shouldn't affect the tool,
 except maybe query times, so read-only slaves could be used.
 =head2 COMPARED TO
 The first DSN or reference results is compared to the second DSN.
 Phrases like "or smaller" and "or better" mean the first DSN or reference
 results compared to the second DSN.  Therefore, the first DSN or reference
 results must be the current version of MySQL to which the new (or old,
 if downgrading) version of MySQL is being compared.
 For the query time comparison, for example, if the first DSN or reference
 results value is C<0.01> and the second DSN is C<0.5>, that is a negative
 change that will be reported.  But the reverse is a positive change because
 the query is C<0.49> seconds faster on the second host, so it will not be
 reported.
 =head2 READ-ONLY
 By default, pt-upgrade only executes C<SELECT> and C<SET> statements.
 If you're using recreatable test or development servers and wish to
 compare write statements too (e.g. C<INSERT>, C<UPDATE>, C<DELETE>),
-then specify C<--no-read-only>.  See L<"--[no]read-only">.
+then specify C<--no-read-only>.  If using a binary log, you must
 specify C<--no-read-only> because binary logs don't contain C<SELECT>
 statements.  See L<"--[no]read-only">.
 =head2 TRANSACTIONS
@@ -12561,35 +12592,10 @@ on dedicated testing or development servers.  B<Do not run pt-upgrade
 on production servers!>  Consequently, the tool is CPU, memory, disk, and
 network intensive.  It executes queries as fast as possible.
-=head1 REPORT
+=head1 QUERY CHANGES
-The final report (L<"--save-report">) is a human-readable text file that
+Negative or significant query changes are determined by comparing the
-details the queries that have negative or signficant changes.  To prevent
+following aspects of each query from both hosts:
 the report from becoming too long, queries are grouped by fingerprint into
 classes.  A query fingerprint is the abstracted form of a query, created by
 removing literal values, normalizing whitespace, etc.  So these queries
 belong to the same class:
   SELECT c FROM t WHERE id = 1
   SELECT c FROM t WHERE id=5
   select  c  from  t  where  id  =  9
 The fingerprint for those queries is:
   select c from t where id=?
 Each query class can have up to L<"--max-query-class-size"> unique queries
 (1,000 by default), but only up to 3 queries are reported.  If all queries
 have the same change (for example, they all have a change in row count),
 then only one query is reported for the class.  But if there are multiple
 changes, then up to 3 queries with different changes are reported.  By
 virtue of being in the same class, one query's change is usually the same
 and representative of all queries in the class.
 =head1 COMPARISONS
 The following aspects of each query from both hosts are compared to determine
 any negative or significant changes to report:
 =over
@@ -12599,16 +12605,18 @@ The number of rows returned by the query should be the same.
 =item Row data
-The row data returned by the query should be the same.
+The row data returned by the query should be the same.  All changes are
 significant: whitespace, L<"--float-precision">, etc.
-=item Errors and warnings
+=item Warnings
 The query should either not produce any errors or warnings, or produce
 the same errors or warnings.
 =item Query time
-The query execution time should be roughly the same or better.
+A query rarely executes with a constant time, but its execution time
 should be within the same order of magnitude or smaller.
 =item Query plan
@@ -12616,6 +12624,214 @@ The query execution plan (C<EXPLAIN>) should be roughly the same or better.
 =back
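The "Query time" rule above can be illustrated with a minimal sketch. This is one possible reading of "within the same order of magnitude or smaller", not pt-upgrade's actual implementation: the function names and the use of a base-10 logarithm are assumptions made for illustration only.

```python
import math


def magnitude(seconds: float) -> int:
    """Return the power of ten of a positive duration (hypothetical helper)."""
    return math.floor(math.log10(seconds))


def is_negative_time_change(first: float, second: float) -> bool:
    """Flag a change only when the second host is slower by an order of magnitude.

    Assumption: times that are zero or negative cannot be compared by
    magnitude, so they are treated as "no change".
    """
    if first <= 0 or second <= 0:
        return False
    return magnitude(second) > magnitude(first)


if __name__ == "__main__":
    # Values taken from the documentation's own examples.
    print(is_negative_time_change(0.01, 0.5))
    print(is_negative_time_change(0.5, 0.01))
```

Under this reading, a query that gets faster on the second host is never flagged, which agrees with the COMPARED TO section: only the first-to-second direction counts as a negative change.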
 =head1 REPORT
 The final report (L<"--save-report">) is a human-readable text file that
 details the queries with changes (see L<"QUERY CHANGES">).  To prevent
 the report from becoming too long, queries are not reported individually
 but grouped by fingerprint into classes.  A query fingerprint is the
 abstracted form of a query, created by removing literal values, normalizing
 whitespace, etc.  So these queries belong to the same class:
   SELECT c FROM t WHERE id = 1
   SELECT c FROM t WHERE id=5
   select  c  from  t  where  id  =  9
 The fingerprint for those queries is:
   select c from t where id=?
 Each query class can have up to L<"--max-class-size"> unique queries
 (1,000 by default).  Up to L<"--max-change-examples"> are reported for each
 type of change, per query class.  By virtue of being in the same class,
 one query's change is usually representative of all queries with the same
 change, so it's not necessary to report every example.  The total number
 of queries in a class with a particular change is indicated in the report.
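The grouping described above can be sketched roughly as follows. This is a simplified illustration, not the tool's fingerprinting code: the regular expressions, the `fingerprint` and `group_by_class` names, and the dictionary layout are all assumptions, and a real fingerprint handles many more cases (comments, `NULL`, multi-row `VALUES`, and so on).

```python
import re
from collections import defaultdict


def fingerprint(query: str) -> str:
    """Abstract a query: drop literals, normalize case and whitespace."""
    q = query.strip().rstrip(";").lower()
    q = re.sub(r"'(?:[^'\\]|\\.)*'", "?", q)      # quoted string literals
    q = re.sub(r"\b\d+(?:\.\d+)?\b", "?", q)      # numeric literals
    q = re.sub(r"\s+", " ", q)                    # collapse whitespace
    q = re.sub(r"\s*=\s*", "=", q)                # no spaces around "="
    q = re.sub(r"\(\s*\?(?:\s*,\s*\?)*\s*\)", "(?)", q)  # collapse IN lists
    return q


def group_by_class(queries, max_class_size=1000):
    """Group queries by fingerprint, keeping at most max_class_size unique ones."""
    classes = defaultdict(lambda: {"total": 0, "unique": [], "discarded": 0})
    seen = set()
    for q in queries:
        cls = classes[fingerprint(q)]
        cls["total"] += 1              # counted before duplicates are removed
        if q in seen:
            continue                   # exact duplicate, not a new unique query
        seen.add(q)
        if len(cls["unique"]) < max_class_size:
            cls["unique"].append(q)
        else:
            cls["discarded"] += 1      # unique, but the class is already full
    return dict(classes)


if __name__ == "__main__":
    log = [
        "SELECT c FROM t WHERE id = 1",
        "SELECT c FROM t WHERE id=5",
        "select  c  from  t  where  id  =  9",
    ]
    for fp, cls in group_by_class(log).items():
        print(fp, cls["total"], len(cls["unique"]), cls["discarded"])
```

The three counters in each class mirror the "Total queries", "Unique queries", and "Discarded queries" lines of a query class report.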
 =head2 EXAMPLE
 A report begins with the following three sections:
  #######################################################################
  # Summary
  #######################################################################
  Report                  pt-upgrade-report.1357416568.txt
  Date                    Sat Jan  5 13:15:05 MST 2013
  Log                     /var/lib/mysql/slow.log
  Run time                00:01:00
  Completed               No, 70% complete
  Exit status             1 (--run-time expired)
  #######################################################################
  # Hosts
  #######################################################################
  DSN                     (Reference results) 
  Hostname                foo.domain.com
  MySQL version           5.0.95
  Reference results       Yes, ~/host1/refres
  compared to:
  DSN                     h=127.1,P=12346
  Hostname                bar.domain.com
  MySQL version           5.5.6
  #######################################################################
  # Counters
  #######################################################################
  queries_read            900
  queries_filtered        5
  unique_queries          300
  queries_with_changes    10
  queries_no_changes      290
  query_classes           24
  class_size_exceeded     1
  lost_connection         1
  lock_wait_timeout       1
 The "Summary" section is a summary of the report and run.  The "Hosts"
 section lists which hosts were compared.  The "Counters" section lists
 values that give an idea of how effective the run was.
 A section for each query class with changes follows, like:
  #######################################################################
  # Query class 1 of 24
  #######################################################################
  Class ID           D7D2F2B7AB4602A4
  Total queries      10
  Unique queries     5
  Discarded queries  0
  select * from t where id in (?)
  ##
  ## Row count changes: 2
  ##
  --- 1.
  3 vs. 2 (-1) rows
  SELECT * FROM t WHERE id IN (1,2,3);
  --- 2.
  3 vs. 1 (-2) rows
  SELECT * FROM t WHERE id IN (10,11,12);
 The first part of a query class report lists the query class ID and counts
 of queries in the class.  The query class ID can be used to L<"--filter">
 and compare only queries in the class on subsequent runs of the tool.  The
 "Total queries" count is the total number of queries that belong to the class
 before duplicates are removed and L<"--max-class-size"> is applied.  The "Unique queries"
 count is the number of unique queries in the class; it cannot exceed
 L<"--max-class-size">.  The "Discarded queries" count is the number
 of unique queries discarded due to L<"--max-class-size">.
 The second part of a query class report lists the fingerprint which
 defines the class.
 The rest of a query class report lists the L<"QUERY CHANGES"> that caused
 the class to be reported.  Each type of change begins with a double hash
 mark header that lists the type and total number of queries in the class
 with the change.  Then up to L<"--max-change-examples"> are listed, numbered
 "--- 1.", "--- 2.", etc.  Each example lists the change (differently
 depending on the type of change) for the first and second hosts (respective
 to the "Hosts" section), followed by a verbatim SQL statement from the C<LOG>
 that should demonstrate the change if executed on both hosts again.
 Here are examples of other changes (without a query class header or the
 first two parts of the query class report):
  ##
  ## Row data changes: 1
  ##
  --- 1.
  col1, col2
  < foo    bar
  ---
  > foox   bar
  SELECT col1, col2 FROM t WHERE id=5;
  ##
  ## Warnings changes: 5
  ##
  --- 1.
  No warnings
  vs.
    Level: Warning
     Code: 1265
  Message: Data truncated for column 'b' at row 1
  INSERT INTO t (b) VALUES ('Hello, world!');
  ##
  ## Query time changes: 50
  ##
  --- 1.
  0.01 vs. 0.5 (+0.49) seconds
  SELECT * FROM a JOIN b ON (id) WHERE a.ts < 555555555;
  --- 2.
  0.04 vs. 0.8 (+0.76) seconds
  SELECT * FROM a JOIN b ON (id) WHERE a.ts < 123456789;
  --- 3.
  0.04 vs. 0.5 (+0.46) seconds  
  SELECT * FROM a JOIN b ON (id) WHERE a.ts < 123456789;
  ##
  ## Query plan changes: 1
  ##
  --- 1.
             id: 1
    select_type: SIMPLE
          table: city
           type: ref 
  possible_keys: PRIMARY
            key: PRIMARY
        key_len: 2
            ref: NULL
           rows: 550
          Extra: Using where; Using index
  vs.
             id: 1
    select_type: SIMPLE
          table: city
           type: range
  possible_keys: PRIMARY
            key: PRIMARY
        key_len: 2
            ref: NULL
           rows: 214
          Extra: Using where; Using index
  EXPLAIN SELECT city_id FROM sakila.city WHERE city_id > 10\G
 =head1 OUTPUT
 Status information is printed to C<STDOUT> as the tool runs.  L<"--progress">
@@ -12721,11 +12937,17 @@ type: string
 Print all output to this file when daemonized.  This option has no effect
 unless L<"--daemonize"> is used.
-=item --max-query-class-size
+=item --max-change-examples
 type: int; default: 3
 Max number of examples to list for each type of query change.
 =item --max-class-size
 type: int; default: 1000
-Maximum number of unique queries in each query class.  See L<"RERPOT">.
+Max number of unique queries in each query class.  See L<"REPORT">.
 =item --password