TableChunker

TableChunker helps determine how to “chunk” a table.  Chunks are pre-determined ranges of rows defined by boundary values (sometimes also called endpoints) on numeric or numeric-like columns, including date/time types.  Any numeric column type on which MySQL can do positional comparisons (<, <=, >, >=) works.  Chunking on character data is not supported yet (but see issue 568).

Usually chunks range over all rows in a table, but they range over only a subset of rows if an optional where arg is passed to various subs.  In either case a chunk is like “`col` >= 5 AND `col` < 10”.  If col is of type int and is unique, then that chunk ranges over up to 5 rows (values 5 through 9).

Chunks are included in WHERE clauses by various tools to do work on discrete chunks of the table instead of trying to work on the entire table at once.  Chunks do not overlap and their size is configurable via the chunk_size arg passed to several subs.  The chunk_size can be a number of rows or a size like 1M, in which case it’s in estimated bytes of data.  Real chunk sizes are usually close to the requested chunk_size, but unless the optional exact arg is passed the real chunk sizes are approximate.  Sometimes the distribution of values on the chunk column can skew chunking.  If, for example, col has values 0, 100, 101, ... then the zero value skews chunking.  The zero_chunk arg handles this.
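
The skew described above can be illustrated with a small Python sketch.  It is not part of TableChunker (which is Perl); the function names and the equal-width splitting strategy are assumptions made for illustration only.  It shows that including the lone zero value in the range leaves some chunks nearly empty, while chunking only the non-zero values spreads the rows evenly.

```python
def equal_width_boundaries(min_val, max_val, n_chunks):
    """Return the n_chunks - 1 interior boundaries of [min_val, max_val]."""
    width = (max_val - min_val) / n_chunks
    return [min_val + width * i for i in range(1, n_chunks)]


def rows_per_chunk(values, boundaries):
    """Count how many values fall into each half-open chunk [lo, hi)."""
    edges = [float("-inf")] + boundaries + [float("inf")]
    return [
        sum(1 for v in values if lo <= v < hi)
        for lo, hi in zip(edges, edges[1:])
    ]


# Hypothetical column values: 0, 100, 101, ..., 199
values = [0] + list(range(100, 200))

# Chunking over the full range: the zero value stretches the range.
skewed = rows_per_chunk(
    values, equal_width_boundaries(min(values), max(values), 4)
)

# Chunking only the non-zero values (zero would get its own chunk).
nonzero = [v for v in values if v != 0]
balanced = rows_per_chunk(
    nonzero, equal_width_boundaries(min(nonzero), max(nonzero), 4)
)

print(skewed)    # [1, 0, 50, 50]
print(balanced)  # [25, 25, 25, 25]
```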

Summary
TableChunker                TableChunker helps determine how to “chunk” a table.
Functions
new
find_chunk_columns          Find chunkable columns.
calculate_chunks            Calculate chunks for the given range statistics.
_chunk_numeric              Determine how to chunk a numeric column.
_chunk_char                 Determine how to chunk a character column.
get_first_chunkable_column  Get the first chunkable column in a table.
size_to_rows                Convert a size in rows or bytes to a number of rows in the table, using SHOW TABLE STATUS.
get_range_statistics        Determine the range of values for the chunk_col column on this table.
inject_chunks               Create a SQL statement from a query prototype by filling in placeholders.
value_to_number
range_num
range_time
range_date
range_datetime
range_timestamp
timestampdiff
get_valid_end_points
_get_valid_end_point
get_first_valid_value
_validate_temporal_value
get_nonzero_value
base_count                  Count to any number in any base with the given symbols.
_d

Functions

new

sub new

Parameters

$class         TableChunker (automatic)
%args          Arguments

Required Arguments

Quoter         Quoter object
MySQLDump      MySQLDump object

find_chunk_columns

sub find_chunk_columns

Find chunkable columns.

Parameters

%args          Arguments

Required Arguments

table_struct   Hashref returned from TableParser::parse()

Optional Arguments

exact          bool: Try to support exact chunk sizes (may still chunk fuzzily)

Returns

Array: whether the table can be chunked exactly if requested (zero otherwise), arrayref of columns that support chunking.  Example:

1,
[
  { column => 'id', index => 'PRIMARY' },
  { column => 'i',  index => 'i_idx'   },
]
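
As a rough illustration of what “chunkable” can mean, the following Python sketch picks candidate columns from a table description.  Everything here is an assumption for illustration: the dictionary layout, the set of type names, and the rule itself (a column qualifies if it is the left-most column of an index and has a numeric or temporal type).  The real selection logic lives in the Perl sub and may differ.

```python
# Assumed set of numeric or numeric-like (temporal) column types.
NUMERIC_LIKE = {"int", "bigint", "decimal", "date", "datetime", "timestamp", "time"}


def candidate_chunk_columns(table):
    """Return [{'column': ..., 'index': ...}] for plausible chunk columns.

    `table` is a hypothetical structure:
    {'types': {column: type}, 'indexes': {index_name: [columns...]}}
    """
    found = []
    for index_name, cols in table["indexes"].items():
        first = cols[0]  # only the left-most column of each index is considered
        if table["types"].get(first) in NUMERIC_LIKE:
            found.append({"column": first, "index": index_name})
    return found


table = {
    "types": {"id": "int", "i": "int", "name": "varchar"},
    "indexes": {"PRIMARY": ["id"], "i_idx": ["i", "name"], "name_idx": ["name"]},
}
candidates = candidate_chunk_columns(table)
print(candidates)
```

With this sample table the sketch yields the same two entries as the example return value above; the index on the character column is skipped.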

calculate_chunks

sub calculate_chunks

Calculate chunks for the given range statistics.  Args min, max and rows_in_range are returned from get_range_statistics() which is usually called before this sub.  Min and max are expected to be valid values (NULL is valid).

Parameters

%args          Arguments

Required Arguments

dbh            Database handle (dbh)
db             database name
tbl            table name
tbl_struct     retval of TableParser::parse()
chunk_col      column name to chunk on
min            min col value, from TableChunker::get_range_statistics()
max            max col value, from TableChunker::get_range_statistics()
rows_in_range  number of rows to chunk, from TableChunker::get_range_statistics()
chunk_size     requested size of each chunk

Optional Arguments

exact          Use exact chunk_size?  Use approximations if not.
tries          Fetch up to this many rows to find a non-zero value
chunk_range    Make chunk range open (default) or openclosed

Returns

Array of WHERE predicates like “`col` >= '10' AND `col` < '20'”, one for each chunk.  All values are single-quoted due to issue 1002.  Example:

`film_id` < '30',
`film_id` >= '30' AND `film_id` < '60',
`film_id` >= '60' AND `film_id` < '90',
`film_id` >= '90',
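
The shape of that list can be reproduced with a short Python sketch.  It is illustrative only and is not the module's Perl code: the function name is hypothetical, the boundaries are supplied directly rather than computed from range statistics, and the `1=1` fallback for an empty boundary list is an assumption.

```python
def chunk_predicates(col, boundaries):
    """Build non-overlapping WHERE predicates from sorted boundary values.

    The first chunk is open at the low end and the last chunk is open at
    the high end, so every row falls into exactly one chunk.
    """
    quoted = f"`{col}`"
    if not boundaries:
        return ["1=1"]  # assumed fallback: a single chunk covering all rows
    preds = [f"{quoted} < '{boundaries[0]}'"]
    for lo, hi in zip(boundaries, boundaries[1:]):
        preds.append(f"{quoted} >= '{lo}' AND {quoted} < '{hi}'")
    preds.append(f"{quoted} >= '{boundaries[-1]}'")
    return preds


predicates = chunk_predicates("film_id", [30, 60, 90])
for p in predicates:
    print(p)
```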

_chunk_numeric

sub _chunk_numeric

Determine how to chunk a numeric column.

Parameters

%args          Arguments

Required Arguments

dbh            Database handle (dbh)
db             database name
tbl            table name
tbl_struct     retval of TableParser::parse()
chunk_col      column name to chunk on
min            min col value, from TableChunker::get_range_statistics()
max            max col value, from TableChunker::get_range_statistics()
rows_in_range  number of rows to chunk, from TableChunker::get_range_statistics()
chunk_size     requested size of each chunk

Optional Arguments

exact          Use exact chunk_size?  Use approximations if not.
tries          Fetch up to this many rows to find a non-zero value
zero_chunk     Add an extra chunk for zero values?  (0, 00:00, etc.)

Returns

Array of chunker info that calculate_chunks() uses to create chunks, like:

col             => quoted chunk column name
start_point     => start value (a Perl number)
end_point       => end value (a Perl number)
interval        => interval to walk from start_ to end_point (a Perl number)
range_func      => coderef to return a value while walking that ^ range
have_zero_chunk => whether to include a zero chunk (col=0)
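
How a caller might walk from start_point to end_point using interval and range_func can be sketched in Python as follows.  This is a simplified illustration, not the Perl implementation: the function names are hypothetical, and the range function's signature here (start, interval, end point) is an assumption.

```python
def range_num(start, interval, end_point):
    """Return the (begin, end) pair for one step, capped at end_point."""
    return start, min(end_point, start + interval)


def walk_range(start_point, end_point, interval, range_func):
    """Step from start_point to end_point, collecting (begin, end) pairs."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    pairs = []
    point = start_point
    while point < end_point:
        pairs.append(range_func(point, interval, end_point))
        point += interval
    return pairs


steps = walk_range(0, 100, 30, range_num)
print(steps)  # [(0, 30), (30, 60), (60, 90), (90, 100)]
```

Each pair would then become one chunk predicate, with the first and last chunks left open-ended.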

_chunk_char

sub _chunk_char

Determine how to chunk a character column.

Parameters

%args          Arguments

Required Arguments

dbh            Database handle (dbh)
db             database name
tbl            table name
tbl_struct     retval of TableParser::parse()
chunk_col      column name to chunk on
min            min col value, from TableChunker::get_range_statistics()
max            max col value, from TableChunker::get_range_statistics()
rows_in_range  number of rows to chunk, from TableChunker::get_range_statistics()
chunk_size     requested size of each chunk

Returns

Array of chunker info that calculate_chunks() uses to create chunks, like:

col             => quoted chunk column name
start_point     => start value (a Perl number)
end_point       => end value (a Perl number)
interval        => interval to walk from start_ to end_point (a Perl number)
range_func      => coderef to return a value while walking that ^ range

get_first_chunkable_column

sub get_first_chunkable_column

Get the first chunkable column in a table.  Only a “sane” column/index is returned.  That means that the first auto-detected chunk col/index are used if any combination of preferred chunk col or index would be really bad, like chunk col=x and chunk index=some index over (y, z).  That’s bad because the index doesn’t include the column; it would also be bad if the column wasn’t a left-most prefix of the index.

Parameters

%args          Arguments

Required Arguments

tbl_struct     Hashref returned by TableParser::parse()

Optional Arguments

chunk_column   Preferred chunkable column name
chunk_index    Preferred chunkable column index name
exact          bool: passed to find_chunk_columns()

Returns

List: chunkable column name, chunkable column index
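
The “sane” check described above can be sketched in Python.  This is a minimal illustration under stated assumptions: the function name and the index layout (index name mapped to an ordered list of columns) are hypothetical, and the real sub also handles the case where only one of the two preferences is given.

```python
def is_sane_choice(indexes, chunk_column, chunk_index):
    """True if chunk_column is the left-most column of chunk_index."""
    cols = indexes.get(chunk_index)
    return bool(cols) and cols[0] == chunk_column


# Hypothetical indexes: name -> ordered list of indexed columns
indexes = {"PRIMARY": ["id"], "yz": ["y", "z"], "xy": ["x", "y"]}

print(is_sane_choice(indexes, "x", "yz"))  # False: index does not include x
print(is_sane_choice(indexes, "y", "xy"))  # False: y is not the left-most column
print(is_sane_choice(indexes, "x", "xy"))  # True
```

When the check fails, the text above says the first auto-detected column and index are used instead of the preferred pair.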

size_to_rows

sub size_to_rows

Convert a size in rows or bytes to a number of rows in the table, using SHOW TABLE STATUS.  If the size is a string with a suffix of M/G/k, interpret it as mebibytes, gibibytes, or kibibytes respectively.  If it’s just a number, treat it as a number of rows and return right away.

Parameters

%args          Arguments

Required Arguments

dbh            Database handle (dbh)
db             Database name
tbl            Table name
chunk_size     Chunk size string like “1000” or “50M”

Returns

Array: number of rows, average row size
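
The conversion can be sketched in Python as below.  This is not the Perl implementation: the average row length is passed in as a parameter (in the real sub it would come from SHOW TABLE STATUS), and rounding up as well as returning None for unknown values are assumptions made for the sketch.

```python
import math
import re

# Multipliers for the k/M/G suffixes (kibibytes, mebibytes, gibibytes).
SUFFIX_BYTES = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def size_to_rows(chunk_size, avg_row_length):
    """Return (number of rows, average row size) for a chunk size string."""
    match = re.fullmatch(r"(\d+)([kMG]?)", chunk_size.strip())
    if match is None:
        raise ValueError(f"Invalid chunk size: {chunk_size!r}")
    number, suffix = int(match.group(1)), match.group(2)
    if not suffix:
        # A bare number is already a row count; no table lookup is needed.
        return number, None
    if not avg_row_length:
        # Without an average row length the row count cannot be estimated.
        return None, avg_row_length
    n_rows = math.ceil(number * SUFFIX_BYTES[suffix] / avg_row_length)
    return n_rows, avg_row_length


print(size_to_rows("1000", 100))   # (1000, None)
print(size_to_rows("50M", 1024))   # (51200, 1024)
```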

get_range_statistics

sub get_range_statistics

Determine the range of values for the chunk_col column on this table.

Parameters

%args          Arguments

Required Arguments

dbh            Database handle (dbh)
db             Database name
tbl            Table name
chunk_col      Chunk column name
tbl_struct     Hashref returned by TableParser::parse()

Optional Arguments

where          WHERE clause without “WHERE” to restrict range
index_hint     “FORCE INDEX (...)” clause
tries          Fetch up to this many rows to find a valid value

Returns

Array: min row value, max row value, rows in range
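
A query of roughly the following shape would yield the min and max values.  The Python sketch below only builds the SQL string; the function name and the exact statement are assumptions for illustration, and how the real sub obtains the rows-in-range estimate is not shown.

```python
def range_query(db, tbl, chunk_col, where=None, index_hint=None):
    """Build a MIN/MAX query for the chunk column (illustrative only)."""
    col = f"`{chunk_col}`"
    sql = f"SELECT MIN({col}), MAX({col}) FROM `{db}`.`{tbl}`"
    if index_hint:
        sql += f" {index_hint}"
    if where:
        # The where arg is given without the leading WHERE keyword.
        sql += f" WHERE ({where})"
    return sql


sql = range_query(
    "sakila", "film", "film_id",
    where="length > 90",
    index_hint="FORCE INDEX (`PRIMARY`)",
)
print(sql)
```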

inject_chunks

sub inject_chunks

Create a SQL statement from a query prototype by filling in placeholders.

Parameters

%args          Arguments

Required Arguments

database       Database name
table          Table name
chunks         Arrayref of chunks from calculate_chunks()
chunk_num      Index into chunks to use
query          Query prototype returned by TableChecksum::make_checksum_query()

Optional Arguments

index_hint     “FORCE INDEX (...)” clause
where          Arrayref of WHERE clauses joined with AND

Returns

A SQL statement
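
Placeholder filling of this kind can be sketched in Python.  The placeholder names used here (`/*DB_TBL*/`, `/*INDEX_HINT*/`, `/*WHERE*/`) and the function name are assumptions for illustration; check the query prototype produced by TableChecksum::make_checksum_query() for the placeholders it actually emits.

```python
def inject_chunk(query, db, tbl, chunks, chunk_num, where=(), index_hint=""):
    """Fill assumed placeholders in a query prototype for one chunk."""
    # The chunk predicate comes first, then any extra WHERE clauses.
    clauses = [chunks[chunk_num]] + [w for w in where if w]
    where_sql = " AND ".join(f"({c})" for c in clauses)
    sql = query.replace("/*WHERE*/", f" WHERE {where_sql}")
    sql = sql.replace("/*DB_TBL*/", f"`{db}`.`{tbl}`")
    sql = sql.replace("/*INDEX_HINT*/", f" {index_hint}" if index_hint else "")
    return sql


prototype = "SELECT COUNT(*) FROM /*DB_TBL*//*INDEX_HINT*//*WHERE*/"
chunks = ["`film_id` < '30'", "`film_id` >= '30'"]
statement = inject_chunk(
    prototype, "sakila", "film", chunks, 0, where=["length > 90"]
)
print(statement)
```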

value_to_number

sub value_to_number

range_num

sub range_num

range_time

sub range_time

range_date

sub range_date

range_datetime

sub range_datetime

range_timestamp

sub range_timestamp

timestampdiff

sub timestampdiff

get_valid_end_points

sub get_valid_end_points

_get_valid_end_point

sub _get_valid_end_point

get_first_valid_value

sub get_first_valid_value

_validate_temporal_value

sub _validate_temporal_value

get_nonzero_value

sub get_nonzero_value

base_count

sub base_count

Count to any number in any base with the given symbols.  E.g. if counting to 10 in base 16 with symbols 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f the result is “a”.  This is trivial for stuff like base 16 (hex), but far less trivial for arbitrary bases with arbitrary symbols like base 25 with symbols B,C,D,...X,Y,Z.  For that, counting to 10 results in “L”.  The base and its symbols are determined by the character column.  Symbols can be non-ASCII.

Parameters

%args          Arguments

Required Arguments

count_to       Number to count to
base           Base of special system
symbols        Arrayref of symbols for “numbers” in special system

Returns

The “number” (symbol) in the special target base system
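
The counting idea can be sketched in Python using repeated division.  This is an illustration rather than the module's Perl code; the function signature mirrors the documented arguments, but the error handling is an assumption.

```python
def base_count(count_to, base, symbols):
    """Express count_to in the given base using the given symbols."""
    if base < 2 or len(symbols) < base:
        raise ValueError("need base >= 2 and at least `base` symbols")
    if count_to == 0:
        return symbols[0]
    digits = []
    n = count_to
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(symbols[remainder])  # least significant digit first
    return "".join(reversed(digits))


hex_symbols = list("0123456789abcdef")
b_to_z = [chr(c) for c in range(ord("B"), ord("Z") + 1)]  # 25 symbols

print(base_count(10, 16, hex_symbols))  # a
print(base_count(10, 25, b_to_z))       # L
```

Both results match the examples given in the description above.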

_d

sub _d