SQLParser parses common MySQL SQL statements into data structures. This parser is MySQL-specific and intentionally meant to handle only “common” cases. Although there are many limitations (UNION and CASE, for example, are not handled), many complex cases are handled that no other free Perl SQL parser at the time of writing can parse, notably subqueries in all their places and varieties.
This package has not been profiled, and it relies heavily on mildly complex regexes, so do not expect amazing performance.
See SQLParser.t for examples of the various data structures. There are many and they vary a lot depending on the statement parsed, so documentation in this file is not exhaustive.
This package differs from QueryParser because here we parse the entire SQL statement (thus giving access to all its parts), whereas QueryParser extracts just needed parts (and ignores all the rest).
SQLParser | SQLParser parses common MySQL SQL statements into data structures. |
Variables | |
$quoted_ident | |
$unquoted_ident | |
$ident_alias | |
$table_ident | |
$column_ident | |
Functions | |
new | Create a SQLParser object. |
parse | Parse a SQL statement. |
_parse_clauses | Parse raw text of clauses into data structures. |
clean_query | Remove spaces, flatten, and normalize some patterns for easier parsing. |
normalize_keyword_spaces | Normalize spaces around certain SQL keywords. |
_parse_query | This sub is called by the parse_TYPE subs except parse_insert. |
parse_from | Parse a FROM clause, a.k.a. the table references. |
parse_identifiers | Parse an arrayref of identifiers into their parts. |
split_unquote | Split and unquote a table name. |
is_identifier | Determine if something is a schema object identifier. |
sub new
Create a SQLParser object.
%args | Arguments |
Schema | Schema object. Can be set later by calling <set_Schema()>. |
SQLParser object
sub parse
Parse a SQL statement. Only statements of $allowed_types are parsed. This sub recurses to parse subqueries.
$query | SQL statement |
A complex hashref of the parsed SQL statement. All keys and almost all values are lowercase for consistency. The struct is roughly:
{
  type       => '',     # one of $allowed_types
  clauses    => {},     # raw, unparsed text of clauses
  <clause>   => struct  # parsed clause struct, e.g. from => [<tables>]
  keywords   => {},     # LOW_PRIORITY, DISTINCT, SQL_CACHE, etc.
  functions  => {},     # MAX(), SUM(), NOW(), etc.
  select     => {},     # SELECT struct for INSERT/REPLACE ... SELECT
  subqueries => [],     # pointers to subquery structs
}
It varies, of course, depending on the query. If something is missing it means the query doesn’t have that part. E.g. INSERT has an INTO clause but DELETE does not, and only DELETE and SELECT have FROM clauses. Each clause struct is different; see their respective parse_CLAUSE subs.
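As a rough illustration of reading such a struct, the sketch below works on a hand-written hashref shaped like the outline above. The values are made up for illustration and are not actual parse() output, and the helper summarize() is hypothetical, not part of SQLParser.

```perl
use strict;
use warnings;

# Hypothetical helper: report the statement type and which raw clauses
# the struct carries. A clause that is absent simply has no key.
sub summarize {
    my ($struct) = @_;
    my @clauses = sort keys %{ $struct->{clauses} || {} };
    return "$struct->{type}: " . join(', ', @clauses);
}

# Hand-written stand-in for a parsed SELECT (illustrative values only).
my $struct = {
    type     => 'select',
    clauses  => { columns => 'a, b', from => 't', where => 'a > 1' },
    keywords => {},
};

print summarize($struct), "\n";
print "no INTO clause\n" unless exists $struct->{clauses}{into};
```

Checking with exists, rather than testing the value, keeps a missing part distinct from a part that is present but empty.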
sub _parse_clauses
Parse raw text of clauses into data structures. This sub recurses to parse the clauses of subqueries. The clauses are read from and their data structures saved into the $struct parameter.
$struct | Hashref from which clauses are read (%{$struct->{clauses}}) and into which data structs are saved (e.g. $struct->{from}=...). |
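The read-from and save-into pattern described here can be sketched with a dispatch table. This is a minimal sketch under assumptions, not SQLParser's code: the name sketch_parse_clauses and the two toy clause parsers are hypothetical stand-ins for the real parse_CLAUSE subs, and subquery recursion is left out.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for parse_CLAUSE subs; the real ones return
# much richer structures.
my %parser_for = (
    columns => sub { [ split /\s*,\s*/, $_[0] ] },
    from    => sub { [ map { +{ name => $_ } } split /\s*,\s*/, $_[0] ] },
);

# Read raw text from $struct->{clauses} and save each parsed result
# under the clause's own key in the same $struct.
sub sketch_parse_clauses {
    my ($struct) = @_;
    for my $clause ( sort keys %{ $struct->{clauses} } ) {
        my $parse = $parser_for{$clause} or next;   # no parser: leave raw text only
        $struct->{$clause} = $parse->( $struct->{clauses}{$clause} );
    }
    return $struct;
}

my $parsed = sketch_parse_clauses({
    type    => 'select',
    clauses => { columns => 'a, b', from => 't1, t2', where => 'a > 1' },
});
print join(', ', map { $_->{name} } @{ $parsed->{from} }), "\n";
```

The raw text stays in $struct->{clauses}, so a caller can still inspect a clause that has no parser.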
sub _parse_query
This sub is called by the parse_TYPE subs except parse_insert. It does two things: it removes and saves the given keywords, all of which should appear at the beginning of the query; and it saves (but does not remove) the given clauses. The query should start with the values for the first clause because the query’s first word was removed in parse(). So for “SELECT cols FROM ...”, the query given here is “cols FROM ...” where “cols” belongs to the first clause “columns”. Then the query is walked clause-by-clause, saving each.
$query | SQL statement with first word (SELECT, INSERT, etc.) removed |
$keywords | Compiled regex of keywords that can appear in $query |
$first_clause | First clause word to expect in $query |
$clauses | Compiled regex of clause words that can appear in $query |
Hashref with raw text of clauses
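The two steps can be sketched as follows. This is a simplified illustration and not the module's implementation: the name sketch_parse_query is hypothetical, the returned keywords key is an addition for illustration, the regexes passed in are assumed to contain no capture groups, and clause words inside string literals or subqueries are not protected.

```perl
use strict;
use warnings;

# Hypothetical sketch of the keyword and clause walk.
sub sketch_parse_query {
    my ($query, $keywords, $first_clause, $clauses) = @_;
    my %struct = ( keywords => {}, clauses => {} );

    # Step 1: remove and save keywords found at the start of the query.
    while ( $query =~ s/\A\s*($keywords)\s+//i ) {
        $struct{keywords}{lc $1} = 1;
    }

    # Step 2: split on clause words. The capture group makes split keep
    # the separators: (first value, clause word, value, clause word, ...).
    my ($first_value, @rest) = split /\s+($clauses)\s+/i, $query;
    $struct{clauses}{$first_clause} = $first_value;
    while ( @rest ) {
        my $clause = lc shift @rest;
        $struct{clauses}{$clause} = shift @rest;
    }
    return \%struct;
}

my $raw = sketch_parse_query(
    'DISTINCT a, b FROM t WHERE a > 1 ORDER BY b',
    qr/DISTINCT|SQL_CACHE/i,      # keywords
    'columns',                    # first clause
    qr/FROM|WHERE|ORDER BY/i,     # clause words
);
print "$_ => $raw->{clauses}{$_}\n" for sort keys %{ $raw->{clauses} };
```

The first value has no clause word in front of it, which is why the caller has to name the first clause explicitly.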
sub parse_from
Parse a FROM clause, a.k.a. the table references. Does not handle nested joins. See http://dev.mysql.com/doc/refman/5.1/en/join.html
$from | FROM clause (with the word “FROM”) |
Arrayref of hashrefs, one hashref for each table in the order that the tables appear, like:
{
  name           => 't2',  -- table's real name
  alias          => 'b',   -- table's alias, if any
  explicit_alias => 1,     -- if explicitly aliased with AS
  join           => {      -- if joined to another table, all but first
                           -- table are because comma implies INNER JOIN
    to        => 't1',     -- table name on left side of join, if this is
                           -- LEFT JOIN then this is the inner table, if
                           -- RIGHT JOIN then this is outer table
    type      => '',       -- left, right, inner, outer, cross, natural
    condition => 'using',  -- on or using, if applicable
    columns   => ['id'],   -- columns for USING condition, if applicable
    ansi      => 1,        -- true if ANSI JOIN, i.e. true if not implicit
                           -- INNER JOIN due to following a comma
  },
},
{
  name => 't3',
  join => {
    to        => 't2',
    type      => 'left',
    condition => 'on',     -- an ON condition is like a WHERE clause so
    where     => [...]     -- this arrayref of predicates appears, see
                           -- <parse_where()> for its structure
  },
},
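A consumer of this arrayref might walk it as in the sketch below. The helper describe_from() is hypothetical and the input is hand-written from the keys documented above (name, alias, join, to, type, condition, ansi); it is not actual parser output.

```perl
use strict;
use warnings;

# Hypothetical helper: one summary line per table reference.
sub describe_from {
    my ($tables) = @_;
    my @lines;
    for my $tbl ( @$tables ) {
        my $line = $tbl->{name};
        $line .= " alias $tbl->{alias}" if defined $tbl->{alias};
        if ( my $join = $tbl->{join} ) {
            my $kind = $join->{ansi} ? 'ANSI join' : 'comma join';
            $line .= " ($kind to $join->{to}";
            $line .= ", type $join->{type}" if $join->{type};
            $line .= ", $join->{condition}" if $join->{condition};
            $line .= ")";
        }
        push @lines, $line;
    }
    return @lines;
}

# Hand-written input following the documented keys.
my $from = [
    { name => 't1', alias => 'a', explicit_alias => 1 },
    { name => 't2', alias => 'b', explicit_alias => 1,
      join => { to => 't1', type => 'inner', condition => 'using',
                columns => ['id'], ansi => 1 } },
    { name => 't3',
      join => { to => 't2', type => 'left', condition => 'on', ansi => 1 } },
];
print "$_\n" for describe_from($from);
```

The first table has no join key, so the check on $tbl->{join} also covers a FROM clause with a single table.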
sub parse_identifiers
Parse an arrayref of identifiers into their parts. Identifiers can be column names (optionally qualified), expressions, or constants. GROUP BY and ORDER BY specify a list of identifiers.
$idents | Arrayref of identifiers |
Arrayref of hashes with each identifier’s parts, depending on what kind of identifier it is.
sub split_unquote
Split and unquote a table name. The table name can be database-qualified or not, like `db`.`table`. The table name can be backtick-quoted or not.
$db_tbl | Table name |
$default_db | Default database name to return if $db_tbl is not database-qualified |
Array: unquoted database (possibly undef), unquoted table
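The described behavior can be approximated in a few lines. This is a sketch under assumptions, not the module's code: sketch_split_unquote is a hypothetical name, and it assumes that names contain neither a backtick nor a dot inside the quoting.

```perl
use strict;
use warnings;

# Hypothetical sketch: strip backticks, split on the first dot, and fall
# back to the default database when the name is not qualified.
sub sketch_split_unquote {
    my ($db_tbl, $default_db) = @_;
    (my $name = $db_tbl) =~ s/`//g;             # drop backtick quoting
    my ($db, $tbl) = split /\./, $name, 2;
    if ( !defined $tbl ) {
        ($db, $tbl) = ($default_db, $db);       # unqualified: only a table name
    }
    return ($db, $tbl);
}

my ($db1, $tbl1) = sketch_split_unquote('`db`.`table`');
my ($db2, $tbl2) = sketch_split_unquote('tbl', 'test');
my ($db3, $tbl3) = sketch_split_unquote('tbl');
print "$db1 / $tbl1\n";
print "$db2 / $tbl2\n";
print defined $db3 ? $db3 : 'undef', " / $tbl3\n";
```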
sub is_identifier
Determine if something is a schema object identifier. E.g.: `tbl` is an identifier, but “tbl” is a string and 1 is a number. See http://dev.mysql.com
$thing | Name of something, including any quoting as it appears in a query. |
True if $thing is an identifier, else false.
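The distinction between identifiers, strings, and numbers can be sketched as below. This is a simplified approximation, not the module's implementation: sketch_is_identifier is a hypothetical name, unquoted names are assumed to start with a letter or underscore, and reserved words are not considered.

```perl
use strict;
use warnings;

# Hypothetical sketch: reject string and numeric literals, then accept
# one to three dot-separated name parts (db.tbl.col), each either
# backtick-quoted or a plain word.
sub sketch_is_identifier {
    my ($thing) = @_;
    return 0 unless defined $thing && length $thing;
    return 0 if $thing =~ m/^\s*['"]/;                  # quoted string literal
    return 0 if $thing =~ m/^\s*-?\d+(?:\.\d+)?\s*$/;   # numeric literal
    my $part = qr/(?:`[^`]+`|[A-Za-z_]\w*)/;
    return $thing =~ m/^\s*$part(?:\.$part){0,2}\s*$/ ? 1 : 0;
}

for my $thing ( '`tbl`', '"tbl"', '1', 'db.tbl' ) {
    printf "%-8s %s\n", $thing,
        sketch_is_identifier($thing) ? 'identifier' : 'not an identifier';
}
```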