* PT-1059 - Tools can't parse index names containing newlines
Fixed regular expressions in TableParser.
Added test case, including a test for newlines in the column name.
* PT-1059 - Tools can't parse index names containing newlines
Disabled pt-1637.t until PT-2174 is fixed.
Updated number of tables in b/t/pt-table-checksum/issue_1485195.t
* Patch newlines in table columns (#369)
Will accept this change as part of the fix for PT-1059 - Tools cannot parse index names containing new lines. We will later fix the issue with the patch ourselves.
MySQL 5.6.40 allows newlines in column names; however, the following code:
my @defs = $ddl =~ m/^(\s+`.*?),?$/gm;
breaks because it treats every newline as the end of a line. The 'm' modifier at the end causes this: it makes ^ and $ match at each embedded newline instead of only at the start and end of the string.
To correct this issue I've made use of zero-length assertions known as "positive lookbehind":
https://www.regular-expressions.info/lookaround.html
What does the new pattern do?
m/(?:(?<=,\n)|(?<=\(\n))(\s+`(?:.|\n)+?`.+?),?\n/g;
TLDR:
Treat the DDL as one long string and don't treat \n as the end of a line.
Look for (\s+`(?:.|\n)+?`.+?),?\n
If that matches, look at what precedes the match.
If it is ',\n' or '(\n', the match succeeds. Only the part inside the capturing parentheses, (\s+`(?:.|\n)+?`.+?), is saved.
m/ declares this as a match operator.
(?:(?<=,\n)|(?<=\(\n)) This is an OR of two lookbehind clauses. The ?: makes the enclosing parentheses non-capturing, so the result is not stored. The two lookbehinds are explained on the next two lines:
(?<=,\n) Look behind the matched string for a comma followed by a newline. The comma must be there for this lookbehind to match.
(?<=\(\n) Look behind the matched string for an opening parenthesis followed by a newline. The opening parenthesis must be there.
(\s+`(?:.|\n)+?`.+?),?\n This is the actual match. It matches one or more whitespace characters, then a backtick, then one or more characters of any kind (newlines included), matched non-greedily so that the rest of the pattern is taken into consideration. That is followed by the closing backtick and one or more further characters, again non-greedy. The match stops at the first newline, optionally preceded by a comma, that follows the closing backtick and those characters.
,?\n Matches an optional comma followed by a newline.
/g Don't stop after the first match; keep looking for more matches to the end of the string.
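The difference between the two patterns can be seen with a small standalone script. This is only an illustrative sketch: the table definition in $ddl is made up for the example, and the real TableParser code does more than collect these definitions.

```perl
use strict;
use warnings;

# Hypothetical SHOW CREATE TABLE output; the second column name
# contains a newline between "col" and "name".
my $ddl = "CREATE TABLE `t` (\n"
        . "  `id` int(11) NOT NULL,\n"
        . "  `col\nname` varchar(10) DEFAULT NULL,\n"
        . "  PRIMARY KEY (`id`)\n"
        . ") ENGINE=InnoDB";

# Old pattern: with /m, $ matches before the newline inside the
# column name, so that definition is cut short.
my @old_defs = $ddl =~ m/^(\s+`.*?),?$/gm;

# New pattern: no /m; a definition must be preceded by ",\n" or "(\n"
# and may contain newlines between the backticks.
my @new_defs = $ddl =~ m/(?:(?<=,\n)|(?<=\(\n))(\s+`(?:.|\n)+?`.+?),?\n/g;

# Show each captured definition with newlines made visible.
for my $set ([ old => \@old_defs ], [ new => \@new_defs ]) {
    my ($label, $defs) = @$set;
    for my $def (@$defs) {
        (my $shown = $def) =~ s/\n/\\n/g;
        print "$label: [$shown]\n";
    }
}
```

With the old pattern the second definition is truncated at the newline inside the column name; with the new pattern it is captured whole.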
* PT-1059 - Tools can't parse index names containing newlines
Placed fix from PR-369 into proper place and created test case for this fix.
---------
Co-authored-by: geneguido <31323560+geneguido@users.noreply.github.com>
* PMM-1914 Fixed column parsing having generated
Fixed table parser code that erroneously considered a column as
generated when the default was empty (DEFAULT '') and the COMMENT
contained the word 'Generated'.
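One way to avoid this kind of false positive is to ignore quoted literals before looking for the keyword. The sketch below is an assumption for illustration only: is_generated_column is a hypothetical helper, not the code in TableParser, and it assumes generated columns appear as GENERATED ALWAYS AS in the SHOW CREATE TABLE output.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from TableParser: decide whether a
# column definition declares a generated column.
sub is_generated_column {
    my ($def) = @_;
    # Drop single-quoted literals (including the empty string '') so
    # DEFAULT and COMMENT values cannot be mistaken for keywords.
    (my $stripped = $def) =~ s/'(?:[^'\\]|\\.|'')*'//g;
    return $stripped =~ m/\bGENERATED\s+ALWAYS\s+AS\b/i ? 1 : 0;
}

# Made-up column definitions for the example.
my $plain     = q{`c1` varchar(10) NOT NULL DEFAULT '' COMMENT 'Generated by import'};
my $generated = q{`c2` int GENERATED ALWAYS AS (`a` + `b`) VIRTUAL};

printf "%d  %s\n", is_generated_column($_), $_ for $plain, $generated;
```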
* PMM-1914 Updated TableParser in all programs
* PT-1914 Updated changelog
* PT-1914 Added test
TableParser's parse function was failing while trying to lowercase
column names in the provided 'SHOW CREATE TABLE'.
The problem was it was trying to lowercase everything between backticks
but lines like these:
`field_name` int comment "here is a ` in the comment"
`second_field_name` int
made the original regex fail: it matched ` in the comment"` (running on
to the next backtick) as an expression to be lowercased, while
second_field_name was considered to be outside backticks.
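The failure can be reproduced with a short script. Both substitutions below are assumptions written for illustration: the first is a naive "lowercase everything between backticks" pattern, and the second is one possible line-anchored variant, not necessarily the regex that was committed. The mixed-case column names are made up so that the effect is visible.

```perl
use strict;
use warnings;

# Two column definitions; the first has a stray backtick in its comment.
my $ddl = join "\n",
    '  `Field_Name` int comment "here is a ` in the comment"',
    '  `Second_Field_Name` int';

# Naive approach: lowercase every backtick-delimited run. The stray
# backtick pairs up with the opening backtick of the next column name,
# so Second_Field_Name ends up outside any matched pair.
(my $naive = $ddl) =~ s/(`[^`]+`)/\L$1/g;

# Sketch of a stricter approach: only lowercase an identifier that
# starts a line (after optional leading whitespace).
(my $anchored = $ddl) =~ s/^(\s*`[^`]+`)/\L$1/gm;

print "naive:\n$naive\n\nanchored:\n$anchored\n";
```

The anchored variant lowercases both column names and leaves the comment text alone, but it would still mishandle a column name that itself contains a newline.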