The old system used maps in each context. They were merged between
contexts, then injected each time a message had to be displayed.
It had a limitation on complicated operator setups: historical
information would be overridden by newer associations
(e.g. an IP was for node0 yesterday and is for node1 now, so
associations have been overwritten and are now incorrect).
It also introduced complexity: closures had to be defined too many
times, maps had to be merged, debugging was harder, and every file
started from empty translation maps.
Moreover, Go randomizes map iteration order, so it could create
hard-to-debug output variations on complex cases.
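To illustrate the iteration-order issue, here is a minimal sketch of one
common way to get stable output from a Go map: collect the keys, sort
them, then iterate over the sorted slice. The sortedKeys helper and the
sample data are hypothetical and are not taken from the tool.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in lexical order, so that callers
// can iterate over the map deterministically. Hypothetical helper.
func sortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Made-up IP to node associations, for illustration only.
	ipToNode := map[string]string{
		"10.0.0.3": "node2",
		"10.0.0.1": "node0",
		"10.0.0.2": "node1",
	}
	// Ranging over ipToNode directly may print in a different order on
	// each run; ranging over the sorted keys does not.
	for _, ip := range sortedKeys(ipToNode) {
		fmt.Println(ip, ipToNode[ip])
	}
}
```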
Now it is a singleton in the translate package. It still uses maps, but
each piece of information is now associated with an array of "units"
that store its timestamp.
It is protected by an RWMutex, because maps are not thread-safe (there
is no parallel processing for now).
No regressions, and it passes "operator_ambiguous_ips_list_all_no_color"
where the old system failed.
It can now also be used as an easy-to-read source of information in
itself.
It existed for non-operator setups, but was not working for operators
because k8s logs do not interpret newlines and tabs.
This operator version re-uses existing regular regex handlers directly
Tests must run multiple times to remove doubts.
As the tool reads files and relies on maps, their access order is
random, which can impact some translations.
When adding "ownip", it was also propagating the new IP to the old hash.
But with operators, when the IP changes the hash also changes, so
linking the new IP to the old hash is anachronistic. It is not wrong,
but the newest information could have been overridden depending on the
order of map merges and events.
That situation was producing X(2*number of conflicts) versions of the
output for operators, with different md5sums, which could produce false
positives in regression tests.
So currently some information is not linked anymore, and some IPs are
not translated even though they could be. It is a limitation of using
maps as the source of truth, as they are not versioned.
- Changed the getIndividualFiles function and the Dumper data
  structure, so we can specify the container name for PXC and other
  operators which store logs in a separate container.
- Added the darwin-arm64 option to the Go tools Makefile
- Changed --delimiter option to its short version that works on all
platforms
- Changed "Mongo tools" comments to "Go tools", because now Go tools are
not only for Mongo
It is a thing: 2 nodes joining at the same time, with 2 JOINERs and 2
DONORs cluster-wide
It can happen on operators with 2 garbd joining at the same time
Before, pt-galera-log-explainer was using SST metadata naively.
Basically if a node was DONOR and we found a "transfer completed"
message, we assumed the donor name we found is the correct one.
So for concurrent SSTs, donors were swapping names.
Now, it is handled by a map indexed by donor name. To know whether a
node is the actual donor, it now compares event timestamps. It assumes
the "selected donor" and "shifting DONOR" messages happen less than
0.01 seconds apart, to avoid any conflict.
Regression tests are coming in the next commit, with operator logs
having concurrent SSTs. Another conflict was sometimes breaking the test
depending on the order in which files are read, which is why it is not
added here yet.