Commit Graph

17 Commits

Author SHA1 Message Date
Yoann La Cancellera
51ce07ce22 Merge pull request #750 from ylacancellera/PT-2298_pt-galera-log-explainer_superfluous_lines_on_conflicts
PT-2298 pt-galera-log-explainer superfluous lines on conflicts
2024-01-26 12:36:24 +01:00
Yoann La Cancellera
3bdc4b95ba pt-galera-log-explainer: refactor conflict response 2024-01-26 12:25:45 +01:00
Yoann La Cancellera
7dfcbc1ec5 PT-2307 include last inactive check messages on pt-galera-log-explainer 2024-01-18 11:27:48 +01:00
Yoann La Cancellera
97cc3ebb01 PT-2298 pt-galera-log-explainer superfluous lines on conflicts 2024-01-08 18:37:57 +01:00
Yoann La Cancellera
955fd75ca9 Rename ctx to logCtx, remove any mention of ctx 2023-12-22 23:10:34 +03:00
Yoann La Cancellera
d6d4d30283 Add: parallel on 2 unit tests 2023-12-22 23:10:34 +03:00
Yoann La Cancellera
3fae43123e Fix: typos 2023-12-22 23:10:34 +03:00
Yoann La Cancellera
7876a0511c Remove old comments, dead code 2023-12-22 23:10:34 +03:00
Yoann La Cancellera
2091c1a1f0 Refactoring: migrate translations to singleton
Translations were previously stored in maps in each context, which were
merged between contexts, then injected each time a message needed to be displayed.

It had a limitation on complicated operator setups: historical
information would be overridden by newer associations.
(e.g., that IP was for node0 yesterday, now it's node1, so the older
associations have been overwritten and are incorrect)

It also introduced complexity: closures had to be defined too many
times, maps had to be merged, it was harder to debug, and every file
started from empty translation maps.
Moreover, Go does not guarantee map iteration order (the runtime randomizes
it), so it could create hard-to-debug output variations in complex cases.

Now it is a singleton in the translate package. It still uses maps, but each
key is now associated with an array of "units" that store the timestamp with
each piece of information.
It is protected by an RWMutex, because maps are not thread-safe (there is no
parallel processing for now).

No regressions, and it passes "operator_ambiguous_ips_list_all_no_color"
where the old system failed.
It can now also be used as an easy-to-read source of information in
itself
2023-12-22 23:10:32 +03:00
Yoann La Cancellera
e56fc45a05 Add: inconsistent vote regex corner-case 2023-12-22 23:10:08 +03:00
Yoann La Cancellera
246f875ed9 Add: shortuuid check, new date layout found 2023-12-22 23:10:08 +03:00
Yoann La Cancellera
72fbe7496c Add: operator member associations regex
It existed for non-operator setups, but was not working for operators
because k8s logs do not interpret newlines and tabs.
This operator version reuses the existing regular regex handlers directly
2023-12-22 23:10:08 +03:00
Yoann La Cancellera
208708a58b Add: concurrent SSTs handling
This does happen: 2 nodes can join at the same time, with 2 JOINERs and 2
DONORs cluster-wide.
It can happen on operators with 2 garbd joining at the same time

Before, pt-galera-log-explainer was using SST metadata naively.
Basically, if a node was DONOR and we found a "transfer completed"
message, we assumed the donor name we found was the correct one.
So for concurrent SSTs, donors were swapping names.

Now, it is handled by a map indexed by donor name. To know whether a node
is the actual donor, it now compares the timestamps of events. It assumes
that both "selected donor" and "shifting DONOR" messages happen within
0.01 secs of each other, to avoid any conflict.

Regression tests are coming in the next commit, with operator logs that have
concurrent SSTs. Another conflict was sometimes breaking the test
depending on the order in which we read files, which is why it's not added
here yet
2023-11-07 18:08:24 +01:00
Yoann La Cancellera
117d683872 Change: simplify verbose mode 2023-11-07 18:08:24 +01:00
Yoann La Cancellera
161c2af084 Fix: pointer dereference if votes was missing 2023-11-07 18:08:24 +01:00
Yoann La Cancellera
6d6f30372c Fix: error msg with uppercase, usage missed pt- 2023-11-07 18:08:24 +01:00
Yoann La Cancellera
b4cad31e77 Add pt-galera-log-explainer 2023-11-07 18:08:19 +01:00