.. _pt-galera-log-explainer:

==================================
:program:`pt-galera-log-explainer`
==================================

Filter, aggregate, and summarize multiple Galera logs together.
This is a toolbox to help navigate Galera logs.

Usage
=====

.. code-block:: bash

   pt-galera-log-explainer [--since=] [--until=] [-vv] [--merge-by-directory] [--pxc-operator] <command> <paths ...>


Commands available
==================

list
~~~~

.. code-block:: bash

    pt-galera-log-explainer [flags] list { --all | [--states] [--views] [--events] [--sst] [--applicative] } <paths ...>

List key events in chronological order from any number of nodes (SST, view changes, general errors, maintenance operations).
It aggregates logs by identifying them using node names, IPs, and internal Galera identifiers.



The logs can come from a single node:

.. code-block:: bash

    pt-galera-log-explainer list --all --since 2023-01-05T03:24:26.000000Z /var/log/mysql/*.log

or from multiple nodes:

.. code-block:: bash

    pt-galera-log-explainer list --all *.log

You can filter by event type:

.. code-block:: bash

    pt-galera-log-explainer list --sst --views *.log
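
The ``--since`` and ``--until`` flags expect RFC3339 timestamps. The following is a minimal sketch of how such a time window could be generated, assuming GNU ``date`` is available (its ``-d`` option is not supported by BSD or macOS ``date``). The helper name ``rfc3339_utc`` and the two-hour window are illustrative assumptions, not part of the tool.

.. code-block:: bash

    #!/usr/bin/env bash
    # Sketch only: build RFC3339 UTC bounds for --since and --until.
    # Assumes GNU date; rfc3339_utc is a hypothetical helper name.

    # Print the given date expression as an RFC3339 timestamp in UTC.
    rfc3339_utc() {
        date -u -d "$1" +%Y-%m-%dT%H:%M:%SZ
    }

    since="$(rfc3339_utc '2 hours ago')"
    until_ts="$(rfc3339_utc 'now')"

    # Print the command instead of running it, so it can be reviewed first.
    echo "pt-galera-log-explainer list --all --since=${since} --until=${until_ts} /var/log/mysql/*.log"

Printing the command first keeps the sketch free of side effects; once the window looks right, the printed line can be run as is.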

whois
~~~~~
Find information about a node using any identifier: node name, IP, or UUID.

.. code-block:: bash

    pt-galera-log-explainer [flags] whois [--json] [--type { nodename | ip | uuid | auto }] <information to search> <paths ...>


.. code-block:: bash

    pt-galera-log-explainer whois '218469b2' mysql.log
    pt-galera-log-explainer whois '172.17.0.3' mysql.log
    pt-galera-log-explainer whois 'galera-node2' mysql.log


conflicts
~~~~~~~~~

List every replication failure vote (Galera 4).

.. code-block:: bash

    pt-galera-log-explainer conflicts [--json|--yaml] *.log

ctx
~~~

Get the context the tool built for a single log.
It contains everything the tool extracted from the log file: version, SST information, known UUID-IP-node name mappings, and so on.

.. code-block:: bash

    pt-galera-log-explainer ctx mysql.log

regex-list
~~~~~~~~~~

Print every implemented regex, with the following fields:

* regex: the regex that will be used against the log files
* internalRegex: the Go regex that will be used to extract pieces of information
* type: the regex group it belongs to
* verbosity: the verbosity level required for it to be printed

.. code-block:: bash

    pt-galera-log-explainer regex-list

Available flags
~~~~~~~~~~~~~~~

``-h``, ``--help``               
    Show help and exit.

``--no-color``
    Remove all color special characters from the output.

``--since``        
    Only list events after this date. It will affect the regex applied to the logs.
    Format: 2023-01-23T03:53:40Z (RFC3339)

``--until``
    Only list events before this date. This is only implemented in the tool loop, it does not alter regexes.
    Format: 2023-01-23T03:53:40Z (RFC3339)

``--merge-by-directory``
    Instead of relying on extracted information, merge logs by their base directory.
    This is useful when logs are very sparse and already organized by node.

``--skip-merge``
    Disable the ability to merge log files together. This can be used when every node has the same ``wsrep_node_name``.

``-v``, ``--verbosity``
    * ``-v``: display in the timeline every piece of MySQL information the tool used
    * ``-vv``: internal tool debug output

``--pxc-operator``       
    Analyze logs from Percona PXC operator. Operator logs should be automatically detected (see ``--skip-operator-detection``).
    It will prevent logs from being merged together, add operator specific regexes, and fine-tune regexes for logs taken from ``pt-k8s-debug-collector``.
    Off by default because it negatively impacts performance for non-k8s setups.

``--skip-operator-detection``
    Disable automatic detection of PXC operator logs. When detected, a message will be shown.
    Detection is done using a prefix regex.

``--exclude-regexes``
    Remove regexes from analysis. Use ``pt-galera-log-explainer regex-list | jq .`` to get the list.

``--grep-cmd``
    Path to a grep version 3 binary. On Darwin systems, it may need to be set to ``ggrep``.
    Default: ``grep``

``--version``
    Show version and exit.

``--custom-regexes``
    Add custom regexes, printed in magenta. Format: ``(golang regex string)=[optional static message to display]``.
    If the static message is left empty, the captured string is printed instead. Custom regexes are separated by semicolons.
    Example: ``--custom-regexes="Page cleaner took [0-9]*ms to flush [0-9]* pages=;doesn't recommend.*pxc_strict_mode=unsafe query used"``
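
Writing several custom regexes on one line gets hard to read. As a minimal sketch, the value for ``--custom-regexes`` could be assembled from separate ``regex=message`` entries joined with semicolons. The helper name ``join_custom_regexes`` is a hypothetical one introduced here for illustration, and the sketch does not handle patterns that themselves contain a semicolon.

.. code-block:: bash

    #!/usr/bin/env bash
    # Sketch only: join_custom_regexes is a hypothetical helper, not part of the tool.
    # Each argument has the form 'regex=message'; the message may be empty.
    join_custom_regexes() {
        local IFS=';'          # "$*" joins arguments with the first character of IFS
        printf '%s\n' "$*"
    }

    custom="$(join_custom_regexes \
        'Page cleaner took [0-9]*ms to flush [0-9]* pages=' \
        "doesn't recommend.*pxc_strict_mode=unsafe query used")"

    # Print the command instead of running it, so the quoting can be checked.
    echo "pt-galera-log-explainer list --all --custom-regexes=\"${custom}\" *.log"

The two entries used here are the ones from the flag's own example, so the joined value should be identical to it.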


Example outputs
===============

.. code-block:: bash

    $ pt-galera-log-explainer list --all --no-color --since=2023-03-12T19:41:28.493046Z --until=2023-03-12T19:44:59.855491Z tests/logs/upgrade/*
    identifier                    172.17.0.2                                 node2                                   tests/logs/upgrade/node3.log            
    current path                  tests/logs/upgrade/node1.log               tests/logs/upgrade/node2.log            tests/logs/upgrade/node3.log            
    last known ip                 172.17.0.2                                                                                                                 
    last known name                                                          node2                                                                           
    mysql version                 8.0.28                                                                                                                     
                                                                                                                                                             
    2023-03-12T19:41:28.493046Z   starting(8.0.28)                           |                                       |                                       
    2023-03-12T19:41:28.500789Z   started(cluster)                           |                                       |                                       
    2023-03-12T19:43:17.630191Z   |                                          node3 joined                            |                                       
    2023-03-12T19:43:17.630208Z   node3 joined                               |                                       |                                       
    2023-03-12T19:43:17.630221Z   node2 joined                               |                                       |                                       
    2023-03-12T19:43:17.630243Z   |                                          node1 joined                            |                                       
    2023-03-12T19:43:17.634138Z   |                                          |                                       node2 joined                            
    2023-03-12T19:43:17.634229Z   |                                          |                                       node1 joined                            
    2023-03-12T19:43:17.643210Z   |                                          PRIMARY(n=3)                            |                                       
    2023-03-12T19:43:17.648163Z   |                                          |                                       PRIMARY(n=3)                            
    2023-03-12T19:43:18.130088Z   CLOSED -> OPEN                             |                                       |                                       
    2023-03-12T19:43:18.130230Z   PRIMARY(n=3)                               |                                       |                                       
    2023-03-12T19:43:18.130916Z   OPEN -> PRIMARY                            |                                       |                                       
    2023-03-12T19:43:18.904410Z   will receive IST(seqno:178226792)          |                                       |                                       
    2023-03-12T19:43:18.913328Z   |                                          |                                       node1 cannot find donor                 
    2023-03-12T19:43:18.913429Z   node1 cannot find donor                    |                                       |                                       
    2023-03-12T19:43:18.913565Z   |                                          node1 cannot find donor                 |                                       
    2023-03-12T19:43:19.914122Z   |                                          |                                       node1 cannot find donor                 
    2023-03-12T19:43:19.914259Z   node1 cannot find donor                    |                                       |                                       
    2023-03-12T19:43:19.914362Z   |                                          node1 cannot find donor                 |                                       
    2023-03-12T19:43:20.914957Z   |                                          |                                       (repeated x97)node1 cannot find donor   
    2023-03-12T19:43:20.915143Z   (repeated x97)node1 cannot find donor      |                                       |                                       
    2023-03-12T19:43:20.915262Z   |                                          (repeated x97)node1 cannot find donor   |                                       
    2023-03-12T19:44:58.999603Z   |                                          |                                       node1 cannot find donor                 
    2023-03-12T19:44:58.999791Z   node1 cannot find donor                    |                                       |                                       
    2023-03-12T19:44:58.999891Z   |                                          node1 cannot find donor                 |                                       
    2023-03-12T19:44:59.817822Z   timeout from donor in gtid/keyring stage   |                                       |                                       
    2023-03-12T19:44:59.839692Z   SST error                                  |                                       |                                       
    2023-03-12T19:44:59.840669Z   |                                          |                                       node2 joined                            
    2023-03-12T19:44:59.840745Z   |                                          |                                       node1 left                              
    2023-03-12T19:44:59.840933Z   |                                          node3 joined                            |                                       
    2023-03-12T19:44:59.841034Z   |                                          node1 left                              |                                       
    2023-03-12T19:44:59.841189Z   NON-PRIMARY(n=1)                           |                                       |                                       
    2023-03-12T19:44:59.841292Z   PRIMARY -> OPEN                            |                                       |                                       
    2023-03-12T19:44:59.841352Z   OPEN -> CLOSED                             |                                       |                                       
    2023-03-12T19:44:59.841515Z   terminated                                 |                                       |                                       
    2023-03-12T19:44:59.841529Z   former SST cancelled                       |                                       |                                       
    2023-03-12T19:44:59.848349Z   |                                          |                                       node1 left                              
    2023-03-12T19:44:59.848409Z   |                                          |                                       PRIMARY(n=2)                            
    2023-03-12T19:44:59.855443Z   |                                          node1 left                              |                                       
    2023-03-12T19:44:59.855491Z   |                                          PRIMARY(n=2)                            |                        

    $ pt-galera-log-explainer whois 172.17.0.2 --no-color  tests/logs/upgrade/*
    ip:
    └── 172.17.0.2
        ├── nodename:
        │   └── node1 (2023-03-12 19:35:07.644683 +0000 UTC)
        │
        └── uuid:
            ├── 1d3ea8f5 (2023-03-12 07:24:13.789261 +0000 UTC)
            ├── 54ab931e (2023-03-12 07:43:08.563339 +0000 UTC)
            ├── fecde235 (2023-03-12 08:46:48.963504 +0000 UTC)
            ├── a07872e1 (2023-03-12 08:49:41.206124 +0000 UTC)
            ├── 60da0bf9-aa9c (2023-03-12 12:29:48.873397 +0000 UTC)
            ├── 35b62086-902c (2023-03-12 13:04:23.979636 +0000 UTC)
            ├── ca2c2a5f-a82a (2023-03-12 19:35:05.878879 +0000 UTC)
            └── eefb9c8a-b69a (2023-03-12 19:43:17.133756 +0000 UTC)



Requirements
============

grep, version 3.

On Darwin-based operating systems, the bundled grep is only version 2 due to license limitations. Use ``--grep-cmd`` to point to the correct grep binary, usually ``ggrep``.
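
A wrapper script that runs on both Linux and Darwin could choose the grep binary from the kernel name. This is a minimal sketch, assuming GNU grep is installed as ``ggrep`` on Darwin; the helper name ``pick_grep_cmd`` is hypothetical.

.. code-block:: bash

    #!/usr/bin/env bash
    # Sketch only: pick_grep_cmd is a hypothetical helper, not part of the tool.
    # It prints the grep binary name to pass to --grep-cmd, given a kernel
    # name as reported by `uname -s`.
    pick_grep_cmd() {
        case "$1" in
            Darwin) echo "ggrep" ;;   # assumption: GNU grep installed under this name
            *)      echo "grep" ;;
        esac
    }

    grep_cmd="$(pick_grep_cmd "$(uname -s)")"

    # Warn early if the chosen binary is missing from PATH.
    if ! command -v "$grep_cmd" >/dev/null 2>&1; then
        echo "warning: ${grep_cmd} not found in PATH" >&2
    fi

    # Print the command instead of running it.
    echo "pt-galera-log-explainer --grep-cmd=${grep_cmd} list --all *.log"

Taking the kernel name as an argument, rather than calling ``uname`` inside the helper, keeps the selection logic easy to check on any machine.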


Compatibility
=============

* Percona XtraDB Cluster: 5.5 to 8.0
* MariaDB Galera Cluster: 10.0 to 10.6
* Logs from PXC operator pods (error.log, recovery.log, post.processing.log)

Known issues
============

* Nodes sharing the same IP, or nodes with identical names, are not supported.
* Sparse files may not be identified, resulting in many displayed columns. This mainly happens when a log file does not contain enough information.
  ``--merge-by-directory`` can be used, but the files must already be organized in separate directories.
* Some information may seem to be missing. Depending on the case, it may simply not be implemented yet, or it was disabled because it was found to be unreliable (node index numbers, for example).
* Column widths are sometimes too large to be easily readable. This usually happens when printing SST events with long node names.
* When display corner cases seem broken (events not deduplicated, for example), it is because of extra hidden internal events.
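
For the sparse files case, ``--merge-by-directory`` needs one directory per node. The following is a minimal sketch of how a flat directory of logs could be reorganized first. It assumes, purely for illustration, that files are named ``<node>_<anything>.log``; the helper name ``organize_by_node`` and the directory names are hypothetical.

.. code-block:: bash

    #!/usr/bin/env bash
    # Sketch only: organize_by_node is a hypothetical helper, not part of the tool.
    # Assumption: log files are named '<node>_<anything>.log'.
    # Each file is copied into '<dest>/<node>/'.
    organize_by_node() {
        local src="$1" dest="$2" f base node
        for f in "$src"/*.log; do
            [ -e "$f" ] || continue     # no match: the glob stays literal, skip it
            base="$(basename "$f")"
            node="${base%%_*}"          # text before the first underscore
            mkdir -p "$dest/$node"
            cp "$f" "$dest/$node/"
        done
    }

    # Example usage (paths are placeholders):
    #   organize_by_node ./collected ./by-node
    #   pt-galera-log-explainer list --all --merge-by-directory ./by-node/*/*.log

The files are copied rather than moved so that the original collection stays untouched if the naming assumption turns out to be wrong.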

Authors
=======

Yoann La Cancellera