Parse rows from PROCESSLIST to make events when queries finish.
Parameters
Required Arguments
code | Callback that returns an arrayref of rows from SHOW PROCESSLIST. Replication threads and $dbh->{mysql_thread_id} should be removed from the return value. |
Returns
Hashref of a completed event.
Technical Details
Connections (cxn) are tracked in a hashref ($self->{active_cxn}) by their Id from the proclist. Each poll of the proclist (i.e. each call to the code callback) causes the current cxn/queries to be compared to the saved (active) cxn. One of three things can happen: a new cxn appears, a query ends/changes/restarts, or a cxn ends (and thus ends its query).
When a new connect appears, we only begin tracking it when the Info column from the proclist is not null, indicating that the cxn is executing a query. The full proclist for this cxn is saved for comparison with later polls. This is $prev in the code which really references $self->{active_cxn}.
For existing cxn, if the Info is the same (i.e. same query), and the Time hasn’t decreased, and the query hasn’t restarted (look below in the code for how we detect this), then the cxn is still executing the same query. So we do nothing. But if any one of those 3 conditions is false, that signals a new query. So we make an event based on saved info from the last poll, then updated the cxn for the new query already in progress.
When a previously active cxn no longer appears in a poll, then that cxn has ended and so did it’s query, so we make an event for the query and then delete the cxn from $self->{active_cxn}. This is checked in the PREVIOUSLY_ACTIVE_CXN loop.
The default MySQL server has one-second granularity in the Time column. This means that a statement that starts at X.9 seconds shows 0 seconds for only 0.1 second. A statement that starts at X.0 seconds shows 0 secs for a second, and 1 second up until it has actually been running 2 seconds. This makes it tricky to determine when a statement has been restarted. Further, this program and MySQL may have some clock skew. Even if they are running on the same machine, it’s possible that at X.999999 seconds we get the time, and at X+1.000001 seconds we get the snapshot from MySQL. (Fortunately MySQL doesn’t re-evaluate now() for every process, or that would cause even more problems.) And a query that’s issued to MySQL may stall for any amount of time before it’s executed, making even more skew between the times.
One worst case is,
- The processlist measures time at 100.01 and it’s 100.
- We measure the time. It says 100.02.
- A query was started at 90. Processlist says Time=10.
- We calculate that the query was started at 90.02.
- Processlist measures it at 100.998 and it’s 100.
- We measure time again, it says 100.999.
- Time has passed, but the Time column still says 10.
Another is,
- We get the processlist, then the time.
- A second later we get the processlist, but it takes 2 sec to fetch.
- We measure the time and it looks like 3 sec have passed, but proclist says only one has passed. This is why etime is necessary. What should we do? Well, the key thing to notice here is that a new statement has started if a) the Time column actually decreases since we last saw the process, or b) the Time column does not increase for 2 seconds, plus the etime of the first and second measurements combined!