Some MySql/zabbix findings



This past week I've been looking in to creating some reports out of our zabbix database that we host in an Azure MySQL PaaS service (i.e. a proper PaaS where everything is managed via a web gui/api calls - there is no 'server' that can be logged in to - at least not by end PaaS consumers).

Now my MySQL experience is pretty limited - I've installed it in the past and run some simple SQL queries - but little more than that - now however i had to write a reasonably complex report so i had to do some syntax digging to allow me to do that.

The first thing i had to do was identify a good tool to use for this - after a bit of digging I realized I could use the MySQL Workbench product - this has nice highlighting/formatting features as well as schema browsing and intellisense. It also has a nice diagramming tool - in fact for the zabbix schema I was interested in there is a file published in this format for this tool which was very useful - this can be found here: https://share.zabbix.com/databases/mysql/zabbix-database-model

Here is a quick screenshot of what that looks like - so similar to sqldeveloper data modeller from what I remember



The report I needed to write was to identify how much outage there had been on the IaaS servers we are monitoring in the past month - there is some functionality on the app itself for this using services but we wanted to make the data available via other tooling. Grafana had a feature that could fetch this data via the rest api and looked like  it could have worked but it couldn't cope with a parent/child hierarchy of display (i.e. group up the prod servers and show an overall sla figure).

So i was back to writing the SQL directly against the database.

The data model confused me slightly to start with but once i'd got my head round it I could see what data I needed to work with - then I just had to 'massage' it into the format I wanted using mysql syntax.

Let me pick out the syntax elements i made use of before pasting the full SQL

1) REPLACE SQL function - this is exactly the same as oracle so no comment to really make there
2) FROM_UNIXTIME function - this converts the zabbix data held in 'unix time' format into a more normal date format - i.e. human understandable :-)
3) TIMESTAMPDIFF - used for working out differences between two dates - slightly different approach to oracle where dates are able to be directly subtracted from each other for example
4) joins - as you would expect ANSI style is what has to be used - none of the oracle style syntax for this is allowed - personally i still prefer the oracle way but i think that's just because i used it for so long
5) DATE_SUB - used for subtracting a time period from a date
6) Lag function - now in the PaaS version of MySQL this doesn't exist (though i understand it's in the latest version of normal MySQL) - there is a workaround using the @ variable though - which is somehow giving the same kind of functionality - though to be honest i still didn't 100% get my head round exactly what it is doing - see some example of it here https://explainextended.com/2009/03/10/analytic-functions-first_value-last_value-lead-lag/
7) set operators - i wanted to use the minus set operator - this does not exist so instead you have to do a left join and then just filter out using the null elements - there is a simple example of this here: http://www.mysqltutorial.org/mysql-minus/

So having said all that what does my end result query look like?



SELECT 
    slas.host, slas.sla, s.goodsla, s2.name AS parent
FROM
    (SELECT 
        REPLACE(host, '.com', '') AS host,
            100 - ((SUM(outfor) / MAX(totsecs)) * 100) AS sla
    FROM
        (SELECT 
        rich.*,
            TIMESTAMPDIFF(SECOND, wentdown, cameup) outfor,
            TIMESTAMPDIFF(SECOND, DATE_SUB(CURDATE(), INTERVAL '1' MONTH), CURDATE()) AS totsecs
    FROM
        (SELECT 
        h.host,
            FROM_UNIXTIME(e.clock) AS dt,
            CASE
                WHEN e.value = 1 THEN 'DOWN'
                WHEN e.value = 0 THEN 'UP'
                ELSE e.value
            END AS flag,
            @r AS wentdown,
            (@r:=FROM_UNIXTIME(e.clock)) AS cameup
    FROM
        events e
    INNER JOIN triggers t ON t.triggerid = e.objectid
    INNER JOIN functions f ON f.triggerid = t.triggerid
    INNER JOIN items i ON i.itemid = f.itemid
    INNER JOIN hosts h ON h.hostid = i.hostid
    WHERE
        e.source = 0
            AND t.description = 'Zabbix agent on {HOST.NAME} is unreachable for 10 minutes'
            AND h.host NOT LIKE 'abc%'
    ORDER BY 1 , 2 DESC) rich
    WHERE
        flag = 'UP'
            AND dt > DATE_SUB(CURDATE(), INTERVAL '1' MONTH)) source
    GROUP BY host UNION SELECT 
        hst.host, 100 AS sla
    FROM
        hosts hst
    LEFT JOIN (SELECT 
        host
    FROM
        (SELECT 
        rich.*,
            TIMESTAMPDIFF(SECOND, wentdown, cameup) outfor,
            TIMESTAMPDIFF(SECOND, DATE_SUB(CURDATE(), INTERVAL '1' MONTH), CURDATE()) AS totsecs
    FROM
        (SELECT 
        h.host,
            FROM_UNIXTIME(e.clock) AS dt,
            CASE
                WHEN e.value = 1 THEN 'DOWN'
                WHEN e.value = 0 THEN 'UP'
                ELSE e.value
            END AS flag,
            @r AS wentdown,
            (@r:=FROM_UNIXTIME(e.clock)) AS cameup
    FROM
        events e
    INNER JOIN triggers t ON t.triggerid = e.objectid
    INNER JOIN functions f ON f.triggerid = t.triggerid
    INNER JOIN items i ON i.itemid = f.itemid
    INNER JOIN hosts h ON h.hostid = i.hostid
    WHERE
        e.source = 0
            AND t.description = 'Zabbix agent on {HOST.NAME} is unreachable for 10 minutes'
            AND h.host NOT LIKE 'abc%'
    ORDER BY 1 , 2 DESC) rich
    WHERE
        flag = 'UP'
            AND dt > DATE_SUB(CURDATE(), INTERVAL '1' MONTH)) source
    GROUP BY host) x ON x.host = hst.host
    WHERE
        x.host IS NULL AND available = 1
            AND status = 0
            AND hst.host NOT LIKE 'abc%') slas
        INNER JOIN
    services s ON s.name = slas.host
        INNER JOIN
    services_links sl ON sl.servicedownid = s.serviceid
        INNER JOIN
    services s2 ON s2.serviceid = sl.serviceupid


This gives a resultset something like this:



I can then turn this into a nice Power BI report with drill down from the parent group (and an average figure displayed at the parent level) - i still have to figure out how to do this.....

This parent groups and the target SLA level all come from the services definition inside zabbix - though it would be fairly simple to not have this and make use of host groups with a generic SLA set at that level rather than having this additional sla/service object defined.

This runs in about 0.5 secs with well over 100 zabbix agents installed and uploading data so performance seems OK

The query is maybe longer than it needs to be - it's maybe my luck of MySQL understanding that has made it more complicated - it gives the result i want though.

Maybe the zabbix query helps others too as there didn't seem to be many example dealing with this kind of thing.


Comments