This past week I've been looking in to creating some reports out of our zabbix database that we host in an Azure MySQL PaaS service (i.e. a proper PaaS where everything is managed via a web gui/api calls - there is no 'server' that can be logged in to - at least not by end PaaS consumers).
Now my MySQL experience is pretty limited - I've installed it in the past and run some simple SQL queries - but little more than that - now however i had to write a reasonably complex report so i had to do some syntax digging to allow me to do that.
The first thing i had to do was identify a good tool to use for this - after a bit of digging I realized I could use the MySQL Workbench product - this has nice highlighting/formatting features as well as schema browsing and intellisense. It also has a nice diagramming tool - in fact for the zabbix schema I was interested in there is a file published in this format for this tool which was very useful - this can be found here: https://share.zabbix.com/databases/mysql/zabbix-database-model
Here is a quick screenshot of what that looks like - so similar to sqldeveloper data modeller from what I remember
The report I needed to write was to identify how much outage there had been on the IaaS servers we are monitoring in the past month - there is some functionality on the app itself for this using services but we wanted to make the data available via other tooling. Grafana had a feature that could fetch this data via the rest api and looked like it could have worked but it couldn't cope with a parent/child hierarchy of display (i.e. group up the prod servers and show an overall sla figure).
So i was back to writing the SQL directly against the database.
The data model confused me slightly to start with but once i'd got my head round it I could see what data I needed to work with - then I just had to 'massage' it into the format I wanted using mysql syntax.
Let me pick out the syntax elements i made use of before pasting the full SQL
1) REPLACE SQL function - this is exactly the same as oracle so no comment to really make there
2) FROM_UNIXTIME function - this converts the zabbix data held in 'unix time' format into a more normal date format - i.e. human understandable :-)
3) TIMESTAMPDIFF - used for working out differences between two dates - slightly different approach to oracle where dates are able to be directly subtracted from each other for example
4) joins - as you would expect ANSI style is what has to be used - none of the oracle style syntax for this is allowed - personally i still prefer the oracle way but i think that's just because i used it for so long
5) DATE_SUB - used for subtracting a time period from a date
6) Lag function - now in the PaaS version of MySQL this doesn't exist (though i understand it's in the latest version of normal MySQL) - there is a workaround using the @ variable though - which is somehow giving the same kind of functionality - though to be honest i still didn't 100% get my head round exactly what it is doing - see some example of it here https://explainextended.com/2009/03/10/analytic-functions-first_value-last_value-lead-lag/
7) set operators - i wanted to use the minus set operator - this does not exist so instead you have to do a left join and then just filter out using the null elements - there is a simple example of this here: http://www.mysqltutorial.org/mysql-minus/
So having said all that what does my end result query look like?
SELECT slas.host, slas.sla, s.goodsla, s2.name AS parent FROM (SELECT REPLACE(host, '.com', '') AS host, 100 - ((SUM(outfor) / MAX(totsecs)) * 100) AS sla FROM (SELECT rich.*, TIMESTAMPDIFF(SECOND, wentdown, cameup) outfor, TIMESTAMPDIFF(SECOND, DATE_SUB(CURDATE(), INTERVAL '1' MONTH), CURDATE()) AS totsecs FROM (SELECT h.host, FROM_UNIXTIME(e.clock) AS dt, CASE WHEN e.value = 1 THEN 'DOWN' WHEN e.value = 0 THEN 'UP' ELSE e.value END AS flag, @r AS wentdown, (@r:=FROM_UNIXTIME(e.clock)) AS cameup FROM events e INNER JOIN triggers t ON t.triggerid = e.objectid INNER JOIN functions f ON f.triggerid = t.triggerid INNER JOIN items i ON i.itemid = f.itemid INNER JOIN hosts h ON h.hostid = i.hostid WHERE e.source = 0 AND t.description = 'Zabbix agent on {HOST.NAME} is unreachable for 10 minutes' AND h.host NOT LIKE 'abc%' ORDER BY 1 , 2 DESC) rich WHERE flag = 'UP' AND dt > DATE_SUB(CURDATE(), INTERVAL '1' MONTH)) source GROUP BY host UNION SELECT hst.host, 100 AS sla FROM hosts hst LEFT JOIN (SELECT host FROM (SELECT rich.*, TIMESTAMPDIFF(SECOND, wentdown, cameup) outfor, TIMESTAMPDIFF(SECOND, DATE_SUB(CURDATE(), INTERVAL '1' MONTH), CURDATE()) AS totsecs FROM (SELECT h.host, FROM_UNIXTIME(e.clock) AS dt, CASE WHEN e.value = 1 THEN 'DOWN' WHEN e.value = 0 THEN 'UP' ELSE e.value END AS flag, @r AS wentdown, (@r:=FROM_UNIXTIME(e.clock)) AS cameup FROM events e INNER JOIN triggers t ON t.triggerid = e.objectid INNER JOIN functions f ON f.triggerid = t.triggerid INNER JOIN items i ON i.itemid = f.itemid INNER JOIN hosts h ON h.hostid = i.hostid WHERE e.source = 0 AND t.description = 'Zabbix agent on {HOST.NAME} is unreachable for 10 minutes' AND h.host NOT LIKE 'abc%' ORDER BY 1 , 2 DESC) rich WHERE flag = 'UP' AND dt > DATE_SUB(CURDATE(), INTERVAL '1' MONTH)) source GROUP BY host) x ON x.host = hst.host WHERE x.host IS NULL AND available = 1 AND status = 0 AND hst.host NOT LIKE 'abc%') slas INNER JOIN services s ON s.name = slas.host INNER JOIN services_links sl ON sl.servicedownid = s.serviceid INNER JOIN services s2 ON s2.serviceid = sl.serviceupid
This gives a resultset something like this:
I can then turn this into a nice Power BI report with drill down from the parent group (and an average figure displayed at the parent level) - i still have to figure out how to do this.....
This parent groups and the target SLA level all come from the services definition inside zabbix - though it would be fairly simple to not have this and make use of host groups with a generic SLA set at that level rather than having this additional sla/service object defined.
This runs in about 0.5 secs with well over 100 zabbix agents installed and uploading data so performance seems OK
The query is maybe longer than it needs to be - it's maybe my luck of MySQL understanding that has made it more complicated - it gives the result i want though.
Maybe the zabbix query helps others too as there didn't seem to be many example dealing with this kind of thing.
Comments
Post a Comment