Azure carbon footprint calculator


 


Sustainability is rightly at the top of many companies agendas right now, for us as an energy producer we are acutely aware of our responsibilities here and we have made a number of key statements and directives of how our generation assets and the wider businesses will make improvements here.

I'm in IT though and we can sometimes feel removed from what impacts we can directly have in this area. We've been looking in to how we can contribute and there are a number of initiatives and improvements planned. What I wanted to talk about here though is Azure and what we are doing here.

So where to start? As the eminent Victorian scientist Lord Kelvin put it:

"If you cannot measure it, you cannot improve it"

So how do we measure it? - we have loads of information on what we consume in Azure from all our billing reports - but how does that equate to actual carbon and other greenhouse gas emissions? Until last year we could only guess as we had no insight in to what our usage was producing. That all changed though when Microsoft released their sustainability calculator (currently still in public preview) :

https://aka.ms/SustainabilityCalculator

This free tool (though you will need power bi pro to be able to use it) shows the carbon emissions from our company specific usage of Azure (you can see our real screens below - with figures blanked out..)



It's not just a high level set of figures though we can drill down by a number of dimensions - namely subscription, azure service, region and then year/month to allow us to then analyse where our generation is coming from.

This is a huge step forward in our ability to see the impact of our usage - however it didn't go as far as we wanted. We had a vision to take it down to the application level - so we can understand the real world impact of individual applications.

In our deployment model the resource group is the logical unit that encompasses the application so we needed to get the figures at that level - but how is that possible we are not given the information at that level? ( I have given this feedback to the MS product team so it may make it into the dataset at some future date hopefully). Well the answer is to combine it with information we do have about our usage - the billing data.

For example we know we are spending 'x' on virtual machines for resource group 'y'. Resource group 'y' makes up 50% of the total virtual machine costs for subscription 'z' so we can make the rough assumption that therefore 50% of the emissions for subscription 'z' belong to resource group 'y'. In our model the resource group directly correlates to application 'a' so we know how much carbon application 'a' is generating by then linking back to the actual figures from the above reports (this won't be 100% accurate as some machine types will be more efficient than others, reservations may be involved etc - but it's good enough for what we want to achieve here).

So to produce a report per application we just need to take:

1. An extract from the report above (downloadable as excel from the report above)
2. The billing/usage data for our company from the cost management services (again downloadable/extractable in many formats)
3. Application related information - from our CMDB

We have all of this data it just needs to be combined and then a nice front end stuck on it.

So that's what we did - we took the 3 datasets above and loaded them into a relational database - in our case we used snowflake as it was available and already contained some of the information we needed (any RDBMS will do here though - it just needs to support some minor analytic functions to do some of the calculations ).

I won't cover the whole process of how that was done as it would take too long to write up all the steps - essentially though it's just taking the data extracts and loading them into some simple relational tables - it's not a hugely complicated process.

Once that data is loaded I then needed to create the SQL to combine the data into an easily consumable form for the front end. The simple way to do this is just to create a view that does all the heavy lifting . The view that I then created looks like this:




The underlying SQL for that looking like this:

CREATE VIEW VW_AZURE_USAGE AS with usage as (
    select
        subscriptionname,
        subscriptionid,
        resourcelocation,
        resourcegroup,
        EAM_ID,
        applicationname,
        tot,
        ratio_to_report(tot) over (partition by subscriptionname) as perc
    from
        (
            Select
                subscriptionname,
                subscriptionid,
                resourcelocation,
                resourcegroup,
                EAM_ID,
                applicationname,
                round(sum(costinbillingcurrency)) as tot
            from
                (
                    SELECT
                        AZ.SUBSCRIPTIONNAME,
                        AZ.RESOURCEGROUP,
                        AZ.subscriptionid,
                        RM.EAM_ID,
                        PE."NAME" AS ApplicationName,
                        PE.LOB_PARENT AS LineOfBusiness,
                        PE.OWNERSHIP_BU,
                        PE.BUID,
                        AZ.resourcelocation,
                        AZ.COSTINBILLINGCURRENCY
                    FROM
                        T_RAW_AZURE_COST AZ
                        LEFT JOIN (
                            SELECT
                                owner,
                                owner_email,
                                eam_id,
                                RESOURCEGROUP,
                                subname
                            FROM
                                T_AZURE_RES_MAPPING
                        ) RM ON upper(RM.RESOURCEGROUP) = upper(AZ.RESOURCEGROUP)
                        AND UPPER(RM.subname) = UPPER(AZ.SUBSCRIPTIONNAME)
                        LEFT JOIN (
                            SELECT
                                NAME,
                                EAM_ID,
                                LOB_PARENT,
                                OWNERSHIP_BU,
                                BUID
                            FROM
                                T_UPMX_PORTFOLIO_ELEMENTS
                        ) PE ON to_char(RM.EAM_ID) = to_char(PE.EAM_ID)
                    where
                        AZ.metercategory = 'Virtual Machines'
                        and AZ.resourcelocation = 'westeurope'
                )
            group by
                subscriptionname,
                subscriptionid,
                resourcelocation,
                resourcegroup,
                EAM_ID,
                applicationname
        )
)
select
    subscriptionname,
    resourcegroup,
    scope,
    emission_mtco2e,
    perc,
    eam_id,
    applicationname,
    tot,(perc * emission_mtco2e) as co2output
from
    usage
    left join T_AZURE_EMISSION EM on EM.SUBSCRIPTION_ID = SUBSCRIPTIONID
where
    AZURE_REGION = 'West Europe'
    and AZURE_SERVICE = 'Virtual Machines'
    and reportdate = '2021-08-01'
    and applicationname is not null;


The view is maybe simpler than it looks at first glance - it's only really made complex by the use of the ratio_to_report analytic function that allows me to easily work out what ratio of the total emissions belongs to each resource group/application.

Once I have that view all I then need to do is make that look good in a front end tool - this is not my forte but Powerbi allows you to fairly easily create a half decent visual even if you don't really know what you are doing....

The screen is then just essentially displaying "Select * from view" - the only extra bit I do is convert the co2 value in to something more 'relatable' - in this case tree's growing to absorb the carbon emitted

The end result is shown below (redacted to remove the app details - but you still get the idea)


We then have a view for each application of its environmental impact.

So what you might say?

Well referring back to the top of the post it all starts with measuring - we now have a baseline and we can see trends over time so we can track (hopefully) improvements.

How can I improve though is your likely next question - I don't control how MS runs the datacentre. That is of course true - but you can control how you use services within that - for example you could:

1) Right size
2) Shutdown when not in use
3) Modernize away from Virtual machines to more massively shared PaaS services where the relative emissions are less
4) Build non prod environments on demand

etc...

You get the idea - there are things you can control to influence this - the added bonus being that there is a direct correlation to cost - savings in carbon directly relate to actual monetary savings - this is a "no brainer" to be doing these things as both the company and the environment benefit.

I would encourage you all to at least get the MS tool installed and see the impact you are already having as a first step and start on your co2 measuring and reducing journey.



Comments