Cloud control simple high availability in Azure

Over the past few weeks I've been doing various installs and configurations in Azure, and for the past 2 weeks I've been building up a Cloud Control instance to manage the Oracle components we have there.

In our current on premise install the setup is very simple: we have a single server running the OMS and database and everything works reasonably well. We don't however have any real HA/DR capability - in the on premise case the live systems are maintained by a third party who use a different tool for monitoring, so HA was not such a critical requirement.

In Azure though the support model will be different and our monitoring/management tool needs to be highly available.

So I started doing some research and actually found it quite hard to find a simple HA example - what I basically wanted was 2 servers, both capable of running the OMS and both capable of running the repository database.

Everything I found seemed to be pointing to RAC/cluster solutions to make this work and I really didn't want to go down that route. I got burned many years ago with OPS and again with 10g RAC and I never really liked the complexity that RAC brought - I much prefer simple solutions - complexity is the enemy of reliability and all that (or whatever that quote is by Geer).

So what to do - the four main issues to solve were

1) I needed a single address that all agents could upload to
2) I needed a single address that the EM console could be reached at
3) the repository connection needed to be able to handle Data Guard
4) I needed some shared storage medium for the OMS to allow some functionality to work

So what did I do......

Well the picture of what I ended up with is here:

I won't go through a complete step by step process of exactly what I did as
a) that would take ages to write up
b) I didn't capture any screenshots
c) see above

Instead I'll pick out the key architectural points that make this work

1) The load balancer - that's one of these from the Azure portal

This allows me to configure a pool of servers to which I want to send traffic - presenting a single address both for access to the main login page and for agent upload
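
The portal works fine for this, but the same probes and rules could also be scripted. Here is a minimal sketch, assuming the Azure CLI (az) is installed and that the balancer has a single frontend IP and a single backend pool so those can be left to default. The resource group and load balancer names are made up for illustration, and the function only prints the commands rather than running them, so you can read them over first.

```shell
#!/bin/sh
# Print (rather than run) the Azure CLI commands for one TCP health probe and
# one load balancing rule per EM port. All names passed in are hypothetical.
# Usage: print_lb_commands RESOURCE_GROUP LB_NAME PORT [PORT...]
print_lb_commands() (
  rg=$1
  lb=$2
  shift 2
  for port in "$@"; do
    # Probe: the balancer only sends traffic to nodes answering on this port.
    printf 'az network lb probe create --resource-group %s --lb-name %s --name em-probe-%s --protocol Tcp --port %s\n' \
      "$rg" "$lb" "$port" "$port"
    # Rule: pass the port straight through to the backend pool.
    printf 'az network lb rule create --resource-group %s --lb-name %s --name em-rule-%s --protocol Tcp --frontend-port %s --backend-port %s --probe-name em-probe-%s\n' \
      "$rg" "$lb" "$port" "$port" "$port" "$port"
  done
)
```

Calling print_lb_commands em-rg em-lb 4903 7802 7301 prints six commands, one probe and one rule for each of the upload, console and JVMD ports used in the emctl secure oms command later in this post. Check the ports against your own install before piping the output to sh.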

2) Azure file storage - that's this bit from within storage accounts

This allows a shared volume to be presented to multiple servers using CIFS and provides the shared volume that is needed for more than one OMS.
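
To give a feel for what "presented using CIFS" looks like on the Linux side, here is a rough sketch that builds an /etc/fstab entry for the share. The storage account, share name, mount point and the oracle/oinstall owner are all assumptions for illustration, as is the credentials file path. The mount options are ordinary mount.cifs options, so adjust them to whatever your distribution and security policy need.

```shell
#!/bin/sh
# Print an /etc/fstab entry that mounts an Azure file share over CIFS.
# Usage: azure_files_fstab_line ACCOUNT SHARE MOUNT_POINT OWNER GROUP
# Assumes the storage account name and key live in a root-only credentials
# file at /etc/smbcredentials/ACCOUNT.cred (username=... and password=... lines).
azure_files_fstab_line() (
  account=$1
  share=$2
  mount_point=$3
  owner=$4
  group=$5
  options="vers=3.0,credentials=/etc/smbcredentials/${account}.cred"
  # Make the files appear owned by the OS user that runs the OMS.
  options="${options},uid=${owner},gid=${group},dir_mode=0770,file_mode=0660"
  options="${options},serverino,nofail"
  printf '//%s.file.core.windows.net/%s %s cifs %s 0 0\n' \
    "$account" "$share" "$mount_point" "$options"
)
```

For example, azure_files_fstab_line emstore emshare /u01/em_shared oracle oinstall prints a line you could append to /etc/fstab on both nodes, so that each OMS sees the same directory at the same path.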

3) The final piece of the puzzle was how to make the OMS setup handle Data Guard - this didn't seem to be that well advertised, and going only from what the GUI install presents you with you might think it isn't possible - but it is - you just have to figure out the right syntax. For my setup the command was like this

 emctl config oms -store_repos_details -repos_user sysman -repos_conndesc '(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=hosta)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=hostb)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=dataguard_service_name_you_define)(SERVER=DEDICATED)))'

This syntax actually took me ages to figure out and it seems to have subtle differences between versions and between platforms - once you get it right you are prompted for the sysman password and away you go (it has to be run on all nodes by the way, and the OMS has to be in a certain state when you do this)
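
Because the brackets are so easy to get wrong by hand, one option is to generate the descriptor instead of typing it. The sketch below is just that, a sketch: build_conndesc is a made-up helper name, and it assumes every database host listens on the same port and that the service is only running on whichever database is currently the primary.

```shell
#!/bin/sh
# Build a connect descriptor that lists every repository database host, in
# the same shape as the one passed to -repos_conndesc above.
# Usage: build_conndesc SERVICE_NAME PORT HOST [HOST...]
build_conndesc() (
  service=$1
  port=$2
  shift 2
  addresses=""
  for host in "$@"; do
    # One ADDRESS entry per host; the client tries them in order.
    addresses="${addresses}(ADDRESS=(PROTOCOL=TCP)(HOST=${host})(PORT=${port}))"
  done
  printf '(DESCRIPTION=(ADDRESS_LIST=%s)(CONNECT_DATA=(SERVICE_NAME=%s)(SERVER=DEDICATED)))\n' \
    "$addresses" "$service"
)
```

Running build_conndesc dataguard_service_name_you_define 1521 hosta hostb produces exactly the descriptor shown above, so the output can be dropped into the emctl command with "$(build_conndesc ...)" in place of the quoted string.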

So with all these building blocks in place I went about building the setup shown in the picture above. I initially built it on 13CR1 but literally 10 minutes after I finished 13CR2 came out so I rebuilt it with that version - both seemed to work fine I can report.

The high level steps are as follows:

1) provision 2 Linux boxes from Azure in an availability set (other clouds are available.....)
2) provision file storage and assign to both nodes
3) provision load balancer
4) Install database software on both nodes
5) Create repository database using database template from OTN as normal single instance
6) Install single OMS (as normal single host OMS)
7) Install management agent on second node
8) clone the initial OMS into a new ohs on the second host using the procedure from the procedure library within Cloud Control - see picture of that here - that gives us a live/live OMS setup

9) build load balancer rules/probes for the EM standard ports (these can be found in the EM install docs) - only having the original server in the pool at this point
10) use emctl to repoint the OMS to the load balanced address for both console and upload
emctl secure oms -sysman_pwd sysmanpasswordhere -reg_pwd agentregpasswordhere -host loadbalancedaddresshere -secure_port 4903 -slb_port 4903 -slb_console_port 7802 -slb_jvmd_https_port 7301
11) allow unsecured access for the local agents to upload to the non load balanced port (this seems to be a quirk of Azure that the servers being load balanced cannot actually talk to the load balanced address - maybe this creates a feedback loop or something). If you are using other load balancer types this may not be necessary
emctl secure unlock -console -upload
12) At this point you have a live/live OMS setup for console access. Any additional agents you add should be configured to upload to the load balanced address, which will also be highly available.
13) Now we just add dataguard
 (annoyingly the wizard in 13CR2 seemed to be broken and I had to do it manually)
14) Now we tell both OMS's the new Data Guard style connect string they should use for the repo - repeat from above
 emctl config oms -store_repos_details -repos_user sysman -repos_conndesc '(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=hosta)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=hostb)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=dataguard_service_name_you_define)(SERVER=DEDICATED)))'
15) And that's pretty much it.....

I did a whole series of tests with database switchovers and stopping/starting OMS's and having one up and one down etc and I'm happy to report that the whole thing works pretty seamlessly - in fact very impressively so: after a complete switchover of the database and stopping one of the OMS's the system just carried on working - from an end user point of view you wouldn't have even known!

Quite impressive I thought - this is a relatively simple architecture that makes the system very reliable - no RAC, no clusters but we do have a cloud....... :-)

I would imagine this same architectural possibility holds true for Oracle cloud, Amazon AWS etc and even of course good old 'On Premise'..........

Quick caveat on all of this (thanks Cameron and others for reminding me not everyone has a ULA...) - remember to consider what licence impact this can have from an Oracle point of view - take a look at the special licence case for Cloud Control and what is and isn't free - and if you can turn it into plain English (specifically the WebLogic part) that would be great....


  1. This is really good and a powerful one which shows that EM13cR2 can be installed on Azure platform with HA setup.

    This really shows that platform is not really a constraint for EM and EM can be deployed on Cloud & On-premise.