We are generally pretty concerned with HA in our environment and although we don't have it all sorted out for Nimsoft we have a few things in place.
First, we tend to deploy a pair of hubs in each of our availability zones. Our hubs in these zones run nas and handle alert escalation for that availability zone. This gives us n+1 for nas in each availability zone, and it also provides fault isolation between availability zones since each has its own dedicated hubs with nas.
Now we have to deal with two bits of data: QoS and alarms.
Each of these distributed hubs puts all of its QoS data into an attach queue. We then have two geographically separate systems that are capable of performing gets from these queues. Only one is active at a time, since an attach queue can only have one client. These systems act as QoS aggregators and tend to be fairly busy since we are collecting data from 180+ hubs at the moment. In the event of a failure on one of these aggregation hubs, we would activate the get queues on the other (standby) aggregation hub. The part we have not completed: we intend to have a replicated MSSQL DB to go with these aggregation hubs, run in a master/slave config, and cut it over at the same time as we enable the get queues on the standby hub. We would also of course have a matching SDP to go with this DB, and everything should just *work* since the DB holds all of the magic for making SDP work.
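To make the failover idea a bit more concrete, here is a rough Python model of it. This is only a toy simulation, not Nimsoft code: the class names (AttachQueue, Aggregator), the hub names, and the fail_over helper are all made up for illustration, and the real switch is done by enabling the get queues in the hub configuration on the standby. It just shows the single-client rule and why data queued during an outage is still there for the standby to pick up.

```python
class AttachQueue:
    """Toy model of a hub-side attach queue: buffers messages, one client only."""

    def __init__(self, hub):
        self.hub = hub
        self.messages = []
        self.client = None  # name of the aggregator currently attached, if any

    def put(self, msg):
        # The hub keeps queueing QoS data whether or not a client is attached.
        self.messages.append(msg)

    def attach(self, client):
        if self.client is not None and self.client != client:
            raise RuntimeError(f"{self.hub}: already attached to {self.client}")
        self.client = client

    def detach(self, client):
        if self.client == client:
            self.client = None

    def drain(self, client):
        if self.client != client:
            raise RuntimeError(f"{self.hub}: {client} is not the attached client")
        out, self.messages = self.messages, []
        return out


class Aggregator:
    """Toy model of an aggregation hub that performs gets from many queues."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def activate(self, queues):
        for q in queues:
            q.attach(self.name)

    def deactivate(self, queues):
        for q in queues:
            q.detach(self.name)

    def collect(self, queues):
        for q in queues:
            self.received.extend(q.drain(self.name))


def fail_over(failed, standby, queues):
    # Release the queues held by the failed aggregator, then let the standby attach.
    failed.deactivate(queues)
    standby.activate(queues)


def demo_failover():
    queues = [AttachQueue("hub-a"), AttachQueue("hub-b")]
    primary, standby = Aggregator("agg-1"), Aggregator("agg-2")
    primary.activate(queues)
    queues[0].put("qos-1")
    primary.collect(queues)
    queues[1].put("qos-2")  # arrives while the primary is down
    fail_over(primary, standby, queues)
    standby.collect(queues)
    return primary, standby, queues
```

In this model the standby ends up with the sample that was queued while the primary was down, which is the behaviour we rely on. In a real cutover the DB switch would have to happen at the same moment, which the sketch does not cover.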
The alarm data is a bit easier to deal with since the nas keeps its own DB. We simply have each of our distributed hubs perform nas replication to the nas on each of our aggregation hubs. Since our alarm clears all happen from the bottom up, this works for us and replicates alarms and clears up to the two aggregation servers simultaneously.
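The replication flow can be sketched the same way. Again, this is a hypothetical Python model and not how nas is actually implemented or configured: the Nas class and its method names are invented here, and replication is assumed to be strictly one-way (zone hub up to aggregation hubs). It shows why clearing from the bottom up keeps both aggregation servers in sync.

```python
class Nas:
    """Toy model of a nas instance with its own alarm store and one-way replication."""

    def __init__(self, name, replicate_to=()):
        self.name = name
        self.open_alarms = {}  # alarm id -> message
        self.replicate_to = list(replicate_to)  # upstream nas instances (assumed one-way)

    def raise_alarm(self, alarm_id, message):
        self.open_alarms[alarm_id] = message
        for upstream in self.replicate_to:
            upstream.raise_alarm(alarm_id, message)

    def clear_alarm(self, alarm_id):
        self.open_alarms.pop(alarm_id, None)
        for upstream in self.replicate_to:
            upstream.clear_alarm(alarm_id)


# Two aggregation hubs, one zone hub replicating to both of them.
agg1, agg2 = Nas("agg-1"), Nas("agg-2")
zone = Nas("zone-hub", replicate_to=[agg1, agg2])

zone.raise_alarm("a1", "disk full")  # shows up on both aggregation hubs
zone.clear_alarm("a1")               # clear from the bottom: both are cleared too
```

If a clear were issued on agg1 instead of on the zone hub, this model would leave the alarm open on the zone hub and on agg2, which is why the bottom-up clearing matters for us.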
I think some other customers have been toying with similar setups, but I'm not sure if anyone has an all-inclusive HA/DR strategy yet. We just focused on making sure our alarm escalations had this capability as a first pass.
Thx, I will investigate your recommendations a little further, as well as any other solutions that may be presented.
We are implementing just the same with our new installation, and are just in the final tests for our MSSQL-Replication.
The plan is that our NMS server has only a few policies out (VMware, NAS, MSSQL DB) and our "workhorses" are different VMs which collect all other data.
The NMS server runs as master/slave, each with its own DB machine. For better performance we simply use a crossover cable between NMS and DB, and we have committed ourselves that if an outage of the master NMS occurs we also switch the DB, so switching always happens as a whole package.
The SDP runs on at least 2 machines as version 2.6 behind two load balancers.
Is an implementation like yours documented anywhere? Thx