Hi All
Looking for other companies who have moved Workload Automation into Azure using SQL MI for the database. We recently moved 2 environments into Azure using SQL MI and have been having a few issues. We are thinking about a possible rebuild, so I wanted to see if anyone out there currently has a WLA environment running without issues in Azure.
We (as a team) do not have a ton of experience with Azure, so we are hoping this all makes sense. Here is what we currently have and what we may do to try to resolve our issues.
We are using SQL MI, and that seems to be one of our biggest pain points. We are currently on a general subscription in Microsoft Azure, which means:
During maintenance, databases remain available, but some updates might require a failover. The system default maintenance window (5pm to 8am) limits most activities to this time, but urgent updates might occur outside of it. To ensure all updates occur only during the maintenance window, select a non-default option.
You can adjust the window for maintenance updates to a time suitable to your Azure SQL resources by choosing from two non-default maintenance window slots:
- Weekday window: 10:00 PM to 6:00 AM local time, Monday - Thursday (WE CURRENTLY FALL HERE)
- Weekend window: 10:00 PM to 6:00 AM local time, Friday - Sunday
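To make the two slots concrete, here is a small sketch that tells which window a given local timestamp falls into. It assumes the listed days are the days on which each eight-hour window starts (so Monday's window runs 10:00 PM Monday to 6:00 AM Tuesday). The function name `maintenance_window` is ours and is not part of any Azure tooling.

```python
from datetime import datetime, timedelta
from typing import Optional

WINDOW_START_HOUR = 22  # 10:00 PM local time
WINDOW_END_HOUR = 6     # 6:00 AM local time, the following day


def maintenance_window(ts: datetime) -> Optional[str]:
    """Return 'weekday', 'weekend', or None for a local timestamp.

    Assumption: the day names in the window description refer to the
    day the window STARTS (Mon-Thu for weekday, Fri-Sun for weekend).
    """
    if ts.hour >= WINDOW_START_HOUR:
        # Evening part of the window: it started today.
        start_day = ts.weekday()
    elif ts.hour < WINDOW_END_HOUR:
        # Early-morning part of the window: it started yesterday.
        start_day = (ts - timedelta(days=1)).weekday()
    else:
        # 6:00 AM to 9:59 PM is outside both non-default windows.
        return None
    # datetime.weekday(): Monday is 0, Sunday is 6.
    return "weekday" if start_day <= 3 else "weekend"
```

For example, 3:00 AM on a Friday still belongs to the weekday slot under this assumption, because that window started Thursday night.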
We are now planning to move to a business-critical subscription. In summary, per Microsoft, upgrading to the business-critical tier for SQL MI can significantly reduce, but not entirely eliminate, transient issues.
Switching to a business-critical service tier for SQL Managed Instance can help reduce the frequency and impact of transient issues due to its higher availability and resilience features. However, it cannot guarantee 100% prevention of transient issues.
This piece is still concerning, since WLA is very sensitive to database outages (even millisecond ones). Is there a setting that we can configure on the WLA side to handle small outages?
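For context, the behavior we are hoping for is the usual retry-with-backoff pattern for transient database faults. The sketch below is only an illustration of that pattern, not a WLA setting or anything taken from the product. `TransientDatabaseError` and `call_with_retry` are hypothetical names, and the attempt count and delays are placeholder values.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientDatabaseError(Exception):
    """Hypothetical stand-in for a driver's transient connection error."""


def call_with_retry(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TransientDatabaseError,),
    max_attempts: int = 5,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation(), retrying on transient errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the last error.
                raise
            # Delay doubles each attempt (0.2s, 0.4s, 0.8s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add jitter so many clients do not reconnect at the same instant.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

With something like this in the connection layer, a failover lasting a few seconds would show up as a short delay instead of an application restart.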
On our application server side, we are thinking about doing the following:
Migration Plan:
1. Shut down the VM.
2. Take a backup of the original VM.
3. Restore the backup as a new VM in the new RG (where it will ask for vnet/subnet information). The private IP will be automatically fetched.
4. Change the IP on the newly created VM (we can set a static IP, but without performing a POC we are unable to confirm whether it gets changed in the DHCP leases). Note: the build team confirms that manually changing the IP will be reflected in DHCP as well, because the hostname is going to be different on the restored machine.
5. Power on the VM.
6. Perform a health check.
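For the health check in step 6, one basic test we could run from the restored app server is a TCP reachability check against the database endpoint and any other dependencies. This is a minimal sketch only: the helper names are ours, the host name in the usage comment is a placeholder, and a successful TCP connect does not prove that WLA itself is healthy.

```python
import socket
from typing import Dict, Tuple


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def run_health_check(targets: Dict[str, Tuple[str, int]]) -> Dict[str, bool]:
    """Check every named (host, port) target and report which are reachable."""
    return {name: tcp_port_open(host, port) for name, (host, port) in targets.items()}


# Example usage (placeholder host; 1433 is the default SQL Server port,
# adjust it to match the connection policy of your instance):
#   results = run_health_check({"sql-mi": ("your-sql-mi-host.example", 1433)})
#   print(results)
```

Running the same check before the shutdown in step 1 would give a baseline to compare against after the move.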
In short, we can move the existing app servers to the new subscription with minimal downtime and no need for a rebuild. Since we can manually configure the existing IP address after the move, we do not need to reconfigure the firewall.
We also received a note from an architect, which stated:
After reviewing the Azure SQL MI service and its options/configurations, I think we need to discuss this in detail and come up with a plan before doing anything.
WLA is a high-memory-usage application/database with critical SLAs. We need uptime and a very fast RTO with little to no interruption.
We don't want any impact when patching is taking place, even if it's seconds, because we have seen how WLA acts when we lose the connection to the database, even for milliseconds: the application restarts itself.
I know this is a lot of information, but since Broadcom has stated that you have other customers in Azure, we wanted to get your feedback on how our environments should have been built.
Should we be looking at the business-critical tier with active geo-replication? What is the best Azure solution for running Workload Automation in Azure?