Server Monitoring Tools - Advice, Direction, help please

  • 1.  Server Monitoring Tools - Advice, Direction, help please

    Posted Jul 29, 2010 10:30 PM

    Hello there,

    First off, let me say ESXi is amazing and flat out works awesome! I'm a nub when it comes to VMware, and I just finished upgrading/migrating our entire company from an SBS2000 shop to Server 2008 Standard VMs hosted on ESXi. Let me tell you, setting up the ESXi hosts was the easiest part of that whole ordeal lol. I'm looking for a tool that can monitor ESXi hosts, send alerts via email/SMS, and potentially shut down VMs. For example: say the AC unit dies on an extremely hot day on the weekend and our machines are overheating. Is there a tool that will alert via SMTP/SMS and initiate a shutdown of our VMs and then the host PMs? I'm finding it hard to grasp how this is possible, because I don't see how software running in a VM could access the actual ESXi host hardware conditions. At a minimum, I'm looking for a tool that can send alerts via email/SMS if there is a problem, so I'd be able to address the issue before any damage occurred.

    We currently have two ESXi 4.0 hosts on Dell PowerEdge T610 servers. I used the ESXi version specifically for Dell servers (if that helps).

    Three virtual machines: two Server 2008 Standard and one Server 2008 R2. If you know of any tools that would help me achieve what I've mentioned above, I would greatly appreciate it. I did a fair amount of searching on the web but have not found a solution that fits all our needs yet. It can be open source or payware.

    Thanks for your time,

    Dustin



  • 2.  RE: Server Monitoring Tools - Advice, Direction, help please

    Posted Jul 29, 2010 10:43 PM

    This should help: http://blog.peacon.co.uk/hardware-health-alerting-with-esxi/ Thermal shutdown is entirely possible, but you need to consider the rate of rise. The best way to test this is to turn off the AC during a maintenance window and time how long the room takes to reach a critical point (perhaps 40°C).
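
    As a rough illustration of the rate-of-rise point, here is a minimal Python sketch that extrapolates linearly from a few timed readings to estimate how long the room has before it reaches the critical temperature. The function name and the sample readings are made up for the example, and a real room will not necessarily warm linearly.

```python
def minutes_to_critical(readings, critical_c):
    """Estimate minutes until critical_c from (minutes, temp_c) readings.

    Uses the average rate of rise between the first and last reading,
    i.e. a simple linear extrapolation (an assumption, not a thermal model).
    """
    (t0, c0), (t1, c1) = readings[0], readings[-1]
    rate = (c1 - c0) / (t1 - t0)  # degrees C per minute
    if rate <= 0:
        return None  # room is not warming, so there is nothing to estimate
    return max(0.0, (critical_c - c1) / rate)


# Hypothetical test run: AC switched off at minute 0.
readings = [(0, 22.0), (10, 25.0), (20, 28.0)]
print(minutes_to_critical(readings, critical_c=40.0))
```

    Whatever the estimate comes out to, the shutdown threshold needs to be low enough that the VMs and hosts can finish shutting down inside that window.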

    Since you have the Dell version installed, you can probably also manage them with the Dell OpenManage Server Assistant?

    http://blog.peacon.co.uk

    Please award points to any useful answer.



  • 3.  RE: Server Monitoring Tools - Advice, Direction, help please

    Posted Jul 30, 2010 03:43 PM

    damn cool.

    We have a mix of free and paid versions and this sure would be helpful.



  • 4.  RE: Server Monitoring Tools - Advice, Direction, help please

    Posted Aug 06, 2010 07:20 PM

    Hey, thanks for the link. I couldn't get the perl script from the Peacon blog to work, kept getting compile errors, but I was able to use the original script that William Lam wrote. It's pretty limited/simple but is better than nothing.

    I'm looking into the UPS shutdown info but it doesn't look like there is any easy way to use it for ESXi 4.



  • 5.  RE: Server Monitoring Tools - Advice, Direction, help please

    Posted Aug 06, 2010 07:39 PM

    As far as ESXi logging and setting up alerts for certain issues goes, I am using syslog to push event logs to a server with Splunk installed. You should be able to set up alerts in Splunk based on the event. It also makes for a great and easy way to search through the logs.

    Hope this helps.



  • 6.  RE: Server Monitoring Tools - Advice, Direction, help please

    Posted Aug 07, 2010 10:36 PM

    What it sounds like you're looking to do is just one of the things you can do with a vCenter Server... Since you have two hosts, you could go with either the vCenter Server Foundation license or the vCenter Server for Essentials license, depending on which license set you already purchased. If you already have vCenter Server, then you can spec out the Alarms to send you emails when an event occurs. You can also modify the Alarm settings, or even create new alarms for events that you want to have logged or emailed to you.

    A lot of what you'll have for options depends on which licenses you purchased for the ESXi servers...

    For the thermal type events, a lot depends on how the host reports the temperatures to vCenter... But, if you're having concerns about the AC unit failing during hot days, you have bigger issues you need to resolve.

    For power management functions within the vCenter/vSphere configuration, you would need the license that includes DPM, but even then it's not going to power down hosts due to thermal events, just when the demand is low enough for everything to run on the other host. To do what I think you're asking for, you'll need to configure a monitoring system (or get thermal monitoring hardware, such as that offered by IT Watch Dogs). Depending on what you're using for UPS devices, you might be able to create scripts that monitor the thermal device and then prompt the UPS to start powering off the hosts. You will need the correct API for the UPS in place so that you can have the host vMotion critical VMs from one host to another, to keep company-critical servers running...

    Personally, I'd rather not automate such things. I'd rather get an email alert that the temps are above acceptable parameters, and then remote in and start the migration of VMs to one host and power off the other. Or power off both hosts, and other hardware too...

    A solution that will probably help you sleep better at night would be to get a better AC unit for the server room. Or a portable unit to augment the existing unit until you can get that resolved/replaced...

    VMware VCP4

    Consider awarding points for "helpful" and/or "correct" answers.



  • 7.  RE: Server Monitoring Tools - Advice, Direction, help please

    Posted Aug 08, 2010 05:13 PM

    Re performing a host shutdown, esxi-control.pl has this functionality, or you could enable SSH and use plink.

    Please post any info on the compile errors you found in esx-health.pl here, or PM me, or post on the peacon blog comments (http://blog.peacon.co.uk/hardware-health-alerting-with-esxi/#comments) :)

    Cheers




  • 8.  RE: Server Monitoring Tools - Advice, Direction, help please

    Posted Aug 10, 2010 06:42 PM

    Hey J1mbo,

    I've got esxi-health working great now, nice job on that! The problem I had earlier was operator error on my part, sorry about that.

    I think I've got an idea of how I'm going to accomplish an automated and graceful shutdown using esxi-control.pl. The one part I'm still trying to figure out is how to initiate esxi-control.pl based on a specified thermal range/condition. I was thinking of using Veeam's monitor tool to run it if a hardware alarm enters the "alert" state. But unless I can initiate the alarm based solely on the temp sensor (which you can't), Veeam isn't going to work for us.

    Like I said I'm a nub to esxi and a total nub to dealing with perl, but I'm trying to learn! The time and effort you're putting in to try and help me is much appreciated!

    Thanks.



  • 9.  RE: Server Monitoring Tools - Advice, Direction, help please

    Posted Aug 10, 2010 06:31 PM

    Thanks golddiggie. I've been looking into the vCenter Essentials license but we don't need 90% of the features you get which makes it hard to justify the cost. Even though it's really not much money, I don't get to make the final decision on that. We haven't purchased any licenses for ESXi thus far, just running the free stuff.

    I'm not so much having concerns about the AC unit failing, it's brand new and actually lives in the basement under the floor of the server cabinet. I'd just like to have a procedure in place in case it ever does fail, it is a machine with moving parts, so it definitely can fail.

    I checked out IT Watch Dogs and they have some nice stuff, thanks for that link! I'm sure we could use some of their equipment in the future.

    I've been playing with Veeam's free monitoring tool and it works pretty well so far. It has the ability to run scripts automatically when an alarm is tripped, if needed. The problem with Veeam Monitor is that the hardware monitor is very broad; I'd like to set an alarm based strictly on the case temp in our ESXi hosts. I'm going to give the full version a try for 60 days and see if it adds more specific definitions for alarms/notifications. We certainly could remote in and start shutting down machines based on a notification alarm. But what happens if my boss or I can't get to a computer with internet access in time? It wouldn't take long for the servers to heat up if the AC unit were to fail.



  • 10.  RE: Server Monitoring Tools - Advice, Direction, help please
    Best Answer

    Posted Aug 10, 2010 06:47 PM

    I use a temperature monitor device: http://www.temperaturealert.com/ It would require some scripting, but it could be combined with J1mbo's scripts to create a proper shutdown.

    There are probably hundreds of things to be worried about. Don't get too hung up on a single one.



  • 11.  RE: Server Monitoring Tools - Advice, Direction, help please

    Posted Dec 05, 2012 09:47 PM

    DSTAVERT wrote:

    I use a temperature monitor device: http://www.temperaturealert.com/ It would require some scripting, but it could be combined with J1mbo's scripts to create a proper shutdown.

    There are probably hundreds of things to be worried about. Don't get too hung up on a single one.

    This is exactly what we did and it works great.



  • 12.  RE: Server Monitoring Tools - Advice, Direction, help please

    Posted Aug 24, 2010 01:15 AM

    Check the eG VM Monitor - http://www.eginnovations.com/web/vmware.htm

    Regards,

    Simon



  • 13.  RE: Server Monitoring Tools - Advice, Direction, help please

    Posted Jul 12, 2011 01:53 PM

    Give the VMware plug-in for Verax NMS a try (http://www.veraxsystems.com/en/products/nms). Notifications can be sent out by e-mail or SMS, and by applying processing rules you can perform actions triggered by alarms.



  • 14.  RE: Server Monitoring Tools - Advice, Direction, help please

    Posted Dec 05, 2012 08:17 PM

    So after a long time of leaving this issue on the back burner, I was finally able to spend some time and come up with an acceptable solution.

    I used a combination of the esxi-control.pl script, a PowerShell script I made, and a product from a third-party vendor named Temperature Alert.  We used their $129 USB version:  http://www.temperaturealert.com/Wireless-Temperature-Store/Temperature-Alert-USB-Sensor.aspx

    Here's a quick rundown of our environment and how this works.

    In our server cabinet we have four machines: currently two ESXi 4 hosts on Dell PowerEdge T610 hardware, one older Dell server that runs our accounting system, and finally our voicemail server:

    -Host #1: has one Server 2008 VM, which is our PDC/DNS/DHCP/File/Print server.

    -Host #2: has two VMs. One Server 2008 that's our SQL 2008 DB server and one Server 2008 R2 that's our Exchange 2010 server.

    Our fear was that one day the small A/C unit, which is mounted to the ceiling in the basement below the server cabinet, might fail and damage hardware or cause a failure while the business(es) are closed.

    Our first line of defense was notification.  The Temp@lert USB product satisfied this with its ability to send out an email when the temperature exceeds a defined level.  In that case, if the temp was getting out of control, my boss or I could log in remotely and shut down the machines.  But what happens if neither one of us is available to log in remotely?  This motivated us to find a way to automatically and gracefully shut down both the VMs and the hosts based on the cabinet temperature.

    The Temperature Alert device can execute a PowerShell script each time it reads the temperature. I initially wanted the Temp Alert software to execute a batch file upon reaching a certain temperature, which would call the Perl script and gracefully shut down our ESXi hosts and VMs.  Nope, not happening.  The Temp Alert software is limited to PowerShell scripts and runs the script every time it samples the temperature, in our case every 120 seconds.  I contacted Temp Alert tech support and was given an example PS script that records the temperature data to an XML file, which can be parsed later as needed.  After tweaking that script, I went on to write a new PowerShell script to parse the data in the XML file.  This PS script runs as a scheduled task on our accounting server and does the following:  sends out a warning email when the temp reaches a defined level, shuts down the ESXi hosts, and shuts down the accounting server.

    I used an old workstation with an evaluation copy of ESXi 4 to test with, and last weekend we tested it on our production servers.  We shut off the A/C and basically sat there and watched the temp rise.  It worked perfectly!  It only took about 50 minutes for the cabinet to exceed 95°F.  The warning email went out, the email with the shutdown-imminent message went out, and all machines shut down cleanly.  I also added our Verizon mobile phone numbers @vtext.com to a new Exchange distribution group so we receive SMS messages with the temperature/shutdown information; this is in case we are somewhere outside of data coverage.
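
    For anyone who wants the same email-to-SMS trick without an Exchange distribution group, here is a small Python sketch of the idea: each phone number becomes a number@vtext.com address on an ordinary email. The function name, phone numbers and sender address are placeholders for illustration, and this is not the script we actually run.

```python
from email.message import EmailMessage


def build_alert(temp_f, phone_numbers, sender):
    """Build a short alert email addressed to Verizon's email-to-SMS gateway."""
    msg = EmailMessage()
    msg["From"] = sender
    # One gateway address per phone number, e.g. 5555550100@vtext.com
    msg["To"] = ", ".join(f"{number}@vtext.com" for number in phone_numbers)
    msg["Subject"] = "Server cabinet temperature alert"
    # Keep the body short, since it will be delivered as a text message.
    msg.set_content(f"Cabinet temperature is {temp_f:.1f} F.")
    return msg


# Placeholder numbers and sender address.
msg = build_alert(96.2, ["5555550100", "5555550101"], "alerts@example.com")
print(msg["To"])
```

    The message can then be handed to smtplib's SMTP.send_message() against whatever mail server you relay through.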

    Basically the order of operation is this:

    The Temp Alert software reads the temperature and writes/overwrites the XML file with the temperature information every 120 seconds.

    >> A scheduled task runs the PS script every 4 minutes on the accounting server, which is where the Temp Alert USB device is installed.

    >> The PowerShell script parses the data from the XML file. If the thresholds defined in the PS script are met, a warning email, a shutdown-imminent email, and the host shutdown follow.

    To shut down the hosts, the PS script executes batch files, which call the esxi-control.pl script and provide the credentials necessary to perform the shutdown of the host.  And since the VMs have VMware Tools installed, the host is able to cleanly shut down the guest machines.  Not too bad for only having to spend $129 on a simple USB temperature device.
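
    The decision logic is simple enough to sketch. This is not my actual PowerShell script, just a rough Python equivalent of the parse-and-compare step. The XML layout, the element name and both threshold values are assumptions made up for the example; the real file written by the Temp Alert script may look different.

```python
import xml.etree.ElementTree as ET

WARN_F = 85.0      # assumed warning threshold, not the value from the real script
SHUTDOWN_F = 95.0  # assumed shutdown threshold, not the value from the real script


def read_temp_f(xml_text):
    """Return the temperature from a hypothetical <reading> document."""
    root = ET.fromstring(xml_text)
    value = root.findtext("temperature")
    if value is None:
        raise ValueError("no <temperature> element in reading")
    return float(value)


def decide(temp_f, warn_f=WARN_F, shutdown_f=SHUTDOWN_F):
    """Map a temperature to the action the scheduled task should take."""
    if temp_f >= shutdown_f:
        return "shutdown"  # send shutdown-imminent email, then run the batch files
    if temp_f >= warn_f:
        return "warn"      # send warning email only
    return "ok"


# Hypothetical contents of the XML file written every 120 seconds.
sample = "<reading><temperature>96.2</temperature></reading>"
print(decide(read_temp_f(sample)))
```

    Checking the shutdown threshold first means a single reading can never trigger only the warning when it is already past the shutdown level.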



  • 15.  RE: Server Monitoring Tools - Advice, Direction, help please

    Posted May 14, 2013 07:17 AM

    Check the VMware Monitor from MindArray IPM (http://www.mindarraysystems.com/vmware-performance-monitoring-tools.php). It allows setting thresholds on temperature, fan, and other hardware sensors, and it also does performance monitoring out of the box.