Really appreciate the presentation of this content.
To highlight a couple of things:
I am really jealous of your attention to detail regarding documentation of your efforts. I strive to match it but understand that I'll never come close (I'm too lazy I think....)
I struggled for a long time trying to get the new QOS metrics to work right and never got them where I needed them to be. I read through your code and it looks just like mine, so I'll have to go back and compare again. The main issue I was having was the inability to specify a source other than the robot running the script. Not sure if that comes into play here. Regardless, having a working example is a huge benefit given the number of errors and omissions in the documentation and the implementation. For those who do try this: discovery_server has to vacuum up the new metric ids created locally before you'll see correct results, which could take a long time depending on how you have discovery configured to run.
Were I writing this code for my environment I'd make these additions:
Nothing in Nimsoft happens reliably 100% of the time, especially when the number of hubs gets largish and/or you have something more than robots talking to a hub or a hub talking to a hub. As soon as you hit hub talking to hub talking to hub, the likelihood of any given communication failing (in my personal experience) starts being a two-digit percentage. Ultimately I've wound up wrapping all the nimbus.* calls in a retry loop so that a failed call gets retried instead of breaking the script.
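A minimal sketch of such a retry wrapper follows. It assumes the wrapped call returns a result followed by a return code, with 0 meaning success; that matches how nimbus.request is usually used, but verify it against your NAS version. The helper with_retry, the stand-in flaky_request, and the address and command strings are all hypothetical, there only so the sketch runs in a plain Lua interpreter.

```lua
-- Hypothetical helper: call fn(...) until it reports success or attempts run out.
-- Assumption: fn returns (result, rc) and rc == 0 means success.
local function with_retry(max_attempts, fn, ...)
  local result, rc
  for attempt = 1, max_attempts do
    result, rc = fn(...)
    if rc == 0 then
      return result, rc, attempt
    end
    -- A real script would probably pause here before trying again.
  end
  return result, rc, max_attempts
end

-- Stand-in for a real request so the sketch runs anywhere:
-- it fails twice, then succeeds on the third call.
local calls = 0
local function flaky_request(addr, command)
  calls = calls + 1
  if calls < 3 then
    return nil, 2          -- any nonzero code is treated as a failure here
  end
  return { addr = addr, command = command }, 0
end

-- Placeholder address and command, not taken from the attached script.
local result, rc, attempts =
  with_retry(5, flaky_request, "/domain/hub/robot/probe", "some_command")

print("rc:", rc, "attempts:", attempts)
```

Inside a NAS script the stand-in would be replaced by the real function, for example with_retry(5, nimbus.request, addr, command, args).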
I haven't tested with the latest Lua 5.4 (I think that's the current version), but the 5.2 version was very slow when accessing global variables. There's not a lot of looping in this code, but when you get a couple hundred accesses to a global variable you can measure with a stopwatch the runtime difference compared with a local. I've gotten in the habit of wrapping all code within a do/end pair and declaring every variable with local, so that nothing ends up as a global by accident. I've not looked at the actual Lua interpreter code, but as I understand it globals are stored in a table and every access is a lookup by name, while locals live in the function's registers and are accessed directly by index, so runtime is comparable with a small number of accesses but quickly favors local variables as the number increases.
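For anyone who wants to measure the global versus local difference on their own interpreter, here is a rough timing sketch. It assumes a standalone Lua interpreter where os.clock is available (it may not be exposed inside the NAS sandbox), and the loop count N is arbitrary. The actual numbers depend heavily on Lua version and hardware, so none are quoted here.

```lua
local N = 2000000              -- arbitrary iteration count for the comparison

-- Global variable: each read and write goes through the globals table.
global_counter = 0
local t0 = os.clock()
for i = 1, N do
  global_counter = global_counter + 1
end
local global_time = os.clock() - t0

-- Local variable: accessed directly, no table lookup by name.
local local_counter = 0
t0 = os.clock()
for i = 1, N do
  local_counter = local_counter + 1
end
local local_time = os.clock() - t0

print(string.format("global: %.3fs  local: %.3fs", global_time, local_time))

-- A do/end block limits the scope of locals declared inside it.
-- The variable still needs the `local` keyword; without it, it would be a global.
do
  local scoped = "only visible inside this block"
end
print(scoped)                  -- nil: the local no longer exists out here
```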
Similar to the above, nearly everything in Lua winds up being a table, so when you execute nimbus.request, for instance, the code has to locate the global nimbus table by looking in the table of globals, then find the entry named "request" in that nimbus table, and then execute that value as a function. It's a lot of lookups. You can save a bunch of that lookup time with something like "local NimbusRequest = nimbus.request" and then use the local NimbusRequest() to invoke the function. Again, it makes little difference if you hit it 10 times in a script, but if you do it a thousand times you can measure it.
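The same caching idea can be tried outside NAS with any library function. The sketch below uses math.floor as a stand-in for nimbus.request, since the nimbus table only exists inside the NAS scripting environment. It assumes os.clock is available, and the iteration count is arbitrary. Both loops compute the same sum; only the way the function is reached differs.

```lua
local N = 2000000              -- arbitrary iteration count

-- Uncached: every call looks up `math` in the globals, then `floor` in `math`.
local t0 = os.clock()
local sum_lookup = 0
for i = 1, N do
  sum_lookup = sum_lookup + math.floor(i / 2)
end
local lookup_time = os.clock() - t0

-- Cached: resolve the function once and keep it in a local.
-- In a NAS script the equivalent would be: local NimbusRequest = nimbus.request
local floor = math.floor
t0 = os.clock()
local sum_cached = 0
for i = 1, N do
  sum_cached = sum_cached + floor(i / 2)
end
local cached_time = os.clock() - t0

print(string.format("table lookup: %.3fs  cached local: %.3fs",
                    lookup_time, cached_time))
```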
Thank you again
Garin
Original Message:
Sent: Jun 08, 2022 07:02 AM
From: Luc Christiaens
Subject: Hub queue check tool
The attached Lua script will:
- monitor all hub and nas queues in your environment
- send alarms when thresholds are reached
- send email when thresholds are reached
- create QOS metrics for dashboard overviews or Jaspersoft reports
- sample dashboard is included
This is a UIM 20.4 version that can create valid QOS entries with metricid when launched via NSA 20.30 or higher
The attached zip contains: the Lua script, a doc file, a sample dashboard zip that can be imported, and the script used to send email
Sample dashboard:
Sample qos/nas graph:
All comments and ideas are very welcome