DX Unified Infrastructure Management

  • 1.  Hub queue check tool

    Posted Jun 08, 2022 07:03 AM
    Edited by Luc Christiaens Mar 08, 2023 01:02 AM

    The attached Lua script queuecheck.lua (version 2.9.2) will:
    - monitor all hub and nas queues in your environment
    - send alarms when thresholds are reached (optional)
    - send email when thresholds are reached (optional)
    - create QoS metrics for dashboard overviews or Jaspersoft reports (optional)
    - a sample dashboard/PRD/Listview/CABI is included
    This is a UIM 20.4 version that can create valid QoS entries with a metricid when launched via NSA 20.30 or higher.
    The attached zip contains the Lua script, the doc file, a sample dashboard zip that can be imported, and the script used to send email.
    Sample dashboard:

    You can now select a queue to obtain more detail:

    In version 2.9.1:

    • you can now run without creating QoS metrics, only generating alarms
    • you can now run without generating alarms, only creating QoS metrics for reporting
    • there are now 3 threshold levels for the queued-messages alarm
    • as recommended, the script was modified to make use of local variables
    • added an example Listview report
    • each type of alarm can now have a custom severity level, or the value 'n' to not generate that alarm (see the configuration sketch after this list)
    • added a threshold and alarm for the internal nas queues
    • tested on 20.4.6 (CU6)
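
    To give an idea of what that looks like, here is a simplified configuration sketch. The variable names are illustrative only, not the actual parameters of queuecheck.lua (check the doc file in the zip for the real ones):

        -- Illustrative only: hypothetical configuration variables, not the real queuecheck.lua names.
        -- Three thresholds for the queued-messages alarm, each with its own severity,
        -- and 'n' to suppress a given alarm type entirely.
        local queued_thresholds = {
          { limit = 1000,  severity = 2 },   -- minor
          { limit = 5000,  severity = 4 },   -- major
          { limit = 10000, severity = 5 },   -- critical
        }
        local inactive_queue_severity = 3    -- severity for the "queue inactive" alarm
        local nas_queue_severity      = "n"  -- 'n' = do not generate this alarm type

        -- Pick the highest matching severity for a measured queue depth.
        local function severity_for(queued)
          local sev = nil
          for _, t in ipairs(queued_thresholds) do
            if queued >= t.limit then sev = t.severity end
          end
          return sev
        end

        print(severity_for(6200))  --> 4 (major)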

    In version 2.9.2:

    • added a controlled retry loop for accessing the hubs (instead of only 1 retry); the maximum number of iterations can be set in a variable (a sketch follows below)
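
    In simplified form the idea looks like this. It is only a sketch: the variable name and the gethubs callback are illustrative, and it assumes the NSA-provided nimbus.request() and sleep() functions, so it only runs inside NSA:

        -- Sketch of a controlled retry loop; names are illustrative, not the script's own.
        local max_iterations = 3   -- maximum attempts per hub request, set via a variable

        local function request_with_retry(address, command, args)
          for attempt = 1, max_iterations do
            local resp, rc = nimbus.request(address, command, args)
            if resp ~= nil then
              return resp                                  -- got an answer, stop retrying
            end
            print("attempt " .. attempt .. "/" .. max_iterations .. " failed for " ..
                  address .. "/" .. command .. " (rc=" .. tostring(rc) .. ")")
            sleep(2000)                                    -- back off (milliseconds) before the next try
          end
          return nil                                       -- caller decides what to do with an unreachable hub
        end

        local hublist = request_with_retry("hub", "gethubs")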

    The updated 2.9.2 package can be found at the end of the thread.
    All comments and ideas are very welcome.

    #uim #queue #check #tool #lua #threshold #monitor #hub #nas #script



  • 2.  RE: Hub queue check tool

    Posted Jun 08, 2022 09:58 AM
    Seems to be a delay on posts.....


  • 3.  RE: Hub queue check tool

    Posted Jun 09, 2022 09:57 AM
    The zip does not contain an example dashboard for the case where you set the parameter:
    target_source='target'
    Note: if you want to create PRD graphs of the created QoS metrics, you need to set this target_source parameter to 'target'.
    This means the QoS will be created with:
    - source column: your robot name
    - target column: the queue name prefixed by hubq_ or nasq_, and optionally the hub name (if parameter long_qos='y' is set)
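
    As an illustration of that naming, a small sketch; the exact separator and ordering used by queuecheck.lua may differ, so treat this purely as an example of the idea:

        -- Hypothetical illustration of the target naming described above; not the script's real code.
        local long_qos = "y"   -- when 'y', the hub name is added to the target

        local function build_target(queue_name, hub_name, is_nas_queue)
          local prefix = is_nas_queue and "nasq_" or "hubq_"
          local target = prefix .. queue_name
          if long_qos == "y" then
            target = target .. "_" .. hub_name   -- separator and order assumed for the example
          end
          return target
        end

        -- source stays the robot running the script; target identifies the queue:
        print(build_target("data_engine", "primaryhub", false))  --> hubq_data_engine_primaryhub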


  • 4.  RE: Hub queue check tool

    Posted Jun 10, 2022 10:16 AM
    Really appreciate the presentation of this content. 

    To highlight a couple things:

    I am really jealous of your attention to detail regarding documentation of your efforts. I strive to match it but understand that I'll never come close (I'm too lazy I think....)

    I struggled a long time trying to get the new QoS metrics feature to work right and never got it where I needed it to be. I read through your code and it looks just like mine, so I'll have to go back and compare again. The main issue I was having was the inability to specify a source that wasn't the robot running the script; not sure if that comes into play here. Regardless, having a working example is a huge benefit given the number of errors and omissions in the documentation and the physical implementation. For those who do try this: discovery_server has to vacuum up the new metric ids created locally before you'll see correct results, which could take a long time depending on how you have discovery configured to run.

    Were I writing this code for my environment I'd make these additions:

    Nothing in Nimsoft happens reliably 100% of the time, especially when the number of hubs gets large and/or you have something more than robots talking to a hub or a hub talking to a hub. As soon as you hit hub talking to hub talking to hub, the likelihood of any single communication failing (in my personal experience) starts being a two-digit percentage. Ultimately I've wound up wrapping all the nimbus.* calls in a retry loop so that can be addressed; the pattern looks roughly like the sketch below.
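
    Just a sketch of that wrapper idea, assuming the NSA-provided nimbus.request() and sleep() functions; the retry count and the getrobots call are only examples:

        -- Generic retry wrapper: call any function up to 'retries' times before giving up.
        local function with_retry(retries, fn, ...)
          for attempt = 1, retries do
            local ok, a, b = pcall(fn, ...)  -- pcall also catches calls that raise an error
            if ok and a ~= nil then
              return a, b                    -- success: hand back the function's results
            end
            sleep(1000)                      -- wait a second before the next attempt
          end
          return nil                         -- every attempt failed
        end

        -- Usage: retry a hub request up to 5 times.
        local resp = with_retry(5, nimbus.request, "hub", "getrobots")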

    I haven't tested with the latest Lua 5.4 (I think that's the current version), but the 5.2 version was very slow when accessing global variables. There's not a lot of looping in this code, but once you get a couple hundred accesses to a global variable you can measure the runtime difference against a local with a stopwatch. I've gotten in the habit of wrapping all code within a do/end pair and declaring everything local so nothing ends up in the global table by accident. I've not looked at the actual Lua interpreter code, but the usual explanation is that globals live in a table, so every access is a lookup by name, while locals are resolved at compile time to stack slots; the runtime is comparable with a small number of accesses but very quickly favors local variables as the number increases.
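
    A quick sketch of that habit in plain Lua (nothing specific to queuecheck.lua):

        -- Everything lives inside one do/end block and is declared local,
        -- so repeated accesses hit cheap local slots instead of the global table.
        do
          local total = 0
          local count = 100000

          for i = 1, count do
            total = total + i        -- 'total' and 'i' are locals: no global-table lookups
          end

          print(total)               -- 5000050000
        end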

    Similar to the above, everything in Lua winds up being a table, so when you execute nimbus.request, for instance, the code has to locate the global nimbus table by looking in the table of globals, then find the entry named "request" in that nimbus table, and then call that value as a function. It's a lot of lookups. You can save a bunch of that lookup time with something like "local NimbusRequest = nimbus.request" and then use the local NimbusRequest() to invoke the function. Again, it makes little difference if you hit it 10 times in a script, but if you do it a thousand times you can measure it.
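
    For example (a sketch; the addresses and the getrobots call are just placeholders):

        -- Cache the function reference once; each later call skips the two table lookups.
        local NimbusRequest = nimbus.request

        -- Example nimbus addresses, purely illustrative.
        local addresses = { "hub", "/domain/hub1/robot1/hub", "/domain/hub2/robot2/hub" }

        for _, addr in ipairs(addresses) do
          local resp, rc = NimbusRequest(addr, "getrobots")
          if resp ~= nil then
            -- work with the response table here
          end
        end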

    Thank you again

    Garin


  • 6.  RE: Hub queue check tool

    Posted Jun 10, 2022 11:17 AM
    Only took 48 hours for those posts to be moderated - wonder what happened to cause that.....


  • 7.  RE: Hub queue check tool

    Posted Aug 01, 2022 02:34 AM
    To have this Lua script generate correct QoS metrics (that can also be used in the OC metric view) you will need:
    nsa 20.50 build 190


  • 8.  RE: Hub queue check tool

    Posted Feb 26, 2023 05:46 AM

    Version 2.9.1 Lua package (doc file included in the zip file)




  • 9.  RE: Hub queue check tool

    Posted Mar 08, 2023 01:06 AM

    Version 2.9.2 Lua package (doc and dashboard included in the zip)

    See initial post for details


    Attachment(s)

    queuecheck_2.9.2.zip (zip, 1.43 MB)