DX Infrastructure Management

How do we monitor probe status

  • 1.  How do we monitor probe status

    Posted 01-24-2017 06:48 AM

    I had a situation where probes were showing red in the Infrastructure Manager console, and I had to restart the Nimbus agent to turn them green again. Is there a way to monitor probe status?



  • 2.  Re: How do we monitor probe status

    Posted 01-24-2017 09:26 AM

    Please see TEC000003617 -> How to view all probes that are either in a deactivated or errored state in a domain

     

    http://www.ca.com/us/support/ca-support-online/product-content/knowledgebase-articles/tec000003617.aspx

     

    Please note that the controller will normally generate an alarm if any probes are in an error state.

     

    Please also refer to:

     

    https://www.ca.com/us/services-support/ca-support/ca-support-online/knowledge-base-articles.tec000004644.html

     

    Steve



  • 3.  Re: How do we monitor probe status

    Posted 01-25-2017 12:21 PM

    Hi Stephen,

     

    Thank you very much. You mentioned that an alarm is raised when a probe goes down. In our case no alert was generated, so something may be missing from our configuration. Since I am completely new to Nimsoft, where should I check whether alerting is set up? Can you please help?



  • 4.  Re: How do we monitor probe status

    Posted 01-25-2017 01:48 PM

    Ragesh,

     

    Here is some further clarification:

     

    The robot controller callback 'probe_list' helps distinguish two categories: probes that were deactivated before the last restart of the local robot, and probes that stopped running while the robot was up on the local machine. The second category includes both probes that ran into an error state (red in IM) and probes that were deactivated after the robot was started.

     

    For the first category (probes deactivated before a restart of the robot), the 'process_state' returned has the value 'none'.

     

    For the second category, the 'process_state' returned has the value 'stopped'. For problem probes, 'last_started' holds the time the probe was last started, in seconds since 1970. Here are examples of the output from the pu command; RESTful calls return the same values in a different format.

     

    -- db2 shows a red icon in IM; a pu call to controller/probe_list with the probe name as parameter returns the following:

     

    db2 PDS_PDS 516
    name PDS_PCH 4 db2
    description PDS_PCH 12 db2 monitor
    group PDS_PCH 9 Database
    active PDS_I 2 0
    type PDS_I 2 2
    command PDS_PCH 12 db2_monitor
    arguments PDS_PCH 1
    config PDS_PCH 16 db2_monitor.cfg
    datafile PDS_PCH 1
    logfile PDS_PCH 16 db2_monitor.log
    workdir PDS_PCH 20 probes/database/db2
    timespec PDS_PCH 1
    times_activated PDS_I 2 1
    last_action PDS_I 11 1481189432
    pid PDS_I 3 -1
    times_started PDS_I 3 12
    last_started PDS_I 11 1481189432
    pkg_name PDS_PCH 4 db2
    expires_at PDS_I 11 1506105000
    pkg_version PDS_PCH 5 4.10
    pkg_build PDS_PCH 3 18
    process_state PDS_PCH 8 stopped
    port PDS_I 3 -1
    is_marketplace PDS_I 2 0
    marketpl_block PDS_I 2 0

     

    -- iostat was deactivated before a robot restart:

     

    iostat PDS_PDS 496
    name PDS_PCH 7 iostat
    description PDS_PCH 46 Generate disk QoS based on output from iost
    group PDS_PCH 7 System
    active PDS_I 2 0
    type PDS_I 2 2
    command PDS_PCH 18 ../../../bin/perl
    arguments PDS_PCH 10 iostat.pl
    config PDS_PCH 11 iostat.cfg
    logfile PDS_PCH 11 iostat.log
    workdir PDS_PCH 21 probes/system/iostat
    timespec PDS_PCH 1
    times_activated PDS_I 2 0
    last_action PDS_I 2 0
    pid PDS_I 3 -1
    times_started PDS_I 2 0
    last_started PDS_I 2 0
    pkg_name PDS_PCH 7 iostat
    pkg_version PDS_PCH 5 1.10
    pkg_build PDS_PCH 3 01
    process_state PDS_PCH 5 none
    port PDS_I 3 -1
    is_marketplace PDS_I 2 0
    marketpl_block PDS_I 2 0

     

    -- processes was deactivated without a restart of the robot:

     

    processes PDS_PDS 478
    name PDS_PCH 10 processes
    description PDS_PCH 25 Process monitoring probe
    group PDS_PCH 7 System
    active PDS_I 2 0
    type PDS_I 2 2
    command PDS_PCH 10 processes
    arguments PDS_PCH 1
    config PDS_PCH 14 processes.cfg
    logfile PDS_PCH 14 processes.log
    workdir PDS_PCH 24 probes/system/processes
    timespec PDS_PCH 1
    times_activated PDS_I 2 0
    last_action PDS_I 2 0
    pid PDS_I 3 -1
    times_started PDS_I 2 2
    last_started PDS_I 2 0
    pkg_name PDS_PCH 10 processes
    pkg_version PDS_PCH 5 4.31
    pkg_build PDS_PCH 4 227
    process_state PDS_PCH 8 stopped
    port PDS_I 3 -1
    is_marketplace PDS_I 2 0
    marketpl_block PDS_I 2 0
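
The three cases above could be told apart by a script. What follows is only a minimal Python sketch, not an official tool. It assumes the pu output has been saved as text, and the function names (parse_probe_dump, classify, started_at) are made up for illustration. The rule that separates an errored probe from one deactivated while the robot was running (a non-zero 'last_started') is inferred from the three examples in this post and should be checked against your own environment.

```python
from datetime import datetime, timezone

# Shortened copy of the db2 dump shown above.
DB2_SAMPLE = """\
db2 PDS_PDS 516
name PDS_PCH 4 db2
description PDS_PCH 12 db2 monitor
active PDS_I 2 0
arguments PDS_PCH 1
pid PDS_I 3 -1
last_started PDS_I 11 1481189432
process_state PDS_PCH 8 stopped
"""


def parse_probe_dump(text):
    """Turn one probe_list entry, as printed by pu, into a dict."""
    fields = {}
    for line in text.splitlines():
        # Columns: key, PDS type, length, value (the value may contain spaces).
        parts = line.split(None, 3)
        if len(parts) < 3 or not parts[1].startswith("PDS_"):
            continue  # banners, log lines, wrapped continuation lines
        key, pds_type = parts[0], parts[1]
        if pds_type == "PDS_PDS":
            continue  # section header such as "db2 PDS_PDS 516"
        value = parts[3].strip() if len(parts) == 4 else ""
        fields[key] = int(value) if pds_type == "PDS_I" and value else value
    return fields


def classify(fields):
    """Map process_state/last_started onto the categories described above."""
    state = fields.get("process_state", "")
    if state == "running":
        return "running"
    if state == "none":
        return "deactivated before the last robot restart"
    if state == "stopped":
        # Assumption drawn from the examples: errored probes keep a
        # non-zero last_started, manually deactivated ones show 0.
        if fields.get("last_started") or 0:
            return "stopped after starting (possible error state)"
        return "deactivated while the robot was running"
    return "unknown"


def started_at(fields):
    """Convert last_started (seconds since 1970) to an ISO 8601 UTC string."""
    ts = fields.get("last_started") or 0
    if not ts:
        return None
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
```

Because the parser splits on any run of whitespace, it should also cope with the column-aligned layout that pu prints on the command line.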

     

    Here is an example of the pu command using the probe_list callback; I've removed and replaced sensitive information:

     

    C:\>"C:\Program Files (x86)"\Nimsoft\bin\pu.exe -u administrator -p <password>
    <UIM_domain>/<UIM_hub>/<UIM_Robot>/controller probe_list controller
    Jan 25 13:41:43:879 pu: SSL - init: mode=0, cipher=DEFAULT, context=OK
    Jan 25 13:41:43:880 pu: nimCharsetSet() - charset=
    ======================================================
    Address: <UIM_domain>/<UIM_hub>/<UIM_Robot>/controller probe_list controller
    Request: probe_list
    ======================================================
    controller      PDS_PDS         465
     name            PDS_PCH          11 controller
     description     PDS_PCH          34 Robot process and port controller
                     ~
     group           PDS_PCH          15 Infrastructure
     active          PDS_I             2 1
     type            PDS_I             2 0
     command         PDS_PCH          15 controller.exe
     config          PDS_PCH          10 robot.cfg
     logfile         PDS_PCH          15 controller.log
     workdir         PDS_PCH           6 robot
     timespec        PDS_PCH           1
     times_activated PDS_I             2 0
     last_action     PDS_I             2 0
     pid             PDS_I             5 3304
     times_started   PDS_I             2 1
     last_started    PDS_I            11 1484088410
     pkg_name        PDS_PCH          13 robot_update
     pkg_version     PDS_PCH           5 7.80
     process_state   PDS_PCH           8 running
     port            PDS_I             6 48000
     is_marketplace  PDS_I             2 0
     marketpl_block  PDS_I             2 0

     

    In this last case, notice that the process_state is 'running'.
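
To collect this output regularly, the pu call shown above could be wrapped in a small script. The following is a hedged Python sketch only: the helper names are hypothetical, the argument order simply mirrors the command line in this post, and the path to pu.exe, the credentials, and the controller address are placeholders you would need to supply.

```python
import subprocess


def build_pu_command(pu_path, user, password, controller_address, probe):
    """Build the argument list for a probe_list call, mirroring the example above."""
    return [
        pu_path,
        "-u", user,
        "-p", password,
        controller_address,  # <UIM_domain>/<UIM_hub>/<UIM_Robot>/controller
        "probe_list",
        probe,               # probe name passed as the callback parameter
    ]


def run_probe_list(pu_path, user, password, controller_address, probe, timeout=60):
    """Run pu and return whatever it printed on standard output."""
    cmd = build_pu_command(pu_path, user, password, controller_address, probe)
    completed = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=False
    )
    return completed.stdout
```

Note that a password passed on the command line may be visible to other users in the process list, so a dedicated low-privilege account would be a safer choice for a scheduled job.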



  • 5.  Re: How do we monitor probe status

    Posted 01-24-2017 09:34 AM

    I think this is a good question, and I believe we generally receive alerts for probe issues.

     

    I use a Perl script to check for communication issues between the agent and the hub. I generally run it once a day. Doing this, I have found a couple of cases where the agent showed green on the console but, when checked, there was actually a communication issue between the agent and the hub. Sometimes the cause is a misconfiguration in robot.cfg, and sometimes it is a firewall or similar.
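
The Perl script itself is not shown in this thread. As a rough illustration of one kind of check it might include, here is a minimal Python sketch that tests whether a robot's controller port accepts TCP connections. The function name is hypothetical, and 48000 is used only because it appears as the controller port in the probe_list output earlier in this thread; your robots may listen elsewhere. A successful connect shows only that the port is reachable, not that hub-to-robot communication is healthy.

```python
import socket


def port_reachable(host, port=48000, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Running this from the hub against each robot once a day could flag robots that look green on the console but are blocked by a firewall.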

     

    Regards,

    IK



  • 6.  Re: How do we monitor probe status

    Posted 01-24-2017 11:28 AM

    We have Lua scripts that check communication with the robot and ping the device as well. UIM wasn't giving us all the alarms when a probe had failed.



  • 7.  Re: How do we monitor probe status

    Posted 01-25-2017 12:23 PM

    Thanks Jason, I have downloaded the Lua script, but I am not sure how to execute it. Can you please help me?



  • 8.  Re: How do we monitor probe status

    Posted 01-26-2017 03:52 PM

    There are two ways to run the Lua script.

    1. Within NAS - open the nas probe and go to the scheduler tab.

    2. NSA.exe - you can deploy the nsa compiler from your archive to any robot in your environment. That will enable you to run the script from the command line using the \Nimsoft\sdk\nsa\nsa.exe command. 



  • 9.  Re: How do we monitor probe status

    Posted 02-06-2017 10:26 AM

    If you are looking for a complete solution, I recently created these probes to address the need for serious self-monitoring in CA UIM.

     

    GitHub - fraxken/selfmonitoring: CA UIM Self monitoring probe 

    GitHub - fraxken/robots_checker: CA UIM Robots_checker (check probes, and do callback on it) 

     

    New updates are coming soon (Supp_key, alarm enrichment, automatic clear when resolved).

     

    It just takes some work to install the Perl library and configure the framework for your system:

     

    Release Light R4.0 · fraxken/perluim · GitHub 

    Starter guide · fraxken/perluim Wiki · GitHub 

     

    It is currently running well in an environment of 20,000+ servers (and ~40 hubs).



  • 10.  Re: How do we monitor probe status
    Best Answer

    Posted 02-14-2017 01:29 PM

    My NAS version:

     

    GitHub - fraxken/checkconfig_lua: CA UIM Checkconfig LUA for NAS 

     

    Or you can run a SQL query against the discovery_server tables too.



  • 11.  Re: How do we monitor probe status

    Posted 03-08-2018 03:44 AM

    Hi Thomas,

     

    Would you please help me with how to use robot_checker.cfg? Does it need to be deployed to the local hub, or is there another way to use it?

     

    Thanks in advance.

    Nikhil



  • 12.  Re: How do we monitor probe status

    Posted 03-16-2018 09:44 AM

    Hi ! 

     

    Yeah, sure! I'm busy with another UIM project, but I'll be available full time in April.

     

    Let me know if you have any questions or need help installing the probe.

     

    Best Regards,

    Thomas



  • 13.  Re: How do we monitor probe status

    Posted 03-16-2018 09:49 AM

    Thanks Thomas. I have deployed the robot_checker.cfg file, and it shows as installed. I just created a package and added this cfg file. How can I tell whether it is working?



  • 14.  Re: How do we monitor probe status

    Posted 03-16-2018 10:23 AM

    Hi,

     

    Don't forget to take the version from the "release" tab (on my GitHub). The latest is 1.5:

    Release Sariah (R1.5) · fraxken/robots_checker · GitHub 

     

    The Perluim framework is bundled with the probe. If you work on a *nix system, you will have to reconfigure the package (at the time it was only set up for Windows).

     

    To learn more about the cfg and its keys, look at the README on GitHub.

     

    Best Regards,

    Thomas