DX Unified Infrastructure Management


data engine is getting queued so often

  • 1.  data engine is getting queued so often

    Posted Jul 19, 2019 02:28 AM
What is the resolution for data_engine queuing in CA UIM? We are facing this queue buildup continuously.

We have followed the data_engine best practices KB as well. Is there any way to resolve this permanently?



  • 2.  RE: data engine is getting queued so often

    Posted Jul 22, 2019 07:58 AM
Please let me know the resolution for this.

What is the resolution for data_engine queuing in CA UIM? We are facing this queue buildup continuously. We have followed the data_engine best practices KB as well. Is there any way to resolve this permanently?


  • 3.  RE: data engine is getting queued so often

    Posted Jul 22, 2019 03:21 PM
Hey,
A couple of questions to start off: what is the size of your environment? How many hubs, how are they structured, how many robots, and how many metrics are you collecting?
Which queue(s) are getting backed up?
Regarding your SQL DB box: is it running on fast storage? Check the IOPS on that box to see if they are extremely high and whether your storage can keep up. If not, that is usually the main reason for data_engine processing backups.


    ------------------------------
    Daniel Blanco
    Enterprise Tools Architect
    Alphaserve Technologies
    ------------------------------



  • 4.  RE: data engine is getting queued so often

    Posted Jul 22, 2019 05:17 PM
Is the queue normally fine but backing up at specific times, such as at night or on the weekend? That would indicate DB or system maintenance slowing the database down. What is needed is a deep dive into why it is happening: is it the DB server, the database itself, the network connection, or primary hub performance?

    ------------------------------
    Support Engineer
    Broadcom
    ------------------------------



  • 5.  RE: data engine is getting queued so often

    Posted Jul 26, 2019 03:05 AM
Is the queue normally fine but backing up at specific times, such as at night or on the weekend? That would indicate DB or system maintenance slowing the database down. What is needed is a deep dive into why it is happening: is it the DB server, the database itself, the network connection, or primary hub performance?

I have no idea, David...

Regards
Amar



  • 6.  RE: data engine is getting queued so often

    Posted Jul 23, 2019 01:58 AM
Can you post the results of this query? It will answer some of the previous questions.
select '01. # qos definitions' as item, COUNT(*) as cnt from S_QOS_DEFINITION with(nolock)
union
select '02. # qos views', COUNT(name) from sysobjects with(nolock) where xtype = 'V' and name like 'V_QOS%'
union
select '03. # qos objects', COUNT(*) from S_QOS_DATA with(nolock)
union
select '04. # robots', COUNT(*) from CM_NIMBUS_ROBOT with(nolock)
union
select '05. # robots (except hubs)', COUNT(*) from CM_NIMBUS_ROBOT with(nolock) where is_hub = 0
union
select '06. # hubs', COUNT(*) from CM_NIMBUS_ROBOT with(nolock) where is_hub = 1
union
select '07. # computer systems', COUNT(*) from CM_COMPUTER_SYSTEM with(nolock)
union
select '08. # todays alarms in db', COUNT(*) from NAS_TRANSACTION_LOG with(nolock) where CAST(time as DATE) = CAST(GETDATE() AS DATE)
union
select '09. # total alarms in db', COUNT(*) from NAS_TRANSACTION_LOG with(nolock)
union
select '10. # todays open alarms', COUNT(*) from NAS_ALARMS with(nolock) where CAST(time_origin as DATE) = CAST(GETDATE() AS DATE)
union
select '11. # todays open critical alarms', COUNT(*) from NAS_ALARMS with(nolock) where CAST(time_origin as DATE) = CAST(GETDATE() AS DATE) and level = '5'
union
SELECT '12. db: CA_UIM used space MB', CAST( ((SUM(Size) * 8) / 1024.0) AS DECIMAL(18,2) ) FROM sys.master_files with(nolock) WHERE database_id = DB_ID('CA_UIM') and type_desc = 'ROWS'
union
SELECT '13. db: CA_UIM Log used space', CAST( ((SUM(Size) * 8) / 1024.0) AS DECIMAL(18,2) ) FROM sys.master_files with(nolock) WHERE database_id = DB_ID('CA_UIM') and type_desc = 'LOG'




  • 7.  RE: data engine is getting queued so often

    Posted Jul 26, 2019 02:53 AM
    item cnt
    computer systems 3223
    db: CA_UIM Log used space 21159.38
    db: CA_UIM used space MB 422661.13
    hubs 2
    qos definitions 534
    qos objects 1886320
    qos views 377
    robots 1419
    robots (except hubs) 1417
    todays alarms in db 437477
    todays open alarms 5425
    todays open critical alarms 1050
    total alarms in db 55737936



  • 8.  RE: data engine is getting queued so often

    Broadcom Employee
    Posted Jul 22, 2019 05:57 PM
    Hi Madanraj,

    It would be very helpful to see your data_engine.cfg.

    How often does the queue get backed up and when does it occur?

    What data_engine version is running?

    What type of storage, Tier 1? Please describe.

    Best Regards,
    Steve




  • 9.  RE: data engine is getting queued so often

    Posted Jul 25, 2019 02:25 AM
Please find the data_engine.cfg:

    <setup>
    loglevel = 5
    logfile = data_engine.log
    logsize = 200
    locale = English
    data_management_active = yes
    data_management_time_spec = RRULE:FREQ=DAILY;INTERVAL=1;COUNT=1;BYHOUR=22;BYMINUTE=0
    data_management_timeout = 65536
    auto_reindex = yes
    hub_bulk_size = 1750
    thread_count_insert = 8
    data_management_compress = yes
    delete_raw_samples = 60
    delete_history_samples = 150
    raw_data_extra = 30
    history_avg_age_extra = 30
    daily_avg_age = 744
    daily_avg_age_extra = 30
    number_of_subpartitions = 5
    data_management_partition = yes
    provider = SQLOLEDB
    database = CA_UIM
    user = sa
    password = MTWE1nhsW1WfADB61q8FpA==
    parameters = Network Library=dbmssocn;Language=us_english
    min_free_space = 10
    monitor_interval = 300
    alarm_severity = 5
    mysql_buffer_increase = 5000
    mysql_buffer_size = 5000
    log_bulk_stats = 0
    log_inserted_rows = 1
    log_lsv_rows = 0
    table_maintenance_mode = 2
    table_maintenance_online_mode = 0
    table_maintenance_loglevel = 0
    statistics_age = 24
    statistics_pattern = RN_QOS_DATA%
    statistics_loglevel = 0
    qos_probes = no
    lsv_sleep = 3
    bucket_flush_size = 5000
    bucket_flush_time = 5
    show_admin_upgrade_primary = yes
    server = 192.168.233.8,1433
    port = 1433
    servicename =
    db_plugin = Microsoft
    data_engine_id = 1
    threads = 0
    index_frag_low = 5
    index_frag_high = 30
    compress_indexes_only = no
    index_pattern =
    queue_limit_total = 100000
    update_metric_id = yes
    </setup>
    <scripts>
    version_regxp_sp = /(exec|call).spn_bas_SetVersion.'initialize'/
    version_regexp_version = /^\d{1,2}\.\d{1,4}/
    version_split = ,
    version_version_pos = 2
    version_module_pos = 1
    version_table = tbnVersion
    <script1>
    check = yes
    filename = $DBMS_nis_base_create.sql
    </script1>
    <script2>
    check = yes
    filename = $DBMS_slm_create.sql
    </script2>
    <script3>
    check = yes
    filename = $DBMS_dataengine_create.sql
    </script3>
    </scripts>

    Regards
Amar


  • 10.  RE: data engine is getting queued so often

    Broadcom Employee
    Posted Jul 25, 2019 09:06 AM
    I would like to webex with you and make some changes to your data_engine.cfg - what is a good time?

    Steve




  • 11.  RE: data engine is getting queued so often

    Posted Jul 26, 2019 03:03 AM
Please call me at 3 PM today on 9962104689.


And please also share the WebEx link here.


    Thanks
Amar


  • 12.  RE: data engine is getting queued so often

    Posted Jul 29, 2019 03:21 AM
Is it possible to do the WebEx today at 3 PM?


Please share the WebEx link here and reach me on 9962104689.


    Regards
Amar


  • 13.  RE: data engine is getting queued so often

    Posted Jul 25, 2019 04:33 PM
loglevel = 5  < this level is rarely used and puts a significant load on the probe; set it to 1
logsize = 200 < this sets a very small 200 KB log, forcing it to constantly roll over and again increasing the load
hub_bulk_size = 1750 < per the best practices KB this can be set up to 2000
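Put together, those changes would look like this in the <setup> section of data_engine.cfg (a minimal sketch showing only the keys discussed here; everything else stays as-is, and the logsize value of 10000 is one reasonable choice, not a mandated number). Restart the data_engine probe afterwards so the new values take effect.

```
<setup>
loglevel = 1
logsize = 10000
hub_bulk_size = 2000
</setup>
```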

Beyond that, there are the results from Luc's query and the unanswered questions, which would provide valuable details and clarity on the root cause.

Another indicator of DB health is whether the nightly pruning is completing.
If MS SQL:
    https://ca-broadcom.wolkenservicedesk.com/external/article?articleId=34940

If Oracle:
    https://comm.support.ca.com/kb/uim-how-to-verify-data-engine-maintenance-for-oracle/kb000128474

    ------------------------------
    Support Engineer
    Broadcom
    ------------------------------



  • 14.  RE: data engine is getting queued so often

    Posted Jul 26, 2019 02:59 AM
Should I keep logsize at more than 200?


  • 15.  RE: data engine is getting queued so often

    Posted Jul 26, 2019 03:15 AM
Best is to set:
loglevel = 1
logsize = 10000 (or at least something bigger than 200, because otherwise the log will not hold much data)


  • 16.  RE: data engine is getting queued so often

    Posted Jul 29, 2019 03:26 AM
This is the output of Luc's query:

    item cnt
    computer systems 3223
    db: CA_UIM Log used space 21159.38
    db: CA_UIM used space MB 422661.13
    hubs 2
    qos definitions 534
    qos objects 1886320
    qos views 377
    robots 1419
    robots (except hubs) 1417
    todays alarms in db 437477
    todays open alarms 5425
    todays open critical alarms 1050
    total alarms in db 55737936

Currently I have set the below config in data_engine and it is still queuing. Please let me know what else needs to be done.

    loglevel = 1
    logsize = 1000
    hub_bulk_size = 2000

I have executed the below query on the back end in MS SQL Server and cannot get any data after 2019-07-27 12:58:00.000.

select RN_QOS_DATA_0015.sampletime, RN_QOS_DATA_0015.samplevalue, S_QOS_DATA.source
from RN_QOS_DATA_0015
join S_QOS_DATA on S_QOS_DATA.table_id = RN_QOS_DATA_0015.table_id
where source like '%cheaepsdb02%'
  and qos like '%QOS_MEMORY_PHYSICAL_PERC%'
  and RN_QOS_DATA_0015.sampletime between '2019-07-27 00:00:00' and '2019-07-27 23:59:00'
order by 1 desc;
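To quantify that gap, one can compare the newest stored sampletime with the current time; a growing lag means rows are sitting in the data_engine queue instead of reaching the RN_QOS_DATA_* tables. A minimal sketch in Python (the timestamps below are made up for illustration, not taken from this thread):

```python
from datetime import datetime

def insert_lag_minutes(sample_times, now):
    """Minutes between the newest stored sample and 'now'.

    A large and growing value suggests rows are waiting in the
    data_engine queue rather than being inserted.
    """
    if not sample_times:
        return None
    return (now - max(sample_times)).total_seconds() / 60

# Hypothetical sampletimes, e.g. from the query above.
last_seen = [datetime(2019, 7, 27, 12, 58), datetime(2019, 7, 27, 12, 53)]
lag = insert_lag_minutes(last_seen, now=datetime(2019, 7, 27, 14, 0))
print(lag)  # 62.0 -> inserts are about an hour behind in this made-up example
```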





  • 17.  RE: data engine is getting queued so often

    Posted Jul 26, 2019 03:13 AM
Perhaps you could install/activate a tool/script like queuecheck: https://community.broadcom.com/enterprisesoftware/viewdocument/queuecheck-lua-script-v22
It will create QoS metrics for all of your queues and includes a sample list plus a PRD report to give you a graphical view of your queuing (so you can see when, and by how much, you are queuing).
Note: with the number of QoS metrics you have, your SQL Server must be tuned very well to handle the I/O.


  • 18.  RE: data engine is getting queued so often

    Posted Jul 26, 2019 06:23 AM
Hi Luc,

1.) If I run these scripts, will they put any load on the nas, since it is single-threaded?
2.) If I do use these scripts, is that safe for the nas?

I need info on the above two points, since data_engine is getting queued on the primary hub. Please let me know.


  • 19.  RE: data engine is getting queued so often

    Posted Jul 26, 2019 06:36 AM
    Edited by Luc Christiaens Aug 05, 2019 03:11 AM
Normally you don't run any script from the nas (if possible).
Such Lua scripts should be run with the NSA tool (see the UIM archive) and then scheduled via logmon on a regular basis so that you get a clear view of the queues (in the beginning every 5 minutes, to help find the problem).
Note 1: the first executions can be done from the command line, where you execute nsa plus a reference to your Lua script.
Note 2: you must edit the Lua script to update some settings.
Note 3: this is also described in more detail in the doc file inside the utility's zip.


  • 20.  RE: data engine is getting queued so often

    Posted Jul 26, 2019 06:38 AM
In logmon I schedule a bat file every 5 minutes, and this bat file contains:
    C:\Nimsoft\sdk\nsa\nsa C:\Nimsoft\probes\service\nas\scripts\Luc\get_hub_queue_status.lua


  • 21.  RE: data engine is getting queued so often

    Posted Jul 29, 2019 03:43 AM
Perhaps you can execute the following query; it shows, by probe, the number of QoS entries. This can help you determine which probe settings can temporarily be changed to limit the number of generated QoS:
---
select probe, origin, de.name, source, COUNT(distinct target) as numtarget
from s_qos_definition de with(nolock)
join s_qos_data da with(nolock) on de.qos_def_id = da.qos_def_id
join cm_configuration_item_metric co with(nolock) on da.ci_metric_id = co.ci_metric_id
join cm_configuration_item it with(nolock) on co.ci_id = it.ci_id
join cm_device dv with(nolock) on it.dev_id = dv.dev_id
group by probe, de.name, origin, source
order by probe, de.name, source, origin
----
The number of entries in your NAS_TRANSACTION_LOG table seems very high; the following query can help you check whether the daily delete on this table is working correctly:
    -------------------
select CAST(time as DATE) as day, COUNT(*) as msg_count
from NAS_TRANSACTION_LOG with(nolock)
group by CAST(time as DATE)
order by CAST(time as DATE)
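As a rough rule for reading that output: once the daily delete is healthy, days older than the nas retention window should have no rows left. A small Python sketch of that check (the 30-day retention and the row counts are assumptions for illustration only, not values from this thread):

```python
from datetime import date, timedelta

def pruning_gap_days(daily_counts, retention_days, today):
    """Return days that should already have been pruned but still hold rows.

    daily_counts: mapping of date -> row count, as produced by the
    per-day NAS_TRANSACTION_LOG query above.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d, n in daily_counts.items() if d < cutoff and n > 0)

# Hypothetical example: 30-day retention, two stale days left behind.
counts = {
    date(2019, 6, 1): 120000,   # older than the window: should be gone
    date(2019, 6, 2): 115000,   # older than the window: should be gone
    date(2019, 7, 25): 430000,  # inside the window: fine
}
stale = pruning_gap_days(counts, retention_days=30, today=date(2019, 7, 29))
print(stale)  # any dates listed here mean the daily delete is not keeping up
```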



  • 22.  RE: data engine is getting queued so often

    Posted Aug 05, 2019 03:21 AM
I saw in the output of the query that you have only 2 hubs.
Elsewhere you remarked that you have both the primary hub and the secondary hub active (data_engine, udm_manager, ...); I don't think that is a supported architecture/setup.
I suppose all of your robots are connected directly to your primary hub? In a normal architecture we would:
- add a hub layer
- connect all of your robots to this new hub
- GET the new hub's queues from both main hubs (but leave them inactive on the secondary hub), with an HA probe on the secondary hub; this way a failover takes only minimal time because the robots do not have to find a new hub
- the main hub then has more time to do its work because it no longer has to check the status of your 1400 robots