DX NetOps


many gaps in all graphics

  • 1.  many gaps in all graphics

    Posted Feb 24, 2021 03:37 PM
    Hi All,

    I would like to know if there is anything to fix or speed up in the data repository maintenance process, because all the graphs during the night period are full of gaps. You can see that something else is going on that can cause longer threshold processing time in the poll cycle, and I know the daily rollups start at 03:00 UTC.


    thank you

    Valéria


  • 2.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Feb 24, 2021 05:03 PM

    Adjust these for the local timezone of the DA.

    At 2AM UTC we run a Data Retention job to drop old partitions, do some cleanup, etc.

    Daily rollups kick off at 12:30 AM UTC.  Weekly at 12:30 AM UTC on Thursdays.

    At 3AM UTC, we do Analyze All Objects in vertica to recalc stats used to optimize queries.  We run a smaller scope analyze every hour.

    At 12AM UTC, we run the vertica delete vector purge command, which removes stuff marked for delete during the day.

    I think those are our DR maint tasks.
    You can check DA karaf.log to see if any Daily/Weekly rollup runs, or DeleteVector, MaintenanceTaskProcessor, or PartitionMaintenance is running.
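
    The schedule above is given in UTC and has to be shifted to the DA's local time before comparing it with the gaps in the graphs. A minimal sketch of that conversion follows. The UTC-3 offset is only an assumption for illustration, and a fixed offset ignores daylight saving changes.

```python
from datetime import date, datetime, time, timedelta, timezone

# Daily task start times in UTC, as listed in the post above.
UTC_SCHEDULE = {
    "Delete vector purge": time(0, 0),
    "Daily rollups": time(0, 30),
    "Data Retention job": time(2, 0),
    "Analyze All Objects": time(3, 0),
}


def to_local(utc_time, offset_hours):
    """Return a UTC wall-clock time as HH:MM at a fixed UTC offset."""
    local_tz = timezone(timedelta(hours=offset_hours))
    # The date is arbitrary; it only anchors the time for the conversion.
    stamp = datetime.combine(date(2021, 2, 24), utc_time, tzinfo=timezone.utc)
    return stamp.astimezone(local_tz).strftime("%H:%M")


if __name__ == "__main__":
    # ASSUMPTION: a DA running at UTC-3; replace with the real offset.
    for task, start in UTC_SCHEDULE.items():
        print(f"{task}: {to_local(start, -3)} local")
```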




  • 3.  RE: many gaps in all graphics

    Posted Feb 25, 2021 01:35 PM
    Hello Jeffrey,

    I've attached some logs here, and I can't see any errors:

    INFO | 44c1382_Worker-2 | 2021-02-24 05:00:06,611 | DeleteVectorPurger | .vertica.impl.DeleteVectorPurger 440 | re.database.irep.vertica | | Time to retrieve delete vector status for all projections: 0:00:06.610
    INFO | 44c1382_Worker-2 | 2021-02-24 05:00:06,612 | DeleteVectorPurger | ertica.impl.DeleteVectorPurger$4 396 | re.database.irep.vertica | | Time to purge table item: 0:00:00.000
    INFO | 44c1382_Worker-2 | 2021-02-24 05:00:12,606 | DeleteVectorPurger | ertica.impl.DeleteVectorPurger$4 396 | re.database.irep.vertica | | Time to purge table poll_item: 0:00:00.000
    INFO | 44c1382_Worker-2 | 2021-02-24 05:00:15,306 | DeleteVectorPurger | ertica.impl.DeleteVectorPurger$4 396 | re.database.irep.vertica | | Time to purge table item_facet: 0:00:00.000
    INFO | 44c1382_Worker-1 | 2021-02-25 05:00:00,001 | DeleteVectorMaintenanceScheduler | leteVectorMaintenanceScheduler$1 163 | re.database.irep.vertica | | Performing full purge...
    INFO | 44c1382_Worker-1 | 2021-02-25 05:00:05,433 | DeleteVectorPurger | .vertica.impl.DeleteVectorPurger 440 | re.database.irep.vertica | | Time to retrieve delete vector status for all projections: 0:00:05.423
    INFO | 44c1382_Worker-1 | 2021-02-25 05:00:05,433 | DeleteVectorPurger | ertica.impl.DeleteVectorPurger$4 396 | re.database.irep.vertica | | Time to purge table item: 0:00:00.000
    INFO | 44c1382_Worker-1 | 2021-02-25 05:00:11,109 | DeleteVectorPurger | ertica.impl.DeleteVectorPurger$4 396 | re.database.irep.vertica | | Time to purge table poll_item: 0:00:00.000
    INFO | 44c1382_Worker-1 | 2021-02-25 05:00:13,314 | DeleteVectorPurger | ertica.impl.DeleteVectorPurger$4 396 | re.database.irep.vertica | | Time to purge table device: 0:00:00.000
    INFO | 44c1382_Worker-1 | 2021-02-25 05:00:13,514 | DeleteVectorPurger | ertica.impl.DeleteVectorPurger$4 396 | re.database.irep.vertica | | Time to purge table item_facet: 0:00:00.000



    INFO | ository-thread-9 | 2021-02-20 00:00:00,005 | toryMaintenanceTaskProcessorImpl | toryMaintenanceTaskProcessorImpl 264 | ger.core.aggregator.impl | | Starting task 'Analysis Of All Objects'
    INFO | ository-thread-9 | 2021-02-20 00:33:23,917 | toryMaintenanceTaskProcessorImpl | toryMaintenanceTaskProcessorImpl 268 | ger.core.aggregator.impl | | Completed task 'Analysis Of All Objects'. Time to complete:0:33:23.911
    INFO | itory-thread-152 | 2021-02-20 23:00:00,001 | toryMaintenanceTaskProcessorImpl | toryMaintenanceTaskProcessorImpl 264 | ger.core.aggregator.impl | | Starting task 'Analysis Of All Objects'
    INFO | itory-thread-152 | 2021-02-20 23:36:23,928 | toryMaintenanceTaskProcessorImpl | toryMaintenanceTaskProcessorImpl 268 | ger.core.aggregator.impl | | Completed task 'Analysis Of All Objects'. Time to complete:0:36:23.923
    INFO | itory-thread-221 | 2021-02-21 23:00:00,012 | toryMaintenanceTaskProcessorImpl | toryMaintenanceTaskProcessorImpl 264 | ger.core.aggregator.impl | | Starting task 'Analysis Of All Objects'
    INFO | itory-thread-221 | 2021-02-21 23:35:12,636 | toryMaintenanceTaskProcessorImpl | toryMaintenanceTaskProcessorImpl 268 | ger.core.aggregator.impl | | Completed task 'Analysis Of All Objects'. Time to complete:0:35:12.623
    INFO | itory-thread-322 | 2021-02-22 23:00:00,002 | toryMaintenanceTaskProcessorImpl | toryMaintenanceTaskProcessorImpl 264 | ger.core.aggregator.impl | | Starting task 'Analysis Of All Objects'
    INFO | itory-thread-322 | 2021-02-22 23:32:02,963 | toryMaintenanceTaskProcessorImpl | toryMaintenanceTaskProcessorImpl 268 | ger.core.aggregator.impl | | Completed task 'Analysis Of All Objects'. Time to complete:0:32:02.956
    INFO | itory-thread-477 | 2021-02-23 23:00:00,011 | toryMaintenanceTaskProcessorImpl | toryMaintenanceTaskProcessorImpl 264 | ger.core.aggregator.impl | | Starting task 'Analysis Of All Objects'
    INFO | itory-thread-477 | 2021-02-23 23:33:20,942 | toryMaintenanceTaskProcessorImpl | toryMaintenanceTaskProcessorImpl 268 | ger.core.aggregator.impl | | Completed task 'Analysis Of All Objects'. Time to complete:0:33:20.931
    INFO | itory-thread-880 | 2021-02-24 23:00:00,007 | toryMaintenanceTaskProcessorImpl | toryMaintenanceTaskProcessorImpl 264 | ger.core.aggregator.impl | | Starting task 'Analysis Of All Objects'
    INFO | itory-thread-880 | 2021-02-24 23:30:35,770 | toryMaintenanceTaskProcessorImpl | toryMaintenanceTaskProcessorImpl 268 | ger.core.aggregator.impl | | Completed task 'Analysis Of All Objects'. Time to complete:0:30:35.760

    INFO | Timer-2 | 2021-02-19 10:07:39,646 | OsgiServiceFactoryBean | r.support.OsgiServiceFactoryBean 373 | pringframework.osgi.core | | Unregistered service [ServiceRegistrationWrapper for {com.ca.im.dm.core.database.dao.interfaces.PartitionMaintenanceDAO}={org.springframework.osgi.bean.name=partitionMaintenanceImpl, Bundle-SymbolicName=com.ca.im.data-manager.core.database.dao.impl, Bundle-Version=20.2.1.RELEASE-238, service.id=621}]
    INFO | xtenderThread-72 | 2021-02-19 10:08:09,468 | DependencyServiceManager | startup.DependencyServiceManager 288 | gframework.osgi.extender | | Adding OSGi service dependency for importer [&partitionMaintenanceService] matching OSGi filter [(objectClass=com.ca.im.dm.core.database.dao.interfaces.PartitionMaintenanceDAO)]
    INFO | tenderThread-107 | 2021-02-19 10:10:43,457 | BaseSpringDAO | ase.dao.interfaces.BaseSpringDAO 73 | .database.dao.interfaces | | Changing active data source for class com.ca.im.dm.core.database.dao.impl.PartitionMaintenanceImpl to ROUND_ROBIN
    INFO | tenderThread-107 | 2021-02-19 10:10:43,461 | OsgiServiceFactoryBean | r.support.OsgiServiceFactoryBean 301 | pringframework.osgi.core | | Publishing service under classes [{com.ca.im.dm.core.database.dao.interfaces.PartitionMaintenanceDAO}]
    INFO | tenderThread-112 | 2021-02-19 10:10:43,541 | OsgiServiceProxyFactoryBean | al.aop.ServiceDynamicInterceptor 493 | pringframework.osgi.core | | Looking for mandatory OSGi service dependency for bean [partitionMaintenanceService] matching filter (objectClass=com.ca.im.dm.core.database.dao.interfaces.PartitionMaintenanceDAO)

    How can I see if the Daily/Weekly rollup is working fine?
    I think the problem may be the time all these tasks take to finish, because the poll cycle percent hits 138.24.

    thank you,

    Valéria




  • 4.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Feb 25, 2021 01:57 PM
    How can I see if the Daily/Weekly rollup is working fine?
    cd /opt/IMDataAggregator/apache-karaf-2.4.3/data/log
    grep rollup karaf.log
    grep DeleteVector karaf.log
    grep MaintenanceTaskProcessor karaf.log

    ------------------------------
    Sr Support Engineer
    Broadcom
    ------------------------------



  • 5.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Feb 25, 2021 06:54 PM

    So it would appear the Analyze All Objects task is what is causing the spikes in DR maint tasks, and thresholding looks to start on the upward trend of time to process.
    It's taking 30+ mins to analyze all objects.  That is us calling ANALYZE_HISTOGRAM on each table for a few item_id and tstamp.
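
    To check the 30+ minute figure across many nights, the "Time to complete" suffix in the MaintenanceTaskProcessor lines can be parsed directly. This is a rough sketch that assumes the H:MM:SS.mmm format seen in the karaf.log excerpts earlier in the thread. The function names and the 30 minute limit are illustrative.

```python
import re
from datetime import timedelta

# Matches the "Time to complete:H:MM:SS.mmm" suffix seen in karaf.log.
DURATION_RE = re.compile(r"Time to complete:\s*(\d+):(\d{2}):(\d{2})\.(\d{3})")


def completion_time(line):
    """Return the logged completion time as a timedelta, or None."""
    match = DURATION_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds,
                     milliseconds=millis)


def slow_runs(lines, limit=timedelta(minutes=30)):
    """Return the completion times that exceed the given limit."""
    times = (completion_time(line) for line in lines)
    return [t for t in times if t is not None and t > limit]
```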

    But the spike for thresholding seems to be related to delete vectors, where vertica goes and cleans up all those delete vectors.
    It's only 6 secs, which doesn't sound like it should cause the increased thresholding time to process.

    Not really sure there is a way to speed either of these up and reduce their impact to the system.  We are calling internal vertica funcs that are meant to speed up the system/queries.

    As for daily/weekly, look in karaf.log for "Daily/Weekly rollup work starting for" and "Daily/Weekly rollup work complete".  The diff in start/complete is how long it took.  I believe you can also look By Component at DA Rollup metric family data to see how long various hourly/daily/weekly rollups took for diff metric families.
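
    The start/complete difference described above can be computed from karaf.log rather than read off by eye. A small sketch, assuming the pipe-separated layout of the log lines quoted in this thread (timestamp in the third field) and only one daily/weekly rollup in flight at a time:

```python
from datetime import datetime

START = "Daily/Weekly rollup work starting for"
DONE = "Daily/Weekly rollup work complete"


def log_time(line):
    """Parse the timestamp from the third pipe-separated karaf.log field."""
    stamp = line.split("|")[2].strip()
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def rollup_durations(lines):
    """Pair each start message with the next complete message."""
    durations, started = [], None
    for line in lines:
        if START in line:
            started = log_time(line)
        elif DONE in line and started is not None:
            durations.append(log_time(line) - started)
            started = None
    return durations
```

    Passing an open karaf.log file object to rollup_durations() returns one timedelta per completed rollup.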




  • 6.  RE: many gaps in all graphics

    Posted Feb 26, 2021 05:19 PM
    Hello Jeffrey,

    I think the daily/weekly rollup process is taking a long time, as you can see in these logs:

    INFO | -consumer-thread | 2021-02-19 22:30:00,291 | DailyWeeklyRollupManagerImpl | mpl.DailyWeeklyRollupManagerImpl 171 | .ca.im.aggregator.loader | | Daily/Weekly rollup work starting for dcmIds = [6] ...
    INFO | kerTask-thread-2 | 2021-02-19 22:43:28,828 | DailyWeeklyRollupManagerImpl | mpl.DailyWeeklyRollupManagerImpl 143 | .ca.im.aggregator.loader | | Daily/Weekly rollup work complete
    INFO | -consumer-thread | 2021-02-20 22:30:00,067 | DailyWeeklyRollupManagerImpl | mpl.DailyWeeklyRollupManagerImpl 171 | .ca.im.aggregator.loader | | Daily/Weekly rollup work starting for dcmIds = [6] ...
    INFO | kerTask-thread-9 | 2021-02-20 22:49:03,746 | DailyWeeklyRollupManagerImpl | mpl.DailyWeeklyRollupManagerImpl 143 | .ca.im.aggregator.loader | | Daily/Weekly rollup work complete
    INFO | -consumer-thread | 2021-02-21 21:30:00,070 | DailyWeeklyRollupManagerImpl | mpl.DailyWeeklyRollupManagerImpl 171 | .ca.im.aggregator.loader | | Daily/Weekly rollup work starting for dcmIds = [3987, 3987] ...
    INFO | erTask-thread-13 | 2021-02-21 21:44:19,001 | DailyWeeklyRollupManagerImpl | mpl.DailyWeeklyRollupManagerImpl 143 | .ca.im.aggregator.loader | | Daily/Weekly rollup work complete
    INFO | -consumer-thread | 2021-02-22 21:30:00,118 | DailyWeeklyRollupManagerImpl | mpl.DailyWeeklyRollupManagerImpl 171 | .ca.im.aggregator.loader | | Daily/Weekly rollup work starting for dcmIds = [6] ...
    INFO | erTask-thread-17 | 2021-02-22 21:43:11,238 | DailyWeeklyRollupManagerImpl | mpl.DailyWeeklyRollupManagerImpl 143 | .ca.im.aggregator.loader | | Daily/Weekly rollup work complete
    INFO | -consumer-thread | 2021-02-23 21:30:00,074 | DailyWeeklyRollupManagerImpl | mpl.DailyWeeklyRollupManagerImpl 171 | .ca.im.aggregator.loader | | Daily/Weekly rollup work starting for dcmIds = [6] ...
    INFO | erTask-thread-25 | 2021-02-23 21:42:50,727 | DailyWeeklyRollupManagerImpl | mpl.DailyWeeklyRollupManagerImpl 143 | .ca.im.aggregator.loader | | Daily/Weekly rollup work complete
    INFO | -consumer-thread | 2021-02-24 21:30:00,144 | DailyWeeklyRollupManagerImpl | mpl.DailyWeeklyRollupManagerImpl 171 | .ca.im.aggregator.loader | | Daily/Weekly rollup work starting for dcmIds = [6] ...
    INFO | erTask-thread-29 | 2021-02-24 21:50:18,585 | DailyWeeklyRollupManagerImpl | mpl.DailyWeeklyRollupManagerImpl 143 | .ca.im.aggregator.loader | | Daily/Weekly rollup work complete
    INFO | -consumer-thread | 2021-02-25 21:30:00,067 | DailyWeeklyRollupManagerImpl | mpl.DailyWeeklyRollupManagerImpl 171 | .ca.im.aggregator.loader | | Daily/Weekly rollup work starting for dcmIds = [6] ...
    INFO | erTask-thread-33 | 2021-02-25 21:45:49,191 | DailyWeeklyRollupManagerImpl | mpl.DailyWeeklyRollupManagerImpl 143 | .ca.im.aggregator.loader | | Daily/Weekly rollup work complete

    Is there any way to optimize this process on the Vertica database?
    This database is very old, about 4 or 5 years, and I think its size may have grown a lot too.


    I saw some metrics like these; it seems they could be the problem. In that case, do I have to remove them from CAPC?




    thank you,


  • 7.  RE: many gaps in all graphics

    Posted Mar 01, 2021 09:39 AM
    Hello Jeffrey,

    I've noticed that Data Aggregator Active Broker and Data Aggregator Throughput had high peaks, as you can see:


    Can I optimize this, or is it only possible with an upgrade?

    Thank you

    Valéria


  • 8.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 01, 2021 12:16 PM
    The peaks of the AMQ are really attributed to an 80ms increase in response time, possibly due to a small garbage collection or some other reason.

    The large spike in enqueued messages could be due to the DA not being available to the DC(s): the DC cached data, and when it reconnected it had a bigger block of messages to process and added a lot of end of cycle messages to the AMQ for rollup processing.
    It could also be that some devices came back available, and we backfilled lots of availability data, which resulted in many end of cycle messages being added to the AMQ for rollup processing.


  • 9.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 01, 2021 01:43 PM

    As far as speeding up daily/weekly rollups, that sounds like a matter of vertica resources.

    What are the specs on DR nodes?  memory/cpu?

    Vertica has some tools it ships with.
    https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/dx-netops/20-2/Performance-Monitoring-with-DX-Performance-Management/administrating/data-repository-administration/run-data-repository-diagnostic-utilities.html

    You can run vcpuperf, and vnetperf. 
    You can run vioperf, but be sure to run at the same time on all 3 nodes in case you are using shared storage between 3 nodes.  Also, run it passing location of the data partition so we can test how fast we can read and write to DB files.




  • 10.  RE: many gaps in all graphics

    Posted Mar 02, 2021 12:17 PM
    Hello Jeffrey,

    I ran all the tests that you suggested, and here are the results:

    NODE 1:

    Compiled with: 4.8.2 20140120 (Red Hat 4.8.2-15)
    Expected time on Core 2, 2.53GHz: ~9.5s
    Expected time on Nehalem, 2.67GHz: ~9.0s
    Expected time on Xeon 5670, 2.93GHz: ~8.0s

    This machine's time:
    CPU Time: 16.530000s
    Real Time:16.570000s

    Some machines automatically throttle the CPU to save power.
    This test can be done in <100 microseconds (60-70 on Xeon 5670, 2.93GHz).
    Low load times much larger than 100-200us or much larger than the corresponding high load time
    indicate low-load throttling, which can adversely affect small query / concurrent performance.

    This machine's high load time: 107 microseconds.
    This machine's low load time: 361 microseconds.


    The minimum required I/O is 20 MB/s read and write per physical processor core on each node, in full duplex i.e. reading and writing at this rate simultaneously, concurrently on all nodes of the cluster. The recommended I/O is 40 MB/s per physical core on each node. For example, the I/O rate for a server node with 2 hyper-threaded six-core CPUs is 240 MB/s required minimum, 480 MB/s recommended.

    Using direct io (buffer size=1048576, alignment=512) for directory "/opt/CA/catalog"

    test | directory | counter name | counter value | counter value (10 sec avg) | counter value/core | counter value/core (10 sec avg) | thread count | %CPU | %IO Wait | elapsed time (s)| remaining time (s)
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Write | /opt/CA/catalog | MB/s | 333 | 333 | 27.75 | 27.75 | 12 | 46 | 42 | 10 | 5
    Write | /opt/CA/catalog | MB/s | 357 | 400 | 29.75 | 33.3333 | 12 | 49 | 41 | 15 | 0
    ReWrite | /opt/CA/catalog | (MB-read+MB-write)/s| 266+266 | 266+266 | 22.1667+22.1667 | 22.1667+22.1667 | 12 | 39 | 50 | 10 | 5
    ReWrite | /opt/CA/catalog | (MB-read+MB-write)/s| 229+229 | 155+155 | 19.0833+19.0833 | 12.9167+12.9167 | 12 | 29 | 64 | 15 | 0
    Read | /opt/CA/catalog | MB/s | 403 | 403 | 33.5833 | 33.5833 | 12 | 45 | 43 | 10 | 5
    Read | /opt/CA/catalog | MB/s | 409 | 425 | 34.0833 | 35.4167 | 12 | 47 | 42 | 15 | 0
    SkipRead | /opt/CA/catalog | seeks/s | 2928 | 2928 | 244 | 244 | 12 | 33 | 29 | 10 | 5
    SkipRead | /opt/CA/catalog | seeks/s | 2985 | 3097 | 248.75 | 258.083 | 12 | 33 | 33 | 15 | 0



    NODE 2:

    Compiled with: 4.8.2 20140120 (Red Hat 4.8.2-15)
    Expected time on Core 2, 2.53GHz: ~9.5s
    Expected time on Nehalem, 2.67GHz: ~9.0s
    Expected time on Xeon 5670, 2.93GHz: ~8.0s

    This machine's time:
    CPU Time: 17.050000s
    Real Time:17.160000s

    Some machines automatically throttle the CPU to save power.
    This test can be done in <100 microseconds (60-70 on Xeon 5670, 2.93GHz).
    Low load times much larger than 100-200us or much larger than the corresponding high load time
    indicate low-load throttling, which can adversely affect small query / concurrent performance.

    This machine's high load time: 129 microseconds.
    This machine's low load time: 356 microseconds.


    The minimum required I/O is 20 MB/s read and write per physical processor core on each node, in full duplex i.e. reading and writing at this rate simultaneously, concurrently on all nodes of the cluster. The recommended I/O is 40 MB/s per physical core on each node. For example, the I/O rate for a server node with 2 hyper-threaded six-core CPUs is 240 MB/s required minimum, 480 MB/s recommended.

    Using direct io (buffer size=1048576, alignment=512) for directory "/opt/CA/catalog"

    test | directory | counter name | counter value | counter value (10 sec avg) | counter value/core | counter value/core (10 sec avg) | thread count | %CPU | %IO Wait | elapsed time (s)| remaining time (s)
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Write | /opt/CA/catalog | MB/s | 500 | 500 | 41.6667 | 41.6667 | 12 | 45 | 42 | 10 | 5
    Write | /opt/CA/catalog | MB/s | 504 | 502 | 42 | 41.8333 | 12 | 42 | 41 | 15 | 0
    ReWrite | /opt/CA/catalog | (MB-read+MB-write)/s| 317+317 | 317+317 | 26.4167+26.4167 | 26.4167+26.4167 | 12 | 20 | 61 | 10 | 5
    ReWrite | /opt/CA/catalog | (MB-read+MB-write)/s| 323+323 | 328+328 | 26.9167+26.9167 | 27.3333+27.3333 | 12 | 18 | 64 | 15 | 0
    Read | /opt/CA/catalog | MB/s | 480 | 480 | 40 | 40 | 12 | 22 | 65 | 10 | 5
    Read | /opt/CA/catalog | MB/s | 483 | 489 | 40.25 | 40.75 | 12 | 19 | 67 | 15 | 0
    SkipRead | /opt/CA/catalog | seeks/s | 2378 | 2378 | 198.167 | 198.167 | 12 | 10 | 46 | 10 | 5
    SkipRead | /opt/CA/catalog | seeks/s | 2446 | 2580 | 203.833 | 215 | 12 | 7 | 48 | 15 | 0


    NODE 3 

    Compiled with: 4.8.2 20140120 (Red Hat 4.8.2-15)
    Expected time on Core 2, 2.53GHz: ~9.5s
    Expected time on Nehalem, 2.67GHz: ~9.0s
    Expected time on Xeon 5670, 2.93GHz: ~8.0s

    This machine's time:
    CPU Time: 16.120000s
    Real Time:16.160000s

    Some machines automatically throttle the CPU to save power.
    This test can be done in <100 microseconds (60-70 on Xeon 5670, 2.93GHz).
    Low load times much larger than 100-200us or much larger than the corresponding high load time
    indicate low-load throttling, which can adversely affect small query / concurrent performance.

    This machine's high load time: 129 microseconds.
    This machine's low load time: 442 microseconds.

    The minimum required I/O is 20 MB/s read and write per physical processor core on each node, in full duplex i.e. reading and writing at this rate simultaneously, concurrently on all nodes of the cluster. The recommended I/O is 40 MB/s per physical core on each node. For example, the I/O rate for a server node with 2 hyper-threaded six-core CPUs is 240 MB/s required minimum, 480 MB/s recommended.

    Using direct io (buffer size=1048576, alignment=512) for directory "/opt/CA/data"

    test | directory | counter name | counter value | counter value (10 sec avg) | counter value/core | counter value/core (10 sec avg) | thread count | %CPU | %IO Wait | elapsed time (s)| remaining time (s)
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Write | /opt/CA/data | MB/s | 447 | 447 | 37.25 | 37.25 | 12 | 38 | 50 | 10 | 5
    Write | /opt/CA/data | MB/s | 465 | 498 | 38.75 | 41.5 | 12 | 41 | 49 | 15 | 0
    ReWrite | /opt/CA/data | (MB-read+MB-write)/s| 287+287 | 287+287 | 23.9167+23.9167 | 23.9167+23.9167 | 12 | 25 | 58 | 10 | 5
    ReWrite | /opt/CA/data | (MB-read+MB-write)/s| 288+288 | 293+293 | 24+24 | 24.4167+24.4167 | 12 | 25 | 58 | 15 | 0
    Read | /opt/CA/data | MB/s | 424 | 424 | 35.3333 | 35.3333 | 12 | 33 | 53 | 10 | 5
    Read | /opt/CA/data | MB/s | 439 | 465 | 36.5833 | 38.75 | 12 | 30 | 58 | 15 | 0
    SkipRead | /opt/CA/data | seeks/s | 1929 | 1929 | 160.75 | 160.75 | 12 | 4 | 46 | 10 | 5
    SkipRead | /opt/CA/data | seeks/s | 1981 | 2079 | 165.083 | 173.25 | 12 | 3 | 51 | 15 | 0

    I couldn't run vnetperf on all the nodes. I don't know the reason, but it could not find the host names on the repository server.

    As you can see, there is a problem when I run vcpuperf on all nodes: the CPU time is very high. Could this be the reason for the gaps in the charts and for other problems, such as slow response for all tasks in CAPC?

    thank you

    Valéria








  • 11.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 02, 2021 12:28 PM

    Starting with CPU, it appears from the low/high load difference of over 200 microseconds that cpu throttling may be enabled.  This can affect performance of vertica.

    Dynamic CPU frequency scaling (also known as CPU throttling) is a technique in computer architecture where a processor is run at a less-than-maximum frequency in order to conserve power.
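
    The low/high load comparison can be written down as a simple check that follows the guidance printed by vcpuperf (low-load times much larger than 100-200us, or much larger than the high-load time). The 200us and 2x cut-offs below are illustrative assumptions, not values published by Vertica.

```python
def looks_throttled(high_load_us, low_load_us, abs_limit_us=200, ratio_limit=2.0):
    """Flag low-load times well above 100-200 us or the high-load time.

    abs_limit_us and ratio_limit are illustrative cut-offs, not Vertica values.
    """
    return low_load_us > abs_limit_us or low_load_us > ratio_limit * high_load_us
```

    With the vcpuperf numbers posted for the three nodes (107/361, 129/356 and 129/442 microseconds), this check flags all of them.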

    In dr_validate.sh we check for cpu throttling by checking /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor files for performance value being selected.
    You can check whether that cpufreq dir exists under each cpuXX directory.
    Also, you can run this to see what it says about CPU throttling...   sudo cpupower frequency-info
    What is the clock speed of your DR CPUs?  I agree 16.1 secs is high.
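
    The same scaling_governor check can be scripted. This is only a sketch of the idea, not the logic of dr_validate.sh itself; the function name is hypothetical. It lists every CPU whose governor is not set to performance and returns an empty mapping when no cpufreq files are found.

```python
from pathlib import Path


def non_performance_governors(cpu_root="/sys/devices/system/cpu"):
    """Map CPU name to governor for CPUs not set to 'performance'."""
    result = {}
    for path in sorted(Path(cpu_root).glob("cpu*/cpufreq/scaling_governor")):
        governor = path.read_text().strip()
        if governor != "performance":
            # path is <root>/cpuN/cpufreq/scaling_governor, so go up two levels.
            result[path.parent.parent.name] = governor
    return result


if __name__ == "__main__":
    print(non_performance_governors() or "no non-performance governors found")
```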


    With Disk speed, is your data and catalog on the same disk?  I see you ran it for /opt/CA/catalog.
    Is that a local disk/RAID or mount?
    The speeds per core look good to me.
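
    For reference, the vioperf targets quoted in the output (20 MB/s minimum and 40 MB/s recommended per physical core) can be turned into per-node numbers. The sketch below assumes 12 physical cores per node, matching the thread count in the vioperf tables; the function names are illustrative.

```python
MIN_MB_PER_CORE = 20   # required minimum quoted by vioperf
REC_MB_PER_CORE = 40   # recommended rate quoted by vioperf


def io_targets(physical_cores):
    """Return (minimum, recommended) node I/O rates in MB/s."""
    return MIN_MB_PER_CORE * physical_cores, REC_MB_PER_CORE * physical_cores


def rate_io(measured_mb_s, physical_cores):
    """Classify a measured node rate against the vioperf targets."""
    minimum, recommended = io_targets(physical_cores)
    if measured_mb_s >= recommended:
        return "meets recommended"
    if measured_mb_s >= minimum:
        return "meets minimum"
    return "below minimum"
```

    For 12 cores the targets are 240 MB/s and 480 MB/s, so a 357 MB/s write rate meets the minimum and a 504 MB/s write rate meets the recommended rate.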




  • 12.  RE: many gaps in all graphics

    Posted Mar 02, 2021 01:26 PM
    Hello Jeffrey,

    How can I disable CPU throttling?
    Do you think that it can resolve the gaps in the charts?


    I'm sending you the information for NODE 1, using cat /proc/cpuinfo:

    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 46
    model name : Intel(R) Xeon(R) CPU X7560 @ 2.27GHz
    stepping : 6
    microcode : 4294967295
    cpu MHz : 2128.873
    cache size : 24576 KB
    physical id : 0
    siblings : 12
    core id : 0
    cpu cores : 12
    apicid : 0
    initial apicid : 0
    fpu : yes
    fpu_exception : yes
    cpuid level : 11
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
    bogomips : 4257.74
    clflush size : 64
    cache_alignment : 64
    address sizes : 42 bits physical, 48 bits virtual
    power management:


    CPU NODE 2 Using cat /proc/cpuinfo:

    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 46
    model name : Intel(R) Xeon(R) CPU X7560 @ 2.27GHz
    stepping : 6
    microcode : 4294967295
    cpu MHz : 2170.272
    cache size : 24576 KB
    physical id : 0
    siblings : 12
    core id : 0
    cpu cores : 12
    apicid : 0
    initial apicid : 0
    fpu : yes
    fpu_exception : yes
    cpuid level : 11
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
    bogomips : 4340.54
    clflush size : 64
    cache_alignment : 64
    address sizes : 42 bits physical, 48 bits virtual
    power management:



    CPU NODE 3 Using cat /proc/cpuinfo

    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 46
    model name : Intel(R) Xeon(R) CPU X7560 @ 2.27GHz
    stepping : 6
    microcode : 4294967295
    cpu MHz : 2193.307
    cache size : 24576 KB
    physical id : 0
    siblings : 12
    core id : 0
    cpu cores : 12
    apicid : 0
    initial apicid : 0
    fpu : yes
    fpu_exception : yes
    cpuid level : 11
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
    bogomips : 4386.61
    clflush size : 64
    cache_alignment : 64
    address sizes : 42 bits physical, 48 bits virtual
    power management:

    About your question on the disk: we have 3 nodes, and all repository servers are on the same node and the same RAID.

    Thank you for your support

    Valéria


  • 13.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 02, 2021 02:39 PM
    So first, 2.27GHz is well below the minimum of 2.6GHz that vertica recommended to us should be used if possible.

    https://www.vertica.com/kb/GenericHWGuide/Content/Hardware/GenericHWGuide.htm
    Note: this is intended for large organizations using vertica, so they do try and push for higher specs. 

    https://ftpdocs.broadcom.com/WebInterface/phpdocs//7/8568/sizer/index.html  This is the sizing guide we use for PM/vertica based on your scale.  We mention 2.6GHz per core.  For high end customers, we recommend 3.0GHz+.
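
    A quick way to compare each node against the 2.6GHz figure is to read the nominal clock from the "model name" line of /proc/cpuinfo. A minimal sketch, assuming the "@ N.NNGHz" suffix that Intel model names carry; the function names are hypothetical.

```python
import re

RECOMMENDED_MIN_GHZ = 2.6  # figure quoted in the post above


def nominal_ghz(model_name_line):
    """Extract the nominal clock from a /proc/cpuinfo 'model name' line."""
    match = re.search(r"@\s*([\d.]+)\s*GHz", model_name_line)
    return float(match.group(1)) if match else None


def below_recommended(model_name_line):
    """True when a nominal clock is present and under the recommended minimum."""
    ghz = nominal_ghz(model_name_line)
    return ghz is not None and ghz < RECOMMENDED_MIN_GHZ
```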

    As far as cpu throttling disablement... we try and do it via the linux settings for cpufreq/scaling_governor but some systems (like physical h/w) require BIOS changes.  You'd have to check with h/w vendor on how to disable as it's vendor/system specific.




  • 14.  RE: many gaps in all graphics

    Posted Mar 03, 2021 02:23 PM
    Hello Jeffrey,

    First of all, I really appreciate your help; you have helped me a lot these past few days.
    Could you send me a guide on disabling CPU throttling? I have never done anything like that, so some examples would help me with this issue.
    Do you have anything on deleting duplicate devices from the Vertica database, or best practices that can help me improve the operational performance of the Vertica database?

    Thank you

    Valéria


  • 15.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 03, 2021 07:17 PM
    Sorry, I don't have any docs on cpu throttling disablement.  It's a BIOS thing and all BIOS are different.  Here is an example from googling for "disable cpu throttling in bios":
        https://www.thomas-krenn.com/en/wiki/Disable_CPU_Power_Saving_Management_in_BIOS 

    Depending on the BIOS maker for the machines running vertica, the setting options in BIOS may differ.

    We should only be discovering a device once per IP domain, so there really isn't anything to cleanup unless you are seeing same IP for multiple devices in Monitored Devices.
    There is a script called cleanupDeletedItems.sh (which you can contact support for) that will remove some table entries for items deleted from the item table.  We didn't automatically clean up the other tables until 20.2.6.  We've seen that if this cleans up many rows, it can help with query performance.  20.2.6+ now has a nightly job to remove those deleted items from other tables in the DB.
    Also, if you don't sync Not Present items to PC, on the DA there is /opt/IMDataAggregator/scripts/remove_not_present_items.sh, which can be used to remove those items.  Depending on the number of Not Present items, that can help with query performance as there are fewer rows to filter.
    20.2.6+ now also has a nightly job that deletes Not Present items that have no data ever, and then cleanup deleted items runs right after that.  So we removed the need to run either of those scripts.


  • 16.  RE: many gaps in all graphics

    Posted Mar 09, 2021 08:58 AM
    Jeffrey,

    I was reading about how to use ./remove_not_present_items.sh, and its documentation shows ./remove_not_present_items.sh -h host_name
    Would host_name be the Data Aggregator IP?

    thank you

    Valéria


  • 17.  RE: many gaps in all graphics

    Posted Mar 09, 2021 09:13 AM
    Jeffrey,

    Today I saw these graphs and the logs below:





    tail LongRunningAndFailedQueries.log
    2021-03-08 16:01:28,0372021-03-08 16:01:28,037 | INFO | LongRunningAndFailedQueries | Query returned more than 5000 rows! , RIBQueryPerformanceData [time_RIBtoSQL_ms=30, time_SQLExecution_ms=0, time_QueryTotal_ms=0, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=0, numberOfColumnsReturned=0, sort_Nature=UNDETERMINED, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=RATE, tableRequestedByMetricFactTableTypeProperty=RATE, tableChosen=RATE, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .PollItem.ID, .EndTime(300), .UtilizationIn.Avg, .UtilizationOut.Avg, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE (((.PollItem.ID = '231705','231708','231702','231706','231742','231704','231731','2291861','231701','2291860','231728','231743','231717','231707','231709','231725','231718','231724','231712','231749','231713','231740','231722','231750'))) AND .EndTime(300) > 1614741300 AND .EndTime(300) <= 1614827700 GROUPBY .PollItem.ID, .EndTime(300) ORDERBY .PollItem.ID DESC, .EndTime(300) ASC, sqlQuery=select /*+label(RIBQuery_7bd23c29_2a96_4ddf_a822_ebfa4a42377a)*/ p4.item_id as ".PollItem.ID", IFSTATS.tstamp-mod(IFSTATS.tstamp-1,300)+300-1 as ".EndTime(300)", AVG(IFSTATS.im_UtilizationIn) as ".UtilizationIn.Avg", AVG(IFSTATS.im_UtilizationOut) as ".UtilizationOut.Avg", GREATEST(300,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_rate IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) WHERE ((((p4.item_id) IN (('231705'),('231708'),('231702'),('231706'),('231742'),('231704'),('231731'),('2291861'),('231701'),('2291860'),('231728'),('231743'),('231717'),('231707'),('231709'),('231725'),('231718'),('231724'),('231712'),('231749'),('231713'),('231740'),('231722'),('231750')))) AND IFSTATS.tstamp > 1614741000 AND IFSTATS.tstamp <= 1614827700) GROUP BY p4.item_id, IFSTATS.tstamp-mod(IFSTATS.tstamp-1,300)+300-1 ORDER BY 
p4.item_id DESC, IFSTATS.tstamp-mod(IFSTATS.tstamp-1,300)+300-1 ASC, queryId=RIBQuery_7bd23c29_2a96_4ddf_a822_ebfa4a42377a, pageId=e5a417b4-fc02-4f1b-b7db-6d314880fc8a|(2000500), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=4, numberOfConcurrentRealTimeQueries=4, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:544)
    2021-03-08 16:04:02,131 | INFO | LongRunningAndFailedQueries | Query returned more than 5000 rows! , RIBQueryPerformanceData [time_RIBtoSQL_ms=19, time_SQLExecution_ms=0, time_QueryTotal_ms=0, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=0, numberOfColumnsReturned=0, sort_Nature=UNDETERMINED, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=RATE, tableRequestedByMetricFactTableTypeProperty=RATE, tableChosen=RATE, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .PollItem.ID, .Item.Name, .Resolution.Returned, .Utilization.Avg, .EndTime(300) FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE ((.PollItem.ID = '285086','285091','285101','285039','285093','285092','285097','285096','285099','285121','285098','285100','285094','285095','3069425','285132','3022773','2928493','2629885','2917157','3023575','285130','2631618','285106','3046277','285041','285010','285048','285131','3091235','2416437','285040','285105','285088','2648777','285087','285108','285089','285104','285109','285107','285103','285053','3021383','285102','285032','285059','3008192','2683457','285033')) AND .EndTime(300) > 1614741300 AND .EndTime(300) <= 1614827700 GROUPBY .PollItem.ID, .Item.Name, .EndTime(300) ORDERBY .PollItem.ID ASC, sqlQuery=select /*+label(RIBQuery_e4d17270_c8e4_47b5_8a55_bb8540d5bcc9)*/ p4.item_id as ".PollItem.ID", i.item_name as ".Item.Name", GREATEST(300,MAX(IFSTATS.rinterval)) as ".Resolution.Returned", AVG(IFSTATS.im_Utilization) as ".Utilization.Avg", IFSTATS.tstamp-mod(IFSTATS.tstamp-1,300)+300-1 as ".EndTime(300)" FROM IFSTATS_rate IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) INNER JOIN item i ON (i.item_id = IFSTATS.item_id) WHERE (((p4.item_id) IN (('285086'),('285091'),('285101'),('285039'),('285093'),('285092'),('285097'),('285096'),('285099'),('285121'),('285098'),('285100'),('285094'),('285095'),('3069425'),('285132'),('3022773'),('2928493'),('2629885'),('2917157'),('3023575'),('285130'),('2631618'),('285106'),('3046277'),('285041'),('285010'),('285048'),('285131'),('3091235'),('2416437'),('285040'),('285105'),('285088'),('2648777'),('285087'),('285108'),('285089'),('285104'),('285109'),('285107'),('285103'),('285053'),('3021383'),('285102'),('285032'),('285059'),('3008192'),('2683457'),('285033'))) AND IFSTATS.tstamp > 1614741300 AND IFSTATS.tstamp <= 1614827700) GROUP BY p4.item_id, i.item_name, IFSTATS.tstamp-mod(IFSTATS.tstamp-1,300)+300-1 ORDER BY p4.item_id ASC, queryId=RIBQuery_e4d17270_c8e4_47b5_8a55_bb8540d5bcc9, pageId=fcbe5aa2-840d-48a8-a9ac-9316a0f0cac1|(2000500), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=8, numberOfConcurrentRealTimeQueries=8, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:544)
    2021-03-09 09:43:06,296 | INFO | LongRunningAndFailedQueries | Long Running Query - finished RIBQueryPerformanceData [time_RIBtoSQL_ms=30, time_SQLExecution_ms=62099, time_QueryTotal_ms=62099, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=1, numberOfColumnsReturned=3, sort_Nature=TIME_SERIES_SORTED, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=HOURLY_ROLLUP, tableRequestedByMetricFactTableTypeProperty=HOURLY_ROLLUP, tableChosen=HOURLY_ROLLUP, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .EndTime(3600), .Availability.Avg, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '252986' AND .EndTime(3600) > 1615279260 AND .EndTime(3600) <= 1615293660 GROUPBY .EndTime(3600) ORDERBY .EndTime(3600) ASC, sqlQuery=select /*+label(RIBQuery_33015b1b_d330_48e1_ac6c_055846fb8a60)*/ IFSTATS.tstamp-mod(IFSTATS.tstamp-1,3600)+3600-1 as ".EndTime(3600)", AVG(IFSTATS.im_Availability) as ".Availability.Avg", GREATEST(3600,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_ltd IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) WHERE (p4.item_id = '252986' AND IFSTATS.tstamp > 1615276800 AND IFSTATS.tstamp <= 1615294800) GROUP BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,3600)+3600-1 ORDER BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,3600)+3600-1 ASC, queryId=RIBQuery_33015b1b_d330_48e1_ac6c_055846fb8a60, pageId=98b76651-d0cb-42cd-baa9-4fc78aa1443e|(2000350), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=2, numberOfConcurrentRealTimeQueries=2, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:799)
    2021-03-09 09:43:06,338 | INFO | LongRunningAndFailedQueries | Long Running Query - finished RIBQueryPerformanceData [time_RIBtoSQL_ms=11, time_SQLExecution_ms=62158, time_QueryTotal_ms=62158, time_ToDetermineSortCharacteristic=1, numberOfRowsReturned=69, numberOfColumnsReturned=3, sort_Nature=TIME_SERIES_SORTED, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=RATE, tableRequestedByMetricFactTableTypeProperty=RATE, tableChosen=RATE, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=7, ribQuery=SELECT .EndTime(60), .BitsIn.AvgRate, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '252986' AND .EndTime(60) > 1615279260 AND .EndTime(60) <= 1615293660 GROUPBY .EndTime(60) ORDERBY .EndTime(60) ASC, sqlQuery=select /*+label(RIBQuery_fd12ea45_f69a_4191_b928_f1bd46a66f25)*/ IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 as ".EndTime(60)", AVG(IFSTATS.im_BitsIn/IFSTATS.duration) as ".BitsIn.AvgRate", GREATEST(60,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_rate IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) WHERE (p4.item_id = '252986' AND IFSTATS.tstamp > 1615279200 AND IFSTATS.tstamp <= 1615293660) GROUP BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ORDER BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ASC, queryId=RIBQuery_fd12ea45_f69a_4191_b928_f1bd46a66f25, pageId=98b76651-d0cb-42cd-baa9-4fc78aa1443e|(2000350), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=3, numberOfConcurrentRealTimeQueries=1, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:799)
    2021-03-09 09:43:44,018 | INFO | LongRunningAndFailedQueries | Long Running Query - finished RIBQueryPerformanceData [time_RIBtoSQL_ms=20, time_SQLExecution_ms=78333, time_QueryTotal_ms=78333, time_ToDetermineSortCharacteristic=1, numberOfRowsReturned=69, numberOfColumnsReturned=3, sort_Nature=TIME_SERIES_SORTED, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=RATE, tableRequestedByMetricFactTableTypeProperty=RATE, tableChosen=RATE, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=7, ribQuery=SELECT .EndTime(60), .BitsIn.AvgRate, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '252986' AND .EndTime(60) > 1615279320 AND .EndTime(60) <= 1615293720 GROUPBY .EndTime(60) ORDERBY .EndTime(60) ASC, sqlQuery=select /*+label(RIBQuery_d862d630_ffb1_4cbb_9806_2bdb5f182374)*/ IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 as ".EndTime(60)", AVG(IFSTATS.im_BitsIn/IFSTATS.duration) as ".BitsIn.AvgRate", GREATEST(60,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_rate IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) WHERE (p4.item_id = '252986' AND IFSTATS.tstamp > 1615279260 AND IFSTATS.tstamp <= 1615293720) GROUP BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ORDER BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ASC, queryId=RIBQuery_d862d630_ffb1_4cbb_9806_2bdb5f182374, pageId=8643731e-b6a7-4ff5-8c3f-eddcb7c4582c|(2000350), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=6, numberOfConcurrentRealTimeQueries=6, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:799)
    2021-03-09 09:43:45,870 | INFO | LongRunningAndFailedQueries | Long Running Query - finished RIBQueryPerformanceData [time_RIBtoSQL_ms=22, time_SQLExecution_ms=80191, time_QueryTotal_ms=80191, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=1, numberOfColumnsReturned=6, sort_Nature=NOT_TIMESERIES, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=RATE, tableRequestedByMetricFactTableTypeProperty=RATE, tableChosen=RATE, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .PollItem.ID, .PollItem.DevDisplayName, .Item.DisplayName, .Item.DisplayDescription, .SpeedIn.Avg, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '252986' AND .EndTime(60) > 1615279320 AND .EndTime(60) <= 1615293720 GROUPBY .PollItem.ID, .PollItem.DevDisplayName, .Item.DisplayName, .Item.DisplayDescription ORDERBY .PollItem.DevDisplayName DESC LIMIT 10, sqlQuery=select /*+label(RIBQuery_de71904c_d770_4e5a_841b_4a15cfc7f1a6)*/ p4.item_id as ".PollItem.ID", p4.device_display_name as ".PollItem.DevDisplayName", i.item_name as ".Item.DisplayName", i.item_display_description as ".Item.DisplayDescription", AVG(IFSTATS.im_SpeedIn) as ".SpeedIn.Avg", GREATEST(60,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_rate IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) INNER JOIN item i ON (i.item_id = IFSTATS.item_id) WHERE (p4.item_id = '252986' AND IFSTATS.tstamp > 1615279320 AND IFSTATS.tstamp <= 1615293720) GROUP BY p4.item_id, p4.device_display_name, i.item_name, i.item_display_description ORDER BY p4.device_display_name IS NULL ASC, p4.device_display_name DESC LIMIT 10, queryId=RIBQuery_de71904c_d770_4e5a_841b_4a15cfc7f1a6, pageId=8643731e-b6a7-4ff5-8c3f-eddcb7c4582c|(2000350), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=5, numberOfConcurrentRealTimeQueries=5, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:799)
    2021-03-09 09:43:57,259 | INFO | LongRunningAndFailedQueries | Long Running Query - finished RIBQueryPerformanceData [time_RIBtoSQL_ms=13, time_SQLExecution_ms=91602, time_QueryTotal_ms=91602, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=1, numberOfColumnsReturned=3, sort_Nature=TIME_SERIES_SORTED, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=HOURLY_ROLLUP, tableRequestedByMetricFactTableTypeProperty=HOURLY_ROLLUP, tableChosen=HOURLY_ROLLUP, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .EndTime(3600), .Availability.Avg, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '252986' AND .EndTime(3600) > 1615279320 AND .EndTime(3600) <= 1615293720 GROUPBY .EndTime(3600) ORDERBY .EndTime(3600) ASC, sqlQuery=select /*+label(RIBQuery_ed2c250e_e9d3_4905_a2fe_cb5e7473b785)*/ IFSTATS.tstamp-mod(IFSTATS.tstamp-1,3600)+3600-1 as ".EndTime(3600)", AVG(IFSTATS.im_Availability) as ".Availability.Avg", GREATEST(3600,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_ltd IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) WHERE (p4.item_id = '252986' AND IFSTATS.tstamp > 1615276800 AND IFSTATS.tstamp <= 1615294800) GROUP BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,3600)+3600-1 ORDER BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,3600)+3600-1 ASC, queryId=RIBQuery_ed2c250e_e9d3_4905_a2fe_cb5e7473b785, pageId=8643731e-b6a7-4ff5-8c3f-eddcb7c4582c|(2000350), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=4, numberOfConcurrentRealTimeQueries=4, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:799)
    2021-03-09 10:08:56,805 | INFO | LongRunningAndFailedQueries | Long Running Query - finished RIBQueryPerformanceData [time_RIBtoSQL_ms=8, time_SQLExecution_ms=60610, time_QueryTotal_ms=60610, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=1, numberOfColumnsReturned=3, sort_Nature=TIME_SERIES_SORTED, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=HOURLY_ROLLUP, tableRequestedByMetricFactTableTypeProperty=HOURLY_ROLLUP, tableChosen=HOURLY_ROLLUP, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .EndTime(3600), .Availability.Avg, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '252986' AND .EndTime(3600) > 1615280820 AND .EndTime(3600) <= 1615295220 GROUPBY .EndTime(3600) ORDERBY .EndTime(3600) ASC, sqlQuery=select /*+label(RIBQuery_9654bb47_840e_474c_be57_9876eee8b882)*/ IFSTATS.tstamp-mod(IFSTATS.tstamp-1,3600)+3600-1 as ".EndTime(3600)", AVG(IFSTATS.im_Availability) as ".Availability.Avg", GREATEST(3600,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_ltd IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) WHERE (p4.item_id = '252986' AND IFSTATS.tstamp > 1615280400 AND IFSTATS.tstamp <= 1615298400) GROUP BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,3600)+3600-1 ORDER BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,3600)+3600-1 ASC, queryId=RIBQuery_9654bb47_840e_474c_be57_9876eee8b882, pageId=ac58d30a-2a6d-42f9-ad8c-9f5c8e247bfa|(2000350), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=6, numberOfConcurrentRealTimeQueries=6, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:799)
    2021-03-09 10:08:57,646 | INFO | LongRunningAndFailedQueries | Long Running Query - finished RIBQueryPerformanceData [time_RIBtoSQL_ms=12, time_SQLExecution_ms=61456, time_QueryTotal_ms=61456, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=1, numberOfColumnsReturned=6, sort_Nature=NOT_TIMESERIES, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=RATE, tableRequestedByMetricFactTableTypeProperty=RATE, tableChosen=RATE, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .PollItem.ID, .PollItem.DevDisplayName, .Item.DisplayName, .Item.DisplayDescription, .SpeedIn.Avg, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '252986' AND .EndTime(60) > 1615280820 AND .EndTime(60) <= 1615295220 GROUPBY .PollItem.ID, .PollItem.DevDisplayName, .Item.DisplayName, .Item.DisplayDescription ORDERBY .PollItem.DevDisplayName DESC LIMIT 10, sqlQuery=select /*+label(RIBQuery_6fbbeaa4_2161_4314_bdfa_ca9f77d83c8a)*/ p4.item_id as ".PollItem.ID", p4.device_display_name as ".PollItem.DevDisplayName", i.item_name as ".Item.DisplayName", i.item_display_description as ".Item.DisplayDescription", AVG(IFSTATS.im_SpeedIn) as ".SpeedIn.Avg", GREATEST(60,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_rate IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) INNER JOIN item i ON (i.item_id = IFSTATS.item_id) WHERE (p4.item_id = '252986' AND IFSTATS.tstamp > 1615280820 AND IFSTATS.tstamp <= 1615295220) GROUP BY p4.item_id, p4.device_display_name, i.item_name, i.item_display_description ORDER BY p4.device_display_name IS NULL ASC, p4.device_display_name DESC LIMIT 10, queryId=RIBQuery_6fbbeaa4_2161_4314_bdfa_ca9f77d83c8a, pageId=ac58d30a-2a6d-42f9-ad8c-9f5c8e247bfa|(2000350), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=4, numberOfConcurrentRealTimeQueries=5, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:799)
    2021-03-09 10:34:32,159 | INFO | LongRunningAndFailedQueries | Long Running Query - finished RIBQueryPerformanceData [time_RIBtoSQL_ms=420, time_SQLExecution_ms=60656, time_QueryTotal_ms=60656, time_ToDetermineSortCharacteristic=1, numberOfRowsReturned=100, numberOfColumnsReturned=8, sort_Nature=NOT_TIMESERIES, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=HOURLY_BASELINE, tableRequestedByMetricFactTableTypeProperty=RATE, tableChosen=HOURLY_BASELINE, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .PollItem.ID, .PollItem.DeviceID, .PollItem.DevDisplayName, .Item.Name, .Item.DisplayDescription, .UtilizationIn.Avg, .UtilizationIn.DevBaselineNorm, .UtilizationIn.DevBaselinePct FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .Group.GroupID = '3997' AND .DeviceType.Type = 'device' AND (.DeviceType.Subtype = 'router') AND .EndTime(60) > 1615293180 AND .EndTime(60) <= 1615296780 GROUPBY .PollItem.ID, .PollItem.DeviceID, .Item.Name, .Item.DisplayDescription ORDERBY .UtilizationIn.DevBaselinePct DESC LIMIT 100, sqlQuery=select /*+label(RIBQuery_6b085697_b485_4e10_8d43_7f1265520dc6)*/ v1.".ID",v1.".PollItem.DeviceID",p4.device_display_name ".PollItem.DevDisplayName",i.item_name ".Item.Name",i.item_display_description ".Item.DisplayDescription",v1.".UtilizationIn.Avg", v2.".UtilizationIn.DevBaselineNorm" , CASE WHEN v2.".UtilizationIn.DevBaselineNorm" <> 0 THEN ((v1.".UtilizationIn.Avg" - v2.".UtilizationIn.DevBaselineNorm")/v2.".UtilizationIn.DevBaselineNorm"*100) ELSE 0 END AS ".UtilizationIn.DevBaselinePct" FROM (SELECT p4.item_id as ".ID", p4.device_item_id as ".PollItem.DeviceID", AVG(IFSTATS.im_UtilizationIn) as ".UtilizationIn.Avg" FROM IFSTATS_rate IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) INNER JOIN ( SELECT distinct device_item_id, item_id from v_etl_poll_item p3 inner join (SELECT distinct dgRW.member_item_id from 
v_etl_group_membership dgRW WHERE ( dgRW.group_item_id IN (2936843,2906124,2936838,1977334,3007517,4088,4089,2936849,1121302,3049450,3049445,2903016,1977361,2696167,1977359,4113,4114,4115,4116,4117,4118,3032058,99356,99355,3014643,99358,99357,99364,99363,99366,99365,99368,2962507,99367,99369,2636866,2920519,2697283,99378,99377,3052635,1298471,2926683,2268252,98363,98366,98365,2936874,2920511,2658356,325733,100451,325731,684130,1951857,1980530,2986141,328817,2961560,1980515,2920594,3015826,3030125,2936944,1977499,2936939,1977497,1977498,914571,3029095,1977495,3030120,914568,2934887,1985685,1977491,1977489,1977490,1977485,2922624,1977486,914586,1977481,1977479,1977480,325785,914580,3015888,3014861,2920656,3015883,2920651,2920646,2263244,2920641,3049665,914618,3015900,2263254,881848,3015894,2967725,3029166,914634,2961576,914626,2985122,791746,3049660,2972859,3049655,2988213,773330,2972850,914670,2638089,1923326,2881808,914666,3049739,1923322,2881807,3030281,1923320,3049734,2636058,1081570,2268436,2881817,914681,3006748,2881811,3049752,123131,2881816,3006740,914675,3040530,3031279,2994412,3015914,3014887,3015909,1263898,914689,3036386,2641148,2268407,2704638,3015929,3015924,2994417,2262337,2904394,2268485,3011910,2669891,3056992,2964783,1303895,1042759,1046852,2634023,3028288,3038608,1263987,2268546,100711,2995589,2951577,2658718,2961816,3034515,124287,2995565,124289,3040622,124291,2631022,3040617,2631023,2631024,2631009,915847,2631013,3040612,3049853,2631037,2631038,3040632,2976117,2919798,1290640,3040627,2631029,2951537,2631030,2986356,2891123,2631031,2909641,3050956,2950604,3016138,915883,117159,3037640,2951621,3016131,3037635,2950593,2926018,2929109,2950614,2950609,3050961,3016146,2555299,2948523,2925989,3037608,3037601,2926013,3016123,1210824,2700721,2948535,2926008,781784,2948530,3005965,2951688,2414096,2414091,2923033,2923032,2671128,1291800,2637295,2414062,790042,790038,1971717,790040,1977919,1977920,1977915,1977916,2971209,1977911,1977912,792098,1977905,792100,
1977903,1258020,3022431,1977904,1258019,2891358,751162,1258017,1258021,2703953,2923054,2950698,2923049,2936361,1983067,2950697,2700845,2951724,2950694,2950696,2950695,1977939,2936356,2923044,1977933,1977931,116310,1977932,3049020,1977929,1977930,1977925,2986551,1977923,1977924,2957954,781926,2296462,2692754,2692753,2692755,1633945,3054192,1633947,2957936,1633948,3086954,1633951,3056228,2689640,2932324,124561,3007103,1633934,2414206,3007095,1421967,688791,368282,2907853,2328259,3049165,3040974,2412226,3040969,2419396,3049160,2925256,3040964,2413270,3040989,1981097,2925276,1981098,1633953,2910933,3040984,2995928,2925271,101051,3007186,792241,3040979,2413275,2943664,3025577,2936486,1951448,1981139,2936481,1981138,3040959,2943674,2943669,779985,3025587,3046159,2691852,2691851,2691853,882412,3008262,773858,2907911,2992898,3043100,2699029,2652950,2691818,2691817,3007212,2925291,3046119,2692836,2700004,2925281,3046113,2691815,2691834,2943741,2691833,2691835,2983679,2632446,2922229,2943736,2978640,2313032,2416450,2692944,2982722,2691912,2313037,2636635,2636638,3002204,3046233,2699091,3045204,2987859,2992941,3043120,2917161,1258326,792389,2987815,3028771,2965281,1261380,3046208,1261379,1261377,3046203,2692925,2992953,1261381,1258316,3029809,1261389,1980226,2297731,2297730,2297729,2944905,2987916,1977210,2962306,3028865,2888603,2268057,2636649,2697066,2944878,2697065,1977243,2697070,2697058,2697057,3033958,2697059,3036003,2697064,2636648,689055,3028860,2636671,2700146,2697073,2697078,2636662,2697080,2697079,2689994,2689993,4002,3027915,4005,4007,4008,2936773,4009,4010,4011,689057,4012,4013,1977268,4014,2689992,326571,4015,2689991,4016,4017,3049440,3027936,4018,100275,3049435,4022,100277,2700248,2978772,2413531,2631595,3048366,1977307,3027884,3048361,1980373,2978727,2927550,2927545,2945979,2991031,2702262,3048371) )) dgRW ON (dgRW.member_item_id = p3.device_item_id) UNION SELECT distinct device_item_id, item_id from v_etl_poll_item i3 inner join (SELECT distinct 
dgRW.member_item_id from v_etl_group_membership dgRW WHERE ((dgRW.group_item_id) IN ((3997),(2936843),(2906124),(2936838),(1977334),(3007517),(4088),(4089),(2936849),(1121302),(3049450),(3049445),(2903016),(1977361),(2696167),(1977359),(4113),(4114),(4115),(4116),(4117),(4118),(3032058),(99356),(99355),(3014643),(99358),(99357),(99364),(99363),(99366),(99365),(99368),(2962507),(99367),(99369),(2636866),(2920519),(2697283),(99378),(99377),(3052635),(1298471),(2926683),(2268252),(98363),(98366),(98365),(2936874),(2920511),(2658356),(325733),(100451),(325731),(684130),(1951857),(1980530),(2986141),(328817),(2961560),(1980515),(2920594),(3015826),(3030125),(2936944),(1977499),(2936939),(1977497),(1977498),(914571),(3029095),(1977495),(3030120),(914568),(2934887),(1985685),(1977491),(1977489),(1977490),(1977485),(2922624),(1977486),(914586),(1977481),(1977479),(1977480),(325785),(914580),(3015888),(3014861),(2920656),(3015883),(2920651),(2920646),(2263244),(2920641),(3049665),(914618),(3015900),(2263254),(881848),(3015894),(2967725),(3029166),(914634),(2961576),(914626),(2985122),(791746),(3049660),(2972859),(3049655),(2988213),(773330),(2972850),(914670),(2638089),(1923326),(2881808),(914666),(3049739),(1923322),(2881807),(3030281),(1923320),(3049734),(2636058),(1081570),(2268436),(2881817),(914681),(3006748),(2881811),(3049752),(123131),(2881816),(3006740),(914675),(3040530),(3031279),(2994412),(3015914),(3014887),(3015909),(1263898),(914689),(3036386),(2641148),(2268407),(2704638),(3015929),(3015924),(2994417),(2262337),(2904394),(2268485),(3011910),(2669891),(3056992),(2964783),(1303895),(1042759),(1046852),(2634023),(3028288),(3038608),(1263987),(2268546),(100711),(2995589),(2951577),(2658718),(2555296),(2961816),(3034515),(124287),(2631017),(2995565),(124289),(2631018),(2631019),(3040622),(124291),(2631020),(2631021),(2631022),(3040617),(2631023),(2631024),(2631009),(915847),(2631012),(2631013),(3040612),(2631014),(2631015),(2631016),(2631033),(2631034),(3049853),(
2631035),(2631036),(2631037),(2631038),(2631025),(3040632),(2976117),(2919798),(2631026),(2631027),(2631028),(1290640),(3040627),(2631029),(2951537),(2631030),(2986356),(2891123),(2631031),(2631032),(2687433),(2909641),(3050956),(2950604),(3016138),(915883),(117159),(3037640),(2951621),(3016131),(3037635),(2950593),(2926018),(2929109),(2950614),(2950609),(3050961),(3016146),(2555301),(2555302),(2555299),(2555300),(2948523),(2555297),(2555298),(2925989),(3037608),(3037601),(2926013),(3016123),(1210824),(2700721),(2948535),(2926008),(781784),(2948530),(3005965),(2951688),(2414096),(2414091),(2923033),(2923032),(2671128),(1291800),(2637295),(2414062),(790042),(790038),(1971717),(790040),(1977919),(1977920),(1977915),(1977916),(2971209),(1977911),(1977912),(792098),(1977905),(792100),(1977903),(1258020),(3022431),(1977904),(1258019),(2891358),(751162),(1258017),(1258021),(2703953),(2923054),(2950698),(2923049),(2936361),(1983067),(2950697),(2700845),(2951724),(2950694),(2950696),(2950695),(1977939),(2936356),(2923044),(1977933),(1977931),(116310),(1977932),(3049020),(1977929),(1977930),(1977925),(2986551),(1977923),(1977924),(2957954),(781926),(2296462),(2692754),(2692753),(2692755),(1633945),(3054192),(1633947),(2957936),(1633948),(3086954),(1633951),(3056231),(3056232),(3056229),(3056230),(3056228),(2689640),(2932324),(124561),(3007103),(1633934),(2414206),(3007095),(1421967),(688791),(368282),(2907853),(2328259),(3049165),(3040974),(2412226),(3040969),(2419396),(3049160),(2925256),(3040964),(2413270),(3040989),(1981097),(2925276),(1981098),(1633953),(2910933),(3040984),(2995928),(2925271),(101051),(3007186),(792241),(3040979),(2413275),(2943664),(3025577),(2936486),(1951448),(1981139),(2936481),(1981138),(3040959),(2943674),(2943669),(779985),(3025587),(3046159),(2691852),(2691851),(2691853),(882412),(3008262),(773858),(2907911),(2992898),(3043100),(2699029),(2652950),(2691818),(2691817),(3007212),(2925291),(3046119),(2692836),(2700004),(2925281),(3046113),(2691815),
(2691834),(2943741),(2691833),(2691835),(2983679),(2632446),(2922229),(2943736),(2978640),(2313032),(2416450),(2692944),(2982722),(2691912),(2313037),(2636635),(2636638),(3002204),(3046233),(2699091),(3045204),(2987859),(2992941),(3043120),(2917161),(1258326),(792389),(2987815),(3028771),(2965281),(1261380),(3046208),(1261379),(1261377),(3046203),(2692925),(2992953),(1261381),(1258316),(3029809),(1261389),(1980226),(2297731),(2297730),(2297729),(2944905),(2987916),(1977210),(2962306),(3028865),(2888603),(2268057),(2636649),(2697066),(2944878),(2697065),(1977243),(2697070),(2697058),(2697057),(3033958),(2697059),(3036003),(2697064),(2636648),(689055),(3028860),(2636671),(2700146),(2697073),(2697078),(2636662),(2697080),(3999),(2697079),(4000),(2689994),(4001),(2689993),(4002),(4003),(4004),(3027915),(4005),(4006),(4007),(4008),(2936773),(4009),(4010),(4011),(689057),(4012),(4013),(1977268),(4014),(2689992),(326571),(4015),(2689991),(4016),(4017),(3049440),(3027936),(4018),(100275),(3049435),(4022),(100277),(2700248),(2978772),(2413531),(2631595),(3048366),(1977307),(3027884),(3048361),(1980373),(2978727),(2927550),(2927545),(2945979),(2991031),(2702262),(3048371)))) dgRW ON (dgRW.member_item_id = i3.item_id) ) p3 ON (p3.item_id = IFSTATS.item_id) INNER JOIN v_etl_poll_item p2 ON (p2.item_id = IFSTATS.item_id) INNER JOIN (SELECT distinct item_id FROM v_etl_item_type dit WHERE (dit.item_type_name = 'device' AND (dit.item_subtype_name = 'router'))) dit ON (dit.item_id = p2.device_item_id) WHERE (IFSTATS.tstamp > 1615293180 AND IFSTATS.tstamp <= 1615296780) GROUP BY p4.item_id, p4.device_item_id ) v1 INNER JOIN (SELECT p4.item_id as ".ID", p4.device_item_id as ".PollItem.DeviceID", AVG(IFSTATS_BL.im_UtilizationIn_mean_x) as ".UtilizationIn.DevBaselineNorm" FROM IFSTATS_hourly_baselines_v IFSTATS_BL INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS_BL.item_id) INNER JOIN ( SELECT distinct device_item_id, item_id from v_etl_poll_item p3 inner join (SELECT distinct 
dgRW.member_item_id from v_etl_group_membership dgRW WHERE ( dgRW.group_item_id IN (2936843,2906124,2936838,1977334,3007517,4088,4089,2936849,1121302,3049450,3049445,2903016,1977361,2696167,1977359,4113,4114,4115,4116,4117,4118,3032058,99356,99355,3014643,99358,99357,99364,99363,99366,99365,99368,2962507,99367,99369,2636866,2920519,2697283,99378,99377,3052635,1298471,2926683,2268252,98363,98366,98365,2936874,2920511,2658356,325733,100451,325731,684130,1951857,1980530,2986141,328817,2961560,1980515,2920594,3015826,3030125,2936944,1977499,2936939,1977497,1977498,914571,3029095,1977495,3030120,914568,2934887,1985685,1977491,1977489,1977490,1977485,2922624,1977486,914586,1977481,1977479,1977480,325785,914580,3015888,3014861,2920656,3015883,2920651,2920646,2263244,2920641,3049665,914618,3015900,2263254,881848,3015894,2967725,3029166,914634,2961576,914626,2985122,791746,3049660,2972859,3049655,2988213,773330,2972850,914670,2638089,1923326,2881808,914666,3049739,1923322,2881807,3030281,1923320,3049734,2636058,1081570,2268436,2881817,914681,3006748,2881811,3049752,123131,2881816,3006740,914675,3040530,3031279,2994412,3015914,3014887,3015909,1263898,914689,3036386,2641148,2268407,2704638,3015929,3015924,2994417,2262337,2904394,2268485,3011910,2669891,3056992,2964783,1303895,1042759,1046852,2634023,3028288,3038608,1263987,2268546,100711,2995589,2951577,2658718,2961816,3034515,124287,2995565,124289,3040622,124291,2631022,3040617,2631023,2631024,2631009,915847,2631013,3040612,3049853,2631037,2631038,3040632,2976117,2919798,1290640,3040627,2631029,2951537,2631030,2986356,2891123,2631031,2909641,3050956,2950604,3016138,915883,117159,3037640,2951621,3016131,3037635,2950593,2926018,2929109,2950614,2950609,3050961,3016146,2555299,2948523,2925989,3037608,3037601,2926013,3016123,1210824,2700721,2948535,2926008,781784,2948530,3005965,2951688,2414096,2414091,2923033,2923032,2671128,1291800,2637295,2414062,790042,790038,1971717,790040,1977919,1977920,1977915,1977916,2971209,1977911,19779
12,792098,1977905,792100,1977903,1258020,3022431,1977904,1258019,2891358,751162,1258017,1258021,2703953,2923054,2950698,2923049,2936361,1983067,2950697,2700845,2951724,2950694,2950696,2950695,1977939,2936356,2923044,1977933,1977931,116310,1977932,3049020,1977929,1977930,1977925,2986551,1977923,1977924,2957954,781926,2296462,2692754,2692753,2692755,1633945,3054192,1633947,2957936,1633948,3086954,1633951,3056228,2689640,2932324,124561,3007103,1633934,2414206,3007095,1421967,688791,368282,2907853,2328259,3049165,3040974,2412226,3040969,2419396,3049160,2925256,3040964,2413270,3040989,1981097,2925276,1981098,1633953,2910933,3040984,2995928,2925271,101051,3007186,792241,3040979,2413275,2943664,3025577,2936486,1951448,1981139,2936481,1981138,3040959,2943674,2943669,779985,3025587,3046159,2691852,2691851,2691853,882412,3008262,773858,2907911,2992898,3043100,2699029,2652950,2691818,2691817,3007212,2925291,3046119,2692836,2700004,2925281,3046113,2691815,2691834,2943741,2691833,2691835,2983679,2632446,2922229,2943736,2978640,2313032,2416450,2692944,2982722,2691912,2313037,2636635,2636638,3002204,3046233,2699091,3045204,2987859,2992941,3043120,2917161,1258326,792389,2987815,3028771,2965281,1261380,3046208,1261379,1261377,3046203,2692925,2992953,1261381,1258316,3029809,1261389,1980226,2297731,2297730,2297729,2944905,2987916,1977210,2962306,3028865,2888603,2268057,2636649,2697066,2944878,2697065,1977243,2697070,2697058,2697057,3033958,2697059,3036003,2697064,2636648,689055,3028860,2636671,2700146,2697073,2697078,2636662,2697080,2697079,2689994,2689993,4002,3027915,4005,4007,4008,2936773,4009,4010,4011,689057,4012,4013,1977268,4014,2689992,326571,4015,2689991,4016,4017,3049440,3027936,4018,100275,3049435,4022,100277,2700248,2978772,2413531,2631595,3048366,1977307,3027884,3048361,1980373,2978727,2927550,2927545,2945979,2991031,2702262,3048371) )) dgRW ON (dgRW.member_item_id = p3.device_item_id) UNION SELECT distinct device_item_id, item_id from v_etl_poll_item i3 inner join 
(SELECT distinct dgRW.member_item_id from v_etl_group_membership dgRW WHERE ((dgRW.group_item_id) IN ((3997),(2936843),(2906124),(2936838),(1977334),(3007517),(4088),(4089),(2936849),(1121302),(3049450),(3049445),(2903016),(1977361),(2696167),(1977359),(4113),(4114),(4115),(4116),(4117),(4118),(3032058),(99356),(99355),(3014643),(99358),(99357),(99364),(99363),(99366),(99365),(99368),(2962507),(99367),(99369),(2636866),(2920519),(2697283),(99378),(99377),(3052635),(1298471),(2926683),(2268252),(98363),(98366),(98365),(2936874),(2920511),(2658356),(325733),(100451),(325731),(684130),(1951857),(1980530),(2986141),(328817),(2961560),(1980515),(2920594),(3015826),(3030125),(2936944),(1977499),(2936939),(1977497),(1977498),(914571),(3029095),(1977495),(3030120),(914568),(2934887),(1985685),(1977491),(1977489),(1977490),(1977485),(2922624),(1977486),(914586),(1977481),(1977479),(1977480),(325785),(914580),(3015888),(3014861),(2920656),(3015883),(2920651),(2920646),(2263244),(2920641),(3049665),(914618),(3015900),(2263254),(881848),(3015894),(2967725),(3029166),(914634),(2961576),(914626),(2985122),(791746),(3049660),(2972859),(3049655),(2988213),(773330),(2972850),(914670),(2638089),(1923326),(2881808),(914666),(3049739),(1923322),(2881807),(3030281),(1923320),(3049734),(2636058),(1081570),(2268436),(2881817),(914681),(3006748),(2881811),(3049752),(123131),(2881816),(3006740),(914675),(3040530),(3031279),(2994412),(3015914),(3014887),(3015909),(1263898),(914689),(3036386),(2641148),(2268407),(2704638),(3015929),(3015924),(2994417),(2262337),(2904394),(2268485),(3011910),(2669891),(3056992),(2964783),(1303895),(1042759),(1046852),(2634023),(3028288),(3038608),(1263987),(2268546),(100711),(2995589),(2951577),(2658718),(2555296),(2961816),(3034515),(124287),(2631017),(2995565),(124289),(2631018),(2631019),(3040622),(124291),(2631020),(2631021),(2631022),(3040617),(2631023),(2631024),(2631009),(915847),(2631012),(2631013),(3040612),(2631014),(2631015),(2631016),(2631033),(263
1034),(3049853),(2631035),(2631036),(2631037),(2631038),(2631025),(3040632),(2976117),(2919798),(2631026),(2631027),(2631028),(1290640),(3040627),(2631029),(2951537),(2631030),(2986356),(2891123),(2631031),(2631032),(2687433),(2909641),(3050956),(2950604),(3016138),(915883),(117159),(3037640),(2951621),(3016131),(3037635),(2950593),(2926018),(2929109),(2950614),(2950609),(3050961),(3016146),(2555301),(2555302),(2555299),(2555300),(2948523),(2555297),(2555298),(2925989),(3037608),(3037601),(2926013),(3016123),(1210824),(2700721),(2948535),(2926008),(781784),(2948530),(3005965),(2951688),(2414096),(2414091),(2923033),(2923032),(2671128),(1291800),(2637295),(2414062),(790042),(790038),(1971717),(790040),(1977919),(1977920),(1977915),(1977916),(2971209),(1977911),(1977912),(792098),(1977905),(792100),(1977903),(1258020),(3022431),(1977904),(1258019),(2891358),(751162),(1258017),(1258021),(2703953),(2923054),(2950698),(2923049),(2936361),(1983067),(2950697),(2700845),(2951724),(2950694),(2950696),(2950695),(1977939),(2936356),(2923044),(1977933),(1977931),(116310),(1977932),(3049020),(1977929),(1977930),(1977925),(2986551),(1977923),(1977924),(2957954),(781926),(2296462),(2692754),(2692753),(2692755),(1633945),(3054192),(1633947),(2957936),(1633948),(3086954),(1633951),(3056231),(3056232),(3056229),(3056230),(3056228),(2689640),(2932324),(124561),(3007103),(1633934),(2414206),(3007095),(1421967),(688791),(368282),(2907853),(2328259),(3049165),(3040974),(2412226),(3040969),(2419396),(3049160),(2925256),(3040964),(2413270),(3040989),(1981097),(2925276),(1981098),(1633953),(2910933),(3040984),(2995928),(2925271),(101051),(3007186),(792241),(3040979),(2413275),(2943664),(3025577),(2936486),(1951448),(1981139),(2936481),(1981138),(3040959),(2943674),(2943669),(779985),(3025587),(3046159),(2691852),(2691851),(2691853),(882412),(3008262),(773858),(2907911),(2992898),(3043100),(2699029),(2652950),(2691818),(2691817),(3007212),(2925291),(3046119),(2692836),(2700004),(2925281),(30
46113),(2691815),(2691834),(2943741),(2691833),(2691835),(2983679),(2632446),(2922229),(2943736),(2978640),(2313032),(2416450),(2692944),(2982722),(2691912),(2313037),(2636635),(2636638),(3002204),(3046233),(2699091),(3045204),(2987859),(2992941),(3043120),(2917161),(1258326),(792389),(2987815),(3028771),(2965281),(1261380),(3046208),(1261379),(1261377),(3046203),(2692925),(2992953),(1261381),(1258316),(3029809),(1261389),(1980226),(2297731),(2297730),(2297729),(2944905),(2987916),(1977210),(2962306),(3028865),(2888603),(2268057),(2636649),(2697066),(2944878),(2697065),(1977243),(2697070),(2697058),(2697057),(3033958),(2697059),(3036003),(2697064),(2636648),(689055),(3028860),(2636671),(2700146),(2697073),(2697078),(2636662),(2697080),(3999),(2697079),(4000),(2689994),(4001),(2689993),(4002),(4003),(4004),(3027915),(4005),(4006),(4007),(4008),(2936773),(4009),(4010),(4011),(689057),(4012),(4013),(1977268),(4014),(2689992),(326571),(4015),(2689991),(4016),(4017),(3049440),(3027936),(4018),(100275),(3049435),(4022),(100277),(2700248),(2978772),(2413531),(2631595),(3048366),(1977307),(3027884),(3048361),(1980373),(2978727),(2927550),(2927545),(2945979),(2991031),(2702262),(3048371)))) dgRW ON (dgRW.member_item_id = i3.item_id) ) p3 ON (p3.item_id = IFSTATS_BL.item_id) INNER JOIN v_etl_poll_item p2 ON (p2.item_id = IFSTATS_BL.item_id) INNER JOIN (SELECT distinct item_id FROM v_etl_item_type dit WHERE (dit.item_type_name = 'device' AND (dit.item_subtype_name = 'router'))) dit ON (dit.item_id = p2.device_item_id) WHERE (IFSTATS_BL.endtime > 1615293180 AND IFSTATS_BL.endtime <= 1615296780) GROUP BY p4.item_id, p4.device_item_id ) v2 ON (v1.".ID"=v2.".ID") INNER JOIN v_etl_poll_item p4 ON (p4.item_id = v1.".ID") INNER JOIN item i ON (i.item_id = v1.".ID") ORDER BY CASE WHEN v2.".UtilizationIn.DevBaselineNorm" <> 0 THEN ABS((v1.".UtilizationIn.Avg" - v2.".UtilizationIn.DevBaselineNorm")/v2.".UtilizationIn.DevBaselineNorm"*100) ELSE 0 END DESC LIMIT 100, 
queryId=RIBQuery_6b085697_b485_4e10_8d43_7f1265520dc6, pageId=dc2693bd-3338-436d-bf5b-435ac4c4c427|(2000005), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=4, numberOfConcurrentRealTimeQueries=3, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:799)

    As you can see, some queries have problems running. Could this be related to the version?
    I'm currently using version 20.2.1.0.

    thank you

    Valéria


  • 18.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 09, 2021 11:07 AM
    The long-running query log records both queries that return more than 5000 rows (which really isn't much of an issue unless they also run long) and queries that take more than 60 secs.

    So it appears you have some queries over 60 secs and some over 90 secs.
    What does the Queries System Health dashboard show for average RIB calls?

    This is probably where faster CPUs and disabling CPU throttling would come into play, by speeding up these queries.
    Does it appear that most of the queries over 60 secs are against IFSTATS?
    Did we find that those spikes in the thresholding % of poll cycle were related to the interface MF?  If that's the case, there appears to be a lot of use of the ifstats tables.

    If queries run faster, then there is less waiting for resources to run additional queries.

    How much memory is on the DR nodes?  That plays a role in how fast queries can run too.  Vertica prefers something like 6-8GB per core at least.
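
    To check a node against that rough per-core figure, something like the sketch below could help. It is only an illustration: the function names are made up here, the sample numbers are hypothetical, and the core count has to be supplied by you (for example from lscpu).

```python
def read_mem_total_kb(meminfo_text: str) -> int:
    """Return the MemTotal value (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like: "MemTotal: 32869688 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def gb_per_core(mem_total_kb: int, cores: int) -> float:
    """Return memory per core in GiB for a node with the given core count."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return mem_total_kb / (1024 * 1024) / cores


# Hypothetical node: 128 GiB of RAM and 16 cores (not taken from this thread).
sample = "MemTotal: 134217728 kB\nMemFree: 1000 kB\n"
ratio = gb_per_core(read_mem_total_kb(sample), cores=16)
print(f"{ratio:.1f} GiB per core")
```

    If the result lands well under the 6-8GB range, memory is a likely contributor to slow queries under concurrent load.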


  • 19.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 09, 2021 10:25 AM
    Yes, hostname is the DA hostname.  It makes REST calls to the DA to get a list of retired (not present) items and then REST calls to delete them.

    Note: if it comes back with an error about bad XML or something similar, the response is probably too large.  In that case, try the -d option and pass the number of days an item has been in the not present state.  Start with something like 365 and work down to 1, and it will hopefully find batches small enough to avoid the response-too-large error (413).
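
    The "start at 365 and work down" idea can be sketched as a list of commands to run one after another. This is a rough illustration, not the script's documented usage: it assumes -h takes the DA hostname and -d takes a day count as described above, and the hostname and the day steps are placeholders.

```python
import shlex
from typing import List


def build_batches(script: str, da_host: str, day_steps: List[int]) -> List[List[str]]:
    """Build one argument list per -d value, oldest items (largest day count) first."""
    ordered = sorted(set(day_steps), reverse=True)
    return [[script, "-h", da_host, "-d", str(days)] for days in ordered]


# Placeholder hostname and illustrative day steps; adjust both for your system.
commands = build_batches(
    "./remove_not_present_items.sh", "my-da-host", [365, 180, 90, 30, 7, 1]
)
for argv in commands:
    # Only print the commands so each batch can be reviewed before running it.
    print(shlex.join(argv))
```

    Running the larger day counts first removes the oldest not present items, so each later pass has fewer items left to return.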


  • 20.  RE: many gaps in all graphics

    Posted Mar 09, 2021 02:37 PM
    Hello Jeffrey,

    I ran ./remove_not_present_items.sh -h -o myretired.csv on the Data Aggregator server and the Total Retired Count was 16241.
    Can I delete all of the items without problems?
    I have attached the myretired.csv file here. As for the DR node information you asked about:

    df -h
    Filesystem                          Size  Used Avail Use% Mounted on
    /dev/mapper/vg_modelolin05-lv_root   45G  8.5G   34G  21% /
    tmpfs                                16G     0   16G   0% /dev/shm
    /dev/sda2                           477M  104M  349M  23% /boot
    /dev/sda1                           200M  264K  200M   1% /boot/efi
    /dev/sdb1                           985G  251G  684G  27% /opt/CA




    thank you,

    Valéria

    Attachment(s)

    csv
    myretired.csv   2.07 MB 1 version


  • 21.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 09, 2021 03:17 PM
    Sorry, I meant the time to run RIB queries. Left side graph.

    Also, how much memory (not disk space) on the DR nodes?  cat /proc/meminfo

    Yes, you can delete the not present items.
    If you don't have the cleanupDeletedItems.sh script, you can reach out to support via case to get it.  It's good to run after deleting many items.


  • 22.  RE: many gaps in all graphics

    Posted Mar 09, 2021 03:31 PM
    Hello Jeffrey,

    Sorry about my mistake. I'm sending you the RIB queries graph (the left-side graph, as you mentioned).

    And here is the DR node information from cat /proc/meminfo:

    /home/suporte_pm # cat /proc/meminfo
    MemTotal: 32869688 kB
    MemFree: 4438440 kB
    Buffers: 395864 kB
    Cached: 13463776 kB
    SwapCached: 0 kB
    Active: 18726792 kB
    Inactive: 8790984 kB
    Active(anon): 11829680 kB
    Inactive(anon): 1828964 kB
    Active(file): 6897112 kB
    Inactive(file): 6962020 kB
    Unevictable: 0 kB
    Mlocked: 0 kB
    SwapTotal: 4046844 kB
    SwapFree: 4046844 kB
    Dirty: 11012 kB
    Writeback: 0 kB
    AnonPages: 13658664 kB
    Mapped: 39088 kB
    Shmem: 480 kB
    Slab: 601572 kB
    SReclaimable: 551108 kB
    SUnreclaim: 50464 kB
    KernelStack: 7696 kB
    PageTables: 49044 kB
    NFS_Unstable: 0 kB
    Bounce: 0 kB
    WritebackTmp: 0 kB
    CommitLimit: 20481688 kB
    Committed_AS: 13188556 kB
    VmallocTotal: 34359738367 kB
    VmallocUsed: 81356 kB
    VmallocChunk: 34359645604 kB
    HardwareCorrupted: 0 kB
    AnonHugePages: 0 kB
    HugePages_Total: 0
    HugePages_Free: 0
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    Hugepagesize: 2048 kB
    DirectMap4k: 8192 kB
    DirectMap2M: 33544192 kB


    I only have the cleanupComponents.sh script.

    thank you

    Valéria


  • 23.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 09, 2021 03:49 PM
    Yeah, 32GB is kind of on the low side, along with the 2.27GHz.

    How many items in DR:
       select count(*) from dauser.item;

    The average RIB times are higher when a lot more RIB calls are being made, as Vertica tries to run up to 40 SQL calls at once, along with any rollup/thresholding calls at the same time.  So the 32G along with the CPU speed could cause things to take longer to run when it all happens at the same time.

    You can check the RIB average time and count at the times when thresholding (eventing) and rollups run (and possibly data loading times).  The more concurrent work Vertica has to do, the more stress on the Vertica resources you currently have allocated.


  • 24.  RE: many gaps in all graphics

    Posted Mar 10, 2021 09:18 AM
    Hello Jeffrey,

    I got the cleanupDeletedItems.sh script. Can I run it with only the user and password, without -s dauser? Something like this:

    cleanupDeletedItems.sh -u dradmin -w dbpass

    I ran the ./remove_not_present_items.sh script on the Data Aggregator server as you asked, and CAPC performance seems to be improving.

    I would also like to know how I can run the command select count(*) from dauser.item; on the Data Aggregator to get the item count.

    Thank you

    Valéria





  • 25.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 10, 2021 10:04 AM
    You have to run it 1 of 2 ways.

    cleanupDeletedItems.sh -u dradmin -w adminpassword -s schema
    cleanupDeletedItems.sh -u schemauser -w schemauserpassword

    On how to run commands:
    log into /opt/vertica/bin/vsql, and run:   select count(*) from dauser.item;
    also, can you run:   select facet_qname, count(*) from dauser.v_item_facet group by 1 order by 2 desc limit 30;


  • 26.  RE: many gaps in all graphics

    Posted Mar 10, 2021 10:06 AM
    Today I see the logs below. I think the cleanupDeletedItems.sh script might be helpful in this case:

    tail karaf.log
    WARN | nstance_Worker-2 | 2021-03-10 11:57:48,444 | BatchProcessorImpl | ocessorImpl$BatchTriggerListener 248 | .ca.im.aggregator.loader | | Batch process job: GroupsETLJob required more than 80% of the currently configured execution interval of 1 minutes.
    INFO | nstance_Worker-2 | 2021-03-10 11:57:48,444 | atchProcessSystemLogNotifierImpl | atchProcessSystemLogNotifierImpl 57 | .ca.im.aggregator.loader | | Event 2.StateLongRunning was generated
    WARN | FacetBatcher | 2021-03-10 11:57:54,984 | nagedDeviceResourceDiscoveryImpl | nagedDeviceResourceDiscoveryImpl 771 | .im.aggregator.discovery | | Unable to retrieve mfInfo for key=MultiKey[126904, {http://im.ca.com/normalizer}NormalizedAddressInfo]
    WARN | FacetBatcher | 2021-03-10 11:57:54,991 | nagedDeviceResourceDiscoveryImpl | nagedDeviceResourceDiscoveryImpl 771 | .im.aggregator.discovery | | Unable to retrieve mfInfo for key=MultiKey[124859, {http://im.ca.com/normalizer}NormalizedAddressInfo]
    INFO | nstance_Worker-1 | 2021-03-10 11:58:30,884 | BatchProcessorImpl | ocessorImpl$BatchTriggerListener 242 | .ca.im.aggregator.loader | | Batch process job: GroupsETLJob completed in 0:00:30.883( 30883 ms )
    WARN | FacetBatcher | 2021-03-10 11:58:43,058 | nagedDeviceResourceDiscoveryImpl | nagedDeviceResourceDiscoveryImpl 771 | .im.aggregator.discovery | | Unable to retrieve mfInfo for key=MultiKey[1047492, {http://im.ca.com/normalizer}NormalizedAddressInfo]
    INFO | hPoller-thread-1 | 2021-03-10 11:59:20,001 | DAHealthPollTaskImpl | itor.poller.DAHealthPollTaskImpl 253 | .ca.im.aggregator.loader | | Failed to find expression for pollable metric family attr {http://im.ca.com/normalizer}NormalizedRollUpInfo.RunCount on vendor cert {http://im.ca.com/selfmonitor}DAHealthRollUpCalculationTimes
    INFO | nstance_Worker-2 | 2021-03-10 11:59:31,037 | BatchProcessorImpl | ocessorImpl$BatchTriggerListener 242 | .ca.im.aggregator.loader | | Batch process job: GroupsETLJob completed in 0:00:31.036( 31036 ms )
    WARN | FacetBatcher | 2021-03-10 11:59:35,887 | nagedDeviceResourceDiscoveryImpl | nagedDeviceResourceDiscoveryImpl 771 | .im.aggregator.discovery | | Unable to retrieve mfInfo for key=MultiKey[125589, {http://im.ca.com/normalizer}NormalizedAddressInfo]
    WARN | FacetBatcher | 2021-03-10 11:59:35,892 | nagedDeviceResourceDiscoveryImpl | nagedDeviceResourceDiscoveryImpl 771 | .im.aggregator.discovery | | Unable to retrieve mfInfo for key=MultiKey[2658436, {http://im.ca.com/normalizer}NormalizedAddressInfo]

    thank you

    Valéria


  • 27.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 10, 2021 10:24 AM
    WARN | nstance_Worker-2 | 2021-03-10 11:57:48,444 | BatchProcessorImpl | ocessorImpl$BatchTriggerListener 248 | .ca.im.aggregator.loader | | Batch process job: GroupsETLJob required more than 80% of the currently configured execution interval of 1 minutes.

    That means this run took more than 48 secs (80% of the 1 minute interval).  If the DR was busy with other work, then yes, it could take longer.
    From the other messages, the normal time appears to be 30-31 secs.
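
    The 48 secs figure and the 30-31 secs normal time can both be pulled out of karaf.log with a small script. The sketch below is only an illustration: the function names are made up, and the regular expression is written against the "completed in" lines quoted in this thread, so other DA versions may format them differently.

```python
import re
from typing import List

# Matches e.g. "Batch process job: GroupsETLJob completed in 0:00:30.883( 30883 ms )"
COMPLETED = re.compile(r"Batch process job: (\S+) completed in \S+\(\s*(\d+) ms\s*\)")


def warn_threshold_seconds(interval_minutes: float, fraction: float = 0.8) -> float:
    """Seconds a job may run before it exceeds the given fraction of its interval."""
    return interval_minutes * 60 * fraction


def job_durations_ms(log_lines: List[str], job: str) -> List[int]:
    """Collect the completion times (ms) logged for one batch job."""
    durations = []
    for line in log_lines:
        match = COMPLETED.search(line)
        if match and match.group(1) == job:
            durations.append(int(match.group(2)))
    return durations


lines = [
    "INFO | ... | Batch process job: GroupsETLJob completed in 0:00:30.883( 30883 ms )",
    "INFO | ... | Batch process job: GroupsETLJob completed in 0:00:31.036( 31036 ms )",
]
limit_ms = warn_threshold_seconds(1) * 1000
for ms in job_durations_ms(lines, "GroupsETLJob"):
    print(ms, "ms", "over threshold" if ms > limit_ms else "within threshold")
```

    Runs that finish within the threshold do not trigger the WARN; only the occasional run that crosses 80% of the interval does.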



  • 28.  RE: many gaps in all graphics

    Posted Mar 10, 2021 10:33 AM
    Hello Jeffrey

    About the cleanupDeletedItems.sh script, which schema should I use in this case?

    I'm sending you entries from LongRunningAndFailedQueries.log; maybe they can help:

    2021-03-09 12:52:08,3102021-03-09 12:52:08,310 | INFO | LongRunningAndFailedQueries | Long Running Query - finished RIBQueryPerformanceData [time_RIBtoSQL_ms=27, time_SQLExecution_ms=92349, time_QueryTotal_ms=92349, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=20, numberOfColumnsReturned=5, sort_Nature=NOT_TIMESERIES, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=HOURLY_ROLLUP, tableRequestedByMetricFactTableTypeProperty=HOURLY_ROLLUP, tableChosen=HOURLY_ROLLUP, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .PollItem.ID, .Item.Name, .Item.DisplayDescription, .PctDiscardsIn.Avg, .PctDiscardsOut.Avg FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.DeviceID = '125028' AND .EndTime(3600) > 1614700200 AND .EndTime(3600) <= 1615305000 GROUPBY .PollItem.ID, .Item.Name, .Item.DisplayDescription ORDERBY .PctDiscardsIn.Avg DESC, .PollItem.ID ASC LIMIT 30, sqlQuery=select /*+label(RIBQuery_a9dca94b_ec5d_4bb7_bb9c_1ae99043950f)*/ p4.item_id as ".PollItem.ID", i.item_name as ".Item.Name", i.item_display_description as ".Item.DisplayDescription", AVG(IFSTATS.im_PctDiscardsIn) as ".PctDiscardsIn.Avg", AVG(IFSTATS.im_PctDiscardsOut) as ".PctDiscardsOut.Avg" FROM IFSTATS_ltd IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) INNER JOIN item i ON (i.item_id = IFSTATS.item_id) WHERE (p4.device_item_id = '125028' AND IFSTATS.tstamp > 1614700200 AND IFSTATS.tstamp <= 1615305000) GROUP BY p4.item_id, i.item_name, i.item_display_description ORDER BY AVG(IFSTATS.im_PctDiscardsIn) IS NULL ASC, AVG(IFSTATS.im_PctDiscardsIn) DESC, p4.item_id ASC LIMIT 30, queryId=RIBQuery_a9dca94b_ec5d_4bb7_bb9c_1ae99043950f, pageId=c4894ad8-62e0-45bf-a85c-0a3d5744c49a|(2000087), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=4, numberOfConcurrentRealTimeQueries=4, numberOfConcurrentScheduledQueries=0] | 
(RIBQuery.java:799)
    2021-03-09 12:52:12,9092021-03-09 12:52:12,909 | INFO | LongRunningAndFailedQueries | Long Running Query - finished RIBQueryPerformanceData [time_RIBtoSQL_ms=18, time_SQLExecution_ms=96943, time_QueryTotal_ms=96944, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=20, numberOfColumnsReturned=5, sort_Nature=NOT_TIMESERIES, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=HOURLY_ROLLUP, tableRequestedByMetricFactTableTypeProperty=HOURLY_ROLLUP, tableChosen=HOURLY_ROLLUP, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .PollItem.ID, .Item.Name, .Item.DisplayDescription, .ErrorsIn.Sum, .ErrorsOut.Sum FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.DeviceID = '125028' AND .EndTime(3600) > 1614700200 AND .EndTime(3600) <= 1615305000 GROUPBY .PollItem.ID, .Item.Name, .Item.DisplayDescription ORDERBY .ErrorsIn.Sum DESC, .PollItem.ID ASC LIMIT 30, sqlQuery=select /*+label(RIBQuery_257f08f6_b698_4836_93bf_b80b1f26f69f)*/ p4.item_id as ".PollItem.ID", i.item_name as ".Item.Name", i.item_display_description as ".Item.DisplayDescription", SUM(IFSTATS.im_ErrorsIn) as ".ErrorsIn.Sum", SUM(IFSTATS.im_ErrorsOut) as ".ErrorsOut.Sum" FROM IFSTATS_ltd IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) INNER JOIN item i ON (i.item_id = IFSTATS.item_id) WHERE (p4.device_item_id = '125028' AND IFSTATS.tstamp > 1614700200 AND IFSTATS.tstamp <= 1615305000) GROUP BY p4.item_id, i.item_name, i.item_display_description ORDER BY SUM(IFSTATS.im_ErrorsIn) IS NULL ASC, SUM(IFSTATS.im_ErrorsIn) DESC, p4.item_id ASC LIMIT 30, queryId=RIBQuery_257f08f6_b698_4836_93bf_b80b1f26f69f, pageId=c4894ad8-62e0-45bf-a85c-0a3d5744c49a|(2000087), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=5, numberOfConcurrentRealTimeQueries=5, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:799)
    2021-03-09 12:53:59,3912021-03-09 12:53:59,391 | INFO | LongRunningAndFailedQueries | Query returned more than 5000 rows! , RIBQueryPerformanceData [time_RIBtoSQL_ms=9, time_SQLExecution_ms=0, time_QueryTotal_ms=0, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=0, numberOfColumnsReturned=0, sort_Nature=UNDETERMINED, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=RATE, tableRequestedByMetricFactTableTypeProperty=RATE, tableChosen=RATE, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .EndTime(60), .BitsIn.AvgRate, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '283126' AND .EndTime(60) > 1614700380 AND .EndTime(60) <= 1615305180 GROUPBY .EndTime(60) ORDERBY .EndTime(60) ASC, sqlQuery=select /*+label(RIBQuery_598cedd1_0925_4b51_8d01_815b02f5c8e6)*/ IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 as ".EndTime(60)", AVG(IFSTATS.im_BitsIn/IFSTATS.duration) as ".BitsIn.AvgRate", GREATEST(60,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_rate IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) WHERE (p4.item_id = '283126' AND IFSTATS.tstamp > 1614700320 AND IFSTATS.tstamp <= 1615305180) GROUP BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ORDER BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ASC, queryId=RIBQuery_598cedd1_0925_4b51_8d01_815b02f5c8e6, pageId=136813ff-ba8a-4d5f-a444-31a61cf96dc1|(2000350), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=7, numberOfConcurrentRealTimeQueries=7, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:544)
    2021-03-09 12:53:59,5992021-03-09 12:53:59,599 | INFO | LongRunningAndFailedQueries | Query returned more than 5000 rows! , RIBQueryPerformanceData [time_RIBtoSQL_ms=10, time_SQLExecution_ms=0, time_QueryTotal_ms=0, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=0, numberOfColumnsReturned=0, sort_Nature=UNDETERMINED, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=RATE, tableRequestedByMetricFactTableTypeProperty=RATE, tableChosen=RATE, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .EndTime(60), .BitsIn.AvgRate, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '234754' AND .EndTime(60) > 1614700380 AND .EndTime(60) <= 1615305180 GROUPBY .EndTime(60) ORDERBY .EndTime(60) ASC, sqlQuery=select /*+label(RIBQuery_8a33ebf1_0729_40b5_bc05_203cda0b882a)*/ IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 as ".EndTime(60)", AVG(IFSTATS.im_BitsIn/IFSTATS.duration) as ".BitsIn.AvgRate", GREATEST(60,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_rate IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) WHERE (p4.item_id = '234754' AND IFSTATS.tstamp > 1614700320 AND IFSTATS.tstamp <= 1615305180) GROUP BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ORDER BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ASC, queryId=RIBQuery_8a33ebf1_0729_40b5_bc05_203cda0b882a, pageId=2e1439e6-8163-471d-b4f3-269ac0b68ff3|(2000350), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=13, numberOfConcurrentRealTimeQueries=13, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:544)
    2021-03-09 12:53:59,7642021-03-09 12:53:59,764 | INFO | LongRunningAndFailedQueries | Query returned more than 5000 rows! , RIBQueryPerformanceData [time_RIBtoSQL_ms=11, time_SQLExecution_ms=0, time_QueryTotal_ms=0, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=0, numberOfColumnsReturned=0, sort_Nature=UNDETERMINED, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=RATE, tableRequestedByMetricFactTableTypeProperty=RATE, tableChosen=RATE, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .EndTime(60), .BitsIn.AvgRate, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '234752' AND .EndTime(60) > 1614700380 AND .EndTime(60) <= 1615305180 GROUPBY .EndTime(60) ORDERBY .EndTime(60) ASC, sqlQuery=select /*+label(RIBQuery_b0a38b25_fce6_4f4a_9acb_4340721ba2b8)*/ IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 as ".EndTime(60)", AVG(IFSTATS.im_BitsIn/IFSTATS.duration) as ".BitsIn.AvgRate", GREATEST(60,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_rate IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) WHERE (p4.item_id = '234752' AND IFSTATS.tstamp > 1614700320 AND IFSTATS.tstamp <= 1615305180) GROUP BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ORDER BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ASC, queryId=RIBQuery_b0a38b25_fce6_4f4a_9acb_4340721ba2b8, pageId=0fc31189-cd25-4cbb-8325-f9713b0b061b|(2000350), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=12, numberOfConcurrentRealTimeQueries=12, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:544)
    2021-03-09 12:53:59,7942021-03-09 12:53:59,794 | INFO | LongRunningAndFailedQueries | Query returned more than 5000 rows! , RIBQueryPerformanceData [time_RIBtoSQL_ms=13, time_SQLExecution_ms=0, time_QueryTotal_ms=0, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=0, numberOfColumnsReturned=0, sort_Nature=UNDETERMINED, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=RATE, tableRequestedByMetricFactTableTypeProperty=RATE, tableChosen=RATE, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .EndTime(60), .BitsIn.AvgRate, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '283128' AND .EndTime(60) > 1614700380 AND .EndTime(60) <= 1615305180 GROUPBY .EndTime(60) ORDERBY .EndTime(60) ASC, sqlQuery=select /*+label(RIBQuery_6f908cd2_06ba_4656_b2fd_1fcf1e7e1627)*/ IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 as ".EndTime(60)", AVG(IFSTATS.im_BitsIn/IFSTATS.duration) as ".BitsIn.AvgRate", GREATEST(60,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_rate IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) WHERE (p4.item_id = '283128' AND IFSTATS.tstamp > 1614700320 AND IFSTATS.tstamp <= 1615305180) GROUP BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ORDER BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ASC, queryId=RIBQuery_6f908cd2_06ba_4656_b2fd_1fcf1e7e1627, pageId=6cbb2436-ebdf-4b23-b61e-f4de50931a08|(2000350), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=17, numberOfConcurrentRealTimeQueries=17, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:544)
    2021-03-09 12:54:02,0902021-03-09 12:54:02,090 | INFO | LongRunningAndFailedQueries | Query returned more than 5000 rows! , RIBQueryPerformanceData [time_RIBtoSQL_ms=16, time_SQLExecution_ms=0, time_QueryTotal_ms=0, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=0, numberOfColumnsReturned=0, sort_Nature=UNDETERMINED, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=RATE, tableRequestedByMetricFactTableTypeProperty=RATE, tableChosen=RATE, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .EndTime(60), .BitsOut.AvgRate, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '283126' AND .EndTime(60) > 1614700380 AND .EndTime(60) <= 1615305180 GROUPBY .EndTime(60) ORDERBY .EndTime(60) ASC, sqlQuery=select /*+label(RIBQuery_cfd55b62_4089_4900_a9e6_df136ff8beff)*/ IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 as ".EndTime(60)", AVG(IFSTATS.im_BitsOut/IFSTATS.duration) as ".BitsOut.AvgRate", GREATEST(60,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_rate IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) WHERE (p4.item_id = '283126' AND IFSTATS.tstamp > 1614700320 AND IFSTATS.tstamp <= 1615305180) GROUP BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ORDER BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ASC, queryId=RIBQuery_cfd55b62_4089_4900_a9e6_df136ff8beff, pageId=136813ff-ba8a-4d5f-a444-31a61cf96dc1|(2000350), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=3, numberOfConcurrentRealTimeQueries=3, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:544)
    2021-03-09 12:54:02,1932021-03-09 12:54:02,193 | INFO | LongRunningAndFailedQueries | Query returned more than 5000 rows! , RIBQueryPerformanceData [time_RIBtoSQL_ms=16, time_SQLExecution_ms=0, time_QueryTotal_ms=0, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=0, numberOfColumnsReturned=0, sort_Nature=UNDETERMINED, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=RATE, tableRequestedByMetricFactTableTypeProperty=RATE, tableChosen=RATE, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .EndTime(60), .BitsOut.AvgRate, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '283128' AND .EndTime(60) > 1614700380 AND .EndTime(60) <= 1615305180 GROUPBY .EndTime(60) ORDERBY .EndTime(60) ASC, sqlQuery=select /*+label(RIBQuery_25d2579c_2fce_4d04_a788_381ebd7bdbb4)*/ IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 as ".EndTime(60)", AVG(IFSTATS.im_BitsOut/IFSTATS.duration) as ".BitsOut.AvgRate", GREATEST(60,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_rate IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) WHERE (p4.item_id = '283128' AND IFSTATS.tstamp > 1614700320 AND IFSTATS.tstamp <= 1615305180) GROUP BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ORDER BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ASC, queryId=RIBQuery_25d2579c_2fce_4d04_a788_381ebd7bdbb4, pageId=6cbb2436-ebdf-4b23-b61e-f4de50931a08|(2000350), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=6, numberOfConcurrentRealTimeQueries=6, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:544)
    2021-03-09 12:54:02,3442021-03-09 12:54:02,344 | INFO | LongRunningAndFailedQueries | Query returned more than 5000 rows! , RIBQueryPerformanceData [time_RIBtoSQL_ms=10, time_SQLExecution_ms=0, time_QueryTotal_ms=0, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=0, numberOfColumnsReturned=0, sort_Nature=UNDETERMINED, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=RATE, tableRequestedByMetricFactTableTypeProperty=RATE, tableChosen=RATE, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .EndTime(60), .BitsOut.AvgRate, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '234752' AND .EndTime(60) > 1614700380 AND .EndTime(60) <= 1615305180 GROUPBY .EndTime(60) ORDERBY .EndTime(60) ASC, sqlQuery=select /*+label(RIBQuery_a249638c_3efa_4432_8897_412560394004)*/ IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 as ".EndTime(60)", AVG(IFSTATS.im_BitsOut/IFSTATS.duration) as ".BitsOut.AvgRate", GREATEST(60,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_rate IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) WHERE (p4.item_id = '234752' AND IFSTATS.tstamp > 1614700320 AND IFSTATS.tstamp <= 1615305180) GROUP BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ORDER BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ASC, queryId=RIBQuery_a249638c_3efa_4432_8897_412560394004, pageId=0fc31189-cd25-4cbb-8325-f9713b0b061b|(2000350), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=4, numberOfConcurrentRealTimeQueries=4, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:544)
    2021-03-09 12:54:02,5752021-03-09 12:54:02,575 | INFO | LongRunningAndFailedQueries | Query returned more than 5000 rows! , RIBQueryPerformanceData [time_RIBtoSQL_ms=11, time_SQLExecution_ms=0, time_QueryTotal_ms=0, time_ToDetermineSortCharacteristic=0, numberOfRowsReturned=0, numberOfColumnsReturned=0, sort_Nature=UNDETERMINED, tableChoiceResults=TableChoiceResults [tableRequestedByResolution=RATE, tableRequestedByMetricFactTableTypeProperty=RATE, tableChosen=RATE, tableChoiceReason=AS_REQUESTED_BY_RESOLUTION], percentileQueryForm=NOT_PERCENTILE, gapAdditionCounter=0, ribQuery=SELECT .EndTime(60), .BitsOut.AvgRate, .Resolution.Returned FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '234754' AND .EndTime(60) > 1614700380 AND .EndTime(60) <= 1615305180 GROUPBY .EndTime(60) ORDERBY .EndTime(60) ASC, sqlQuery=select /*+label(RIBQuery_c823c99b_38f3_4df1_9932_c482f1e280a2)*/ IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 as ".EndTime(60)", AVG(IFSTATS.im_BitsOut/IFSTATS.duration) as ".BitsOut.AvgRate", GREATEST(60,MAX(IFSTATS.rinterval)) as ".Resolution.Returned" FROM IFSTATS_rate IFSTATS INNER JOIN v_etl_poll_item p4 ON (p4.item_id = IFSTATS.item_id) WHERE (p4.item_id = '234754' AND IFSTATS.tstamp > 1614700320 AND IFSTATS.tstamp <= 1615305180) GROUP BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ORDER BY IFSTATS.tstamp-mod(IFSTATS.tstamp-1,60)+60-1 ASC, queryId=RIBQuery_c823c99b_38f3_4df1_9932_c482f1e280a2, pageId=2e1439e6-8163-471d-b4f3-269ac0b68ff3|(2000350), longRunningQueryTripValue_ms=60000, useScheduledDataSource=false, numberOfConcurrentQueries=5, numberOfConcurrentRealTimeQueries=5, numberOfConcurrentScheduledQueries=0] | (RIBQuery.java:544)
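
    Entries like these are easier to compare once the key fields are pulled out. The sketch below is a rough illustration with made-up function names: it reads the total time, the metric family, and the requested time window from one entry. The bucket count is only an estimate that assumes one row per .EndTime(n) interval, which applies to queries grouped by .EndTime and not to the top-N queries.

```python
import re
from typing import Dict, Optional

TOTAL_MS = re.compile(r"time_QueryTotal_ms=(\d+)")
FAMILY = re.compile(r"ribQuery=SELECT .*? FROM (\S+) WHERE")
WINDOW = re.compile(r"\.EndTime\((\d+)\) > (\d+) AND \.EndTime\(\d+\) <= (\d+)")


def summarize(entry: str) -> Optional[Dict[str, object]]:
    """Extract the main fields from one LongRunningAndFailedQueries.log entry."""
    total = TOTAL_MS.search(entry)
    family = FAMILY.search(entry)
    window = WINDOW.search(entry)
    if not (total and family and window):
        return None
    resolution, start, end = (int(g) for g in window.groups())
    row_limit = "Query returned more than 5000 rows" in entry
    return {
        "kind": "row_limit" if row_limit else "long_running",
        "family": family.group(1),
        "total_ms": int(total.group(1)),
        "window_s": end - start,
        # Estimated number of time buckets in the window (see assumption above).
        "buckets": (end - start) // resolution,
    }


# Shortened copy of one of the rate entries above.
entry = (
    "Query returned more than 5000 rows! , RIBQueryPerformanceData "
    "[time_RIBtoSQL_ms=9, time_SQLExecution_ms=0, time_QueryTotal_ms=0, "
    "ribQuery=SELECT .EndTime(60), .BitsIn.AvgRate, .Resolution.Returned "
    "FROM CA.IM.DA.MF.NormalizedPortInfo.IFSTATS WHERE .PollItem.ID = '283126' "
    "AND .EndTime(60) > 1614700380 AND .EndTime(60) <= 1615305180 "
    "GROUPBY .EndTime(60) ORDERBY .EndTime(60) ASC"
)
print(summarize(entry))
```

    For the rate entries the window is 604800 secs (7 days) at 60 sec resolution, so about 10080 buckets, which is consistent with the "more than 5000 rows" message. The two long running entries cover the same 7 days on the hourly rollup table for device 125028.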

    thank you

    Valéria



  • 29.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 10, 2021 11:25 AM
    We don't use a schema in DA calls to the DB, because we actually connect as the schema user.

    To determine the schema user, check the DA's apache-karaf-2.4.3/etc/dbconnection.cfg for the dbUser= value.
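
    If you want to script that lookup, a minimal sketch follows. It assumes the file is made of plain key=value lines; the function name and the sample contents are invented for illustration, and only the dbUser key comes from this thread.

```python
from typing import Optional


def read_cfg_value(cfg_text: str, key: str) -> Optional[str]:
    """Return the value of key from key=value text, or None if it is absent."""
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


# Invented sample contents; read the real file from the DA install directory.
sample_cfg = "# example only\ndbUser=exampleuser\n"
print(read_cfg_value(sample_cfg, "dbUser"))
```

    The value it returns is the user to pass with -u when running cleanupDeletedItems.sh without -s.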


  • 30.  RE: many gaps in all graphics

    Posted Mar 10, 2021 12:57 PM
    Hello Jeffrey,

    I ran the 2 commands as you asked me and below you can see the results:


    dradmin=> select count(*) from daadmin.item;
    count
    --------
    827786
    (1 row)


    dradmin=> select facet_qname, count(*) from daadmin.v_item_facet group by 1 order by 2 desc limit 30;
    facet_qname | count
    ----------------------------------------------------------------------------+--------
    {http://im.ca.com/inventory}Pollable | 609636
    {http://im.ca.com/core}Syncable | 608034
    {http://im.ca.com/inventory}DeviceComponent | 606449
    {http://im.ca.com/inventory}DiscoveryInfo | 595871
    {http://im.ca.com/core}GroupHasMember | 183548
    {http://im.ca.com/inventory}ProcessCPUUsage | 164018
    {http://im.ca.com/inventory}ChassisVoltageEnvironmentalSensor | 71417
    {http://im.ca.com/inventory}AlternatePort | 70530
    {http://im.ca.com/inventory}Hierarchy | 70075
    {http://im.ca.com/inventory}QoSQueuing | 67693
    {http://im.ca.com/inventory}Port | 64109
    {http://im.ca.com/inventory}ChassisPowerSupplyEnvironmentalSensor | 47890
    {http://im.ca.com/inventory}ChassisTemperatureEnvironmentalSensorAlternate | 28553
    {http://im.ca.com/pollingconfig}PollingConfig | 21170
    {http://im.ca.com/inventory}Memory | 19921
    {http://im.ca.com/inventory}CPU | 17591
    {http://im.ca.com/inventory}NetFlowStatistics | 9487
    {http://im.ca.com/inventory}IPv6stats | 6921
    {http://im.ca.com/core}CertificationFacetHistory | 6001
    {http://im.ca.com/inventory}ContentAddressableMemory | 4416
    {http://im.ca.com/inventory}Device | 3211
    {http://im.ca.com/monitoringprofile}DeviceMonitoringProfile | 3211
    {http://im.ca.com/pollgroup}PollGroup | 3202
    {http://im.ca.com/core}IPDomainMember | 3191
    {http://im.ca.com/inventory}ConsolidatedAndDiscoveredMetricFamilyHistory | 3187
    {http://im.ca.com/core}Lifecycle | 3187
    {http://im.ca.com/inventory}MetricFamilyDiscoveryHistory | 3187
    {http://im.ca.com/inventory}ManageableDevice | 3186
    {http://im.ca.com/dcm}DataCollectionMgrId | 3186
    {http://im.ca.com/normalizer}NormalizedPortInfo | 3186

    Unfortunately, I couldn't run ./cleanupDeletedItems.sh on the data aggregator server; I got the errors below:

    ./cleanupDeletedItems.sh: line 83: error: command not found
    ./cleanupDeletedItems.sh: line 100: package: command not found

    thank you

    Valéria


  • 31.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 10, 2021 01:25 PM
    It needs to be run on one of the Vertica nodes.

    So under 1 million items, and about 606k device components.
    The one thing I do notice is the ProcessCPUUsage facet.  It is known to create many Not Present items, because process CPU items come and go on the device.  Just FYI, you may want to keep an eye on the Not Present (Retired) items in the future.

    I'd think adding more memory (doubling to 64G at least) would help speed up queries, especially when more are run at the same time due to different aspects of the process running.  If you can get faster CPUs, all the better too.



  • 32.  RE: many gaps in all graphics

    Posted Mar 10, 2021 03:00 PM
    Hello Jeffrey,

    I ran ./cleanupDeletedItems.sh on the data repository server and got the results below:

    Count of table entries where item_id NOT in item table anymore:
    Devices: 3151
    Poll Items: 276072
    Item Facets: 3500973
    Item Relationships: 1802451
    Visible Tenants: 1780870

    Removing stale entries from database...
    Deleted 3151 device entries
    Deleted 276072 poll_item entries
    Deleted 3500973 item_facet entries
    Deleted 1802451 item_relationship entries
    Deleted 1780870 visible_tenant entries
    Running DB purge to physically remove deleted data...
    Completed purge.

    elapsed time: 0 minutes 45 seconds


    I'll try to obtain more memory and CPU, as you recommended.

    I appreciate your help.

    Valéria


  • 33.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 10, 2021 03:02 PM
    That is a good amount of "leftovers" that should help with query performance.


  • 34.  RE: many gaps in all graphics

    Posted Mar 12, 2021 09:05 AM
    Hello Jeffrey,

    Will I need more CPU and memory only for the DR servers, or for the entire solution?

    Yesterday CAPC was working fine, but today the poll cycle percentage rose above 50% again and the DA server became degraded many times.

    thank you

    Valéria


  • 35.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 12, 2021 09:22 AM
    Just the DR nodes.

    The degraded state is because thresholding was over 80% for 15 mins.  It doesn't mean the DA had any issues.  It's our way of turning off thresholding if those queries may be overloading the DR causing other DR queries to be slower.
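
    To make the "over 80% for 15 mins" rule concrete, here is a minimal sketch of a sustained-threshold check. It is not the DA's actual implementation: the 5-minute sample spacing, the list-of-percentages input, and the function name is_degraded are all assumptions made for illustration.

```python
def is_degraded(percent_samples, threshold=80.0, window_minutes=15, cycle_minutes=5):
    """Return True if every sample in the trailing window exceeds the threshold.

    percent_samples: threshold-processing time as a percent of the poll cycle,
    oldest first, one value per cycle (assumed 5-minute spacing).
    """
    needed = window_minutes // cycle_minutes
    if len(percent_samples) < needed:
        return False  # Not enough history to cover the whole window.
    return all(p > threshold for p in percent_samples[-needed:])


print(is_degraded([50, 85, 90, 95]))  # last three samples all above 80
print(is_degraded([85, 90, 70]))      # most recent sample dropped below 80
```

    The point of requiring the whole window to be over the limit is that a single slow cycle does not flip the state; only sustained load does.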


  • 36.  RE: many gaps in all graphics

    Posted Mar 12, 2021 10:44 AM
    Jeffrey,

    I realized that only one node has 32 GB and the other nodes have 144 GB:

    Node 1:

    MemTotal: 32869688 kB
    MemFree: 3378696 kB
    Buffers: 351336 kB
    Cached: 14096300 kB
    SwapCached: 0 kB
    Active: 19384464 kB
    Inactive: 9154892 kB
    Active(anon): 12263296 kB
    Inactive(anon): 1828940 kB
    Active(file): 7121168 kB
    Inactive(file): 7325952 kB
    Unevictable: 0 kB
    Mlocked: 0 kB
    SwapTotal: 4046844 kB
    SwapFree: 4046844 kB
    Dirty: 18564 kB
    Writeback: 0 kB
    AnonPages: 14091672 kB
    Mapped: 39296 kB
    Shmem: 480 kB
    Slab: 638200 kB
    SReclaimable: 587328 kB
    SUnreclaim: 50872 kB
    KernelStack: 7440 kB
    PageTables: 49184 kB
    NFS_Unstable: 0 kB
    Bounce: 0 kB
    WritebackTmp: 0 kB
    CommitLimit: 20481688 kB
    Committed_AS: 12508332 kB
    VmallocTotal: 34359738367 kB
    VmallocUsed: 81356 kB
    VmallocChunk: 34359645604 kB
    HardwareCorrupted: 0 kB
    AnonHugePages: 0 kB
    HugePages_Total: 0
    HugePages_Free: 0
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    Hugepagesize: 2048 kB
    DirectMap4k: 8192 kB
    DirectMap2M: 33544192 kB

    Node 2:

    MemTotal: 148693792 kB
    MemFree: 6616480 kB
    Buffers: 808408 kB
    Cached: 125424412 kB
    SwapCached: 0 kB
    Active: 71486260 kB
    Inactive: 64858568 kB
    Active(anon): 8778612 kB
    Inactive(anon): 1333892 kB
    Active(file): 62707648 kB
    Inactive(file): 63524676 kB
    Unevictable: 0 kB
    Mlocked: 0 kB
    SwapTotal: 4046844 kB
    SwapFree: 4046844 kB
    Dirty: 18208 kB
    Writeback: 0 kB
    AnonPages: 10112056 kB
    Mapped: 58692 kB
    Shmem: 492 kB
    Slab: 4629048 kB
    SReclaimable: 4379944 kB
    SUnreclaim: 249104 kB
    KernelStack: 7152 kB
    PageTables: 37020 kB
    NFS_Unstable: 0 kB
    Bounce: 0 kB
    WritebackTmp: 0 kB
    CommitLimit: 78393740 kB
    Committed_AS: 3956932 kB
    VmallocTotal: 34359738367 kB
    VmallocUsed: 426124 kB
    VmallocChunk: 34359300792 kB
    HardwareCorrupted: 0 kB
    AnonHugePages: 2048 kB
    HugePages_Total: 0
    HugePages_Free: 0
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    Hugepagesize: 2048 kB
    DirectMap4k: 8192 kB
    DirectMap2M: 151508992 kB

    Node 3:

    MemTotal: 148693792 kB
    MemFree: 2608760 kB
    Buffers: 818380 kB
    Cached: 125748472 kB
    SwapCached: 0 kB
    Active: 75849164 kB
    Inactive: 64607068 kB
    Active(anon): 12505860 kB
    Inactive(anon): 1384788 kB
    Active(file): 63343304 kB
    Inactive(file): 63222280 kB
    Unevictable: 0 kB
    Mlocked: 0 kB
    SwapTotal: 4046844 kB
    SwapFree: 4046844 kB
    Dirty: 106036 kB
    Writeback: 15336 kB
    AnonPages: 13890180 kB
    Mapped: 59312 kB
    Shmem: 492 kB
    Slab: 4515596 kB
    SReclaimable: 4275900 kB
    SUnreclaim: 239696 kB
    KernelStack: 7968 kB
    PageTables: 46808 kB
    NFS_Unstable: 0 kB
    Bounce: 0 kB
    WritebackTmp: 0 kB
    CommitLimit: 78393740 kB
    Committed_AS: 6108008 kB
    VmallocTotal: 34359738367 kB
    VmallocUsed: 426124 kB
    VmallocChunk: 34359300724 kB
    HardwareCorrupted: 0 kB
    AnonHugePages: 2048 kB
    HugePages_Total: 0
    HugePages_Free: 0
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    Hugepagesize: 2048 kB
    DirectMap4k: 8192 kB
    DirectMap2M: 151508992 kB

    Even though only one server has low memory, couldn't the others help it with these tasks?

    To change this, I'll have to shut down the server to upgrade its memory.

    thank you

    Valéria




  • 37.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 12, 2021 11:17 AM
    Yeah, we like to say Vertica runs at the speed of its slowest node.

    In this case, when Vertica plans a query and determines its memory usage, the node that was called to initiate the query can lead to a different plan being created, so the query may run slower or faster.

    If the query went to the node running 32GB, it would plan based on 32GB on all nodes, so it may not make the best use of the resources.
    On the other hand, if a 144GB node initiated the query, it may request more memory for the steps it sends to the 32GB node, and that node may have to wait longer to borrow memory from the general pool (memory not reserved for the resource pools).

    So if you can get node1 up to 144GB, your perf issue may go away.
    To upgrade the 32GB node, use admintools to "stop vertica on host" for that node.  The other 2 nodes should stay up, since 2 of 3 is still K-safe 1.
    Stop the node, add the memory, then when the node comes up, if Vertica isn't automatically started, use admintools to "restart vertica on host" for the node.

    There is configuration needed after adding the additional memory.  Vertica will use upto 95% of system RAM for it's work.
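
    A quick way to spot this kind of mismatch is to compare MemTotal across the nodes. The sketch below is only an illustration: it parses /proc/meminfo-style text and lists nodes that have less memory than the largest one. The node labels and helper names are assumptions; the MemTotal values are the ones posted earlier in this thread.

```python
def mem_total_kb(meminfo_text):
    """Extract the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def undersized_nodes(mem_by_node):
    """Return the names of nodes with less memory than the largest node."""
    largest = max(mem_by_node.values())
    return sorted(name for name, kb in mem_by_node.items() if kb < largest)


# MemTotal lines as posted for the three DR nodes; the labels are illustrative.
nodes = {
    "node1": mem_total_kb("MemTotal: 32869688 kB\nMemFree: 3378696 kB\n"),
    "node2": mem_total_kb("MemTotal: 148693792 kB\nMemFree: 6616480 kB\n"),
    "node3": mem_total_kb("MemTotal: 148693792 kB\nMemFree: 2608760 kB\n"),
}
print(undersized_nodes(nodes))
```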


  • 38.  RE: many gaps in all graphics

    Posted Mar 17, 2021 11:04 AM
    Hello Jeffrey,

    I did what you asked: I added more memory and CAPC is now faster. But what do you mean by "There is configuration needed after adding the additional memory.  Vertica will use upto 95% of system RAM for it's work."?
    Today I restarted the DC server and the DA server, and they took a long time to come back. The graphs for the night period still show many gaps, even though the poll cycle percentage has decreased.

    thank you

    Valéria



  • 39.  RE: many gaps in all graphics

    Posted Mar 17, 2021 11:30 AM
    Jeffrey,

    I saw these messages in karaf.log. What do they mean?

    root@VMLPFMPRD12 /opt/CA/IMDataAggregator/apache-karaf-2.4.3/data/log # tail karaf.log
    INFO | ficationConsumer | 2021-03-17 12:26:12,074 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 20 pending data loads to nrm_tcp_stats_rate
    INFO | ficationConsumer | 2021-03-17 12:26:12,075 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 21 pending data loads to nrm_tcp_stats_rate
    INFO | ficationConsumer | 2021-03-17 12:26:12,076 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 21 pending data loads to nrm_udp_stats_rate
    INFO | ficationConsumer | 2021-03-17 12:26:12,076 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 15 pending data loads to nrm_hardware_status_rate
    INFO | ficationConsumer | 2021-03-17 12:26:17,095 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 19 pending data loads to ifstats_rate
    INFO | ficationConsumer | 2021-03-17 12:26:17,095 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 16 pending data loads to nrm_fan_env_sensor_rate
    INFO | ficationConsumer | 2021-03-17 12:26:17,095 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 21 pending data loads to nrm_partitions_rate
    INFO | ficationConsumer | 2021-03-17 12:26:17,096 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 19 pending data loads to nrm_system_stats_rate
    INFO | ficationConsumer | 2021-03-17 12:26:17,097 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 16 pending data loads to nrm_fw_conn_stats_rate
    INFO | ficationConsumer | 2021-03-17 12:26:17,097 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 22 pending data loads to nrm_generic_system_rate

    thank you,

    Valéria


  • 40.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 17, 2021 11:53 AM

    My bad, it should've said "There is NO configuration....".

    I believe those pending messages are printed when we start the data loading processors.  
    Are they still being printed?  

    Can you check the self-monitoring General Processing dashboard for the last 8 hrs for the Data Loading views?

    Also, you can enable this trace in the DA's apache-karaf-2.4.3/etc/org.ops4j.pax.logging.cfg:

    log4j.logger.com.ca.im.dm.core.aggregator.loader.bulk.notifications.DataLoadNotificationManagerImpl=TRACE,sift
    log4j.additivity.com.ca.im.dm.core.aggregator.loader.bulk.notifications.DataLoadNotificationManagerImpl=false

    It will log the DataLoadNotificationManagerImpl messages, and any additional debug/trace messages, to a new file in apache-karaf-2.4.3/data/log.
    Are there any errors in Exception.log about data loading?
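
    To see whether the backlog is growing or draining, it can help to pull the counts out of those DATA-LOAD lines. The following is a small sketch under the assumption that the message format is exactly as it appears in the karaf.log excerpts in this thread; the function name latest_pending is made up for the example.

```python
import re

# Matches the message text seen in the karaf.log excerpts in this thread.
PENDING_RE = re.compile(r"DATA-LOAD: There are (\d+) pending data loads to (\S+)")


def latest_pending(lines):
    """Return {table_name: most recent pending count} from log lines."""
    latest = {}
    for line in lines:
        match = PENDING_RE.search(line)
        if match:
            latest[match.group(2)] = int(match.group(1))
    return latest


sample_lines = [
    "INFO | ... | DATA-LOAD: There are 20 pending data loads to nrm_tcp_stats_rate",
    "INFO | ... | DATA-LOAD: There are 21 pending data loads to nrm_tcp_stats_rate",
    "INFO | ... | DATA-LOAD: There are 19 pending data loads to ifstats_rate",
]
print(latest_pending(sample_lines))
```

    Running it over two tails of the log taken a few minutes apart and comparing the counts per table shows whether the queue is still climbing.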




  • 41.  RE: many gaps in all graphics

    Posted Mar 17, 2021 12:06 PM
    Jeffrey,

    These messages keep increasing in karaf.log and don't stop:

    root@VMLPFMPRD12 /opt/CA/IMDataAggregator/apache-karaf-2.4.3/data/log # tail karaf.log
    INFO | ficationConsumer | 2021-03-17 12:57:17,000 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 44 pending data loads to nrm_generic_system_rate
    INFO | ficationConsumer | 2021-03-17 12:57:17,000 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 43 pending data loads to reach_rate
    WARN | Host:VMLPFMPRD16 | 2021-03-17 12:57:17,163 | shutdown | ase.heartbeat.DBStateManagerImpl 792 | ommon.core.services.impl | | DB heartbeat to host VMLPFMPRD16 successful, but the response time of 0:01:28.527 was longer then a threshold of 20000 ms.
    INFO | ficationConsumer | 2021-03-17 12:57:22,174 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 33 pending data loads to nrm_net_svcs_dhcp_nt_app_svc_rate
    INFO | ficationConsumer | 2021-03-17 12:57:22,176 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 34 pending data loads to nrm_boolean_sensor_rate
    INFO | ficationConsumer | 2021-03-17 12:57:22,176 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 26 pending data loads to nrm_da_amq_broker_health_rate
    INFO | ficationConsumer | 2021-03-17 12:57:22,176 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 29 pending data loads to nrm_event_health_times_rate
    INFO | ficationConsumer | 2021-03-17 12:57:30,475 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 38 pending data loads to nrm_env_sensor_optical_power_rate
    INFO | ficationConsumer | 2021-03-17 12:57:35,511 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 42 pending data loads to avail_rate
    INFO | ficationConsumer | 2021-03-17 12:57:35,512 | DataLoadNotificationManagerImpl | .DataLoadNotificationManagerImpl 67 | .ca.im.aggregator.loader | | DATA-LOAD: There are 46 pending data loads to nrm_partitions_rate

    Should I stop the DC again to stop these messages?
     




  • 42.  RE: many gaps in all graphics

    Posted Mar 17, 2021 12:13 PM
    Jeffrey,

    About the Exception log:

    root@VMLPFMPRD12 /opt/CA/IMDataAggregator/apache-karaf-2.4.3/data/log # tail Exception.log
    stderr -- 2021-03-17 11:14:34,854 - Passed abnormal-service-list-detector checking. 32 services of org.springframework.context.ApplicationContext found

    About Data Aggregator General Processing views:




    thank you,

    Valéria


  • 43.  RE: many gaps in all graphics

    Posted Mar 17, 2021 12:44 PM
    Jeffrey,

    I enabled what you asked and now those messages don't appear anymore, so I restarted the DC server because it was taking a long time to sync. What do I have to do to get CAPC back to normal?

    thank you

    Valéria


  • 44.  RE: many gaps in all graphics

    Posted Mar 17, 2021 12:59 PM
    Jeffrey,

    Now you can see this in karaf logs:

    tail karaf.log
    INFO | esponseHandler-3 | 2021-03-17 13:57:10,775 | DCMEventGeneratorImpl | oller.impl.DCMEventGeneratorImpl 102 | ager.core.collector.impl | | Event 16.DataCollectorDroppingPolls was generated on DC VMLPFMPRD13:01401ca8-6b2f-41d7-91a9-cbe34ba9b5ab
    INFO | esponseHandler-4 | 2021-03-17 13:57:12,115 | DCMEventGeneratorImpl | oller.impl.DCMEventGeneratorImpl 102 | ager.core.collector.impl | | Event 17.DataCollectorNoLongerDroppingPolls was generated on DC VMLPFMPRD13:01401ca8-6b2f-41d7-91a9-cbe34ba9b5ab
    INFO | esponseHandler-3 | 2021-03-17 13:57:13,559 | DCMEventGeneratorImpl | oller.impl.DCMEventGeneratorImpl 102 | ager.core.collector.impl | | Event 17.DataCollectorNoLongerDroppingPolls was generated on DC VMLPFMPRD13:01401ca8-6b2f-41d7-91a9-cbe34ba9b5ab
    INFO | esponseHandler-1 | 2021-03-17 13:57:16,915 | DCMEventGeneratorImpl | oller.impl.DCMEventGeneratorImpl 102 | ager.core.collector.impl | | Event 17.DataCollectorNoLongerDroppingPolls was generated on DC VMLPFMPRD13:01401ca8-6b2f-41d7-91a9-cbe34ba9b5ab
    INFO | esponseHandler-2 | 2021-03-17 13:57:17,551 | DCMEventGeneratorImpl | oller.impl.DCMEventGeneratorImpl 102 | ager.core.collector.impl | | Event 17.DataCollectorNoLongerDroppingPolls was generated on DC VMLPFMPRD13:01401ca8-6b2f-41d7-91a9-cbe34ba9b5ab
    INFO | esponseHandler-4 | 2021-03-17 13:57:17,551 | DCMEventGeneratorImpl | oller.impl.DCMEventGeneratorImpl 102 | ager.core.collector.impl | | Event 16.DataCollectorDroppingPolls was generated on DC VMLPFMPRD13:01401ca8-6b2f-41d7-91a9-cbe34ba9b5ab
    INFO | esponseHandler-3 | 2021-03-17 13:57:18,213 | DCMEventGeneratorImpl | oller.impl.DCMEventGeneratorImpl 102 | ager.core.collector.impl | | Event 17.DataCollectorNoLongerDroppingPolls was generated on DC VMLPFMPRD13:01401ca8-6b2f-41d7-91a9-cbe34ba9b5ab
    INFO | esponseHandler-2 | 2021-03-17 13:57:22,828 | DCMEventGeneratorImpl | oller.impl.DCMEventGeneratorImpl 102 | ager.core.collector.impl | | Event 17.DataCollectorNoLongerDroppingPolls was generated on DC VMLPFMPRD13:01401ca8-6b2f-41d7-91a9-cbe34ba9b5ab
    INFO | esponseHandler-2 | 2021-03-17 13:57:23,347 | DCMEventGeneratorImpl | oller.impl.DCMEventGeneratorImpl 102 | ager.core.collector.impl | | Event 17.DataCollectorNoLongerDroppingPolls was generated on DC VMLPFMPRD13:01401ca8-6b2f-41d7-91a9-cbe34ba9b5ab
    INFO | esponseHandler-2 | 2021-03-17 13:57:24,971 | DCMEventGeneratorImpl | oller.impl.DCMEventGeneratorImpl 102 | ager.core.collector.impl | | Event 17.DataCollectorNoLongerDroppingPolls was generated on DC VMLPFMPRD13:01401ca8-6b2f-41d7-91a9-cbe34ba9b5ab


    thank you,

    Valéria


  • 45.  RE: many gaps in all graphics

    Posted Mar 17, 2021 01:18 PM
    WARN | Host:VMLPFMPRD14 | 2021-03-17 14:16:58,762 | shutdown | ase.heartbeat.DBStateManagerImpl 792 | ommon.core.services.impl | | DB heartbeat to host VMLPFMPRD14 successful, but the response time of 0:04:23.450 was longer then a threshold of 20000 ms.
    WARN | atTimer-thread-2 | 2021-03-17 14:17:02,980 | DCHeartBeatLog | r.controller.DCMHeartbeatManager 131 | ore.collector.interfaces | | No response has been received from DC 3987 in timeframe 50418 (ms):
    DA Unresponsive, previous responses=4544908, current responses=4544916
    WARN | atTimer-thread-2 | 2021-03-17 14:17:02,981 | DCHeartBeatLog | r.controller.DCMHeartbeatManager 147 | ore.collector.interfaces | | No response has been received from VMLPFMPRD13:01401ca8-6b2f-41d7-91a9-cbe34ba9b5ab in timeframe 50418 ms: DC Contact Lost. Previous responses=4544908, current responses=4544916. Time since last heartbeat check: 10000 ms
    INFO | atTimer-thread-2 | 2021-03-17 14:17:02,981 | IPDomainStatusManager | oller.impl.IPDomainStatusManager 79 | ager.core.collector.impl | | IP Domain 2's status changed to NO_DC_RUNNING
    INFO | itory-thread-149 | 2021-03-17 14:17:02,981 | CableModemInventoryDiscovery | hcp.CableModemInventoryDiscovery 257 | .im.aggregator.discovery | | CMTS DC 3987 status changed from RUNNING to CONTACT_LOST
    INFO | atTimer-thread-2 | 2021-03-17 14:17:02,981 | IPDomainStatusManager | oller.impl.IPDomainStatusManager 104 | ager.core.collector.impl | | Set IP domain 2 status to NO_DC_RUNNING
    ERROR | atTimer-thread-2 | 2021-03-17 14:17:02,983 | DCHeartBeatLog | impl.DCMContactStatusManagerImpl 117 | ager.core.collector.impl | | Lost contact to DC VMLPFMPRD13:01401ca8-6b2f-41d7-91a9-cbe34ba9b5ab. State changed from RUNNING to CONTACT_LOST. The last heartbeat was received 50418 ms ago
    INFO | atTimer-thread-2 | 2021-03-17 14:17:02,983 | DCMEventGeneratorImpl | oller.impl.DCMEventGeneratorImpl 93 | ager.core.collector.impl | | Event 5.DataCollectorContactLost was generated on DC VMLPFMPRD13:01401ca8-6b2f-41d7-91a9-cbe34ba9b5ab
    INFO | itory-thread-146 | 2021-03-17 14:17:02,994 | IPDomainAttrListener | covery.impl.IPDomainAttrListener 105 | .im.aggregator.discovery | | IP Domain 2 status changed from ACTIVE to NO_DC_RUNNING. 3 discovery profiles status set to NO_DC_RUNNING


  • 46.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 17, 2021 02:07 PM
    So all those messages are now going to appear in a new file in data/log.  Check for a com.ca.im.aggregator.loader.log, maybe.  See if the entries are appearing in there.

    Upload the file if you see messages.

    As for the DataCollectorNoLongerDroppingPolls message, I think that means the DC is clearing events raised when polls were being dropped for different devices.
    When you restart the DCs, it should go to NO_DC_RUNNING, but after the DC registers, it should go back to ACTIVE.

    What does the last 8 hrs look like for that Data Loading view using 5 min resolution?

    Can you also check apache-karaf-2.4.3/data/shutdown-details.log and see how long the DA-to-DR heartbeats are taking?
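
    For reference, the slow-heartbeat warnings already posted from karaf.log can be turned into milliseconds for comparison with the threshold. This is only a sketch: it assumes the message wording matches the WARN lines quoted in this thread (including the log's own "longer then" spelling), and parse_slow_heartbeat is a made-up helper name. The format of shutdown-details.log itself is not shown here, so it is not parsed.

```python
import re

HEARTBEAT_RE = re.compile(
    r"DB heartbeat to host (\S+) successful, but the response time of "
    r"(\d+):(\d{2}):(\d{2})\.(\d{3}) was longer then a threshold of (\d+) ms"
)


def parse_slow_heartbeat(line):
    """Return (host, response_ms, threshold_ms), or None if the line differs."""
    match = HEARTBEAT_RE.search(line)
    if match is None:
        return None
    host = match.group(1)
    hours, minutes, seconds, millis, threshold_ms = (
        int(g) for g in match.groups()[1:]
    )
    response_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
    return host, response_ms, threshold_ms


line = (
    "DB heartbeat to host VMLPFMPRD14 successful, but the response time of "
    "0:04:23.450 was longer then a threshold of 20000 ms."
)
print(parse_slow_heartbeat(line))
```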

    So all DR nodes are at 144GB now?




  • 47.  RE: many gaps in all graphics

    Posted Mar 17, 2021 02:28 PM
    Hello Jeffrey,

    I couldn't find any com.ca.im.aggregator.loader.log like you said, but below you can see what I have in the /opt/CA/IMDataAggregator/apache-karaf-2.4.3/data/log directory:


    /opt/CA/IMDataAggregator/apache-karaf-2.4.3/data/log # ls -la
    total 77636
    drwxr-xr-x 2 root root 4096 Mar 17 11:18 .
    drwxr-xr-x 6 root root 4096 Mar 17 11:09 ..
    -rw-r--r-- 1 root root 0 Mar 17 11:09 ActiveMQConsumer.log
    -rw-r--r-- 1 root root 0 Mar 17 11:09 ems.log
    -rw-r--r-- 1 root root 540 Mar 17 13:26 Exception.log
    -rw-r--r-- 1 root root 3595 Mar 17 15:04 Expression.log
    -rw-r--r-- 1 root root 36654 Mar 17 14:23 ItemDeletes.log
    -rw-r--r-- 1 root root 8913705 Mar 17 15:15 karaf.log
    -rw-r--r-- 1 root root 2227692 Mar 17 15:08 LongRunningAndFailedQueries.log
    -rw-r--r-- 1 root root 3837 Mar 17 11:43 odata-services.impl.log
    -rw-r--r-- 1 root root 49417697 Mar 17 15:05 PCCEvaluation.log
    -rw-r--r-- 1 root root 18854883 Mar 17 15:15 PollSummary.log
    -rw-r--r-- 1 root root 0 Mar 17 11:09 RootException.log
    -rw-r--r-- 1 root root 0 Mar 17 11:09 vna_change.log
    -rw-r--r-- 1 root root 184 Mar 17 11:18 vna.log


    I couldn't find any shutdown-details.log in the /opt/CA/IMDataAggregator/apache-karaf-2.4.3/data directory:


    /opt/CA/IMDataAggregator/apache-karaf-2.4.3/data # ls -la
    total 32
    drwxr-xr-x 6 root root 4096 Mar 17 11:09 .
    drwxr-xr-x 21 root root 4096 Mar 17 11:09 ..
    drwxr-xr-x 4 root root 4096 Mar 17 11:09 cache
    drwxr-xr-x 2 root root 4096 Mar 17 11:09 generated-bundles
    -rw-r--r-- 1 root root 637 Mar 17 13:26 karaf.out
    drwxr-xr-x 2 root root 4096 Mar 17 11:18 log
    -rw-r--r-- 1 root root 5 Mar 17 11:09 port
    drwxr-xr-x 4 root root 4096 Mar 17 11:09 tmp


    So all DR nodes are at 144GB now?

    Yes, all of them.

    Now the CAPC server just stays in a degraded state and doesn't go back to a normal state.


    Here is what I can see in karaf.log:



    tail karaf.log
    INFO | zSchedulerThread | 2021-03-17 15:19:13,058 | atchProcessSystemLogNotifierImpl | atchProcessSystemLogNotifierImpl 57 | .ca.im.aggregator.loader | | Event 1.StateMisfire was generated
    INFO | nstance_Worker-1 | 2021-03-17 15:20:26,593 | BatchProcessorImpl | ocessorImpl$BatchTriggerListener 242 | .ca.im.aggregator.loader | | Batch process job: DimItemsETLJob completed in 0:05:26.590( 326590 ms )
    WARN | zSchedulerThread | 2021-03-17 15:20:26,594 | BatchProcessorImpl | ocessorImpl$BatchTriggerListener 276 | .ca.im.aggregator.loader | | Batch process job: PollItemsETLJob mis-fired due to long running job
    INFO | zSchedulerThread | 2021-03-17 15:20:26,595 | atchProcessSystemLogNotifierImpl | atchProcessSystemLogNotifierImpl 57 | .ca.im.aggregator.loader | | Event 1.StateMisfire was generated
    INFO | nstance_Worker-2 | 2021-03-17 15:20:51,613 | BatchProcessorImpl | ocessorImpl$BatchTriggerListener 242 | .ca.im.aggregator.loader | | Batch process job: GroupsETLJob completed in 0:00:51.612( 51612 ms )
    WARN | nstance_Worker-2 | 2021-03-17 15:20:52,764 | BatchProcessorImpl | ocessorImpl$BatchTriggerListener 248 | .ca.im.aggregator.loader | | Batch process job: GroupsETLJob required more than 80% of the currently configured execution interval of 1 minutes.
    INFO | nstance_Worker-2 | 2021-03-17 15:20:52,766 | atchProcessSystemLogNotifierImpl | atchProcessSystemLogNotifierImpl 57 | .ca.im.aggregator.loader | | Event 2.StateLongRunning was generated
    INFO | nstance_Worker-1 | 2021-03-17 15:21:48,576 | BatchProcessorImpl | ocessorImpl$BatchTriggerListener 242 | .ca.im.aggregator.loader | | Batch process job: GroupsETLJob completed in 0:00:48.574( 48574 ms )
    WARN | nstance_Worker-1 | 2021-03-17 15:21:48,576 | BatchProcessorImpl | ocessorImpl$BatchTriggerListener 248 | .ca.im.aggregator.loader | | Batch process job: GroupsETLJob required more than 80% of the currently configured execution interval of 1 minutes.
    INFO | nstance_Worker-1 | 2021-03-17 15:21:48,578 | atchProcessSystemLogNotifierImpl | atchProcessSystemLogNotifierImpl 57 | .ca.im.aggregator.loader | | Event 2.StateLongRunning was generated
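
    The "required more than 80%" warnings above come down to simple arithmetic: with a 1 minute interval the budget is 48000 ms. The sketch below illustrates that check against the "completed in" lines; the regular expression is based only on the format visible in these excerpts, and over_budget is a hypothetical helper, not part of the product.

```python
import re

COMPLETED_RE = re.compile(
    r"Batch process job: (\w+) completed in [\d:.]+\( (\d+) ms \)"
)


def over_budget(line, interval_ms=60_000, fraction=0.8):
    """Return (job, elapsed_ms, exceeded) for a 'completed in' line, else None."""
    match = COMPLETED_RE.search(line)
    if match is None:
        return None
    job, elapsed_ms = match.group(1), int(match.group(2))
    return job, elapsed_ms, elapsed_ms > interval_ms * fraction


line = "Batch process job: GroupsETLJob completed in 0:00:51.612( 51612 ms )"
print(over_budget(line))
```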


    I can't see anything more in Data Aggregator General Processing:


    The problem is that since I restarted all the CAPC servers, CAPC has not gone back to a normal state. Do I have to restart the entire solution again?

    thank you

    Valéria




  • 48.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 17, 2021 02:56 PM

    Is the DA failing to sync with PC?


    Yeah, it might be a good idea to bring down the DA and the DR nodes.  Restart the DR DB, then start up the DA.
    I assume you did as I suggested and just brought down node 1 to add memory?  I wonder if we really needed to restart the DB when that happened.  I know we've had other customers add memory to nodes 1 node at a time, and they didn't seem to have any issues using the method I suggested.

    Also, check /opt/DataAggregator/data and see if there is a dto_error directory with files.

    You might want to open a support case at this point, so support can directly work with you to get you back up and running.  They'll reach out to me most likely.  You can mention this thread and me in the case.




  • 49.  RE: many gaps in all graphics

    Posted Mar 17, 2021 03:58 PM
    Hello Jeffrey,

    I really appreciate your attention and support. CAPC is back to normal.

    thank you,

    Valeria


  • 50.  RE: many gaps in all graphics

    Broadcom Employee
    Posted Mar 17, 2021 04:13 PM
    I only suggested support in case it was an application down scenario, as they are equipped to work with you to get it fully working.
    If you hit that again, open a support case immediately to have them look at the environment.

    Did you end up restarting the DR/DA?


  • 51.  RE: many gaps in all graphics

    Posted Mar 17, 2021 04:47 PM
    Jeffrey,

    You are right about the support because they have all the tools to help me.
    I didn't need to restart the DR and DA.
    CAPC is much faster now, and I don't think I'll have any problems anytime soon.

    Thank you