I have a severe issue with my CAPC. Let me explain the whole scenario; maybe you have a solution. I have been working on this case with CA Support for approximately the last 30 days and have received no RCA or solution.
My CAPC health status fluctuates between failed and active at random times. Sometimes the status looks good, but when I open the dashboards, the data is missing for the last 4 or 10 hours.
When I check the events, there are many status-failure events around the time the data goes missing from the dashboards. When I stop and then restart the dadaemon service, the data is populated again after a few minutes.
So far CA Support has shared no observations on this issue.
From my own observation, it may be due to a connectivity issue.
Our DA, DRs, and PC are on one subnet and the DC is on another subnet, because we need a public IP to communicate with devices that are far away.
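One rough way to test the connectivity theory is to repeatedly open a TCP connection from one subnet to the other and record failures and latency. The sketch below is only an illustration: the host name and port in the usage comment are hypothetical placeholders, not documented product values, and should be replaced with whatever the DA and DC actually use to talk to each other in this environment.

```python
import socket
import time


def probe_tcp(host, port, timeout=3.0):
    """Return the TCP connect latency in seconds, or None if the connect fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None


def summarize(results):
    """Given a list of latencies (None = failed probe), return (failures, worst_latency)."""
    ok = [r for r in results if r is not None]
    return len(results) - len(ok), (max(ok) if ok else None)


# Example usage (hypothetical host and port, adjust for your environment):
#   results = []
#   for _ in range(60):
#       results.append(probe_tcp("dc.example.internal", 12345))
#       time.sleep(5)
#   print(summarize(results))
```

If the failure count is non-zero during the same windows where the status-failure events appear, that would support the connectivity theory; if every probe succeeds, the cause is more likely elsewhere.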
I am very upset, as this is the production environment, it is not stable, and we have been suffering from this for approximately 1.5 months. CA Support has shared no proper justification for this issue.
Can you share the case numbers (you can share them privately by sending them to email@example.com if you wish)? It appears this matter has already been raised with support.
If that isn't correct and no support case has been opened, one should be opened as soon as possible, with the log packages gathered via the re.sh script attached for analysis.
I have sent you all the case numbers by email. You can also check these events: for the last 4 hours the DA has been fluctuating continuously.
It's clear from the case notes that there are capacity issues involved. The latest updates in the case about the DA sync failure and in the one about degraded DA health are the same, because the two cases describe the same problem.
Those should be one support case, for the same problem.
It appears from the research in the cases that the systems are either:
The log messages noted in the cases clearly show memory-related problems on a regular basis.
I suspect you may have a large number of inactive and/or retired items in the environment being loaded into memory and causing the issue. This is often seen when the DA data source is set to sync inactive items. If excessive numbers of QoS related items have been created it can cause similar symptoms.
I'd recommend following the steps provided in those cases, working with support to clean up the system of old deleted and/or retired items and components. Once some of the load is decreased or resources are increased I suspect the problems will start to dissipate.
If they don't, once the system is cleaned up and not showing capacity or memory problems, then whatever issue remains will be more readily found and resolved.
These are VM systems according to the notes. It is also possible that the resources for the VMs are shared, not dedicated. If they are shared, other systems using the same resources may be causing an issue. VM resources for these systems in production environments should always be dedicated to avoid resource contention with other VMs.
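On a Linux guest, one generic hint of CPU contention from other VMs is the "steal" counter in /proc/stat. The sketch below is not a product tool, just a minimal illustration that compares two samples of the aggregate cpu line and reports what share of CPU time was stolen by the hypervisor in between. The function names and the sampling interval are my own choices.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of integer counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    values = [int(x) for x in parts[1:]]
    if len(values) < 8:
        raise ValueError("kernel does not report a steal counter")
    return values


def steal_percent(before, after):
    """Percentage of CPU time stolen between two samples of the 'cpu' line."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    # Counters: user nice system idle iowait irq softirq steal.
    # Guest time is already included in user/nice, so only the first 8 are summed.
    total = sum(a[:8]) - sum(b[:8])
    if total <= 0:
        return 0.0
    return 100.0 * (a[7] - b[7]) / total


def sample_steal(interval=5.0):
    """Read /proc/stat twice, 'interval' seconds apart, and return the steal percentage."""
    with open("/proc/stat") as f:
        before = f.readline()
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.readline()
    return steal_percent(before, after)
```

Calling sample_steal() on each of the VMs while the problem is occurring gives a quick indication. A steal value that stays noticeably above zero suggests the host is oversubscribed. Memory contention would not show up here and has to be checked on the hypervisor side.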
Yes, we deleted many items and shared the results on the ticket, but many items remain, including filtered items that are still represented in CAPC. Is there any way to remove these? We are not using QoS currently.
Did you break the connection between any QoS Metric Families, and their associations to devices via Monitor Profile to Collection associations?