DX NetOps and Heap Memory (#3)

MARUBUN SUPPORT posted Sep 26, 2024 02:47 AM

Hi Team,

I have a question regarding "DX NetOps and Heap Memory (#2)" (https://community.broadcom.com/question/dx-netops-and-heap-memory-2#5690e75c-bbe2-4e85-8846-0192055eb7de), which you answered previously.

I would be grateful if you could answer the following question.

[Product]

DX NetOps PM 22.2.2

Linux RHEL7

[Questions]

>>(1) Based on this advice, does it mean that 400MB of performance-spool capacity is insufficient, and do they need to ensure that the performance-spool capacity is around 2 to 3 GB?

> Correct, 400MB is nowhere near enough, especially if it's being shared with the rest of the IMDataAggregator directory. Logs and other files can take up space. Plus, depending on scale or catch-up, we could use 400MB in a single file to load into the DB.

>> (2) If they need to increase the spool capacity to 2-3 GB, please tell me how to increase the spool capacity.

> I would suggest 2G per MF they poll as a minimum if performance-spool is its own directory, but more like 20-25G free if we're talking about all of IMDataAggregator, to handle logs and other files that can change over time.

The end user understood that the free space in performance-spool does not provide a "safe zone where service will not be affected even if a service restart occurs."

The end user understands that "the free space in performance-spool can determine a safe zone where service will not be affected even if a service restart occurs."

Could you please explain, in a way the end user can understand, why the free space in performance-spool does not provide a "safe zone where service will not be affected even if a service restart occurs"?

> performance-spool should be at least 2-3G per MF being polled, IMO, so it can handle these catch-up periods, depending on how long DA/DC are not connected.

Is there any documentation (such as release notes) that describes this content?

Best Regards,
Marubun Support

Broadcom Employee Jeffrey Pinard posted Sep 27, 2024 01:54 PM

What do you mean by a "safe zone" not affected by a service restart?

2. Those numbers are just an estimate from my years of working on this product and what each MF might need for space during a DA/DC catch up (when DC reconnects after being down for some time).
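
To make the estimate concrete, here is a minimal Java sketch of the arithmetic, assuming the rough figures quoted in this thread (2-3G per polled MF for a dedicated performance-spool directory). The class name, method names, and the MF count of 10 are hypothetical and only for illustration; they are not part of DX NetOps.

```java
final class SpoolSizing {
    // Assumed figures: the rough per-MF estimates quoted in this thread.
    static final int LOW_GB_PER_MF = 2;
    static final int HIGH_GB_PER_MF = 3;

    /** Lower estimate, in GB, for a dedicated performance-spool directory. */
    static int lowEstimateGb(int polledMfCount) {
        requireNonNegative(polledMfCount);
        return polledMfCount * LOW_GB_PER_MF;
    }

    /** Upper estimate, in GB, for a dedicated performance-spool directory. */
    static int highEstimateGb(int polledMfCount) {
        requireNonNegative(polledMfCount);
        return polledMfCount * HIGH_GB_PER_MF;
    }

    private static void requireNonNegative(int count) {
        if (count < 0) {
            throw new IllegalArgumentException("MF count must be non-negative");
        }
    }

    public static void main(String[] args) {
        int polledMfCount = 10; // hypothetical value for illustration
        System.out.println("performance-spool estimate: "
                + lowEstimateGb(polledMfCount) + "-"
                + highEstimateGb(polledMfCount) + " GB");
    }
}
```

These are only estimates of catch-up space, so the result is a starting point for disk planning, not a guaranteed limit.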

MARUBUN SUPPORT posted Sep 30, 2024 04:36 AM

> What do you mean by a "safe zone" not affected by a service restart?

My explanation was insufficient.
When heap space usage increases, end users reboot the system to reduce heap space usage.
End users believe that there is a size (=> safe zone) above which this does not need to be done.
However, your previous answer did not specify that size, which led to this question.

Catalin Farcasanu posted Sep 30, 2024 04:55 AM

You do realize the absurdity of the situation? Basically, you reboot the server to reduce the heap size (that's not a solution per se), then you ask what value is acceptable so you don't have to reboot the server?

Broadcom Employee Jeffrey Pinard posted Sep 30, 2024 09:55 AM

I agree with the comment that rebooting the server is NOT the correct way to handle heap usage.

Memory is meant to be used; it's not meant to just sit there in case you need it.

Java 17 (NetOps 23.3.8+) tends to use more memory at once so that GC activity is kept low. If memory usage does get too high, a longer GC may be needed to reclaim the memory the application is asking for. But it could also mean that the scale of the system has increased to a level that needs more memory to run efficiently. The G1GC used with Java 17 auto-tunes based on memory usage, and each environment is different in terms of polled items, device count/types, and how much thresholding or rollup processing it does. So there really isn't a magic number to remain under.

You need to watch the application pause times in the DA/DC self-monitoring dashboards. Once those start getting over 3-4 seconds per minute, consider adding memory after checking whether scale is increasing. Scale doesn't mean just item count, but also how much thresholding you do, how many groups you define in Portal, etc.
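
As an illustration of the "seconds of pause per minute" rule of thumb, here is a minimal Java sketch that normalizes a total pause time observed over some window to seconds per minute and compares it with the lower end of the 3-4 second range. It assumes you have already read the pause total from the dashboards; the class, method names, and sample values are hypothetical and not a DX NetOps API.

```java
final class PauseBudget {
    // Assumption: use the lower end of the 3-4 seconds-per-minute range as the trigger.
    static final double REVIEW_SECONDS_PER_MINUTE = 3.0;

    /** Normalizes a pause total (ms) observed over a window (s) to seconds per minute. */
    static double pauseSecondsPerMinute(long totalPauseMillis, long windowSeconds) {
        if (windowSeconds <= 0 || totalPauseMillis < 0) {
            throw new IllegalArgumentException("window must be positive and pause non-negative");
        }
        return totalPauseMillis * 60.0 / (windowSeconds * 1000.0);
    }

    /** True when pauses exceed the rule-of-thumb level and scale/memory should be reviewed. */
    static boolean shouldReviewScaleAndMemory(double pauseSecondsPerMinute) {
        return pauseSecondsPerMinute > REVIEW_SECONDS_PER_MINUTE;
    }

    public static void main(String[] args) {
        long totalPauseMillis = 25_000; // hypothetical: 25 s of pauses...
        long windowSeconds = 300;       // ...observed over a 5-minute window
        double perMinute = pauseSecondsPerMinute(totalPauseMillis, windowSeconds);
        System.out.printf("pause: %.1f s/min, review needed: %b%n",
                perMinute, shouldReviewScaleAndMemory(perMinute));
    }
}
```

A result above the threshold is a prompt to check whether scale has grown before adding memory, not an automatic sizing decision.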

MARUBUN SUPPORT posted Oct 03, 2024 03:48 AM

Thank you for your reply.

Just to be sure, is this the same for DX NetOps 22.2?

Broadcom Employee Jeffrey Pinard posted Oct 03, 2024 09:38 AM

22.2 is different in that it uses Java 11 and the CMS GC method.

In practice, we've seen CMS favor keeping memory as low as possible at the expense of application pauses.

So the closer you get to 70-75% heap usage, the more aggressive we've seen CMS GC get, pausing the app for longer.

That is why we like the Java 17 G1GC: it favors app performance/uptime over keeping memory as low as possible.
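
For reference, heap usage as a percentage can be read with the standard java.lang.management API. The sketch below is a minimal example and not a NetOps tool: it reports the heap of the JVM it runs in (checking the DA/DC process itself would need the self-monitoring dashboards or a JMX connection), and the 70% constant is an assumption taken from the 70-75% range mentioned above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class HeapUsageCheck {
    // Assumption: lower end of the 70-75% range where CMS was seen to get more aggressive.
    static final double CMS_PRESSURE_PERCENT = 70.0;

    /** Heap usage as a percentage of the maximum heap size. */
    static double usedPercent(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("max heap size is undefined");
        }
        return usedBytes * 100.0 / maxBytes;
    }

    /** True when usage has reached the assumed CMS pressure level. */
    static boolean inCmsPressureRange(double usedPercent) {
        return usedPercent >= CMS_PRESSURE_PERCENT;
    }

    public static void main(String[] args) {
        // Reads the heap of the JVM running this class, not a remote process.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        if (heap.getMax() <= 0) { // getMax() returns -1 when no maximum is defined
            System.out.println("Max heap size is undefined for this JVM");
            return;
        }
        double percent = usedPercent(heap.getUsed(), heap.getMax());
        System.out.printf("heap used: %.1f%%, in CMS pressure range: %b%n",
                percent, inCmsPressureRange(percent));
    }
}
```

On a G1GC system the same percentage is less meaningful as an alarm, since G1 deliberately lets usage grow before collecting.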