So, we ran into a bit of a problem over the weekend related to this.
After reading https://na4.salesforce.com/articles/Case_Summary/alarm-enrichment-probe-become-unresponsive?popup=true we added these config keys so the probe would restart automatically when needed:
lower_memory_usage_threshold_percentage = 0.90
upper_memory_usage_threshold_percentage = 0.90
memory_usage_exceeded_threshold = 1
Last night, it ended up restarting so often that the controller stopped restarting it.
So, I guess we have the choice between it no longer processing alarms and it restarting so often that it dies. Good choices.
It's "functioning" at the moment, but still restarting quite often (every 10-20 minutes). We get the following in the log file:
feb 22 10:18:47:900 [attach_clientsession, alarm_enrichment] AlarmQueueReader: Upper capacity check : memory (free/used/total): 10139736/102057896/112197632 OR 0.9096261140342071% used
feb 22 10:18:47:900 [attach_clientsession, alarm_enrichment] AlarmQueueReader upperCount: 1
feb 22 10:18:48:016 [attach_clientsession, alarm_enrichment] Nas alarm_enrichment INTERNAL_RESTART checkForCapacityToAddMoreAlarms
feb 22 10:18:48:058 [attach_clientsession, alarm_enrichment] TimeOverThesholdService shutting down
feb 22 10:18:48:627 [attach_clientsession, alarm_enrichment] CmdbMessageEnricherError closing : 'by_source' problem: null
feb 22 10:18:48:627 [attach_clientsession, alarm_enrichment] CmdbMessageEnricherError closing : 'by_source' problem: null
feb 22 10:18:48:627 [attach_clientsession, alarm_enrichment] CmdbMessageEnricherError closing : 'by_source' problem: null
feb 22 10:18:48:627 [attach_clientsession, alarm_enrichment] CmdbMessageEnricherError closing : 'by_source' problem: null
feb 22 10:18:48:627 [attach_clientsession, alarm_enrichment] Nas: Shutdown complete in 610ms
So, it's clearly the 0.90 threshold restart that is being triggered.
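As a quick sanity check, here is a small Python sketch that parses the capacity check line from the log above and recomputes the ratio. The function name parse_capacity_check and the regular expression are my own, not anything from the probe. It suggests that the logged value is used/total as a fraction (despite the "%" sign), and that "total" is simply free + used.

```python
import re

# Log line copied verbatim from the alarm_enrichment log above.
LOG_LINE = (
    "feb 22 10:18:47:900 [attach_clientsession, alarm_enrichment] "
    "AlarmQueueReader: Upper capacity check : memory (free/used/total): "
    "10139736/102057896/112197632 OR 0.9096261140342071% used"
)

def parse_capacity_check(line):
    """Return (free, used, total, logged_fraction) from a capacity check line.

    Hypothetical helper: the pattern is inferred from the single log line
    shown above, not from any documented log format.
    """
    m = re.search(
        r"\(free/used/total\): (\d+)/(\d+)/(\d+) OR ([0-9.]+)% used", line
    )
    if m is None:
        raise ValueError("not a capacity check line")
    free, used, total = (int(g) for g in m.group(1, 2, 3))
    return free, used, total, float(m.group(4))

free, used, total, logged = parse_capacity_check(LOG_LINE)
ratio = used / total  # fraction of the currently allocated heap in use

print("free + used == total:", free + used == total)
print("recomputed used/total:", ratio)
print("value in the log:     ", logged)
print("above the 0.90 upper threshold:", ratio > 0.90)
```

So the value compared against 0.90 appears to be a fraction of "total", whatever the probe takes "total" to be.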
Naturally, these config options aren't documented, so does anyone know what memory_usage_exceeded_threshold actually is? Could it be the number of times the memory threshold needs to be breached before the probe restarts?
Also, when it restarts, the heap is only about 112 MB in total (about 102 MB used), which is quite far from the 1024 MB maximum it has configured:
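To make that guess concrete, here is a purely hypothetical Python sketch of what a breach counter could look like. Nothing here comes from the probe's code or documentation: the class name BreachCounter, the ">=" comparison and the reset on a non-breaching check are all assumptions. The only hint from the log is that "upperCount: 1" is printed right before the INTERNAL_RESTART while memory_usage_exceeded_threshold is set to 1.

```python
class BreachCounter:
    """Hypothetical model: restart after N consecutive checks above the
    upper threshold. This is a guess at the semantics, not the probe's
    actual logic."""

    def __init__(self, upper_threshold, exceeded_threshold):
        self.upper_threshold = upper_threshold        # e.g. 0.90
        self.exceeded_threshold = exceeded_threshold  # e.g. 1
        self.upper_count = 0

    def observe(self, used_fraction):
        """Record one capacity check; return True if a restart would fire."""
        if used_fraction >= self.upper_threshold:
            self.upper_count += 1
        else:
            self.upper_count = 0  # assumption: breaches must be consecutive
        return self.upper_count >= self.exceeded_threshold


# With the values from this post, the first breach already triggers a restart.
current = BreachCounter(upper_threshold=0.90, exceeded_threshold=1)
restart_now = current.observe(0.9096261140342071)

# With a higher count, a single spike would be tolerated under this model.
relaxed = BreachCounter(upper_threshold=0.90, exceeded_threshold=3)
relaxed_results = [relaxed.observe(f) for f in (0.91, 0.92, 0.93)]

print(restart_now, relaxed_results)
```

If the key really works like this, raising it above 1 would at least stop a single spike from causing a restart, but that needs confirming.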
../../../../jre/jre7/bin/java.exe -Xms64m -Xmx1024m -Dfile.encoding=UTF-8 -jar ../lib/alarm_enrichment.jar
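For comparison, this short Python sketch puts the numbers from the log next to the -Xmx value on the command line. It assumes that the "total" in the log is the heap the JVM has currently allocated (what Java's Runtime.totalMemory() reports) rather than the -Xmx ceiling (Runtime.maxMemory()). That is my reading of the numbers, not something the documentation confirms.

```python
MIB = 1024 * 1024

# Values taken from the capacity check log line in this post.
used_bytes = 102057896
total_bytes = 112197632

# -Xmx1024m from the java.exe command line.
xmx_bytes = 1024 * MIB

def usage_fraction(used, capacity):
    """Fraction of `capacity` that is in use."""
    return used / capacity

vs_total = usage_fraction(used_bytes, total_bytes)  # what the probe logged
vs_max = usage_fraction(used_bytes, xmx_bytes)      # relative to -Xmx

print("heap currently allocated: %.1f MiB" % (total_bytes / MIB))
print("used vs allocated heap:   %.1f%%" % (vs_total * 100))
print("used vs -Xmx1024m:        %.1f%%" % (vs_max * 100))
```

If that assumption holds, the probe restarts at roughly a tenth of its permitted heap, because the JVM starts small (-Xms64m) and the check fires before the heap has had a chance to grow.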
Increasing -Xms might at least increase the time between restarts, since the JVM would hopefully start with more memory allocated, but it's hard to tell. How to tune the memory settings for alarm_enrichment isn't really documented either. Since its config is part of the nas config, I'm guessing it might be different from the other Java probes? Does anyone know?
Adding the usual:
<startup>
<opt>
java_mem_init = -Xms512m
</opt>
</startup>
to nas.cfg doesn't do any good at least.