Idea Details

Publish APM Data to ElasticSearch (and now Splunk!)

Last activity 06-17-2019 09:14 AM
tnoonan
07-03-2015 09:26 AM

It has long been an issue, no matter where I have worked in the past with CA's APM, that if you are working in a large environment with a LOT of agents (and thus agent data), you will begin to run into performance issues with the cluster.  The organization that I am currently involved with has multiple parties trying to re-surface our performance data, ranging from static Cognos reports to third-party dashboarding technologies to capacity management reporting.  All of those requests take a toll on the collector architecture when one of those third parties comes in and tries to query, say, 30 days' worth of data.

 

While researching solutions, I came across the ELK stack from Elastic.co, specifically the ElasticSearch component of the stack.  It's a document-based storage facility that is super fast and super scalable.  So, my thought was to publish data, in real time, to an ElasticSearch node.  That document repository could then be used by the third parties wanting historical performance data at will.  Beat it up...all the while our EM cluster is calmly pushing out 1-minute data to this data store, reducing the hit that we take when a large query comes through.  I'm not trying to boil the ocean and send "every" metric possibly available.  We should just be going after the KPIs that are important to the organization, such as:

 

  • CPU, Memory, Workload for Capacity management solutions
  • CEM RTTM data
  • Frontend/Backend high level data
  • whatever is important to your organization

 

I've attached a zip file that contains my first pass at an integration that does just this: it publishes CA APM data via simple REST calls to an ElasticSearch node.  The zip file contains the following:

 

  1. Readme document providing the nuts and bolts of the integration
  2. jar file to be included on the classpath of the MOM and all collectors
  3. javascript example files, for publishing certain metrics (a minimal sketch of one is just below this list)
  4. ElasticSearch mapping file that corresponds to this integration
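
To give a flavour of what the javascript files look like, here is a minimal sketch of a publishing calculator that would sit in the EM's scripts directory.  The execute/getMetricRegex/getAgentRegex functions are the standard Introscope JavaScript calculator interface; the publish() call is hypothetical shorthand for the plugin entry point -- the Readme in the zip documents the real method name and its arguments.

// Sketch only: hands each matched datapoint to the bundled plugin,
// which spools it to ElasticSearch on its own thread.
function execute(metricData, javascriptResultSetHelper) {
    for (var i = 0; i < metricData.length; i++) {
        var agent  = metricData[i].agentMetric.agentName;
        var metric = metricData[i].agentMetric.attributeURL;
        var value  = metricData[i].timeslicedValue.value;
        // Hypothetical call into the jar on the EM/collector classpath
        Packages.com.rfdinc.ElasticSearchPlugin.publish(agent, metric, value);
    }
    // Return the helper unchanged; the EM keeps its metrics as usual
    return javascriptResultSetHelper;
}

// Only pull the KPIs you care about, e.g. frontend response times
function getMetricRegex() {
    return "Frontends\\|Apps\\|.*:Average Response Time \\(ms\\)";
}
function getAgentRegex() {
    return ".*";
}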

 

--------- Update on 7/27/2015  ------------------

v3.0 of extension to publish data to ElasticSearch

1. Changed the ElasticSearch mapping to a template that supports daily creation of indices.  Prior releases created only a single index, "app_performance", which made it difficult to purge data.  Now a new index is created each day, named from the current time (see the sketch below).

2. The new template mapping stores much less data in each document: only the value, min and max.  All other elements remain searchable, but the values of the metric are really the only thing I can see needing to be re-displayed.
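
For reference, the template approach looks roughly like the following -- field names here are illustrative, the real mapping is in the zip.  Because the template matches app_performance-*, it is applied automatically to each new daily index, so purging old data becomes a simple delete of that day's index.  Load it once:

curl -XPUT 'http://myhost:9200/_template/ca_apm_template' -d@ca.apm.template.json

ca.apm.template.json (sketch):

{
  "template": "app_performance-*",
  "mappings": {
    "metric": {
      "properties": {
        "agent":     { "type": "string", "index": "not_analyzed" },
        "metric":    { "type": "string", "index": "not_analyzed" },
        "value":     { "type": "long" },
        "min":       { "type": "long" },
        "max":       { "type": "long" },
        "timestamp": { "type": "date" }
      }
    }
  }
}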

----------------------------------------------------------- end update------

 

--------- Update on 9/7/2015  ------------------

v3.1 of extension to publish data to ElasticSearch

1. Changed the one-to-one pattern of one JSON POST request to ElasticSearch per metric pulled.  The extension now uses a bulk process: it pulls the metrics and then sends a single POST request to ES.  This doesn't speed up pulling the data from SmartStor, but the separate thread spawned by the process no longer has to manage potentially 100's or 1000's of separate connections to the ES data store.
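
For reference, ElasticSearch's _bulk endpoint takes newline-delimited JSON -- one action line plus one document line per datapoint, ending with a trailing newline -- so an entire harvest cycle goes out in a single request.  An illustrative payload (field names as in the template sketch above):

curl -XPOST 'http://myhost:9200/_bulk' --data-binary @metrics.ndjson

metrics.ndjson:

{ "index": { "_index": "app_performance-2015.09.07", "_type": "metric" } }
{ "agent": "host1|JBoss|AppAgent", "metric": "CPU:Utilization % (aggregate)", "value": 42, "min": 12, "max": 97, "timestamp": "2015-09-07T10:11:00Z" }
{ "index": { "_index": "app_performance-2015.09.07", "_type": "metric" } }
{ "agent": "host2|JBoss|AppAgent", "metric": "CPU:Utilization % (aggregate)", "value": 35, "min": 8, "max": 90, "timestamp": "2015-09-07T10:11:00Z" }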

----------------------------------------------------------- end update------

--------- Update on 3/4/2017  ------------------

I know this is really not the right venue to officially submit this integration to the community...but whatever...use it...don't use it...BUT I've added functionality to do the same kind of data spooling as for Elastic, now sending to Splunk as well.  Setup is exactly the same: deploy the jar file and the javascript files and BAM! you've got data going to another datasource.  At one of my main customers we are sending around 12k metrics every 5 minutes to ElasticSearch to keep SLA data for an extended period of time.  I've got 18+ months and counting of Elastic indexes with general availability data, including CEM and ADA.  Really, anything you would want to send off that reports to the Investigator tree.
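
If you want to sanity-check your Splunk endpoint before wiring up the extension (check the Readme for which Splunk API the jar actually targets), Splunk's HTTP Event Collector is the usual REST route in, and you can poke it directly with curl -- host and token below are placeholders:

curl -k 'https://splunkhost:8088/services/collector/event' -H 'Authorization: Splunk <hec-token>' -d '{"event": {"agent": "host1|JBoss|AppAgent", "metric": "CPU:Utilization % (aggregate)", "value": 42}}'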

----------------------------------------------------------- end update------


Comments

07-03-2018 07:46 AM

I did try the JDBC plugin first, but there were conflicts between its version of BouncyCastle and the version used by logstash.  As well, it is not a real JDBC driver; it is kind of like a JDBC driver, with methods that are not implemented properly, so I was skeptical it would work at all with the JDBC plugin.  Finally, it is not used in version 10.x, so I would have to replace it anyway.  I wrote my application so that I can easily replace the component that does the query with a different version (APMSQL) when we upgrade to 10.x.

 

We only needed 15-minute resolution every 24 hours, not live, and not for all metrics, so the data is manageable by making multiple queries to the EM via logstash to retrieve whatever metrics we want.  However, if you wanted to export all the data, you might be better off copying it to a second EM cluster for processing by ES.  To do so, I believe you are supposed to shut down the operational cluster before doing the copy.

07-03-2018 04:53 AM

The solution provided is very good with isolated agents such as the EPAgent, but at the 1-minute resolution tnoonan is using in the example, I believe it will be an issue with a large amount of data.  That's why I think the best approach is to export the data from CA APM and let Elasticsearch/Splunk index the files; they're good at doing that kind of job.

07-03-2018 04:50 AM

Hi Neil

 

Have you tried to connect to CA APM using the Logstash JDBC plugin?

 

Cheers

Cristiano 

07-02-2018 08:15 PM

I've tried the solution provided by @tnoonan, and while it was a good idea, it didn't fit my needs for indexing large amounts of data, and I was worried about its performance impact on the EM when running the calculators for a large number of metrics being sent to the ES cluster.  We wanted to capture a large number of metrics daily for capacity/performance/ad-hoc analysis in Kibana.

 

I think it would be better for CA to provide an official input plugin to extract the metric data via logstash, plus some useful filters to process it.  That way you can send it to whatever cluster you like and incorporate your own filters/mutators/etc. to format the results as you like.  In our case, our ES cluster has to index not just Introscope but performance data from other sources as well (zOS, ESX clusters, etc.).

 

To solve our requirements, I wrote a Java application that utilizes the APM "JDBC" API (we are at 9.7 and will need to change this in 10.x, since that API went away in favour of APMSQL).  It runs from the "exec" input plugin in a logstash pipeline.  Given the query parameters (metric/agent regex, date range, period), I query the EM, convert the results to CSV, and write them to stdout to be processed in the pipeline.  It wakes up once a day, retrieves the previous day's worth of data for the metrics I am interested in, and runs them through common filters (to create canonical metric names for the same type of metric) and specific filters that do things like join metric data for the same resource (like the 5 blame point metrics) into the same JSON document where needed.
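
For anyone wanting to try the same approach, the pipeline skeleton looks roughly like this -- the extractor jar name, columns, and index are illustrative placeholders, not my actual files:

input {
  exec {
    # hypothetical extractor; prints one CSV row per metric datapoint
    command  => "java -jar apm-extract.jar --period 900 --range yesterday"
    interval => 86400   # wake up once a day
  }
}
filter {
  # exec emits the whole command output as one event; split it into lines
  split { }
  csv {
    columns => ["timestamp", "agent", "metric", "value", "min", "max"]
  }
  date {
    match => ["timestamp", "ISO8601"]
  }
}
output {
  elasticsearch {
    hosts => ["http://myhost:9200"]
    index => "apm-metrics-%{+YYYY.MM.dd}"
  }
}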

 

While it wasn't overly difficult to write the application or filters (even with the weird implementation of the JDBC "driver"), it would have been nice to have an officially supported plugin.

 

Cheers,

Neil

07-02-2018 10:17 AM

Hi all

 

First of all, I would like to thank tnoonan for sharing this solution.

 

I want to give my thoughts, as I have experience with CA APM and Splunk and started using ELK recently.

 

I like the idea of using Splunk/Elasticsearch for what they do best, which is INDEXING.

 

I would change the proposed solution to use Elasticsearch (Logstash) / Splunk (Universal Forwarder) to index the CA APM log files directly, without the JSON requests, while continuing to do what you did to expose the metrics in the CA APM log files.

 

If one of the JSON requests fails, you risk a gap in the Kibana/Splunk dashboards, but if you index the log files directly, the chances of losing information are very low.
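
On the Splunk side that is just a monitor stanza in inputs.conf on the Universal Forwarder, something like this (path, sourcetype, and index are only examples); on the Elastic side it would be a simple Logstash file input pointing at the same log:

[monitor:///opt/introscope/logs/apm-metrics.log]
sourcetype = ca_apm_metrics
index = apm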

 

What do you think, guys?

01-23-2018 02:17 PM

This is a neat idea; thank you for implementing it.  If you are using ElasticSearch 6.1, you need to make a few changes:

 

When sending the mapping to the ES server:

 

You now need to specify the content type on the curl command, or you get an error indicating the content type is not supported.  I used:

curl -H 'Content-Type: application/json' -XPUT 'http://myhost:9200/_template/ca_apm_template' -d@ca.apm.template.json

 

 

In the mapping file:

  • "string" type is no longer supported; use "text" instead.
  • index values "analyzed" and "not_analyzed" are not supported (they cannot be cast to a Boolean); use true and false respectively.
  • The default "_all" field feature is no longer supported.  I just disabled it, but apparently you can use the "copy_to" feature to copy the fields you would like searched together into a separate field, instead of relying on a default "_all" field.  (See the before/after sketch below.)
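
Putting those together: a field that was mapped in the old style (first line) ends up like the snippet underneath it on 6.1.  Field names are just examples, and whether you want index true or false depends on the field.

Old (2.x-era) style:

{ "agent": { "type": "string", "index": "not_analyzed" } }

6.1 style, with "_all" disabled at the type level:

{
  "mappings": {
    "metric": {
      "_all": { "enabled": false },
      "properties": {
        "agent":     { "type": "text", "index": false },
        "value":     { "type": "long" },
        "timestamp": { "type": "date" }
      }
    }
  }
}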

I'm just starting with ES, so your mileage may vary.

 

Cheers,

Neil

01-15-2018 05:19 PM

No; the agents will still send metrics to be created in APM.  You can, however, intercept them as they're being sent to the EM via a JavaScript calculator and send them out to another location.  You'll need some proficiency with JavaScript to do this, but it's not entirely impossible.

You could also use APMSQL to gather data to export for reporting purposes.

01-12-2018 05:16 AM

Hello Tommy,

Is there a way to get metric data directly from the agents without creating metrics in Introscope?

In my scenario I have to get URL group metrics from the agents, but I don't want to create a metric tree in Introscope.

I want to get the metrics directly from the agents and send them to my ElasticSearch.

 

Regards.

03-04-2017 12:32 PM

Hey, I'm sorry I just now saw this post.  I believe that was a bug that I fixed later on down the road.  The thinking at the time was that if we were trying to debug a problem...maybe we didn't want to send anything.  Now it should write the debug output but also send the metric.  Give it another try if you like.  Thanks for giving it a shot.

 

-Tommy

09-07-2016 10:11 AM

This is great thanks for sharing!

06-07-2016 04:35 PM

Great job and great post.  Not sure why, but if I disable the debug information (field #13 in the method Packages.com.rfdinc.ElasticSearchPlugin, the one that writes out debug information to standard out), the ES engine stops receiving data.  Did somebody have the same issue?  Any suggestions?

 

 

Best regards.

07-27-2015 03:44 AM

I found this very insightful.  Spending some time with customers, I realise that APM reporting is not providing them answers to the issues at hand.  ATC goes a long way to address the real-time analysis, but I found the ELK stack approach, providing/listing questions-to-solutions, noteworthy.  This is not limited to APM only.


I will definitely look into this, thanks for the post.

07-03-2015 04:41 PM

Well done, Tommy.  This looks like a great community-developed solution for large reporting needs.  Awesome job.  Thanks for sharing.