There seem to be a lot of threads here... but the IMMEDIATE solution is to simply "drop" the SmartStors - and get APM back in business. Simply stop the Collectors, rename the ~/data directory, and restart - and the historical metrics are eliminated.
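If you want to script that "drop", here is a minimal sketch - assuming the Collector is already stopped, and assuming a made-up SmartStor location (point it at wherever your data directory actually lives):

    import time
    from pathlib import Path

    # Hypothetical SmartStor location - adjust for your install.
    data_dir = Path("/opt/introscope/data")

    # Rename (don't delete!) so the old SmartStor can be restored if needed.
    archive = data_dir.with_name("data." + time.strftime("%Y%m%d-%H%M%S"))
    data_dir.rename(archive)
    print(f"SmartStor archived to {archive} - restart the Collector to rebuild.")

Restart the Collector and it creates a fresh (empty) data directory - historical metrics gone, old data still on disk just in case.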
Next, let's address some apparent confusion between "historical" and "live" metrics.
"Live" metrics are those that are being generated by the agent and reporting every 7.5 seconds to the Collector. This is later aggregated to give the 15 sec reporting interval. Check out the APM Performance and Capacity Management Guide... on the BookShelf.
Historical metrics are ANY metrics that have been "aged out" of the current display. This happens after 60 minutes, when a metric goes from "live" to "historical". This is simply metadata about the agent that is preserved, so that IF and WHEN an agent reconnects with that metric, the Collector 'knows' that metric and reuses the metadata. This scheme works great... until it doesn't... and then excess METADATA (the historical-metric count) accumulates until the working memory of the Collector is compromised... and bad things happen.
The SmartStor also stores metric data (no strings) for up to one year. This is, technically, "historical" data... but it IS NOT the same thing as the historical-metric metadata we are talking about.
So why not just add... like - unlimited RAM??? Simple. The metadata is a DATA STRUCTURE which is navigated whenever a metric arrives (in the simplest sense). The bigger the data structure, the longer the time spent 'walking' it to find the metadata. Of course, there is a cache - which is what the 60-minute interval is for - so that 'active' metrics are found quickly... but the problem is the growing size of the "historical metrics" - those which have been inactive for at least 60 minutes.
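To make that concrete, here is a toy sketch (plain Python, NOT the actual Collector internals) of the shape of the problem: every incoming metric that misses the hot cache has to go find - or create - its metadata in the big structure, and 'historical' entries are never reclaimed:

    import time

    LIVE_WINDOW = 60 * 60  # seconds; metrics idle longer than this go "historical"

    class MetricRegistry:
        """Toy model of the Collector's metric metadata - not the real thing."""
        def __init__(self):
            self.live = {}        # hot cache: metrics seen in the last 60 minutes
            self.historical = {}  # everything aged out - this is what bloats

        def on_metric(self, fdn, now=None):
            now = now or time.time()
            meta = self.live.get(fdn)
            if meta is None:
                # Cache miss: consult (or grow) the big historical structure.
                meta = self.historical.pop(fdn, None) or {"fdn": fdn}
                self.live[fdn] = meta
            meta["last_seen"] = now
            return meta

        def age_out(self, now=None):
            now = now or time.time()
            stale = [f for f, m in self.live.items()
                     if now - m["last_seen"] > LIVE_WINDOW]
            for fdn in stale:
                # Metadata is preserved, not discarded - that's the point,
                # and also the problem when an fdn never repeats.
                self.historical[fdn] = self.live.pop(fdn)

One-shot metric names pile up in 'historical' forever - and the Collector pays for the size of that pile on every cache miss.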
So how do we end up with excess historical metrics? Lots of ways...
#1 offender - unique SQL statements. Basically a statement that is encountered once... AND NEVER AGAIN. We set aside (5) metrics every time we hit one of those puppies... which simply floods the metadata. The solution is to do a "SQL NORMALIZATION" which effectively 'wild-cards' those variable SQL statements into a SINGLE statement - and thus only (5) metrics will be collected... no matter how many statements are encountered (see the sketch below).
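The real normalization is done in the agent configuration, but the idea is just this - a hedged sketch with made-up regexes:

    import re

    # Replace literal values with '?' so all the variants collapse
    # to ONE statement (illustrative patterns only - the real
    # normalizer is configurable).
    def normalize_sql(sql):
        sql = re.sub(r"'[^']*'", "?", sql)   # quoted string literals
        sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals
        return sql

    print(normalize_sql("SELECT * FROM orders WHERE id = 12345"))
    print(normalize_sql("SELECT * FROM orders WHERE id = 99999"))
    # Both print: SELECT * FROM orders WHERE id = ?
    # -> ONE metric path, (5) metrics total, no matter how many ids show up.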
#2 offender - lots of variety in agent naming. This is often the case in the initial deployment, when agent names might change. Since the metric has an FDN (fully distinguished name) of PLATFORM|PROCESS|AGENT, every rename can create huge piles of metadata (historical metrics) - which are never seen again.
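A quick back-of-the-envelope illustration of why that churn matters (all names and numbers made up) - every distinct PLATFORM|PROCESS|AGENT prefix mints a whole new pile of metric names:

    # Four "different" agents that are really the same app, renamed...
    agent_names = ["app01", "app01-v2", "APP01_PROD", "app01.retry"]
    metrics_per_agent = 3000  # illustrative

    fdns = {f"LINUX|JBoss|{name}" for name in agent_names}
    print(f"{len(fdns)} agent FDNs x {metrics_per_agent} metrics = "
          f"{len(fdns) * metrics_per_agent} metadata entries - "
          "most of them never to report again")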
#3 offender - web services that are themselves just tons of unique calls - imagine a unique transaction ID carried in the name of every single web service call - which spews historical metrics like mad. This problem is harder to deal with, but the quick solution is always TURN IT OFF - until you have time to correct the configuration.
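When you do get time to fix it, the cure is the same trick as the SQL case - wild-card the unique part of the call name. A hedged sketch (the pattern is made up):

    import re

    # Strip the unique transaction ID out of the call name so every
    # call rolls up to ONE node (illustrative pattern only).
    def normalize_call(name):
        return re.sub(r"\|txn-[0-9a-f]+$", "|txn-*", name)

    print(normalize_call("WebServices|OrderService|txn-9f3a1c"))
    print(normalize_call("WebServices|OrderService|txn-77b2e0"))
    # Both print: WebServices|OrderService|txn-*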
Figuring out which type of problem you have means actually looking at the Workstation Investigator and looking for huge, steaming piles of 'greyed-out' metrics - these are the source of the problem. Anything else... and you will likely be chasing your tail for weeks and weeks. The easy place to start is to simply look for agents that have an excessive number of LIVE metrics (>8k)... and then check the usual suspects. This works pretty much EVERY TIME ;-) No exotic settings required.
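And if you'd rather script the hunt than click through the Investigator, here's a hypothetical sketch - assuming you've exported the metric list to a text file, one fully-qualified name per line (however you got it out):

    import sys
    from collections import Counter

    THRESHOLD = 8000  # the ">8k live metrics" rule of thumb

    # Group by the PLATFORM|PROCESS|AGENT prefix of each metric name.
    counts = Counter()
    with open(sys.argv[1]) as f:
        for line in f:
            parts = line.strip().split("|")
            if len(parts) >= 3:
                counts["|".join(parts[:3])] += 1

    for agent, n in counts.most_common():
        if n > THRESHOLD:
            print(f"{agent}: {n} metrics  <-- usual suspect")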