Our NimsoftSLM database is currently 5.5 TB, running on a clustered SQL
Server with 3PAR shared storage on pure SSD. Each of the two nodes has
144 GB of RAM and dual E5630 (2.53 GHz) CPUs, and we see over 1 GB/s of
throughput against the disks.
UMP is as snappy as ever (or close enough at least), and we have no
problems graphing arbitrary periods across a number of different QoS
metrics with things like PRD.
The 3 biggest tables are:
TABLE              ROWS            DATA    INDEX   QOS
RN_QOS_DATA_0007   7.146.672.622   621 GB  114 GB  QOS_DISK_USAGE
RN_QOS_DATA_0008   6.149.095.413   532 GB  105 GB  QOS_CPU_USAGE
RN_QOS_DATA_0016   6.113.560.258   527 GB  104 GB  QOS_DISK_USAGE_PERC
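If anyone wants to pull the same numbers from their own deployment, a quick
way (just a sketch; the table name is only an example taken from the list
above) is sp_spaceused per table:

    -- Rows, data size and index size for one of the RN_QOS_DATA tables
    EXEC sp_spaceused N'RN_QOS_DATA_0007';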
However, there is a problem with the default way SQL Server updates
statistics on tables, and it bit us here. By default, auto-update only
kicks in once roughly 500 rows plus 20% of the table's rows have been
changed, updated, or added. On tables the size of the ones above, that
means around 1.4 billion rows have to change before the statistics
refresh, so they go stale, and SQL Server ends up using the wrong index
for the queries depending on what period you are graphing.
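You can check how stale the statistics are yourself. A minimal sketch,
assuming you are on a build that exposes sys.dm_db_stats_properties
(2008 R2 SP2 / 2012 SP1 or later), and using one of our table names as the
example (adjust the schema if yours differs):

    -- How many rows have changed since each statistic on the table was last updated
    SELECT s.name                  AS stats_name,
           sp.last_updated,
           sp.rows,
           sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID(N'dbo.RN_QOS_DATA_0007');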
Enabling the trace flag that makes SQL Server use a dynamic threshold
based on row count to decide when to update statistics solves this
problem for the most part.
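For reference, the behaviour described above matches trace flag 2371, so
presumably that is the flag in question. A minimal sketch for turning it
on, assuming that is indeed the one you want:

    -- Enable the dynamic statistics-update threshold globally at runtime
    DBCC TRACEON (2371, -1);
    -- To keep it across restarts, add -T2371 as a startup parameter instead
    -- (e.g. via SQL Server Configuration Manager)

Worth noting that on SQL Server 2016 and later, at compatibility level 130
or higher, this dynamic threshold is the default behaviour anyway.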
Bottom line: the trace flag might help you out, and you can always buy
your way out of the problem with more hardware.