The system was set up before I joined and is heavily over-specced. We have:
However, the load peaks at only 10% (hourly basis) with a 5% average on the AE server, and at 20% (hourly basis) with a 6% average on the DB machine, so basically nothing. Even if I multiplied these values by 4 to compensate for the massive overspec, the sizing would feel just right.
In my experience, low CPU load on both the AE and DB servers, especially during times of high workload, is a good indicator of too much latency on the network side, the storage side, or both, because time the CPU spends waiting typically does not show up as utilization in the OS. In the Automic SystemOverview, on the other hand, wait times DO show up as utilization.
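To illustrate the point about waiting not counting as CPU utilization, here is a minimal Python sketch. It is only an analogy: `time.sleep` stands in for a process waiting on a network round trip or a disk write, and the function name `wall_vs_cpu` is made up for this example. It has nothing to do with how the AE itself measures load.

```python
import time


def wall_vs_cpu(wait_seconds=0.2):
    """Return (wall_seconds, cpu_seconds) for a block that does nothing but wait."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time actually consumed by this process
    time.sleep(wait_seconds)           # stand-in for waiting on the DB or the storage
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = wall_vs_cpu()
    print(f"wall: {wall:.3f} s, cpu: {cpu:.3f} s")
```

The wall-clock time covers the whole wait, while the CPU time stays close to zero, which is why a latency-bound server can look idle in OS-level CPU statistics.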
Best,
Original Message:
Sent: Aug 06, 2024 11:06 PM
From: BEN BAEZ
Subject: Poor Automic database Performance U00003533
Hello Rene,
Thank you for sharing your experience. A follow-up question for you: on your two servers with 20 concurrent users, running 1 PWP, 3 DWP, 12 WP, 4 JWP, 4 JCP and 4 CP, how many CPUs and how much memory do you have? Our servers meet the requirements of four CPUs and 32 GB of memory, none of which seems to get hit very hard.
Thanks again
Ben
Original Message:
Sent: Aug 05, 2024 03:50 AM
From: Rene Kappel
Subject: Poor Automic database Performance U00003533
Hello Ben,
We have been through the same and found two main ingredients for a performant system:
- Network latency between AE server and DB server: the network needs to achieve a round-trip time of at most 1 ms (and this is the absolute maximum); 0.1 ms is the actual goal. Important: these values need to be achieved for an 8K packet so that data can be carried between AE and DB fast enough ("ping -l 8000 <ip>"; the command may vary depending on the OS).
We made sure to have no routing or firewall between AE and DB.
- IO latency: Our datacenter runs on hyperconverged VMware. We changed the storage policy so that only disks in the same rack are used for the DB machines. Our system works fine with a latency of at most 0.2 ms for a 1Q/1T/4K write load. At 0.5 ms we had massive performance issues.
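For a rough first impression of the write latency mentioned above, a small Python sketch like the following can be run on the DB machine's data volume. It is an approximation under stated assumptions: it issues 4 KiB writes one at a time from a single thread and forces each to disk with `os.fsync`, which goes through the file system and is not equivalent to a dedicated storage benchmark. The function names and the default of 200 writes are made up for this example; the 0.2 ms and 0.5 ms figures are the ones quoted in this post.

```python
import os
import statistics
import tempfile
import time


def sync_write_latencies_ms(directory, count=200, block_size=4096):
    """Write `count` blocks one after another, syncing each, and return latencies in ms."""
    block = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # force the write to stable storage before timing stops
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies_ms):
    """Return average, approximate 90th percentile and maximum of the samples."""
    ordered = sorted(latencies_ms)
    p90 = ordered[int(0.9 * (len(ordered) - 1))]
    return {"avg_ms": statistics.mean(ordered), "p90_ms": p90, "max_ms": ordered[-1]}


if __name__ == "__main__":
    # Point this at a directory on the volume you want to test (assumption: temp dir).
    print(summarize(sync_write_latencies_ms(tempfile.gettempdir())))
```

Treat the result as an indication only. If the average is well above the 0.2 ms mentioned above, it is worth asking the storage team for a proper measurement.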
All those measures brought us from a DB-Perf-Index (U00003533) of around 50 all the way up to 300 (even 400 on good days). We also tried different driver versions, but that didn't help at all.
The 3505 'Reference' value is a number that can only be achieved with a setup no real-world production system would run on: all local NVMe disks, AE server and DB on the same machine, highly optimized for performance but not for reliability. I'd estimate the maximum achievable value for a real-world, redundant, scalable production environment at around 1000 to 1500.
Some more numbers to give you some scope:
- avg. Commit time: <3ms (90% of all Commits <5ms)
- avg. transaction time of WPs: <25ms
- 20240805/020033.872 - 20240724/112039.093 - U00003533 UCUDB: Check of data source finished: No errors. Performance CPU/DB: '95984259'/'323 (1000/3.088516 s)'
- with these numbers we are able to run up to 4000 activations per minute at peak (~2000 all day long) while serving up to 20 concurrent users, using 1 PWP, 3 DWP, 12 WP, 4 JWP, 4 JCP and 4 CP distributed across 2 servers
AE: OS MS 2022 / 21.0.10
DB: OS MS 2019 / SQL: MSSQL 2019
Long story short: for us the issue was within the infrastructure.
Original Message:
Sent: Aug 04, 2024 05:34 PM
From: BEN BAEZ
Subject: Poor Automic database Performance U00003533
Hi Lester,
Did you ever find a resolution to this issue? I am looking at the same situation after upgrading from 12.3 to 21.0.11.
20240804/085255.805 - U00003533 UCUDB: Check of data source finished: No errors. Performance CPU/DB: '23426205'/'8 (1000/113.539095 s)'
20240804/085255.805 - U00003544 UCUDB: Reference values tested with Linux x64 on XEON 3600 MHz: CPU 525716336, DB 3505
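For anyone comparing their own numbers: in the three U00003533 lines quoted in this thread, the DB value matches the count in parentheses divided by the seconds, rounded down (1000/113.539095 gives 8, 1000/11.440038 gives 87, 1000/3.088516 gives 323). That relationship is an observation from these log lines, not something confirmed by Broadcom documentation. A small Python sketch to pull the values out of a log line, with hypothetical function names:

```python
import math
import re

# Matches the performance part of a U00003533 message as quoted in this thread.
PERF_RE = re.compile(
    r"U00003533 .*Performance CPU/DB: '(\d+)'/'(\d+) \((\d+)/([0-9.]+) s\)'"
)


def parse_perf(line):
    """Return the CPU value, DB index, count and seconds from a log line, or None."""
    m = PERF_RE.search(line)
    if m is None:
        return None
    cpu, db_index, count, seconds = m.groups()
    return {
        "cpu": int(cpu),
        "db_index": int(db_index),
        "count": int(count),        # the '1000' in the log; its exact meaning is assumed
        "seconds": float(seconds),
    }


def index_from_timing(count, seconds):
    """Assumed relationship: DB index = count / seconds, rounded down."""
    return math.floor(count / seconds)


if __name__ == "__main__":
    line = ("20240804/085255.805 - U00003533 UCUDB: Check of data source finished: "
            "No errors. Performance CPU/DB: '23426205'/'8 (1000/113.539095 s)'")
    perf = parse_perf(line)
    print(perf, index_from_timing(perf["count"], perf["seconds"]))
```

Read this way, a DB value of 8 means the check took nearly two minutes, against roughly three seconds on a system reporting 323.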
Our production environment AE servers are Windows 2022 and the database is MS SQL 2022.
Our connection string in the usrv.ini: sqlDriverConnect=ODBCVAR=NNNNNNRN,DSN=automic;UID=automicuser;PWD=101938AD27E8CD9B6;Mars_Connection=Yes
We are using mssql-jdbc-12.4.1.jre11
Both AE and database servers are on a 172.23.x.x subnet and tracert is 2 hops to the server.
Per the DBA, the database server and AE servers operate within the norms of our network.
I am also looking to see if anyone else has had an issue where the database performance number was this low, and what steps were taken to increase it.
Original Message:
Sent: Mar 16, 2023 12:53 PM
From: Lester Chew
Subject: Poor Automic database Performance U00003533
All,
It has been pointed out to us that we have a database performance issue, based on the AE log entry below (all entries are similar).
20230314/085622.531 - 20230228/233418.438 - U00003533 UCUDB: Check of data source finished: No errors. Performance CPU/DB: '29047518'/'87 (1000/11.440038 s)'
Our production environment AE servers are Windows 2016 and the database is MS SQL 2019.
Our production environment has always been below 100. Our non-production is around 350 to 400.
Broadcom has been involved and pointed out that the DB has an issue with the IO as the rollback/commit is far too slow.
This is our connection string in the usrv.ini: sqlDriverConnect=ODBCVAR=NNNNNNRN,DSN=automic;UID=automicuser;PWD=101938AD27E8CD9B6;Mars_Connection=Yes
We are using mssql-jdbc-8.4.1.jre8.jar
Both AE and database servers are on a 172.26.x.x subnet and tracert is 3 hops to the server.
Per the DBA, the database server and AE servers operate within the norms of our network.
I am looking to see if anyone else has had an issue where the database performance number was this low, and what steps were taken to increase it.