Automic Workload Automation

  • 1.  Poor Automic database Performance U00003533

    Posted Mar 16, 2023 12:53 PM

    All,

     It has been pointed out to us that we have a database performance issue, based on the AE log entry below (all entries are similar).

     20230314/085622.531 - 20230228/233418.438 - U00003533 UCUDB: Check of data source finished: No errors. Performance CPU/DB: '29047518'/'87 (1000/11.440038 s)'
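The DB figure in this message appears to be derived from the timing in parentheses. In all three U00003533 lines quoted in this thread, it equals 1000 divided by the seconds, truncated: 1000/11.440038 ≈ 87.4 → 87, 1000/113.539095 ≈ 8.8 → 8, and 1000/3.088516 ≈ 323.8 → 323. That relation is inferred from these lines only, not from Automic documentation. A minimal Python sketch for extracting the numbers from a log line and checking that relation follows (the function names are hypothetical):

```python
import math
import re

# Matches the performance part of a U00003533 message. The message text before
# "Performance" is localized (this thread has English and German lines), so it
# is skipped with ".*".
U3533_PATTERN = re.compile(
    r"U00003533 UCUDB: .*Performance CPU/DB: "
    r"'(?P<cpu>\d+)'/'(?P<db>\d+) \((?P<ops>\d+)/(?P<secs>[\d.]+) s\)'"
)


def parse_u3533(line):
    """Return (cpu, db, ops, seconds) from a U00003533 log line, or None if absent."""
    match = U3533_PATTERN.search(line)
    if match is None:
        return None
    return (int(match["cpu"]), int(match["db"]),
            int(match["ops"]), float(match["secs"]))


def db_value_from_timing(ops, seconds):
    """Inferred relation: DB value = operations per second, truncated to an integer."""
    return math.floor(ops / seconds)


if __name__ == "__main__":
    line = ("20230314/085622.531 - 20230228/233418.438 - U00003533 UCUDB: "
            "Check of data source finished: No errors. "
            "Performance CPU/DB: '29047518'/'87 (1000/11.440038 s)'")
    cpu, db, ops, secs = parse_u3533(line)
    print(db, db_value_from_timing(ops, secs))  # 87 87
```

If the inferred relation holds, a lower DB value simply means the same 1000 operations took longer: 87 corresponds to about 11.4 s, and 8 to about 113.5 s.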

     Our production environment AE servers are Windows 2016 and the database is MS SQL 2019.

     In our production environment this DB performance value has always been below 100; in non-production it is around 350 to 400.

     Broadcom has been involved and pointed out that the DB has an I/O issue, as rollbacks/commits are far too slow.

     This is our connection string in the usrv.ini: sqlDriverConnect=ODBCVAR=NNNNNNRN,DSN=automic;UID=automicuser;PWD=101938AD27E8CD9B6;Mars_Connection=Yes

     We are using mssql-jdbc-8.4.1.jre8.jar

     Both AE and database servers are on a 172.26.x.x subnet, and tracert shows 3 hops to the server.

     Per the DBA, the database server and AE servers operate within the norms of our network.

     

    I would like to know whether anyone else has had a database performance number this low, and what steps were taken to increase it.



  • 2.  RE: Poor Automic database Performance U00003533

    Posted Mar 22, 2023 02:52 AM

    Let me also ask these questions. Do the IP addresses of your AE database and AE servers share the same network part (e.g. 192.168.x.x), and is your DB performance value greater than 300? I understand it would be best if the first three octets match, but does that make much of an improvement compared with matching only the first two octets?
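For reference, "matching the first two octets" corresponds to a /16 prefix and "matching the first three" to a /24. Below is a minimal Python check using the standard ipaddress module. The helper name and the addresses are hypothetical. Sharing a prefix only describes addressing and by itself says nothing definite about hop count or latency, which the later replies in this thread measure directly.

```python
import ipaddress


def same_network(ip_a, ip_b, prefix_len):
    """True if ip_b lies in the /prefix_len IPv4 network that contains ip_a."""
    network = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in network


if __name__ == "__main__":
    ae, db = "192.168.10.5", "192.168.20.7"  # hypothetical AE and DB addresses
    print(same_network(ae, db, 16))  # True: first two octets match
    print(same_network(ae, db, 24))  # False: third octet differs
```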




  • 3.  RE: Poor Automic database Performance U00003533

    Posted Aug 04, 2024 05:35 PM

    Hi Lester,

        Did you ever find a resolution to this issue? I am seeing the same situation after upgrading from 12.3 to 21.0.11.

    20240804/085255.805 - U00003533 UCUDB: Check of data source finished: No errors. Performance CPU/DB: '23426205'/'8 (1000/113.539095 s)'
    20240804/085255.805 - U00003544 UCUDB: Reference values tested with Linux x64 on XEON 3600 MHz: CPU 525716336, DB 3505

    Our production environment AE servers are Windows 2022 and the database is MS SQL 2022.

     Our connection string in the usrv.ini: sqlDriverConnect=ODBCVAR=NNNNNNRN,DSN=automic;UID=automicuser;PWD=101938AD27E8CD9B6;Mars_Connection=Yes

     We are using mssql-jdbc-12.4.1.jre11

     Both AE and database servers are on a 172.23.x.x subnet, and tracert shows 2 hops to the server.

     Per the DBA, the database server and AE servers operate within the norms of our network.

     

    I am also looking to see whether anyone else has had a database performance number this low, and what steps were taken to increase it.




  • 4.  RE: Poor Automic database Performance U00003533

    Posted Aug 05, 2024 03:55 AM

    Hello Ben,

    We have been through the same thing and found two main ingredients for a performant system:

    1. Network latency between AE server and DB server: the network needs to achieve a round-trip time of at most 1 ms (and this is the absolute maximum); 0.1 ms is the actual goal. Important: these values must be achieved with an 8K packet so that the data can be carried between AE and DB fast enough ("ping -l 8000 <ip>" on Windows; the option varies by OS, e.g. "ping -s 8000 <ip>" on Linux).
      We made sure to have no routing or firewall between AE and DB.

    2. I/O latency: our datacenter runs on hyperconverged VMware. We changed the storage policy so that only disks in the same rack are used for the DB machines. Our system works fine with a latency of at most 0.2 ms for a 1Q/1T/4K write load (queue depth 1, one thread, 4K blocks); at 0.5 ms we had massive performance issues.
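Both checks above can be approximated from Python when the usual tools are not at hand. This is a minimal sketch under stated assumptions, not a replacement for ping or a dedicated storage benchmark. A TCP echo round trip with an 8000-byte payload stands in for "ping -l 8000". It measures TCP rather than ICMP and needs a small echo listener on the DB host, such as the hypothetical serve_echo_once. Fsync'd random 4K writes issued one at a time from a single thread stand in for the 1Q/1T/4K write load. The thresholds are the ones quoted in this post, and all function names are hypothetical.

```python
import os
import random
import socket
import statistics
import time

# Thresholds quoted in this post (milliseconds).
NET_GOAL_MS, NET_MAX_MS = 0.1, 1.0  # round-trip goal / absolute maximum
IO_OK_MS, IO_BAD_MS = 0.2, 0.5      # write latency that worked / that caused trouble
BLOCK = 4096                        # 4K write size


def serve_echo_once(listener):
    """Echo bytes back on one accepted connection until the client closes it."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            conn.sendall(data)


def _recv_exact(sock, n):
    """Read exactly n bytes from a connected socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def tcp_echo_rtt_ms(host, port, samples=20, size=8000):
    """Time round trips of a size-byte payload through a TCP echo listener."""
    payload = os.urandom(size)
    rtts = []
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # no Nagle delay
        for _ in range(samples):
            start = time.perf_counter()
            sock.sendall(payload)
            _recv_exact(sock, size)
            rtts.append((time.perf_counter() - start) * 1000.0)
    return rtts


def sync_write_latency_ms(path, count=200, file_blocks=256, seed=0):
    """Time random 4K writes, one at a time from one thread, each forced to disk with fsync."""
    rng = random.Random(seed)
    block = os.urandom(BLOCK)
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)  # O_BINARY exists only on Windows
    fd = os.open(path, flags, 0o600)
    try:
        os.write(fd, b"\0" * (BLOCK * file_blocks))  # preallocate the test file
        os.fsync(fd)
        samples = []
        for _ in range(count):
            os.lseek(fd, rng.randrange(file_blocks) * BLOCK, os.SEEK_SET)
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)
            samples.append((time.perf_counter() - start) * 1000.0)
        return samples
    finally:
        os.close(fd)


def network_verdict(rtts_ms):
    """Worst sample against the 1 ms maximum, median against the 0.1 ms goal."""
    if max(rtts_ms) > NET_MAX_MS:
        return "exceeds 1 ms maximum"
    if statistics.median(rtts_ms) <= NET_GOAL_MS:
        return "meets 0.1 ms goal"
    return "within maximum"


def io_verdict(write_ms):
    """Mean write latency against the 0.2 ms / 0.5 ms figures."""
    mean = statistics.mean(write_ms)
    if mean <= IO_OK_MS:
        return "ok"
    return "borderline" if mean < IO_BAD_MS else "problematic"
```

To use it, bind a listener on the DB host (for example socket.create_server(("0.0.0.0", 5001)), where the port is arbitrary) and pass it to serve_echo_once. Then call tcp_echo_rtt_ms("<db-host>", 5001) from the AE server. Point sync_write_latency_ms at a file on the volume that holds the database files. Because fsync also goes through the file system, its figures may be somewhat higher than a raw-device benchmark would report.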

    All these measures brought us from a DB performance index (U00003533) of around 50 all the way up to 300 (even 400 on good days). We also tried different driver versions, but that didn't help at all.

    The reference value of 3505 can only be achieved with a setup that no real-world production system would run on: all-local NVMe disks, AE server and DB on the same machine, highly optimized for performance but not for reliability. I'd estimate the maximum achievable value for a real-world, redundant, scalable production environment at around 1000 to 1500.
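For scale, here are the DB values mentioned in this thread (8, 87, and the 323 this system reports later in this post), along with the estimated 1000 to 1500 ceiling, each expressed as a share of the 3505 reference. This is plain division, rounded:

```latex
\frac{8}{3505} \approx 0.23\%, \qquad
\frac{87}{3505} \approx 2.5\%, \qquad
\frac{323}{3505} \approx 9.2\%, \qquad
\frac{1000}{3505} \approx 28.5\%, \qquad
\frac{1500}{3505} \approx 42.8\%
```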

    Some more numbers to give you a sense of scale:

    • Avg. commit time: <3 ms (90% of all commits <5 ms)
    • Avg. transaction time of WPs: <25 ms
    • 20240805/020033.872 - 20240724/112039.093 - U00003533 UCUDB: Check of data source finished: No errors. Performance CPU/DB: '95984259'/'323 (1000/3.088516 s)'
    • With these numbers we are able to run up to 4000 activations per minute at peak (~2000 all day long) while serving up to 20 concurrent users, using 1 PWP, 3 DWPs, 12 WPs, 4 JWPs, 4 JCPs and 4 CPs distributed across 2 servers.
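To compare your own system against the figures above, a small helper like the one below could summarize commit durations and convert the activation rates to per-second values. This sketch leaves open where the commit durations come from (for example a DB-side trace). The input list and function names are hypothetical, and the thresholds are the ones quoted in this post.

```python
import statistics

AVG_COMMIT_MS = 3.0  # "avg. commit time: <3 ms"
P90_COMMIT_MS = 5.0  # "90% of all commits <5 ms"


def commit_profile(commit_ms):
    """Return (mean, 90th percentile) of commit durations in milliseconds."""
    mean = statistics.mean(commit_ms)
    p90 = statistics.quantiles(commit_ms, n=10, method="inclusive")[8]
    return mean, p90


def within_quoted_figures(commit_ms):
    """True if both the mean and the 90th percentile beat the quoted figures."""
    mean, p90 = commit_profile(commit_ms)
    return mean < AVG_COMMIT_MS and p90 < P90_COMMIT_MS


def per_second(per_minute):
    """Convert a per-minute rate (e.g. activations) to a per-second rate."""
    return per_minute / 60.0


if __name__ == "__main__":
    print(round(per_second(4000), 1), round(per_second(2000), 1))  # 66.7 33.3
```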

    AE: OS MS 2022 / 21.0.10

    DB: OS MS 2019 / SQL: MSSQL 2019

    Long story short: for us the issue was within the infrastructure.




  • 5.  RE: Poor Automic database Performance U00003533

    Posted Aug 06, 2024 11:06 PM

    Hello Rene,

       Thank you for sharing your experience. A follow-up question: on your two servers running 20 concurrent users with 1 PWP, 3 DWPs, 12 WPs, 4 JWPs, 4 JCPs and 4 CPs, how many CPUs and how much memory do you have? Our servers meet the stated requirements of four CPUs and 32 GB of memory, and none of them seems to be hit very hard.

    Thanks again

    Ben




  • 6.  RE: Poor Automic database Performance U00003533

    Posted Aug 07, 2024 01:34 AM

    Hello Ben,

    The system was set up before I joined and was way over-spec'd. We have:

    DB: 16 CPUs, 192 GB RAM

    AE servers: 16 CPUs, 64 GB RAM each

    However, the load is only 10% at peak (hourly basis) and 5% on average on the AE servers, and 20% at peak (hourly basis) and 6% on average on the DB machine, so basically nothing. Even if I multiplied these values by 4 to compensate for the massive over-spec, it would feel just right.
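Here is one way to read the "x4" remark. It assumes the factor reflects scaling from these 16-CPU machines down to the 4-CPU sizing Ben mentions (16/4 = 4), and that load scales linearly with core count, which real systems only approximate:

```latex
\text{AE peak: } 16 \times 10\% = 1.6 \text{ CPUs busy} \;\Rightarrow\; \tfrac{1.6}{4} = 40\% \text{ on 4 CPUs} \\
\text{DB peak: } 16 \times 20\% = 3.2 \text{ CPUs busy} \;\Rightarrow\; \tfrac{3.2}{4} = 80\% \text{ on 4 CPUs} \\
\text{Averages: } 5\% \times 4 = 20\% \text{ (AE)}, \quad 6\% \times 4 = 24\% \text{ (DB)}
```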

    In my experience, low CPU load on both AE and DB servers, especially during times of high workload, is a good indicator of too much latency on either or both sides (network/storage), because time spent waiting typically does not show up as CPU utilization in the OS. In the Automic SystemOverview, on the other hand, wait times DO show as utilization.
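The point about wait time can be illustrated with a toy Python sketch. Here time.sleep stands in for a work process blocked on network or storage round trips; this is a simplification, since a real DB call also does some CPU work. Wall-clock time grows with the simulated latency while the process's CPU time stays near zero, which is why an OS monitor shows low utilization. The function name is hypothetical.

```python
import time


def simulated_round_trips(count, latency_s):
    """Stand-in for a process doing `count` sequential DB calls that each wait
    `latency_s` seconds; time.sleep models the network/storage wait."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(count):
        time.sleep(latency_s)  # blocked, not running on a CPU
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    for latency in (0.001, 0.005):
        wall, cpu = simulated_round_trips(200, latency)
        print(f"latency {latency * 1000:.0f} ms: wall {wall:.2f} s, "
              f"CPU {cpu:.3f} s, CPU share {cpu / wall:.1%}")
```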

    Best,

    René