I have attached a copy of the presentation I made to the UK User Group back in 2015.
For my approach to work you will need to convert your loading scripts to run as GEL scripts in a process. These scripts will run on the source systems, build the XOG files, and then call XOG on the destination system.
Also, the XOG data should be batched so that each XOG file carries multiple object instances, e.g. 10, 50 or 100 Projects/Resources/Custom Instances/etc. per XOG file. Your mileage will vary, so you will need to instrument your code and find the sweet spot for your PPM system configuration.
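To illustrate the batching idea in plain Java (not GEL), here is a minimal sketch. The class name XogBatcher and both helpers are hypothetical names of my own, not part of XOG or PPM; the sketch only shows how to split a list of instances into groups of N and how to time one run, so that different batch sizes can be compared.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of XOG or PPM.
class XogBatcher {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /** Runs the given work once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable work) {
        long started = System.nanoTime();
        work.run();
        return (System.nanoTime() - started) / 1_000_000;
    }
}
```

In use, you would call partition with 10, then 50, then 100, build and send one XOG file per batch inside the Runnable passed to timeMillis, and compare the timings to find the batch size that suits your system.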
I have also had some success with data loads where my GEL script ran outside of PPM/Clarity and used Threads within the script to send the XOG files. Throughput was impressive, BUT the Threads did not reliably synchronize on the queue of available XOG files, and error catching only worked intermittently. I feel sure this approach would work if converted to Java, but I have not had the time to investigate any further.
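For anyone who wants to try the Java route, this is roughly the shape I would expect it to take. It is an untested sketch, not something I have run against a PPM system: ParallelXogSender and the XogSender interface are hypothetical names, and the actual call that posts a file to XOG is deliberately left for you to supply. A fixed thread pool takes care of the queue, and each Future reports whether its file failed, which covers the two things that were unreliable with Threads in GEL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; the real XOG call is supplied by the caller.
class ParallelXogSender {

    /** Assumed hook: post one XOG file to the destination, throwing on failure. */
    interface XogSender {
        void send(String xogFile) throws Exception;
    }

    /** Sends every file on a fixed pool of workers and returns the files that failed. */
    static List<String> sendAll(List<String> xogFiles, int workers, XogSender sender)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> results = new ArrayList<>();
            for (String file : xogFiles) {
                // The pool's internal queue hands each file to the next free worker.
                results.add(pool.submit(() -> {
                    sender.send(file);
                    return null;
                }));
            }
            List<String> failed = new ArrayList<>();
            for (int i = 0; i < results.size(); i++) {
                try {
                    results.get(i).get(); // waits until this file has been processed
                } catch (ExecutionException e) {
                    // Any exception thrown by send() surfaces here, so no error is lost.
                    failed.add(xogFiles.get(i));
                }
            }
            return failed;
        } finally {
            pool.shutdown();
        }
    }
}
```

The returned list can be logged or fed back in for a retry, and the worker count is another number to tune alongside the batch size.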
What all my experimentation has shown is that the XOG endpoint ('.../niku/xog') is capable of taking many simultaneous XOG files without choking. The limitation appears to be the raw processing speed of XOG, not any inherent limitation in the App code or its interaction with the database.