I'm looking for recommendations or advice on supporting batch-style workloads on the API Gateway. I've had a few situations where I've been asked whether we can put a service on the gateway that handles large message bodies (document processing, large XML content generation, batch feeds, etc.). These services also have high latency: responses can take 1-2 minutes or more to return.
I've been reluctant to host these on a gateway that primarily supports transactional work (quick communications with back end services). I don't want one workload to swamp the gateway and affect the other.
Does anyone else have this kind of problem? If so, how did you go about addressing it? Did you allocate another gateway that is tuned for these kinds of workloads? Did you expose them using an alternate means?
Yes, from my point of view it may be better to use another gateway cluster to handle long transactions. The load balancer should be able to route the traffic (maybe based on URI).
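To illustrate the URI-based split, here is a minimal sketch of the routing decision a load balancer would make. The path prefixes and cluster names are hypothetical, chosen only for the example; a real load balancer would express this in its own rule syntax.

```python
# Hypothetical path prefixes that identify batch-style services.
BATCH_PREFIXES = ("/batch/", "/documents/")


def choose_cluster(path: str) -> str:
    """Pick a gateway cluster name based on the request path."""
    # str.startswith accepts a tuple and matches any of the prefixes.
    if path.startswith(BATCH_PREFIXES):
        return "batch-gateway"          # hypothetical cluster for long-running work
    return "transactional-gateway"      # hypothetical cluster for quick requests
```

The point of the split is that slow, large requests never share threads or memory with the transactional traffic.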
Further to what Mark said, there are dozens of things that make "batch style workloads" and "transactional style workloads" very different from each other.
The first thing that comes to mind is that HTTP itself is ill-suited to batches with long latencies. Many load balancers don't support long timeouts on HTTP connections. Our own timeouts default to 60 seconds, so you'd have to change configuration, and some of those changes start to conflict with one of the big constraints in high-throughput situations: the trade-off between concurrency, latency and TPS. TPS = concurrency / (latency in seconds). With 1 concurrent request, you need less than 10 milliseconds total request/response time to get above 100 TPS.
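A quick sketch of that arithmetic, using the TPS relationship quoted above. The function names are made up for illustration, and the 90-second latency is just a midpoint of the 1-2 minute range mentioned in the question.

```python
import math


def max_tps(concurrency: int, latency_seconds: float) -> float:
    """TPS = concurrency / latency, the relationship quoted in the post."""
    return concurrency / latency_seconds


def required_concurrency(target_tps: float, latency_seconds: float) -> int:
    """The same relationship rearranged: concurrency = TPS * latency."""
    return math.ceil(target_tps * latency_seconds)


# One request at a time with 10 ms round trips gives 100 TPS.
print(max_tps(1, 0.010))
# One request at a time with 90 s round trips gives about 0.011 TPS.
print(max_tps(1, 90))
# Sustaining 100 TPS at 90 s latency needs 9000 requests in flight.
print(required_concurrency(100, 90))
```

Even a modest request rate at batch latencies implies a very large number of requests in flight, which is why mixing the two workloads on one gateway is risky.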
If I were planning a "batch" processing system, I'd want strict controls on concurrency, probably through some aggressive limiting mechanism. If the back end is slow, we hold the messages in RAM for the duration, which makes it quite easy to run the gateway out of RAM.
Using a separate port with a relatively low thread count and private thread pool is a way to do that.
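As a rough sketch of the "strict concurrency limit" idea (not the gateway's own mechanism, which is configured through its listener and thread pool settings), the following caps in-flight requests with a semaphore and rejects the overflow instead of queueing it. The class name, the 503 response, and the handler signature are assumptions made for the example.

```python
import threading


class BatchLimiter:
    """Caps the number of batch requests processed at the same time."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_handle(self, handler, request):
        # Non-blocking acquire: if every slot is taken, reject immediately
        # rather than holding yet another large body in memory.
        if not self._slots.acquire(blocking=False):
            return 503, "busy, retry later"
        try:
            return 200, handler(request)
        finally:
            # Always free the slot, even if the handler raises.
            self._slots.release()
```

With the cap at N and message bodies of at most S bytes, the bodies held in memory at once are bounded by roughly N × S, which is what keeps a slow back end from exhausting RAM.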