Layer7 API Management

Rate Limit strange behavior with Loadbalancer

    Posted Jul 30, 2020 11:25 AM
    We have a policy using a Rate Limit Assertion to protect the backend against high load.
    We are using a limit of 40 requests per second, spread over a 30-second window. The "Cluster Wide" checkbox is selected.
    We are using the "Throttle" option and do not limit concurrency.
    In our statistics we see that the rate limit becomes active even though the load is still far below the limit. We found out that the reason for this is an unequal load distribution from the load balancer (LB) in front of the cluster (we are using an F5 LB). Out of our 6-node cluster, only a single node receives load, which would at least explain why the rate limit is active on that node. The question, however, is why the other nodes are no longer getting requests. Only the policy with the rate limit seems to be affected; if we check the load distribution across the whole gateway, we see a pretty even distribution.
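
    For reference, here is how we currently understand the combination of "Throttle" and the 30-second spread window: a token bucket refilled at the configured rate, where the spread window acts as burst capacity, and "Throttle" delays callers instead of rejecting them. This is only our mental model, not the gateway's actual implementation (which is part of why we are asking); the class and names in this minimal single-caller Java sketch are ours:

        import java.util.concurrent.TimeUnit;

        // Sketch of our assumed semantics: 40 req/s refill rate, with the
        // 30 s spread window treated as burst capacity (40 * 30 tokens).
        // Not the gateway's real code; a production limiter would also
        // not sleep while holding the lock.
        public class ThrottlingRateLimiter {
            private final double ratePerSecond;   // 40 in our policy
            private final double burstCapacity;   // ratePerSecond * spread window
            private double tokens;
            private long lastRefillNanos;

            public ThrottlingRateLimiter(double ratePerSecond, int spreadSeconds) {
                this.ratePerSecond = ratePerSecond;
                this.burstCapacity = ratePerSecond * spreadSeconds;
                this.tokens = burstCapacity;
                this.lastRefillNanos = System.nanoTime();
            }

            // "Throttle" option: wait for a token instead of failing the request.
            public synchronized void acquire() throws InterruptedException {
                refill();
                while (tokens < 1.0) {
                    // Sleep roughly until one token has accumulated, then re-check.
                    long waitNanos = (long) ((1.0 - tokens) / ratePerSecond * 1e9);
                    TimeUnit.NANOSECONDS.sleep(Math.max(waitNanos, 1_000_000L));
                    refill();
                }
                tokens -= 1.0;
            }

            private void refill() {
                long now = System.nanoTime();
                tokens = Math.min(burstCapacity,
                        tokens + (now - lastRefillNanos) / 1e9 * ratePerSecond);
                lastRefillNanos = now;
            }
        }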

    We have the following questions:
    1. Why is the "Limit Each" option in the Rate Limit Assertion available at all? It is a second factor that might be helpful in special use cases, but if I want a fixed rate limit, I do not need it. We first tried the "Gateway Node" option and have now changed it to the custom variable "${Service.name}". Can you please provide some more technical details about this option (see the keyed-bucket sketch after this list for how we currently understand it)? The official documentation does not explain it in enough detail.
    2. Do you have any experience with such behavior in combination with an LB?
    3. Should we try it without the "Cluster Wide" option and reduce the values accordingly (for our 6 nodes, roughly 40 / 6 ≈ 6-7 requests per second per node)? Does this option require any inter-node communication, and are there side effects if that communication is not working properly?
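
    To make question 1 concrete, here is the keyed-bucket sketch mentioned above. Our assumption is that the runtime value of the "Limit Each" expression becomes the counter key, so each distinct value gets its own independent bucket: with "Gateway Node" every node keeps its own counter, with ${Service.name} every service does. This reuses the hypothetical ThrottlingRateLimiter class from the sketch above and is our guess, not the documented implementation:

        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;

        // One bucket per distinct "Limit Each" value (hypothetical names).
        public class KeyedRateLimiter {
            private final Map<String, ThrottlingRateLimiter> buckets =
                    new ConcurrentHashMap<>();
            private final double ratePerSecond;
            private final int spreadSeconds;

            public KeyedRateLimiter(double ratePerSecond, int spreadSeconds) {
                this.ratePerSecond = ratePerSecond;
                this.spreadSeconds = spreadSeconds;
            }

            public void acquire(String limitEachKey) throws InterruptedException {
                // e.g. limitEachKey = the resolved ${Service.name} for this request
                buckets.computeIfAbsent(limitEachKey,
                        k -> new ThrottlingRateLimiter(ratePerSecond, spreadSeconds))
                       .acquire();
            }
        }

    If this model is right, switching "Limit Each" from "Gateway Node" to ${Service.name} changes the counter key from node identity to service identity, which could explain different behavior under an uneven LB distribution. Confirmation either way would help us.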

    Here are also some screenshots as an example:
    [Screenshot: rate limit activity, with two red blocks marking the affected periods]
    And here is the load distribution for the same timeframe:
    [Screenshot: load distribution per cluster node]
    As you can see, during both red blocks from the first screenshot only a single cluster node gets load.
    This is strange!

    Do you have any idea what is going wrong here?
    Thanks for any useful information!

    Ciao Stefan :)