Rally Software

  • 1.  Webhooks Update

    Broadcom Employee
    Posted Aug 24, 2020 05:26 PM

    Hello Rally Customers, 


    We wanted to provide you all with some updates as we continue to narrow down the root cause of the webhook issue. 

    • Issue #1: Webhooks may fire multiple times, depending on the number of expressions in the webhook. For example, if a customer has a webhook rule with 2 expressions, that webhook can actually fire twice. Depending on your code implementation and error handling, you may simply be dropping the extraneous messages without noticing. Whether or not you're seeing this directly, the net result is that this issue has massively inflated the size of our webhooks queue, which is driving the inconsistent delivery times that you're seeing.
      • Next Steps and Timing:  We pinpointed this exact behavior as the root cause within the last hour. The engineering team is working on a fix right now, but we can't yet say how soon we will be able to deploy it. As soon as I know more about this fix, I will share an update right away.
    • Issue #2: The other issue that surfaced as part of this is a network configuration setting that limits the number of outbound socket connections we can make to any one endpoint. While we have been flooded with this backlog of traffic, that limit has restricted our ability to move through the queue quickly across all customer and webhook endpoints. Although this issue is clearly exacerbated right now by the first problem, we believe it may also have occurred in the past, but only very intermittently, which made it hard to track down and led to a perception that webhooks could be "flaky." Correcting this problem should improve the overall reliability of webhook delivery.
      • Next Steps and Timing:  We understand what needs to happen to fix this. However, we'd like a better sense of the timing of the fix for the first issue before we deploy this one, so that we don't generate an unnecessarily high rate of unexpected webhooks to customer endpoints.
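
    For anyone who wants their endpoint to tolerate the repeated deliveries described in Issue #1 while the fix is pending, one common approach is to make the receiver idempotent. The following is only a minimal sketch, not an official recommendation. It assumes each payload carries a unique ID at payload["message"]["message_id"] and that repeated deliveries reuse the same ID; neither is confirmed in this post. DuplicateFilter, handle_webhook, and the one-hour window are hypothetical names and values chosen for illustration.

```python
import time


class DuplicateFilter:
    """Remembers recently seen delivery IDs so repeated webhooks can be skipped."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._seen = {}  # delivery ID -> time it was first seen

    def is_duplicate(self, delivery_id):
        now = self._clock()
        # Forget IDs older than the TTL so memory use stays bounded.
        expired = [k for k, t in self._seen.items() if now - t > self.ttl_seconds]
        for k in expired:
            del self._seen[k]
        if delivery_id in self._seen:
            return True
        self._seen[delivery_id] = now
        return False


def handle_webhook(payload, dedup, process):
    """Run process(payload) once per ID; return False for a repeat delivery."""
    # ASSUMPTION: the unique ID lives at payload["message"]["message_id"].
    delivery_id = payload["message"]["message_id"]
    if dedup.is_duplicate(delivery_id):
        return False
    process(payload)
    return True
```

    In this sketch the filter lives in process memory, so it only helps a single receiver instance; a receiver running on several hosts would need a shared store for the seen IDs.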

    This has been an incredibly complex issue to debug, and we genuinely appreciate the frustration this has caused and how it has impacted your business.
     

    Sincerely,

    Your Rally Team