Layer7 API Management

Production setup for Audit Sinks, Internal Audit Sink Policy, and Log Sinks

  • 1.  Production setup for Audit Sinks, Internal Audit Sink Policy, and Log Sinks

    Posted 11-30-2016 04:00 PM
    Hey guys,
    Our CA Gateway production 2-node cluster has a problem with Audit Sinks and the internal audit database. We use Audit Details heavily so that we can debug and troubleshoot problems. That includes audit messages meant for debugging, such as:
    • Raw requests
    • Raw responses
    • Message Transformation Results of an entire message
    • Printing out variables
    • Messages noting that a folder assertion has been executed

     

    [Image: Audit Details Examples]

     

    We have also modified our Internal Audit Sink Policy to send us alerts via email. This is convenient when services are failing, and it gives us a central policy to process those failures.

     

    [Image: Internal Audit Sink Policy Customizations]

     

    Although our traffic isn’t that significant for that cluster (we process only 2,000,000+ requests per day, which translates to 28 requests/sec on average, or 14/sec per node), we managed to fill up our internal audit database and started failing requests. We tried a cron job to clean up the internal audit database and expanded the logical volume of the internal database, but that just degraded the performance of our gateway cluster, and we were back to the same problem as soon as traffic went a little higher.
    I know that the way to go is through a log sink (or auditing through JMS, or even log files), but we still want a central place to capture all audit events emitted by the standard (and even custom) Gateway assertions and policies. In summary, our requirements are:
    • Send all audit messages (emitted by assertions and policies) to a central location that we can search effectively when troubleshooting issues.
    • If possible, also send our debug information along with the rest of the audit messages.
    • If not, we’re willing to put the debug information in a log sink, although that would be an extra place for us to look and would probably slow us down in finding the issue.
    • Whatever the implementation, the internal audit database must not fill up its allocated logical space, because that causes other Gateway functions to fail. That is effectively a DoS.
    What is the recommended setup for these requirements? I’m sure other shops with bigger CA API Gateway clusters and a huge number of transactions have the same issues.
    Thanks,
    Gian


  • 2.  Re: Production setup for Audit Sinks, Internal Audit Sink Policy, and Log Sinks

    Posted 11-30-2016 04:55 PM

    Gian,

     

    Good afternoon. When it comes to auditing the system, we recommend auditing selectively to reduce both the overhead on the gateway and disk usage.

     

    1) We have seen several different models deployed, depending on the systems available, including:

    a) Local database, with the audit purge script deployed for regular cleanups

    Pros: Does not require additional external components and keeps database contained

    Cons: Finite amount of disk space available, and the audit purge can cause slowdowns. (Note: We are tracking a development incident to review this script for performance impact)

     

    b) Write to a local syslog, with a syslog forwarder pushing to a central syslog environment where triggers can be set up based on log entries

    Pros: Central monitoring and triggering capabilities along with longer retention periods

    Cons: Requires additional third party components to implement and maintain.

     

    c) External Database using Audit Sink Policy

    Pros: Centralized audit database, so multiple environments can push to a single DB, which can allow for longer retention periods

    Cons: Requires additional components to implement and maintain.

     

    d) JMS Server using the Audit Sink Policy

    Pros: Push any format, using custom policy logic in the Audit Sink Policy, to a JMS server as a one-way sync. A back-end system can then pull from the JMS queue into any type of holding environment.

    Cons: Requires additional third party components to implement and maintain.
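
    As a rough illustration of option (b), here is a minimal Python sketch of sending one audit-style line to a syslog daemon, which a forwarder could then relay to a central environment. It is not Gateway configuration: the host, port, logger name, and key=value layout are all assumptions made for the example.

```python
import logging
import logging.handlers


def format_audit_line(service: str, level: str, message: str) -> str:
    """Render one audit event as a single key=value line (hypothetical layout)."""
    # Collapse newlines and repeated whitespace so each event stays on one line.
    flat = " ".join(message.split())
    return f"service={service} level={level} msg={flat}"


def make_syslog_logger(host: str = "localhost", port: int = 514) -> logging.Logger:
    """Build a logger that sends records to a syslog daemon over UDP."""
    logger = logging.getLogger("gateway.audit.sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        # SysLogHandler uses UDP by default; host and port are assumptions.
        logger.addHandler(logging.handlers.SysLogHandler(address=(host, port)))
    return logger


if __name__ == "__main__":
    log = make_syslog_logger()
    log.warning(format_audit_line("orders", "WARNING", "Backend timed out\nafter 30s"))
```

    Keeping each event on a single line makes it easier for a central syslog environment to search and trigger on individual entries.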

     

    2) For the control of the audits being created:

    a) Add an Audit Assertion with the Record Audit event level set to INFO at the end of the happy path through the policy, so that audits are not written for successful policy executions (this assumes your audit threshold is set above INFO).

    b) Ensure that you remove any auditing assertions not required in the policy before promoting it to production.

    c) Include branches that run only when debugging is turned on, or use Debug Tracing for the service from the Service Property window.
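
    The idea behind 2a and 2c can be sketched as a simplified model. This is not Gateway code: it assumes an audit record is stored only when its level reaches a configured threshold (WARNING here, which is an assumption), and that bulky debug details are collected only when a debug flag is on. All function and variable names are hypothetical.

```python
import logging
from typing import List

AUDIT_THRESHOLD = logging.WARNING  # assumed threshold; lower levels are not stored


def final_level(succeeded: bool) -> int:
    """Happy path ends at INFO; any failure is recorded at WARNING."""
    return logging.INFO if succeeded else logging.WARNING


def should_persist(record_level: int, threshold: int = AUDIT_THRESHOLD) -> bool:
    """Return True when an audit record is severe enough to be stored."""
    return record_level >= threshold


def collect_details(request_body: str, debug: bool) -> List[str]:
    """Always keep a short summary; add the raw request only in debug mode."""
    details = [f"request size={len(request_body)} chars"]
    if debug:
        # Debug-only branch: raw payloads are the expensive part to store.
        details.append(f"raw request: {request_body}")
    return details


if __name__ == "__main__":
    for ok in (True, False):
        print("succeeded" if ok else "failed", "-> stored:", should_persist(final_level(ok)))
    print(collect_details("<order/>", debug=False))
```

    In this model, successful requests produce no stored audit record, and raw payloads are captured only while someone is actively debugging.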

     

    3) To keep the internal audit database from filling up and stopping message processing:

    a) Be sure to deploy both the audit purge script, to keep the number of audits in the database down, and the manage_binlogs script, to ensure that the replication logs are kept cleaned up on the hard disk.

    b) In version 9.1, we implemented a new feature that allows the gateway to keep processing even if the DB file space fills up. It is configured as follows:

    By default, the CA API Gateway stops processing messages when the database reaches a certain threshold. Now, you can specify that the Gateway stop writing audit messages once the threshold is reached but continue message processing.

     

    To enable the bypass, modify the audit.managementStrategy cluster property (see "Audit Cluster Properties" in the CA API Gateway 9.1 documentation).

     

    Specify how the Gateway should respond when the database exceeds the threshold defined in the audit.archivershutdownthreshold cluster property:

        STOP: Gateway stops processing requests and terminates audit logging.
        BYPASS: Gateway continues processing requests but terminates audit logging. Internal Gateway logging continues, with a SEVERE-level message that audit logging has stopped.

    Default: STOP

    Note: The value is case sensitive.
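
    To make the two settings concrete, here is a small Python model of the behaviour described above. It is only a sketch of the documented STOP/BYPASS semantics, not Gateway code: the function name, the percentage-based inputs, and the choice to raise an error on an unrecognised value are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    process_requests: bool
    write_audits: bool


def on_db_usage(usage_pct: float, threshold_pct: float, strategy: str = "STOP") -> Decision:
    """Model what happens once database usage exceeds the shutdown threshold."""
    if usage_pct <= threshold_pct:
        # Below the threshold, everything runs normally.
        return Decision(process_requests=True, write_audits=True)
    if strategy == "BYPASS":
        # Keep serving traffic, but stop writing audit records.
        return Decision(process_requests=True, write_audits=False)
    if strategy == "STOP":
        # Default: stop both request processing and audit logging.
        return Decision(process_requests=False, write_audits=False)
    # The value is case sensitive, so "bypass" is not accepted by this sketch.
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    print(on_db_usage(95, 90, "STOP"))
    print(on_db_usage(95, 90, "BYPASS"))
```

    With BYPASS, a full audit database costs you audit records rather than availability, which addresses the DoS concern raised in the original question.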

     

    Sincerely,

     

    Stephen Hughes

    Director, CA Support