DX NetOps

  • 1. Please read! Too Many ROS file errors

    Broadcom Employee
    Posted Mar 02, 2018 03:21 PM

    Hello User Community:

     

    Recently, a defect was uncovered that affects versions 3.1 through 3.5 of the Data Aggregator (DA). For customers at scale, it can cause Vertica to return 'Too many ROS file' errors to the DA, which can corrupt configuration data and could also lead to data loss in smaller metric families. The fix will be included in the March update kits for 3.1, 3.2, and 3.5. In the meantime, a workaround is available that protects against these problems, and it should be applied as soon as possible.

    During DA installation, the installer sets the Vertica configuration parameters 'ReflexiveMoveout', 'MoveOutSizePct', and 'MoveOutMaxAgeTime'. The values for the first two parameters are correct, but the installer sets 'MoveOutMaxAgeTime' to 0. The intent was for the 80% MoveOutSizePct setting to start MoveOut as the WOS approaches full, and to disable age-based MoveOut. Instead, a value of 0 causes Vertica to move data from the Write Optimized Store (WOS, in memory) to the Read Optimized Store (ROS, on disk) almost immediately. At scale, this can produce hundreds of ROS files when monitoring profiles are applied to large collections, or when other high-impact configuration changes are made.
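    To check whether a DR still has the installer's setting, and whether ROS containers are already piling up, the current values can be compared with Vertica's defaults. This is a sketch, not part of the official workaround. It assumes vsql lives at the path used in the workaround (overridable through VSQL), that a hypothetical DRADMIN_USER variable holds the dradmin user name, and that the installed Vertica version provides the system tables v_monitor.configuration_parameters and v_monitor.projection_storage (with its ros_count column).

```shell
#!/bin/sh
# Sketch only: inspect the MoveOut settings and ROS container counts on a DR.
# Assumptions: vsql is at /opt/vertica/bin/vsql (override with VSQL), and
# DRADMIN_USER (a hypothetical variable) holds the dradmin user name.

# Print current and default values of the three parameters the DA installer sets.
show_moveout_settings() {
    "${VSQL:-/opt/vertica/bin/vsql}" -U "${DRADMIN_USER:?set DRADMIN_USER first}" -c \
        "SELECT parameter_name, current_value, default_value
           FROM v_monitor.configuration_parameters
          WHERE LOWER(parameter_name) IN
                ('reflexivemoveout', 'moveoutsizepct', 'moveoutmaxagetime')
          ORDER BY parameter_name;"
}

# Print the 20 projections with the most ROS containers (20 is an arbitrary cut-off).
show_top_ros_counts() {
    "${VSQL:-/opt/vertica/bin/vsql}" -U "${DRADMIN_USER:?set DRADMIN_USER first}" -c \
        "SELECT node_name, projection_schema, projection_name, ros_count
           FROM v_monitor.projection_storage
          ORDER BY ros_count DESC
          LIMIT 20;"
}
```

    After setting DRADMIN_USER and sourcing the functions, a current_value of 0 for MoveOutMaxAgeTime suggests the workaround has not been applied yet on that database.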

    Workaround:

    1. After a DA install, open a shell on one of the Data Repository (DR) nodes.
    2. At the command prompt, execute: /opt/vertica/bin/vsql -U {dradmin_username}
    3. Enter the password when prompted.
    4. At the vsql prompt, execute: select set_config_parameter('MoveOutMaxAgeTime', 240);  (the value is in seconds)
    5. Exit the vsql client and the shell.

    That's it. This change can be applied while the DA is running. There is no need to restart anything, and the change takes effect immediately.
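    Where the workaround has to be repeated on several installations, steps 2 through 4 can be scripted, and the value read back to confirm that it took effect. This is a sketch under assumptions: vsql lives at the path from step 2 (overridable through VSQL), a hypothetical DRADMIN_USER variable holds the dradmin user name, and Vertica's GET_CONFIG_PARAMETER function is used for the read-back. vsql prompts for the password on each of its two calls.

```shell
#!/bin/sh
# Sketch only: apply step 4 of the workaround non-interactively and verify it.
# Assumptions: vsql is at /opt/vertica/bin/vsql (override with VSQL), and
# DRADMIN_USER (a hypothetical variable) holds the dradmin user name.
# vsql prompts for the password on each of the two calls below.

apply_moveout_workaround() {
    vsql_bin="${VSQL:-/opt/vertica/bin/vsql}"
    dr_user="${DRADMIN_USER:?set DRADMIN_USER first}"

    # Same statement as step 4: raise MoveOutMaxAgeTime from 0 to 240 seconds.
    "$vsql_bin" -U "$dr_user" -c \
        "SELECT set_config_parameter('MoveOutMaxAgeTime', 240);" || return 1

    # Read the value back; -t prints rows only, -A disables alignment padding.
    value=$("$vsql_bin" -U "$dr_user" -t -A -c \
        "SELECT get_config_parameter('MoveOutMaxAgeTime');") || return 1

    if [ "$value" = "240" ]; then
        echo "MoveOutMaxAgeTime is now 240"
    else
        echo "Unexpected MoveOutMaxAgeTime value: '$value'" >&2
        return 1
    fi
}
```

    Because the function returns non-zero when the read-back does not match, it can be chained in a loop over several DR hosts and stop at the first one that needs attention.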

     

    If you have questions or concerns about running the commands above, please do not proceed. Instead, open a case with Support, include a link to this community post, and we will be glad to assist you.

     

    Thanks,

    Joe



  • 2. Re: Please read! Too Many ROS file errors

    Broadcom Employee
    Posted Jun 05, 2018 01:19 PM

    Joe,

     

    Was there a specific error that showed the problem was present? Was it the standard 'too many ROS containers' error in the DA's karaf.log?
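    To look for that error in a DA's karaf.log, a case-insensitive search for "too many ros" matches both the wording in the original post ('Too many ROS file') and the 'too many ROS containers' error mentioned here. This is a sketch; the karaf.log location varies by install, so the path is passed as an argument and no default is assumed.

```shell
#!/bin/sh
# Sketch only: count and show "too many ROS" errors in a DA karaf.log.
# The log path is passed as the first argument; no default location is assumed.

find_ros_errors() {
    log="${1:?usage: find_ros_errors /path/to/karaf.log}"
    # -i makes the match case-insensitive; grep -c exits 1 on zero matches, so ignore that.
    matches=$(grep -i -c 'too many ros' "$log") || true
    echo "${matches:-0} matching line(s) in $log"
    # Show the five most recent occurrences, if any.
    grep -i 'too many ros' "$log" | tail -n 5
}
```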

     

    Thanks,

    Mike