DX NetOps

Does anyone have a solution for monitoring UNIX/Linux systems and detecting when a file system or disk goes into Read-Only mode?

  • 1.  Does anyone have a solution for monitoring UNIX/Linux systems and detecting when a file system or disk goes into Read-Only mode?

    Posted Aug 30, 2018 02:34 PM

    We are running into a situation, on UNIX and Linux systems, where a file system and/or a disk goes into Read-Only mode.

    At this point, most system and application functionality stops because the system can no longer write to files. Monitoring for this situation is difficult. Once the system gets into R/O mode, we usually cannot log in to check anything, such as log output, that would reveal the R/O condition.
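
    For illustration, here is a minimal sketch of one possible check on Linux hosts, assuming the kernel's mount table (`/proc/mounts`) is readable and lists `ro` among the options of an affected file system. The function name `read_only_mounts` is hypothetical, and other UNIX variants expose mount options differently, so this would need adapting there.

```python
from typing import List, Tuple


def read_only_mounts(mounts_text: str) -> List[Tuple[str, str]]:
    """Return (device, mount_point) pairs whose mount options include 'ro'.

    `mounts_text` is expected in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Options are comma-separated; match 'ro' as a whole option only.
        if "ro" in options.split(","):
            found.append((device, mount_point))
    return found
```

    On a host, the text would come from `open("/proc/mounts").read()`. Some mounts are read-only by design, so a real check would probably compare the result against an expected list rather than alert on every entry.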

     

    Does anyone have any ideas for a solution?

     

    Thanks.

     

    David



  • 2.  Re: Does anyone have a solution for monitoring UNIX/Linux systems and detecting when a file system or disk goes into Read-Only mode?

    Broadcom Employee
    Posted Aug 31, 2018 04:57 PM

    Hi David,

     

    Are there any clues in data from SNMP Polling that might provide an indication the problem is occurring?

     

    When the problem is active, is it normally an application failure, caused by the inability to write to the disk, that reveals the problem's existence?

     

    Thanks,

    Mike



  • 3.  Re: Does anyone have a solution for monitoring UNIX/Linux systems and detecting when a file system or disk goes into Read-Only mode?
    Best Answer

    Posted Aug 31, 2018 05:11 PM

    Hey Mike,

     

    As far as I can tell from the data in the CAPC Dashboard, it appears that we keep collecting data. So I don't know whether anything in CAPC will really help indicate this problem, but I thought I would ask.

     

    FYI, I sent this same question to the UIM Community.

     

    To your second question: yes. One indication is that the application people call because their app is not functioning, since it can no longer write to the disk(s). Another is that when you try to ssh to the system, it will not let you in. Again, I believe this is because the system cannot write to its various log files.
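
    Since the symptom is a failed write, a write probe is one way to test for it directly. The sketch below is only an illustration, not something CAPC or UIM provides: the function name `is_writable` and the `.rwprobe-` file prefix are made up, and it assumes a script that is already running on the host and reporting to the monitoring system, because ssh may be unavailable once the problem is active.

```python
import errno
import os
import tempfile


def is_writable(directory: str) -> bool:
    """Create, sync, and delete a small temp file in `directory`.

    Returns False when the kernel reports a read-only file system (EROFS).
    Any other error (missing directory, permissions, ...) is re-raised so it
    is not mistaken for a read-only condition.
    """
    try:
        fd, path = tempfile.mkstemp(prefix=".rwprobe-", dir=directory)
    except OSError as exc:
        if exc.errno == errno.EROFS:
            return False
        raise
    try:
        os.write(fd, b"probe")
        os.fsync(fd)  # push the write to the device, not just the page cache
    finally:
        os.close(fd)
        os.unlink(path)
    return True
```

    The probe would be called once per file system of interest, for example `is_writable("/var/log")`, with a False result raised as an alarm through whatever channel the existing agent already uses.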

     

     

    Thanks.