Just a knowledge share and a lesson learned.
When we put in the Linux Connector, I added some automation to alert us if we get a flood of z/VM messages from servers.
We wanted to know if a server was going nuts. The automation monitors LXCONCOUNT.
We recently had an issue where we were not getting any messages back from z/VM in the OPSLOG. We checked, and the Linux Connector was running okay and its log showed all the traffic and messages. Some OPS/MVS automation failed because OPS/MVS did not get the messages from the Linux Connector. We had no idea this was occurring until an expected alert was not issued and no Incident was generated.
So, to help us 'be in the know', I put some automation in place that checks LXCONCOUNT periodically (every 10 minutes).
If the count does 'not' increment, we get an alert so we can investigate. The code also issues F OPSS,RESTART(LXC) to try to get the Linux Connector and OPS/MVS talking again. Of course, we alert on the restart as well.
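For anyone who wants to see the logic of the periodic check spelled out, here is a minimal sketch in Python. It is illustration only: the real automation would be an OPS/MVS rule, and the class name, the alert wording, and the choice to treat a lower count as a counter reset (rather than a stall) are all my assumptions. Only LXCONCOUNT and the restart command come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Restart command as quoted in the post.
RESTART_COMMAND = "F OPSS,RESTART(LXC)"


@dataclass
class StallWatchdog:
    """Hypothetical helper: remembers the last LXCONCOUNT value seen."""
    last_count: Optional[int] = None

    def check(self, current_count: int) -> List[str]:
        """Call once per interval (every 10 minutes in the post).

        Returns the list of actions to take; empty means all is well.
        """
        actions: List[str] = []
        # Assumption: only an unchanged count is a stall. A lower count
        # is treated as a counter reset (for example after a restart).
        if self.last_count is not None and current_count == self.last_count:
            actions.append("ALERT: LXCONCOUNT did not increment since last check")
            actions.append("ISSUE: " + RESTART_COMMAND)
            actions.append("ALERT: Linux Connector restart was issued")
        self.last_count = current_count
        return actions


if __name__ == "__main__":
    watchdog = StallWatchdog()
    # Simulated readings taken at successive intervals.
    for reading in (100, 150, 150):
        print(reading, watchdog.check(reading))
```

The first call only records a baseline, so no alert is possible until the second interval. On a very quiet system an unchanged count might be normal, so the interval would need to be long enough that some traffic is always expected.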
I also put in a message rule for 'OPS9500S'. If this message occurs, the rule issues 'F OPSS,RESTART(LXC)'.
Example: OPS9500S ABEND X'00C78000' in LXCON cell pool delete OPINLX+X'000042C4
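The message rule boils down to "if the message ID is OPS9500S, issue the restart". A rough Python sketch of that decision follows. It is illustration only and not OPS/MVS rule syntax, and the function name is hypothetical. It assumes the message ID is the first blank-delimited token of the message text.

```python
from typing import Optional

# Both values are taken from the post.
TRIGGER_MSGID = "OPS9500S"
RESTART_COMMAND = "F OPSS,RESTART(LXC)"


def command_for_message(text: str) -> Optional[str]:
    """Return the restart command if the message ID matches, else None.

    Assumption: the message ID is the first blank-delimited token.
    """
    tokens = text.split()
    if tokens and tokens[0] == TRIGGER_MSGID:
        return RESTART_COMMAND
    return None


if __name__ == "__main__":
    sample = "OPS9500S ABEND X'00C78000' in LXCON cell pool delete OPINLX+X'000042C4"
    print(command_for_message(sample))
```

Matching on the message ID rather than the full text means the rule still fires if the ABEND code or offset in the message changes.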
In the dump we got the 'Length Error' (see below). We are putting in the newer PTF 'RO93937' for this.
OPS9990I CA OPS/MVS ABEND 0C3000 detected at OPINLX+00002784
OPS9990I PSW at ABEND 078C20009DDF2F0C - data at 1DDF2F06 ===> 0004443
OPS9990I Last PRB PSW 078C20009DDF2F0C - data at 1DDF2F06 ===> 0004443
OPS9990I ABEND - caused by length error
OPS9990I Home:0020 Prime:0020 Sec:0020 ABEND Interrupt Code:0003
OPS9990I R0=00000010 PSA +0010 00FDEB70 00000000 7FFFF000 7FFFF00
OPS9990I R1=276FBA9A PRIMESTK7B99A 00520002 00100001 D4D560E9 D3E7D7C
OPS9990I R2=00000020 PSA +0020 7FFFF000 7FFFF000 7FFFF000 7FFFF00