Yes, it looks like the storage port is not able to handle the frames coming from the servers.
What are the port speeds of the servers and the storage?
This might be a queue depth issue, a problem on the storage itself, or a wrong optimization setting on it.
Troubleshooting class 3 discards and device latency is a large topic that can involve many aspects of a SAN. A slow-draining device, a poorly performing HBA, or a fan-in ratio problem can all contribute, and ISL oversubscription can also affect latency. Please note that port 18 in the report is the port that is BEING affected, and it is probably not the port causing the problem.
First things to do: run statsclear and slotstatsclear on each switch in the fabric. After 24-48 hours, run porterrshow on each switch in the fabric, review the output for errors, and report what you find. Also post the output of firmwareshow and fabricshow so that we know what kind of equipment we are looking at.
I would advise you to run the SAN Health report and gather some information on your connections and throughput. Without a complete picture of the fabric and all F_Port and E_Port connections, it will be impossible to diagnose. There are also records in the log file that may help determine which ports are causing latency within the fabric.
You will get a report emailed to you showing the switch connections and many attributes of the fabric. Once that report has been checked, we can proceed with some options. As a first guess, try to find legacy devices that may be running at 4 Gb and are traversing the fabric via ISLs (E_Ports). This is a common cause of class 3 discards, but it is only one possibility. Many other things affect throughput and congestion.
The errors are still popping up, as follows:
F-Port 18, Condition=ALL_PORTS(DEV_LATENCY_IMPACT==IO_FRAME_LOSS), Current Value:[ DEV_LATENCY_IMPACT,IO_FRAME_LOSS, (1408 C3TX Timeouts) ], RuleName=defALL_PORTS_IO_FRAME_LOSS, Dashboard Category=Fabric Performance Impact.
The port error output shows disc c3, and the c3timeout tx value is 154.5k.
admin> portstatsshow 18
stat_wtx 10212000744426 4-byte words transmitted
stat_wrx 63486130803777 4-byte words received
stat_ftx 1681883768 Frames transmitted
stat_frx 1658633119 Frames received
stat_c2_frx 0 Class 2 frames received
stat_c3_frx 1658728827 Class 3 frames received
stat_lc_rx 0 Link control frames received
stat_mc_rx 0 Multicast frames received
stat_mc_to 0 Multicast timeouts
stat_mc_tx 0 Multicast frames transmitted
tim_rdy_pri 0 Time R_RDY high priority
tim_txcrd_z 146519310 Time TX Credit Zero (2.5Us ticks)
tim_txcrd_z_vc 0- 3: 0 0 0 0
tim_txcrd_z_vc 4- 7: 146519310 0 0 0
tim_txcrd_z_vc 8-11: 0 0 0 0
tim_txcrd_z_vc 12-15: 0 0 0 0
tim_latency_vc 0- 3: 1 1 1 1
tim_latency_vc 4- 7: 1 1 1 1
tim_latency_vc 8-11: 1 1 1 1
tim_latency_vc 12-15: 1 1 1 1
fec_cor_detected 0 Count of blocks that were corrected by FEC
fec_uncor_detected 0 Count of blocks that were left uncorrected by FEC
er_enc_in 0 Encoding errors inside of frames
er_crc 0 Frames with CRC errors
er_trunc 0 Frames shorter than minimum
er_toolong 0 Frames longer than maximum
er_bad_eof 0 Frames with bad end-of-frame
er_enc_out 0 Encoding error outside of frames
er_bad_os 0 Invalid ordered set
er_pcs_blk 0 PCS block errors
er_rx_c3_timeout 0 Class 3 receive frames discarded due to timeout
er_tx_c3_timeout 154586 Class 3 transmit frames discarded due to timeout
er_unroutable 0 Frames that are unroutable
er_unreachable 0 Frames with unreachable destination
er_other_discard 0 Other discards
er_type1_miss 0 frames with FTB type 1 miss
er_type2_miss 0 frames with FTB type 2 miss
er_type6_miss 0 frames with FTB type 6 miss
er_zone_miss 0 frames with hard zoning miss
er_lun_zone_miss 0 frames with LUN zoning miss
er_crc_good_eof 0 Crc error with good eof
er_inv_arb 0 Invalid ARB
er_single_credit_loss 0 Single vcrdy/frame loss on link
er_multi_credit_loss 0 Multiple vcrdy/frame loss on link
phy_stats_clear_ts 07-06-2018 IST Fri 16:39:55 Timestamp of phy_port stats clear
lgc_stats_clear_ts 07-06-2018 IST Fri 16:39:55 Timestamp of lgc_port stats clear
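With this many counters it is easy to miss the one that is non-zero, so a small script can do the filtering. The following is only a rough sketch: it assumes the portstatsshow output has been saved with one counter per line in the form "name value description", and the function names parse_portstats and nonzero_errors are made up for illustration (they are not part of Fabric OS).

```python
def parse_portstats(text):
    """Return {counter_name: int_value} for lines shaped 'name value description'."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Lines whose second field is not a plain integer are skipped:
        # the CLI prompt, per-VC rows such as "tim_txcrd_z_vc 0- 3: ...",
        # and the timestamp rows.
        if len(parts) >= 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def nonzero_errors(stats):
    """Keep only the er_* counters that are greater than zero."""
    return {name: value for name, value in stats.items()
            if name.startswith("er_") and value > 0}


# A few lines copied from the port 18 output in this thread.
SAMPLE = """\
admin> portstatsshow 18
stat_ftx 1681883768 Frames transmitted
tim_txcrd_z 146519310 Time TX Credit Zero (2.5Us ticks)
tim_txcrd_z_vc 4- 7: 146519310 0 0 0
er_enc_out 0 Encoding error outside of frames
er_crc 0 Frames with CRC errors
er_tx_c3_timeout 154586 Class 3 transmit frames discarded due to timeout
"""

if __name__ == "__main__":
    for name, value in nonzero_errors(parse_portstats(SAMPLE)).items():
        print(name, value)
```

On the port 18 data the only er_* counter left after filtering is er_tx_c3_timeout; the encoding and CRC counters are all zero.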
Could it be an FC cable fault?
Hi, if the cable were faulty you would see enc_out errors. If you have 750 MB/s on the storage port, it looks like the FE port is overloaded (I have never seen a value higher than 750 MB/s on any 8 Gb FC port). Try remapping some of the heaviest servers to another pair of storage ports to reduce the pressure on this storage port. I had a similar issue with a cluster where a huge database used two pairs of FE ports (both at 750 MB/s), so the timeouts occurred.
We solved it with another pair of HBAs and different, less utilized FE ports.
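To put the 750 MB/s figure in context, here is a quick back-of-the-envelope check. It is a sketch only: it assumes the commonly quoted nominal throughput of roughly 100 MB/s per 1 Gb of Fibre Channel speed in each direction (so 800 MB/s for an 8 Gb port), and the 750 MB/s value is the hypothetical reading mentioned above, not something measured in this thread. The names NOMINAL_MB_S and utilization are illustrative.

```python
# Nominal per-direction throughput in MB/s for common FC speeds (assumed values).
NOMINAL_MB_S = {4: 400, 8: 800, 16: 1600}


def utilization(observed_mb_s, link_speed_gb):
    """Fraction of nominal link throughput that the observed rate represents."""
    return observed_mb_s / NOMINAL_MB_S[link_speed_gb]


if __name__ == "__main__":
    # Hypothetical reading of 750 MB/s on an 8 Gb storage FE port.
    print(f"{utilization(750, 8):.1%} of nominal 8 Gb throughput")
```

Under those assumptions a port sustaining 750 MB/s is running well above 90% of what the link can carry, which leaves almost no headroom for bursts.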
There's also this:
tim_txcrd_z 146519310 Time TX Credit Zero (2.5Us ticks)
You might have a slow drain device in play.
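To get a feel for how large that counter is, the ticks can be converted to seconds using the 2.5 microsecond tick size printed in the output label. This is a minimal sketch with illustrative function names; the figures are the ones posted for port 18. The time the output was collected is not given in the thread, so the share of elapsed time spent at zero credit cannot be worked out from these numbers alone.

```python
TICK_SECONDS = 2.5e-6  # tick size taken from the "(2.5Us ticks)" label


def credit_zero_seconds(ticks):
    """Total time the port sat at zero TX credit since the last stats clear."""
    return ticks * TICK_SECONDS


def discard_ratio(tx_discards, frames_tx):
    """Class 3 TX timeout discards as a fraction of frames transmitted."""
    return tx_discards / frames_tx


if __name__ == "__main__":
    tim_txcrd_z = 146519310      # from portstatsshow 18
    er_tx_c3_timeout = 154586    # from portstatsshow 18
    stat_ftx = 1681883768        # from portstatsshow 18

    print(f"time at zero TX credit: {credit_zero_seconds(tim_txcrd_z):.1f} s")
    print(f"discard ratio: {discard_ratio(er_tx_c3_timeout, stat_ftx):.6%}")
```

That works out to a bit over six minutes in total with no transmit credit since the counters were cleared, all of it on a single virtual channel according to the tim_txcrd_z_vc rows, which fits the idea that the attached device is slow to return credits.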