Fibre Channel (SAN)

Anonymous
Posts: 0

Re: ISL port problem and fabric connection problem

That could be the answer, but then it should happen on every type of host and OS. Instead, it happens only on Solaris.

Just yesterday another port had a problem: it went into marginal state, which in turn put the switch into marginal state.

I tried to persistently disable that port and then re-enabled it. For the first 10 seconds the port came online and healthy, but then it returned to marginal state, and so did the switch. At that exact moment a storm of SCSI timeouts appeared on the Solaris hosts, but NOT on the HP-UX, Windows, or Red Hat Linux hosts. The storm recurs every time the switch tries to check that port or bring it back to a healthy state.

Only Solaris is affected, as if it were "more sensitive" than the other OSes when a single port/SFP fails.

The Solaris admins probably have to set non-default values in /etc/system for these parameters:

max throttle

ssd io time

ssd ua retry count

other SCSI/FP parameters I don't remember right now.

I also have to replace those SFPs in my switch, of course.

Does anyone have another idea or opinion?

Anonymous
Posts: 0

Re: ISL port problem and fabric connection problem

I don't know Solaris and can't tell you whether it's more sensitive or not, but it would be interesting to compare the different OSes in your environment.

A general rule of thumb I tend to follow: SCSI timeout value > 60 seconds.
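As a rough illustration of that rule of thumb, here is a minimal Python sketch for the Linux hosts only. It assumes the per-device timeout is exposed at /sys/block/<device>/device/timeout, as it is for SCSI disks on Linux; the function name and the choice to treat exactly 60 seconds as acceptable are my own assumptions, not something from this thread. Solaris and HP-UX keep their timeouts elsewhere and are not covered.

```python
from pathlib import Path


def find_short_scsi_timeouts(sysfs_block="/sys/block", minimum=60):
    """Return {device: timeout_seconds} for devices whose timeout is below `minimum`.

    `sysfs_block` is a parameter so the function can be pointed at a test tree.
    The 60-second default comes from the rule of thumb in the post above.
    """
    root = Path(sysfs_block)
    if not root.is_dir():
        return {}

    short = {}
    for dev in sorted(root.iterdir()):
        timeout_file = dev / "device" / "timeout"
        try:
            value = int(timeout_file.read_text().strip())
        except (OSError, ValueError):
            # Not a SCSI-backed device (e.g. loop devices), or not readable.
            continue
        if value < minimum:
            short[dev.name] = value
    return short


if __name__ == "__main__":
    for name, seconds in find_short_scsi_timeouts().items():
        print(f"{name}: timeout {seconds}s is below the 60s rule of thumb")
```

Run it on each Linux host to get a quick list of disks to review; the right value still depends on the array vendor's recommendation.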

Anonymous
Posts: 0

Re: ISL port problem and fabric connection problem

Setting the SCSI timeouts properly on the hosts is the first thing to do indeed.

Yet if you did not have this problem before, then something must have changed (e.g. the load, a component failure, or the like).

The port you're referring to appears to be part of a trunk; could you provide the 'trunkShow' output (check the deskew value)? I would also monitor fairly short 'portStatsShow ; portErrShow' samples for an abnormal amount of tx_crd_z and disc_c3 (frame discards).
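To make "short samples" concrete, here is a minimal Python sketch that compares two counter snapshots taken a known number of seconds apart and reports the per-second growth. It does not parse any switch output: the snapshots are plain dictionaries you fill in by hand from the two samples, the counter names are simply the ones mentioned above, and the threshold is a placeholder assumption, since what counts as abnormal depends on your fabric.

```python
def counter_rates(before, after, interval_s):
    """Per-second increase of each counter present in both samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        name: (after[name] - before[name]) / interval_s
        for name in before
        if name in after
    }


def flag_counters(rates, watched=("tx_crd_z", "disc_c3"), threshold=0.0):
    """Return the watched counters growing faster than `threshold` per second.

    `threshold` is a hypothetical knob; pick a value that fits your baseline.
    """
    return sorted(name for name in watched if rates.get(name, 0.0) > threshold)


if __name__ == "__main__":
    # Hand-entered example values, not real switch data.
    sample_1 = {"tx_crd_z": 100, "disc_c3": 5}
    sample_2 = {"tx_crd_z": 700, "disc_c3": 5}
    rates = counter_rates(sample_1, sample_2, interval_s=60)
    print(rates)
    print(flag_counters(rates, threshold=1.0))
```

A negative rate means the counters were cleared or wrapped between the two samples, so that pair should be discarded and resampled.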

Anonymous
Posts: 0

Re: ISL port problem and fabric connection problem

The problem was:

An old HBA FCode on the Solaris hosts.

The solution is to upgrade the HBA FCode (my HBAs are QLogic, reformatted by Sun with the Leadville stack) to 2.01 or higher.

The SCSI timeouts have vanished.

Anonymous
Posts: 0

Re: ISL port problem and fabric connection problem

Hi,

To me this looks like credit starvation somewhere in the fabric. The Solaris message is a timeout message, so the question is what is timing out: something in the path through the fabric is.

I advise taking the standard troubleshooting approach. Clear the stats on ALL switches in the fabric with statsclear and slotstatsclear, and collect supportsaves from ALL switches in the fabric after 4 to 5 hours. Then check for credit starvation, discards, SFP problems, etc., and go from there, because otherwise this might go on for months.

Cheers,

Ed

Anonymous
Posts: 0

Re: ISL port problem and fabric connection problem

Hi Ed,

Thanks for your answer, but I solved it this way:

An old HBA FCode on the Solaris hosts.

The solution is to upgrade the HBA FCode (my HBAs are QLogic, reformatted by Sun with the Leadville stack) to 2.01 or higher.

The SCSI timeouts have vanished.

Anonymous
Posts: 0

Re: ISL port problem and fabric connection problem

As far as I know, the Linux default is 60 seconds and the HP-UX default is typically 90 seconds (it may vary depending on the array vendor).

Anonymous
Posts: 0

Re: ISL port problem and fabric connection problem

So the problem is solved, then?

Anyway, I suggest you check those HBAs and their versions against the Brocade compatibility guide, because this kind of message can also indicate other problems, such as poor disk performance (an overloaded storage array).

Does the server access the storage through the ISLs?
