08-22-2013 11:44 PM
Hello All,
I have two fabrics (A and B), each consisting of one DCX as the core switch and two 12K switches as edge switches. All of these switches are registered in DCFM. The DCX is running FOS v6.3.2d.
Yesterday I ran the command sysmonitor --show cpu on the DCX in Fabric A to check its current CPU utilization (this was the first time I had run this command). The result was far beyond my expectations: CPU usage was 100%. This is the log I captured:
admin> sysmonitor --show cpu
Showing Cpu Usage:
Cpu Usage : 100%
Cpu Usage limit : 75%
Number of Retries : 3
Polling Interval : 120 seconds
Actions : none
On the DCX in the other fabric, CPU utilization is just 6%.
There are no error messages in errdump, so I am very worried about the status of my DCX. Is this just a defect in FOS, or is this the real CPU usage?
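The sysmonitor output lists a 75% limit, 3 retries and a 120-second polling interval. The exact Brocade semantics of these fields are not stated in this thread, so the sketch below is only a hypothetical model, assuming an alert condition is met once the limit is exceeded on `retries` consecutive polls. It is not the FOS implementation. One possible, unconfirmed, reason for the empty errdump is that Actions is set to none, so no alert action would be taken even if the condition were met.

```python
from dataclasses import dataclass


@dataclass
class CpuThreshold:
    """Hypothetical model of the sysmonitor CPU settings shown above."""
    limit_percent: float  # "Cpu Usage limit : 75%"
    retries: int          # "Number of Retries : 3"
    # "Polling Interval : 120 seconds" only sets how often a sample is taken,
    # so it is not modeled here.

    def should_alert(self, samples):
        """ASSUMPTION: alert once `retries` consecutive samples exceed the limit."""
        consecutive = 0
        for usage in samples:
            consecutive = consecutive + 1 if usage > self.limit_percent else 0
            if consecutive >= self.retries:
                return True
        return False


threshold = CpuThreshold(limit_percent=75.0, retries=3)
print(threshold.should_alert([100.0, 100.0, 100.0]))  # three polls in a row above 75%
print(threshold.should_alert([100.0, 6.0, 100.0]))    # streak broken by a low sample
```

Under this assumption, a CP pinned at 100% would meet the condition after roughly three polls (about six minutes), while a single spike would not.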
DCX spec:
20 disk ports, 9 host ports, 67 available ports (96 ports total)
Note: this fabric is located at the DRC site, so the I/O load is not very high.
I also captured the top command output as root:
root> top
top - 15:18:17 up 112 days, 23:38, 1 user, load average: 2.82, 2.51, 2.30
Tasks: 110 total, 3 running, 107 sleeping, 0 stopped, 0 zombie
Cpu(s): 42.0%us, 57.3%sy, 0.3%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 1863188k total, 1140520k used, 722668k free, 33344k buffers
Swap: 0k total, 0k used, 0k free, 786368k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3113 root 25 0 28108 4016 3376 R 94.8 0.2 11225:22 tracestore
4364 root 16 0 139m 84m 80m S 2.3 4.7 2891:40 iswitchd
2859 root 16 0 72688 4492 3300 S 1.7 0.2 2595:11 emd
4358 root 33 18 64732 7532 3572 S 1.0 0.4 1165:59 fwd
1 root 16 0 1696 592 524 S 0.0 0.0 0:31.96 init
2 root 34 19 0 0 0 S 0.0 0.0 0:03.51 ksoftirqd/0
3 root 10 -5 0 0 0 S 0.0 0.0 0:00.03 events/0
4 root 19 -5 0 0 0 S 0.0 0.0 0:00.02 khelper
5 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 kthread
29 root 10 -5 0 0 0 S 0.0 0.0 0:00.17 kblockd/0
62 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pdflush
63 root 15 0 0 0 0 S 0.0 0.0 0:00.51 pdflush
65 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0
64 root 25 0 0 0 0 S 0.0 0.0 0:00.00 kswapd0
756 root 15 0 0 0 0 S 0.0 0.0 0:07.14 kjournald
774 root RT 0 1676 400 336 S 0.0 0.0 0:00.02 wdtd
835 root 15 0 0 0 0 S 0.0 0.0 0:02.36 kjournald
991 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 eth2/0
1002 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 eth1/0
1023 root 10 -5 0 0 0 S 0.0 0.0 0:00.01 eth0/0
1025 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 eth3/0
1038 bin 16 0 1688 428 336 S 0.0 0.0 0:00.33 portmap
1058 root 16 0 2116 652 508 S 0.0 0.0 0:00.02 inetd
1063 root 15 0 0 0 0 S 0.0 0.0 0:00.00 nfsd
1064 root 15 0 0 0 0 S 0.0 0.0 0:00.00 nfsd
1065 root 15 0 0 0 0 S 0.0 0.0 0:00.00 nfsd
1066 root 23 0 0 0 0 S 0.0 0.0 0:00.00 lockd
1067 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 rpciod/0
1068 root 15 0 0 0 0 S 0.0 0.0 0:00.01 nfsd
1070 root 16 0 2336 556 420 S 0.0 0.0 0:00.01 rpc.mountd
1084 root 25 0 2552 1088 916 S 0.0 0.1 0:55.60 kmsghandler
1098 root 16 0 1700 376 304 S 0.0 0.0 0:11.76 klogd
1099 root 15 0 1808 620 528 S 0.0 0.0 0:04.04 crond
1106 root 16 0 1944 680 532 S 0.0 0.0 0:07.81 syslogd
1128 root 15 0 0 0 0 S 0.0 0.0 0:00.11 RASLOGK_TH
1926 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 kwt_nb_thread
2200 root 19 0 0 0 0 S 0.0 0.0 0:00.00 module-182-th
2208 root 15 0 0 0 0 S 0.0 0.0 10:53.57 module-99-th
2230 root 19 0 0 0 0 S 0.0 0.0 0:00.01 module-107-th
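The top output above can be read programmatically to confirm what sysmonitor reports: the Cpu(s) line shows 0.0% idle (42.0% user plus 57.3% system), and a single process, tracestore, accounts for 94.8%. The sketch below uses a truncated copy of that output; the helper names are illustrative, and it assumes the default top column layout shown in the post (12 columns per process row).

```python
import re

# Summary line and the first few process rows copied from the top output above.
TOP_OUTPUT = """\
Cpu(s): 42.0%us, 57.3%sy, 0.3%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3113 root 25 0 28108 4016 3376 R 94.8 0.2 11225:22 tracestore
4364 root 16 0 139m 84m 80m S 2.3 4.7 2891:40 iswitchd
2859 root 16 0 72688 4492 3300 S 1.7 0.2 2595:11 emd
4358 root 33 18 64732 7532 3572 S 1.0 0.4 1165:59 fwd
"""


def cpu_busy_percent(text):
    """Return 100 minus the idle percentage from top's Cpu(s) summary line."""
    match = re.search(r"Cpu\(s\):.*?([\d.]+)%id", text)
    if match is None:
        raise ValueError("no Cpu(s) line found")
    return 100.0 - float(match.group(1))


def top_process(text):
    """Return (command, %CPU) for the process row with the highest %CPU."""
    best = None
    for line in text.splitlines():
        fields = line.split()
        # Process rows have 12 columns and start with a numeric PID.
        if len(fields) == 12 and fields[0].isdigit():
            cpu = float(fields[8])  # %CPU column
            if best is None or cpu > best[1]:
                best = (fields[11], cpu)  # COMMAND column
    return best


busy = cpu_busy_percent(TOP_OUTPUT)
name, cpu = top_process(TOP_OUTPUT)
print(f"CPU busy: {busy:.1f}%  top process: {name} ({cpu}%)")
```

Because one process dominates while the other daemons are near idle, the 100% figure appears to reflect real CPU consumption by tracestore rather than a reporting glitch in sysmonitor.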
08-23-2013 01:50 AM
Hi there,
That looks like DEFECT000363516 to me. It seems to be fixed in FOS 7.0.1.
Rgds
08-23-2013 02:16 AM
I see. But is there any other option besides upgrading to FOS v7.0.1? The DCX is connected to 12K SAN switches, which cannot be directly connected to a DCX running FOS v7.0.
08-23-2013 04:49 AM
You could try failing over to the standby CP.
Rgds
08-23-2013 07:24 AM
azakiyy,
->but is there any different action beside upgrading to FOS v7.0.1 because the DCX is connected to 12K SAN Switch which prohibited to direct connection with DCX FOS v7.0
This is correct, but if you want to continue working with the 12K, you can implement FCR on the DCX through Integrated Routing (IR) and upgrade to the latest FOS 7.1.x release.
Keep in mind that IR is an optional license.
08-23-2013 08:59 AM
I need to get permission from my customer first; I'll update you later.
08-23-2013 09:05 AM
I'm afraid I can't do the FOS upgrade on the DCX because I have to keep this topology (core-edge).
Is it safe to kill the service consuming the most CPU (PID 3113, tracestore)?
Is it possible to use the command "kill -9 <PID>"?
Does anyone know what tracestore stands for?
08-24-2013 01:30 AM
If your active CP is at 100%, try the HA failover as suggested by felipon.
If your now-passive CP (assuming the failover was successful) is still at 100% CPU, you can also reboot that CP.
08-28-2013 09:48 PM
Thanks, all, for your replies.
I escalated the problem to support, and they replied with the same recommendations:
1. Run hafailover on the active CP, then check the CPU utilization.
2. If the DCX still shows 100% CPU, run hafailover again. (My problem was fixed at this step.)
3. Use kill -9 3113 (3113 is the PID of tracestore).
Note: hafailover is disruptive, so do it during a period of low I/O.
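The escalation order support gave can be written down as a small decision procedure. This is only a sketch of the ordering: the three callables are hypothetical stand-ins for running hafailover, reading the active CP's CPU (for example with sysmonitor --show cpu), and killing tracestore by hand, and treating the 75% sysmonitor limit as "recovered" is an assumption.

```python
def recover_cpu(read_active_cp_cpu, ha_failover, kill_tracestore, threshold=75.0):
    """Apply support's steps in order and return the actions taken.

    All three callables are hypothetical placeholders for manual CLI steps;
    `threshold` (assumed 75%, the sysmonitor limit) decides when to stop.
    """
    steps = []
    for attempt in (1, 2):
        # Steps 1 and 2: fail over, then check CPU on the new active CP.
        ha_failover()
        steps.append(f"hafailover #{attempt}")
        if read_active_cp_cpu() <= threshold:
            return steps
    # Step 3: CPU still high after two failovers, so kill tracestore.
    kill_tracestore()
    steps.append("kill -9 tracestore")
    return steps


# Simulated readings matching this thread: 100% after the first failover,
# then back to normal after the second.
readings = iter([100.0, 6.0])
print(recover_cpu(lambda: next(readings), lambda: None, lambda: None))
```

Trying failovers before killing the process reflects support's ordering: a failover moves work to the other CP without touching tracestore directly.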
08-29-2013 05:54 AM
Great that support told you almost the same thing.
However, I don't believe hafailover is disruptive to I/O traffic.