DX NetOps


Fault Tolerant SS  OneClick Landscape Status NOT READY

  • 1.  Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 08, 2018 01:35 PM

    We had an issue with our FT SS setup that we resolved, and we now see TCP 14002 listening on the FT SS. However, the Landscape view on the OC Admin page shows the secondary's status as NOT READY. I revalidated the .hostrc and .locrc setup on both hosts, and SSdb synchronization does work. When the primary fails over, the OneClick client shows neither a red border nor a yellow border. The connection status icon stays green (not yellow or red) even though the OC client cannot ping any device, so the OC client appears to be in limbo.

     

    When we shut down the primary SS (SpectroSERVER and processd down), we see OC talk to the FT SS on 14004, but no connection is initiated to 14002.

     

    The question is: what does OC require before it considers the FT SS READY for failover in the Landscape view?

     



  • 2.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 08, 2018 05:37 PM

    Check whether hostname resolution between OneClick and the secondary SpectroSERVER is working. Try adding both hosts to the local hosts file and see if that helps.



  • 3.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Broadcom Employee
    Posted Mar 08, 2018 07:50 PM

    Looks more like a host resolution issue between the servers. Try pinging the servers and see if you are getting a response. If not, adding the host name to "hosts" file will help.



  • 4.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 09, 2018 09:20 AM

    The OC server already has a local hosts file, and the FT SS is already in it. Resolution is good: Wireshark showed the OC server talking to the FT SS on 14004 (but never on 14002), so resolution must work for that communication, at least from the local hosts file.



  • 5.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Broadcom Employee
    Posted Mar 09, 2018 12:57 AM

    Hi Jon,

     

    When you run $SPECROOT/SS/MapUpdate -view on both Primary and Secondary what do you see?

    Do you have the OneClick server hostname in the .hostrc file on both the Primary and the Secondary?

     

    Regards,

    Widjaja.



  • 6.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 09, 2018 09:16 AM

    We did not have the OC server in the .hostrc files; instead, both .hostrc files contained +. I have now added the OC server on both.

    From the secondary FT SS:

    > ./MapUpdate.exe -v
    Connecting to Location Server running at: ITGC2W000212 port 0xdaff

     

     

    Landscape   Service Type Name  Hostname                       Port   Precedence  Notes
    ----------  ------------------ ------------------------------ ------ ----------  ------------------
    0x500000    Landscape Default  itgc2w000145                   0xbeef 10          1M model mask
    0x500000    Landscape          itgc2w000145                   0xbeef 10          1M model mask
    0x500000    Landscape Default  itgc2w000212                   0xbeef 20          1M model mask
    0x500000    Landscape          itgc2w000212                   0xbeef 20          1M model mask
    0x500000    Events             itgc2w000212                   0xbafe 20          1M model mask
    0x500000    Events             itgc2w000145                   0xbafe 10          1M model mask
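Listings like this are easier to compare across servers once the rows are parsed. A minimal Python sketch, assuming the column layout shown above (two or more spaces after the service type name, ports printed in hex, a free-text Notes column last); `parse_mapupdate` and `landscape_servers` are hypothetical helper names, not Spectrum tools.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MapEntry:
    landscape: str
    service: str
    host: str
    port: int        # MapUpdate prints ports in hex, e.g. 0xbeef
    precedence: int


# A data row: landscape handle, service type name (may contain single spaces),
# hostname, hex port, precedence. The trailing Notes text is ignored.
ROW = re.compile(
    r"^(0x[0-9a-fA-F]+|\d+)\s+(.+?)\s{2,}(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)"
)


def parse_mapupdate(text: str) -> list[MapEntry]:
    """Parse 'MapUpdate -v' output; header and separator lines do not match ROW."""
    entries = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            handle, service, host, port, prec = m.groups()
            entries.append(MapEntry(handle, service, host.lower(),
                                    int(port, 16), int(prec)))
    return entries


def landscape_servers(entries: list[MapEntry]) -> list[tuple[str, int]]:
    """(host, precedence) pairs registered under the plain 'Landscape' service."""
    pairs = {(e.host, e.precedence) for e in entries if e.service == "Landscape"}
    return sorted(pairs, key=lambda p: p[1])


sample = """\
Landscape   Service Type Name  Hostname                       Port   Precedence  Notes
----------  ------------------ ------------------------------ ------ ----------  ------------------
0x500000    Landscape Default  itgc2w000145                   0xbeef 10          1M model mask
0x500000    Landscape          itgc2w000145                   0xbeef 10          1M model mask
0x500000    Landscape Default  itgc2w000212                   0xbeef 20          1M model mask
0x500000    Landscape          itgc2w000212                   0xbeef 20          1M model mask
0x500000    Events             itgc2w000212                   0xbafe 20          1M model mask
0x500000    Events             itgc2w000145                   0xbafe 10          1M model mask
"""
print(landscape_servers(parse_mapupdate(sample)))
```

For this output it lists itgc2w000145 with precedence 10 and itgc2w000212 with precedence 20, i.e. both SpectroSERVERs are registered in landscape 0x500000. Lowercasing hostnames is a choice of this sketch, not a statement about how Spectrum compares names.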



  • 7.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 12, 2018 08:58 AM

    Would you mind running the MapUpdate command on the other server as well, to see whether it works in both directions? To me, the NOT READY status points to something preventing communication between the two SpectroSERVERs, not to anything related to OC.



  • 8.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 12, 2018 10:05 AM

    Primary SS

    ITGC2W000145%/d/CA_Spectrum/SS-Tools
    > ./MapUpdate.exe -v
    Connecting to Location Server running at: ITGC2W000145 port 0xdaff

     

    Landscape   Service Type Name  Hostname                       Port   Precedence  Notes
    ----------  ------------------ ------------------------------ ------ ----------  ------------------
    0x500000    Landscape Default  itgc2w000145                   0xbeef 10          1M model mask
    0x500000    Landscape          itgc2w000145                   0xbeef 10          1M model mask
    0x500000    Landscape Default  itgc2w000212                   0xbeef 20          1M model mask
    0x500000    Landscape          itgc2w000212                   0xbeef 20          1M model mask
    0x500000    Events             itgc2w000212                   0xbafe 20          1M model mask
    0x500000    Events             itgc2w000145                   0xbafe 10          1M model mask

     

    ITGC2W000145%/d/CA_Spectrum/SS-Tools

     

     

    Secondary Fault Tolerant Spectroserver

    ITGC2W000212%/d/CA_Spectrum/SS-Tools
    > ./MapUpdate.exe -v
    Connecting to Location Server running at: ITGC2W000212 port 0xdaff

     

    Landscape   Service Type Name  Hostname                       Port   Precedence  Notes
    ----------  ------------------ ------------------------------ ------ ----------  ------------------
    0x500000    Landscape Default  itgc2w000145                   0xbeef 10          1M model mask
    0x500000    Landscape          itgc2w000145                   0xbeef 10          1M model mask
    0x500000    Landscape Default  itgc2w000212                   0xbeef 20          1M model mask
    0x500000    Landscape          itgc2w000212                   0xbeef 20          1M model mask
    0x500000    Events             itgc2w000212                   0xbafe 20          1M model mask
    0x500000    Events             itgc2w000145                   0xbafe 10          1M model mask
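The two listings in this post look identical. A quick way to confirm that, ignoring row order, column padding and hostname case, is to compare the data rows as sets. A minimal Python sketch; `table_rows` and `compare_views` are hypothetical helpers, and the two sample strings are shortened copies of the output above.

```python
def table_rows(text: str) -> set[str]:
    """Data rows of a 'MapUpdate -v' listing, normalized for comparison.

    A data row is a line whose first field is a hex landscape handle.
    Runs of whitespace are collapsed and case is folded, so padding and
    upper/lower-case hostnames do not count as differences.
    """
    rows = set()
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].lower().startswith("0x"):
            rows.add(" ".join(fields).lower())
    return rows


def compare_views(primary: str, secondary: str) -> tuple[set[str], set[str]]:
    """Rows seen only from the primary, and rows seen only from the secondary."""
    p, s = table_rows(primary), table_rows(secondary)
    return p - s, s - p


primary_out = """\
Connecting to Location Server running at: ITGC2W000145 port 0xdaff
0x500000    Landscape          itgc2w000145                   0xbeef 10          1M model mask
0x500000    Landscape          itgc2w000212                   0xbeef 20          1M model mask
"""
secondary_out = """\
Connecting to Location Server running at: ITGC2W000212 port 0xdaff
0x500000    Landscape          ITGC2W000212   0xbeef 20   1M model mask
0x500000    Landscape          itgc2w000145   0xbeef 10   1M model mask
"""
only_primary, only_secondary = compare_views(primary_out, secondary_out)
print(only_primary or "none", only_secondary or "none")
```

Both result sets being empty means the primary's and the secondary's Location Servers report the same registrations.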



  • 9.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 09, 2018 09:29 AM

    It's not case sensitive, is it? DNS resolves the names in either upper or lower case, so I assume case in .hostrc should not matter either.



  • 10.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 09, 2018 09:44 AM

    Hiya,

    If the problem was occurring when a + sign was the only thing in the .hostrc file, then that itself is not the issue anyway.

    Cheers

     




  • 11.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 09, 2018 03:45 AM

    Do you have a firewall between the servers? Normally every connection goes from OC to the SS, except one that goes from the SS to OC (14001/TCP).



  • 12.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 09, 2018 09:17 AM

    No firewall, thankfully; they are on the same subnet.



  • 13.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Broadcom Employee
    Posted Mar 12, 2018 06:57 AM

    Hi JonV,

     

    The OneClick web server machine should be connected to both SpectroSERVERs. Here is the list of established connections:

     

    • Secondary SpectroSERVER is the stand-by server:

    OC Port <Random Port>  -------->  secondary SS Port 14002 (SpectroSERVER)

    OC Port <Random Port>  -------->  secondary SS Port 14003 (Archive Manager)

     

    • Primary SpectroSERVER is the active server:

    OC Port 14001  <--------  primary SS Port <Random Port>  (SpectroSERVER to push alarm to OC)

    OC Port <Random Port>  -------->  primary SS Port 14002 (SpectroSERVER)

    OC Port <Random Port>  -------->  primary SS Port 14003 (Archive Manager)

    OC Port <Random Port>  -------->  primary SS Port 14004 (Location Server)
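Restating the list above as data makes it easy to drive a connectivity check or to read a Wireshark capture against it. A small Python sketch; `Link`, `EXPECTED` and the helper names are hypothetical, and the table holds only the connections listed in this reply (port 14006, raised later in the thread, is not included).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    role: str       # which SpectroSERVER: "primary" (active) or "secondary" (stand-by)
    port: int
    outbound: bool  # True: OC opens the connection; False: the SS connects to OC
    service: str


# Transcribed from the connection list in this reply; not an exhaustive port list.
EXPECTED = [
    Link("secondary", 14002, True, "SpectroSERVER"),
    Link("secondary", 14003, True, "Archive Manager"),
    Link("primary", 14001, False, "SpectroSERVER pushing alarms to OC"),
    Link("primary", 14002, True, "SpectroSERVER"),
    Link("primary", 14003, True, "Archive Manager"),
    Link("primary", 14004, True, "Location Server"),
]


def oc_outbound_ports(role: str) -> list[int]:
    """Ports OC should be able to open on the SpectroSERVER in this role."""
    return sorted(link.port for link in EXPECTED if link.role == role and link.outbound)


def oc_listen_ports(role: str) -> list[int]:
    """Ports on the OC host that the SpectroSERVER in this role connects to."""
    return sorted(link.port for link in EXPECTED if link.role == role and not link.outbound)


for role in ("primary", "secondary"):
    print(role, "OC->SS:", oc_outbound_ports(role), "SS->OC:", oc_listen_ports(role))
```

Checked against `oc_outbound_ports("secondary")`, a capture from the OneClick host that never shows a connection to 14002 on the stand-by server would stand out immediately, which matches what was reported earlier in the thread.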

     

    From the OneClick web server machine, what happens if you run telnet?

    C:\> telnet itgc2w000212 14002

     

    Was the connection successful?

     

    Thanks,

    Silvio



  • 14.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 12, 2018 08:49 AM

    Hi Silvio,

     

    Yes, the secondary is listening on 14002 and 14003. Here's a Wireshark capture from the OC server. I used PuTTY to open a connection to 14002, and the port responds to connection requests.

     

    What I really need is a step-by-step description of how OC determines that the secondary SS is READY, and of the steps OC takes to decide to connect to the SS.

    The documentation explains how to configure failover, but not the steps the code actually performs. I think I need those steps so I can use Wireshark to see which step is failing, rather than just "failover is not working". I can connect to port 14002 from the OC server, but what happens before that as a prerequisite, or after it, so I can validate whether those steps occur? We just upgraded to 10.2.3 and have the same issue, so it must be somewhere in the setup: some step or steps are failing.

     



  • 15.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Broadcom Employee
    Posted Mar 12, 2018 09:03 AM

    Hi Jonv,

     

    Do you mind running this syntax on the OC host?

    C:\> telnet itgc2w000212 14002

     

    It is not clear how you tested the connection using PuTTY.

     

    Thanks,

    Silvio



  • 16.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 12, 2018 09:58 AM

    Telnet is not installed on my Windows server, so I use the PuTTY client. I simply select the SSH option, change the port from 22 to 14002, and enter the hostname or IP to connect to. It is the same concept as telnet hostname port# and works the same way.
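Where no telnet client is available, a plain TCP connect does the same job as `telnet host port` without the SSH handshake noise PuTTY adds. A minimal Python sketch, assuming Python is available on the OneClick host; `port_open` is a hypothetical helper built on the standard `socket.create_connection`.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes, like 'telnet host port'.

    The connection is closed immediately; nothing is sent or read.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or hostname not resolvable
        return False
```

For example, `port_open("itgc2w000212", 14002)` run on the OneClick server, with the hostname supplied rather than the IP address, tests both name resolution and the listener in one step.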



  • 17.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 12, 2018 10:02 AM

    Agreed. If Putty works, that basically means the TCP connection is allowed.

    Can you also check DNS resolution? What do you get from nslookup on the name, and on the IP address? (I know /etc/hosts is filled in, but if a DNS lookup is attempted, it can sometimes play a role.)



  • 18.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 13, 2018 10:04 AM

    All nslookup queries, by name and by IP, on the primary SS, secondary SS, and OC server resolve correctly. We also have an etc/hosts file on each machine, and I don't see any typos in them.

     

    It would be nice to know the exact steps the OC server performs to determine whether the secondary SS is "READY" or "NOT READY". For example: every 120 seconds the OC server does X, then Y, so I can figure out what debugging to turn on, or how to read Wireshark to confirm each step is being done.

     

    I see the OC server talking to TCP 14004 but never to TCP 14002. If it needs to talk to TCP 14002 as part of deciding the server is ready, what steps does it perform before that, which we can check for failures?



  • 19.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Broadcom Employee
    Posted Mar 12, 2018 10:03 AM

    But did you supply the hostname, or the IP address?



  • 20.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 12, 2018 05:35 PM

    I used the hostname for the PuTTY test.

     

    My manager found this, and it describes our issue. How do we check #3 in this tech article on the primary?

    Spectrum OneClick Administration -> Landscapes pag - CA Knowledge 

     

    This is from the secondary, but the primary shows the same. Is this significant?

    > ./HostUpdate.exe -v

     

    Landscape   Service Type Name  Hostname                       Port   Precedence  Notes
    ----------  ------------------ ------------------------------ ------ ----------  ------------------
    0           Process Daemon     itgc2w000145                   0xfeeb 10          1M model mask
    0           Process Daemon SS  itgc2w000145                   0xfeeb 10          1M model mask
    0           Process Daemon     itgc2w000212                   0xfeeb 10          1M model mask
    0           Process Daemon SS  itgc2w000212                   0xfeeb 10          1M model mask
    ITGC2W000212%/d/CA_Spectrum/SS-Tools



  • 21.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Broadcom Employee
    Posted Mar 12, 2018 05:53 PM

    Hi Jon,

     

    #3 is about the output of "MapUpdate -v" on the MLS. I believe you got that right: the Secondary SpectroSERVER is listed there. Another implication is that, on your OneClick server, the MLS hostname should be set in the locServerName parameter of the Resource tag at the bottom of $SPECROOT/tomcat/webapps/spectrum/META-INF/context.xml. On the same Resource tag, I would also check that the adminUserName parameter is set correctly to the 'spectrum' install owner user.

    Oh yes, the 'spectrum' install owner user should be common to both Primary and Secondary SpectroSERVERs.
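The context.xml check described above can be scripted. A minimal Python sketch, assuming locServerName, backupLocServerName and adminUserName are attributes of a `<Resource>` element as the reply describes; the sample XML omits every other attribute, and both the SpectroSERVER name set and the owner name `spectrum` are hypothetical placeholders to replace with your own values.

```python
import xml.etree.ElementTree as ET

# Attribute names quoted in this thread; other Resource attributes are ignored.
KEYS = ("locServerName", "backupLocServerName", "adminUserName")


def resource_settings(xml_text: str) -> dict[str, str]:
    """Values of KEYS from the first <Resource> element that carries any of them."""
    root = ET.fromstring(xml_text)
    for res in root.iter("Resource"):
        found = {k: res.get(k) for k in KEYS if res.get(k) is not None}
        if found:
            return found
    return {}


def check_settings(found: dict[str, str], ss_hosts: set[str], owner: str) -> list[str]:
    """Flag an MLS/backup name that is missing or not a known SpectroSERVER,
    and an adminUserName that differs from the expected install owner."""
    issues = []
    for key in ("locServerName", "backupLocServerName"):
        name = found.get(key, "")
        if name.lower() not in ss_hosts:
            issues.append(f"{key}={name!r} is not one of the SpectroSERVER names")
    if found.get("adminUserName") != owner:
        issues.append(f"adminUserName={found.get('adminUserName')!r}, expected {owner!r}")
    return issues


sample = """<Context>
  <!-- hypothetical excerpt: the real Resource element has more attributes -->
  <Resource locServerName="itgc2acaspectrum2" backupLocServerName="itgc2w000212"
            adminUserName="spectrum"/>
</Context>"""
SS_HOSTS = {"itgc2acaspectrum2", "itgc2w000145", "itgc2w000212"}  # primary and secondary names
print(check_settings(resource_settings(sample), SS_HOSTS, owner="spectrum"))
```

An empty list means both location server names point at SpectroSERVERs and the admin user matches; pointing locServerName at the OneClick host's own name would be reported.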

     

    Regards,

    Widjaja.



  • 22.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 13, 2018 09:58 AM


    Here are details of our setup plus info on context.xml.

     

    locServerName="itgc2acaspectrum2" backupLocServerName="itgc2w000212"

     

    The adminUserName is the Spectrum owner account which is correct.

     

    Other details

     

    All nslookup queries for the hostnames resolve correctly, and pinging every hostname from the command prompt resolves correctly too.

     

    On both the primary and the fault-tolerant SS, the host list contains a plus sign as well as hostnames.

     

    Primary SS Host Security Button from Spectrum Control Panel

     

    +

     

    itgc2w000144

     

    itgc2w000145

     

    itgc2w000212

     

    Secondary SS Host Security Button from Spectrum Control Panel

     

    +

     

    itgc2w000144

     

    itgc2w000145

     

    itgc2w000212

     

    Primary SS - itgc2w000145, itgc2acaspectrum2 <-- DNS hostnames; itgc2acaspectrum2 is the hostname we installed the SS with

     

    Secondary SS - itgc2w000212 <-- DNS hostname

     

    OC Server - itgc2w000144, itgc2acaspectrum1 <-- DNS hostnames; itgc2acaspectrum1 is the hostname we installed OC with

     

    Primary SS etc/hosts

     

    10.1.9.141 itgc2acaspectrum2 itgc2acaspectrum2.vfc.com itgc2w000145 itgc2w000145.vfc.com
    10.1.9.140 itgc2acaspectrum1 itgc2acaspectrum1.vfc.com itgc2w000144 itgc2w000144.vfc.com
    10.1.9.60 itgc2w000212 itgc2w000212.vfc.com

     

    Secondary etc/hosts

     

    10.1.9.141 itgc2acaspectrum2 itgc2acaspectrum2.vfc.com itgc2w000145 itgc2w000145.vfc.com
    10.1.9.140 itgc2acaspectrum1 itgc2acaspectrum1.vfc.com itgc2w000144 itgc2w000144.vfc.com
    10.1.9.60 itgc2w000212 itgc2w000212.vfc.com

     

    OC Server etc/hosts

     

    10.1.9.140    itgc2w000144 itgc2acaspectrum1    itgc2acaspectrum1.vfc.com itgc2w000144.vfc.com
    10.1.9.141    itgc2acaspectrum2    itgc2acaspectrum2.vfc.com itgc2w000145 itgc2w000145.vfc.com
    10.1.9.16    itgc2acacabi        itgc2acacabi.vfc.com
    10.1.9.60    itgc2w000212 itgc2w000212.vfc.com
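Checking three hosts files by eye is error-prone, so a small script can confirm that every name maps to the same IP on every machine and that each machine lists the servers it needs. A minimal Python sketch over the files posted above; `parse_hosts`, `conflicts` and `missing` are hypothetical helpers, and keeping the first mapping seen for a name is a simplification of this sketch.

```python
from collections import defaultdict


def parse_hosts(text: str) -> dict[str, str]:
    """Map each lowercased name/alias in a hosts file to its IP address."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        for name in fields[1:]:
            mapping.setdefault(name.lower(), fields[0])  # keep the first mapping seen
    return mapping


def conflicts(files: dict[str, str]) -> dict[str, dict[str, str]]:
    """Names that map to different IPs on different machines: name -> {machine: ip}."""
    seen: dict[str, dict[str, str]] = defaultdict(dict)
    for machine, text in files.items():
        for name, ip in parse_hosts(text).items():
            seen[name][machine] = ip
    return {n: m for n, m in seen.items() if len(set(m.values())) > 1}


def missing(files: dict[str, str], required: set[str]) -> dict[str, list[str]]:
    """Required names absent from each machine's hosts file."""
    out = {}
    for machine, text in files.items():
        absent = sorted(required - set(parse_hosts(text)))
        if absent:
            out[machine] = absent
    return out


SS_HOSTS_FILE = """\
10.1.9.141 itgc2acaspectrum2 itgc2acaspectrum2.vfc.com itgc2w000145 itgc2w000145.vfc.com
10.1.9.140 itgc2acaspectrum1 itgc2acaspectrum1.vfc.com itgc2w000144 itgc2w000144.vfc.com
10.1.9.60 itgc2w000212 itgc2w000212.vfc.com
"""
OC_HOSTS_FILE = """\
10.1.9.140    itgc2w000144 itgc2acaspectrum1    itgc2acaspectrum1.vfc.com itgc2w000144.vfc.com
10.1.9.141    itgc2acaspectrum2    itgc2acaspectrum2.vfc.com itgc2w000145 itgc2w000145.vfc.com
10.1.9.16    itgc2acacabi        itgc2acacabi.vfc.com
10.1.9.60    itgc2w000212 itgc2w000212.vfc.com
"""
FILES = {"primary": SS_HOSTS_FILE, "secondary": SS_HOSTS_FILE, "oneclick": OC_HOSTS_FILE}
REQUIRED = {"itgc2w000144", "itgc2w000145", "itgc2w000212"}
print(conflicts(FILES), missing(FILES, REQUIRED))
```

For the posted files both results are empty, which agrees with the nslookup and ping results reported above; a name mapped to a different IP on one machine would show up in `conflicts`.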

     

    Primary SS .locrc

     

    LOC_SERVER_SOCKET_NUMBER=0xdaff
    MAIN_LOCATION_SOCKET_NUMBER=0xdaff
    MAIN_LOCATION_HOST_NAME=itgc2w000145

     

    Secondary SS .locrc

     

    LOC_SERVER_SOCKET_NUMBER=0xdaff
    MAIN_LOCATION_SOCKET_NUMBER=0xdaff
    MAIN_LOCATION_HOST_NAME=itgc2w000145

     

    Primary SS .locregfile

     

    LocationServiceHost=ITGC2W000145
    LocationServicePort=0xdaff
    ApplicationLocServiceHost=
    ApplicationLocServicePort=
    BACKUP_MAIN_LOCATION_HOST_NAME=ITGC2W000212  <--- CA Chat case support instructed me to put FT SS here
    BACKUP_MAIN_LOCATION_SOCKET_NUMBER=0xdaff

     

    Secondary SS .locregfile

     

    LocationServiceHost=ITGC2W000212
    LocationServicePort=0xdaff
    ApplicationLocServiceHost=
    ApplicationLocServicePort=
    BACKUP_MAIN_LOCATION_HOST_NAME=
    BACKUP_MAIN_LOCATION_SOCKET_NUMBER=
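The .locrc and .locregfile listings are KEY=VALUE files, so the primary and secondary copies can be diffed key by key. A minimal Python sketch; `parse_kv` and `differing_keys` are hypothetical helpers, and only the first token of each value is kept so that annotations like the `<---` note above are ignored. The sketch only lists differences; which ones are expected (each host naming itself as LocationServiceHost, for instance) is a judgment it does not make.

```python
def parse_kv(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; an empty value is kept as ''.

    Only the first whitespace-separated token of a value is kept.
    """
    out = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            tokens = value.split()
            out[key.strip()] = tokens[0] if tokens else ""
    return out


def differing_keys(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Keys whose values differ between two files (an absent key counts as '')."""
    return {k: (a.get(k, ""), b.get(k, ""))
            for k in sorted(a.keys() | b.keys())
            if a.get(k, "") != b.get(k, "")}


LOCRC_PRIMARY = """LOC_SERVER_SOCKET_NUMBER=0xdaff
MAIN_LOCATION_SOCKET_NUMBER=0xdaff
MAIN_LOCATION_HOST_NAME=itgc2w000145"""
LOCRC_SECONDARY = """LOC_SERVER_SOCKET_NUMBER=0xdaff
MAIN_LOCATION_SOCKET_NUMBER=0xdaff
MAIN_LOCATION_HOST_NAME=itgc2w000145"""
LOCREG_PRIMARY = """LocationServiceHost=ITGC2W000145
LocationServicePort=0xdaff
ApplicationLocServiceHost=
ApplicationLocServicePort=
BACKUP_MAIN_LOCATION_HOST_NAME=ITGC2W000212  <--- CA Chat case support instructed me to put FT SS here
BACKUP_MAIN_LOCATION_SOCKET_NUMBER=0xdaff"""
LOCREG_SECONDARY = """LocationServiceHost=ITGC2W000212
LocationServicePort=0xdaff
ApplicationLocServiceHost=
ApplicationLocServicePort=
BACKUP_MAIN_LOCATION_HOST_NAME=
BACKUP_MAIN_LOCATION_SOCKET_NUMBER="""

print(differing_keys(parse_kv(LOCRC_PRIMARY), parse_kv(LOCRC_SECONDARY)))
for key, (p, s) in differing_keys(parse_kv(LOCREG_PRIMARY), parse_kv(LOCREG_SECONDARY)).items():
    print(f"{key}: primary={p!r} secondary={s!r}")
print(int(parse_kv(LOCRC_PRIMARY)["MAIN_LOCATION_SOCKET_NUMBER"], 16))  # 0xdaff in decimal
```

For the posted files the two .locrc copies agree (both point at itgc2w000145 as the main location host), while the .locregfile copies differ in LocationServiceHost and in the two BACKUP_MAIN_LOCATION_* entries, which are set only on the primary.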



  • 23.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 13, 2018 10:06 AM

    Maybe Todd_Kornely could give a detailed overview of that process?



  • 24.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Broadcom Employee
    Posted Mar 13, 2018 12:49 PM

    Does the Secondary SpectroSERVER have the .vnmrc setting secondary_polling=yes included so that it functions as a hot standby?

    If this is not the case, I would suggest opening a support case so that we can check further.

     

    Best regards,

    Glenn



  • 25.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 13, 2018 01:07 PM

    No, does it have to? We are aiming for warm standby; the documentation describes hot, warm, and cold modes. We have also had a support case open for going on two weeks now.

     

    We have wait_active=no on the secondary SS. Right now only one model is not activated, and we see TCP 14002 listening.

     

    I just don't know the exact steps OC takes to determine that the secondary's status is READY, so I can't debug those specific steps. I asked the same question in the case.



  • 26.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 13, 2018 09:34 PM

    Just asking because it's not mentioned here: is port 14006 also open from OneClick to the secondary SS?

     

    Did you look at the Tomcat log file before and after switching to the secondary SS to see whether any errors are written there?

    Was a Tomcat restart also tried?



  • 27.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 14, 2018 10:38 AM

    I do see 14006 listening on the secondary SS. We have restarted Tomcat numerous times for other reasons, but the NOT READY status doesn't change. Checking the Tomcat log before and after is a good idea; I can do that test. Yesterday I took Wireshark traces on the OC server and on the primary and secondary SS after restarting Tomcat, trying to find the communication in which the OC server checks the status of the secondary SS. Without knowing the exact steps, though, I don't know what I'm looking for; hence my question to support and in this thread about exactly how the OC server decides whether the secondary SS is READY or NOT READY.

     

    One thing is that our secondary SS has never reached 100% models activated. On 10.2.1 it was one model with voice, our slowest model to activate; it always activates on the primary. Now, on 10.2.3, there is always some different model on the secondary that doesn't activate completely, while activation on the primary always reaches 100%. We have wait_active=no on the secondary to make 14002 go into the listening state.

     

    This isn't the cause of the NOT READY message, is it?



  • 28.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Broadcom Employee
    Posted Mar 14, 2018 10:53 AM

    It's possible that the 99% activation is also contributing to the NOT READY state of the secondary. There are other symptoms when models are not 100% activated, such as the VNM showing the "initial" state, with polling paused until activation is complete.

     

    If you are not sure which device is causing the activation delay, you could add this line to .vnmrc and restart the server; it should log model activation status in VNM.OUT:

     

    mdlact_debug=true

     

    ~Jay



  • 29.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY

    Posted Mar 14, 2018 01:43 PM

    Thanks Jay! We do have mdlact_debug=true on both SSs. It is very helpful for seeing whether the same model gets stuck, and it has helped in the past. What's interesting is that when we stop the secondary SS, it always goes inactive and does not hang on "waiting on model activates to complete". We just changed wait_active to yes, and the secondary SS went from Starting into the RUNNING state. I do see 14002 listening now, along with the other required ports.

     

    The VNM model on the primary always shows 100% models activated, so I'm not sure whether the VNM model tracks model activation on the secondary SS.

     

    We never reach 100% model activation on the secondary SS, while the primary SS is always 100% active.

    Today, this is what we see after a Spectrum stop/start on the secondary SS.

     

    It's basically a different model each time. I wonder whether this is just how it is, and VNM.OUT on the secondary never shows 100% model activation. I'm not sure about this.

     

    Secondary SS

    Mar 14 13:35:35  Model activation status - 429639/429640 activated( 99% ), 1 problematic.
    currently processing 1 model activate triggers:
    Model 0x50ec18 "SWHQRTR2.vfc.com"/Rtr_Cisco - class CsIHCiscoIfConfig - 749 seconds
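The "99%" here hides how small the gap is: 429639 of 429640 means exactly one model outstanding, about 99.9998%. Since that would round to 100%, the log appears to truncate the percentage (an inference from this one line only). A minimal Python sketch that pulls the counts and the stuck-model lines out of VNM.OUT text, so that the stuck model can be tracked across restarts; the regexes are written against the two line formats quoted above and may need adjusting for other log lines.

```python
import re

# Patterns for the two kinds of lines quoted above from the secondary's VNM.OUT.
STATUS = re.compile(
    r"Model activation status - (\d+)/(\d+) activated\(\s*(\d+)%\s*\),\s*(\d+) problematic")
TRIGGER = re.compile(
    r'Model (0x[0-9a-fA-F]+) "([^"]*)"/(\S+) - class (\S+) - (\d+) seconds')


def last_status(log: str):
    """Counts from the last 'Model activation status' line, or None if there is none."""
    matches = STATUS.findall(log)
    if not matches:
        return None
    done, total, shown, problematic = map(int, matches[-1])
    return {"activated": done, "total": total, "pending": total - done,
            "shown_pct": shown, "exact_pct": 100 * done / total,
            "problematic": problematic}


def stuck_models(log: str) -> list[tuple[str, str, str, str, int]]:
    """(model handle, name, model type, class, seconds) for each listed activate trigger."""
    return [(h, n, mt, cls, int(sec)) for h, n, mt, cls, sec in TRIGGER.findall(log)]


LOG = """Mar 14 13:35:35  Model activation status - 429639/429640 activated( 99% ), 1 problematic.
currently processing 1 model activate triggers:
Model 0x50ec18 "SWHQRTR2.vfc.com"/Rtr_Cisco - class CsIHCiscoIfConfig - 749 seconds"""

print(last_status(LOG))
print(stuck_models(LOG))
```

Collecting `stuck_models` output after each stop/start of the secondary would show whether the "different model each time" shares a model type or class.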



  • 30.  Re: Fault Tolerant SS  OneClick Landscape Status NOT READY
    Best Answer

    Posted Mar 15, 2018 08:47 AM

    We found the issue! On the Spectrum admin page there is a "Spectrum Configuration" section, with fields for the "Main Location Server" name and the "Backup Location Server" name. The MLS field contained the hostname of the OC server, not of the SpectroSERVER: it should have been itgc2acaspectrum2, but it was itgc2acaspectrum1 (the OC server itself). After correcting this hostname and bouncing Tomcat, OC (I suppose) found the MLS, and our status changed from NOT READY to READY. I do not recall ever going into this "Spectrum Configuration" section, so I'm guessing this was a typo during our setup by the CA consultant, but it's hard to know. At any rate, failover works, and everything works as it should now. Honestly, I cannot find anything about this "Spectrum Configuration" page in the Spectrum wiki documentation.
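The root cause was a Main Location Server name that resolved to the OneClick server itself. A small sanity check can catch that class of mistake. A minimal Python sketch; the name-to-IP table is built from the hosts files posted earlier, the backup name `itgc2w000212` is taken from the context.xml excerpt rather than from the admin page, and `check_location_servers` is a hypothetical helper. `resolve` is any callable returning an IP or None, so a wrapper around `socket.gethostbyname` could replace the dictionary lookup.

```python
from typing import Callable, Optional

# Name -> IP table from the hosts files posted earlier in the thread.
HOSTS = {
    "itgc2acaspectrum1": "10.1.9.140", "itgc2w000144": "10.1.9.140",  # OneClick server
    "itgc2acaspectrum2": "10.1.9.141", "itgc2w000145": "10.1.9.141",  # primary SS
    "itgc2w000212": "10.1.9.60",                                      # secondary SS
}
SS_IPS = {"10.1.9.141", "10.1.9.60"}
OC_IP = "10.1.9.140"


def check_location_servers(main: str, backup: str,
                           resolve: Callable[[str], Optional[str]]) -> list[str]:
    """Flag location server names that are empty, unresolvable, the OC host, or not an SS."""
    issues = []
    if not main:
        issues.append("Main Location Server is empty")
    for label, name in (("Main Location Server", main), ("Backup Location Server", backup)):
        if not name:
            continue
        ip = resolve(name.lower())
        if ip is None:
            issues.append(f"{label} {name!r} does not resolve")
        elif ip == OC_IP:
            issues.append(f"{label} {name!r} is the OneClick server itself ({ip})")
        elif ip not in SS_IPS:
            issues.append(f"{label} {name!r} ({ip}) is not a SpectroSERVER")
    return issues


print(check_location_servers("itgc2acaspectrum1", "itgc2w000212", HOSTS.get))  # before the fix
print(check_location_servers("itgc2acaspectrum2", "itgc2w000212", HOSTS.get))  # after the fix
```

With the value found in this thread the first call reports that the Main Location Server is the OneClick server itself; with the corrected name the second call returns an empty list.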