Brocade Fibre Channel Networking Community


Problem understanding NTP with logical switches

  • 1.  Problem understanding NTP with logical switches

    Posted 08-24-2016 02:03 AM

    Hello,

     

    I have read the SAN admin guide, the command reference guide, and some posts on the forum, but I have difficulty understanding how to configure NTP on logical switches.

     

     

    This is my SAN architecture.

     

    Two identical SAN networks with no connection between them. Each of them is composed of:

    - 2 DCX, the first on site 1, the second on site 2 (with an ISL trunk between them)

    - about a dozen SAN switches connected to the first DCX on site 1

    - about a dozen SAN switches connected to the second DCX on site 2

     

    On each switch, including the DCXs, there is the default logical switch (context 128) and one other logical switch (context 10).

     

    This will evolve, but for now all ports are in logical switch 10 (FID10 = context 10).

     

    So each switch in FID128 considers itself alone in its fabric and, consequently, the fabric principal.

    Do I have to set tsclockserver to my NTP server IP (and NOT LOCL) on all switches in FID128? And tstimezone as well?

     

    In FID10, I am not sure which solution is better:

    - only the current principal switch with tsclockserver set to the NTP server IP, and the others on LOCL (what happens if this principal switch goes down?). tstimezone on which switch?

    - only the DCXs (two on each site) with priority 0x01: do I have to enter the NTP server IP address on both? And the other switches with 0xFF (never principal switch) and tsclockserver LOCL. Where do I have to configure tstimezone? On all switches, or only on those where the NTP server IP address has been configured?

    - another solution?

     

    I am not sure what the best practice is.

     

    Thanks for your answer.


    #BrocadeFibreChannelNetworkingCommunity


  • 2.  Re: Problem understanding NTP with logical switches
    Best Answer

    Posted 08-24-2016 01:35 PM

    Short answer: Yes, you do.

     

    Longer answer can be found in the Admin guide, page 27:

     

    All switches in the fabric maintain the current clock server value in nonvolatile memory. By default,
    this value is the local clock server (LOCL) of the principal or primary FCS switch. Changes to the
    clock server value on the principal or primary FCS switch are propagated to all switches in the
    fabric.
    In a Virtual Fabric, all the switches in the fabric must have the same NTP clock server configured.
    This includes any Fabric OS v6.2.0 or earlier switches in the fabric. This ensures that time does not
    go out of sync in the logical fabric. It is not recommended to have LOCL in the server list.
    When a new switch enters the fabric, the time server daemon of the principal or primary FCS switch
    sends out the addresses of all existing clock servers and the time to the new switch. When a switch
    with Fabric OS v6.1.0 or later enters the fabric, it stores the list and the active servers.

    NOTE
    In a Virtual Fabric, multiple logical switches can share a single chassis. Therefore, the NTP server
    list must be the same across all fabrics.

     

    You have Virtual Fabrics (logical switch FID 10) configured. Ergo, you must set tsclockserver in each of your logical switches: use the setcontext command to switch to that logical switch, then run tsclockserver with the EXACT SAME IP or DNS name entries as on the fabric principal or FCS switch.

     

    Example: tsclockserver "chronos.cru.fr; canon.inria.fr; 192.93.2.20"

     

    Note the literal quotes, and note the semicolon separators. Do this for ALL switches in the fabric. Yes, you must set your time zone too!
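
    As a minimal sketch of the quoting and separator rules above, the Python helper below builds the command lines to enter in each logical switch. The function names, and the idea of generating the lines offline before typing or pasting them, are illustrative assumptions and not part of FOS; only setcontext and tsclockserver come from this thread.

```python
def build_tsclockserver(servers):
    """Return a tsclockserver command line: one quoted, semicolon-separated list."""
    if not servers:
        raise ValueError("at least one NTP server is required")
    return 'tsclockserver "{}"'.format("; ".join(servers))


def commands_per_context(fids, servers):
    """Hypothetical helper: list the CLI lines to enter for each logical switch (FID)."""
    lines = []
    for fid in fids:
        lines.append("setcontext {}".format(fid))   # move to that logical switch
        lines.append(build_tsclockserver(servers))  # same server list in every context
    return lines


if __name__ == "__main__":
    ntp = ["chronos.cru.fr", "canon.inria.fr", "192.93.2.20"]
    for line in commands_per_context([128, 10], ntp):
        print(line)
```

    Generating the same string for every context is the point: the server list then cannot differ between FID 128 and FID 10 because of a typing mistake.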

     

    Best of luck,




  • 3.  Re: Problem understanding NTP with logical switches

    Posted 08-25-2016 12:58 AM

    Hi doc, :smileyhappy:

     

    Thank you for your answer.

     

    I have read everything about date, tsclockserver, NTP and the principal switch in the admin guide, including what you quote (page 74 of the FOS admin guide 7.2.0).

     

    In date:

    In a Virtual Fabric, there can be a maximum of eight logical switches per Backbone. Only the
    default switch in the chassis can update the hardware clock.

     

    And in other places there are some explanations that are not clear about how NTP is managed in Brocade SAN switches.

     

     

    To summarize (I know well how the tsclockserver command works):

     

    - the addresses of my NTP servers in all contexts (default and 10) of all switches, so NO LOCL

     

    - tstimezone in all contexts of all switches

    PROBLEM: when I enter this value, it displays "System Time Zone change will take effect at next reboot", but I must avoid any interruption of service, so I can't reboot... And I do not understand why this requires a reboot. So will it never be applied?

     

    Another problem: a few switches in the same FC networks, within a particular perimeter, cannot access the NTP server, and this will not change, for security reasons. What about their date and time? Will they be updated through the fabric with the LOCL parameter? And I suppose I must also configure the same tstimezone.

     

    Thanks again for your detailed explanation, really appreciated. :smileyhappy:

     

    Regards,

    Ludo




  • 4.  Re: Problem understanding NTP with logical switches
    Best Answer

    Posted 08-25-2016 01:15 AM

    Hi,

     

    You can ignore the message about the next reboot: the change will be applied without a reboot. Note that tstimezone is a per-switch property, unlike tsclockserver (which is propagated through the fabric when the command is entered).

     

    The FC switches at the perimeter of your network, without network access to the NTP server, will still be updated with the correct time.

    The way time is updated in the switches is that the current principal switch (or FCS switch) in each fabric queries the configured NTP servers in sequence every 64 seconds. The principal switch then distributes the time in-band through the fabric (Fibre Channel common transport, FC-CT) to all other switches in the fabric. If the principal switch is the default switch, it updates the drift file / RTC as well.

    So, ensure that your configured principal switches (you have also configured one of the central / new switches as 'backup' principal switch) are able to reach the list of NTP servers.

     

    As doc mentioned, the tsclockserver list is distributed to all switches in a fabric when the command is executed. It is needed if the principal switch goes down or the fabric is segmented: the new principal switch can then continue to distribute the current time in-band in the fabric.




  • 5.  Re: Problem understanding NTP with logical switches

    Posted 08-25-2016 01:55 AM

    Hi Martin,

     

    Thank you, too, for your detailed answer.


    @Martin.Sjölin wrote:
    You can ignore the message about the next reboot: the change will be applied without a reboot. Note that tstimezone is a per-switch property, unlike tsclockserver (which is propagated through the fabric when the command is entered).

    Very clear, thank you.


    @Martin.Sjölin wrote:
    The FC switches at the perimeter of your network, without network access to the NTP server, will still be updated with the correct time.

    Crystal clear, thanks again. I will configure these switches with tsclockserver LOCL and the correct tstimezone.


    @Martin.Sjölin wrote:
    The way time is updated in the switches is that the current principal switch (or FCS switch) in each fabric queries the configured NTP servers in sequence every 64 seconds. The principal switch then distributes the time in-band through the fabric (Fibre Channel common transport, FC-CT) to all other switches in the fabric. If the principal switch is the default switch, it updates the drift file / RTC as well.

    So, ensure that your configured principal switches (you have also configured one of the central / new switches as 'backup' principal switch) are able to reach the list of NTP servers.


    That is what I understood at first, thank you for the explanation.

    If I understand your message correctly, and it is what I thought before, only some switches (at least two, a principal and a backup) in EACH network and EACH fabric/context (default logical switch 128 and logical switch 10 here) must be configured with tsclockserver pointing to a valid and reachable NTP server, and the others with LOCL.


    @Doc Cherwink wrote:
    You have Virtual Fabrics (logical switch FID 10) configured. Ergo, you must set tsclockserver in each of your logical switches: use the setcontext command to switch to that logical switch, then run tsclockserver with the EXACT SAME IP or DNS name entries as on the fabric principal or FCS switch.

     

    Example: tsclockserver "chronos.cru.fr; canon.inria.fr; 192.93.2.20"


    I think I misunderstood doc's message (I am not fluent in English), because I thought every switch had to be configured with the IP addresses of the NTP servers.


    @Martin.Sjölin wrote:

    As doc mentioned, the tsclockserver list is distributed to all switches in a fabric when the command is executed. It is needed if the principal switch goes down or the fabric is segmented: the new principal switch can then continue to distribute the current time in-band in the fabric.


    So is it a correct solution to give the highest priority to my "master/backup" switches in each network, in each logical switch/context (128 and 10), with the 0x01 value (and with the NTP server addresses), and 0xFF to the others (and with LOCL)?

     

    Thank you again.

     

    Regards,

    Ludo

     




  • 6.  Re: Problem understanding NTP with logical switches
    Best Answer

    Posted 08-25-2016 02:18 AM

    Hi Ludo,

     

    Normally we configure all switches with the NTP server(s) via the tsclockserver CLI. In fact, entering the tsclockserver command distributes the list of NTP servers to all switches in the fabric. That way, in case of segmentation or other issues in the FC network, all switches have a possible clock source. So please ensure that tsclockserver is set to the NTP list on all switches, even though the NTP servers are only queried from a principal switch and may not even be reachable from the perimeter switches.

     

    Quote from the CMD guide:

     

    "All switches in the fabric maintain the current clock server IP address in nonvolatile memory. By default, this value is LOCL., that is, the local clock of the Principal or the Primary FCS switch is the default clock server. Changes to the clock server IP addresses on the Principal or Primary FCS switch are propagated to all switches in the fabric. "

     

    regards




  • 7.  Re: Problem understanding NTP with logical switches

    Posted 08-25-2016 05:52 AM

    Hi Martin,



    OK, thank you. So in the end: tsclockserver with my NTP server IP addresses, plus tstimezone, on all switches and all logical switches, even when some switches can't reach the servers.

    Clear and simple, good! :smileyhappy:

     

     

    Hi Alexey,


    @alexey.stepanov wrote:

    I don't see why you would not want to set up the correct tsclockserver and tstimezone on all the switches (I mean all physical and all logical ones as well) in your fabric(s). It is not so difficult, especially if you can use some ssh scripting. But this way you will make sure that all the clocks are in sync, and this may save you a lot of time when troubleshooting.


    I never said I would not want to set up tsclockserver and tstimezone on all switches. I am used to the FOS CLI and SSH.

    I said the documentation is not clear about the tsclockserver configuration.

     

    It says that a switch (the principal) updates its date and time from the NTP server, but not how the other switches must be configured.

     

    Logically (for me), a switch configured with an NTP server IP address would update its date and time itself, and would take them from the principal switch in the fabric when configured otherwise (LOCL). But it appears it does not work like that.

    Likewise, it seems logical to me not to configure an NTP server IP address on a device that can't reach this server. But again, it appears it does not work like that.

    And I need to understand these points to know how to configure priorities, in order to limit the IP addresses that need access to the NTP servers.

     

    So my questions here are not about what I do or don't want to do, but about how it works, so that I can configure every device and all logical switches properly.


    @alexey.stepanov wrote:

    I have seen so many weird SAN issues in my life, and some of them even involved the abandoned default 128 FID having no SAN connections at all. So yes, it is essential that the default unused FID also has correct clock, also because it updates the hardware clock of the chassis.

     

    If your LAN-isolated switches become SAN-segmented as well, and therefore lose the SAN-distributed clock from the principal switches, they will only show a message like "NTP server is not reachable, using LOCL", but will keep good clock values until the SAN merges back. I hope that the hardware clock is not too far off in modern devices.

     

    A note about the reboot after setting tstimezone. As you know, FOS is based on Linux, so the TZ variable is set in the environment of every process. When you change the system default TZ, there is no way to change the TZ of the already running processes (unless some of them handle a custom signal that makes them reread the configuration or something like that, but I'm not sure that this is implemented in any of the FOS processes/daemons). So the only way to switch to the new TZ value is to restart the process. But most of the FOS processes are not restartable. You might have seen cases where the sudden death of a process is handled by a restart of the entire CP. Therefore a restart of the CP is essentially required to completely switch to the new TZ setting after using the tstimezone command. BUT! You don't have to restart the entire switch to restart the CP. On DCX-like switches, you can do a dual hafailover to make sure that both CPs are restarted with the new TZ. On smaller switches, you can do an hareboot with the same effect.


    Thank you very much for all these really interesting and useful explanations!

     

    I agree with you about having the correct clock on the default FID; that is also why I am here trying to understand how it works on Brocade switches. I want to minimize the risk of "weird issues"...

     

    Yes, I have a lot of these "NTP server is not reachable, using LOCL" messages on the LAN-isolated switches. That is why I would like to use LOCL, to get rid of them in the logs. Knowing that these switches will never have access to these NTP server addresses (if someday they get access to an NTP server, it will be a different one), what is the benefit of configuring an unreachable NTP server address instead of LOCL? Does LOCL not allow the switch to synchronize date and time with the principal switch over Fibre Channel?

     

    Thanks again for your tstimezone explanation. I learned really useful information again. I know a little about CP behaviour on the DCX, but was completely unaware of hareboot.

     

    Regards.




  • 8.  Re: Problem understanding NTP with logical switches
    Best Answer

    Posted 08-25-2016 04:24 AM

    I don't see why you would not want to set up the correct tsclockserver and tstimezone on all the switches (I mean all physical and all logical ones as well) in your fabric(s). It is not so difficult, especially if you can use some ssh scripting. But this way you will make sure that all the clocks are in sync, and this may save you a lot of time when troubleshooting.

     

    I have seen so many weird SAN issues in my life, and some of them even involved the abandoned default 128 FID having no SAN connections at all. So yes, it is essential that the default unused FID also has correct clock, also because it updates the hardware clock of the chassis.

     

    If your LAN-isolated switches become SAN-segmented as well, and therefore lose the SAN-distributed clock from the principal switches, they will only show a message like "NTP server is not reachable, using LOCL", but will keep good clock values until the SAN merges back. I hope that the hardware clock is not too far off in modern devices.

     

    A note about the reboot after setting tstimezone. As you know, FOS is based on Linux, so the TZ variable is set in the environment of every process. When you change the system default TZ, there is no way to change the TZ of the already running processes (unless some of them handle a custom signal that makes them reread the configuration or something like that, but I'm not sure that this is implemented in any of the FOS processes/daemons). So the only way to switch to the new TZ value is to restart the process. But most of the FOS processes are not restartable. You might have seen cases where the sudden death of a process is handled by a restart of the entire CP. Therefore a restart of the CP is essentially required to completely switch to the new TZ setting after using the tstimezone command. BUT! You don't have to restart the entire switch to restart the CP. On DCX-like switches, you can do a dual hafailover to make sure that both CPs are restarted with the new TZ. On smaller switches, you can do an hareboot with the same effect.
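
    To illustrate the general Linux behaviour described above (not FOS internals, which are not visible from here), here is a small Python sketch for an ordinary Unix host. It shows that a running process keeps the time zone it last read, even after the TZ setting has changed, until it explicitly re-reads it. The function name is made up for the example, and time.tzset() is only available on Unix.

```python
import os
import time


def tz_offsets_seen_by_process():
    """Return the UTC offset (seconds east) before, during and after a TZ change."""
    saved = os.environ.get("TZ")
    try:
        os.environ["TZ"] = "UTC-1"  # POSIX syntax: "UTC-1" is one hour EAST of UTC
        time.tzset()                # the process reads TZ
        before = -time.timezone

        os.environ["TZ"] = "UTC-2"  # the setting changes...
        stale = -time.timezone      # ...but the value already read is unchanged

        time.tzset()                # only an explicit re-read picks up the new zone
        after = -time.timezone
        return before, stale, after
    finally:
        # Put the caller's time zone back so the demo has no side effects.
        if saved is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = saved
        time.tzset()


if __name__ == "__main__":
    print(tz_offsets_seen_by_process())
```

    A daemon that never re-reads TZ behaves like the middle value: it keeps the old offset until it is restarted.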




  • 9.  Re: Problem understanding NTP with logical switches
    Best Answer

    Posted 08-25-2016 05:44 AM

    Hi,

     

    I agree with alexey.stepanov that it is better to configure the NTP servers on all switches to ensure that time is synchronized for troubleshooting. The tsclockserver command only needs to be run once in each fabric (the NTP list is distributed to all other switches at that time), but tstimezone must be run on all of them. Note the following from the help for tstimezone:

     

         Time zone is used in computing local time for error reporting
         and logging. An incorrect time zone setup does not affect the
         switch operation in any way.

     

         System services started during the switch boot reflect a time
         zone change only at the next reboot.

     

    For the user, the time zone change is effectively immediate:

     

    SW6510_1:FID128:admin> date
    Thu Aug 25 12:50:59 Localtime 2016

     

    SW6510_1:FID128:admin> firmwareshow
    Appl     Primary/Secondary Versions
    ------------------------------------------
    FOS      v7.4.1c
             v7.4.1c

     

    SW6510_1:FID128:admin> tstimezone
    Time Zone Hour Offset: 1
    Time Zone Minute Offset: 0

    SW6510_1:FID128:admin> fabriclog -s
    Time Stamp      Input and *Action                           S, P   Sn,Pn  Port  Xid
    ===================================================================================
    Switch 0; Thu Aug 18 13:51:23 2016 GMT-1 (GMT+1:00)
    13:51:23.372464 *Fss Init                                   NA,NA  NA,NA  NA    NA
    13:51:23.374389 *Initiate State (max_port=200)              NA,NA  F2,NA  NA    NA
    13:51:23.441162 Expd1 0x00000000 00000000 0000ffff ffffffff F2,NA  F2,NA  0     NA
    13:51:32.797486 Rcv FSS_RECOV_COLD                          F2,NA  F2,NA  NA    NA
    13:51:32.797929 D-port Offline Skip Cnt 1(inst = 1)         F2,NA  F2,NA  NA    NA

     

    SW6510_1:FID128:admin> tstimezone 2,0
    System Time Zone change will take effect at next reboot

     

    SW6510_1:FID128:admin> fabriclog -s
    Time Stamp      Input and *Action                           S, P   Sn,Pn  Port  Xid
    ===================================================================================
    Switch 0; Thu Aug 18 14:51:23 2016 GMT-2 (GMT+2:00)
    14:51:23.352050 *Fss Init                                   NA,NA  NA,NA  NA    NA
    14:51:23.353974 *Initiate State (max_port=200)              NA,NA  F2,NA  NA    NA
    14:51:23.420748 Expd1 0x00000000 00000000 0000ffff ffffffff F2,NA  F2,NA  0     NA
    14:51:32.777072 Rcv FSS_RECOV_COLD                          F2,NA  F2,NA  NA    NA
    14:51:32.777515 D-port Offline Skip Cnt 1(inst = 1)         F2,NA  F2,NA  NA    NA
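
    A side note on the header lines in the two fabriclog outputs above: an offset of 1 is shown as "GMT-1 (GMT+1:00)" and an offset of 2 as "GMT-2 (GMT+2:00)". The inverted sign matches the POSIX TZ convention, where the number after the zone name counts hours west of Greenwich. The Python sketch below reproduces the two observed labels; that FOS builds its label this way is an assumption, and the function name is hypothetical.

```python
def posix_tz_label(hours, minutes=0):
    """Format an offset given in hours/minutes EAST of GMT like the fabriclog header."""
    east = hours >= 0
    h, m = abs(hours), abs(minutes)
    posix_sign = "-" if east else "+"  # POSIX TZ strings invert the sign
    human_sign = "+" if east else "-"  # conventional "east is positive" notation
    posix = "GMT{}{}".format(posix_sign, h)
    if m:
        posix += ":{:02d}".format(m)
    human = "GMT{}{}:{:02d}".format(human_sign, h, m)
    return "{} ({})".format(posix, human)


if __name__ == "__main__":
    print(posix_tz_label(1))     # the offset shown before the change
    print(posix_tz_label(2, 0))  # the offset after "tstimezone 2,0"
```

    Only the two positive offsets seen in this post were checked against real switch output; negative offsets follow the POSIX rule but were not observed here.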

     

    Interesting, when checking the running process (/proc/




  • 10.  Re: Problem understanding NTP with logical switches

    Posted 08-25-2016 05:58 AM

    Thank you Martin.

     

    I think I missed something that you and other contributors said (and that is in the documentation). The tsclockserver command is propagated through the fabric, so whether I configure LOCL or an NTP server IP address, it will be the same configuration on all switches in the fabric.

     

    It was my mistake; I understand your messages better now.

     

    Thank you again.

     

    Regards.




  • 11.  Re: Problem understanding NTP with logical switches
    Best Answer

    Posted 08-25-2016 12:06 PM


    I would like to clarify just a bit. If you use LOCL for the time server in ANY switch in the fabric, you will never stay in sync with the stratum offset (tick count update) of the NTP source. Drift will occur because the LOCL source for the system and OS clock is a function of the crystal clock source on the chassis, which drives all the buses and clocks the ASICs, and it is not altered by the NTP source. I will warn you: if you let the time drift too far from the stratum 2, 3, or 4 source, there may be cases where you will need to reset the date/time before using tsclockserver, because the drift of the LOCL source clock becomes too great and cannot be caught up with the NTP source (this is typically not an issue with a modern Linux-based kernel).

     

    Further, I provided an example with three sources. The algorithm that computes the drift in local time involves comparing the tick drift of more than one NTP server to provide the best correction. The math is not important, but it's best to use a minimum of 2 different NTP sources, provided the OS uses both of them to qualify the metric for drift correction. I believe, but will not look up, that Brocade's kernel does offer multiple time source tick corrections.

     

    And that is all I recall about Linux NTP as it affects the FOS product.

     

    Best of luck.




  • 12.  Re: Problem understanding NTP with logical switches
    Best Answer

    Posted 08-25-2016 01:46 PM
    I'm not sure that FOS uses multiple simultaneous NTP sources. The tsclockserver command always shows one Active source and then all those that are configured.


  • 13.  Re: Problem understanding NTP with logical switches
    Best Answer

    Posted 08-25-2016 04:29 PM

    Well, when I say 'provided the OS uses both of them to qualify the metric for drift correction. I believe, but will not look up, that Brocade's kernel does offer multiple time source tick corrections.', I mean that I'm not going to look it up, and that my advice holds provided the OS uses both to fix the drift. If someone comes along and lets me know I 'might' be wrong, I'm OK with that.

     

    And so:

     

    http://linux.die.net/man/8/ntpd

     

    By default, ntpd runs in continuous mode where each of possibly several external servers is polled at intervals determined by an intricate state machine. The state machine measures the incidental roundtrip delay jitter and oscillator frequency wander and determines the best poll interval using a heuristic algorithm.

     

    Also:

     

    http://perdues.com/doc/ntp.html

     

    The minimum system configuration is a couple of lines in your ntp configuration file, typically /etc/ntp.conf. You will need a line or two or three like this:

      server some.timeserver.com
      server othertime.server.org
    

    Synchronizing with more than one NTP server gives you redundancy in case of server or network problems, and it may improve the accuracy of your time synchronization.

     

    Does the ntpd in FOS perform the heuristic math to adjust for best drift correction? Only a FOS coder will know for sure. And I will absolutely, positively not be hacking into it to find out. Alternately, one could insert a packet analyzer into the active CP, and trigger on the DNS or IP for the NTP sources and see what we will see. I won't be doing this either! :womanwink:

     




  • 14.  Re: Problem understanding NTP with logical switches
    Best Answer

    Posted 08-26-2016 06:58 AM

    Note the wording in the CLI manual for tsclockserver:

     

    When multiple NTP server addresses are specified, tsClockServer sets the first reachable address for the active NTP server. The remaining addresses are stored as backup servers, which can take over if the active NTP server fails.

     

    Which does not sound like ntpd. And I do not see any ntpd daemon in the process list on my FOS v7.4.1c switch.

    When I looked into this in 2004 (FOS 4.x) using tcpdump, a query was sent out to the active NTP server every 64 seconds, and FOS was updating the RTC and then broadcasting the time to all switches in the fabric.
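
    The selection rule quoted from the CLI manual (first reachable address becomes active, the rest are kept as backups) can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not FOS code: the function names and the reachability callback are hypothetical, and the fallback to LOCL is inferred from the "NTP server is not reachable, using LOCL" message mentioned earlier in this thread.

```python
LOCL = "LOCL"  # keyword used in this thread for the switch's own local clock


def pick_active_server(servers, is_reachable):
    """Return the first reachable NTP server in list order, else LOCL."""
    for server in servers:
        if server != LOCL and is_reachable(server):
            return server
    return LOCL


def backup_servers(servers, active):
    """Remaining configured servers, kept as backups for the active one."""
    return [s for s in servers if s not in (active, LOCL)]


if __name__ == "__main__":
    configured = ["chronos.cru.fr", "canon.inria.fr", "192.93.2.20"]
    up = {"canon.inria.fr", "192.93.2.20"}  # pretend the first server is down
    active = pick_active_server(configured, lambda s: s in up)
    print(active, backup_servers(configured, active))
```

    In this model only one source is used at a time, which is the difference from ntpd discussed above.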

     

    For a chassis with logical switches, only the NTP server configured on the default switch will update the RTC and the drift file. From the tstimezone CLI help:

     

    When Virtual Fabrics are enabled, the hardware clock is updated by the default switch in the chassis, and the time zone set on any logical switch applies to all logical switches on the chassis. The tsTimeZone command requires chassis permissions.




  • 15.  Re: Problem understanding NTP with logical switches

    Posted 08-30-2016 05:09 AM

    Thanks a lot for these really interesting details about how NTP works on GNU/Linux and on Brocade FOS.

     

    I have had a lot of work these last few days, but I read your messages closely.

     

    Thanks again.

     

    Ludo




  • 16.  Re: Problem understanding NTP with logical switches

    Posted 09-09-2016 06:20 AM

    For completeness, I add the following note, based on observation and on the FOS admin guide 7.4:

     

    When Virtual Fabrics is enabled, every chassis in fact queries the NTP servers specified in tsclockserver:

     

    • All switches in a given chassis must be configured for the same set of NTP servers. This ensures that time does not go out of sync in the chassis. It is not recommended to configure LOCL in the NTP server list.

    • All default switches in the fabric can query the NTP server. If Virtual Fabrics is not enabled, only the principal or primary FCS switch can query the NTP server.

    • The logical switches in a chassis get their clock information from the default logical switch, and not from the principal or primary FCS switch.
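
    The three rules above can be restated as a small decision function. This Python sketch is only one reading of the admin guide bullets together with the in-band distribution described earlier in the thread; the function names and return strings are invented for illustration.

```python
def queries_ntp(vf_enabled, is_default_switch, is_principal):
    """True if this (logical) switch is one that may query the NTP server itself."""
    if vf_enabled:
        return is_default_switch  # every default switch queries NTP
    return is_principal           # without VF, only the principal / primary FCS


def clock_source(vf_enabled, is_default_switch, is_principal):
    """Describe where this (logical) switch gets its time from."""
    if queries_ntp(vf_enabled, is_default_switch, is_principal):
        return "NTP server"
    if vf_enabled:
        return "default logical switch of the chassis"
    return "principal or primary FCS switch (in-band)"


if __name__ == "__main__":
    # FID 10 on a chassis with Virtual Fabrics enabled, even if principal in its fabric:
    print(clock_source(vf_enabled=True, is_default_switch=False, is_principal=True))
    # FID 128 (default switch) on the same chassis:
    print(clock_source(vf_enabled=True, is_default_switch=True, is_principal=False))
```

    Read this way, on a Virtual Fabrics chassis it is the default switch (FID 128) that needs to reach the NTP servers, even when it has no ports.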

     

     

    more details at http://www.brocade.com/content/html/en/administration-guide/fos-741-adminguide/GUID-32BD2FF1-99C2-47B6-A6F7-343434F96A1D.html

