Sun Fire[TM] 3800, 4800, 4810, 6800, E4900, E6900: How do the clocks work?

Asset ID:	1-71-1004720.1
Update Date:	2012-07-30
Solution Type Technical Instruction Sure

Solution 1004720.1 : Sun Fire[TM] 3800, 4800, 4810, 6800, E4900, E6900: How do the clocks work?

Applies to:

Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
Sun Fire 6800 Server
Sun Fire E4900 Server
Sun Fire E6900 Server
All Platforms

Goal

This document describes the design and operation of system clocks in Sun Fire[TM] 3800, 4800, 4810, 6800, E4900, and E6900 systems.

Solution

Sun Fire[TM] system clocks are commonly misunderstood and are a frequent source of confusion for customers and service engineers.
To keep time, the Solaris[TM] Operating System relies on the system's hardware clock to provide a pulse that Solaris can reference. In discrete systems, this 'pulse', or 'heartbeat', is provided by an oscillator crystal of some sort (generally part of what is referred to as the time-of-day, or TOD, chip).

Sun Fire[TM] 3800, 4800, 4810, 6800, E4900, and E6900 systems were designed with two goals in mind that complicate this scheme:

The desire to have multiple domains in one physical chassis.
The desire to make the systems as fault resilient as possible.

It is valid for each domain within a chassis to have a different time of day.
For instance, one domain could be running on local time.
Another could be serving data for a place on the other side of the world and running in that local time.
A third domain could be doing application testing at some date 20 or 30 years in the future.
Handled conventionally, this would require multiple hardware clocks to provide each domain with its own discrete time source.

Sun Fire designers decided that the best way to handle and manage this issue would be to have the System Controller (SC) provide the time 'pulse', or heartbeat, to the domains using the SC-to-Solaris mailbox infrastructure.

Solaris (on the domains) uses a special driver called todsg to talk to the SC for time purposes. The SC has its own Mostek M48T59 TOD chip, which it references for time. It then maintains an offset value from that TOD for each domain. Each of the different methods of changing time on a domain ('setdate' from the domain console, or the stime(2), adjtime(2), and ntp_adjtime(2) system calls from Solaris) actually updates the domain's offset on the SC.
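
The per-domain offset scheme described above can be sketched roughly as follows. This is only a toy model in Python, not the ScApp or todsg implementation: the class and method names (SystemController, set_domain_time, domain_time) are hypothetical, and times are plain integer seconds.

```python
class SystemController:
    """Toy model of an SC holding one TOD value and one offset per domain."""

    def __init__(self, tod_time):
        self.tod_time = tod_time      # seconds, as read from the SC's TOD chip
        self.domain_offset = {}       # domain name -> offset in seconds

    def tick(self, seconds=1):
        # The TOD advances; the stored offsets are left untouched.
        self.tod_time += seconds

    def set_domain_time(self, domain, new_time):
        # Changing a domain's time only changes that domain's offset.
        self.domain_offset[domain] = new_time - self.tod_time

    def domain_time(self, domain):
        # Time handed to the domain: SC TOD plus the domain's offset.
        return self.tod_time + self.domain_offset.get(domain, 0)


sc = SystemController(tod_time=1_000_000)
sc.set_domain_time("A", 1_000_000)              # domain A follows the SC TOD
sc.set_domain_time("B", 1_000_000 + 12 * 3600)  # domain B runs 12 hours ahead
sc.tick(60)
print(sc.domain_time("A"), sc.domain_time("B"))
```

In this model both domains advance by the same 60 seconds while keeping their 12 hour difference, because only the shared TOD value moves.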

Sun Fire designers also recognized the SC as a Single Point of Failure (SPOF) within the system.

If the SC were to fail completely, the entire system would lose its clock, and the SC would be unable to provide accurate time to each domain individually. Thus, a spare SC was included, along with software that allows the SCs to fail over between themselves and keeps the individual domain time offsets synchronized between the SCs.

One issue with this solution is that the TOD chips on the two SCs are separate devices and will contain different values. Furthermore, no matter how many times you synchronize the two SCs, their time values will always drift apart, a phenomenon referred to as 'skew'. Left uncorrected, this could cause time to change on the domain when the SCs fail over.

The times are calculated via a formula similar to 'time = sc_tod_time + domain_offset'. If the SCs were to fail over, and no adjustment were made for the difference between TOD times on the SCs, the value for 'sc_tod_time' would change in this formula, thus causing the new main SC to return the wrong time value to Solaris on the domains.

This was fixed partially by providing each SC with its own offset. This offset is used in conjunction with the above formula to negate the effect of different SC TOD time values. Thus, the formula becomes 'time = sc_tod_time + domain_offset + sc_offset'. When the SCs fail over, sc_tod_time and sc_offset both change accordingly, which should leave the end result the same.
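
A small worked example of the two formulas may help. The numbers below are made up, and the way the spare's sc_offset is chosen here (so that it cancels the TOD difference exactly) is an assumption for illustration, not a description of the firmware.

```python
def domain_time(sc_tod_time, domain_offset, sc_offset=0):
    # time = sc_tod_time + domain_offset + sc_offset
    return sc_tod_time + domain_offset + sc_offset


main_tod = 1_000_000       # TOD reading on the main SC (seconds)
spare_tod = 1_000_007      # spare SC's TOD reads 7 seconds ahead
domain_offset = 3600       # this domain runs one hour ahead of the SC

before = domain_time(main_tod, domain_offset)

# Failover without any per-SC correction: the domain jumps by the TOD difference.
uncorrected = domain_time(spare_tod, domain_offset)

# Failover with a per-SC offset chosen (assumption) to cancel the TOD difference.
main_sc_offset = 0
spare_sc_offset = (main_tod + main_sc_offset) - spare_tod
corrected = domain_time(spare_tod, domain_offset, spare_sc_offset)

print(uncorrected - before, corrected - before)  # prints: 7 0
```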

However, skew still remains something of an issue in this scheme.

The TOD chips are relatively inaccurate timekeepers and will slowly gain or lose time as they run. Each chip does this at a different pace, and it is very difficult to predict accurately the rate of error that each chip will display. To combat this issue, support for the Simple Network Time Protocol (SNTP) was introduced along with support for SC Failover, in firmware level 5.13.0. This allows both SCs to be synchronized to the same external clock source, so that the sc_offset values can be updated appropriately. Therefore, when SC failover occurs, the domains should not see a change in the time being provided by the main SC.
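
The effect of a shared external time source can be illustrated with the same formula. This sketch assumes, purely for illustration, that each SC derives its sc_offset as the reference time minus its own TOD reading; the helper name sc_offset_from_reference is hypothetical and the actual firmware logic is not described in this document.

```python
def sc_offset_from_reference(reference_time, sc_tod_time):
    # Assumption: the offset is simply the reference minus the local TOD reading.
    return reference_time - sc_tod_time


reference = 2_000_000      # time reported by the shared external source
main_tod = 1_999_996       # main SC's TOD is 4 seconds slow
spare_tod = 2_000_009      # spare SC's TOD is 9 seconds fast
domain_offset = -300       # this domain runs 5 minutes behind

main_view = main_tod + domain_offset + sc_offset_from_reference(reference, main_tod)
spare_view = spare_tod + domain_offset + sc_offset_from_reference(reference, spare_tod)

# Both SCs would hand the domain the same time, so a failover is not visible.
print(main_view == spare_view)
```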

Several bugs have also been corrected.

Bug ID 4493896, fixed in firmware 5.13.0, could cause domain times to change unexpectedly when the SC's time was updated. ScApp would initially synchronize its own internal time with the SC TOD chip at boot time and then would run freely without resynchronizing with the TOD chip. This would allow ScApp's time to drift in relation to the TOD chip. If the date were then updated in ScApp (via setdate), ScApp assumed that its time and the TOD time were the same and would simply update the TOD without performing a check beforehand. The domain times are calculated against the TOD time, not ScApp's time, so the resulting change to the TOD would also cause a shift in the time provided to the domains. This was corrected by causing ScApp to synchronize its clock with the TOD every five minutes.
Bug ID 4618950, corrected in Solaris 8 at KJP Patch ID 108528-16, and in Solaris 9 at Patch ID 112987-01, prevented NTP and 'date -a' from skewing the domain clock. There was a bug in the todsg driver that would cause the adjtime(2) and ntp_adjtime(2) system calls to fail to update system time appropriately. The time offset specified by those system calls would effectively be negated in the todsg driver, causing the time to never be updated. The user would briefly see the time creep towards the desired clock value, only to reset back to the previous value. As a workaround for this bug, many customers may have the following entries in the /etc/system file:
set tod_broken=1
set dosynctodr=0
Setting tod_broken=1 tells the kernel that the virtual TOD is 'broken' (i.e., it is not responding to updates properly), and therefore, the kernel should not try to update the TOD. This essentially disables the normal and most common kernel path into the todsg module. However, this does not guarantee that the kernel will never attempt to set the TOD, so the kernel code paths that exercise that bug must also be disabled; thus dosynctodr is set to 0. Once a customer has installed the appropriate fix for the todsg module, they will need to remove these lines from /etc/system to allow for proper operation of the clock mechanisms.
A feature was also implemented in firmware 5.15.0 (Patch ID 112884-01) that causes the main SC to check the time on the spare before allowing a manual SC failover to proceed. This prevents the user from accidentally causing a time shift on their domains if the SC TODs differ by too large a value.
Finally, Bug ID 4783568, fixed in 5.15.0 firmware (Patch ID 112884-01), describes a condition where the domain offsets may not be properly synchronized between the two SCs if the date is changed on the main SC and an SC failover is then forced manually shortly afterwards.
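
The drift problem described for Bug ID 4493896 above can be illustrated with a toy model. The mechanism shown here is an assumption made for the sake of the example: it supposes that a setdate on the SC compensates the domain offset using ScApp's own clock, so that any difference between ScApp's clock and the TOD leaks into the domain time. The function name and all numbers are hypothetical.

```python
def setdate(tod_time, scapp_time, domain_offset, new_time):
    """Return (new_tod, new_domain_offset) after a setdate on the SC.

    Assumption: the domain offset is compensated using ScApp's clock,
    not the TOD, so ScApp-vs-TOD drift ends up in the domain time.
    """
    delta = new_time - scapp_time
    return new_time, domain_offset - delta


tod, offset = 5_000, 100
domain_before = tod + offset

# Case 1: ScApp has drifted 3 seconds ahead of the TOD since boot.
new_tod, new_offset = setdate(tod, tod + 3, offset, new_time=9_000)
shift_with_drift = (new_tod + new_offset) - domain_before

# Case 2: ScApp was recently resynchronized with the TOD (no drift).
new_tod, new_offset = setdate(tod, tod, offset, new_time=9_000)
shift_after_resync = (new_tod + new_offset) - domain_before

print(shift_with_drift, shift_after_resync)  # prints: 3 0
```

Resynchronizing ScApp with the TOD every five minutes bounds the drift term, which is why the shift stays small after the fix.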

Additional Information:

Document 1000616.1 Sun Fire 3800/4800/4810/6800, V1280, and Netra 1280 Domains, Time of Day (TOD) May Drift or Jump
Document 1001236.1 On Rare Occasions, Sun Fire 3800/4800/4810/6800 and E4900 and E6900 Systems May Experience Data Loss During Clock Jumps
Document 1000062.1 Sun Fire Midrange Server Time Jumps When SC Accumulates Extended Uptime
Document 1009954.1 Sun Fire[TM] Midrange Server: Time Jumps When SC Accumulates Extended Uptime
It is important to note that, in this case, the issue is "extended uptime" and not a specific time (i.e., 575 or 828 days).
CR 6567546 and CR 6585200 are relevant.

Sun Fire Midrange Server Update Best Practices Update for Firmware 5.20.x (page 12 offers advice on using an SNTP server for the SCs).

Product
Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
Sun Fire 6800 Server
Sun Fire E4900 Server
Sun Fire E6900 Server

Some remaining bugs still affect the accuracy of clocks within these systems.

One is a bug in ScApp, Bug ID 4966931 (fixed in Patch ID 114523-02).

This bug documents how ScApp manages the Mostek TOD chip. The TOD chip time is set in discrete 1-second steps, and the algorithm used to do it depends upon how far the clock has drifted from the NTP server. It also notes that an SNTP (remember, S == simple) implementation is not going to provide the functionality of a full-blown NTP client implementation (which requires more CPU and memory overhead).

The suggested fix for this bug has been to start using the calibration bits within the Mostek TOD's control register. This will allow for a more sophisticated skew algorithm, and should eliminate the discrete 1-second steps currently used to update the hardware clock.

Due to Bug ID 4663142 (listed below), the first workaround mentioned in Bug ID 4966931 is not viable if SC Failover is desired. Disabling SNTP on the SCs will eventually allow the two SCs to drift far enough apart to prevent SC failover from taking place. Adding entries to /etc/system is the recommended workaround for this reason.
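
Whether a domain currently carries the tod_broken and dosynctodr entries in /etc/system can be checked with a small script. The following Python sketch is a hypothetical helper (the function name find_workaround_lines is not part of any Sun tool) and it works on text passed to it rather than reading the file itself.

```python
import re

# Variables used by the /etc/system workaround described in this document.
WORKAROUND_VARS = ("tod_broken", "dosynctodr")


def find_workaround_lines(text):
    """Return (line_number, line) pairs for active 'set' lines that
    assign one of the workaround variables."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("*"):
            continue  # skip blank lines and '*' comment lines
        match = re.match(r"set\s+(?:\w+:)?(\w+)\s*=", stripped)
        if match and match.group(1) in WORKAROUND_VARS:
            hits.append((number, stripped))
    return hits


sample = """\
* clock workaround
set tod_broken=1
set dosynctodr=0
set maxusers=2048
"""

for number, line in find_workaround_lines(sample):
    print(number, line)
```

On a live domain the same function could be given the contents of /etc/system, for example find_workaround_lines(open("/etc/system").read()).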

The second bug is in Solaris, Bug ID 4973321.

This bug asserts that, with the advent of cyclic timers used to run the clock in Solaris 8, it is no longer appropriate to sync the Solaris clock to the TOD following boot of the system. This is because the Solaris clock will tend to be far more accurate than the TOD, especially with NTP configured on the domain.

Bug ID 4514730 "dosynctodr code has structure similar to game of fizzbin".

This bug was created to clean up the code. It is in Accepted state, but no progress has been made.

Bug ID 4663142 "Changing date on Main SC and doing SCFailover, effects the Domain Date".

This CR is not mentioned in the customer-accessible section, but was integrated in 5.15.0.

Keywords:
clocks, clock, tod, TOD, SC TOD, domain time, time, ntp, sntp, sync, time jump, failover, drift, skew, tod_broken
