Asset ID: 1-71-1004720.1
Update Date: 2009-12-02
Keywords:
Solution Type: Technical Instruction Sure
Solution 1004720.1: Sun Fire[TM] 3800, 4800, 4810, 6800, E4900, E6900: How do the clocks work?
Related Items
- Sun Fire E6900 Server
- Sun Fire 3800 Server
- Sun Fire 6800 Server
- Sun Fire E4900 Server
- Sun Fire 4800 Server
- Sun Fire 4810 Server
Related Categories
- GCS>Sun Microsystems>Servers>Midrange Servers
Previously Published As
206553
Description
This document describes the design and operation of system clocks in Sun Fire[TM] 3800, 4800, 4810, 6800, E4900, and E6900 systems.
Steps to Follow
Sun Fire[TM] system clocks are a commonly misunderstood subject and a source of confusion for customers and service engineers.
In order for the Solaris[TM] Operating System to keep time, it relies on the system's hardware clock to provide a pulse that Solaris can reference. In discrete systems, this 'pulse', or 'heartbeat', is provided by an oscillator crystal of some sort (generally part of what is referred to as the time-of-day, or TOD, chip).
Sun Fire[TM] 3800, 4800, 4810, 6800, E4900, and E6900 systems were designed with two goals in mind that complicate this scheme:
- The desire to have multiple domains in one physical chassis.
- The desire to make the systems as fault resilient as possible.
It is legal for each domain within a chassis to have a different time of day. For instance:
- One domain could be running on local time.
- Another could be serving data for a place on the other side of the world and running in that local time.
- A third domain could be doing application testing at some date 20 or 30 years in the future.
This would require multiple hardware clocks to provide each domain its own discrete time source.
Sun Fire designers decided that the best way to handle and
manage this issue would be to have the System Controller
(SC) provide the time 'pulse', or heartbeat, to the domains using
the SC-to-Solaris mailbox infrastructure.
Solaris (on the domains) uses a special
driver called todsg to talk to the SC for time purposes. The SC
actually has its own Mostek M48T59 TOD chip, which it references
for time. It then maintains an offset value from that TOD for each
domain. Each of the different methods of changing time on a domain
('setdate' from the domain console, or the stime(2), adjtime(2),
and ntp_adjtime(2) system calls from Solaris) actually updates
the domain's offset on the SC.
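The offset bookkeeping described above can be sketched roughly as follows. This is only an illustrative model, not ScApp's actual code: the class name SystemController, its methods, the domain names, and the use of plain integer seconds are all assumptions made for the example.

```python
class SystemController:
    """Toy model (hypothetical) of an SC holding one TOD and per-domain offsets."""

    def __init__(self, tod_seconds):
        self.tod_seconds = tod_seconds   # value read from the SC's TOD chip
        self.domain_offsets = {}         # domain name -> offset in seconds

    def domain_time(self, domain):
        # A domain's time is the SC TOD plus that domain's stored offset.
        return self.tod_seconds + self.domain_offsets.get(domain, 0)

    def set_domain_time(self, domain, new_time):
        # Changing the time on a domain only rewrites that domain's offset;
        # the SC TOD itself is left untouched.
        self.domain_offsets[domain] = new_time - self.tod_seconds

    def tick(self, seconds=1):
        # Advance the SC TOD; every domain's time advances with it.
        self.tod_seconds += seconds


sc = SystemController(tod_seconds=1_000_000)
sc.set_domain_time("A", 1_000_000)                     # same time as the SC
sc.set_domain_time("B", 1_000_000 + 9 * 3600)          # nine hours ahead
sc.set_domain_time("C", 1_000_000 + 20 * 365 * 86400)  # roughly 20 years ahead
sc.tick(60)
```

After the tick, all three domains have advanced by the same 60 seconds while keeping their individual offsets, and the SC TOD was never modified by any of the domain time changes.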
Sun Fire designers also recognized the SC as a Single Point of
Failure (SPOF) within the system.
If the SC were to fail
completely, the entire system would lose its clock, and the SC
would be unable to provide accurate time to each domain
individually. Thus, a spare SC was included, along with software that
allows the SCs to fail over to each other and that synchronizes the individual domain time offsets
between the SCs.
One issue with this solution is that the TOD chips on the two SCs
are different devices and will contain different values. Furthermore, no matter how many times you synchronize
the two SCs, their time values will always drift apart again, a
phenomenon referred to as 'skew'. Left uncorrected, this
could cause time to change on the domain when the SCs fail over. The
times are calculated via a formula similar to 'time = sc_tod_time +
domain_offset'. If the SCs were to fail over, and no adjustment were
made for the difference between TOD times on the SCs, the value for
'sc_tod_time' would change in this formula, thus causing the new
main SC to return the wrong time value to Solaris on the domains.
This was partially fixed by providing each SC with its own offset.
This offset is used in conjunction with the above formula to negate
the effect of different SC TOD time values. Thus, the formula
becomes 'time = sc_tod_time + domain_offset + sc_offset'. When the
SCs fail over, sc_tod_time and sc_offset both change accordingly,
hopefully leaving the end result the same.
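A small worked example may make the arithmetic concrete. The numbers below are invented, and the way the spare's sc_offset is derived here (chosen so that both SCs agree on a common reference time) is an assumption for illustration; the source does not describe how the firmware actually computes it.

```python
def domain_time(sc_tod_time, domain_offset, sc_offset):
    # The formula quoted in the text.
    return sc_tod_time + domain_offset + sc_offset


# Hypothetical values, in seconds.
main_tod = 1_000_000
spare_tod = 1_000_007          # the spare's TOD chip happens to run 7 s ahead
domain_offset = 3600           # synchronized between the SCs

main_sc_offset = 0
# Assumption: the spare's offset cancels the difference between the TOD chips.
spare_sc_offset = (main_tod + main_sc_offset) - spare_tod

before_failover = domain_time(main_tod, domain_offset, main_sc_offset)
after_failover = domain_time(spare_tod, domain_offset, spare_sc_offset)

# Without sc_offset, the domain would see the raw difference between the TODs.
uncorrected = spare_tod + domain_offset
```

In this sketch the domain sees the same time before and after the failover, whereas the uncorrected value differs by the 7 seconds that separate the two TOD chips.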
However, skew still remains something of an issue in this
scheme.
The TOD chips are relatively inaccurate time keepers and
will slowly gain or lose time as they run. Each chip does this at a
different pace, and it is very difficult to predict accurately the
rate of error that each chip will display. In order to combat this
issue, support for the Simple Network Time Protocol (SNTP) was introduced
along with support for SC Failover, in firmware level 5.13.0. This
allows both SCs to be synchronized to the same external clock
source, so that the sc_offset values can be updated appropriately.
Therefore, when SC failover occurs, the domains should not
see a change in the time being provided by the main SC.
Several bugs have also been corrected.
Bug ID 4493896, fixed in
firmware 5.13.0, could cause domain times to change unexpectedly
when the SC's time was updated. ScApp would initially synchronize
its own internal time with the SC TOD chip at boot time and then
would run freely without resynchronizing with the TOD chip. This
would allow ScApp's time to drift in relation to the TOD chip. If
the date were then updated in ScApp (via setdate), ScApp assumed
that its time and the TOD time were the same and would simply
update the TOD without performing a check beforehand. The domain
times are calculated against the TOD time, not ScApp's time,
so the resulting change to the TOD would also cause a shift in the
time provided to the domains. This was corrected by causing ScApp
to synchronize its clock with the TOD every five minutes.
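To give a feel for why the five-minute resynchronization matters, the following sketch compares the worst-case gap between ScApp's clock and the TOD with and without periodic resync. The drift rate of 50 microseconds per second and the 30-day uptime are hypothetical figures chosen only for the example; the function name is also invented.

```python
RESYNC_INTERVAL_S = 300  # the five-minute interval mentioned in the fix


def max_divergence_us(uptime_s, drift_us_per_s, resync_interval_s=None):
    """Worst-case gap, in microseconds, between ScApp's clock and the TOD."""
    if resync_interval_s is None:
        # Free-running since boot: the error accumulates over the whole uptime.
        return uptime_s * drift_us_per_s
    # With periodic resync, the error can only build up between two syncs.
    return min(uptime_s, resync_interval_s) * drift_us_per_s


thirty_days_s = 30 * 86400
free_running = max_divergence_us(thirty_days_s, 50)
resynced = max_divergence_us(thirty_days_s, 50, RESYNC_INTERVAL_S)
```

Under these assumed figures, the free-running clock could be off by about 129.6 seconds after 30 days, while the resynchronized clock stays within 15 milliseconds of the TOD, so a setdate issued at any moment would shift the domains by at most that much.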
Bug ID 4618950, corrected in Solaris 8 at KJP Patch ID 108528-16 and in
Solaris 9 at Patch ID 112987-01, prevented NTP and 'date -a' from skewing
the domain clock. There was a bug in the todsg driver that would
cause the adjtime(2) and ntp_adjtime(2) system calls to fail to
update system time appropriately. The time offset specified by
those system calls would effectively be negated in the todsg
driver, causing the time to never be updated. The user would
briefly see the time creep towards the desired clock value, only to
reset back to the previous value. As a workaround for this bug, many customers may still have the following entries in the
/etc/system file:
set tod_broken=1
set dosynctodr=0
Setting tod_broken=1 tells the kernel that the virtual TOD is
'broken' (i.e., it is not responding to updates properly), and
therefore, the kernel should not try to update the TOD. This
essentially disables the normal and most common kernel path into
the todsg module. However, this does not guarantee that the kernel will never attempt
to set the TOD, so the kernel code paths that exercise that bug
must also be disabled; thus dosynctodr is set to 0. Once a customer
has installed the appropriate fix for the todsg module, they will
need to remove these lines from /etc/system to allow for proper
operation of the clock mechanisms.
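When auditing a domain after the todsg fix has been installed, a short script along the following lines could flag leftover workaround entries. This is a sketch only: the function name and sample text are invented, it works on the text of the file rather than editing /etc/system, and it relies on the convention that /etc/system comment lines begin with '*' or '#'.

```python
import re

WORKAROUND_VARS = ("tod_broken", "dosynctodr")


def find_workaround_lines(system_text):
    """Return (line number, line) pairs for active tod workaround entries."""
    hits = []
    for lineno, line in enumerate(system_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comment lines ('*' or '#' in /etc/system).
        if not stripped or stripped[0] in "*#":
            continue
        match = re.match(r"set\s+(\w+)\s*=", stripped)
        if match and match.group(1) in WORKAROUND_VARS:
            hits.append((lineno, stripped))
    return hits


# Hypothetical file contents used only to exercise the function.
sample = """* workaround for Bug ID 4618950
set tod_broken=1
set dosynctodr=0
set maxusers=512
* set tod_broken=1
"""
hits = find_workaround_lines(sample)
```

On a real domain the text would come from reading /etc/system; any lines reported would then be removed by hand and the domain rebooted for the change to take effect.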
Another feature was implemented in firmware 5.15.0 (Patch ID 112884-01)
to cause the main SC to check the time on the spare before allowing
a manual SC failover to proceed. This prevents the user from
accidentally causing a time shift on their domains if the SC TODs
differ by too large a value.
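The pre-failover check can be pictured as a simple threshold comparison. The sketch below is an assumption-laden illustration: the source does not state the actual limit or how the firmware compares the clocks, so the 60-second threshold and the function name are hypothetical.

```python
MAX_TOD_DIFFERENCE_S = 60  # hypothetical limit; the real value is not given


def manual_failover_allowed(main_time_s, spare_time_s,
                            limit_s=MAX_TOD_DIFFERENCE_S):
    """Allow a manual failover only if the two SC clocks are close enough."""
    return abs(main_time_s - spare_time_s) <= limit_s


close_enough = manual_failover_allowed(1_000_000, 1_000_007)
too_far_apart = manual_failover_allowed(1_000_000, 1_000_500)
```

Refusing the failover when the clocks differ too much trades availability of the manual operation for protection against a visible time jump on the domains.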
Finally, Bug ID 4783568, fixed in 5.15.0 firmware (Patch ID 112884-01), describes a condition where the domain offsets may not
be properly synchronized between the two SCs if the date is changed
on the main SC and an SC failover is then forced manually in short
succession.
Additional Information:
- Sun Alert 200817 On Sun Fire 3800/4800/4810/6800, V1280, and Netra 1280 Domains, Time of Day (TOD) May Drift or Jump
- Sun Alert 201659 On Rare Occasions, Sun Fire 3800/4800/4810/6800 and E4900 and E6900 Systems May Experience Data Loss During Clock Jumps
- Sun Alert 200078 Sun Fire Midrange Server Time Jumps When SC Accumulates Extended Uptime
- <Document: 1009954.1> Sun Fire[TM] Midrange Server: Time Jumps When SC Accumulates Extended Uptime
- It is important to note that the issue in this case is "extended uptime" and not a specific time (i.e., 575 or 828 days).
- CR 6567546 and CR 6585200 are relevant.
- Sun Fire Midrange Server Update Best Practices Update for Firmware 5.18.x (Page 8 offers suggested advice for use of an SNTP server for the SCs).
Product
Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
Sun Fire 6800 Server
Sun Fire E4900 Server
Sun Fire E6900 Server
Internal Comments
Bugs remain that are affecting the accuracy of clocks within these systems.
One is a bug in ScApp, Bug ID 4966931 (fixed in Patch ID 114523-02).
- This bug documents how ScApp manages the Mostek. TOD chip time is set using 1 second discrete steps, and the algorithm used to do so depends upon how far the clock has drifted from the NTP server.
- It also notes that an SNTP (remember, S == simple) implementation is not going to provide the functionality of a full-blown NTP client implementation (which requires more CPU and memory overhead).
- The suggested fix for this bug has been to start using the calibration bits within the Mostek TOD's control register. This will allow for a more sophisticated skew algorithm, and should eliminate the 1 second discrete steps we currently use to update the hardware clock.
Due to Bug ID 4663142 (listed below), the first
workaround mentioned in Bug ID 4966931 is not viable if SC Failover is
desired. Disabling SNTP on the SCs will eventually allow the two
SCs to drift far enough apart to prevent SC failover from taking
place. Adding entries to /etc/system is the recommended workaround
for this reason.
The second bug is in Solaris, Bug ID 4973321.
- This bug asserts that, with the advent of cyclic timers used to run the clock in Solaris 8, it is no longer appropriate to sync the Solaris clock to the TOD following boot of the system. This is because the Solaris clock will tend to be far more accurate than the TOD, especially with NTP configured on the domain.
Bug ID 4514730 "dosynctodr code has structure similar to game of fizzbin"
- Created to clean up the code and is in Accepted state (but no progress made).
Bug ID 4663142 "Changing date on Main SC and doing SCFailover, effects the Domain Date"
- CR that is not mentioned in the customer accessible section but was integrated in 5.15.0.
clocks, clock, tod, TOD, SC TOD, domain time, time, ntp, sntp, sync, time jump, failover, drift, skew, tod_broken
Previously Published As
70173