ALERT - Exadata X2-8 systems affected by Linux bug 14258279: scheduling clock overflows in 208 days

Asset ID:	1-77-1473825.1
Update Date:	2012-07-11
Keywords:

Solution Type Sun Alert Sure

Solution 1473825.1 : ALERT - Exadata X2-8 systems affected by Linux bug 14258279: scheduling clock overflows in 208 days

Applies to:

Exadata Database Machine X2-8
Oracle Exadata Storage Server Software - Version 11.2.2.1.1 to 11.2.2.4.2 [Release 11.2]
Linux x86-64

Description

The 8-socket database servers of Exadata X2-8 systems running Exadata software releases older than 11.2.3.1.0 may experience issues if they have been continuously up for more than 208 days due to Linux Bug 14258279.

These issues can be avoided by upgrading to Exadata release 11.2.3.1.0 or later, or by rebooting the servers before they reach 208 days of continuous uptime.

Occurrence

Exadata X2-8 database servers running Unbreakable Enterprise Kernel for Oracle Linux 2.6.32-100.23.1 that have been continuously up for more than 208 days are susceptible to this problem. Unbreakable Enterprise Kernel for Oracle Linux 2.6.32-100.23.1 is the Linux kernel provided with Exadata releases 11.2.2.2.0 through 11.2.2.4.2, inclusive. Uptime may be determined by the uptime(1) command.

# uptime

13:48:23 up 68 days, 3:59, 3 users, load average: 2.05, 2.05, 2.05
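As a minimal sketch of how the remaining margin could be checked automatically, the functions below convert an uptime in seconds into whole days and compare it with the 208-day limit named in this alert. The function names and the 30-day warning margin are illustrative assumptions, not part of any Exadata tooling.

```shell
#!/bin/sh
# Threshold from this alert: issues may occur at or after 208 days of uptime.
LIMIT_DAYS=208

# Convert an uptime in seconds (fraction allowed, as in /proc/uptime) to whole days.
uptime_days() {
    secs=${1%%.*}                 # drop any fractional part
    echo $(( secs / 86400 ))
}

# Days left before the 208-day mark (zero or negative once it has been reached).
days_remaining() {
    echo $(( LIMIT_DAYS - $(uptime_days "$1") ))
}

# Classify an uptime. The second argument is a hypothetical warning margin in days.
check_uptime() {
    warn=${2:-30}
    left=$(days_remaining "$1")
    if [ "$left" -le 0 ]; then
        echo "CRITICAL: up $(uptime_days "$1") days, at or past $LIMIT_DAYS days"
    elif [ "$left" -le "$warn" ]; then
        echo "WARNING: $left days left before $LIMIT_DAYS days of uptime"
    else
        echo "OK: $left days left before $LIMIT_DAYS days of uptime"
    fi
}
```

On a running server the first field of /proc/uptime supplies the seconds value, for example: check_uptime "$(cut -d' ' -f1 /proc/uptime)"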

Exadata X2-8 database servers running Unbreakable Enterprise Kernel for Oracle Linux 2.6.32-300.7.2 (provided with Exadata 11.2.3.1.0) or later are not affected.

Exadata X2-2, V2, and V1 database servers are not affected.

Exadata Storage Servers are not affected.
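The kernel release reported by uname -r can be compared against the affected version named above. The following is a rough sketch only: the function name is hypothetical, it assumes uname -r may append a distribution suffix after the version, and it cannot tell whether the host is actually an X2-8 database server.

```shell
#!/bin/sh
# Kernel version identified as affected in this alert.
AFFECTED_KERNEL="2.6.32-100.23.1"

# Classify a kernel release string such as the output of `uname -r`.
# Assumption: a suffix may follow the version after a dot.
kernel_status() {
    case "$1" in
        "$AFFECTED_KERNEL"|"$AFFECTED_KERNEL".*)
            echo "affected" ;;
        *)
            echo "not listed as affected" ;;
    esac
}
```

Typical use would be kernel_status "$(uname -r)". A result of "not listed as affected" means only that the release does not match the version named in this alert.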

Symptoms

One or more of the following issues may occur on or after 208 days of continuous uptime:

Kernel panic
Task scheduler unfairness
CPU soft lockup error, especially one showing an invalid timestamp and/or time calculation. The following is an example of such a message reported to the system console:

[18446743993.431771] BUG: soft lockup - CPU#1 stuck for 17163091968s! [???:XXXXX]
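To look for messages of this kind in kernel log output, a filter along the following lines could be used. It is a sketch under stated assumptions: the function name is hypothetical, the message format is taken from the example above, and the one-day plausibility threshold is an arbitrary illustrative choice.

```shell
#!/bin/sh
# Read kernel log text on stdin and print soft lockup lines whose
# "stuck for <N>s" duration exceeds a plausibility threshold in seconds.
# The default of 86400 (one day) is an arbitrary illustrative value.
find_bogus_soft_lockups() {
    max=${1:-86400}
    grep 'BUG: soft lockup' | awk -v max="$max" '
        match($0, /stuck for [0-9]+s/) {
            # "stuck for " is 10 characters; drop it and the trailing "s"
            secs = substr($0, RSTART + 10, RLENGTH - 11)
            if (secs + 0 > max) print
        }'
}
```

For example, dmesg | find_bogus_soft_lockups would print the sample message above (17163091968 seconds is over 500 years) and skip a lockup reported as lasting 61 seconds.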

Workaround

There are two possible courses of action:

Upgrade to Exadata 11.2.3.1.0 or later (Recommended).
Reboot database servers before uptime reaches 208 days.

1. Upgrade to Exadata 11.2.3.1.0 or later (Recommended)

The recommended action is to upgrade to Exadata release 11.2.3.1.0 or later. Exadata 11.2.3.1.0 and later contain a UEK version with a fix to this bug.

Refer to <Document 888828.1> for the most recent Exadata release.

2. Reboot database servers before uptime reaches 208 days

If you cannot upgrade to Exadata 11.2.3.1.0 or later, then to avoid issues associated with <Bug 14258279> on your current Exadata release, restart database servers before uptime reaches 208 days.

Follow the procedure below to reboot database servers in a rolling fashion.

Reboot Database Servers in Rolling Fashion

Rebooting database servers in a rolling fashion allows application services to potentially remain available on, and be automatically failed over to, one or more other database servers while one database server is restarted. For more information on making an application highly available using Oracle Clusterware and Transparent Application Failover, refer to the Oracle documentation for those features.

Repeat the following procedure serially on each database server.

1. Ensure Cluster Ready Services (CRS) autostart is enabled so that CRS will automatically start when the database server is rebooted.

# GRID_HOME/bin/crsctl config crs
CRS-4622: Oracle High Availability Services autostart is enabled.

If autostart is not enabled and the desired behavior is to automatically start when the system is started, enable autostart with the following command:

# GRID_HOME/bin/crsctl enable crs
CRS-4622: Oracle High Availability Services autostart is enabled.

2. Shut down CRS by running the following command:

# GRID_HOME/bin/crsctl stop crs

If any resources are still running after issuing the above command, then reissue the command using the force “-f” option:

# GRID_HOME/bin/crsctl stop crs -f

3. Once all resources have stopped, reboot the database server using the shutdown(8) command:

# shutdown -r now

4. After the database server has restarted, check status of resources.

# GRID_HOME/bin/crsctl status resource -t
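The four steps above can be gathered into a small script, run as root on one database server at a time. What follows is a cautious sketch, not an Oracle-supplied procedure: the function names, the DRY_RUN switch, and the default GRID_HOME path are assumptions made for illustration. By default it only prints the commands. Treating a non-zero exit status from "crsctl stop crs" as a reason to stop and investigate is also an assumption.

```shell
#!/bin/sh
# GRID_HOME: Grid Infrastructure home. The default path is a placeholder assumption.
GRID_HOME=${GRID_HOME:-/u01/app/11.2.0/grid}
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 1 to 3: verify autostart, stop CRS, reboot this server.
reboot_this_server() {
    if [ "$DRY_RUN" = "1" ]; then
        run "$GRID_HOME/bin/crsctl" config crs
    elif ! "$GRID_HOME/bin/crsctl" config crs | grep -q 'autostart is enabled'; then
        echo "CRS autostart is not enabled; see step 1 before rebooting" >&2
        return 1
    fi
    if ! run "$GRID_HOME/bin/crsctl" stop crs; then
        # Left as a manual decision: reissue with -f as described in step 2.
        echo "crsctl stop crs failed; check resources and consider -f" >&2
        return 1
    fi
    run shutdown -r now
}

# Step 4: run after the server has restarted.
check_after_reboot() {
    run "$GRID_HOME/bin/crsctl" status resource -t
}
```

With the defaults, calling reboot_this_server prints the three commands it would run. Setting DRY_RUN=0 executes them, so that setting should be used only once the printed sequence has been reviewed for the server in question.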

Patches

This Linux bug is fixed in Unbreakable Enterprise Kernel versions supplied with Exadata 11.2.3.1.0 and later.

Refer to <Document 888828.1> for information on how to obtain Exadata releases.

History

10-Jul-2012 review ready

References

<BUG:14258279> - [EXADATA] SOFT LOCKUP - CPU#0 STUCK FOR 17163091968S!
<BUG:13604567> - SCHED CLOCK OVERFLOWS IN 208 DAYS (I386 AND AMD64)
<NOTE:888828.1> - Database Machine and Exadata Storage Server 11g Release 2 (11.2) Supported Versions

Attachments

This solution has no attachment