Document fins/I0766-1


FIN #: I0766-1

SYNOPSIS: VxFS 3.3.3 memory allocation error causes DR to be unable to detach
          or system to appear hung on E10K under Solaris 2.6, 7, and 8

DATE: Feb/01/02

KEYWORDS: VxFS 3.3.3 memory allocation error causes DR to be unable to detach
          or system to appear hung on E10K under Solaris 2.6, 7, and 8


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)


SYNOPSIS: VxFS 3.3.3 memory allocation error causes DR to be unable to 
          detach or system to appear hung on E10K under Solaris 2.6, 7, 
          and 8.


Sun Alert:          No

TOP FIN/FCO REPORT: Yes 
 
PRODUCT_REFERENCE:  E10000 DR with VxFS 3.3.3  
 
PRODUCT CATEGORY:   Server / SW Admin


PRODUCTS AFFECTED:  
 
Systems Affected:
----------------
Mkt_ID   Platform   Model   Description               Serial Number
------   --------   -----   -----------               -------------
  -       E10000     ALL    Ultra Enterprise 10000          -
 

X-Options affected:
-------------------
Mkt_ID          Platform   Model      Description                Serial Number
------          --------   -----      -----------                -------------
VXFS-333-9999      -        ALL       VERITAS File System 3.3.3        -


PART NUMBERS AFFECTED: 

Part Number     Description                              Model  
-----------     -----------                              -----
     -               -                                     -


REFERENCES:

BugId:   4423662 - VxFS file system buffer pages are allocated as kernel 
                   pages.

PatchId: 110435 - VERITAS File System 3.4: VxFS 3.4 multiple fixes patch 
                     
         110434 - VERITAS File System 3.4: VxFS 3.4 multiple fixes patch
                     
         110433 - VERITAS File System 3.4: VxFS 3.4 multiple fixes patch 
                     
 
ESC:     532725 - unable to detach SB. 
         531528 - DR pbm detaching SB.
      
    
PROBLEM DESCRIPTION:

This FIN alerts the field to a potential Dynamic Reconfiguration (DR)
failure or system hang when DR is enabled and VERITAS VxFS 3.3.3 is
configured on an Enterprise 10000 under Solaris 2.6, 7, or 8.

Because of a VxFS bug, the kernel cage may grow unnecessarily onto all
boards in a system.  Once the kernel cage has grown onto all system
boards, no system board can be detached from an E10K running VxFS 3.3.3
under Solaris 2.6, 7, or 8.  In some cases, the system may appear
'hung' due to excessive swap activity (thrashing).

The kernel cage is a software 'cage' which is built around the Unix
Kernel's memory space to protect it.
  
The version of VxFS can be seen using the command:

   # pkginfo -l  VRTSvxfs 

   This will display an extended listing of the Veritas File System package.
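
As a rough sketch (not part of the original notice), the version field
can be pulled out of that listing with a small shell function.  The
function name vxfs_version is hypothetical, and the sketch assumes the
listing contains a VERSION: line, as pkginfo -l output normally does.
The exact version string VERITAS reports is not shown in this FIN.

```shell
# Hypothetical helper: read `pkginfo -l` output on stdin and print the
# value of its VERSION: field (assumed to be present in the listing).
vxfs_version() {
    awk '$1 == "VERSION:" { print $2; exit }'
}

# Typical use on a live system (not run here):
#   pkginfo -l VRTSvxfs | vxfs_version
```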


The problem can be seen when using DR to attach a board and subsequently
not being able to detach it:

   Example:

     . original domain configuration: sb8, sb9
     . drattach sb1, sb6, sb15
     . attempt to drdetach sb6 fails with:

       WARNING: dr_choose_target_adg: Could not find target memory board
       WARNING: dr_build_adg_detach_list: No target memory found
       Undetachable board: Memory configuration prevents board detach.
       Target memory currently unavailable, please retry.

If the original configuration were sb1, sb6, sb8, sb9, sb15, the same
error could also result when attempting to detach any one of the
boards.  The cage growth does not depend on the attachment of a board.
Any board in a system, including those present at start of day, can be
affected by the VxFS bug.
    
The problem occurs in the VxFS code during an asynchronous read-ahead,
when no valid fault address is available and one is generated instead.
The generated address is passed to the function used by DR to determine
whether a page belongs to the kernel pool or the user pool.  The address
indicates that the page is part of the kernel pool, so it is allocated
as a kernel page when it should have been allocated as a user page.

The code has been modified to clear the top nibble, i.e., AND the vaddr
with 0x0fffffff, which allows the DR-enabled kernel to differentiate
between user and kernel allocations.
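
Clearing the top nibble of an address amounts to a bitwise AND with
0x0fffffff.  The following is only an illustration of that bit
operation using shell arithmetic, not the VxFS source: the function
name clear_top_nibble and the sample address are hypothetical, and a
32-bit address is assumed.

```shell
# Hypothetical illustration: clear the top nibble of a 32-bit address by
# ANDing it with 0x0fffffff.  This is not the actual VxFS code.
clear_top_nibble() {
    printf '0x%08x\n' $(( $1 & 0x0fffffff ))
}

clear_top_nibble 0xf0001000    # prints 0x00001000
```

An address whose top nibble is already zero passes through unchanged,
so the mask only affects addresses in the upper part of the range.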
  

IMPLEMENTATION: 

         ---
        |   |   MANDATORY (Fully Proactive)
         ---    
         
  
         ---
        | X |   CONTROLLED PROACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        |   |   REACTIVE (As Required)
         ---         


CORRECTIVE ACTION:

The following recommendation is provided as a guideline for authorized
Enterprise Services Field Representatives who may encounter the above
mentioned problem.

Please adhere to the following guidelines in order to prevent DR detach
failures or apparent system hangs on an E10000 system running VxFS 3.3.3
under Solaris 2.6, 7, or 8.

  . Upgrade to VxFS version 3.4 and apply patchId# 110433 
    for Solaris 2.6.

  . Upgrade to VxFS version 3.4 and apply patchId# 110434 
    for Solaris 7.

  . Upgrade to VxFS version 3.4 and apply patchId# 110435 
    for Solaris 8.
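
The release-to-patch mapping above can be sketched as a small shell
function.  The function name vxfs34_patch_for is hypothetical, and the
sketch assumes the release is given in the form printed by uname -r
(5.6, 5.7, and 5.8 for Solaris 2.6, 7, and 8 respectively).

```shell
# Hypothetical helper: map a SunOS release string (as printed by
# `uname -r`) to the VxFS 3.4 patch ID listed in this FIN.
vxfs34_patch_for() {
    case "$1" in
        5.6) echo 110433 ;;   # Solaris 2.6
        5.7) echo 110434 ;;   # Solaris 7
        5.8) echo 110435 ;;   # Solaris 8
        *)   echo "no patch listed in this FIN for release $1" >&2
             return 1 ;;
    esac
}

# Typical use on a live system (not run here); showrev -p lists the
# patches currently installed:
#   showrev -p | grep "Patch: $(vxfs34_patch_for "$(uname -r)")"
```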


COMMENTS:  

  None.

============================================================================

Implementation Footnote:
 
i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
  
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
 
* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
  
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
 
* From there, select the appropriate link to browse the FIN or FCO index.
 
Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
  accessed internally at the following URL: http://edist.corp/.
   
* From there, follow the hyperlink path of "Enterprise Services
  Documentation" and click on "FIN & FCO attachments", then choose the
  appropriate folder, FIN or FCO.  This will display supporting
  directories/files for FINs or FCOs.
    
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to [email protected]
--------------------------------------------------------------------------


Copyright (c) 1997-2003 Sun Microsystems, Inc.