Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1353352.1
Update Date:2011-10-11
Keywords:

Solution Type  Problem Resolution Sure

Solution  1353352.1 :   One alternative explanation for Exadata Connect: ossnet: connection failed to server 192.168.10.xx, Result=5 (Login: sosstcpreadtry Failed)  


Related Items
  • Exadata Database Machine X2-8
Related Categories
  • PLA-Support>Database Technology>Engineered Systems>Oracle Exadata>DB: Exadata_EST


The Exadata message "Connect: ossnet: connection failed to server" can have several sources and should not be treated as a single-problem / single-solution issue.

In this Document
  Symptoms
  Changes
  Cause
  Solution
  References


Created from <SR 3-3648910931>

Applies to:

Exadata Database Machine X2-8 - Version: Not Applicable and later   [Release: N/A and later ]
Information in this document applies to any platform.

Symptoms

OSSNET messages may occur across multiple databases in an Exadata machine.
- This is associated with user-reported problems including:
  • Poor performance - generic
  • Multiple Connect: ossnet: connection failed to server messages seen at the same time
  • Slow performance for logins
  • Slow performance for query execution
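To check whether the messages really do cluster across databases, the alert logs can be scanned and the failures tallied per storage server. The following is a minimal sketch, not an Oracle tool: the regular expression is modeled only on the message text quoted in this note's title, the function name is hypothetical, and real alert-log lines may differ in spacing or prefix.

```python
import re
from collections import Counter

# Pattern modeled on the message quoted in this note's title (an assumption;
# adjust it if the alert log on your system formats the line differently).
OSSNET_RE = re.compile(
    r"ossnet: connection failed to server\s+(?P<server>[^\s,]+),"
    r"\s*result=(?P<result>\d+)",
    re.IGNORECASE,
)


def count_ossnet_failures(lines):
    """Return a Counter mapping (server, result code) to matching line count."""
    counts = Counter()
    for line in lines:
        match = OSSNET_RE.search(line)
        if match:
            key = (match.group("server"), int(match.group("result")))
            counts[key] += 1
    return counts


# Sample input built from the message text in this note, not from a real log.
sample = [
    "connect: ossnet: connection failed to server 192.168.10.xx, "
    "Result=5 (Login: sosstcpreadtry Failed)",
    "connect: ossnet: connection failed to server 192.168.10.xx, "
    "Result=5 (Login: sosstcpreadtry Failed)",
    "an unrelated alert log line",
]

for (server, result), n in count_ossnet_failures(sample).items():
    print(f"{server} result={result}: {n} message(s)")
```

Running the same function over the alert log of every instance on the box, and comparing the tallies, gives a rough picture of whether the failures are isolated to one database or spread across the machine.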


Changes

None. This Exadata machine was a 1/4 rack hosting more than 10 databases, most of them RAC (resulting in several more instances on the same box).

Another 1/4 rack belonging to the same user, which also hosted several databases, exhibited similar problems.

Cause

This was a resource issue caused by concurrent operations, including RMAN and other parallel operations.
We determined that RMAN was running with multiple channels at the same time that queries were using high degrees of parallelism.

We were able to avoid the problem by monitoring total resource consumption, changing the RMAN schedule, and replacing open-ended parallelism with an explicit specification of no more than 128 DOP for all concurrent operations on the box.




Solution

Reduce total parallelism across all databases and instances on the box so that concurrent parallel and/or RMAN operations use the available resources more intelligently.
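The idea of a box-wide parallelism budget can be illustrated with a short sketch. It assumes the 128 DOP ceiling that worked in this case, treats each RMAN channel as one unit of parallelism (a simplification made here, not an Oracle rule), and uses hypothetical instance names and figures that would have to be collected by hand.

```python
from dataclasses import dataclass

# Box-wide ceiling that avoided the problem in this case; other systems
# will need a different value.
DOP_BUDGET = 128


@dataclass
class Operation:
    instance: str     # hypothetical instance name
    kind: str         # for example "RMAN" or "PQ"
    parallelism: int  # RMAN channels or requested DOP (simplifying assumption)


def total_parallelism(operations):
    """Sum the parallelism of all operations running at the same time."""
    return sum(op.parallelism for op in operations)


def over_budget(operations, budget=DOP_BUDGET):
    """Return True when concurrent parallelism exceeds the box-wide budget."""
    return total_parallelism(operations) > budget


# Illustrative snapshot of concurrent work across the box (made-up figures).
snapshot = [
    Operation("db1_inst1", "RMAN", 8),
    Operation("db2_inst1", "PQ", 64),
    Operation("db3_inst2", "PQ", 96),
]

print("total parallelism:", total_parallelism(snapshot))
print("over budget:", over_budget(snapshot))
```

The point of the sketch is that the sum is taken across every instance on the machine, not per database, which is where the shortage in this case was hidden.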


DISCUSSION
-----------------

It is important to realize that this problem has many potential sources, which show up as performance problems but are often caused by a lack of resources:
  • There is more than one potential source for the message "connect: ossnet: connection failed to server"
  • While there are known bugs that contribute to the frequency of the error, such as unpublished bug 9338087, which is included in Exadata patch 11.2.1.3.1, you cannot assume that a bug is the only source of this message
  • Reviewing the OSWatcher and AWR reports and determining the total number of instances on the box can help put the error in its proper context: bug vs. resource issue



@  INTERNAL
++++++++++

@ To get an idea of the range of problems you should review some of the following bugs all filed under the same error message
@
Here is a listing of some bugs based on CONNECT: OSSNET: CONNECTION FAILED TO SERVER messages
--------------------------------------------------------------------------------
12353759 CONNECT: OSSNET: CONNECTION FAILED TO SERVER
12788539 GCW:CONNECT:OSSNET:CONNECTION FAILED TO SERVER,RESULT=5
9338087 ASM AND DATABASE HANG - CONNECT: OSSNET: CONNECTION FAILED TO SERVER, RESULT=5
12588085 CONNECT: OSSNET: CONNECTION FAILED TO SERVER
11927605 INTERMITTENT CONNECT: OSSNET: CONNECTION FAILED TO SERVER MSGS
10268026 Abstract: INTERMITTENT CONNECT: OSSNET: CONNECTION FAILED TO SERVER 
11816370 CONNECT: OSSNET: CONNECTION FAILED TO SERVER ..." IN ALERT LOG
11889624 OSPID: 14283: CONNECT: OSSNET: CONNECTION FAILED TO SERVER
10268026 Abstract: OSPID: 14283: CONNECT: OSSNET: CONNECTION FAILED TO SERVER
12400438 OSPID: 24785: CONNECT: OSSNET: CONNECTION FAILED TO SERVER, RESULT=
10268026 Abstract: OSPID: 24785: CONNECT: OSSNET: CONNECTION FAILED TO SERVER,
10283872 CONNECT: OSSNET: CONNECTION FAILED TO SERVER RESULT=5 
9849734 DBMV2: CONNECT: OSSNET: CONNECTION FAILED TO SERVER
9559738 ASM AND DATABASE HANG - CONNECT: OSSNET: CONNECTION FAILED TO
12801122 OSSNET: CONNECTION FAILED AND SQLPLUS USER LOGINS HANG
12558999 OSSNET CONNECTION FAILURES IN PAE SOLARIS TEST
12860485 EXADATA: ORA-63999 ORA-1207: UNDO IN FUTURE VS CONTROL FILE
12568495 PERFORMANCE SLOW, AWR SHOWS WAITS ON THE CELL SERVERS.
9960290 DBMV2: ORA 600 [15709] IN ARCHIVE HIGH COMPRESSION TEST ON EXDATA 12801: error signaled in parallel query server Pxxx
12873445 CELLSRV RESTARTED DUE TO ORA-600[SENDPORT:SENDPORT_1],[2]
12764296 ORA-600 [KSZ_CLN_PROC1], [0X7D180D7E8], [3],connect: ossnet: connection failed to server result=5 (login: sosstcpreadtry failed) (difftime
12745604 BIGBH:OSSNET:CONNECTION FAILED TO SERVER,RESULT=5 (LOGIN: SOSSTCPREADTRY FAILED)
12672207 THE CELL NODE IO PEAKS AND ENTIRE CLUSTER AND DATABASE HANGS
12548380 EXADATA CONNECTION ERRORS: OSSNET: CONNECTION FAILED TO SERVER 
12341091 DBMV2-BIGBH: HUGE PERFORMANCE IMPACT AFTER CELL RESTART
10411315 FAILURE TO CONNECT TO ONE CELL CAUSED COMPLETE HANG ON CLUSTER
12347384 OSPID: 5511: CONNECT: OSSNET: (LOGIN: SOSSTCPREADTRY FAILED)
12343727 NODE EVICTION DUE TO ORA-00020
10232181 EXADATA V2: MULTIPLE ISSUES AFTER RESTARTING CLUSTER
10640254 RANDOM RDS STALL OVER A CHANNEL WITH G5 DURING OLTP RUN
10130125 ORA-29770 AND INSTANCE EVICTIONS AFTER LOSING ACCESS TO A CELL
10264229 CELLSRV CRASHES WITH ORA-07445 SAGESQL_GET_BFILT_KEYIDS IN ALL CELLS
9881098 LRG 4718833: UNABLE TO REACH CELL
10088326 DB NODES CAN'T COMMUNICATE TO A CELL, DB HANG
10170758 DBMV2: FDOM FAILURE CAUSED OSSNET CONNECTION FAILURE
10038379 DBMG5: SPORADIC OSSNET ERROR MESSAGES : CONNECTION FAILED TO
10042306 ORA-29770 DUE TO LMON CF READ HANG. It seems lmon tries to connect to storage server but failed.
9976135 STBH: POSSIBLE I/O ISSUE CAUSED A SPIKE IN ACTIVE SESSIONS IN THE DATABASE
8705000 CELL FREEZED LEAVING NO DEBUGGING INFO, CAUSED COMPUTE NODES HANG
7342123 BETA2B: DATABASE INSTANCES FAIL TO START FOLLOWING CELL HARDWARE FAILURE
9068926 OSSNET CONNECTION TO CELL FAILS
12761392 FILE DESCRIPTOR LEAK IN CELLSRV DUE TO OSSNET LOGIN FAILURE NOT CLEANING UP DUE TO OSSNET LOGIN FAILURE ... indicating CONNECT timeouts ...
11867741 ALL NODES OF EXADATA SYSTEM WERE HUNG
11856270 DBMV2: CELL DISCONNECTED AND CELLSRV RESTARTED FREQUENTLY error ORA-56841:
12603593 BIGBH:DBMV2:FD LEAK IN CELLSRV
11697804 STBH: EXADATA CELL SHOWED PROCESS BLOCKED OTHER RESOURCES FOLLOWED WITH IO ERROR
12798902 EXADATA DATABASE LOSES I/O CONNECTIONS TO THE STORAGE_CELL_UNDERLOAD
11725469 OSSNET ERRORS WHEN ACCESSING CELLS FROM SOLARIS
12714430 CELLSRV FAILED WITH ERROR:
12768752 LOOP TESTS: ORA-27302: FAILURE OCCURRED AT: SKGXP_PATH
12665484 CELL SERVICE UNRESPONSIVE WITH A LARGE AMOUNT OF CPU USAGE
12592457 FENCEMASTER: OSS_IOCTL_FENCE_ENTITY
12690478 MS PROCESS CONSUMES HIGH CPU
 
SUGGESTIONS
----------------------
Gather AWR reports and OSWatcher files
- Assess overall performance
If the statistics appear to show low load for the instance or database and you still see slow performance, including the OSSNET CONNECTION errors, gather more information on:
  • Total number of databases on the Database machine
  • Total number of instances
  • Total default parallelism PER instance
  • Concurrent operations across all database instances for a point in time
  • Operations that may require parallelism including RMAN, partitions and PQ
  • Large pool size relative to parallelism usage
  • If AMM is being used, check for dynamic memory allocation or resize operations in the AWR
  • Review AWRs for performance issues in the absence of load, which indicates that the problem is total resources vs. instance resources
  • Check OSWatcher files for resource limits being hit
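The first three items in the list above amount to a small inventory of the machine. A minimal sketch of how that inventory might be totaled follows; the record layout, names, and figures are hypothetical and stand in for values gathered manually from each instance.

```python
from collections import namedtuple

# Hypothetical record for one instance; the values are entered by hand.
InstanceInfo = namedtuple(
    "InstanceInfo", ["database", "instance", "default_parallelism"]
)


def summarize(instances):
    """Total the databases, instances, and default parallelism on the box."""
    return {
        "databases": len({i.database for i in instances}),
        "instances": len(instances),
        "total_default_parallelism": sum(
            i.default_parallelism for i in instances
        ),
    }


# Made-up inventory: one RAC database with two instances plus a single
# instance database.
inventory = [
    InstanceInfo("sales", "sales1", 32),
    InstanceInfo("sales", "sales2", 32),
    InstanceInfo("hr", "hr1", 64),
]

for name, value in summarize(inventory).items():
    print(f"{name}: {value}")
```

A per-instance view would show each of these instances as modest; only the box-wide total makes it obvious how much parallelism the machine could be asked to deliver at once.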

To recap: do not assume that there is a single source for OSSNET CONNECTION messages.
- Take the time to review performance in the context of the Database Machine as a whole rather than as individual instance or database problems. You will find that many, if not most, issues are resource allocation management issues rather than bugs or defects.


References

<BUG:9338087> - ASM AND DATABASE HANG - CONNECT: OSSNET: CONNECTION FAILED TO SERVER, RESULT=5
<NOTE:1094303.1> - Database hanging with "connect: ossnet: connection failed to server ..." in alert log

Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.