Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

1011106.1: Fabric devices and QuickLoop devices exported to Solaris[TM] via the same Fiber Channel connection
Previously Published As: 215279
Applies to:
Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server
All Platforms

Symptoms
When Fabric devices and QuickLoop devices are exported to Solaris over the same Fiber Channel connection, the connection is reported offline and then online again under heavy load, and I/O performance is poor.

Example: A high-end server at the customer site logged the following error:

svr03 qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(1): Loop OFFLINE

Cause
Similar issues occurred several times on the same server (svr03). The customer's DBA reported that the backup UFS file system, which is built on EMC[TM] CLARiiON Cx700 LUNs, showed very poor I/O performance (less than 50 KB/s for reads or writes). Scheduled backup jobs failed.

A careful review of the I/O subsystem configuration showed that the affected EMC[TM] CLARiiON Cx700 LUNs (reported by the OS as "Vendor:DGC") and the SAN-attached tape drives (reported by the OS as "Vendor:IBM") were all presented to Solaris through the same Fiber Channel port, /pci@fd,600000/SUNW,qlc@1,1/fp@0,0. This is the lower Fiber Channel port of the server's first HBA (part number X6768 or 375-3108, a 2Gb dual-port HBA).

The following diagram shows the original backup SAN architecture:

+--------+ +--------+ +--------+ +--------+
| 1st HBA| | 2nd HBA| | 3rd HBA| | 4th HBA|
|        | |        | |        | |        |
| FC(U)  | | FC(U)  | | FC(U)  | | FC(U)  |
|        | |        | |        | |        |
|        | |        | |        | |        |
| FC(L)  | | FC(L)  | | FC(L)  | | FC(L)  |
|  |     | |        | |        | |        |
+--|-----+ +--------+ +--------+ +--------+
   |
   |
   +--> To SAN switch ports for both the Cx700 array and the tape drives.
        (The port was configured in both zone svr03-bk-za and zone svr03-bk-zb, overlapped)

Note: The upper FC ports [FC(U)] of all four HBAs are used for a separate high-end storage connection.
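The overlap called out in the diagram (one HBA port zoned with both fabric and loop devices) is the kind of condition that can be checked mechanically. The following is a minimal Python sketch, assuming the zone membership listed in this article has been transcribed by hand into dictionaries. The data layout, the member labels, and the function name `mixed_initiators` are hypothetical and do not correspond to any switch CLI or API.

```python
# Hypothetical, hand-transcribed zone membership:
# zone name -> list of (member label, switch port type).
ORIGINAL_ZONES = {
    "svr03-bk-za": [
        ("1st HBA lower port", "F-Port"),
        ("Cx700 Controller SPA", "F-Port"),
        ("Cx700 Controller SPB", "F-Port"),
    ],
    "svr03-bk-zb": [
        ("1st HBA lower port", "F-Port"),
        ("Tape Drive(st30)", "L-Port"),
        ("Tape Drive(st31)", "L-Port"),
    ],
}

PLANNED_ZONES = {
    "svr03-bk-za": [
        ("1st HBA lower port", "F-Port"),
        ("2nd HBA lower port", "F-Port"),
        ("Cx700 Controller SPA", "F-Port"),
        ("Cx700 Controller SPB", "F-Port"),
    ],
    "svr03-bk-zb": [
        ("3rd HBA lower port", "F-Port"),
        ("Tape Drive(st30)", "L-Port"),
        ("Tape Drive(st31)", "L-Port"),
    ],
}


def mixed_initiators(zones, initiators):
    """Return the initiator ports that share zones with both F-Port and L-Port devices."""
    flagged = {}
    for hba in initiators:
        peer_types = set()
        for members in zones.values():
            if hba not in [label for label, _ in members]:
                continue
            # Collect the port types of every other member of this zone.
            peer_types.update(ptype for label, ptype in members if label != hba)
        if {"F-Port", "L-Port"} <= peer_types:
            flagged[hba] = sorted(peer_types)
    return flagged


if __name__ == "__main__":
    hbas = ["1st HBA lower port", "2nd HBA lower port", "3rd HBA lower port"]
    print(mixed_initiators(ORIGINAL_ZONES, hbas))
    print(mixed_initiators(PLANNED_ZONES, hbas))
```

With the original zoning, only the first HBA's lower port is flagged, because it is a member of both zones. With the planned zoning, nothing is flagged, because each HBA port shares a zone with only one type of device.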
Zone configuration (svr03-bk-za and svr03-bk-zb):

Zone Definition    Port                           Port Type
--------------------------------------------------------------------
Zone svr03-bk-za   1st HBA lower port             F-Port
                   Cx700 Controller SPA           F-Port
                   Cx700 Controller SPB           F-Port
Zone svr03-bk-zb   1st HBA lower port (overlap)   F-Port
                   Tape Drive (st30)              L-Port, 1 Public
                   Tape Drive (st31)              L-Port, 1 Public
--------------------------------------------------------------------

The failure occurred only under heavy I/O load. Light I/O workloads ran without problems.

Solution
Although fabric devices and QuickLoop devices can work together on one connection, this has never been recommended by any storage or switch vendor. During a backup, each chunk of data must be read from the array over this Fiber Channel connection and then written to the tape drives over the same connection. This can cause poor I/O performance and, as a result, application failures.

When the ports for the tape drives in zone "svr03-bk-zb" were made to fail, both tape drives (st30 and st31) went offline. The following messages were logged:

svr03 fctl: [ID 517869 kern.warning] WARNING: 2793=>fp(1)::GPN_ID for D_ID=b18c6 failed
svr03 fctl: [ID 517869 kern.warning] WARNING: 2794=>fp(1)::N_x Port with D_ID=b18c6, PWWN=50050763004a3e06 disappeared from fabric

With the tape drives offline, I/O performance became satisfactory: a single read or write thread could generate I/O throughput of up to 40-50 MB/s. Disabling the tape drives can therefore be used as a temporary workaround in a similar configuration. This result also shows that the bottleneck was in the configuration.

The proposed solution was to rebuild the backup SAN architecture so that Fabric devices and QuickLoop devices are organized in two separate zones that also use different HBAs. The two new planned zone definitions are:

Zone Definition    Port                           Port Type
--------------------------------------------------------------------
Zone svr03-bk-za   1st HBA lower port             F-Port
                   2nd HBA lower port (for DMP)   F-Port
                   Cx700 Controller SPA           F-Port
                   Cx700 Controller SPB           F-Port
Zone svr03-bk-zb   3rd HBA lower port             F-Port
                   Tape Drive (st30)              L-Port, 1 Public
                   Tape Drive (st31)              L-Port, 1 Public
--------------------------------------------------------------------

The following diagram shows the final backup SAN architecture:

+--------+ +--------+ +--------+ +--------+
| 1st HBA| | 2nd HBA| | 3rd HBA| | 4th HBA|
|        | |        | |        | |        |
| FC(U)  | | FC(U)  | | FC(U)  | | FC(U)  |
|        | |        | |        | |        |
|        | |        | |        | |        |
| FC(L)  | | FC(L)  | | FC(L)  | | FC(L)  |
|  |     | |   |    | |    |   | |        |
+--|-----+ +---|----+ +----|---+ +--------+
   |           |           |
   |           |           |
   |           |           +--> To SAN switch for tape drive connection
   |           |                (This port was configured in zone svr03-bk-zb)
   |           |
   |           +---> To SAN switch for Cx700 array connections (DMP path A)
   |                 (This port was configured in zone svr03-bk-za)
   |
   +--> To SAN switch port for Cx700 array connections (DMP path B)
        (This port was configured in zone svr03-bk-za)

As a best practice for device connections through a SAN switch, avoid configuring Fabric and QuickLoop devices on the same Fiber Channel connection, especially when both are used by the same application.

Relief/Workaround
These two types of devices need to be in different zones.
As a temporary measure, disabling one of these device types avoids the poor performance issue.

Product
Sun Fire E25K Server
Sun Fire E20K Server
Sun Fire 15K Server
Sun Fire 12K Server

Keywords: SAN switch, L-Port, F-Port, X6768, 375-3108, Qlogic qlc, Loop OFFLINE, Link ONLINE, tape, QuickLoop

Previously Published As: 83332

Product_uuid
d842dd03-059b-11d8-84cb-080020a9ed93|Sun Fire E25K Server
1404a2d3-059a-11d8-84cb-080020a9ed93|Sun Fire E20K Server
29e4659c-0a18-11d6-9fa1-e67bbc033df8|Sun Fire 15K Server
077fd4c5-df8f-4320-ad69-7d01603a674d|Sun Fire 12K Server

Attachments
This solution has no attachment