Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution
1018896.1: Sun Fire[TM] 12K/15K/E20K/E25K: Management Networks (MAN)
Previously Published As: 230737
Applies to:
Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server
Sun SPARC Sun OS

Goal

The Sun Fire[TM] 12K/15K/E20K/E25K platform has three distinct networks which together make up the Management Network (MAN). The three networks are called the I1 Network, the I2 Network, and the Community (C) Network.

The System Controllers (SCs) on these platforms use 22 Ethernet interfaces to make up the MAN. The type of network interface in use depends on which type of SC is installed. These four platforms use two different types of SC (the SC board itself is the same, but the SC CPU board can differ):
An SC with a CP2140 (Othello+) uses 22 RIO (eri) based Ethernet interfaces to make up the MAN. These network interfaces are laid out as shown below:

[Diagram not reproduced: Physical location of Network Interfaces]

The C Network could previously be tuned. However, hardware design changes now disallow any tuning of the C Network. It must be set to auto-negotiate on both the switch side and the SC side.

NOTE: Most network equipment, such as switches, has many network tuning parameters. Outside an auto-negotiation environment, combinations of these parameters are not certified, and some combinations may affect SMS behavior. Be aware that, due to some known features of the eri driver, setting anything other than auto-negotiation for the C Network's eri interfaces will affect the eri interfaces used in the I1 and I2 networks and render these networks unsupported.

Solution

MANAGEMENT NETWORKS (MAN)

A detailed description of the Sun Fire[TM] 12K/15K/E20K/E25K Management Networks (MAN) follows.

C Network

The "world" accesses the SCs through the C Network. This public network is the only network on the SC that uses actual Ethernet cables, hubs, and switches to connect nodes together.
The function of this network is to provide one or more customer-provided network connections to the platform SCs. Here is an example of an ifconfig -a from a CP1500 SC (showing only the C Network interface):

# ifconfig -a

I1 Network

The SCs communicate with the platform's domains via the I1 Network, also known as the DMAN network. This network uses internal circuitry to connect the eri interfaces on the SC to individual eri interfaces on all of the I/O boards installed in the platform. Each domain on the platform must have an I1 network path to the SC. Because a fully loaded 15K/E25K can have 18 single-boardset domains, each SC must have 18 eri interfaces assigned to this network so that every possible domain has a network path to the SCs.

See https://support.oracle.com/handbook_private/Devices/System_Board/SYSBD_SunFire15K_SysCtlr.html for a picture of the Sun Fire[TM] 12K/15K/E20K/E25K SC. The ASICs labeled RIO at the bottom of the picture are the I1 interface devices (two are partially hidden under the daughter card in the diagram). Using internal circuitry, the SC's eri interfaces connect with the domains' eri interfaces. There are no cables, hubs, or switches on this network; everything is completely contained in the platform.

So, for example, a domain consisting of I/O boards 1, 2, and 3 has three eri interfaces in total, but only the interface on IO1 will be in use, and at the end of HPOST this interface will be labeled the golden IOSRAM. The purpose of the golden IOSRAM is to facilitate a network boot from the MAN for that individual domain. During the OBP handoff stage, the eri interface that was configured as the golden IOSRAM becomes the man-net device alias. A network boot can therefore be performed with "boot man-net", in which case the domain boots over the DMAN network using the SC as its boot server (if the SC is set up to be a boot server).

The functions of this network are as follows:
Here is an example of an ifconfig -a from a system controller (showing only the I1 interface):

# ifconfig -a

Here is an example of an ifconfig -a from a domain (showing only the I1 interface):

dman0: flags=1008863 mtu 1500 index 2

I2 Network

The SCs communicate with each other via the I2 Network, also known as the SCMAN network. It uses internal circuitry to connect the SCs to each other without any external cables, hubs, or switches.

The functions of this network are as follows:
scman1: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 4

MANAGEMENT NETWORK DAEMON (MAND)

The Sun Fire[TM] 12K/15K/E20K/E25K platform uses a combination of hardware and software to provide administrative services to the platform's network configuration. The Management Network Daemon (mand) supports all of the networks in the platform. The primary operations of this daemon are:
At SC startup, mand is in SPARE mode. During startup, the Failover Monitoring Daemon (fomd) determines which role the SC will assume (MAIN or SPARE). mand does not switch to MAIN mode until fomd tells it to do so (if and when fomd prompts the SC to become MAIN). The fomd daemon then starts mand appropriately.

Now operating in MAIN mode, mand checks the contents of the /etc/opt/SUNWSMS/config/MAN.cf file (discussed later in this document) to create a mapping between each "domain ID" and its IP address in the platform configuration database (pcd). mand then obtains domain configuration information from the pcd daemon in order to configure the scman driver properly. Next, mand listens for domain change notifications, such as system boards newly added to a particular domain, and updates the scman driver as appropriate. mand also tracks the active Ethernet interface in the scman driver; if that interface changes (for example, because one of the scman interfaces fails), mand reports the change back to the pcd daemon.

When a domain is powered on (setkeyswitch -d on), mand relays the system startup MAN information to that domain. mand does not plumb the actual interfaces on the domain; it simply passes the network information (including Ethernet and MAN IP address information) to the dman driver running in the domain, and the dman driver then plumbs its domain interfaces. (This is where the golden IOSRAM interface is used: it receives the information passed on from the system controller via mand.) This process occurs during the domain's boot cycle and during the initial domain OS install, and it is how the domain's dman driver obtains the main system controller's MAC address.

To summarize, mand is responsible for keeping all of the relevant network drivers updated with the platform's network configuration.

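The startup sequence described above can be pictured with a small toy model. This is only an illustrative sketch, not SMS code: the class name, method names, dictionary layout, and the sample IP address are all hypothetical, and the rule that only a MAIN-mode mand relays information to a domain is an assumption drawn from the description.

```python
from enum import Enum


class Mode(Enum):
    SPARE = "SPARE"
    MAIN = "MAIN"


class MandModel:
    """Toy model of the mand startup sequence (hypothetical, not SMS code)."""

    def __init__(self):
        # mand always comes up in SPARE mode at SC startup.
        self.mode = Mode.SPARE
        # Hypothetical structure: domain ID -> MAN IP address.
        self.domain_ips = {}

    def promote_to_main(self, man_cf_mapping):
        """Model fomd telling mand that this SC is now MAIN.

        man_cf_mapping stands in for the domain-ID-to-IP mapping that
        mand builds after checking MAN.cf.
        """
        self.mode = Mode.MAIN
        self.domain_ips = dict(man_cf_mapping)

    def startup_info_for(self, domain_id):
        """Return the network info relayed to a domain's dman driver.

        Assumption: only a mand running in MAIN mode relays this data.
        """
        if self.mode is not Mode.MAIN:
            raise RuntimeError("mand is in SPARE mode; nothing is relayed")
        return {"domain": domain_id, "ip": self.domain_ips[domain_id]}


mand = MandModel()
mand.promote_to_main({"A": "10.1.1.1"})  # sample address, made up for the sketch
info = mand.startup_info_for("A")
```

The model deliberately leaves out the pcd and scman driver updates; it only captures the ordering constraint that mand stays in SPARE mode until fomd promotes it.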
Without mand running properly, the platform's networks are unlikely to continue working properly (certainly no new configuration updates will occur). Also, because mand is responsible for ensuring that the system controllers remain synchronized, problems with mand can disrupt the system clock, resulting in a platform-wide disruption of service. mand is therefore a critical daemon.

PATH "POLICING" MECHANISMS

C Network

The C Network is not "policed" per se by any special inter-platform process or configuration. If IPMP is not configured, the "policing" of the C Network is handled by Solaris. Solaris uses tpe-link-test to perform network connection tests. Responses to this test come from the equipment connected at the other end of the cable attached to the particular C Network interface, such as a router, switch, hub, or other node. fomd monitors for failures of this link test and, if one is encountered, takes appropriate action (a failover to the SPARE SC being one possibility).

If IPMP is configured, network health checks on the C Network are done via the "test" network interface. This "test" interface is an actual NIC (e.g. hme0) with an IP address used exclusively for testing whether that NIC is still active. The system uses the logical interface (e.g. hme0:1) to transfer its data to the C Network. If the IPMP "test" network interface fails its health check (typically a ping test to a router or a multicast to the network), IPMP tries to fail the logical IP over to another member of the IPMP group. If the whole IPMP group is unavailable, the Solaris "policing" action takes over. At that point fomd may have to act, possibly forcing a failover to the SPARE SC.

To see the "policing" in action on an IPMP interface, run a snoop on the test interface (e.g. hme0). The only traffic on the test interface is the test ping or multicast.

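The C Network decision flow described above can be summarized as a small function. This is a minimal sketch under stated assumptions: the function name, its parameters, and the returned strings are hypothetical and exist only to illustrate the order in which IPMP and fomd get involved.

```python
def c_network_action(ipmp_configured, link_ok, group_members_ok=()):
    """Return the action the text implies for one health-check result.

    ipmp_configured:  True if IPMP is set up on the C Network.
    link_ok:          result of the check on the currently active interface.
    group_members_ok: health of the other IPMP group members
                      (hypothetical input, ignored without IPMP).
    """
    if link_ok:
        return "no action"
    if ipmp_configured and any(group_members_ok):
        # IPMP absorbs the failure inside the group; no SC failover needed.
        return "IPMP fails the logical IP over to another group member"
    # No IPMP, or the whole IPMP group is down: fomd decides what to do.
    return "fomd may fail over to the SPARE SC"


# One healthy member left in the group: IPMP handles it.
print(c_network_action(True, False, (False, True)))
# Whole group down: the decision falls through to fomd.
print(c_network_action(True, False, (False, False)))
```

The point of the sketch is that IPMP adds one extra step before fomd is involved, so a single failed NIC or cable does not by itself have to lead to an SC failover.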
Configuring IPMP on this network adds an extra layer of redundancy. This may help avoid SC failovers caused by a switch or hub failure or a bad Ethernet cable (assuming the IPMP group itself was configured correctly).

I1 Network

The I1 Network is "policed" every 10 seconds. Specifically, the dman0 interface on the domain pings the system controller via the active domain NIC (golden IOSRAM) every 10 seconds. In addition, dman0 checks the inbound packet count every 30 seconds. If the packet count has increased, the connection is considered good. If not, a path switch is initiated and the next available path is selected. The "next" available path is the next I/O board's eri interface. In a single I/O board domain, there is no "next" available path on this network. To see this "policing" in action, snoop the dman0 interface and watch for these 10-second and 30-second interface tests.

I2 Network

The I2 Network is also "policed" every 10 seconds. Specifically, scman1 on the main system controller pings the spare system controller via its active NIC every 10 seconds. In addition, every 30 seconds, scman1 on the main system controller checks its inbound packet count. If the packet count has increased, the connection is considered good. If not, the active path is switched to the other NIC on the main system controller, provided that NIC was not previously marked as failed. If both system controllers are up and running in the platform configuration, you can see the "policing" in action by snooping the scman1 interface on one or both of the system controllers.

NOTE: If at any time a failure is detected with any of these NICs, mand is responsible for updating the pcd and making sure all networks are aware that a particular NIC is offline and that communication has switched over to a different NIC if possible.

MAN CONFIGURATION FILES

/etc/opt/SUNWSMS/config/MAN.cf

Features:
Example (from a CP1500-based SC called "sc0" on a platform called "15k"):

$ cat /etc/opt/SUNWSMS/config/MAN.cf

/var/opt/SUNWSMS/data//idprom.image

Features:

$ sysid -d A

/var/opt/SUNWSMS/doors/mand

Features:

Example:

# file /var/opt/SUNWSMS/doors/mand

/etc/hosts

Features:

cat /etc/hosts

/etc/hostname.*

Features:

# cat /etc/hostname.hme0

Example from a Domain:

# cat /etc/hostname.dman0

/etc/ipnodes

Features:

/etc/netmasks

Features:
$ cat /etc/netmasks

Also note that if you do not have a default router in your settings, IPMP takes longer to start up properly. This can have a negative effect: because IPMP negotiation takes longer, SMS might experience difficulties during startup.

Internal Comments

References:
Sun InfoDoc 203348 (previously 82102): Sun Fire Supported Ethernet settings for System Controllers
Doc 1012140.1 (previously 73002): Sun Fire[TM] 12K/15K: MAN Interface Mapping
Doc 1010314.1 (previously SRDB 48123): Sun Fire[TM] 12K/15K: Troubleshooting the I1 MAN Network
Sun InfoDoc 212476 (previously 72578): Sun Fire[TM] 12K/15K: Troubleshooting MAN I2 Network
Doc 1004615.1 (previously 48144): Sun Fire[TM] 15K: IDPROM layout for OpenBoot[TM] PROM failed

The last document (1004615.1) shows how to recreate lost or corrupted idprom.image files, as does the site: http://has.central.sun.com/starcat/15kinfo/faq/recreate_idprom.image.html

To upgrade SC CPU boards, see 1006164.1 (previously 48120): Procedural Steps for Replacing a Nordica (CP1500) with an Othello+ (CP2140).

For more on IPMP, its settings, and its effects on a system controller, see 1010640.1.

Keywords: 12k, 15k, 20k, 25k, mand, sms, fomd, network, i2, i1, scman, dman, MAN, SC, system controller

This document was previously published as 76814 and 230737.