VMware Virtual SAN Networking

VSAN networking can be a bit tricky to troubleshoot. Before I go deeper into the topic, here is a very important concept to remember about VSAN clusters.

For any VSAN cluster, remember the following:

** “Introduction to Virtual SAN Networking

Before getting into network in detail, it is important to understand the roles that nodes/hosts can play in Virtual SAN. There are three roles in Virtual SAN: master, agent and backup. There is one master that is responsible for getting CMMDS (clustering service) updates from all nodes, and distributing these updates to agents. Roles are applied during cluster discovery, when all nodes participating in Virtual SAN elect a master. A vSphere administrator has no control over roles.”

** from Cormac’s troubleshooting guide

That is a lot to digest, but if you break it down, you can see some key principles to remember about a VSAN cluster.

The roles in VSAN are:
A. master
B. agent
C. backup

There is one master.
If you see more than one master, something is not quite right with your VSAN cluster.

The VSAN admin does not control which node will be the master.

Log in to each node of a three-node VSAN cluster. The usual prerequisite for troubleshooting applies: make sure SSH is enabled.

Run the following command on each node:
~ # esxcli vsan cluster get

The output from each node is shown below.

Node 1

Cluster Information
Enabled: true

Current Local Time: 2015-03-30T22:38:38Z
Local Node UUID: 55197cee-f530-4966-5ea6-a0369f58b8e4
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 55197cee-f530-4966-5ea6-a0369f58b8e4
Sub-Cluster Backup UUID:
Sub-Cluster UUID: 551374b5-03f9-7bd6-6257-a0369f58b8e8
Sub-Cluster Membership Entry Revision: 0
Sub-Cluster Member UUIDs: 55197cee-f530-4966-5ea6-a0369f58b8e4
Sub-Cluster Membership UUID: a5ce1955-f5e5-5663-d338-a0369f58b8e4

Node 2
~ # esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2015-03-30T22:38:38Z
Local Node UUID: 55197cee-f530-4966-5ea6-a0369f58b8e4
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 55197cee-f530-4966-5ea6-a0369f58b8e4
Sub-Cluster Backup UUID:
Sub-Cluster UUID: 551374b5-03f9-7bd6-6257-a0369f58b8e8
Sub-Cluster Membership Entry Revision: 0
Sub-Cluster Member UUIDs: 55197cee-f530-4966-5ea6-a0369f58b8e4
Sub-Cluster Membership UUID: a5ce1955-f5e5-5663-d338-a0369f58b8e4

Node 3
~ # esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2015-03-30T22:56:46Z
Local Node UUID: 54f9dc6f-8674-f412-364d-a0369f58b5a8
Local Node State: BACKUP
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 551374b5-03f9-7bd6-6257-a0369f58b8e8
Sub-Cluster Backup UUID: 54f9dc6f-8674-f412-364d-a0369f58b5a8
Sub-Cluster UUID: 551374b5-03f9-7bd6-6257-a0369f58b8e8
Sub-Cluster Membership Entry Revision: 1
Sub-Cluster Member UUIDs: 551374b5-03f9-7bd6-6257-a0369f58b8e8, 54f9dc6f-8674-f412-364d-a0369f58b5a8
Sub-Cluster Membership UUID: d6da1955-e2f8-38eb-d7f0-a0369f58b8e8

See the image below for the error seen in the web client.

From the output above, can you see the problem?
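If you collect `esxcli vsan cluster get` output from many hosts, a short script can do the comparison for you. The sketch below is a minimal example, not a VMware tool. It assumes you paste each node's output into a string. The checks it applies are my own heuristics based on the role rules above: exactly one MASTER, unique Local Node UUIDs, a shared Sub-Cluster Membership UUID, and every node listing all members. The function names are hypothetical.

```python
from collections import Counter


def parse_cluster_get(text: str) -> dict[str, str]:
    """Parse the 'Key: Value' lines printed by `esxcli vsan cluster get`."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:  # skip lines without a colon, e.g. "Cluster Information"
            info[key.strip()] = value.strip()
    return info


def check_cluster(outputs: dict[str, str]) -> list[str]:
    """Return suspicious findings across all nodes' outputs (heuristic)."""
    nodes = {name: parse_cluster_get(text) for name, text in outputs.items()}
    problems: list[str] = []

    # Exactly one node should report itself as MASTER.
    masters = sorted(n for n, i in nodes.items() if i.get("Local Node State") == "MASTER")
    if len(masters) != 1:
        problems.append(f"expected exactly one MASTER, found {len(masters)}: {masters}")

    # Every host should have its own Local Node UUID.
    uuid_counts = Counter(i.get("Local Node UUID") for i in nodes.values())
    dupes = sorted(u for u, c in uuid_counts.items() if u and c > 1)
    if dupes:
        problems.append(f"duplicate Local Node UUID(s): {dupes}")

    # Nodes in one healthy cluster should agree on the membership UUID.
    memberships = {i.get("Sub-Cluster Membership UUID") for i in nodes.values()}
    if len(memberships) > 1:
        problems.append(f"{len(memberships)} different Sub-Cluster Membership UUIDs (possible partition)")

    # Each node should list every host as a sub-cluster member.
    for name, info in sorted(nodes.items()):
        members = [m for m in info.get("Sub-Cluster Member UUIDs", "").split(",") if m.strip()]
        if len(members) != len(nodes):
            problems.append(f"{name} lists {len(members)} member(s), expected {len(nodes)}")
    return problems
```

Paste each node's output into a dictionary keyed by node name, for example `check_cluster({"node1": out1, "node2": out2, "node3": out3})`. An empty list means none of these heuristics fired.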



Technical notes on IP Based Storage Guide for EMC VNX and VMware 5.5

When it comes to IP-based storage, there are two major choices for VMware environments: NFS and iSCSI.

This article will discuss iSCSI options.

NFS is a valid design choice with very flexible deployment options. For VMware, NFS is fast and flexible; in other words, it is a solid choice. But NFS is a topic for another time and a different discussion.

Why IP-based storage?

In a single word: SPEED and flexibility. Okay, two words. Core networking is no longer limited to 10BASE-T or 100BASE-T; 10 Gigabit Ethernet is increasingly the standard. You already have an IP-based network, so why not leverage what is installed?

Lately I have seen greater interest in deploying iSCSI-based storage for VMware. Not just 1 Gb: 10 Gigabit Ethernet is gaining more of a foothold in data centers as customers upgrade their core networking from 1 Gb to 10 Gb. The potential performance gains are a good thing, but a solid network design is fundamental to the deployment.

I mean the design end-to-end. What kind of switch are you using? Are you limited to a certain number of 10 Gb ports? How many switches do you have? Do you have a single point of failure? This matters even more because your network will be doing “double duty”: instead of the discrete storage network that FC provides, you will now run storage traffic across your IP network. Two separate physical switches are ideal, BUT at a minimum, use VLANs for logical separation. And let me go ahead and say it… “VLAN 0 (zero) is not really a scalable enterprise option!” If that is what you find, it is a huge red flag: you will need more network analysis and work to deploy an IP-based storage solution.

There are many considerations for a successful iSCSI implementation.

1) Gather customer requirements
2) Design
3) Verify
4) Implement/deploy
5) Test (user acceptance testing)

Ideally, have two 10 Gb switches for redundancy! Be careful when selecting an enterprise-grade switch. I have seen horrible experiences when the proper features were not enabled; for example, flow control can be a good thing!

For software-based iSCSI initiators, don’t forget about Delayed ACK. See Sean’s excellent article here: Delayed ACK setting with VNX. For VMware details on how to change the Delayed ACK setting, read KB 1002598:

“Modify the delayed ACK setting on a discovery address (recommended):

  1. On a discovery address, click the Dynamic Discovery tab.
  2. Click the Server Address tab.
  3. Click Settings > Advanced.”

Double, no, TRIPLE check the manufacturer’s recommended CABLE TYPE and LENGTH. For example: does your 10GbE use fiber-optic cable? Do you have the correct type? What about the SFP? And if you are not using fiber optic but choose Twinax cabling instead, do you have the correct cable per the manufacturer’s requirements?

For example, Meraki only makes and supports a 1-meter 10G passive copper cable for their switches, whereas a typical Cisco business-class switch supports 1, 3, and 5 meter passive and 7 and 10 meter active cables.

Active cables are generally more expensive, but they could be a requirement depending on your data center and/or colocation layout.
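To keep the cable question from becoming a surprise at install time, it can help to check each planned run against the switch vendor’s support list. The sketch below hard-codes only the example lengths mentioned above. The dictionary keys and function names are hypothetical, and the vendor’s current compatibility matrix remains the real authority.

```python
# Supported 10G DAC (Twinax) lengths in meters, taken from the examples in the text.
# Hypothetical keys; always confirm against the vendor's current documentation.
SUPPORTED_DAC = {
    "meraki": {"passive": {1}, "active": set()},
    "cisco-business": {"passive": {1, 3, 5}, "active": {7, 10}},
}


def cable_ok(switch: str, kind: str, length_m: int) -> bool:
    """True if a DAC of this kind ("passive"/"active") and length is supported."""
    return length_m in SUPPORTED_DAC.get(switch, {}).get(kind, set())


def options_for_run(switch: str, run_m: float) -> list[tuple[str, int]]:
    """List supported (kind, length) pairs long enough for a run, shortest first."""
    table = SUPPORTED_DAC.get(switch, {})
    fits = [(kind, n) for kind, lengths in table.items() for n in lengths if n >= run_m]
    return sorted(fits, key=lambda kind_len: kind_len[1])


print(options_for_run("cisco-business", 6))  # only the active cables are long enough
print(options_for_run("meraki", 2))          # nothing supported: rethink the rack layout
```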

I try to approach the solution from both ends: storage to host, and host to storage. Examine the end-to-end dependencies. Even if the network isn’t your area of responsibility, you will rely on its services, and any misconfiguration will affect your ability to meet the stated design requirements. You may not have bought, or had any input into, the existing infrastructure, but you will be affected by what is currently in use. The keyword is interoperability: how each independent system interacts with the others, including upstream and downstream dependencies.

Other considerations:

For example, VMkernel port binding. The diagram below is from VMware KB 2038869, “Considerations for iSCSI Port Binding”.

“Port binding is used in iSCSI when multiple VMkernel ports for iSCSI reside in the same broadcast domain and IP subnet to allow multiple paths to an iSCSI array that broadcasts a single IP address. When using port binding, you must remember that:

  • Array Target iSCSI ports must reside in the same broadcast domain and IP subnet as the VMkernel port.
  • All VMkernel ports used for iSCSI connectivity must reside in the same broadcast domain and IP subnet.
  • All VMkernel ports used for iSCSI connectivity must reside in the same vSwitch.
  • Currently, port binding does not support network routing.”
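When planning port binding, you can check the subnet and vSwitch rules quoted above before you touch a host. This is a minimal Python sketch using the standard `ipaddress` module. The `VmkPort` structure and the example addresses are hypothetical. The sketch only checks IP subnets and vSwitch names. It cannot tell whether the ports truly share a broadcast domain, because that depends on your VLAN and switch configuration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class VmkPort:
    name: str       # e.g. "vmk1" (hypothetical)
    interface: str  # address with prefix, e.g. "10.10.10.11/24"
    vswitch: str    # e.g. "vSwitch1"


def port_binding_issues(vmks: list[VmkPort], targets: list[str]) -> list[str]:
    """Check a planned layout against the port-binding rules quoted above."""
    issues: list[str] = []
    nets = {ipaddress.ip_interface(v.interface).network for v in vmks}
    if len(nets) != 1:
        issues.append(f"VMkernel ports span {len(nets)} subnets; port binding needs one")
    switches = {v.vswitch for v in vmks}
    if len(switches) != 1:
        issues.append(f"VMkernel ports are on {len(switches)} vSwitches; port binding needs one")
    # Port binding does not support routing, so targets must sit in the VMkernel subnet.
    for t in targets:
        if not any(ipaddress.ip_address(t) in n for n in nets):
            issues.append(f"target {t} is outside the VMkernel subnet (would need routing)")
    return issues


vmks = [VmkPort("vmk1", "10.10.10.11/24", "vSwitch1"),
        VmkPort("vmk2", "10.10.10.12/24", "vSwitch1")]
print(port_binding_issues(vmks, ["10.10.10.50", "10.10.10.51"]))  # []
```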


While there is no FC zoning for IP-based storage, you will still need subnetting and VLAN separation.

For VNX, here are some considerations for iSCSI design:

The following points are best practices for connecting iSCSI hosts to a CLARiiON or VNX:

  • iSCSI subnets must not overlap the management port or Service LAN (128.221.252.x).

  • For iSCSI, there is no zoning (unlike an FC SAN) so separate subnets are used to provide redundant paths to the iSCSI ports on the CLARiiON array. For iSCSI you should have mini-SANs (VLANs) with only one HBA per host in each VLAN with one port per storage processor (SP) (for example, A0 and B0 in one VLAN, A1 and B1 in another). All connections from a single server to a single storage system must use the same interface type, either NIC or HBA, but not both.

  • It is a good practice to create a separate, isolated IP network/VLAN for the iSCSI subnet. This is because the iSCSI data is unencrypted and also having an iSCSI-only network makes troubleshooting easier.

  • If the host has only a single NIC/HBA, then it should connect to only one port per SP. If there are more NICs or HBAs in the host, then each NIC/HBA can connect to one port from SP A and one port from SP B. Connecting more SP ports to a single NIC can lead to discarded frames due to the NIC being overloaded.

  • In the iSCSI initiator, set a different “Source IP” value for each iSCSI connection to an SP.  In other words, make sure that each NIC IP address only appears twice in the host’s list of iSCSI Source IP addresses: once for a port on SP A and once for a port on SP B.

  • Make sure that the Storage Processor management ports do not use the same subnets as the iSCSI ports. See EMC Knowledgebase article emc235739, “Changing configuration on one iSCSI port may cause I/O interruption to all iSCSI ports on this storage processor (SP) if using IP addresses from same subnet”, for more information.

  • It is also a best practice to use a different IP switch for the second iSCSI port on each SP. This is to prevent the IP switch being a single point of failure. In this way, were one IP switch to completely fail, the host can failover (via PowerPath) to the paths on the other IP switch. In the same way, it would be advisable to use different switches for multiple IP connections in the host.

  • Gateways can be used, but the ideal configuration is for HBA to be on the same subnet as one SP A port and one SP B port, without using the gateway.

For example, a typical configuration for the iSCSI ports on a CLARiiON, with two iSCSI ports per SP would be:

A0: (Subnet mask
A1: (Subnet mask
B0: (Subnet mask
B1: (Subnet mask

A host with two NICs should have its connections configured similar to the following in the iSCSI initiator to allow for load balancing and failover:

NIC1 (for example, – SP A0 and SP B0 iSCSI connections
NIC2 (for example, – SP A1 and SP B1 iSCSI connections

Similarly, if there were four iSCSI ports per SP, four subnets would be used. Half of the hosts with two HBAs would then use the first two subnets, and the rest would use the other two.

The management ports should also not overlap the iSCSI ports. Because the iSCSI network is normally separated from the LAN used to manage the SP, this is rarely an issue. To follow the example iSCSI addresses above, the other IPs used by the array could be as in the following examples:

VNX Control Station 1:
VNX Control Station 2:
SP A management IP address:
SP B management IP address:

The High Availability Validation Tool will log an HAVT warning if it detects that a host is connected via a single iSCSI initiator. Even if the initiator has a path to both SPs, the host is still at HA risk from a connectivity point of view. You will also see this warning if you are using unlicensed PowerPath.

Caution! Do not use the IP address range 192.168.1.x, because it is used by the serial port PPP connection.

Oh, and I haven’t even discussed the VMware storage path policy, as that really depends on your array. However, VNX uses ALUA (failover mode 4), and Round Robin works really well if you don’t have or want PowerPath!
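The VNX subnet rules above lend themselves to a pre-deployment sanity check. The sketch below is hypothetical: the function name, the input layout, and all addresses are made up for illustration. It covers only part of the guidance:

- iSCSI subnets must avoid the Service LAN (128.221.252.x) and 192.168.1.x.
- Management addresses must stay out of the iSCSI subnets.
- Each host NIC gets its own subnet.
- Each NIC subnet holds exactly one SP A port and one SP B port.

It also assumes SP port names start with "A" or "B".

```python
import ipaddress

# Ranges the VNX guidance above says iSCSI must not use.
SERVICE_LAN = ipaddress.ip_network("128.221.252.0/24")
SERIAL_PPP = ipaddress.ip_network("192.168.1.0/24")


def vnx_iscsi_issues(sp_ports: dict[str, str], host_nics: dict[str, str],
                     mgmt_ips: list[str]) -> list[str]:
    """Sanity-check a planned layout (hypothetical helper).

    sp_ports:  SP port name -> address/prefix, e.g. {"A0": "10.168.10.180/24"}
    host_nics: host NIC name -> address/prefix
    mgmt_ips:  SP management and Control Station addresses
    """
    issues: list[str] = []
    ports = {p: ipaddress.ip_interface(c) for p, c in sp_ports.items()}
    for name, iface in ports.items():
        for reserved in (SERVICE_LAN, SERIAL_PPP):
            if iface.network.overlaps(reserved):
                issues.append(f"{name} subnet {iface.network} overlaps reserved {reserved}")
    iscsi_nets = {i.network for i in ports.values()}
    for ip in mgmt_ips:
        if any(ipaddress.ip_address(ip) in n for n in iscsi_nets):
            issues.append(f"management IP {ip} sits inside an iSCSI subnet")
    seen = set()
    for nic, cidr in host_nics.items():
        net = ipaddress.ip_interface(cidr).network
        if net in seen:
            issues.append(f"{nic} shares subnet {net} with another NIC")
        seen.add(net)
        # Each NIC should reach exactly one SP A port and one SP B port.
        same = [p for p, i in ports.items() if i.network == net]
        a = [p for p in same if p.upper().startswith("A")]
        b = [p for p in same if p.upper().startswith("B")]
        if len(a) != 1 or len(b) != 1:
            issues.append(f"{nic} ({net}) sees SP A ports {a} and SP B ports {b}; expected one of each")
    return issues


sp = {"A0": "10.168.10.180/24", "B0": "10.168.10.181/24",
      "A1": "10.168.11.180/24", "B1": "10.168.11.181/24"}
nics = {"NIC1": "10.168.10.10/24", "NIC2": "10.168.11.10/24"}
print(vnx_iscsi_issues(sp, nics, ["10.20.0.5"]))  # []
```

With A0/B0 and NIC1 in one subnet, A1/B1 and NIC2 in another, and management on a separate network, the function returns an empty list.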


VMware Storage Guide 5.5 (PDF)

VMware Storage Guide 6.0 (PDF)

“Best Practices for Running VMware vSphere on iSCSI” (TECHNICAL MARKETING DOCUMENTATION v 2.0A)

“Using VNX Storage with VMware vSphere” EMC TechBook

IBM Firmware – not too fun

It doesn’t matter which vendor you choose: you have to deal with server firmware.



How did this happen? With a lot of trial and error.

I first tried the GUI utility “IBM ToolsCenter Bootable Media Creator”. Be careful with this tool if you change the workdir: make sure the path still includes workdir, or the ISO image will fail to write.

I tried versions:

I tried writing directly to USB, writing to CD, and creating an ISO file. The boot image failed to boot beyond the GRUB loader.


After several rounds of trial and error, I had to resort to updating via the IMM interface. To summarize the process: first, update the IMM firmware by uploading the .exe or .bin files, then restart the IMM. Update the other firmware next, and the DSA last. Be sure to document your IMM settings, either by backing up the IMM configuration to a file or by taking screenshots of the IP addresses. Also make sure your Java is up to date. After the update, the IMM complained that the browser needed cookies enabled, even though that setting is on by default in Firefox.

Good luck!

** Update: Don’t forget to reboot the server and watch the boot process. Some parts of the firmware upgrade are trickier than others. But then, I was bridging almost a five-year gap in firmware dates (2010 to 2014).

After several reboots, the last firmware was applied. This was verified in the IMM: look under the firmware section of the VPD (Vital Product Data).

Joining VCSA 5.5 to AD Domain with Secure Token Service (STS)

The easiest choice is:

1. Active Directory (Integrated Windows Authentication)

a. Use machine account.

“If you’re adding AD authentication, simply make sure the VCSA is added to the domain, then use Integrated Windows Authentication using the computer account. Couldn’t be simpler.”

Normally, you would do the above.

I had a problem with this, because the error message stated that the VCSA was improperly joined to the domain. I removed it from the domain and rejoined it, without success, so eventually I explored another method.


Following KB 2058298, “Creating and using a Service Principal Account in vCenter Single Sign-On 5.5”:
Service Principal Account (SPN) is a new feature in vCenter Single Sign-On (SSO) 5.5. The SPN account acts as the Secure Token Service (STS) for token issuing.
This article provides steps to configure and use a SPN when creating an Active Directory Identity Source for SSO 5.5.
1. Verify the domain:
C:\>echo %UserDNSDomain%
You see output similar to:
Type setspn -Q sts/DNS_domain_name and press Enter. This verifies that no other SPNs have been created on this domain.
For example:
C:\>setspn -Q STS/child-domain.vmware.com
You see output similar to:
No such SPN Found.
Note: If a SPN is found, consult your Active Directory administrator.
(Here I created an SSOServiceAccount user and made it a domain admin.)
The next step is to register the SPN with setspn:
C:\>setspn -S STS/child-domain.vmware.com SSOServiceAccount
From here you “Set the Active Directory Identity Source with SSO 5.5”
Creating an Active Directory Identity Source for use with SSO 5.5

To create an Active Directory (Integrated Windows Authentication) Identity Source:
Log in to the vSphere Web Client as administrator@vsphere.local or as another user with SSO administrator privileges. The default vSphere Web Client URL is:


Navigate to Administration > Single Sign-On > Configuration.
In the Identity Sources tab, click the Add Identity Source icon under the option menu.
Click Active Directory (Integrated Windows Authentication).

Select the Use SPN option.
Enter this information:

Domain name: DNS_Domain_name
Service Principal Name (SPN): STS/DNS_Domain_name
User Principal Name (UPN): Domain_user_assigned_SPN@DNS_Domain_name
Password: Password

For example:

Domain name: child-domain.vmware.com
Service Principal Name (SPN): STS/child-domain.vmware.com
User Principal Name (UPN): SSOServiceAccount@child-domain.vmware.com
Password: WelcomeToSSO55
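The SPN, the UPN, and the `setspn` argument are all derived from the same two values. A tiny helper can keep them consistent and avoid mistakes such as doubling the domain suffix. This is a hypothetical convenience function, not part of any VMware or Microsoft tool. It only formats strings following the `STS/<domain>` and `<account>@<domain>` pattern shown in the example above, and it does not contact Active Directory or vCenter.

```python
def sso_identity_source(domain: str, account: str) -> dict[str, str]:
    """Derive the 'Use SPN' identity source fields from one domain and one account.

    Hypothetical helper: it only formats strings; it does not talk to AD or vCenter.
    """
    domain = domain.strip()
    if "." not in domain or domain.startswith("."):
        raise ValueError("expected a DNS domain name such as child-domain.vmware.com")
    return {
        "Domain name": domain,
        "Service Principal Name (SPN)": f"STS/{domain}",
        "User Principal Name (UPN)": f"{account}@{domain}",
        # Matching registration command for a domain-joined Windows host.
        "setspn": f"setspn -S STS/{domain} {account}",
    }


for field, value in sso_identity_source("child-domain.vmware.com", "SSOServiceAccount").items():
    print(f"{field}: {value}")
```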

And there you have it: you can now log in to SSO and see the AD domain you joined listed as an identity source. To delegate SSO admin rights, go to “vCenter Users and Groups” in the Web Client and add your AD groups to the Administrators group.

How to “fix” VCSA IP settings from the command line

More and more often, customers are looking for an easier way to deploy their vSphere management.

vCenter has traditionally been an application installed on top of Windows… but “the times they are a-changin’”.

There are more and more use cases where the business requirements allow deployment of the vCenter Server Appliance.

Here is a quick post to help you “fix” the IP configuration of your appliance. Sometimes, during deployment of the VCSA OVA, there is a miscommunication or a fat-finger incident. Here is how to address that.

The same tool also lets you change the hostname, DNS, default gateway, and proxy.


Open a console session of the VCSA
Login as: root
Default password is: vmware
Execute the following command: /opt/vmware/share/vami/vami_config_net


 Main Menu

0)    Show Current Configuration (scroll with Shift-PgUp/PgDown)
1)    Exit this program
2)    Default Gateway
3)    Hostname
4)    DNS
5)    Proxy Server
6)    IP Address Allocation for eth0

After you execute the command, a menu is displayed. From the menu, you can change the IP address, hostname, DNS, default gateway, and proxy server.
After you allocate a static IP address to the VCSA, you can do the post-deployment configuration at the following URL:



The VCSA was powered on.

Ping was not responsive.

I verified the IP address with:

cat /etc/sysconfig/networking/devices/ifcfg-eth0
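When ping fails right after deployment, a fat-fingered address, netmask, or gateway is the usual suspect. The sketch below parses an `ifcfg` file of `KEY=value` lines and flags a few common mistakes. It is an illustration only. It assumes the file uses the `IPADDR` and `NETMASK` keys, which may differ on your VCSA build. It takes the default gateway as a separate argument, because the gateway is not necessarily stored in this file. The function names are hypothetical.

```python
import ipaddress
from typing import Optional


def parse_ifcfg(text: str) -> dict[str, str]:
    """Parse KEY=value lines, dropping optional quotes, comments and blank lines."""
    cfg: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip().strip("'\"")
    return cfg


def ifcfg_issues(cfg: dict[str, str], gateway: Optional[str] = None) -> list[str]:
    """Flag common fat-finger mistakes in a static IPv4 configuration."""
    ip = cfg.get("IPADDR", "")
    if not ip:
        return ["IPADDR is missing (is BOOTPROTO still dhcp?)"]
    # IPADDR may already carry a prefix (10.0.0.5/24); otherwise combine it with NETMASK.
    spec = ip if "/" in ip else f"{ip}/{cfg.get('NETMASK', '')}"
    try:
        iface = ipaddress.ip_interface(spec)
    except ValueError as exc:
        return [f"invalid address/netmask: {exc}"]
    issues: list[str] = []
    if iface.ip in (iface.network.network_address, iface.network.broadcast_address):
        issues.append(f"{iface.ip} is the network or broadcast address of {iface.network}")
    if gateway and ipaddress.ip_address(gateway) not in iface.network:
        issues.append(f"gateway {gateway} is not inside {iface.network}")
    return issues


cfg = parse_ifcfg("BOOTPROTO='static'\nIPADDR='10.1.1.20'\nNETMASK='255.255.255.0'\n")
print(ifcfg_issues(cfg, gateway="10.1.2.1"))  # flags the gateway: it is outside 10.1.1.0/24
```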

EMC Elect 2015

I am truly honored to be chosen as a recipient of this recognition. There is always so much change happening in the world of storage. It doesn’t matter who you are: a home consumer, an SMB, or a large enterprise. Understanding how that change impacts you, your work, and your lifestyle… that is the impact. That is the empowerment, and that is the difference.

Old, new, and not even on silicon… Sharing what technology does and doesn’t do makes a difference in what technology can do and WILL do. That is a positive feedback loop that contributes to advancement.

Here is the official announcement link:


“But finally the EMC Elect of 2015 were selected. Out of the 450 nominations leading to 200 finalists, the 102 official directory of its members for 2015 in alphabetical order are: …”


VNX and NFSv4

Just a note to self (and for when you discuss NFS with your customer):

If you are using VNX, make sure you are on OE 7.1 or later. Why?

NFSv4 support is included, but the service is not started by default!

$ server_nfs <movername> -v4 -service -start

where <movername> is the name of the Data Mover.

There are other considerations when implementing NFSv4:

  • NFSv4 Domain
  • Access Policy: Mixed is recommended
  • Delegation Mode off
  • You can even restrict access to NFSv4 only, as normally a file system is exported to all versions of the NFS protocol

Please see:

EMC White Paper: h10949-configuring-nfsv4-vnx-wp.pdf

HOT HOT HOT… Hot Spare, that is! VNX and VNX2

The other day a customer purchased a brand-new DAE for his VNX. Awesome! A full shelf of 25 drives: 900 GB SAS drives in the 2.5″ form factor. Do some quick math: 25 drives make five RAID 5 (4+1) groups.

But… wait a sec. What about a hot spare? RAID 5 parity gives you protection, but you still need to comply with your hot spare policy. This customer has the older 3.5″ DAE (15 slots), and the new drives are 2.5″. What to do?

Will you have a valid hot spare on hand?
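The quick math is worth writing out, because a full shelf leaves no room for a spare. The helper below is a hypothetical illustration. It divides the DAE slots, minus any reserved hot spares, into RAID groups of a given width. It reports raw data capacity from the 900 GB drive size and ignores formatting overhead.

```python
def raid_layout(slots: int, data: int, parity: int, hot_spares: int = 0,
                drive_gb: int = 900) -> tuple[int, int, int]:
    """Return (RAID groups, leftover drives, raw data capacity in GB) for one DAE."""
    if hot_spares > slots:
        raise ValueError("more hot spares than slots")
    width = data + parity                  # e.g. 4+1 for RAID 5
    usable = slots - hot_spares
    groups, leftover = divmod(usable, width)
    return groups, leftover, groups * data * drive_gb


print(raid_layout(25, 4, 1))                # (5, 0, 18000): full shelf, no spare
print(raid_layout(25, 4, 1, hot_spares=1))  # (4, 4, 14400): one spare slot reserved
```

With 25 slots and 4+1 groups, every slot is consumed. Reserving even one spare leaves four drives that don’t form a complete 4+1 group.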

After some online research:

Based on the discussion and the referenced white papers for both VNX and VNX2, drive form factor isn’t important. The factors that matter are drive type and capacity. VNX2 sparing is global and does not take drive speed into consideration, so a slower drive of the same type could end up as the replacement. The admin may not be aware of this, because the policy is set differently.

https://community.emc.com/thread/123197?start=0&tstart=0 is a great discussion about this and a fantastic resource for EMC-related issues. The following is taken from that thread.
Hot spare algorithm
The appropriate hot spare is chosen from the provisioned hot spares algorithmically.  If there were no hot spares provisioned of appropriate type and size when a drive fails, no rebuild occurs.  (See the Drive Rebuilds section.)  The RAID group with the failed drive remains in a degraded state until the failed drive is replaced; then the failed drive’s RAID group rebuilds.
The hot spare selection process uses the following criteria in order:
  1. Failing drive in-use capacity – The smallest capacity hot spare drive that can accommodate the in-use capacity of the failing drive’s LUNs will be used.
  2. Hot spare location – Hot spares on the same back-end port as the failing drive are preferred over other like-size hot spares.
  3. Same Drive type – Hot spare must be of the same drive type.
Failing drive in-use capacity
It is the in-use capacity of the failing drive’s that determines the capacity of the hot spare candidates.  Note this is a LUN-dependent criterion, not a raw drive capacity dependency.  This is measured by totalling the capacity of the drive’s bound LUNs.  The in-use capacity of a failing drive’s LUNs is not predictable.  This rule can lead to an unlikely hot spare selection.  For example, it is possible for a smaller capacity hot spare to be automatically selected over a hot spare drive identical to, and adjacent to the failing drive in the same DAE.  This occurs because the formatted capacity of the smaller, hot spare (the highest-order selection criteria) matches the in-use capacity of the failing drive’s LUNs more closely than the identical hot spare drive.
Note that a hot spare’s form factor and speed are not a hot spare criteria within the type.
For example, a 3.5” format 15K rpm drive can be a hot spare for a failing 2.5” 10K rpm SAS drive.
For the VNX2


Bottom line: this is good to know, because the customer had open slots in their existing 15-slot 3.5″ DAE. If drive form factor did matter, they would need to buy another DAE for the 2.5″ drives!

Here is the hot spare drive matrix. It shows each failed drive type and its compatible spares.
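The quoted selection order can be expressed as a short filter-then-rank function, which makes the “smaller spare wins” behavior easy to see. This is a simplified model of the first-generation VNX rules as quoted above, not EMC code. The `Drive` fields and drive names are hypothetical. In-use capacity is passed in directly rather than summed from bound LUNs. VNX2 global sparing behaves differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Drive:
    name: str            # hypothetical label, e.g. "0_0_4" (bus_enclosure_slot)
    drive_type: str      # e.g. "SAS", "NL-SAS", "Flash"
    capacity_gb: int     # formatted capacity
    bus: int             # back-end port/bus number
    form_factor: str = ""  # recorded but ignored, per the quote
    rpm: int = 0           # recorded but ignored, per the quote


def pick_hot_spare(in_use_gb: int, failed: Drive, spares: list[Drive]) -> Optional[Drive]:
    """Pick a spare: same type, smallest sufficient capacity, then same back-end bus."""
    candidates = [s for s in spares
                  if s.drive_type == failed.drive_type   # rule 3: same drive type
                  and s.capacity_gb >= in_use_gb]        # rule 1: covers in-use capacity
    if not candidates:
        return None  # no rebuild until the failed drive is replaced
    # Smallest capacity first; on a tie, prefer a spare on the failed drive's bus (rule 2).
    return min(candidates, key=lambda s: (s.capacity_gb, s.bus != failed.bus))


failed = Drive("0_0_4", "SAS", 900, bus=0, form_factor="2.5in", rpm=10000)
spares = [Drive("1_0_14", "SAS", 900, bus=1, form_factor="3.5in", rpm=15000),
          Drive("0_1_14", "SAS", 600, bus=0)]
print(pick_hot_spare(800, failed, spares).name)  # 1_0_14: a 3.5" 15K spare for a 2.5" 10K drive
```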


vExpert 2015 Announced


vExpert 2015

VMware has announced the vExpert list for 2015. Each year I read the FANTASTIC information shared by all the vExperts, and I am always learning something new. This year I made a huge effort to share more and to speak up about all things VMware to educate my customers, and wow… I am truly honored to be included in the list this year.


Thank you everyone for all your support!



There is so much going on that it is difficult to find time to attend every session of interest.

The challenge is that if your interests fall in multiple areas, more often than not the in-depth sessions will conflict.



VMware PEX (Partner Exchange) is a much different venue than VMworld.

It isn’t that the topics are that much different; the core areas are Virtualization, EUC, BCDR, SDDC, and Hybrid Clouds.

The difference is the hands-on opportunity to talk to engineering staff and product management. You never know who you will run into.

* Of course, there are also discounted exams, hands-on training, networking opportunities, etc.