To begin the search I considered how VMware currently addresses the issue, but I did not turn up any real meat in terms of official support or KB articles. Considering they have their own backup product and do not provide much guidance in this area, I am led to believe they recognize the thousand different ways this can be accomplished. Next I searched the different backup vendor sites, and this led to the same lack of 'official' information. What I did find came from other blogs or lists, and as you can guess, opinions varied as much as the search terms I was typing into Google. Since there are many ways to accomplish this goal, I wanted to find information directly through supportable channels to have a good base for this endeavor.
What would be required if my entire virtual environment were trashed and I had to rebuild from scratch? The key requirement would be a backup that saves not only the vCenter database but also the ESXi configs and the specific build numbers. If build numbers are not at least noted, then firmware compatibility or specific vSphere builds may introduce issues into the environment. It’s easy to stand up a fresh environment that is fully patched, but newer builds that no longer match the firmware and drivers already in place can break things.
Let’s consider what specifics we need to account for. The typical components of a vSphere environment are vCenter and its database, ESXi hosts, datastore connectivity and network connectivity. If there are other services such as vRealize Operations or vRealize Log Insight, these can be saved and recovered either with a replication technology such as vSphere Replication or with a backup technology such as vSphere Data Protection or Veeam. We can also use these tools to protect vCenter; however, we do not have a guarantee of database consistency.
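As a rough illustration of "at least noting" the build numbers, here is a minimal Python sketch of a manifest you could keep alongside the backups. Everything in it is an assumption for illustration: the class names, the field names and the placeholder values are hypothetical, and nothing here talks to vCenter or ESXi; you would fill in the values by hand or from whatever inventory tool you already use.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class HostRecord:
    # Hypothetical record for one ESXi host; all fields are free-form strings.
    name: str
    esxi_build: str
    drivers: dict = field(default_factory=dict)  # driver name -> version


@dataclass
class EnvironmentManifest:
    # Hypothetical top-level record for the whole environment.
    vcenter_build: str
    hosts: list = field(default_factory=list)

    def to_json(self):
        # asdict() recurses into the nested HostRecord entries.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder values only; replace with what your environment reports.
    manifest = EnvironmentManifest(
        vcenter_build="VCENTER-BUILD-HERE",
        hosts=[
            HostRecord(
                name="esx01",
                esxi_build="ESXI-BUILD-HERE",
                drivers={"nic-driver": "DRIVER-VERSION-HERE"},
            )
        ],
    )
    print(manifest.to_json())
```

Saving the JSON output next to the exported configs gives you something to rebuild against, rather than whatever the latest patch level happens to be.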
Starting with vSphere and the database: if you are running the VCSA, we can refer to the KB articles.
For vCenter 6
This appears to improve the process by adding an online method of saving the database. If you are using a Microsoft SQL Server instance embedded with vCenter, your experience may vary when using standard backup tools with VSS-aware MSSQL plugins. A sure method is to use SQL Server Management Studio to perform native SQL backups. This produces a consistent backup file, which your regular backup job can then pick up. Upon recovery this file can be restored into a fresh vCenter deployment. If the MSSQL server is dedicated, the same method can be used; however, this architecture has proven more reliable when backed up using the standard backup processes. Below are some references for MSSQL backups.
MS SQL Database backups
Migrate MSSQL Express (unsupported) to SQL Standard (supported)
Next we need to save the config for the ESXi hosts. Yes, this config can be saved as well. Be sure to save any drivers you may have added outside the standard patches. I’ve noticed that over time specific driver versions become unavailable, so it is important to save them, as they may have a dependency on the respective card’s firmware version. This matters most with newer CNAs, 10G and FC adapters, where firmware and driver versions depend on each other.
Backup ESXi host config
This provides ESXi build references you can use to manually create recovery baselines that match your current ESXi build level.
References for manually creating Update Manager baselines.
Another best practice is to keep a current exported config of your vSphere dvSwitches. In the event of a catastrophic failure, this is the only critical piece whose loss would cause downtime. Sure, you would lose some configs and some historical data, but these are not critical to the functionality of the virtual machines running on the hosts. Obviously this is very simplistic, and other monitoring, automation, and compliance systems do need to be considered in the grand scheme of the design, but this provides a second type of backup for this very critical information if all else fails.
Export dvSwitch config
In the case where an SLA must be maintained for this data and other management systems, a dedicated management cluster becomes the reference and preferred architecture. This removes the circular backup dependency created when any backup system attempts to quiesce the vCenter database. It also provides a solid foundation where a highly converged or hyper-converged architecture is implemented. When management systems are integrated with the hardware being managed, there are times when manual juggling is required, removing some of the automation SDDC provides. Updating, patching, maintenance, and unplanned failures often require this juggling effort. For example, if vCenter is running on a host that decides it’s time to reject a stick of RAM and PSODs while automation tasks are occurring, those tasks will be impacted while vCenter is non-functional. Here is a link with some great reference designs.
Bottom line… Many vendors provide tools to ensure these management applications are recoverable, but prudence is still required when merging these technologies together. The community forums of each vendor typically provide real-world experience and are a valuable support tool. However, always reference release notes and documentation, as these provide officially supported architecture, behavior and tips for dependable operation.
Been a little while since my last post. Well... Time to come back after spending some time at a new job.
Some cool things I've come across. For one, I'm writing this from my phone (the little things in life). Watched a video from Google I/O. You should check it out. Also, VMware announced a new cloud platform. This should lend itself to those attempting to create a private cloud beyond simply running virtual servers.
One of the next (and sometimes forgotten) issues after you have virtualized your life is: how do you save it? You could keep performing backups the same way you have for years; however, I would recommend staggering them, because if they all start at the same time you run the risk of creating I/O contention on your SAN. Now you have an alternative method: since your virtual servers now live in what are essentially files, or possibly an LVM-style partition depending on the technology you are using, let's take advantage of this situation.
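To make the staggering idea concrete, here is a minimal Python sketch that spreads backup start times across a window instead of launching everything at once. The function name, the VM names and the 30-minute gap are assumptions for illustration only; a real backup product would have its own scheduler, and the right gap depends on how long each job actually runs.

```python
from datetime import datetime, timedelta


def staggered_starts(vm_names, window_start, gap_minutes):
    """Return {vm_name: start_time}, spacing each job gap_minutes apart."""
    return {
        name: window_start + timedelta(minutes=i * gap_minutes)
        for i, name in enumerate(vm_names)
    }


if __name__ == "__main__":
    # Hypothetical VM names and a 22:00 backup window.
    start = datetime(2024, 1, 1, 22, 0)
    schedule = staggered_starts(["vm-a", "vm-b", "vm-c"], start, 30)
    for vm, when in schedule.items():
        print(vm, when.strftime("%H:%M"))
```

Even a fixed gap like this keeps the jobs from all hitting the SAN in the same minute; jobs with long run times would still overlap, so treat the gap as a starting point to tune.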
Using methods provided by traditional solutions such as Backup Exec with the VMware agent, or looking at newer offerings such as Veeam or PHDVirtual, you can achieve successful backups more easily than by sticking with agent-per-server (virtual server in this case) methods. The new style of software that specifically supports VMware or XenServer is agentless and is gaining features that can equal or even exceed what physical server backups are capable of. What the physical server world lacks compared to the virtual world is visibility at a level below the volume where the data or files you are concerned about reside. On one side we are dealing with platters inside of a physical disk; on the virtual side we can easily see a layer under the operating system's disk. Some of what is built into VMware, and to a lesser extent other solutions, allows us to deal with this data intelligently.
Bottom line - if you are having trouble getting good, reliable backups in the physical world, perhaps virtualization can assist, on top of its other cost-cutting benefits.
For a new VM that you are creating select:
> Store Virtual disk as a single file.
If you have an existing VM make sure all of the snapshots are deleted (if you have taken any) and do this:
> vmware-vdiskmanager -r sourceDisk.vmdk -t 2 destinationDisk.vmdk
In this case the source disk will be the large VMDK file. After you convert, you will need to edit the .vmx (text-based) file to reference the new VMDK file, unless you used the same file name. Obviously you'd have to convert the disk into a new directory in that case, or change the name. Once it's converted you will actually see 2 new files: one is the very small text file that defines the raw virtual disk file, and the other is the raw virtual disk file itself. DO NOT LOSE THE TEXT FILE! It is essentially impossible to remake, as there is a special code in there that references the large raw file.
If you run 'vmware-vdiskmanager' by itself, you can see all the available options.
Another tip is to use multiple partitions to reduce the level of fragmentation. If you are using Linux, format the partition with XFS or ext4. I normally give each partition 3-5 VMs and use partitions of 25-50GB.
Another tip is, if you can, to use RAID 0 or RAID 1 with very fast hard drives. I am using 2 WD Raptor 150GB drives at home. I can run 4 VMs at once on a RAID 0 with 4 GB of physical RAM. The key here is not necessarily MB/sec but I/O per second. This is where the 10K RPM drives beat any other SATA drive on the market by far. These disks are 50% faster. However, if you use RAID 1 you will not lose too much if you use a quality drive like the WD RE3 1TB drive. This is one of the faster ones on the market. Do not worry about hardware vs. software RAID, as current processors have enough performance to lessen the need for hardware RAID (unless you have the money to burn).
I've also done a little research on whether or not to use Enterprise or 'RAID' type drives. There can be a slight advantage beyond the (in some cases) longer warranty and build quality. RAID-capable drives are designed to intentionally fail fast and can even send commands back to the RAID controller (software or hardware) reporting the state of the drive. A standard disk will attempt retries for a number of minutes (typically 2) before it announces a failure, ultimately confusing the RAID software, which may have already declared the disk FAILED even if the disk recovered. RAID-type SATA drives, by contrast, will declare themselves failed within a short period of time (7-10 seconds) if they cannot recover, and send the message to the RAID software. This behavior is specifically evident in the Western Digital line but is similar with other manufacturers, and it may not be a critical reason to choose these disks for home/test.
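If you would rather not hand-edit the .vmx file, the change can be sketched in a few lines of Python. This is only a sketch under assumptions: it assumes the disk entry looks like `scsi0:0.fileName = "sourceDisk.vmdk"`, and the function name is my own. Keep a copy of the original .vmx before trying anything like this.

```python
import re


def repoint_disk(vmx_text, old_name, new_name):
    """Replace old_name with new_name in any '<device>.fileName = "..."' line."""
    pattern = re.compile(
        r'^([ \t]*[\w:.]+\.fileName[ \t]*=[ \t]*")'
        + re.escape(old_name)
        + r'("[ \t]*)$',
        re.MULTILINE,
    )
    # A function replacement avoids backslash surprises in new_name.
    return pattern.sub(lambda m: m.group(1) + new_name + m.group(2), vmx_text)


if __name__ == "__main__":
    # Hypothetical fragment of a .vmx file.
    sample = (
        'scsi0:0.present = "TRUE"\n'
        'scsi0:0.fileName = "sourceDisk.vmdk"\n'
    )
    print(repoint_disk(sample, "sourceDisk.vmdk", "destinationDisk.vmdk"))
```

Lines that do not reference the old file name are left untouched, so running it against a .vmx that was already updated changes nothing.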
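To show why spindle speed matters more for I/O per second than for MB/sec, here is a back-of-the-envelope Python sketch using the common rule of thumb that one random I/O costs roughly an average seek plus half a rotation. The seek times below are assumed round numbers for illustration, not measurements of any particular drive, so treat the output as a rough comparison only.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: time for half a revolution, in ms."""
    return 0.5 * 60000.0 / rpm


def estimated_iops(rpm, avg_seek_ms):
    """Rule-of-thumb random IOPS for a single spinning disk."""
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))


if __name__ == "__main__":
    # Assumed average seek times (ms); check the drive's data sheet.
    drives = {"10K RPM": (10000, 4.5), "7200 RPM": (7200, 8.5)}
    for label, (rpm, seek) in drives.items():
        print(label, round(estimated_iops(rpm, seek), 1), "IOPS (estimate)")
```

The rule of thumb ignores caching and command queuing, but it captures the point: with several VMs issuing small random reads and writes, the faster spindle and shorter seek are what you feel.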