Synology VMM Pro - testing a failover virtualization cluster

Today any small company has its own IT resources, which grow along with the company. One or two services can still be kept on a dedicated server, but as file traffic and the number of workstations grow, virtualization becomes a prerequisite. Starting with free hypervisors, you immediately run into difficulties, the most innocuous of which is the complexity of configuration inherent in absolutely all solutions: ESXi, Hyper-V, Xen, Proxmox, and the list goes on. It's amazing that even in 2018, software giants with billions of dollars in turnover do not bother with a more or less decent Web-based management interface. If you want to put 16 hard disks in the server, 6 of them for backups of working computers and 10 for virtual machines, get ready for the worst: built-in array management exists only in Proxmox (and even there it would be better if it did not exist) and in Windows Server 2016 in its top edition.

Not every company needs a cluster that switches virtual machines between nodes in 1-3 seconds, and not every customer has the same software budget as banks and oil companies, but many companies need virtualization that is simple and reasonably reliable, with configuration and maintenance they can handle on their own, without involving "certified specialists". For this reason we warmly welcome the trend of virtualization on NAS and believe it is a real alternative to dedicated hardware hypervisors for small business. In our article on virtualization on Synology NAS we talked about usability for which you can forgive the lack of some advanced features, and today we are testing the Pro version of the hypervisor with cluster, migration and fault tolerance support.

Countless arguments have raged over the choice of a virtualization platform, and the outcome always depends on why you are choosing one solution or another. Today the leadership in hypervisors is shared by VMware with vSphere, better known as ESXi, and Microsoft with Hyper-V. For large corporations with fleets of hundreds of servers, these solutions have proven themselves the best, but when you try to apply them in a small company, difficulties pour out as if from a cornucopia. Here are just a few of them:

  • ESXi up to version 6.7 (2018) - has no disk array management tools, no full hardware monitoring, is not compatible with all hardware, and the free version is limited to running virtual machines without online migration. To manage built-in arrays, users install FreeNAS, QuantaStor or some other OS as a virtual machine and from there present iSCSI volumes to the hypervisor as storage. This is an extremely crooked solution with huge overhead and awkward management spread across different operating systems, yet it still works.
  • Hyper-V - Microsoft reluctantly adds support for Linux guests, but there is no full compatibility. Under Windows 10, the hypervisor may simply fail to start even though it started normally yesterday, and for proper work with disks you need to buy Windows Server 2016 Datacenter Edition to create iSCSI targets.
  • XenServer (7.5) - thanks to massive PR it seems to be a free hypervisor, but its free version, like that of ESXi, has no "grown-up" functions, and even the management interface is installed as a separate application.

In other words, there are many hypervisors, but with each of them you will, to one degree or another, have to bring in third-party certified specialists and pay expensive license fees. Against this background, the path Synology is following looks like a road to a brighter future: in one Web window you have your storage system with full component monitoring, a flexible update system, and notifications by E-Mail/SMS/Push, all of it built on the modern BTRFS file system, where 10 virtual machines can take up the disk space of one. Moreover, some applications, such as video surveillance or iSCSI targets, can be launched both on bare metal and in a virtual environment with the same management interface, the same authorization scheme and the same functions.

Of course, Synology software works only on Synology NAS, so an initial investment in infrastructure is implied, but the power of the Intel Xeon E5 processors used in the top Rackstation and Flashstation series is already enough for moderately complex databases, for 1C infrastructure, and for hundreds of other Windows applications, and for quickly launching simple programs under Linux there is Docker support. You can start configuring a virtual machine on your home DS918+ and then migrate it to the powerful FS3017, into a high-workload environment where the disk system must deliver hundreds of thousands of I/O operations per second and where fault tolerance and redundancy are required. All this is now available not only to small businesses, but even to home users.

However, until recently, opponents of virtualization on NAS claimed that it is impossible to create a failover cluster on such devices. To refute this, we took three completely different storage systems from Synology: the top-end Flashstation FS3017 flash array, the Rackstation RS18017xs+ workhorse, and the desktop Diskstation DS918+, which is bought for both home and small office use. I bet there has never been a more heterogeneous cluster in the world, with different network interfaces, but we will go through all the difficulties and hardships in the test lab in order to break the prevailing stereotype that virtualization is difficult and expensive. Let's go!

Glossary of abbreviations in this article:

  • DSM - Synology DiskStation Manager, the operating system of Synology NAS
  • VMM (Pro) - Virtual Machine Manager (Pro), a virtualization package that allows you to run virtual machines on a NAS
  • High Availability (HA) - a mechanism that allows the cluster to survive the failure of one of its NAS
  • BTRFS - one of the most advanced file systems available on Synology NAS

1. One license for the entire cluster

Synology has an amazing idea for licensing VMM Pro: you buy activation for the entire cluster, which can consist of 3 or 7 devices, no matter which ones, with how many processors, cores and hard drives, or with how many virtual machines. VMM Pro is activated on the one device that creates the cluster, and at any time you can transfer the license to another NAS. If the recipient is in the same cluster as the donor, all settings are preserved; if not, the old cluster falls apart into independent nodes, but the virtual machines continue to work, just without the advanced functions.

The NAS on which the license is activated cannot be removed from the cluster, so if it needs to be decommissioned, you must first activate VMM Pro on another node. Otherwise the devices in the cluster are absolutely equal, and the entire Synology virtual datacenter can be managed from the interface of any of them.

2. Migrating virtual machines

Virtual machines can be migrated on the principle of "compute separately, files separately": you can send a virtual machine to run on the processors of node "A" while its image stays on the disks of node "B" and is read from there, or you can move both compute and files to one node. This is very convenient because, for example, space on the 12-disk Rackstation RS18017xs+ is cheap, but it has a single six-core Xeon D-1531, while space on the Flashstation FS3017 flash array is expensive, but it has two 6-core Xeon E5-2620 v3 processors. So why not take advantage of the 10 Gigabit interconnect and optimize compute and disk capacity within the cluster?

In our test, we first placed a Windows 10 x64 build 1807 virtual machine on the Flashstation FS3017, and then gave the compute role to the Rackstation RS18017xs+. We ran ATTO Disk Benchmark in the virtual machine against drive "C:" to see whether the disk pool of the compute node would be used. The result showed that the compute node performs no local disk operations: all the benchmark activity fell on the FS3017 drive pool and the network connection.

The compute part migrates without stopping the virtual machine, and even an RDP connection to the virtual desktop does not break, but you need to enable processor compatibility mode and use the VirtIO SCSI driver for the disk subsystem. Keep in mind that Synology VMM does not yet have dynamic resource allocation, so the target node must have at least as much RAM and as many vCPU resources available as the virtual machine uses. To show visually how a virtual machine is transferred between compute nodes, we ran the following test.
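
The resource rule above can be written down as a simple pre-migration check. This is only a minimal sketch in Python, not Synology's API: the `VM` and `Node` classes, their fields and the example figures are hypothetical, and the only thing taken from the article is the rule that the target needs at least as much free RAM and vCPU as the virtual machine uses.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    vcpus: int
    ram_gb: int


@dataclass
class Node:
    name: str
    free_vcpus: int   # vCPUs not yet assigned to other VMs
    free_ram_gb: int  # RAM not yet assigned to other VMs


def can_accept(target: Node, vm: VM) -> bool:
    """Return True if the target has room for the VM's full allocation."""
    return target.free_vcpus >= vm.vcpus and target.free_ram_gb >= vm.ram_gb


# Hypothetical figures, chosen only for illustration.
win10 = VM("win10", vcpus=4, ram_gb=8)
print(can_accept(Node("rack", free_vcpus=6, free_ram_gb=16), win10))
print(can_accept(Node("desktop", free_vcpus=4, free_ram_gb=4), win10))
```

With these made-up numbers the first node can take the machine and the second cannot, because it is short of memory.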

The OCCT processor stability test was run under the Windows 10 x64 guest operating system. Some time after the start, the virtual machine was given the command to migrate, and the processor load in the DSM interface lets us trace the mechanics of the migration process. First, network traffic between the source and the target rises sharply, apparently as the cluster synchronizes the file structure of the virtual disk, and then at some point the load on the source CPU drops as the instructions already being executed are synchronized. As soon as this process completes, the load on the target's processor rises sharply, and the migration is complete!

We ran several other tests: playing media content over RDP, archiving files, working with documents in Word. Only the OCCT test showed any visible sign of the migration; in the other cases it was completely unnoticeable.

True, you cannot move the file part of a virtual machine without stopping it, although with a BTRFS file system and snapshot replication there is no particular difficulty in synchronizing, and if Synology sets itself the goal of adding this function, it will. For other hypervisor developers, incidentally, migration of the VM file part belongs to the most expensive licenses.

3. Virtual machine cloning is the power of BTRFS

Guess how long it takes to create 10 copies of a 74GB virtual machine? Approximately 15-16 seconds! That is how long it takes to register the clones in the hypervisor; the disks themselves are copied instantly, because thanks to the Copy On Write principle implemented in BTRFS, the storage system does not physically copy every byte to a new place. Instead it records that the data blocks have been copied, creating something like symbolic links that take up almost no disk space. Instead of the nominal 770 GB, these copies occupy the same 77 GB on disk, and even less if compression is enabled in the volume settings. Of course, once we launch each virtual machine and it starts updating and filling its swap file, the occupied space will grow, but only by the amount of changed data. If a virtual desktop infrastructure (VDI) is stored on the NAS, the space savings can be ten- or hundredfold, because most Windows system files do not change for years.

Unfortunately, cloning works only while the virtual machine is turned off, so it cannot be considered a full replacement for snapshots.

Let's conduct an experiment on a plain Windows 10 installation in which we have saved 50 GB of good TV series. We clone it into 10 copies and look: the available disk space has not changed, and our virtual structure still occupies the same 77.6 GB.

  • Let's start all the created clones and see how the occupied disk space has changed. About 300 MB has changed in total, and the 10 copies now take up 77.9 GB!
  • Let's remove the TV series folder from one virtual machine, lightening it by 50 GB. The occupied space has not changed, because the same series exist in 9 more copies and are, in fact, still stored on the disk pool.
  • Let's write 59 GB of documentaries to this virtual machine. The volume occupied by the 10 clones has changed by just those 59 GB and is now 138.3 GB.
  • If we now delete the documentaries from the virtual machine, we will not get those 59 GB back: the virtual machine's image has grown to 97 GB and differs from the clones by the documentaries that once sat on its desktop.
  • We are not satisfied with this, because why waste space on deleted files? We stop the virtual machine, go to its storage settings and tick the "space reorganization" box. I would like the virtual image to shrink to the real 14-15 GB occupied by a bare Windows 10 without movies and TV shows, but no miracle happens... We wait a long time and see the occupied space start to shrink a little and the free space grow. Yes, this is a very slow process, running at about 200 megabytes per minute.
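
The first steps of this experiment can be imitated with a toy model of copy-on-write storage. The Python sketch below is an illustration under loud assumptions, not a description of how BTRFS is implemented: each image is just a set of block IDs, a clone shares its parent's blocks, and a block takes up space for as long as any image still references it. The `CowPool` class and its methods are hypothetical, one "block" stands for 1 GB, and the figures are rounded.

```python
class CowPool:
    """Toy copy-on-write pool: images are sets of shared block IDs."""

    def __init__(self):
        self.images = {}      # image name -> set of block IDs
        self.next_block = 0   # counter for allocating fresh blocks

    def create(self, name, size_blocks):
        blocks = set(range(self.next_block, self.next_block + size_blocks))
        self.next_block += size_blocks
        self.images[name] = blocks

    def clone(self, src, dst):
        # A clone copies only the references, never the data itself.
        self.images[dst] = set(self.images[src])

    def write_new(self, name, n_blocks):
        # New data always lands in freshly allocated blocks.
        new = set(range(self.next_block, self.next_block + n_blocks))
        self.next_block += n_blocks
        self.images[name] |= new
        return new

    def delete(self, name, blocks):
        self.images[name] -= set(blocks)

    def used(self):
        # A block occupies space as long as any image references it.
        return len(set().union(*self.images.values()))


pool = CowPool()
pool.create("win10", 77)                 # base image, 77 "GB"
for i in range(10):
    pool.clone("win10", f"clone{i}")
print(pool.used())                       # clones share every block

pool.delete("clone0", range(50))         # pretend blocks 0-49 hold the series
print(pool.used())                       # other images still reference them

pool.write_new("clone0", 59)             # documentaries written to one clone
print(pool.used())                       # only the new data is added
```

The model reproduces the pattern of the first three steps: cloning and deleting shared data leave the used space unchanged, while new writes add exactly their own size. It does not cover the last two steps, where space freed inside the guest stays allocated in the image until the reorganization option reclaims it.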

My opinion: virtual machine cloning is excellent. It is fast, economical and simple. This is what intelligent file systems like BTRFS were created for, and this is where you see the Copy On Write mechanism with your own eyes.

4. Snapshot and restore to clone

Since we're on the subject of BTRFS, it's time to take another look at snapshots, this time of virtual machine images, which are taken here in about 1-2 seconds. We are already so used to snapshots that we evaluate them by the convenience of the interface and argue that they are a real replacement for local backup (nobody has cancelled backup to the cloud). Synology VMM Pro also has an extremely convenient function for restoring a snapshot as a new virtual machine.

That is, if during application development you need to roll back and look at the changes, you can bring up the same virtual machine in its previous version and then choose which one to work with, the new or the old.

You may ask how space is saved when restoring from a snapshot. Will the new virtual machine take up its full size on disk, or will it "weigh" nothing thanks to BTRFS? The correct answer is that the clone takes up exactly as much space as the data on the guest's disk has changed since the snapshot was taken.

Snapshot replication is configured for each virtual machine separately, in accordance with its protection policy. On the hypervisor's "protection" tab you can set the snapshot frequency and select the storage in the cluster where the snapshot copies will be kept. The level of protection is shown by a simple indicator, and the corresponding tab shows whether everything is configured for backing up virtual machines: how many recovery points exist, whether there are snapshots on remote nodes, and whether a schedule is set.

5. How to configure a failover cluster

We're used to Synology developers having their own way of looking at things, and VMM Pro is no exception. To create a failover cluster we need at least three NAS, each of which has two roles: storage node and compute node. We have already seen that a virtual machine can be stored on one NAS (the storage node) while running on a second (the compute node). Now the most interesting part: high availability is achieved only between compute nodes, and unfortunately this is not the cluster's only limitation.

For communication between cluster nodes, one dedicated network interface (ETH or Bond) is used in a full mesh topology (every node connected to every other), and it must be in a different address space than the other network interfaces. In practice, this means that the intra-cluster network must go through a network switch; you cannot connect the NAS directly to each other.
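
The address-space requirement is easy to verify before building the cluster. Here is a minimal sketch using Python's standard `ipaddress` module; the function name and all addresses are hypothetical, and the check only confirms that the subnet planned for the cluster interface does not overlap the subnets of the NAS's other interfaces.

```python
import ipaddress


def cluster_net_is_separate(cluster_cidr, other_cidrs):
    """True if the cluster subnet overlaps none of the other subnets."""
    # strict=False lets us pass a host address with a prefix length.
    cluster = ipaddress.ip_network(cluster_cidr, strict=False)
    return not any(
        cluster.overlaps(ipaddress.ip_network(c, strict=False))
        for c in other_cidrs
    )


# Hypothetical addressing plan for one NAS.
print(cluster_net_is_separate("10.10.10.11/24", ["192.168.1.20/24"]))
print(cluster_net_is_separate("192.168.1.50/24", ["192.168.1.20/24"]))
```

The first plan keeps the cluster link in its own subnet; the second puts it in the same subnet as the LAN interface and would be rejected by this check.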

This is how it looks in the diagram:

Diagram: cluster settings

Let's take our test virtual machine and migrate its files to the Diskstation DS918+, leaving the powerful Flashstation FS3017 as the compute node. Due to the lack of dynamic resource allocation, we have to count not only the CPU cores and memory used by the running virtual machines, but also those held in reserve in case of a failover. Therefore, of all our Windows 10 clones (from section 3), I will enable fault tolerance for only one, and also for Virtual DSM.
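
This bookkeeping can be sketched in a few lines of Python. The sketch assumes that every HA-protected machine reserves its full vCPU and RAM allocation on the standby node, which is my reading of the paragraph above rather than documented VMM behaviour; the class, the function names and the Windows sizing are hypothetical, while the 1 core and 4 GB for Virtual DSM come from section 6.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    vcpus: int
    ram_gb: int
    ha: bool = False  # True if the VM is protected by High Availability


def required_on_standby(vms):
    """Sum the vCPUs and RAM the standby node must keep in reserve."""
    protected = [vm for vm in vms if vm.ha]
    return sum(vm.vcpus for vm in protected), sum(vm.ram_gb for vm in protected)


def standby_fits(vms, free_vcpus, free_ram_gb):
    """True if the standby node can hold every HA-protected VM at once."""
    need_cpu, need_ram = required_on_standby(vms)
    return need_cpu <= free_vcpus and need_ram <= free_ram_gb


# Hypothetical sizing: ten Windows clones, only the first one protected,
# plus a protected Virtual DSM with 1 core and 4 GB.
vms = [VM(f"win10-{i}", vcpus=2, ram_gb=4, ha=(i == 0)) for i in range(10)]
vms.append(VM("virtual-dsm", vcpus=1, ram_gb=4, ha=True))

print(required_on_standby(vms))
print(standby_fits(vms, free_vcpus=6, free_ram_gb=16))
```

Protecting one clone and Virtual DSM reserves 3 vCPUs and 8 GB in this example; protecting all ten clones would multiply the reserve accordingly, which is why fault tolerance is enabled so sparingly here.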

The hypervisor shows that High Availability mode is active, and our hands itch to simulate an accident. We cut the power to the Flashstation FS3017 and watch first Virtual DSM and then Windows 10 start on the backup Rackstation RS18017xs+ with the same IP addresses.

Inspired by the excellent migration from section 2, I would like to see the same smooth transition between nodes in an emergency. We return the cluster to a working state, start the virtual Windows 10, connect to it via RDP and open WordPad, then disconnect the network ports of the flash array and... the RDP connection drops and comes back after about a minute, but we are looking at an empty desktop: Windows 10 has booted from scratch.

We bring the cluster back to normal by reconnecting the network to the FS3017, and what do we see? An empty desktop, which means that the fault tolerance mechanism works in Active-Passive mode and does not synchronize the memory of guest operating systems as it did during migration. That is, what is protected is not what is currently running in the virtual machine but what has been written to disk, which is typical for cost-effective solutions. This is easy to check by replacing WordPad with Word, because Word periodically flushes temporary files to disk. If the guest operating system boots from scratch on the backup node, then on opening Word we can recover the files, at least in the state they were in a few minutes earlier.

What happens if we switch off the DS918+, which serves as the cluster's storage server? Exactly what you would expect: all guest machines stop with the error "storage is not available". You can improve reliability here by configuring storage replication to another NAS in the cluster; in case of a disaster you then manually recover from a snapshot and connect the new node as the cluster storage.

Clearly, "fast recovery" is a very fitting term for fault tolerance in Synology VMM Pro: when a node drops out, the virtual machines stop and automatically restart from the state "a few minutes before the accident". In this mode, an emergency shutdown of one of the nodes means 2-3 minutes of service downtime, which is quite acceptable for small business applications. And nothing prevents you from additionally configuring replication at the application level, as in MySQL.

6. DSM Virtualization - Setting Up High Availability and Recursive Thinking

Speaking of High Availability (HA), the feature most requested by system integrators, Synology has two of them. The first is high availability of the NAS itself and all of its file resources. This feature has been around for several years, is configured between two business-grade NAS, and applies to all files and applications, but not to virtual machines.

The hypervisor has its own High Availability system, and it... conflicts with the one above, so you have to choose: either a fault-tolerant NAS, or virtualization, which becomes highly available only with the purchase of a VMM Pro license. Yes, you read that correctly: even simply installing the VMM hypervisor disables high availability in DSM 6.2. However, you can run DSM 6.2 itself as a virtual machine inside the VMM Pro hypervisor installed on DSM 6.2 and configure high availability for it. At first this looks somewhat unusual and goes against the principle of managing the whole ecosystem from a single window, but there is no other choice yet.

Configuring iSCSI, NFS and CIFS resiliency is much easier to do than to explain. My task is to make sure that in a cluster of 3 devices, whatever breaks, our files in shared folders and iSCSI LUNs survive. Given that the VMM high availability scheme implies a single point of failure in the form of the storage server, we will configure fault tolerance and replication of all data to one of the compute nodes.

So, we have a Synology FS3017 NAS running file services over CIFS/Samba, NFS and iSCSI, and by adding two NAS for virtualization we want to create fault-tolerant access to all the disk resources of our infrastructure. Everything is already configured and working on the FS3017, so we very much want to move all DSM 6.2 services into the virtual world with minimal downtime and without losing accounts or user access rights to shared resources.

In the VMM Pro hypervisor with the configured cluster, go to the "image" tab and click Add. Select the bottom item, "download Virtual DSM", and wait for the download to finish. Now, on the "virtual machine" tab, create a new machine by selecting the "DSM" item. We give Virtual DSM 1 processor core, 4 GB of memory and as much disk space as was in use on the FS3017.

After launch, Virtual DSM is assigned a new IP address, which is better left alone for now; configure administrative access instead. We transfer the user and shared folder settings to the virtual machine through the configuration import/export mechanisms and replication, and if you also need fault-tolerant video surveillance, install and configure Surveillance CMS using our article.

Now you can enable high availability mode for Virtual DSM in Virtual Machine Manager Pro, swapping the IP addresses of the physical DSM on the FS3017 and its virtual copy. The address swap is exactly what determines the downtime of the file service during the move to the virtual environment, but this is a matter of 1-2 minutes, and once it is done, Virtual DSM answers for everything concerning backups, shared file storage and iSCSI partitions. After making sure everything went smoothly, you can delete all files from the shared folders and iSCSI on the physical Flashstation FS3017.

The next step is to configure fault tolerance, replication and storage of the Virtual DSM image as follows:

What have we done? We moved the Virtual DSM file storage to the Diskstation DS918+, made the Rackstation RS18017xs+ responsible for running the service, and let the Flashstation FS3017 act as its backup, to which Virtual DSM itself is replicated. This arrangement withstands the failure of any single node, including the storage node. Even if the Rackstation RS18017xs+ and the Diskstation DS918+ break at the same time, we can restore Virtual DSM along with its iSCSI partitions from a snapshot on the FS3017 in a matter of minutes, with data no more than 5 minutes old (the minimum replication interval).
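
The claim about surviving node failures can be checked by brute force. The Python sketch below is a deliberately simplified model based on my reading of the arrangement above: a failure counts as recoverable if at least one copy of the Virtual DSM image (the primary on the DS918+ or the replica on the FS3017) and at least one of the two designated compute nodes survive. It says nothing about how long recovery takes or whether it is automatic, and the set and function names are hypothetical.

```python
from itertools import combinations

# Roles taken from the arrangement described above; the model itself is
# a simplification for illustration only.
IMAGE_COPIES = {"DS918+", "FS3017"}          # primary storage + replica
COMPUTE_NODES = {"RS18017xs+", "FS3017"}     # active + standby compute
ALL_NODES = IMAGE_COPIES | COMPUTE_NODES


def recoverable(failed):
    """True if a data copy and a compute node both survive the failure."""
    alive = ALL_NODES - set(failed)
    return bool(alive & IMAGE_COPIES) and bool(alive & COMPUTE_NODES)


# Enumerate every single and double failure.
for n in (1, 2):
    for failed in combinations(sorted(ALL_NODES), n):
        verdict = "recoverable" if recoverable(failed) else "not recoverable"
        print(failed, "->", verdict)
```

In this model every single failure is recoverable, as is the double failure of the RS18017xs+ and the DS918+ mentioned in the text, while losing the FS3017 together with either of the other two nodes is not.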

Price Question

Basic virtualization is usually free, but software developers charge an annual fee for clustering, migration, and snapshots. Let's compare the cost of the advanced features of Synology VMM Pro with the leading industry players:

  • Synology VMM Pro - 3-node cluster license, any configuration - $220 per year
  • Citrix XenServer Enterprise Edition - $1,500 per CPU socket per year
  • VMware vSphere 6 Enterprise Plus - $3,925 per CPU socket per year
  • Microsoft Windows Server 2016 Datacenter Edition - $6,155 for a 16-core license (Datacenter edition is needed to implement disk storage functions)
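
To put the per-socket prices into the context of our test cluster, here is a short worked calculation in Python. The prices are the ones listed above; the socket counts are two for the FS3017 and one for the RS18017xs+ as described in section 2, while one socket for the DS918+ is my assumption. Windows Server is left out because it is priced by cores rather than by sockets.

```python
# Annual list prices quoted in the article, in US dollars.
PER_SOCKET = {
    "Citrix XenServer Enterprise Edition": 1500,
    "VMware vSphere 6 Enterprise Plus": 3925,
}
VMM_PRO_CLUSTER = 220  # flat fee for a 3-node cluster

# CPU sockets per node: FS3017 and RS18017xs+ as described in the
# article; a single socket for the DS918+ is an assumption.
SOCKETS = {"FS3017": 2, "RS18017xs+": 1, "DS918+": 1}


def yearly_cost(price_per_socket, sockets):
    """Per-socket licensing: price times the total number of sockets."""
    return price_per_socket * sum(sockets.values())


print(f"Synology VMM Pro: ${VMM_PRO_CLUSTER:,}")
for product, price in PER_SOCKET.items():
    print(f"{product}: ${yearly_cost(price, SOCKETS):,}")
```

With four sockets in total, the per-socket products come to $6,000 and $15,700 per year respectively, against a flat $220 for VMM Pro.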

In looking at price, we leave aside the intellectual work of "certified specialists" in configuring virtualization and storage. In Synology VMM Pro, you can do everything yourself in 3-4 hours, without contacting technical support and even without studying forums. Try the same thing on Xen or ESXi, and you'll see how precious your time is.

Conclusions

The main advantage of virtualization on Synology NAS is the ease with which you can manage the cluster's disk resources, because unlike VMware vSphere, all the tools for creating disk arrays, SMART monitoring, migration and pool expansion have been honed here over the years and have earned positive reviews in thousands of articles around the world. And on top of this unified ecosystem, which can be configured from scratch without looking at manuals, sits a hypervisor that has now received a High Availability mechanism and has allowed the small business segment to take a step towards affordable fault tolerance.

What we didn't like:

  • Working in Active-Passive mode
  • Installing Virtual Machine Manager disables High Availability for NAS
  • No dynamic resource allocation

What we liked:

  • Virtualization over BTRFS does wonders for saving disk space
  • The cluster can be built from devices of different tiers
  • Restoring a snapshot to a clone
  • Virtual DSM lets you move everything, including iSCSI targets, into the cluster
  • 2 minutes for failover is good for this price

The first release of Synology Virtual Machine Manager showed that virtualization is easy and convenient, especially when you have a polished interface for managing all resources, both compute and file. If you want to build a cheap and cheerful virtualization cluster, VMM Pro will even let you include desktop NAS in it to test the infrastructure before going into production. And if in doubt, a demo version of Virtual Machine Manager Pro is available, in which you can test all the advanced features of the hypervisor for 1 month.

Mikhail Degtyarev (aka LIKE OFF)
02/09.2018

