Geo-distributed Cluster / Multi-Site Cluster

Recently I had the task of creating and testing a Failover Cluster on Windows Server 2008, and then seeing how it would behave in actual practice in the Print Server role. At first sight the task is not difficult, except for several points:

  • the cluster had to be built without SAN storage;
  • the nodes of the cluster had to be located in different networks (i.e. geographically distributed), with no direct connection between them;
  • the number of nodes in the cluster is two.

Clearly, the primary goal was not to cluster a Print Server but to check whether, and how, a geo-distributed cluster can be created under such conditions.

    I had never faced such a task before, so everything was done for the first time. It should be noted that all of this was done on Hyper-V virtual machines, which is fashionable now ;) So I would like to share some points (questions) that may come up for anyone deploying a geo-distributed cluster.

    Generally speaking, a cluster is a group of independent computers that work together to increase the availability of stateful applications and services; in other words, it provides fault tolerance.

    Creating a failover cluster requires at least one shared resource accessible to all nodes of the cluster. At any given time only one of the nodes can access and manage this resource. The resource is, as is not hard to guess, some kind of data storage. There are two ways to organize it: iSCSI or SAN storage. SAN storage is not a cheap pleasure. Nevertheless, it is available ;) But using it for testing and for hosting a Print Server is not practical. As for a dedicated iSCSI store, many say it has no future, and it is not needed here anyway. Therefore it was decided to use a software implementation of an iSCSI target, the StarWind utility. Moreover, an iSCSI initiator is already built into Windows Server 2008. In addition, StarWind allows connecting a store of up to 2 GB completely free of charge, which is quite enough both for testing and for production use of a Print Server.

    How it should look in the end. For example, one cluster node is located in city A and the second in city B. Both nodes are connected to the common storage. The cluster acts as a SQL Server. If the node that currently owns the service (and thus provides access to the SQL database) fails, the second node immediately takes over its functions. As a result the service stays available: all clients are redirected to the second node of the cluster. Keep in mind that the nodes are located in different cities. This is the Geo-Distributed Cluster / Multi-Site Cluster.

    I will not describe setting up the iSCSI target, connecting it, or configuring the server roles; there is enough documentation on this on the Internet. I will only say that Node and File Share Majority was chosen as the quorum model.
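    To make the quorum choice more concrete, here is a minimal sketch of how votes add up under Node and File Share Majority. It assumes one vote per node plus one vote for the file share witness; the function name and its parameters are made up for illustration and are not part of any Windows tool.

```python
def has_quorum(nodes_up: int, total_nodes: int, witness_up: bool) -> bool:
    """Return True if the voters still present form a strict majority.

    Assumption: each node has one vote and the file share witness has one vote.
    """
    total_votes = total_nodes + 1                      # nodes + file share witness
    votes_present = nodes_up + (1 if witness_up else 0)
    return votes_present > total_votes // 2


# The two-node cluster from this article, with a file share witness.
print(has_quorum(nodes_up=2, total_nodes=2, witness_up=True))   # True
print(has_quorum(nodes_up=1, total_nodes=2, witness_up=True))   # True
print(has_quorum(nodes_up=1, total_nodes=2, witness_up=False))  # False
```

    With only two nodes, the witness is what lets the surviving node keep a majority when the other one fails.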

    The first problem that arose was how to build a cluster with nodes in different networks. It turned out that this is not a problem: a cluster connects nodes in different networks and subnets without any trouble. If necessary, you can set the address manually (see the figure).

    [Figure: cluster configuration screenshots]

    By experiment, the cluster's mechanism of operation was established. When a new service is created, the cluster creates an A record in DNS with the IP address and name of that cluster service. When one of the cluster nodes fails, the DNS record is overwritten. Thus the name of the cluster stays the same, but the IP address changes. If the node that becomes the owner is in a different subnet, the cluster address will also be in that subnet (the address is specified in advance, when the cluster is created). Accordingly, all clients are redirected to the correct subnet.
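    The behaviour described above can be imitated with a toy model. This is only a sketch of the observed behaviour, not of how the cluster service is implemented; the class names, node names, cluster name and IP addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDns:
    """In-memory stand-in for a DNS zone: name -> (IPv4 address, TTL in seconds)."""
    records: dict = field(default_factory=dict)

    def register(self, name: str, address: str, ttl: int) -> None:
        # Registration overwrites the whole record, TTL included.
        self.records[name] = (address, ttl)


@dataclass
class ToyCluster:
    name: str
    # One pre-configured cluster address per node subnet (hypothetical values).
    address_by_node: dict
    ttl: int = 1200  # 20 minutes, the default observed in this article

    def bring_online(self, dns: ToyDns, owner: str) -> None:
        # The cluster name stays the same; only the address follows the owner.
        dns.register(self.name, self.address_by_node[owner], self.ttl)


dns = ToyDns()
cluster = ToyCluster("printcl", {"node-a": "10.1.0.50", "node-b": "10.2.0.50"})

cluster.bring_online(dns, "node-a")
print(dns.records["printcl"])  # ('10.1.0.50', 1200)

cluster.bring_online(dns, "node-b")  # node-a has failed, node-b takes over
print(dns.records["printcl"])  # ('10.2.0.50', 1200)
```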

    The second problem that arose was the lifetime (TTL) of the cluster record in DNS. By default it is 20 minutes, i.e. if one of the cluster nodes fails, clients may not learn about it for up to 20 minutes (in the worst case). Taking over the functions of the failed node takes about 15-20 seconds. So the service is restored within 20 seconds, but clients find out about it only after 20 minutes. This is not acceptable. Setting the lifetime of the DNS record manually does not work: when a node fails, the record in DNS is rewritten completely, including the TTL field. Changing the TTL for the whole DNS zone would increase the load on the DNS server many times over. The TTL was reduced with the following CMD command:

    cluster /cluster:<ClusterName> res <NetworkNameResource> /priv HostRecordTTL=<TimeInSeconds>

    More about this command can be read on the TechNet site.
    These are all the main points I wanted to cover. Everything else related to clusters is easy to find on the Internet.
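    To see why the TTL matters more than the failover time, here is a rough sketch of the arithmetic. It assumes a client that cached the record at the moment of failure and looks the name up again only when its cached copy expires; real resolvers and applications may behave differently. The helper that builds the command line only fills in the template shown above; quoting the resource name is my own addition for names containing spaces, and the cluster and resource names in the example are hypothetical.

```python
def worst_case_outage(failover_seconds: int, ttl_seconds: int) -> int:
    """Seconds until a client sees the new address, under the stated assumption.

    The client re-resolves every ttl_seconds; the first lookup that happens
    at or after the failover has completed returns the new address.
    """
    periods = max(1, (failover_seconds + ttl_seconds - 1) // ttl_seconds)
    return periods * ttl_seconds


def host_record_ttl_command(cluster_name: str, resource_name: str, ttl_seconds: int) -> str:
    """Fill in the command template from the article (nothing is executed)."""
    return (f'cluster /cluster:{cluster_name} res "{resource_name}" '
            f'/priv HostRecordTTL={ttl_seconds}')


print(worst_case_outage(failover_seconds=20, ttl_seconds=1200))  # 1200
print(worst_case_outage(failover_seconds=20, ttl_seconds=60))    # 60
print(host_record_ttl_command("printcl", "Cluster Name", 60))
```

    With the default TTL the 20-second failover is hidden behind a 20-minute wait; with a TTL of 60 seconds the client-visible outage drops to about a minute.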