What are floating IPs?

Floating IPs allow you to have IP redundancy in case of a system fault. This is achieved by monitoring your servers and automatically routing IP addresses to another server if an issue is detected. 

You will be provided with a set of scripts and configuration options that use the Heficed Terminal API to achieve IP redundancy.
The Floating IP solution uses Corosync / Pacemaker as the monitoring system and our API as the IP migration tool. With the help of dynamic routing, the system can migrate subnets of any size. Automatic IP migration between different locations is also possible for subnets of /24 or larger.
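
The /24 threshold can be checked mechanically before you plan a setup that spans locations. The sketch below is only an illustration: the function name supports_cross_location is hypothetical, the subnet is a documentation-only range, and it assumes IPv4 CIDR notation in which "/24 or larger" means a prefix length of 24 or less.

```shell
# Hypothetical helper, not part of the Heficed scripts.
# Succeeds (exit 0) when the CIDR prefix length is 24 or shorter,
# i.e. the subnet is a /24 or larger. Returns 2 on malformed input.
supports_cross_location() {
    case "$1" in
        */*) prefix=${1##*/} ;;
        *)   return 2 ;;
    esac
    case "$prefix" in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$prefix" -le 24 ]
}

# Example with a documentation-only subnet.
if supports_cross_location "198.51.100.0/24"; then
    echo "eligible for migration between locations"
else
    echo "same-location migration only"
fi
```

Smaller subnets such as a /28 fail this check, which corresponds to migration within a single location only.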

The instructions below will get you started with configuring and using the Floating IP solution:

STEP 1. PREREQUISITES

To start, you need at least two active machines in our infrastructure with CentOS installed. These can be Kronos Cloud or Proto Compute servers.

You will also need the IP addresses that you want to make floating.

Once the servers are ready, collect the following information about them from your Terminal:

  • Subnet address - the subnet you want to be floating;
  • Subnet CIDR - the floating subnet mask in CIDR notation;
  • Hostnames - the hostname shown in the Terminal for each machine you will use in the High Availability (HA) cluster;
  • Product type - the product type of each machine you will use in the HA cluster.

Lastly, since you will be using the Heficed API, you need the following credentials:

  • Tenant ID
  • Client ID
  • Client Secret

To obtain this information, you might want to read through our API documentation.

STEP 2. INSTALLATION

Reassignment Script

First, download our scripts that are required for this Floating IP solution:


Now set up the subnet reassignment script and its configuration.
*The following steps use our chosen file paths. Feel free to adjust the paths of the Python script and its configuration as needed; just make sure to change them in every script that references them.
**The Python Requests library must be installed on your servers.
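
Before continuing, you may want to confirm that the Requests library can actually be imported on each server. The following is a minimal sketch; the helper name has_python_module is hypothetical, and it assumes the reassignment script runs under the interpreter called python (check the first line of assign-ip.py and adjust if it differs).

```shell
# Hypothetical helper: succeeds when the given interpreter can import the module.
# $1 = Python interpreter (for example python or python3), $2 = module name.
has_python_module() {
    "$1" -c "import $2" >/dev/null 2>&1
}

# Assumption: "python" is the interpreter used by assign-ip.py.
if has_python_module python requests; then
    echo "Requests is available"
else
    echo "Requests is missing (or python is not on PATH)"
fi
```

If the check fails, install the library with your package manager or pip; the exact package name depends on the CentOS release and Python version in use.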

Make a new directory for the scripts on both machines:

mkdir /opt/floatingIP

Place assign-ip.py and api.conf in the following location on both machines:

/opt/floatingIP/

Edit api.conf and fill in the required values. All values should be the same on both servers except the hostname and product type.

Install Corosync, Pacemaker, and PCS

The next step is to get Corosync, Pacemaker and PCS installed on your machines.

Install the software packages on both machines (Corosync is normally installed automatically as a dependency of these packages):

yum install pacemaker pcs

During installation, the PCS utility creates a new system user named hacluster with a disabled password. You need to set a password for this user on both servers; PCS requires it to synchronize the configuration and migrate the subnet between cluster nodes.

On both machines, run:

passwd hacluster

Please use the same password on both machines. This password will also be required in further configuration steps.

Set Up the Cluster

Now that we have Corosync, Pacemaker and PCS installed on both servers, we can set up the cluster. To enable and start the PCS daemon, run the following on both machines:

systemctl enable pcsd.service
systemctl start pcsd.service

Authenticate the cluster nodes using the username hacluster and the password you set in the previous step. You will need to enter the primary IP address of each node. From the primary machine, run:

pcs cluster auth first_machine_primary_IP_address second_machine_primary_IP_address

The output should look like this:

Username: hacluster
Password:
first_machine_primary_IP_address: Authorized
second_machine_primary_IP_address: Authorized

On the primary machine, generate the Corosync configuration file by running:

pcs cluster setup --name webcluster first_machine_primary_IP_address second_machine_primary_IP_address

The output will look like this:

Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
first_machine_primary_IP_address: Succeeded
second_machine_primary_IP_address: Succeeded
Synchronizing pcsd certificates on nodes first_machine_primary_IP_address, second_machine_primary_IP_address...
first_machine_primary_IP_address: Success
second_machine_primary_IP_address: Success

Restaring pcsd on the nodes in order to reload the certificates...
first_machine_primary_IP_address: Success
second_machine_primary_IP_address: Success

The new configuration file will be generated at /etc/corosync/corosync.conf based on the parameters provided to the pcs cluster setup command. In this example, the cluster name was webcluster, but you can choose any name you want.

Next, you'll need to start your cluster. Run the following command from the primary machine:

pcs cluster start --all

Output:

first_machine_primary_IP_address: Starting Cluster...
second_machine_primary_IP_address: Starting Cluster...

You can check if both nodes have connected to the cluster by running this command on any of the cluster servers:

pcs status corosync

Output:

Membership information
----------------------
    Nodeid      Votes Name
         2          1 secondary_private_IP_address
         1          1 primary_private_IP_address (local)
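
If you want to script this check, the membership table can be counted rather than read by eye. The sketch below assumes the output layout shown above (a numeric Nodeid column followed by a numeric Votes column); the helper name count_corosync_members is hypothetical.

```shell
# Hypothetical helper: counts member rows in `pcs status corosync` output
# read from standard input. A member row starts with a numeric Nodeid
# followed by a numeric Votes column.
count_corosync_members() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { n++ } END { print n + 0 }'
}

# Demonstration on a copy of the sample output.
count_corosync_members <<'EOF'
Membership information
----------------------
    Nodeid      Votes Name
         2          1 secondary_private_IP_address
         1          1 primary_private_IP_address (local)
EOF
```

On a live node the same helper can be fed directly, for example with pcs status corosync | count_corosync_members, and should report 2 for the two-node cluster described here.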

To get more information about the current status of the cluster, run:

pcs cluster status

The output should be similar to this:

 Last updated: Fri Dec 11 11:59:09 2015     Last change: Fri Dec 11 11:59:00 2015 by hacluster via crmd on secondary
 Stack: corosync
 Current DC: secondary (version 1.1.13-a14efad) - partition with quorum
 2 nodes and 0 resources configured
 Online: [ primary secondary ]

PCSD Status:
  primary (primary_private_IP_address): Online
  secondary (secondary_private_IP_address): Online
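
The two details worth checking in this output are the quorum state and the list of online nodes. As a rough sketch, they can be extracted as follows; both helper names are hypothetical, and the parsing assumes the line formats shown in the sample above, which may differ between pcs versions.

```shell
# Hypothetical helpers that read `pcs cluster status` output from standard input.

# Succeeds when the output reports "partition with quorum".
cluster_has_quorum() {
    grep -q 'partition with quorum'
}

# Prints one online node name per line, taken from "Online: [ a b ]".
online_nodes() {
    awk '$1 == "Online:" && $2 == "[" { for (i = 3; i < NF; i++) print $i }'
}

# Demonstration on an excerpt of the sample output.
status=' Stack: corosync
 Current DC: secondary (version 1.1.13-a14efad) - partition with quorum
 2 nodes and 0 resources configured
 Online: [ primary secondary ]'

if printf '%s\n' "$status" | cluster_has_quorum; then
    echo "quorum: yes"
fi
printf '%s\n' "$status" | online_nodes
```

Both machines should appear in the list of online nodes before you continue with the resource agent.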

Now enable the corosync and pacemaker services so that they start at system boot. Run the following on both machines:

systemctl enable corosync.service
systemctl enable pacemaker.service

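In our configuration, we recommend disabling STONITH (Shoot The Other Node In The Head), since no fencing devices are set up in this guide and Pacemaker will not start resources while STONITH is enabled without one. Run the following command on one of the machines: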

pcs property set stonith-enabled=false

Create a Floating IP Reassignment Resource Agent

The last thing you need to configure is the resource agent that executes the IP reassignment script when a failure is detected on the primary cluster node. The resource agent provides the interface between the cluster and the resource itself; in this case, the resource is the assign-ip.py script. The cluster relies on the resource agent to execute the right procedures when it is given a start, stop or monitor command.
The resource agent in this example follows the OCF (Open Cluster Framework) standard. We will create a new OCF resource agent to manage the assign-ip.py service on both machines.
First, create the directory that will contain the resource agent. Pacemaker uses the directory name as the provider identifier for this custom agent (the heficed part of ocf:heficed:floatip).

Run the following on both machines:

mkdir /usr/lib/ocf/resource.d/heficed

Next, place the floatip resource agent script in the newly created directory on both machines:

/usr/lib/ocf/resource.d/heficed/

Now make the script executable with the following command on both machines:

chmod +x /usr/lib/ocf/resource.d/heficed/floatip

Next, register the resource agent within the cluster, using the PCS utility. The following command should be executed from one of the nodes:

pcs resource create FloatIP ocf:heficed:floatip

The resource should now be registered and active in the cluster. You can check the registered resources from any of the nodes with the pcs status command:

pcs status

Output:

...
2 nodes and 1 resource configured

Online: [ primary secondary ]

Full list of resources:

 FloatIP    (ocf::heficed:floatip):    Started primary

...
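
To see at a glance which node currently holds the floating subnet, the resource line can be parsed. This is a small sketch that assumes the line format shown above (name, agent, state, node); other pcs versions may format this line differently, and the helper name resource_node is hypothetical.

```shell
# Hypothetical helper: prints the node on which a resource is started,
# reading `pcs status` output from standard input.
# $1 = resource name as registered with `pcs resource create`.
resource_node() {
    awk -v res="$1" '$1 == res && $3 == "Started" { print $4 }'
}

# Demonstration on an excerpt of the sample output.
resource_node FloatIP <<'EOF'
Full list of resources:

 FloatIP    (ocf::heficed:floatip):    Started primary
EOF
```

On a live cluster, pcs status | resource_node FloatIP prints the node name, or nothing if the resource is not started.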

STEP 3. TEST THE SYSTEM

To test whether the system works, run the floatip script in bash with command tracing enabled (-x):

bash -x /usr/lib/ocf/resource.d/heficed/floatip $command

$command is the action passed to the script by the HA system. The script must work with these four commands:

  • start - start the resource.
  • stop - stop the resource. 
  • monitor - monitor the health of a resource.
  • meta-data - provide information about this resource as an XML snippet. 
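
To make the four commands more concrete, here is a minimal sketch of how an OCF-style agent typically dispatches them. It is not the floatip script: the function names, the marker file and the stubbed reassign_subnet step are all hypothetical, and the real agent calls the reassignment script in a way this article does not document. Only the numeric return codes follow the OCF convention.

```shell
# Illustrative OCF-style dispatcher; NOT the Heficed floatip script.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

# Hypothetical marker file recording whether this node "holds" the resource.
STATE_FILE=${STATE_FILE:-${TMPDIR:-/tmp}/floatip-sketch.state}

# Placeholder for the real work (the actual agent would trigger the
# subnet reassignment here; how it does so is not shown in this article).
reassign_subnet() {
    :
}

sketch_agent() {
    case "$1" in
        start)
            reassign_subnet || return $OCF_ERR_GENERIC
            : > "$STATE_FILE" || return $OCF_ERR_GENERIC
            return $OCF_SUCCESS
            ;;
        stop)
            # Stopping an already stopped resource still counts as success.
            rm -f "$STATE_FILE"
            return $OCF_SUCCESS
            ;;
        monitor)
            [ -f "$STATE_FILE" ] && return $OCF_SUCCESS
            return $OCF_NOT_RUNNING
            ;;
        meta-data)
            # Real agents print a full XML description of parameters and actions.
            printf '%s\n' '<?xml version="1.0"?>' '<resource-agent name="floatip-sketch"/>'
            return $OCF_SUCCESS
            ;;
        *)
            return $OCF_ERR_UNIMPLEMENTED
            ;;
    esac
}

# Short demonstration cycle.
sketch_agent start && sketch_agent monitor && echo "resource is running"
sketch_agent stop
```

The point of the sketch is the contract rather than the implementation: monitor reports 0 while the resource is active and 7 when it is cleanly stopped, and the cluster uses those values to decide whether to move the resource.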

To check the status code after the script completes, enter:

echo $?
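
The number printed is easier to interpret with the standard OCF return code names next to it. The lookup below is a convenience sketch; the function name ocf_code_name is hypothetical, while the code values themselves are the ones defined by the OCF resource agent convention.

```shell
# Hypothetical helper: maps an OCF exit status to its conventional name.
ocf_code_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        *) echo "unknown ($1)" ;;
    esac
}

# Example: decode the status a monitor call returns on an inactive node.
ocf_code_name 7
```

Note that 7 (OCF_NOT_RUNNING) from monitor is expected on a node that does not currently hold the resource, so it is not an error in itself.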

More information about OCF Resource can be found here.

If the script returns correct codes and doesn't show any errors, the system should work correctly. Otherwise, please debug as needed. 

If you encounter any difficulties or have further questions, feel free to contact our Customer Support Department by creating a ticket in your Terminal or emailing us directly at support@heficed.com.
