• Tejpal Singh

Rancher 2 on High Availability Kubernetes infrastructure

Updated: Mar 5, 2021

If you are planning highly scalable infrastructure with Kubernetes on AWS, you will likely come across AWS EKS, which may fulfil your requirements.

The question then is: what do you do if you want to use other service providers later?

In that case Rancher, a Kubernetes cluster management system, will help.

Rancher 2 on Kubernetes for HA infrastructure

What do you need to set up Rancher on AWS?

The only thing you need to get started is admin-level access to AWS. However, you should be familiar with the following:

  1. AWS administration

  2. AWS CLI

  3. Linux commands

  4. Networking and DNS configuration

  5. Terraform CLI

  6. Kubernetes

  7. YAML

  8. Helm Charts

Although the list seems overwhelming, trust me: if you follow along to the end, you will realise how simple it is. To cut a long story short, we will do the following key things:

  1. Prepare an AWS environment with 3 instances

  2. Build a 3-node Kubernetes cluster on it

  3. Deploy Rancher to manage the cluster

In this process we will use techniques such as infrastructure as code with Terraform.

Let us begin.

Prepare AWS

To set up the infrastructure, we need to prepare the following services:

  1. Route 53

  2. Key pair

  3. EC2 instances - 3 t3.medium ( done using Terraform )

  4. Load Balancer ( done using Terraform )

Log into AWS with your admin-privileged account and do the following:

Route 53: This makes the domain where your application and other services will reside reachable. You may register the domain with AWS, or you may already have a domain with another provider such as GoDaddy. Here, let us assume the domain is registered with another provider such as GoDaddy and that AWS manages its subdomains. From Services in the AWS Management Console, select Route 53 and do the following:

  • Create Hosted Zone and mention your domain name. Let us take “exampledomain.com” as the domain for our setup.

  • This will create NS and SOA records
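Because the domain stays with another registrar, the name servers of the new hosted zone are what that registrar needs to point at. A minimal sketch for reading them from the command line, assuming the AWS CLI is already installed and configured (covered in the Configure AWS CLI section). The function names are illustrative and not part of any tool:

```shell
# The API returns zone IDs as "/hostedzone/ID"; keep only the ID part.
strip_zone_prefix() {
  printf '%s\n' "${1##*/}"
}

# Print the name servers Route 53 assigned to the hosted zone of a domain.
# Note: list-hosted-zones-by-name returns zones in name order starting at
# the given name, so check that the first result really is your domain.
hosted_zone_nameservers() {
  local domain="$1" zone_id
  zone_id=$(aws route53 list-hosted-zones-by-name \
    --dns-name "$domain" \
    --query 'HostedZones[0].Id' --output text) || return 1
  aws route53 get-hosted-zone --id "$(strip_zone_prefix "$zone_id")" \
    --query 'DelegationSet.NameServers' --output text
}
```

Calling hosted_zone_nameservers exampledomain.com should list the same name servers that the NS record in the console shows.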

EC2 Key Pair: The key pair gives the scripts the access they need to configure the AWS EC2 instances.

  • From Services (AWS Management Console), select EC2 and click Key Pairs under Network and Security

  • Click Create Key Pair: There are 2 key formats, .pem (for OpenSSH) and .ppk (for PuTTY on Windows). We are using .pem. Enter the name of the key and create the key pair. Remember to save the private key file safely, as it will be required later. As an example, we create the key with name “mainkey”.

  • After creating and saving the key, you will see it in the list of key pairs.
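ssh refuses private keys that other users can read, so it helps to tighten the permissions of the saved key right away. A small sketch, assuming the key was saved as mainkey.pem in the current directory (the file name follows the key name used here, and the function name is illustrative):

```shell
# Make the private key readable by its owner only, so ssh accepts it.
# Usage: secure_key [path-to-key]   (defaults to ./mainkey.pem)
secure_key() {
  local key_file="${1:-mainkey.pem}"
  chmod 400 "$key_file"
}
```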

Configure AWS CLI

Why do you need the AWS CLI?

You need the AWS CLI to automate the preparation of the AWS infrastructure. Installation instructions are available in the official AWS CLI documentation.

Note: we are using AWS CLI version 2. Use the following command to configure the AWS CLI.

$ aws configure

AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: eu-west-3
Default output format [None]: json

Provide your AWS access key, secret key and region, and keep the output format as json. Configuring the AWS CLI is good practice because it keeps the AWS credentials out of the Terraform files.
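Before moving on, it can be worth confirming that the stored credentials actually work. A minimal sketch using the sts get-caller-identity call, which returns the account and identity behind the configured keys. The function name is illustrative:

```shell
# Show the CLI version and the AWS identity behind the configured credentials.
check_aws_cli() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not found on PATH" >&2
    return 127
  fi
  aws --version &&
    aws sts get-caller-identity --output json
}
```

If the call fails with an authentication error, rerun aws configure and check the key pair you entered.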


Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can manage existing and popular service providers as well as custom in-house solutions.

We will use Terraform to create our instances with the required services activated, which automates the preparation of the AWS infrastructure. You can read more about it in the official Terraform documentation.

Install the Terraform CLI for your operating system by following the official installation instructions.


Terraform Template files

Create the terraform template files as follows:

main.tf : Main file that mentions the key information about the infrastructure

provider "aws" {
  profile = var.aws_profile
  region  = var.aws_region
}

# Most recent RancherOS image published by the given owner
data "aws_ami" "rancheros" {
  most_recent = true
  owners      = ["6058xxxxxxxxx"]

  filter {
    name   = "name"
    values = ["rancheros-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

# Existing hosted zone, default VPC and its subnets
data "aws_route53_zone" "dns_zone" {
  name = var.domain_name
}

data "aws_vpc" "default" {
  default = true
}

data "aws_subnet_ids" "available" {
  vpc_id = data.aws_vpc.default.id
}

# Security group for the load balancer: HTTP, HTTPS and the Kubernetes API
resource "aws_security_group" "rancher-elb" {
  name   = "${var.server_name}-rancher-elb"
  vpc_id = data.aws_vpc.default.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "TCP"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "TCP"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 6443
    to_port     = 6443
    protocol    = "TCP"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Security group for the instances
resource "aws_security_group" "rancher" {
  name   = "${var.server_name}-server"
  vpc_id = data.aws_vpc.default.id

  # SSH, open to any address here; restrict it to your own range if you can
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "TCP"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # HTTP and HTTPS only from the load balancer
  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "TCP"
    security_groups = [aws_security_group.rancher-elb.id]
  }

  ingress {
    from_port       = 443
    to_port         = 443
    protocol        = "TCP"
    security_groups = [aws_security_group.rancher-elb.id]
  }

  # K8s kube-api for kubectl
  ingress {
    from_port   = 6443
    to_port     = 6443
    protocol    = "TCP"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # K8s NodePorts
  ingress {
    from_port   = 30000
    to_port     = 32767
    protocol    = "TCP"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Open intra-cluster
  ingress {
    from_port = 0
    to_port   = 0
    protocol  = "-1"
    self      = true
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# cloud-config passed to every instance as user data
data "template_file" "cloud_config" {
  template = file("${path.module}/cloud-config.yaml")
}

resource "aws_instance" "rancher" {
  count                       = var.node_count
  ami                         = data.aws_ami.rancheros.image_id
  instance_type               = var.instance_type
  key_name                    = var.ssh_key
  user_data                   = data.template_file.cloud_config.rendered
  vpc_security_group_ids      = [aws_security_group.rancher.id]
  subnet_id                   = tolist(data.aws_subnet_ids.available.ids)[0]
  associate_public_ip_address = true
  # iam_instance_profile = "k8s-ec2-route53"

  root_block_device {
    volume_type = "gp2"
    volume_size = "50"
  }

  tags = {
    "Name" = "${var.server_name}-${count.index}"
  }
}

resource "aws_elb" "rancher" {
  name = var.server_name
  # The ELB uses the same subnet as the instances.
  subnets         = [tolist(data.aws_subnet_ids.available.ids)[0]]
  security_groups = [aws_security_group.rancher-elb.id]

  listener {
    instance_port     = 80
    instance_protocol = "tcp"
    lb_port           = 80
    lb_protocol       = "tcp"
  }

  listener {
    instance_port     = 443
    instance_protocol = "tcp"
    lb_port           = 443
    lb_protocol       = "tcp"
  }

  health_check {
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = 2
    target              = "tcp:80"
    interval            = 5
  }

  instances    = aws_instance.rancher.*.id
  idle_timeout = 1800

  tags = {
    Name = var.server_name
  }
}

# Alias record that points the subdomain at the ELB
resource "aws_route53_record" "rancher" {
  zone_id = data.aws_route53_zone.dns_zone.zone_id
  name    = "${var.subdomain_name}.${var.domain_name}"
  type    = "A"

  alias {
    name                   = aws_elb.rancher.dns_name
    zone_id                = aws_elb.rancher.zone_id
    evaluate_target_health = true
  }
}

Note: main.tf makes use of “cloud-config.yaml” and “variables.tf”. Also note that you have to change “6058xxxxxxxxx” to the correct AMI owner ID. You can find the owner ID in AWS Management Console -> EC2 -> Images -> AMIs by searching for the OS image you want to install on the instances. I recommend RancherOS, as we will be installing Rancher on these instances.

variables.tf : Define all the variables used in main.tf here.

variable "aws_profile" {
  default = "default"
}

variable "aws_region" {
  default = "eu-west-3"
}

variable "domain_name" {
  default = "exampledomain.com"
}

variable "subdomain_name" {
  default = "v1"
}

variable "instance_type" {
  default = "t3.medium"
}

variable "node_count" {
  default = 3
}

variable "server_name" {
  default = "rancherCI-server"
}

variable "ssh_key" {
  default = "mainkey"
}

Note the following for variables.tf:

  • AWS region. You can set it to your region

  • Use of domain name exampledomain.com

  • Use of subdomain “v1”. This helps manage version updates

  • Instance type is t3.medium. This is the minimum recommended.

  • Server name is defined for easy understanding. You can define as you like.

  • Use of “mainkey” for ssh-key

cloud-config.yaml : Use this file to define anything else that is needed on the instances. In this case, we make sure Docker is ready on each instance.

#cloud-config
rancher:
  resize_device: /dev/nvme0n1
  docker:
    engine: docker-18.09.9-ce

Note that we are using Docker engine version 18.09, which is the latest at the time of this writing. Docker gets installed on the instances.

Using these files, we will create the following:

  • Security Groups

  • 3 Instances

  • ELB listening on ports 80/443 that forwards to the 3 instances

  • Route 53 DNS pointed to ELB

Terraform CLI commands:

To prepare the infrastructure with the Terraform template files, execute the following commands in the folder that contains them:

$ terraform init

After successful completion, a message similar to the one below is displayed.

* provider.aws: version = "~> 2.65"
* provider.template: version = "~> 2.1"
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see any changes that are required for your infrastructure. All Terraform commands should now work.
If you ever set or change modules or backend configuration for Terraform, rerun this command to reinitialize your working directory. If you forget, other commands will detect it and remind you to do so if necessary.

Note that “terraform plan” can be used to preview all changes before actually applying them. Use the command below to start provisioning the infrastructure:

$ terraform apply

After displaying the planned changes, Terraform asks for confirmation before applying them, as shown below:

Plan: 7 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
 Terraform will perform the actions described above.
 Only 'yes' will be accepted to approve.
 Enter a value:

At this point, enter “yes” to apply.

Upon successful completion, a message similar to the following is displayed:

Apply complete! Resources: 7 added, 0 changed, 0 destroyed.
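An alternative to the interactive prompt is to save the plan to a file, review it, and then apply exactly that file. A minimal sketch, assuming it runs in the folder with the template files. The plan file name tfplan and the function name are arbitrary choices:

```shell
# Validate the templates, save the plan, then apply the saved plan.
# Applying a saved plan file does not ask for confirmation, so review
# the output of "terraform plan" before letting the last step run.
plan_then_apply() {
  terraform validate &&
    terraform plan -out=tfplan &&
    terraform apply tfplan
}
```

Running the steps one at a time by hand gives the same result and leaves room to read the plan in between.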

At this point, if you log into your AWS account and open the EC2 service, you will see the three instances that were created.

You can ssh into the instances and check the details.
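The public and private IPs of the instances are needed again later for the RKE configuration. A sketch for listing them and opening an SSH session, assuming the default server_name from variables.tf, the key saved as mainkey.pem, and the rancher user of RancherOS. The function names are illustrative:

```shell
# List public and private IPs of the running nodes.
# Usage: list_node_ips [server-name]   (defaults to rancherCI-server)
list_node_ips() {
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=${1:-rancherCI-server}-*" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].[PublicIpAddress,PrivateIpAddress]' \
    --output text
}

# Open an SSH session on one node.
# Usage: ssh_node PUBLIC_IP [key-file]   (key defaults to ./mainkey.pem)
ssh_node() {
  ssh -i "${2:-mainkey.pem}" "rancher@$1"
}
```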


Rancher Kubernetes Engine is a CNCF-certified Kubernetes distribution that solves the common frustration of installation complexity with Kubernetes. With RKE, the operation of Kubernetes is easily automated and entirely independent of the operating system and platform you’re running.

Before we use RKE, we need to create an IAM role and policies in AWS, as given below.

rke-trust-policy.json : Trust policy that lets EC2 instances assume the role.

{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Principal": {"Service": "ec2.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }
}

Use the command below to create the role with this trust policy:

$ aws iam create-role --role-name rke-role --assume-role-policy-document file://rke-trust-policy.json

rke-access-policy.json : Access policy for the role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:Describe*",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:AttachVolume",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:DetachVolume",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["ec2:*"],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": ["elasticloadbalancing:*"],
      "Resource": ["*"]
    }
  ]
}

Use the commands below to attach the policy to the role, create the instance profile and add the role to it:

$ aws iam put-role-policy --role-name rke-role --policy-name rke-access-policy --policy-document file://rke-access-policy.json
$ aws iam create-instance-profile --instance-profile-name rke-aws
$ aws iam add-role-to-instance-profile --instance-profile-name rke-aws --role-name rke-role
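A role only takes effect on a node once the instance profile is attached to that EC2 instance, and in main.tf the iam_instance_profile line is commented out. The following is a sketch of one way to check the profile and attach it afterwards, assuming the profile name rke-aws from above. The function names are illustrative, and whether you attach the profile this way or through Terraform is your choice:

```shell
# Print the role names contained in the instance profile.
show_rke_profile() {
  aws iam get-instance-profile --instance-profile-name rke-aws \
    --query 'InstanceProfile.Roles[].RoleName' --output text
}

# Attach the profile to one EC2 instance.
# Usage: attach_rke_profile INSTANCE_ID
attach_rke_profile() {
  aws ec2 associate-iam-instance-profile \
    --instance-id "$1" \
    --iam-instance-profile Name=rke-aws
}
```

The instance IDs are shown in the EC2 console next to each instance.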

Download the RKE binary for your platform from the RKE releases page on GitHub.



Prepare YAML file for the infrastructure

Generate the config file with either of the commands below and make the necessary changes:

$ rke config --name cluster.yml
$ rke config --print

cluster.yml : Define the infrastructure details in this file. Ensure that this file is in the same directory as the “rke” executable.

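# NODE_n_PUBLIC_IP and NODE_n_PRIVATE_IP are placeholders for the IPs of your instances.
cloud_provider:
  name: aws
cluster_name: myapplication
nodes:
  - address: NODE_0_PUBLIC_IP
    internal_address: NODE_0_PRIVATE_IP
    user: rancher
    hostname_override: myapplication.0
    ssh_key_path: "path/to/aws/user.pem"
    role:
      - controlplane
      - etcd
      - worker
  - address: NODE_1_PUBLIC_IP
    internal_address: NODE_1_PRIVATE_IP
    user: rancher
    hostname_override: myapplication.1
    ssh_key_path: "path/to/aws/user.pem"
    role:
      - worker
  - address: NODE_2_PUBLIC_IP
    internal_address: NODE_2_PRIVATE_IP
    user: rancher
    hostname_override: myapplication.2
    ssh_key_path: "path/to/aws/user.pem"
    role:
      - worker
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h
# Required for external TLS termination with
# ingress-nginx v0.22+
ingress:
  provider: nginx
  options:
    use-forwarded-headers: "true"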

Before you run RKE, check the following:

  • Make sure the SSH user exists (rancher in this case, but you may choose another as per your requirement)

  • Set ssh_key_path to the key file for that user. This is the .pem file you saved when creating the key pair.

  • Fill in the IPs for address and internal_address. You can collect these from the instance details in AWS

  • Provide a name for each hostname (myapplication.x). You may choose your own names

  • We have used only 1 instance for etcd and controlplane. You can assign these roles to all three nodes to make it a complete HA infrastructure

By default, Kubernetes clusters require certificates and RKE auto-generates the certificates for all cluster components. You can also use custom certificates. After the Kubernetes cluster is deployed, you can manage these auto-generated certificates.

Execute the below command to fire up the Kubernetes preparation:

$ ./rke up
INFO[0000] Building Kubernetes cluster
INFO[0000] [dialer] Setup tunnel for host []
INFO[0000] [network] Deploying port listener containers
INFO[0000] [network] Pulling image [alpine:latest] on host []
INFO[0101] Finished building Kubernetes cluster successfully

The last line should read Finished building Kubernetes cluster successfully to indicate that your cluster is ready to use. As part of the Kubernetes creation process, a kubeconfig file has been created and written at kube_config_cluster.yml, which can be used to start interacting with your Kubernetes cluster.

NOTE: If you have used a different file name from cluster.yml, then the kube config file will be named kube_config_<FILE_NAME>.yml.

Save Your Files

The files mentioned below are needed to maintain, troubleshoot and upgrade your cluster.

Save a copy of the following files in a secure location:

cluster.yml: The RKE cluster configuration file.

kube_config_cluster.yml: The Kubeconfig file for the cluster, this file contains credentials for full access to the cluster.

cluster.rkestate: The Kubernetes Cluster State file, this file contains credentials for full access to the cluster.

The Kubernetes Cluster State file is only created when using RKE v0.2.0 or higher.

NOTE: The “cluster” part of the two latter file names depends on how you name the RKE cluster configuration file. So if you name it mycluster.yml, the config file will be kube_config_mycluster.yml and the state file will be mycluster.rkestate.
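The naming rule above is simple enough to script, which makes it easy to back up all three files together. A minimal sketch, assuming the files sit next to the cluster configuration file. Both function names are illustrative and not part of RKE:

```shell
# Print the kubeconfig and state file names RKE derives from a given
# cluster configuration file.
rke_file_names() {
  local config="$1" dir base
  dir=$(dirname -- "$config")
  base=$(basename -- "$config" .yml)
  printf '%s\n' "$dir/kube_config_${base}.yml" "$dir/${base}.rkestate"
}

# Copy the configuration file and both derived files to a backup directory.
# Usage: backup_rke_files cluster.yml /path/to/backup
backup_rke_files() {
  local config="$1" dest="$2" f
  mkdir -p "$dest" || return 1
  cp -p "$config" "$dest"/ || return 1
  rke_file_names "$config" | while IFS= read -r f; do
    cp -p "$f" "$dest"/ || exit 1
  done
}
```

For example, rke_file_names mycluster.yml prints ./kube_config_mycluster.yml and ./mycluster.rkestate. Since two of the files contain credentials, the backup directory should be as well protected as the originals.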

Kubernetes Cluster State

The Kubernetes cluster state, which consists of the cluster configuration file cluster.yml and components certificates in Kubernetes cluster, is saved by RKE, but depending on your RKE version, the cluster state is saved differently.

As of v0.2.0, RKE creates a .rkestate file in the same directory that has the cluster configuration file cluster.yml. The .rkestate file contains the current state of the cluster including the RKE configuration and the certificates. It is required to keep this file in order to update the cluster or perform any operation on it through RKE.

Prior to v0.2.0, RKE saved the Kubernetes cluster state as a secret. When updating the state, RKE pulls the secret, updates/changes the state and saves a new secret.

Interacting with your Kubernetes cluster

After your cluster is up and running, you can use the generated kubeconfig file to start interacting with your Kubernetes cluster using kubectl.


Install kubectl by following the official Kubernetes documentation.


On Mac, you can use brew to install kubectl with below command:

$ brew install kubectl

Kubectl config: Configure kubectl to connect to the cluster using the generated kube_config_cluster.yml file. This is important as we will need to run some kubectl commands during our infrastructure preparation. Use the following command:

$ export KUBECONFIG=$PWD/kube_config_cluster.yml

Note that using the above command will configure it for that terminal session. If you need to configure across terminal sessions, then have this added in the respective shell profile file.
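A quick way to confirm the configuration works is to list the nodes right after setting the variable. A minimal sketch, assuming kubectl is installed and the kubeconfig is in the current directory unless a path is given. The function name is illustrative:

```shell
# Point kubectl at the RKE-generated kubeconfig and list the cluster nodes.
# Usage: use_cluster_kubeconfig [path-to-kubeconfig]
use_cluster_kubeconfig() {
  local cfg="${1:-$PWD/kube_config_cluster.yml}"
  if [ ! -f "$cfg" ]; then
    echo "kubeconfig not found: $cfg" >&2
    return 1
  fi
  export KUBECONFIG="$cfg"
  kubectl get nodes -o wide
}
```

All three nodes should be listed. If one is missing, the output of rke up is the first place to look.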

That’s it for the fully functional HA grade Kubernetes Cluster setup.

At this point, your Kubernetes cluster is fully functional. You may choose to manage it with kubectl and related commands. But as we would like to use Rancher, the Kubernetes cluster manager, we will proceed with the next steps.


Helm is the package manager for Kubernetes. You can read detailed background information in the CNCF Helm Project Journey report. You can also use Helm in Rancher by connecting to public Helm repos and deploying applications from within Rancher with very little configuration. This helps in quick deployment of standard applications on Kubernetes.

You can find installation instructions in the official Helm documentation.

On Mac, you can install it using Homebrew with the following command:

$ brew install helm

Add Helm chart repository

$ helm repo add rancher-latest https://releases.rancher.com/server-charts/latest

Create a Namespace for Rancher

We’ll need to define a Kubernetes namespace where the resources created by the Chart should be installed. This should always be cattle-system:

$ kubectl create namespace cattle-system

Choose your SSL Configuration

The Rancher management server is designed to be secure by default and requires SSL/TLS configuration.

Note: If you want to terminate SSL/TLS externally, see TLS termination on an External Load Balancer.

There are three recommended options for the source of the certificate used for TLS termination at the Rancher server:

Rancher-generated TLS certificate: In this case, you need to install cert-manager into the cluster. Rancher utilizes cert-manager to issue and maintain its certificates. Rancher will generate a CA certificate of its own, and sign a cert using that CA. cert-manager is then responsible for managing that certificate.

Let’s Encrypt: The Let’s Encrypt option also uses cert-manager. However, in this case, cert-manager is combined with a special Issuer for Let’s Encrypt that performs all actions (including request and validation) necessary for getting a Let’s Encrypt issued cert. This configuration uses HTTP validation (HTTP-01), so the load balancer must have a public DNS record and be accessible from the internet.

Bring your own certificate: This option allows you to bring your own public-CA or private-CA signed certificate. Rancher will use that certificate to secure websocket and HTTPS traffic. In this case, you must upload this certificate (and associated key) as PEM-encoded files with the name tls.crt and tls.key. If you are using a private CA, you must also upload that certificate. This is due to the fact that this private CA may not be trusted by your nodes. Rancher will take that CA certificate, and generate a checksum from it, which the various Rancher components will use to validate their connection to Rancher.

Install cert-manager

This step is only required to use certificates issued by Rancher’s generated CA (ingress.tls.source=rancher) or to request Let’s Encrypt issued certificates (ingress.tls.source=letsEncrypt). Ensure that your cert-manager version is v0.11.0 or later.

It is not required if you use your own certificates.

We are using Helm chart to install cert-manager. Use the following set of commands to install cert-manager on the cluster:

Create the namespace for installing

$ kubectl create namespace cert-manager

Add the Jetstack Helm repository

$ helm repo add jetstack https://charts.jetstack.io

Update your local Helm chart repository cache

$ helm repo update

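Install the cert-manager Helm chart

$ helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v0.15.0

The ingress.tls.source=letsEncrypt option belongs to the Rancher chart and is set when installing Rancher, not here. cert-manager also needs its CustomResourceDefinitions in the cluster; the cert-manager installation documentation describes how to install them for this version.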

Note: Before you go further, ensure that you own the domain and that it points to AWS Route 53 as described at the beginning.
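One way to check this before requesting certificates is to query public DNS directly. A sketch assuming the dig utility is installed, with the domain and subdomain from variables.tf as defaults. The function name is illustrative:

```shell
# Show the name servers of the domain and the addresses of the subdomain
# as seen by public DNS.
# Usage: check_dns [domain] [subdomain]
check_dns() {
  local domain="${1:-exampledomain.com}" host="${2:-v1}"
  echo "NS records for $domain:"
  dig +short NS "$domain"
  echo "A records for $host.$domain:"
  dig +short A "$host.$domain"
}
```

The NS records should be the Route 53 name servers of the hosted zone, and the A query should return addresses of the load balancer.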

This option uses cert-manager to automatically request and renew Let’s Encrypt certificates. This is a free service that provides you with a valid certificate as Let’s Encrypt is a trusted CA.

Check cert-manager with the command below:

$ kubectl get pods --namespace cert-manager

NAME                               READY  STATUS   RESTARTS   AGE
cert-manager-5c6866597-zw7kh       1/1    Running  0          2m
577f6d9fd7-tr77l                   1/1    Running  0          2m
nlzsq                              1/1    Running  0          2m

We are now ready to install Rancher. We will make use of the Let’s Encrypt service for certificates. Use the command below to install Rancher.

$ helm install rancher rancher-latest/rancher --namespace cattle-system --set hostname=v1.exampledomain.com --set ingress.tls.source=letsEncrypt --set letsEncrypt.email=email@mydomain.com

NOTE: Ensure that the subdomains you use are configured properly in DNS. The best approach is to set an A record for each subdomain that points to the load balancer. For v1.exampledomain.com, this is already done by Terraform.

Check the roll out using the below command:

$ kubectl -n cattle-system rollout status deploy/rancher

Waiting for deployment "rancher" rollout to finish: 0 of 3 updated replicas are available...
deployment "rancher" successfully rolled out

If you see the following error: error: deployment "rancher" exceeded its progress deadline, you can check the status of the deployment by running the following command:

$ kubectl -n cattle-system get deploy rancher

NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
rancher   3         3         3            3           3m

It should show the same count for DESIRED and AVAILABLE. With this, your 3-node Kubernetes cluster is ready for use with Rancher.

After successful installation, you can access Rancher at the hostname you configured (v1.exampledomain.com in this example).

At the first access, you will be asked to create the password for the admin user.

After setting the password, you will be shown the login screen to log into the Rancher system.

After a successful login, you can confirm that the 3-node cluster is ready for use.

Clicking Global shows the cluster named “local”. On mouse over, it shows the namespaces in that cluster.

That’s it folks. Happy ranchering your cluster … :)

We would love to hear from you.

#devops #kubernetes #rancher #helm #aws #terraform #linux
