Ask

  • Purpose
    • Education
    • Development & Testing
    • Hosting Production Applications
  • Cloud or OnPrem?
  • Workloads
    • How many?
    • What kind?
      • Web
      • Big Data/Analytics
    • Application Resource Requirements
      • CPU Intensive
      • Memory Intensive
    • Traffic
      • Heavy traffic
      • Burst Traffic

1.1 Purpose

1.2 Education

  • Minikube (see the quick-start sketch below)
  • Single-node cluster with kubeadm, or on GCP/AWS
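A minimal way to bring up a single-node learning cluster with Minikube; the driver is an assumption and depends on what is installed locally:

minikube start --driver=docker   # driver could also be virtualbox, kvm2, etc.
kubectl get nodes                # verify the single node is Ready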

1.3 Development & Testing

  • Multi-node cluster with a single master and multiple workers
  • Set up using the kubeadm tool, or quickly provision on GCP, AWS, or AKS (see the sketch below)
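A rough kubeadm flow for this setup; the pod network CIDR is an example, and the token and hash are placeholders printed by kubeadm init:

# On the master
kubeadm init --pod-network-cidr=10.244.0.0/16

# On each worker, run the join command printed by kubeadm init
kubeadm join <master-ip>:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>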

1.4 Hosting Production Applications

  • Highly available multi-node cluster with multiple master nodes (see the HA kubeadm sketch after this list)
  • kubeadm, GKE on GCP, kops on AWS, or other supported platforms
  • Up to 5,000 nodes
  • Up to 150,000 PODs in the cluster
  • Up to 300,000 total containers
  • Up to 100 PODs per node
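A hedged sketch of bootstrapping an HA control plane with kubeadm; the load balancer address, token, hash, and certificate key are placeholders:

# On the first master, pointing at a load balancer that fronts the API servers
kubeadm init --control-plane-endpoint "lb.example.com:6443" --upload-certs

# On each additional master, run the control-plane join command printed above
kubeadm join lb.example.com:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <key>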


1.5 Cloud or OnPrem?

  • Use kubeadm for on-prem
  • GKE for GCP
  • Kops for AWS
  • Azure Kubernetes Service(AKS) for Azure

Basic

2.1 Storage

  • High-performance SSD-backed storage
  • Network-based storage that supports multiple concurrent connections
  • Persistent shared volumes for shared access across multiple PODs
  • Label nodes with specific disk types
  • Use node selectors to assign applications to nodes with specific disk types (see the sketch after this list)
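A minimal sketch of the label-plus-nodeSelector pattern; the node name, label, and Pod below are examples, not values from this document:

# Label a node that has SSD-backed storage (node name is an example)
kubectl label nodes node01 disktype=ssd

# Pod that should only be scheduled onto nodes carrying that label
apiVersion: v1
kind: Pod
metadata:
  name: fast-storage-app
spec:
  nodeSelector:
    disktype: ssd
  containers:
  - name: app
    image: nginx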

2.2 Nodes

  • Virtual or physical machines
  • Minimum of a 4-node cluster (size based on workload)
  • Master vs worker nodes
  • Linux x86_64 architecture
  • Master nodes can host workloads
  • Best practice is not to host workloads on master nodes (see the taint check below)
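A hedged way to verify and enforce that best practice; the node name is an example, and the taint key is node-role.kubernetes.io/control-plane on recent releases (node-role.kubernetes.io/master on older ones):

# Check whether the master is already tainted (kubeadm applies this by default)
kubectl describe node master01 | grep Taints

# Apply the taint manually if it is missing
kubectl taint nodes master01 node-role.kubernetes.io/control-plane=:NoSchedule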

Deploy Solutions

3.1 Turnkey Solution

  • You provision the VMs
  • You configure the VMs
  • You use scripts to deploy the cluster
  • You maintain the VMs yourself
  • E.g., Kubernetes on AWS using kops (see the sketch after this list)
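A rough kops invocation for the AWS example above; the cluster name, S3 state store, and zone are placeholders:

export KOPS_STATE_STORE=s3://my-kops-state-bucket
kops create cluster --name=k8s.example.com \
  --zones=us-east-1a \
  --state=${KOPS_STATE_STORE} --yes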

3.2 Tools

  • OpenShift is a popular on-prem Kubernetes platform by Red Hat. It is an open-source container application platform built on top of Kubernetes, providing a set of additional tools and a nice GUI to create and manage Kubernetes constructs and to integrate easily with CI/CD pipelines.
  • Cloud Foundry Container Runtime is an open-source project from Cloud Foundry that helps in deploying and managing highly available Kubernetes clusters using their open-source tool called BOSH.
  • If you wish to leverage your existing VMware environment for Kubernetes, then the VMware Cloud PKS solution is one that should be evaluated.
  • Vagrant provides a set of useful scripts to deploy a Kubernetes cluster on different cloud service providers.

For more Kubernetes certified solutions, check out https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

3.3 Hosted Solutions

  • Google Kubernetes Engine (GKE, formerly Google Container Engine) is a very popular Kubernetes-as-a-service offering on Google Cloud Platform.
  • OpenShift Online is an offering from Red Hat where you can gain access to a fully functional Kubernetes cluster online.
  • Azure has Azure Kubernetes Service (AKS).
  • Amazon Elastic Kubernetes Service (EKS, formerly Amazon Elastic Container Service for Kubernetes) is Amazon's hosted Kubernetes offering. Again, these are just some of the available solutions.

3.4 Network Solution

Depending on your environment and network ecosystem, you have a wide variety of networking options to choose from. A list of supported network solutions and their implementation details is available here:

https://kubernetes.io/docs/concepts/cluster-administration/networking/#how-to-implement-the-kubernetes-networking-model

While choosing a network solution, consider its support for Network Policies. We chose Weave as our networking solution because of its simplicity and its support for Network Policies (see the example policy below).
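A minimal NetworkPolicy sketch of the kind a policy-capable CNI plugin such as Weave can enforce; the namespace, labels, and port are examples:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-to-db
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: db
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web
    ports:
    - protocol: TCP
      port: 3306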

Good to read:

https://www.objectif-libre.com/en/blog/2018/07/05/k8s-network-solutions-comparison/

Sample Deployment

4.1 Master Nodes

[Image: multiple master nodes in an HA configuration]

4.2 API Server

[Image: API Server in an HA setup]

4.3 Controller Manager (Scheduler follows the same approach)

[Image: Controller Manager leader election]

kube-controller-manager --leader-elect true \
  --leader-elect-lease-duration 15s \
  --leader-elect-renew-deadline 10s \
  --leader-elect-retry-period 2s \
  [other options]

These are controllers that watch the state of the cluster and take actions. For example, the controller manager consists of controllers like the replication controller, which constantly watches the state of PODs and takes the necessary actions, such as creating a new POD when one fails. If multiple instances of these controllers ran in parallel, they might duplicate actions, resulting in more PODs than actually needed. The same is true of the scheduler. As such, they must not run in parallel.

They run in an active-standby mode. So who decides which of the two is active and which is passive? This is achieved through a leader election process. So how does that work?

Let's look at the controller manager, for instance. When a controller manager process is configured, you may specify the --leader-elect option, which is set to true by default. With this option, when the controller manager process starts, it tries to gain a lease (a lock) on an Endpoint object in Kubernetes named kube-controller-manager. Whichever process updates the endpoint first gains the lease and becomes the active one of the two; the other becomes passive. The active process holds the lock for the lease duration specified by the --leader-elect-lease-duration option, which is 15 seconds by default.

The active process then renews the lease every 10 seconds, which is the default value of the --leader-elect-renew-deadline option. Both processes try to become the leader every two seconds, as set by the --leader-elect-retry-period option.

That way, if the active process fails, for example because the first master crashes, the second process can acquire the lock and become the leader. The scheduler follows a similar approach and has the same command-line options (a way to inspect the current leader is sketched below).
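A hedged way to see which instance currently holds the lock; depending on the Kubernetes version, the leader record lives on a Lease object or as an annotation on an Endpoints object in kube-system:

# Newer clusters record the leader on a Lease object
kubectl -n kube-system get lease kube-controller-manager -o yaml

# Older clusters record it as an annotation on an Endpoints object
kubectl -n kube-system get endpoints kube-controller-manager -o yaml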

4.4 ETCD

[Image: API server connecting to the ETCD servers]

The API server is the only component that talks to the ETCD server, and if you look into the API server configuration options, there is a set of options specifying where the ETCD servers are. So regardless of the topology we use and wherever we configure the ETCD servers, whether on the same server or on a separate server, we ultimately need to make sure that the API server points to the right addresses of the ETCD servers.

Now remember that ETCD is a distributed system, so the API server, or any other component that wishes to talk to it, can reach the ETCD cluster through any of its instances. You can read and write data through any of the available ETCD server instances. This is why we specify a list of etcd servers in the kube-apiserver configuration (see the example flags below).
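A sketch of the relevant kube-apiserver flags; the addresses and certificate paths are examples:

kube-apiserver \
  --etcd-servers=https://10.0.1.10:2379,https://10.0.1.11:2379,https://10.0.1.12:2379 \
  --etcd-cafile=/etc/etcd/ca.pem \
  --etcd-certfile=/etc/etcd/kubernetes.pem \
  --etcd-keyfile=/etc/etcd/kubernetes-key.pem \
  [other options]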

4.4.1 Concept

ETCD is a distributed reliable key value store that is Simple, Secure & Fast

The etcdctl utility has two API versions, v2 and v3. The commands below use the v3 API; a comparison with v2 follows them.

export ETCDCTL_API=3

etcdctl put name john

etcdctl get name

etcdctl get / --prefix --keys-only
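For contrast, a sketch of how the same operations look under the older v2 API versus v3; the key and value are examples:

# v2 API uses set/get
export ETCDCTL_API=2
etcdctl set name john
etcdctl get name

# v3 API uses put/get, as shown above
export ETCDCTL_API=3
etcdctl put name john
etcdctl get name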

4.4.2 Stacked Topology

[Image: stacked ETCD topology, with etcd running on the master nodes]

  • Easier to set up
  • Easier to manage
  • Fewer servers
  • Risk during failures

4.4.3 External ETCD Topology

[Image: external ETCD topology, with etcd on dedicated servers]

  • Less risky
  • Harder to set up
  • More servers (see the kubeadm sketch after this list)
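A hedged kubeadm ClusterConfiguration snippet for the external topology; the API version depends on your kubeadm release, and the endpoints and certificate paths are examples:

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
etcd:
  external:
    endpoints:
    - https://10.0.1.10:2379
    - https://10.0.1.11:2379
    - https://10.0.1.12:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key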

4.4.4 HA Setup

In an HA setup you have, for example, 3 servers, all running ETCD and all maintaining an identical copy of the database. So if you lose one, you still have two copies of your data.

You can write to any instance and read your data from any instance. ETCD ensures that the same consistent copy of the data is available on all instances at the same time.

So how does it do that? With reads it's easy, since the same data is available across all the nodes and you can read it from any of them. But that is not the case with writes.

ETCD does not process writes on each node. Instead, only one of the instances is responsible for processing writes. Internally, the nodes elect a leader among themselves: one node becomes the leader and the other nodes become followers.

  • If a write comes in through the leader node, the leader processes it and makes sure that the other nodes are sent a copy of the data.
  • If a write comes in through any of the follower nodes, they forward it to the leader, and the leader processes it. Again, once the write is processed, the leader ensures that copies of the write are distributed to the other instances in the cluster.

Thus a write is only considered complete if the leader gets consent from the other members of the cluster.

So how do they elect the leader among themselves? And how do they ensure a write is propagated across all instances?

4.4.5 Leader Election

ETCD implements distributed consensus using the RAFT protocol.

Let's see how that works in a three-node cluster. When the cluster is set up, we have 3 nodes that do not yet have a leader elected. The RAFT algorithm uses random timers for initiating requests.

For example, a random timer is kicked off on the three managers. The first one to finish its timer sends out a request to the other nodes asking for permission to be the leader. The other managers, on receiving the request, respond with their votes, and that node assumes the leader role.

Once elected, the leader sends out notifications at regular intervals to the other masters, informing them that it is continuing to assume the role of leader. If the other nodes do not receive a notification from the leader at some point, they initiate a re-election among themselves and a new leader is identified (a command to check the current leader is sketched below).
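A hedged way to check which member is currently the leader; the endpoints and certificate paths are examples:

ETCDCTL_API=3 etcdctl endpoint status --write-out=table \
  --endpoints=https://10.0.1.10:2379,https://10.0.1.11:2379,https://10.0.1.12:2379 \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/kubernetes.pem \
  --key=/etc/etcd/kubernetes-key.pem
# The IS LEADER column in the output shows which member currently holds the leader role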

4.4.6 Quorum

We said that the ETCD cluster is highly available, so even if we lose a node it should still function. But a write is only considered complete if it can be written to the majority of the nodes in the cluster.

So what is the majority?

Well, a more appropriate term to use would be quorum. Quorum is the minimum number of nodes that must be available for the cluster to function properly or make a successful write. For any given number of nodes N, the quorum is N/2 + 1, with the division truncated to a whole number. For example, with 5 nodes the quorum is 5/2 + 1 = 3 (2.5 truncated to 2, plus 1).

Instances | Quorum | Fault Tolerance
1         | 1      | 0
2         | 2      | 0
3         | 2      | 1
4         | 3      | 1
5         | 3      | 2
6         | 4      | 2
7         | 4      | 3

So even if you have 2 instances in the cluster, the quorum is still 2. If one fails, there is no quorum and writes won't be processed. Having two instances is therefore like having one instance: it doesn't offer you any real fault tolerance.

Which is why it is recommended to have a minimum of 3 instances in an ETCD cluster. That way it offers a fault tolerance of at least 1 node.

So from 3 to 7, what number do we choose? As you can see, 3 and 4 have the same fault tolerance of 1, and 5 and 6 have the same fault tolerance of 2. When deciding on the number of master nodes, it is recommended to select an odd number.

Say we have a 6-node cluster, and a disruption causes the network to partition into two groups of 3 nodes each. Since we originally had 6 manager nodes, the quorum for the cluster to stay alive is 4, but neither group has 4 managers to meet the quorum, so the result is a failed cluster.

[Image: a 6-node cluster partitioned into two groups of 3, neither meeting the quorum of 4]

No matter how the network segments, there is a better chance of your cluster staying alive during network segmentation when you have an odd number of nodes.

Install

wget -q --https-only \
  "https://github.com/coreos/etcd/releases/download/v3.3.9/etcd-v3.3.9-linux-amd64.tar.gz"

tar -xvf etcd-v3.3.9-linux-amd64.tar.gz

mv etcd-v3.3.9-linux-amd64/etcd* /usr/local/bin/

mkdir -p /etc/etcd /var/lib/etcd

Copy over the certificate files generated for ETCD.

cp ca.pem kubernetes-key.pem kubernetes.pem /etc/etcd/

Configure the ETCD service.

[Image: etcd.service configuration, with the cluster/peer option highlighted]

The highlighted row is how each etcd service knows that it is part of a cluster and where its peers are (a sketch of such a unit file follows).
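A trimmed sketch of such an etcd.service unit, modeled on the kubernetes-the-hard-way setup; the member names, IPs, and certificate names are examples, peer TLS flags are omitted for brevity, and --initial-cluster is the option that tells each member who its peers are:

[Unit]
Description=etcd

[Service]
ExecStart=/usr/local/bin/etcd \
  --name master-1 \
  --cert-file=/etc/etcd/kubernetes.pem \
  --key-file=/etc/etcd/kubernetes-key.pem \
  --trusted-ca-file=/etc/etcd/ca.pem \
  --initial-advertise-peer-urls https://10.0.1.10:2380 \
  --listen-peer-urls https://10.0.1.10:2380 \
  --listen-client-urls https://10.0.1.10:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://10.0.1.10:2379 \
  --initial-cluster master-1=https://10.0.1.10:2380,master-2=https://10.0.1.11:2380 \
  --initial-cluster-state new \
  --data-dir=/var/lib/etcd
Restart=on-failure

[Install]
WantedBy=multi-user.target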

Kubernetes the Hard Way

https://github.com/mmumshad/kubernetes-the-hard-way
