Ingress k8s - Let's play around it -- Part I
Hey all, I am going to talk about Ingress in Kubernetes, and for educational purposes we are going to tweak it so that we can understand all of the common behaviors of Ingress.
Let's first give a brief description about Ingress, that existed in order to expose your application(s) which are running as service(s) that are running in k8s pods to outside the cluster.
These applications/services are going to be used by people which obviously they are living outside of your k8s cluster, so we need to define an Ingress resource where it specifies
how we want external clients to connect to our services.
Here is a simple snippet of YAML code to define an Ingress object.
- host: k8s.nzoueidi.com
Here for example the Ingress controller configures itself so it can read the Ingress resource and route the incoming traffic.
It would act as L7 LB. An L7 LB is "Layer 7 Load Balancing", it operates at the high-level application layer, which deals with the actual content of each message.
L7 LB terminates the network traffic and reads the messange whithin. And the most important part is that it makes a load-balancing decision based on the content of the message
that went through the network.. It then makes a new TCP connection to the selected upstream server or in some cases resues an existing one which is applicable to HTTP keepalives.
Anyway, this is not our topic, you can duckduck it and you will have many examples of L7 load-balancing..
Let's get back to Ingress, many people expose their apps using Port Forwarding, this is not good for production matter as it simply forwards the requests you get from port on the client machine
directly to the targeted pod. Also, in the same context we have "HostPort", in some places they call it as Host Network, it is a mechanism where ports on a worker node in our k8s cluster
are directly mapped to ports on pods, which is silly - here you need to have these ports opened for outside communication and as a limitation, here you need to have only one pod per node..
Before deeping too dive into this , we want to know why the hell we want to use Ingress in any similar case. Alright, let's describe the low level of our network stack.
In a general way, connections and requests theorically are either operating through Layer 4 which is TCP or through Layer 7 which is http, rpc, etc.. In the kernel side, Netfilter is the one which is responsible
for routing rules, this is basically in the Layer 3 of OSI which is the Network Layer. If we'r gonna talk about Netfilter, this article will be long and boring. So briefly Netfilter have five principles
hooks that programs can register with. As packets are going through the stack, they will trigger some kernel modules that have registred with these hooks, it depends on the packet itself, if it is
incoming or outgoing, depends also on the packet destination and wether the packet was dropped or rejected. Anyway..a packet when it is coming to our cluster, more specifically to our node's desired interface
is processed by Netfilter, in case it matches the rules established, it is forwarded to the IP of the available and healthy pod.
The Netfilter have some hooks that are invoked in the networking stack of the Linux kernel.
These are the essential hooks that are called by kernel modules whenever it is needed, anyway this is another article that am planning to write.
This is an architecture of the packet flow in Netfilter and General Networking in Linux.
Don't worry as I said I will cover this is another article.
So, to expose your applications we need to implement the same overview above. Thus, our end-users will have to call the cluster IP and port because this is where all the toolchain described starts.
But, the IP we are talking about is only reachable from inside the cluster. The question here how the heck we can forward a public IP endpoint to an IP that is only reachable once the packet
is already on a node?
You can use many "hacks" to get through this:
- Examine the netfilter rules and make a use of iptables's utility, the issue here is that node(s) are ephemeral, same thing for pod(s)
and also the big issue here is that rules you made are operating at Layer 3 (we talked about that above) they don't know if your services are healthy or not.
If the node is somehow unreachable the route will also remain unreachable and stay broken.
- Put everything in the k8s master, well I am sure that you would not do that as long as you are a human being.
- Use a Simple LoadBalancer in front of a distributing client traffic to the nodes in our cluster, for this we need a Public IP address and the addresses of the nodes.
The issue here is that when an end-user wants to connect to our service using other port (e.g 80) we can not afford the request of that packet using that port to the node's interface..
This is because we don't have any kind of listeners like processes on the IP address of the targeted node.
- There are other many quick and dirty ways..
To prevent these issues, Kubernetes solve it by something called NodePort. Here is a snippet for NodePort service in k8s.
- port: 80
A service like this is a ClusterIP service with a nice feature which is: the IP of the node is reachable as well as the cluster IP. How it works exactly?
It asks kube-proxy to allocate a port in a regular range and opens the associated port in every node's interface. Like this we can make sure that any traffic can be routed to any node without any problems.
So, here we used NodePort to expose our service to our end-users on a "fake" port. This is good while our load balancer can reach the real port internally and hide it from the end-user.
But, this is not sufficient as a complete solution for loadbalancing our service(s). This is where Ingress shows up.
Another service of type LoadBalancer, it contains the capababilities of NodePort service and the ability to have a complete Ingress path.
If you are using a cloud provider which has or supports API driven configuration of networking resources, you can try this snippet of YAML code.
- port: 80
You will notice that an external IP has been allocated by typing
kubectl get svc service-loadbalancer
To be fair, the service LoadBalancer have some limitations such as it can not determine HTTPS traffic in the Loadbalancer level.
Back to Ingress, we have 3 main componenets which are the following:
- Ingress Resource: is a k8s resource which defines rules that are used by the Ingress Controller to route only incoming traffic.
- Ingress Controller: an application which captures incoming requests and routes them accordingly by the rules that are already mentioned in the Ingress Resource.
- Default Backend: it is simply a small application that catches all the traffic that are not relevant rules defined by the Ingress Resource and shows up a 404 page.
Our goal here as mentioned in the first part of the article is playing around with Ingress so we can tweak it.
Let's give a good scenario where we can see the importance of why we really should know how Ingress works and more than deep dive into it.
Let's say we have a kubernetes cluster, where we have an application that is being proxied through nginx. For some weird reasons our app started taking too long to respond
Thus, caused connections and requests in general to fall completely into nginx listen backlog that is what made nginx itself to start drop connections including the most important ones that were being made
An nginx listen backlog is a place where all of the pending connections are stored for further usage. It is actually a parameter invoked into the listen() syscall.
The connections that are being made by kubernetes are actually "liveness and readiness probes for containers", kubelet uses it to have the knowledge when exactly to restart a container in a regular pod.
If somehow, the pod fails to respond to the liveness proves we talked about, k8s thinks here that there is absolutely something wrong and it restarts the pod, while our issue really is not about liveness probes..
This also can lead to a crash-loop after several restarts..
This is an image where it gives an overview of the TCP backlog in Linux where it is implemened due to the Listen syscall.
We can go deeper about this to the TCP backlog and how it works under the hoods in Linux, but this could be another separate article :)
Another article will follow on how to tweak code and packages into the Ingress-nginx repository
Last modified on 09/11/2018 5:28 PM