Learn LXD and LXC the hard way!
Hey folks, I am starting this second collection. It is about LXD, LXC and containers in general, and how you can use them from a sysadmin, DevOps or even a developer perspective.
First of all, LXC is a userspace interface for the Linux kernel containment features. It lets Linux users easily create and manage system or application containers.
Then there is LXD, the modern container manager, which offers a user experience similar to virtual machines. You can even try it online!
I will divide the article into several parts.
I am not going to repeat the usual information about these tools; you can check this website for more information: https://linuxcontainers.org
Today, I will talk briefly about the kernel features that make containers possible and that are needed to manage them.
Containers rely on the following features, which are built into the Linux kernel, to get a contained or isolated area within the host machine. This area closely resembles a virtual machine,
but without the need for a hypervisor! :D
- Control groups aka cgroups
- Namespaces
- Filesystem or rootfs
Let's talk a little bit about each feature.
Control groups aka cgroups
To understand the importance of cgroups, consider a common scenario: a process running on a system requests certain resources, but at that specific time no resources are available.
So the kernel automatically defers the process until resources become available, for example when other processes release them.
In general, this delay is not acceptable: what if the waiting process is urgent, and not executing it leads to a risky situation?
Besides, resource unavailability such as in our scenario can occur when one process consumes all or most of the resources, memory for instance. At that point, the OOM killer can kick in at any time.
To address this, Google presented a new generic method to solve the resource control problem with the cgroups project in 2007. Control groups allow resources to be controlled and
accounted for based on process groups. The mainline Linux kernel first included a cgroups implementation in 2008, and this paved the way for LXC.
These cgroups can of course be configured to have specialized behavior as desired.
If you want to take a look at the cgroups you have on your system:
$ ls -alh /sys/fs/cgroup
drwxr-xr-x 13 root root 340 Oct 8 19:32 .
drwxr-xr-x 8 root root 0 Oct 8 19:32 ..
dr-xr-xr-x 3 root root 0 Oct 8 19:32 blkio
lrwxrwxrwx 1 root root 11 Oct 8 19:32 cpu -> cpu,cpuacct
dr-xr-xr-x 3 root root 0 Oct 8 19:32 cpu,cpuacct
lrwxrwxrwx 1 root root 11 Oct 8 19:32 cpuacct -> cpu,cpuacct
dr-xr-xr-x 3 root root 0 Oct 8 19:32 cpuset
dr-xr-xr-x 6 root root 0 Oct 8 19:32 devices
dr-xr-xr-x 3 root root 0 Oct 8 19:32 freezer
dr-xr-xr-x 3 root root 0 Oct 8 19:32 memory
lrwxrwxrwx 1 root root 16 Oct 8 19:32 net_cls -> net_cls,net_prio
dr-xr-xr-x 3 root root 0 Oct 8 19:32 net_cls,net_prio
lrwxrwxrwx 1 root root 16 Oct 8 19:32 net_prio -> net_cls,net_prio
dr-xr-xr-x 3 root root 0 Oct 8 19:32 perf_event
dr-xr-xr-x 6 root root 0 Oct 8 19:32 pids
dr-xr-xr-x 6 root root 0 Oct 8 19:32 systemd
dr-xr-xr-x 5 root root 0 Oct 8 19:32 unified
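To connect the listing above with an actual process, here is a minimal Python sketch that parses the content of /proc/<pid>/cgroup and reports, for each controller, which cgroup the process belongs to. The function name parse_proc_cgroup and the sample text are my own, made up for illustration; they are not output from a real machine.

```python
def parse_proc_cgroup(text):
    """Map each controller name to the cgroup path of the process.

    Each line of /proc/<pid>/cgroup has the form
    "hierarchy-ID:controller-list:cgroup-path". An entry with an empty
    controller list belongs to the unified (cgroup v2) hierarchy and is
    stored under the key "unified".
    """
    membership = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two colons only; the path is kept intact.
        _hierarchy_id, controllers, path = line.split(":", 2)
        if controllers == "":
            membership["unified"] = path
            continue
        # Co-mounted controllers such as "cpu,cpuacct" share one path.
        for controller in controllers.split(","):
            membership[controller] = path
    return membership


# Illustrative sample (an assumption, not copied from a real host).
sample = (
    "11:memory:/user.slice\n"
    "4:cpu,cpuacct:/user.slice\n"
    "0::/user.slice/session-1.scope\n"
)

for controller, path in sorted(parse_proc_cgroup(sample).items()):
    print(f"{controller:10} {path}")
```

On a Linux host you could feed it the live data instead, for example with open("/proc/self/cgroup").read().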
Let's take a look at an example of the memory subsystem hierarchy of cgroups. It is available in the following location:
/sys/fs/cgroup/memory
The memory subsystem hierarchy consists of the following files:
$ cd /sys/fs/cgroup/memory
$ ls
cgroup.clone_children docker memory.kmem.limit_in_bytes memory.kmem.tcp.limit_in_bytes memory.limit_in_bytes memory.memsw.max_usage_in_bytes memory.oom_control memory.swappiness release_agent
cgroup.event_control memory.failcnt memory.kmem.max_usage_in_bytes memory.kmem.tcp.max_usage_in_bytes memory.max_usage_in_bytes memory.memsw.usage_in_bytes memory.pressure_level memory.usage_in_bytes tasks
cgroup.procs memory.force_empty memory.kmem.slabinfo memory.kmem.tcp.usage_in_bytes memory.memsw.failcnt memory.move_charge_at_immigrate memory.soft_limit_in_bytes memory.use_hierarchy
cgroup.sane_behavior memory.kmem.failcnt memory.kmem.tcp.failcnt memory.kmem.usage_in_bytes memory.memsw.limit_in_bytes memory.numa_stat memory.stat notify_on_release
Each of the files listed contains information on the control group for which it has been created. For example, the maximum memory usage in bytes is given by the
following command (since this is the top-level hierarchy, the value applies to the current host system as a whole):
$ cat memory.max_usage_in_bytes
The value is reported in bytes; on my machine it corresponds to approximately 52GB. Note that memory.max_usage_in_bytes records the peak memory usage seen so far, while the limit itself lives in memory.limit_in_bytes. You can create your own cgroups within /sys/fs/cgroup and control each of the subsystems.
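Creating your own cgroup boils down to making a directory and writing into its control files. Here is a minimal Python sketch of that idea, assuming the cgroup v1 layout from the listing above (memory.limit_in_bytes and cgroup.procs). The helper names create_memory_cgroup and add_process and the group name "demo" are hypothetical. On a real host this needs root, so the example below does a dry run against a temporary directory instead.

```python
import os
import tempfile
from pathlib import Path


def create_memory_cgroup(name, limit_bytes, root="/sys/fs/cgroup/memory"):
    """Create a child cgroup under `root` and set its memory limit."""
    group = Path(root) / name
    # On a real cgroup filesystem the kernel populates the new directory
    # with control files as soon as it is created.
    group.mkdir(exist_ok=True)
    (group / "memory.limit_in_bytes").write_text(str(limit_bytes))
    return group


def add_process(group, pid):
    """Move a process into the cgroup by writing its PID to cgroup.procs."""
    (Path(group) / "cgroup.procs").write_text(str(pid))


# Dry run in a temporary directory (an assumption for demonstration only;
# a plain directory does not enforce anything).
with tempfile.TemporaryDirectory() as fake_root:
    demo = create_memory_cgroup("demo", 256 * 1024 * 1024, root=fake_root)
    add_process(demo, os.getpid())
    print((demo / "memory.limit_in_bytes").read_text())  # 268435456
```

If your system only mounts the unified (cgroup v2) hierarchy, the file names differ (the limit is called memory.max there), so treat this as a sketch rather than a recipe.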
Namespaces
The concept of namespaces is quite interesting: they provide a way to have varying views of the system for different processes. In plain words, they give us the capability to put apps in isolated environments with separate process lists,
network devices, user lists and filesystems. Cool! Right? :D
The coolest part is that this is implemented inside the kernel, without the need to run a hypervisor or any other virtualization layer!
Kernel namespaces are created and manipulated using 3 basic syscalls:
- clone(), which creates a new process and allows the child to share parts of the execution context with its parent. This low-level function sits behind the well-known fork() function.
- unshare(), which allows a process to disassociate parts of its execution context. The main use of this syscall is to modify the execution context of a process without spawning a new child process.
- setns(), which moves the calling process into an existing namespace.
While clone() and unshare() are standard Linux syscalls, setns() is a newer addition dedicated solely to manipulating namespace membership.
The namespace functionality has been introduced into both clone() and unshare() through a set of additional flags that indicate which namespaces are to be created for the child process (or which are to be disassociated from the caller).
At the time of writing, the Linux kernel implements 6 namespaces:
mnt, pid, net, ipc, uts and user. Newer kernels also add cgroup and time namespaces.
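To make the flags a bit more concrete, here is a small Python sketch that maps the six namespace types to their CLONE_NEW* flag values from the kernel header linux/sched.h and combines them into the bitmask that clone() or unshare() expects. The names NAMESPACE_FLAGS, combine_flags and the unshare() wrapper are my own. The wrapper calls the C library through ctypes and usually needs root, so it is only defined here, not executed.

```python
import ctypes
import os

# Flag values as defined in the kernel header <linux/sched.h>.
NAMESPACE_FLAGS = {
    "mnt": 0x00020000,   # CLONE_NEWNS
    "uts": 0x04000000,   # CLONE_NEWUTS
    "ipc": 0x08000000,   # CLONE_NEWIPC
    "user": 0x10000000,  # CLONE_NEWUSER
    "pid": 0x20000000,   # CLONE_NEWPID
    "net": 0x40000000,   # CLONE_NEWNET
}


def combine_flags(names):
    """OR together the flags for the requested namespace types."""
    flags = 0
    for name in names:
        flags |= NAMESPACE_FLAGS[name]
    return flags


def unshare(names):
    """Detach the calling process from the given namespaces.

    Thin wrapper around the unshare() function of the C library.
    It typically requires root privileges.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(combine_flags(names)) != 0:
        errno = ctypes.get_errno()
        raise OSError(errno, os.strerror(errno))


# New UTS and network namespaces in one call would use this bitmask.
print(hex(combine_flags(["uts", "net"])))  # 0x44000000
```

Running unshare(["uts"]) as root, for instance, should let the process change its hostname without affecting the host.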
Namespaces don't have names. Instead, each gets a unique inode number at the time of its creation.
In Linux Kernel 3.8 and higher, you can use the /proc filesystem to check in which namespace a given process resides. Simply go to /proc/<pid>/ns and you will see links for each namespace type.
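Each of those links points to a target of the form type:[inode], so comparing namespaces between two processes is just a matter of comparing numbers. Below is a minimal Python sketch of that; the function names are hypothetical and the inode number in the demo line is only an illustrative value.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "uts:[4026531838]".
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (type, inode number)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid="self"):
    """Return {entry name: inode} for every link in /proc/<pid>/ns."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for entry in sorted(os.listdir(ns_dir)):
        _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = inode
    return result


def same_namespace(pid_a, pid_b, kind):
    """True if both processes live in the same namespace of this type."""
    return namespaces_of(pid_a)[kind] == namespaces_of(pid_b)[kind]


# Illustrative value only, not read from a real system.
print(parse_ns_link("uts:[4026531838]"))  # ('uts', 4026531838)
```

On a Linux box, namespaces_of() without arguments lists the namespaces of the current process, and same_namespace(1, "self", "net") tells you whether you share the network namespace with PID 1 (reading another user's process may need root).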
Filesystem or rootfs
The next component needed for a container is the disk image, which provides the root filesystem (rootfs) for the container. The rootfs consists of a set of files, similar in structure
to the filesystem mounted at root on any GNU/Linux-based machine. The size of rootfs is smaller than a typical OS disk image, since it does not contain the kernel. The container shares the same kernel as the host machine.
A rootfs can further be reduced in size by making it contain just the application and configuring it to share the rootfs of the host machine. Using copy-on-write (COW) techniques,
a single reduced read-only disk image may be shared between multiple containers.
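The copy-on-write idea can be shown with a toy model in Python. This is not how a real storage backend works; it only mimics the behavior with dictionaries: a shared read-only base image, plus one private writable layer per container. The file names and the helper new_container_rootfs are made up for the example.

```python
from collections import ChainMap
from types import MappingProxyType

# Toy read-only base image: path -> file content (hypothetical files).
base_image = MappingProxyType({
    "/etc/hostname": "base",
    "/bin/app": "<binary>",
})


def new_container_rootfs(image):
    """Return a writable view: an empty private layer on top of the image.

    Lookups fall through to the shared image; writes only ever touch
    the private layer, which is the essence of copy-on-write.
    """
    return ChainMap({}, image)


c1 = new_container_rootfs(base_image)
c2 = new_container_rootfs(base_image)

c1["/etc/hostname"] = "c1"  # lands in c1's private layer only

print(c1["/etc/hostname"])          # c1
print(c2["/etc/hostname"])          # base
print(base_image["/etc/hostname"])  # base
print(c1.maps[0])                   # {'/etc/hostname': 'c1'}
```

Both containers still read /bin/app from the same shared image, and only the modified file takes extra space in the first container's layer.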
The next article will be published soon, so please stay tuned.
Please send me your feedback by email or tweet. :)