Not Only Docker
<slideshow style="nobleprog" headingmark="⌘" incmark="…" scaled="true" font="Trebuchet MS" >
- title: Linux Containers: Not Only Docker
- author: Alexander Patrakov
</slideshow>
Virtualization ⌘
- Run another OS (including another copy of the kernel) in a virtual machine
- OK to run Windows VM on Linux
- Good level of security & isolation
- Separate virtual disks, network stack
- Hacked VM != hacked host
- Can run any OS
- Significant performance overhead from emulating virtual hardware
- Partially mitigated by paravirtualization (e.g. virtio drivers)
- Each VM uses its own memory for caches; memory ballooning partially mitigates this
Containerization ⌘
- Running another tree of processes under the same kernel
- They don't see other processes
- They have their own view of the filesystem
- They have their own copy of the network stack
- They look and feel like a virtual machine in some sense
- Practically no overhead
- Native performance of e.g. disks
- Only as much memory occupied as when running directly on the host
- Somewhat less secure than a virtual machine
- But still good enough for many use cases
- Can run only Linux software, because containers share the host's Linux kernel
Why people use containers and VMs ⌘
- Reproducible deployment of services
- Just copy a VM or container image onto a host, and start it
- No need to configure the application inside
- No need to worry about compatibility with the host
- Multi-tenant setups
- Isolation and security matter here
Popular container runtimes ⌘
- Docker
- Many people don't know that alternatives even exist!
- Other server-side runtimes
- LXC/LXD
- systemd-nspawn
- runc
- rkt (Rocket)
- On the desktop
- Flatpak
- Snap
Container building blocks ⌘
- Namespaces
- man 7 namespaces
- UTS, PID, User, Mount, Network, IPC, Cgroup
- Cgroups
- Management software
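Both building blocks are visible from userspace: every process exposes its namespaces and its cgroup membership under /proc, and reading your own entries needs no special privileges. A minimal bash sketch (the function name show_namespaces is hypothetical):

```shell
#!/bin/bash
# Print the namespaces and cgroup membership of a process (default: this shell).
# Each entry under /proc/<pid>/ns is a symlink whose target, e.g. "pid:[4026531836]",
# names the namespace type and its inode number; processes that share a
# namespace show the same target.
show_namespaces() {
    local pid=${1:-$$}
    local ns
    for ns in /proc/"$pid"/ns/*; do
        printf '%-20s %s\n' "${ns##*/}" "$(readlink "$ns")"
    done
    echo "cgroup membership:"
    cat /proc/"$pid"/cgroup
}

show_namespaces
```

As root, passing another PID (for example show_namespaces 1) lets you compare a process's namespaces with those of init.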
Chroot ⌘
- A mechanism that lets a process change its view of the filesystem
- An existing directory is treated as the root directory for a process and its children
- Chroots are entered using the chroot() system call
- Not bulletproof - root can escape
- No other security - processes in a chroot can send signals and use the network as usual
Let's create a chroot ⌘
- Chroot with Debian or Ubuntu:
- Use debootstrap, it is packaged for many distributions
- debootstrap jessie /tmp/jessie
- Chroot with Fedora/CentOS/OpenSUSE:
- Use rinse, it is packaged for Debian and Ubuntu
- On other distributions install from source
- If you are on CentOS/Fedora and want to create a chroot of the same kind, you can use supermin instead
- rinse --arch amd64 --directory /tmp/centos --distribution centos-7
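The debootstrap step above can be wrapped in a small helper. This is a sketch under assumptions: it runs as root, debootstrap is installed, and the mirror is passed in explicitly because older suites such as jessie have moved to http://archive.debian.org/debian. The function name make_debian_chroot is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: build a minimal Debian chroot with debootstrap.
# Assumes root privileges and a MIRROR that still carries SUITE.
make_debian_chroot() {
    if [ $# -ne 3 ]; then
        echo "usage: make_debian_chroot SUITE TARGET_DIR MIRROR" >&2
        return 2
    fi
    local suite=$1 target=$2 mirror=$3
    # --variant=minbase installs only the essential packages plus apt
    debootstrap --variant=minbase "$suite" "$target" "$mirror"
}
```

For example: make_debian_chroot jessie /tmp/jessie http://archive.debian.org/debian. With an archived suite, signature verification may fail if the host's keyring lacks that release's key, so check the keyring if debootstrap complains.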
Entering a chroot ⌘
mount -t proc proc /tmp/jessie/proc
mount -t sysfs sysfs /tmp/jessie/sys
chroot /tmp/jessie
. /etc/profile   # to set the correct $PATH
ls               # runs in a chroot
exit
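The commands above leave /proc and /sys mounted inside the chroot after you exit. A sketch of a hypothetical enter_chroot helper that performs the same steps and unmounts afterwards. It assumes it runs as root and that the chroot contains /bin/bash:

```shell
#!/bin/bash
# Hypothetical helper: mount /proc and /sys inside the chroot, start a login
# shell there, and unmount both when the shell exits.
enter_chroot() {
    if [ $# -ne 1 ] || [ ! -d "$1" ]; then
        echo "usage: enter_chroot ROOT_DIR" >&2
        return 2
    fi
    local root=$1 rc
    mount -t proc proc "$root/proc" || return 1
    if ! mount -t sysfs sysfs "$root/sys"; then
        umount "$root/proc"
        return 1
    fi
    # "bash -l" is a login shell, so it reads /etc/profile and gets a sane $PATH
    chroot "$root" /bin/bash -l
    rc=$?
    # Without this cleanup the mounts outlive the session
    umount "$root/sys" "$root/proc"
    return "$rc"
}
```

Usage: enter_chroot /tmp/jessie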
Some thoughts about chroots ⌘
- Chroot is a privileged operation
- Can you think why?
- Chroot can be escaped from by root
- Let's say "escaped" means "/etc/hacked" created outside of the chroot
- Can you think of a few ways?
- The "textbook" answer is to chroot() into a subdirectory without changing into it, then call chdir("..") repeatedly and finally chroot("."), but there are other ways to do it
Trying Out Namespaces ⌘
- C programmers use these system calls:
- To create a new namespace for the current process: unshare
- To create a new process in a new namespace: clone
- To enter an existing namespace: setns
- Shell utilities:
- unshare: creates new namespaces, runs a shell (or anything else) there
- nsenter: enters existing namespaces
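The UTS namespace is the simplest to try with unshare, because it isolates only the hostname. A sketch with a hypothetical uts_demo function. It assumes root (creating the namespace needs CAP_SYS_ADMIN) and the hostname utility:

```shell
#!/bin/bash
# Hypothetical demo: change the hostname inside a new UTS namespace and show
# that the host's hostname is unaffected.
uts_demo() {
    if [ $# -ne 1 ]; then
        echo "usage: uts_demo NEW_HOSTNAME" >&2
        return 2
    fi
    echo "host before: $(hostname)"
    # The inner shell receives the new name as its $1
    unshare --uts /bin/sh -c 'hostname "$1" && echo "inside: $(hostname)"' sh "$1"
    echo "host after:  $(hostname)"
}
```

Usage: uts_demo container1. On hosts that allow unprivileged user namespaces, adding unshare's --user --map-root-user options may let a regular user do the same.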
Some examples ⌘
- Bad example: PID namespace created but /proc still refers to the old one
# unshare --fork -p /bin/bash
# echo $$
1
# pstree
... lots of processes ...
# exit
- Proper way to create PID namespace
- Combine it with a mount namespace
- Mount a private copy of /proc
# unshare --fork -p -m --mount-proc /bin/bash
# pstree
bash───pstree
# exit
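One way to confirm that the new PID namespace really is separate is to compare the pid:[inode] link targets inside and outside it. A sketch with a hypothetical compare_pid_ns function, which must run as root. The namespaced shell should report PID 1 and a different namespace inode:

```shell
#!/bin/bash
# Hypothetical check: print this shell's PID namespace, then the PID namespace
# of a shell started the "proper way" shown above.
compare_pid_ns() {
    echo "host:      pid=$$ ns=$(readlink /proc/$$/ns/pid)"
    # Single quotes: $$ and /proc/self are evaluated by the inner shell
    unshare --fork -p -m --mount-proc /bin/sh -c \
        'echo "namespace: pid=$$ ns=$(readlink /proc/self/ns/pid)"'
}
```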
Exercise ⌘
- Create PID and mount namespaces for a new bash process
- Verify that the mount namespace works: mount a tmpfs on /mnt inside it and check that the host does not see it
- Enter this namespace with nsenter
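One subtlety with nsenter: the unshare process itself stays in the host PID namespace, and only its child (the bash that sees itself as PID 1) is inside the new one. A possible sketch, assuming root and pgrep. The function name enter_ns_of_unshare is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: given the host-side PID of the "unshare" process, find
# its child (PID 1 inside the namespace) and join that child's PID and mount
# namespaces with nsenter.
enter_ns_of_unshare() {
    if [ $# -ne 1 ]; then
        echo "usage: enter_ns_of_unshare UNSHARE_PID" >&2
        return 2
    fi
    local target
    # pgrep -P lists the children of the given parent PID
    target=$(pgrep -P "$1" | head -n 1)
    if [ -z "$target" ]; then
        echo "no child process found for PID $1" >&2
        return 1
    fi
    # -t picks the target process; -p and -m join its PID and mount namespaces
    nsenter -t "$target" -p -m /bin/bash
}
```

For example, find the PID with pgrep -x unshare in a second terminal, call enter_ns_of_unshare with it, and run findmnt /mnt to see the tmpfs.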
Network namespaces ⌘
- Separate set of interfaces, routing tables, iptables/ip6tables rules
- How to create:
# ip netns add <namespace_name>
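- How to move an interface into the namespace (it stays there until moved back, e.g. with ip link set <interface> netns 1 run inside the namespace):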
# ip link set <interface> netns <namespace_name>
- How to run commands:
# ip netns exec <namespace_name> <command> <arguments...>
- Clean up:
# ip netns del <namespace_name>
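The commands above can be combined into a small end-to-end test: create a namespace, connect it to the host with a veth pair, and ping across. This is a sketch. The function name netns_veth_demo, the interface names veth-host/veth-ns and the 10.200.0.0/24 subnet are arbitrary assumptions. Root only:

```shell
#!/bin/bash
# Hypothetical demo: a network namespace linked to the host by a veth pair.
netns_veth_demo() {
    if [ $# -ne 1 ]; then
        echo "usage: netns_veth_demo NAMESPACE_NAME" >&2
        return 2
    fi
    local ns=$1 rc
    ip netns add "$ns" || return 1
    # A veth pair acts like a cable: what goes in one end comes out the other
    ip link add veth-host type veth peer name veth-ns
    ip link set veth-ns netns "$ns"

    ip addr add 10.200.0.1/24 dev veth-host
    ip link set veth-host up
    ip netns exec "$ns" ip addr add 10.200.0.2/24 dev veth-ns
    ip netns exec "$ns" ip link set veth-ns up
    ip netns exec "$ns" ip link set lo up

    ping -c 1 10.200.0.2
    rc=$?
    # Destroying the namespace destroys veth-ns, which also removes its peer veth-host
    ip netns del "$ns"
    return "$rc"
}
```

Usage: netns_veth_demo demo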
Exercise ⌘
- Connect to a free OpenVPN server: https://www.vpnkeys.com/get-free-vpn-instantly/
- Known issue: the MTU must be limited (e.g. via OpenVPN's --tun-mtu or --mssfix options)
- Don't let OpenVPN pull routes from the server (--route-nopull), as they may break your VNC connection
- Set the DNS server to 8.8.8.8 on the host
- Move the tun0 interface to a separate network namespace
- Set up tun0 as the default route in that namespace
- Run the other browser there, as your user
- Verify that you are accessing the internet via the VPN from the namespace
- Can you create a nested connection to the same VPN server?
- Try to predict what happens if you stop the VPN
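Moving tun0 has one trap: an interface moved into another namespace is brought down and loses its addresses. A possible sketch of the "move and route" steps, assuming root. You pass in the VPN address that ip addr show tun0 reported before the move. The function name vpn_into_netns is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: move tun0 into namespace NS, restore its address,
# and make it the default route there.
vpn_into_netns() {
    if [ $# -ne 2 ]; then
        echo "usage: vpn_into_netns NAMESPACE ADDRESS/PREFIX" >&2
        return 2
    fi
    local ns=$1 addr=$2
    ip netns add "$ns" 2>/dev/null   # the namespace may already exist; errors ignored
    ip link set tun0 netns "$ns" || return 1
    ip netns exec "$ns" ip link set lo up
    ip netns exec "$ns" ip addr add "$addr" dev tun0
    ip netns exec "$ns" ip link set tun0 up
    # tun0 is point-to-point, so a device route needs no gateway
    ip netns exec "$ns" ip route add default dev tun0
}
```

ip netns exec bind-mounts files from /etc/netns/<namespace_name>/ over /etc, so a resolv.conf placed there can give the namespace its own DNS server. The browser can then be started with something like ip netns exec vpn sudo -u <your_user> <browser>.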
Cgroups ⌘
- Hierarchical organization of processes in a system
- Used e.g. by systemd to track when a service terminates
- Resource limiting via controllers
- CPU, RAM, block I/O, bandwidth, ...
- Controlled via a special filesystem, cgroupfs
- Documented e.g. in:
- Red Hat Enterprise Linux Resource Management Guide
- man 7 cgroups
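Before using cgroupfs you need to know which layout the host has. A sketch that tells the unified (v2) hierarchy apart from the legacy/hybrid (v1) layout and shows where the current process sits. It assumes GNU stat and the usual mount point /sys/fs/cgroup. The function name cgroup_version is hypothetical:

```shell
#!/bin/bash
# On a unified (v2) host /sys/fs/cgroup is itself a cgroup2 filesystem;
# otherwise this sketch assumes the legacy/hybrid (v1) layout.
cgroup_version() {
    if [ "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)" = "cgroup2fs" ]; then
        echo v2
    else
        echo v1
    fi
}

echo "cgroup hierarchy: $(cgroup_version)"
# One line per hierarchy, "ID:controllers:path"; v2 has a single "0::/path" line
cat /proc/self/cgroup
```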
Cgroup security model ⌘
- To move a task to a different cgroup, write access to the target cgroup is required
- This means root can escape, i.e. evade resource limits, by moving its own processes to another cgroup
Controlling the cgroup hierarchy ⌘
- Direct way: you can modify files under /sys/fs/cgroup
- Convenient shell wrappers: cgcreate, cgexec
- Part of cgroup-tools in Ubuntu and libcgroup-tools in CentOS
- Use lscgroup to see the available cgroups
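The "direct way" amounts to mkdir, a write to a limit file, and a write to cgroup.procs. A sketch, assuming root, cgroup v2 mounted at /sys/fs/cgroup, and the memory controller enabled in the root cgroup's cgroup.subtree_control. The function name run_with_memory_limit is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: run a command under a memory limit by writing to cgroupfs.
run_with_memory_limit() {
    if [ $# -lt 3 ]; then
        echo "usage: run_with_memory_limit CGROUP_NAME LIMIT COMMAND [ARGS...]" >&2
        return 2
    fi
    local cg=/sys/fs/cgroup/$1 limit=$2 rc
    shift 2
    mkdir "$cg" || return 1
    # memory.max takes a byte count (K/M/G suffixes allowed) or "max"
    echo "$limit" > "$cg/memory.max" || { rmdir "$cg"; return 1; }
    # The child shell moves itself into the cgroup and then execs the command,
    # so the command and everything it forks are charged to this cgroup
    sh -c 'echo $$ > "$1/cgroup.procs" && shift && exec "$@"' sh "$cg" "$@"
    rc=$?
    # A cgroup can be removed only once it has no member processes
    rmdir "$cg"
    return "$rc"
}
```

Usage: run_with_memory_limit demo 100M /bin/bash. On a v1 host the cgroup-tools equivalent is cgcreate -g memory:demo, then cgset -r memory.limit_in_bytes=100M demo, then cgexec -g memory:demo /bin/bash.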
Now you understand bocker! ⌘
- Bocker is a toy container engine
- ~100 lines of bash code
- Read the code, play with it
- bocker pull doesn't work due to Docker registry API change, use bocker init instead
- Debootstrap a directory, configure openssh-server there, create an image and a container from it
- See if you can ssh there
- Can you escape?
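For the exercise, the root filesystem can be prepared before bocker init turns it into an image. A sketch under assumptions: root, debootstrap, and an existing SSH public key. The suite and mirror are parameters because older releases move to archive.debian.org. The function name prepare_ssh_rootfs is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: debootstrap a rootfs with openssh-server and authorize
# a public key for root.
prepare_ssh_rootfs() {
    if [ $# -ne 4 ]; then
        echo "usage: prepare_ssh_rootfs SUITE TARGET_DIR MIRROR PUBKEY_FILE" >&2
        return 2
    fi
    local suite=$1 target=$2 mirror=$3 pubkey=$4
    debootstrap --include=openssh-server "$suite" "$target" "$mirror" || return 1
    install -d -m 700 "$target/root/.ssh"
    install -m 600 "$pubkey" "$target/root/.ssh/authorized_keys"
    # sshd started directly (not via its init script) needs its privilege
    # separation directory; create it from inside the chroot so that the
    # /var/run -> /run symlink resolves within the target, not on the host
    chroot "$target" mkdir -p /var/run/sshd
}
```

After that, run bocker init on the directory and start /usr/sbin/sshd -D in a container created from the resulting image.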