Not Only Docker

<slideshow style="nobleprog" headingmark="⌘" incmark="…" scaled="true" font="Trebuchet MS" >

title
Linux Containers: Not Only Docker
author
Alexander Patrakov

</slideshow>

Virtualization ⌘

  • Run another OS (including another copy of the kernel) in a virtual machine
    • OK to run Windows VM on Linux
  • Good level of security & isolation
    • Separate virtual disks, network stack
    • Hacked VM != hacked host
    • Can run any OS
  • Significant performance overhead from emulating virtual hardware
    • Partly mitigated by paravirtualization
    • Each VM keeps its own memory caches; partly mitigated by ballooning

Containerization ⌘

  • Running another tree of processes under the same kernel
    • They don't see other processes
    • They have their own view of the filesystem
    • They have their own copy of the network stack
    • They look and feel like a virtual machine in some sense
  • Almost no overhead
    • Native performance of e.g. disks
    • Only as much memory occupied as when running directly on the host
  • Somewhat less secure than a virtual machine
    • But still good enough for many use cases
  • Good only for running Linux software

Why people use containers and VMs ⌘

  • Reproducible deployment of services
    • Just copy a VM or container image onto a host, and start it
    • No need to configure the application inside
    • No need to worry about compatibility with the host
  • Multi-tenant setups
    • Isolation and security matter here
  • Docker
    • Many people don't know that alternatives even exist!
  • Everything else server-side
    • LXC/LXD
    • systemd-nspawn
    • runc
    • rkt (formerly Rocket)
  • On the desktop
    • Flatpak
    • Snap

Container building blocks ⌘

  • Namespaces
    • man 7 namespaces
    • UTS, PID, User, Mount, Network, IPC, Cgroup
  • Cgroups
  • Management software
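One way to see these building blocks on a running system is to look under /proc. The sketch below prints the namespace identifiers of a process. The function name list_namespaces is made up for illustration, and the script assumes a Linux /proc is mounted.

```shell
#!/bin/sh
# Hypothetical helper: list the namespaces a process belongs to.
# Each entry under /proc/<pid>/ns is a symlink such as "pid:[4026531836]".
list_namespaces() {
    pid=${1:-$$}
    for ns in /proc/"$pid"/ns/*; do
        printf '%-18s %s\n' "$(basename "$ns")" "$(readlink "$ns")"
    done
}

# Show the namespaces of the current shell.
list_namespaces "$$"
```

Two processes share a namespace when the numbers in brackets match, so running this for two different PIDs is a quick way to compare them.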

Chroot ⌘

  • A mechanism that changes a process's view of the filesystem
    • An existing directory is treated as the root directory for a process
    • Chroots are entered using the chroot() system call
      • Not bulletproof - root can escape
      • No other security - processes in a chroot can send signals and use network as usual
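Since any existing directory can serve as the new root, a very small chroot can be assembled by hand. The sketch below copies /bin/sh and the shared libraries reported by ldd into a temporary directory. populate_root is a hypothetical helper, and the approach assumes that ldd is installed and prints absolute library paths; it is not a substitute for a real distribution tree.

```shell
#!/bin/sh
# Hypothetical helper: copy one binary plus the shared libraries it needs
# into a directory tree that could later be used as a chroot.
populate_root() {
    root=$1
    bin=$2
    mkdir -p "$root$(dirname "$bin")"
    cp -L "$bin" "$root$bin"
    # ldd lists absolute library paths; for a static binary it fails,
    # which is tolerated because no libraries are needed then.
    for lib in $( (ldd "$bin" 2>/dev/null || true) | grep -o '/[^ )]*' | sort -u); do
        [ -f "$lib" ] || continue
        mkdir -p "$root$(dirname "$lib")"
        cp -L "$lib" "$root$lib"
    done
}

ROOT=$(mktemp -d)
populate_root "$ROOT" /bin/sh
find "$ROOT" -type f | sort
# As root, the result could then be entered with: chroot "$ROOT" /bin/sh
```

Only the copying runs unprivileged; the chroot call itself is left as a comment because it needs root.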

Let's create a chroot ⌘

  • Chroot with Debian or Ubuntu:
    • Use debootstrap, it is packaged for many distributions
    • debootstrap jessie /tmp/jessie
  • Chroot with Fedora/CentOS/openSUSE:
    • Use rinse, it is packaged for Debian and Ubuntu
      • On other distributions install from source
      • If you are on CentOS/Fedora and want to create a chroot of the same kind, you can use supermin instead
    • rinse --arch amd64 --directory /tmp/centos --distribution centos-7

Entering a chroot ⌘

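mount -t proc proc /tmp/jessie/proc     # tools like ps need /proc inside the chroot
mount -t sysfs sysfs /tmp/jessie/sys
chroot /tmp/jessie
. /etc/profile   # to set the correct $PATH
ls               # runs in a chroot
exit
umount /tmp/jessie/proc /tmp/jessie/sys   # back on the host: clean up the mounts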

Some thoughts about chroots ⌘

  • Chroot is a privileged operation
    • Can you think why?
  • Chroot can be escaped from by root
    • Let's say "escaped" means "/etc/hacked" created outside of the chroot
    • Can you think of a few ways?

Trying Out Namespaces ⌘

  • C programmers use these system calls:
    • To create a new namespace for the current process: unshare
    • To create a new process in a new namespace: clone
    • To enter an existing namespace: setns
  • Shell utilities:
    • unshare: creates new namespaces, runs a shell (or anything else) there
    • nsenter: enters existing namespaces
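As a small illustration of the unshare utility, the sketch below compares the UTS namespace identifier of the current shell with the one seen by a command started under unshare. It uses --user --map-root-user so that an unprivileged user can try it; whether that is permitted depends on kernel and sandbox settings, so the script tolerates failure. ns_id is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the identifier of one namespace type,
# e.g. "uts:[4026531838]".
ns_id() { readlink "/proc/self/ns/$1"; }

outer=$(ns_id uts)
echo "current UTS namespace: $outer"

# Run readlink inside new user and UTS namespaces.
if inner=$(unshare --user --map-root-user --uts readlink /proc/self/ns/uts 2>/dev/null); then
    echo "inside unshare:        $inner"
    if [ "$inner" != "$outer" ]; then
        echo "a new UTS namespace was created"
    fi
else
    echo "unshare was not permitted here"
fi
```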

Some examples ⌘

  • Bad example: PID namespace created but /proc still refers to the old one
# unshare --fork -p /bin/bash
# echo $$
1
# pstree
... lots of processes ...
# exit
  • Proper way to create PID namespace
    • Combine it with a mount namespace
    • Mount a private copy of /proc
# unshare --fork -p -m --mount-proc /bin/bash
# pstree
bash───pstree
# exit

Exercise ⌘

  • Create PID and mount namespaces for a new bash process
  • Verify by mounting a tmpfs on /mnt that the mount namespace works
  • Enter this namespace with nsenter

Network namespaces ⌘

  • Separate set of interfaces, routing tables, iptables/ip6tables rules
  • How to create:
# ip netns add <namespace_name>
  • How to move an interface (no dedicated undo; to return it, run ip link set <interface> netns 1 inside the namespace):
# ip link set <interface> netns <namespace_name>
  • How to run commands:
# ip netns exec <namespace_name> <command> <arguments...>
  • Clean up:
# ip netns del <namespace_name>
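The commands above can be combined into a small end-to-end sketch that connects a namespace to the host with a veth pair. Everything here is illustrative: the namespace name demo, the interface names and the 10.200.0.0/24 addresses are assumptions, not part of the original material. By default the script only prints the commands; set DRY_RUN=0 and run it as root to execute them.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) prints commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
NS=demo              # made-up namespace name
HOST_IF=veth-host    # made-up interface names
NS_IF=veth-ns

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_netns() {
    run ip netns add "$NS"
    # A veth pair is a virtual cable: one end stays on the host,
    # the other end is moved into the namespace.
    run ip link add "$HOST_IF" type veth peer name "$NS_IF"
    run ip link set "$NS_IF" netns "$NS"
    run ip addr add 10.200.0.1/24 dev "$HOST_IF"
    run ip link set "$HOST_IF" up
    run ip netns exec "$NS" ip addr add 10.200.0.2/24 dev "$NS_IF"
    run ip netns exec "$NS" ip link set "$NS_IF" up
    run ip netns exec "$NS" ip link set lo up
    run ip netns exec "$NS" ping -c 1 10.200.0.1
}

teardown_netns() {
    # Deleting the namespace destroys veth-ns, and its peer goes with it.
    run ip netns del "$NS"
}

setup_netns
teardown_netns
```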

Exercise ⌘

  • Connect to a free OpenVPN server: https://www.vpnkeys.com/get-free-vpn-instantly/
    • Known issue: MTU must be limited
    • Don't allow openvpn to pull routes, as this may interfere with VNC
    • Set the DNS server to 8.8.8.8 on the host
  • Move the tun0 interface to a separate network namespace
  • Set up tun0 as the default route in that namespace
  • Run the other browser there, as your user
    • Verify that you are accessing the internet via the VPN from the namespace
  • Can you create a nested connection to the same VPN server?
  • Try to predict what happens if you stop the VPN

Cgroups ⌘

  • Hierarchical organization of processes in a system
    • Used e.g. by systemd to track when a service terminates
  • Resource limiting via controllers
    • CPU, RAM, block I/O, network bandwidth, ...
  • Controlled via a special filesystem, cgroupfs
  • Documented e.g. in:

Cgroup security model ⌘

  • To move a task to a different cgroup, write access to the target cgroup is required
    • This means: root can escape, i.e. evade resource limits
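To relate this to a live system, the sketch below reports which cgroup the current shell belongs to and whether the current user can write to that cgroup's cgroup.procs file. It assumes the conventional /sys/fs/cgroup mount point, and the write check is only meaningful on a cgroup v2 layout. cgroup_path_from is a hypothetical helper.

```shell
#!/bin/sh
CGROOT=/sys/fs/cgroup   # conventional mount point (assumption)

# Hypothetical helper: extract the cgroup path from a /proc/<pid>/cgroup file.
# cgroup v2 uses a single "0::/path" line; v1 has one line per hierarchy,
# and the first of them is used here.
cgroup_path_from() {
    path=$(sed -n 's/^0:://p' "$1" | head -n 1)
    [ -n "$path" ] || path=$(head -n 1 "$1" | cut -d: -f3-)
    printf '%s\n' "$path"
}

if [ -r "/proc/$$/cgroup" ]; then
    mine=$(cgroup_path_from "/proc/$$/cgroup")
    echo "this shell is in cgroup: $mine"
    # On cgroup v2, moving a task means writing its PID to cgroup.procs.
    if [ -w "$CGROOT$mine/cgroup.procs" ]; then
        echo "cgroup.procs is writable: this user can move tasks here"
    else
        echo "cgroup.procs is not writable (or this is not a v2 layout)"
    fi
fi
```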

Controlling the cgroup hierarchy ⌘

  • Direct way: you can modify files under /sys/fs/cgroup
  • Convenient shell wrappers: cgcreate, cgexec
    • Part of the cgroup-tools package in Ubuntu, libcgroup-tools in CentOS
    • Use lscgroup to see the available cgroups
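The "direct way" might look roughly like the sketch below, which prints (but does not run) the commands for placing the current shell in a memory-limited cgroup. The group name demo and the 100 MiB limit are made-up values, /sys/fs/cgroup is assumed as the mount point, and on cgroup v2 the memory controller must already be enabled for child groups. Running the printed commands requires root.

```shell
#!/bin/sh
CGROOT=/sys/fs/cgroup   # conventional mount point (assumption)

# Report which layout is mounted: cgroup v2 (unified) exposes
# cgroup.controllers at the top of the hierarchy.
cgroup_version() {
    if [ -f "$CGROOT/cgroup.controllers" ]; then
        echo v2
    elif [ -d "$CGROOT/memory" ]; then
        echo v1
    else
        echo unknown
    fi
}

# Print the commands that would confine the current shell to a
# memory-limited cgroup. Arguments: layout, group name, limit in bytes.
plan_memory_limit() {
    version=$1 name=$2 bytes=$3
    case $version in
        v2) dir=$CGROOT/$name;        limit=memory.max ;;
        v1) dir=$CGROOT/memory/$name; limit=memory.limit_in_bytes ;;
        *)  echo "# no known cgroup layout under $CGROOT"; return 1 ;;
    esac
    echo "mkdir $dir"
    echo "echo $bytes > $dir/$limit"
    echo "echo \$\$ > $dir/cgroup.procs"
}

plan_memory_limit "$(cgroup_version)" demo 104857600 || true
```

Printing instead of executing keeps the example safe to run anywhere; the output can be reviewed and then pasted into a root shell.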

Now you understand bocker! ⌘

  • Bocker is a toy container engine
    • ~100 lines of bash code
  • Read the code, play with it
    • bocker pull doesn't work because of a Docker registry API change; use bocker init instead
    • Debootstrap a directory, configure openssh-server there, create an image and a container from it
    • See if you can ssh there
    • Can you escape?