Today in bite-sized Linux pieces, I want to talk about how Linux boots.

When a Linux Kernel first boots, there is a bunch of behind-the-scenes work that gets you to the point where you can type in your login credentials. This is the first of a few posts in which I want to cover what happens, from when the Kernel first hands over to user space up to a login prompt.

There are already a number of articles about the Kernel boot process (how it initialises its data structures, starts the idle process, etc.), but I don’t want to cover that. I want to talk about User Land - the things that we mere mortals can control, without having to delve (too much) into the Kernel source.

In this entry, we’ll cover the first step of the process - how the kernel goes into User Space by executing an init process, and then into some systemd processes (because systemd is the main init system these days, for better or for worse).

In the beginning, there was init

Let’s start at the beginning. Not the real beginning - I don’t want to get too much into Kernel internals (I’ll save that for another post), but the beginning of user space - the init process.

The init process, commonly referred to as “PID 1” (Process ID 1), or just “systemd” since most modern Linux distributions use systemd as their init process, is responsible for setting up the rest of userspace. So how does this process get started? Well, there are two main flows.

initramfs

All (read: most; there seems to be a small trend against initramfs images) modern Kernels are booted with a CPIO archive called the “initramfs”. The initramfs is extracted into memory by the kernel, which then attempts to execute /init [1]:

	if (ramdisk_execute_command) {
		ret = run_init_process(ramdisk_execute_command);
		if (!ret)
			return 0;
		pr_err("Failed to execute %s (error %d)\n",
		       ramdisk_execute_command, ret);
	}

Where ramdisk_execute_command is “/init” by default (it can be overridden with the rdinit= kernel parameter).

run_init_process [2] sets the first argument (argv[0]) of our init process to the location of the executable, prints the arguments and environment variables at debug level, and then finally executes our program (with kernel_execve).

static int run_init_process(const char *init_filename)
{
	const char *const *p;

	argv_init[0] = init_filename;
	pr_info("Run %s as init process\n", init_filename);
	pr_debug("  with arguments:\n");
	for (p = argv_init; *p; p++)
		pr_debug("    %s\n", *p);
	pr_debug("  with environment:\n");
	for (p = envp_init; *p; p++)
		pr_debug("    %s\n", *p);
	return kernel_execve(init_filename, argv_init, envp_init);
}

And just like that, we have a PID 1 - our first program!
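We can see the result of that argv_init array from user space. Here's a minimal sketch, assuming a Linux system with /proc mounted: /proc/1/cmdline holds PID 1's arguments separated by NUL bytes, so splitting on NUL gives us back the argument list. The function name parse_cmdline is my own, and the arguments in the comment are just an illustration, not output from a real system.

```python
def parse_cmdline(raw: bytes) -> list[str]:
    """Split the NUL-separated contents of /proc/<pid>/cmdline into arguments."""
    # The file normally ends with a trailing NUL; drop it so we don't
    # produce an empty final argument.
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    if not raw:
        return []
    return [arg.decode("utf-8", "replace") for arg in raw.split(b"\0")]


if __name__ == "__main__":
    try:
        with open("/proc/1/cmdline", "rb") as f:
            # e.g. something like ['/usr/lib/systemd/systemd', '--system', ...]
            print(parse_cmdline(f.read()))
    except OSError as err:
        # Not on Linux, or /proc is restricted for this user.
        print(f"could not read /proc/1/cmdline: {err}")
```

The first element of that list corresponds to argv_init[0], i.e. the path the kernel (or a later re-exec) used to start init.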

The role of /init in our initramfs is very simple: load the real file system. How it does that is very system specific, but most modern Linux distributions will hand this over to systemd.

The “Legacy” flow

For completeness, we can talk about the “legacy” flow that the kernel follows. I say legacy, but there seems to be a growing trend of people skipping the initramfs in order to improve boot times, so this flow might make a resurgence in the future.

If the execution of /init fails, then the kernel falls through to another attempt to load an init process. Right at the end of kernel_init, the kernel attempts to hand over to user space by going through a number of secondary guesses at where an init executable might be [3]:

	if (CONFIG_DEFAULT_INIT[0] != '\0') {
		ret = run_init_process(CONFIG_DEFAULT_INIT);
		if (ret)
			pr_err("Default init %s failed (error %d)\n",
			       CONFIG_DEFAULT_INIT, ret);
		else
			return 0;
	}

	if (!try_to_run_init_process("/sbin/init") ||
	    !try_to_run_init_process("/etc/init") ||
	    !try_to_run_init_process("/bin/init") ||
	    !try_to_run_init_process("/bin/sh"))
		return 0;

Here’s the gist of what that’s doing:

  • If the CONFIG_DEFAULT_INIT Kernel config variable is set, then attempt to execute that as the init process
  • If that is unset, or executing it fails, run through some sensible defaults (/sbin/init, /etc/init, /bin/init) and try to execute each of those in turn
  • If all those fail then attempt to dump the user into a shell (/bin/sh) so they can debug this very broken system
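The fall-through order from the two kernel snippets can be sketched in user space. This is a simplified model rather than what the kernel literally does: choose_init and the can_exec callback are names I've made up, can_exec stands in for "run_init_process succeeded", and it only models the steps quoted in this post.

```python
from typing import Callable, Optional

# The hard-coded guesses from the end of kernel_init, in order.
FALLBACK_INITS = ("/sbin/init", "/etc/init", "/bin/init", "/bin/sh")


def choose_init(
    can_exec: Callable[[str], bool],
    rdinit: str = "/init",
    default_init: str = "",
) -> Optional[str]:
    """Return the first init candidate that can be executed, or None.

    rdinit models ramdisk_execute_command and default_init models
    CONFIG_DEFAULT_INIT (an empty string means "not configured").
    """
    candidates = []
    if rdinit:
        candidates.append(rdinit)
    if default_init:
        candidates.append(default_init)
    candidates.extend(FALLBACK_INITS)

    for path in candidates:
        if can_exec(path):
            return path
    # In the real kernel, running out of candidates ends in a panic.
    return None


if __name__ == "__main__":
    import os

    # Ask the running system which candidate would win if we only
    # checked "exists and is executable".
    print(choose_init(lambda p: os.access(p, os.X_OK)))
```

On a system booted without an initramfs, /init won't exist, so the first hit will usually be /sbin/init.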

As above, most Linux systems these days will have systemd as their init process. On my system, /sbin/init is symlinked:

09/01/2022 14:31:39 AEDT❯ ls -l /sbin/init
lrwxrwxrwx. 1 root root 22 Nov 16 00:21 /sbin/init -> ../lib/systemd/systemd
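If you'd rather check this from a script than from ls, here's a small sketch using only the Python standard library. describe_init is a name I've picked for illustration. It returns the raw symlink target (what ls -l shows after the arrow) and the fully resolved path.

```python
import os
from typing import Optional, Tuple


def describe_init(path: str = "/sbin/init") -> Tuple[Optional[str], str]:
    """Return (raw symlink target or None, fully resolved path) for path."""
    # os.readlink gives the link's literal contents, which may be relative.
    raw_target = os.readlink(path) if os.path.islink(path) else None
    # os.path.realpath follows every symlink along the way.
    return raw_target, os.path.realpath(path)


if __name__ == "__main__":
    raw, resolved = describe_init()
    print(f"/sbin/init -> {raw} (resolves to {resolved})")
```

On a systemd based distribution, I'd expect the resolved path to end in systemd, but that depends entirely on how your distribution lays things out.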

How systemd executes systemd

If we follow the initramfs flow from above, we have an extra step to go from that to a fully fledged system. How this works is, again, system dependent, but let’s go through how systemd in particular works.

systemd operates with “targets”. These targets in and of themselves do nothing, but other services can mark themselves in relation to them. This means that, for example, a service to initialise network devices can hook into network.target to get started when we enter that level. You might know these targets as “run levels”, the concept that systemd has replaced, but they serve the same function.

The difficulty in working out what happens in our initramfs is that systemd’s targets in the initramfs are actually different from those on our main system! That means we’re going to have to look into our initramfs in order to work out what’s going on.

Thankfully, Linux distributions generally store their initramfs images in /boot, so we can just grab one out and extract it:

~/tmp
09/01/2022 17:20:43 AEDT❯ sudo cp /boot/initramfs-5.14.18-200.fc34.x86_64.img .

~/tmp
09/01/2022 17:20:45 AEDT❯ sudo chown colin:colin ./initramfs-5.14.18-200.fc34.x86_64.img

09/01/2022 17:22:09 AEDT❯ cpio -i < initramfs-5.14.18-200.fc34.x86_64.img
410 blocks

~/tmp
09/01/2022 17:22:13 AEDT❯ ls
early_cpio  initramfs-5.14.18-200.fc34.x86_64.img  kernel

But wait!? Where’s our /init? Where’s our whole file system? This is the first interesting part about the initramfs: it’s actually two cpio archives back to back. The first is a plain, uncompressed archive, and the second (the one with our file system in it) is gzipped on my system. We can properly extract it like so:

~/tmp 
09/01/2022 17:27:17 AEDT❯ (cpio -i; zcat | cpio -i) < initramfs-5.14.18-200.fc34.x86_64.img
410 blocks
139482 blocks

~/tmp 
09/01/2022 17:27:23 AEDT❯ ls
bin  dev  early_cpio  etc  init  initramfs-5.14.18-200.fc34.x86_64.img  kernel  lib  lib64  proc  root  run  sbin  shutdown  sys  sysroot  tmp  usr  var

Much better! Now we have something to work with.
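To convince ourselves of that "two archives back to back" layout, here's a hedged sketch that walks the first archive's headers and then peeks at the magic bytes of whatever follows. It assumes the first archive is in cpio's "newc" format (magic 070701) and is padded with zero bytes afterwards. The function names are my own, and I only check for the gzip, xz and zstd magic numbers, so anything else is reported as unknown.

```python
NEWC_MAGIC = b"070701"
HEADER_LEN = 110  # 6 byte magic + 13 fields of 8 hex digits each

COMPRESSION_MAGICS = {
    b"\x1f\x8b": "gzip",
    b"\xfd7zXZ\x00": "xz",
    b"\x28\xb5\x2f\xfd": "zstd",
}


def _align4(n: int) -> int:
    """Round n up to the next multiple of 4 (newc pads names and data)."""
    return (n + 3) & ~3


def end_of_first_archive(data: bytes) -> int:
    """Return the offset where the second archive starts."""
    offset = 0
    while True:
        header = data[offset:offset + HEADER_LEN]
        if len(header) < HEADER_LEN or not header.startswith(NEWC_MAGIC):
            raise ValueError(f"no newc header at offset {offset}")
        filesize = int(header[54:62], 16)
        namesize = int(header[94:102], 16)  # includes the trailing NUL
        name_start = offset + HEADER_LEN
        name = data[name_start:name_start + namesize - 1]
        offset = _align4(name_start + namesize)  # skip name + padding
        offset = _align4(offset + filesize)      # skip file data + padding
        if name == b"TRAILER!!!":
            break
    # Skip the zero padding between the two archives.
    while offset < len(data) and data[offset] == 0:
        offset += 1
    return offset


def identify_second_archive(data: bytes) -> tuple[int, str]:
    """Return (offset, description) of whatever follows the first archive."""
    offset = end_of_first_archive(data)
    rest = data[offset:offset + 8]
    if rest.startswith(NEWC_MAGIC):
        return offset, "uncompressed cpio"
    for magic, name in COMPRESSION_MAGICS.items():
        if rest.startswith(magic):
            return offset, name
    return offset, "unknown"
```

Usage would look something like identify_second_archive(open("initramfs-5.14.18-200.fc34.x86_64.img", "rb").read()), which on an image like mine should report an offset just past the first archive along with the compression type.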

systemd stores its targets in /usr/lib/systemd/system. In particular, the default.target symlinks to the target that will be started by default when systemd starts:

09/01/2022 17:28:04 AEDT❯ ls usr/lib/systemd/system/default.target -l
lrwxrwxrwx. 1 colin colin 13 Jan  9 17:27 usr/lib/systemd/system/default.target -> initrd.target

So initrd.target is where we need to look. Rather than going into all the files and dependencies, we can use the systemd-analyze tool, which (via its dot subcommand) spits out helpful dependency graphs that we can look at. Here’s the one for initrd.target:

The dependency graph for initrd.target

The green arrows there are ordering (“After”) relationships in systemd-analyze’s colour scheme, so we can see that before initrd.target is reached, systemd needs to have started, in particular, initrd-root-fs.target - that sounds like it does what we want! The name implies that it loads the root file system, which will be used to replace the current initramfs. Let’s take a look.
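If you want to go into the files after all, the relationships in the graph come from directives in the [Unit] section of each unit file. Here's a rough sketch of pulling those out. unit_dependencies is a name I've made up, the parser ignores line continuations and drop-in directories, and the example unit text is illustrative rather than copied from a real initrd.target.

```python
DEPENDENCY_KEYS = ("Requires", "Wants", "After", "Before")


def unit_dependencies(unit_text: str) -> dict[str, list[str]]:
    """Collect dependency and ordering directives from a unit's [Unit] section."""
    deps: dict[str, list[str]] = {}
    section = None
    for raw_line in unit_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in DEPENDENCY_KEYS:
            # A directive can appear more than once and can list several
            # space-separated units, so accumulate them all.
            deps.setdefault(key, []).extend(value.split())
    return deps


# Illustrative unit text, not taken from a real system.
EXAMPLE_UNIT = """\
[Unit]
Description=Example target
Wants=initrd-root-fs.target
After=initrd-root-fs.target basic.target

[Install]
WantedBy=multi-user.target
"""

if __name__ == "__main__":
    print(unit_dependencies(EXAMPLE_UNIT))
```

Pointing this at usr/lib/systemd/system/initrd.target in the extracted initramfs shows the same relationships the graph draws, with requirement directives (Wants, Requires) kept separate from ordering directives (After, Before).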

initrd-root-fs.target has a few different services that hook into it, all different methods for loading the file system. For example, on blank systems, we have systemd-repart.service that can repartition the root device, or systemd-volatile-root.service that converts the root device into a volatile (tmpfs) system. My current system uses ostree-prepare-root.service to load the operating system from the ostree system. All of these methods have the same outcome though - one way or another, they mount the root file system to the /sysroot directory.

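Once the root filesystem has been loaded, there’s only one last step before we are into our system (at least at a file system level): initrd-switch-root.service. This does only one thing - it asks systemd (through systemctl switch-root) to change the root to the /sysroot directory. As far as I can tell, systemd tries the pivot_root syscall for this, and falls back to moving the new root over / and chroot-ing into it when pivoting isn’t possible.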

Once that’s done, we’re officially in our main file system - we can start loading the user’s settings! But what lies beyond here is for next time; this is already getting a bit long!

What have we learned?

Over the course of this post, we’ve covered quite a bit! We’ve looked at the two methods to start up the system - the initramfs and the legacy flow - we’ve looked at what exactly is in our initramfs, and we’ve looked at what systemd does in particular to ready the system to boot. In future posts, I want to go further into ttys, how ttys are readied for use, and what a login actually looks like on the backend. Stay tuned!

I'm on Twitter: @sinkingpoint and BlueSky: @colindou.ch. Come yell at me!