(mis)Optimization

Reads/writes to registers are quite special. I may even dare to say that they are embodiment of side effects. In the previous example we wrote four different values to the same register. If you didn't know that address was a register, you may have simplified the logic to just write the final value 1 << (11 + 16) into the register.

Actually, LLVM, the compiler's backend / optimizer, does not know we are dealing with a register and will merge the writes thus changing the behavior of our program. Let's check that really quick.

$ cargo run --release (..) Breakpoint 1, registers::__cortex_m_rt_main_trampoline () at src/07-registers/src/main.rs:7 7 #[entry] (gdb) step registers::__cortex_m_rt_main () at src/07-registers/src/main.rs:9 9 aux7::init(); (gdb) next 25 *(GPIOE_BSRR as *mut u32) = 1 << (11 + 16); (gdb) disassemble /m Dump of assembler code for function _ZN9registers18__cortex_m_rt_main17h45b1ef53e18aa8d0E: 8 fn main() -> ! { 0x08000248 <+0>: push {r7, lr} 0x0800024a <+2>: mov r7, sp 9 aux7::init(); 0x0800024c <+4>: bl 0x8000260 <aux7::init> 0x08000250 <+8>: movw r0, #4120 ; 0x1018 0x08000254 <+12>: mov.w r1, #134217728 ; 0x8000000 0x08000258 <+16>: movt r0, #18432 ; 0x4800 10 11 unsafe { 12 // A magic address! 13 const GPIOE_BSRR: u32 = 0x48001018; 14 15 // Turn on the "North" LED (red) 16 *(GPIOE_BSRR as *mut u32) = 1 << 9; 17 18 // Turn on the "East" LED (green) 19 *(GPIOE_BSRR as *mut u32) = 1 << 11; 20 21 // Turn off the "North" LED 22 *(GPIOE_BSRR as *mut u32) = 1 << (9 + 16); 23 24 // Turn off the "East" LED 25 *(GPIOE_BSRR as *mut u32) = 1 << (11 + 16); => 0x0800025c <+20>: str r1, [r0, #0] 0x0800025e <+22>: b.n 0x800025e <registers::__cortex_m_rt_main+22> End of assembler dump.

The state of the LEDs didn't change this time! The str instruction is the one that writes a value to the register. Our debug (unoptimized) program had four of them, one for each write to the register, but the release (optimized) program only has one.

We can check that using objdump and capture the output to out.asm:

# same as cargo objdump -- -d --no-show-raw-insn --print-imm-hex --source target/thumbv7em-none-eabihf/debug/registers cargo objdump --bin registers -- -d --no-show-raw-insn --print-imm-hex --source > debug.txt

Then examine debug.txt looking for main and we see the 4 str instructions:

080001ec <main>: ; #[entry] 80001ec: push {r7, lr} 80001ee: mov r7, sp 80001f0: bl #0x2 80001f4: trap 080001f6 <registers::__cortex_m_rt_main::hc2e3436fa38cd6f2>: ; fn main() -> ! { 80001f6: push {r7, lr} 80001f8: mov r7, sp ; aux7::init(); 80001fa: bl #0x3e 80001fe: b #-0x2 <registers::__cortex_m_rt_main::hc2e3436fa38cd6f2+0xa> ; *(GPIOE_BSRR as *mut u32) = 1 << 9; 8000200: movw r0, #0x2640 8000204: movt r0, #0x800 8000208: ldr r0, [r0] 800020a: movw r1, #0x1018 800020e: movt r1, #0x4800 8000212: str r0, [r1] ; *(GPIOE_BSRR as *mut u32) = 1 << 11; 8000214: movw r0, #0x2648 8000218: movt r0, #0x800 800021c: ldr r0, [r0] 800021e: str r0, [r1] ; *(GPIOE_BSRR as *mut u32) = 1 << (9 + 16); 8000220: movw r0, #0x2650 8000224: movt r0, #0x800 8000228: ldr r0, [r0] 800022a: str r0, [r1] ; *(GPIOE_BSRR as *mut u32) = 1 << (11 + 16); 800022c: movw r0, #0x2638 8000230: movt r0, #0x800 8000234: ldr r0, [r0] 8000236: str r0, [r1] ; loop {} 8000238: b #-0x2 <registers::__cortex_m_rt_main::hc2e3436fa38cd6f2+0x44> 800023a: b #-0x4 <registers::__cortex_m_rt_main::hc2e3436fa38cd6f2+0x44> (..)

How do we prevent LLVM from misoptimizing our program? We use volatile operations instead of plain reads/writes:

#![no_main] #![no_std] use core::ptr; #[allow(unused_imports)] use aux7::entry; #[entry] fn main() -> ! { aux7::init(); unsafe { // A magic address! const GPIOE_BSRR: u32 = 0x48001018; // Turn on the "North" LED (red) ptr::write_volatile(GPIOE_BSRR as *mut u32, 1 << 9); // Turn on the "East" LED (green) ptr::write_volatile(GPIOE_BSRR as *mut u32, 1 << 11); // Turn off the "North" LED ptr::write_volatile(GPIOE_BSRR as *mut u32, 1 << (9 + 16)); // Turn off the "East" LED ptr::write_volatile(GPIOE_BSRR as *mut u32, 1 << (11 + 16)); } loop {} }

Generate release.txt using with --release mode.

cargo objdump --release --bin registers -- -d --no-show-raw-insn --print-imm-hex --source > release.txt

Now find the main routine in release.txt and we see the 4 str instructions.

0800023e <main>: ; #[entry] 800023e: push {r7, lr} 8000240: mov r7, sp 8000242: bl #0x2 8000246: trap 08000248 <registers::__cortex_m_rt_main::h45b1ef53e18aa8d0>: ; fn main() -> ! { 8000248: push {r7, lr} 800024a: mov r7, sp ; aux7::init(); 800024c: bl #0x22 8000250: movw r0, #0x1018 8000254: mov.w r1, #0x200 8000258: movt r0, #0x4800 ; intrinsics::volatile_store(dst, src); 800025c: str r1, [r0] 800025e: mov.w r1, #0x800 8000262: str r1, [r0] 8000264: mov.w r1, #0x2000000 8000268: str r1, [r0] 800026a: mov.w r1, #0x8000000 800026e: str r1, [r0] 8000270: b #-0x4 <registers::__cortex_m_rt_main::h45b1ef53e18aa8d0+0x28> (..)

We see that the four writes (str instructions) are preserved. If you run it using gdb you'll also see that we get the expected behavior.

NB: The last next will endlessly execute loop {}, use Ctrl-c to get back to the (gdb) prompt.

$ cargo run --release (..) Breakpoint 1, registers::__cortex_m_rt_main_trampoline () at src/07-registers/src/main.rs:9 9 #[entry] (gdb) step registers::__cortex_m_rt_main () at src/07-registers/src/main.rs:11 11 aux7::init(); (gdb) next 18 ptr::write_volatile(GPIOE_BSRR as *mut u32, 1 << 9); (gdb) next 21 ptr::write_volatile(GPIOE_BSRR as *mut u32, 1 << 11); (gdb) next 24 ptr::write_volatile(GPIOE_BSRR as *mut u32, 1 << (9 + 16)); (gdb) next 27 ptr::write_volatile(GPIOE_BSRR as *mut u32, 1 << (11 + 16)); (gdb) next ^C Program received signal SIGINT, Interrupt. 0x08000270 in registers::__cortex_m_rt_main () at ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:1124 1124 intrinsics::volatile_store(dst, src); (gdb)