Why don't we initialize .data and .bss using Rust
Earlier versions of this book initialized the .data and .bss sections using Rust code.
This has proven to have questionable soundness, and the recommended method of
performing the initialization of these sections nowadays relies on assembly.
This chapter discusses the reasons that led various crates, such as cortex-m-rt and riscv-rt, to migrate to initializing these sections in assembly. The soundness of the Rust-based approach has been questioned in a number of discussion threads. We will summarize them in this chapter.
The original code used for global data initialization in Rust in this book is listed as follows:
#![no_std]

use core::panic::PanicInfo;
use core::ptr;

#[unsafe(no_mangle)]
#[allow(static_mut_refs)]
pub unsafe extern "C" fn Reset() -> ! {
    // NEW!
    // Initialize RAM
    unsafe extern "C" {
        static mut _sbss: u8;
        static mut _ebss: u8;

        static mut _sdata: u8;
        static mut _edata: u8;
        static _sidata: u8;
    }

    let count = unsafe { &_ebss as *const u8 as usize - &_sbss as *const u8 as usize };
    unsafe { ptr::write_bytes(&mut _sbss as *mut u8, 0, count) };

    let count = unsafe { &_edata as *const u8 as usize - &_sdata as *const u8 as usize };
    unsafe { ptr::copy_nonoverlapping(&_sidata as *const u8, &mut _sdata as *mut u8, count) };

    // Call user entry point
    unsafe extern "Rust" {
        safe fn main() -> !;
    }

    main()
}
Five extern "C" variables are declared to reference specific memory locations.
Our linker script defines each symbol, so we do not need to worry about their
exact placement.
Pointer provenance
To initialize the .bss section, we take the address of the _sbss u8 variable,
which points to the start of the .bss section. Then we write an arbitrary
amount of data starting at that location. However, _sbss is declared as a
single u8 variable, and the pointer provenance rules only allow us to write
within the allocation the pointer was derived from. As far as Rust is aware,
that allocation is a single byte, yet we keep writing past it until we reach
the location of _ebss.
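The difference can be illustrated on a host machine with an ordinary buffer standing in for the .bss section. This is only a minimal sketch: the function name zero_all and the buffer are hypothetical and not part of the original code.

```rust
use core::ptr;

/// Zeroes a buffer through a pointer derived from the whole slice.
fn zero_all(buf: &mut [u8]) {
    let len = buf.len();
    // `as_mut_ptr` yields a pointer whose provenance covers all `len` bytes.
    let start: *mut u8 = buf.as_mut_ptr();
    // SAFETY: `start` is valid for writes of `len` bytes.
    unsafe { ptr::write_bytes(start, 0, len) };

    // By contrast, a pointer derived from a reference to a single element
    // only carries permission for that element, so the following would be
    // undefined behavior whenever `len > 1`:
    //
    //     let first: *mut u8 = &mut buf[0];
    //     unsafe { ptr::write_bytes(first, 0, len) };
}

fn main() {
    let mut ram = [0xAAu8; 16];
    zero_all(&mut ram);
    assert_eq!(ram, [0u8; 16]);
}
```

The commented-out variant mirrors what the Reset function does with _sbss: the pointer is created from a one-byte object but is then used to write many bytes.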
There is a separate issue: the _ebss variable points one byte past the end of
the .bss section. In specific implementations, accessing this byte might not
even be possible if the .bss section has exhausted the available memory.
Ideally, _ebss should be declared as a zero-sized type (ZST). By extension,
because the .bss section can be empty, _sbss should also be a ZST, since in
that case _sbss would also fall outside of the region reserved for the .bss
section.
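One possible way to express a symbol that occupies no bytes is a type wrapping a zero-length array. The sketch below runs on a host and uses an ordinary static as a stand-in for a linker-defined symbol. The names Marker, END_OF_SECTION and marker_address are hypothetical, and this is not necessarily how any particular runtime crate declares its symbols.

```rust
use core::mem::size_of;
use core::ptr;

/// Hypothetical zero-sized marker type for a linker-provided symbol.
#[repr(C)]
struct Marker([u8; 0]);

// Stand-in for a symbol that a linker script would normally define.
static END_OF_SECTION: Marker = Marker([]);

/// Returns the address of the marker without creating a reference to it.
fn marker_address() -> usize {
    ptr::addr_of!(END_OF_SECTION) as usize
}

fn main() {
    // The marker claims no memory, so it cannot overlap any real variable.
    assert_eq!(size_of::<Marker>(), 0);
    println!("marker address: {:#x}", marker_address());
}
```

Because the type has size zero, taking its address makes no claim that any byte at that location is readable or writable.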
Aliasing
Another potential problem with the code above is aliasing. Consider our linker script.
.bss :
{
_sbss = .;
*(.bss .bss.*);
_ebss = .;
} > RAM
.data : AT(ADDR(.rodata) + SIZEOF(.rodata))
{
_sdata = .;
*(.data .data.*);
_edata = .;
} > RAM
The following situations can occur:
- _sbss might be located at the same address as the first variable in the .bss section, assuming that the section is not empty.
- _ebss will be located at the same address as _sdata, and by extension, it will also be located at the same address as the first variable in the .data section.
- If the .bss section is empty, both _sbss and _ebss will alias each other.
- If the .data section is empty, both _sdata and _edata will alias each other.
Rust does not allow more than one variable to be located at the same address (with ZSTs being a key exception). But even if it did, we are using these variables to write to the whole global memory area, which effectively means mutably aliasing all global data defined in the program.
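The coinciding addresses described above can be reproduced with a toy model of the linker script's location counter. This is only a sketch: the Symbols struct, the layout function and the RAM origin 0x2000_0000 are assumptions made for illustration, and alignment padding is ignored.

```rust
/// Symbol addresses produced by a toy model of the linker script above.
#[derive(Debug, PartialEq)]
struct Symbols {
    sbss: usize,
    ebss: usize,
    sdata: usize,
    edata: usize,
}

/// Places `.bss` and then `.data` back to back, starting at `ram_origin`.
/// Alignment padding is ignored (an assumption of this sketch).
fn layout(ram_origin: usize, bss_len: usize, data_len: usize) -> Symbols {
    let sbss = ram_origin;
    let ebss = sbss + bss_len;
    let sdata = ebss; // `.data` starts where `.bss` ends
    let edata = sdata + data_len;
    Symbols { sbss, ebss, sdata, edata }
}

fn main() {
    // Non-empty sections: `_ebss` and `_sdata` share an address.
    let s = layout(0x2000_0000, 16, 8);
    assert_eq!(s.ebss, s.sdata);

    // Empty `.bss`: `_sbss`, `_ebss` and `_sdata` all coincide.
    let e = layout(0x2000_0000, 0, 8);
    assert_eq!(e.sbss, e.ebss);
    assert_eq!(e.ebss, e.sdata);
}
```

In every case, at least two of the declared u8 variables end up at the same address.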
Abstract machine initialization
Another question is whether it is safe to enter any Rust code before the Rust abstract machine has been fully initialized. Can we rely on Rust not using any of the global memory while it is not yet initialized? The answer to this question is not clear (or does not seem clear to the author of the section at the time of this writing).
More potential provenance issues
A clever reader might have seen how we compute the offset between _ebss and _sbss and thought:
couldn't we instead use the offset_from method of a pointer?
The problem with this approach, however, is that offset_from requires both
pointers to be derived from the same allocation. As we mentioned above, _ebss
and _sbss belong to different allocations, so they do not share the same pointer
provenance. This is true even if they are aliased and happen to fall at the
same address (i.e. when the .bss section is empty).
Running Miri on this Rust Playground Snippet shows the undefined behavior.
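For contrast, here is a minimal host-side sketch of a case where offset_from is sound, next to the plain address subtraction that the Reset function uses. The function names len_by_offset_from and len_by_address are hypothetical.

```rust
/// Length of a slice computed with `offset_from`.
fn len_by_offset_from(buf: &[u8]) -> usize {
    let range = buf.as_ptr_range();
    // SAFETY: both pointers are derived from the same allocation (`buf`),
    // which is exactly what `offset_from` requires.
    unsafe { range.end.offset_from(range.start) as usize }
}

/// Distance between two addresses computed with integer arithmetic.
/// This places no requirement on the provenance of either pointer.
fn len_by_address(start: *const u8, end: *const u8) -> usize {
    end as usize - start as usize
}

fn main() {
    let buf = [0u8; 8];
    assert_eq!(len_by_offset_from(&buf), 8);

    let range = buf.as_ptr_range();
    assert_eq!(len_by_address(range.start, range.end), 8);
}
```

The first function is only acceptable because start and end come from one slice. Pointers to two separately declared statics, such as _sbss and _ebss, do not meet that requirement.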
Ok, but it works, doesn't it?
Yes. The code provided at the beginning of this chapter does produce the right behavior as of Rust 1.89. The problem is that we cannot rely on this behavior being preserved in future releases, or on the optimizer not transforming the code in unexpected ways.
That is why, overall, the recommendation of this book is to not perform this initialization using Rust code.