# Direct Memory Access (DMA)
This section covers the core requirements for building a memory safe API around DMA transfers.
The DMA peripheral is used to perform memory transfers in parallel to the work of the processor (the execution of the main program). A DMA transfer is more or less equivalent to spawning a thread (see `thread::spawn`) to do a `memcpy`.

We'll use the fork-join model to illustrate the requirements of a memory safe API.

Consider the following DMA primitives:
```rust
/// A singleton that represents a single DMA channel (channel 1 in this case)
///
/// This singleton has exclusive access to the registers of DMA channel 1
pub struct Dma1Channel1 {
    // ..
}

impl Dma1Channel1 {
    /// Data will be written to this `address`
    ///
    /// `inc` indicates whether the address will be incremented after every byte transfer
    ///
    /// NOTE this performs a volatile write
    pub fn set_destination_address(&mut self, address: usize, inc: bool) {
        // ..
    }

    /// Data will be read from this `address`
    ///
    /// `inc` indicates whether the address will be incremented after every byte transfer
    ///
    /// NOTE this performs a volatile write
    pub fn set_source_address(&mut self, address: usize, inc: bool) {
        // ..
    }

    /// Number of bytes to transfer
    ///
    /// NOTE this performs a volatile write
    pub fn set_transfer_length(&mut self, len: usize) {
        // ..
    }

    /// Starts the DMA transfer
    ///
    /// NOTE this performs a volatile write
    pub fn start(&mut self) {
        // ..
    }

    /// Stops the DMA transfer
    ///
    /// NOTE this performs a volatile write
    pub fn stop(&mut self) {
        // ..
    }

    /// Returns `true` if there's a transfer in progress
    ///
    /// NOTE this performs a volatile read
    pub fn in_progress() -> bool {
        // ..
    }
}
```
Assume that `Dma1Channel1` is statically configured to work with serial port (AKA UART or USART) #1, `Serial1`, in one-shot mode (i.e. not circular mode).

`Serial1` provides the following blocking API:
```rust
/// A singleton that represents serial port #1
pub struct Serial1 {
    // ..
}

impl Serial1 {
    /// Reads out a single byte
    ///
    /// NOTE: blocks if no byte is available to be read
    pub fn read(&mut self) -> Result<u8, Error> {
        // ..
    }

    /// Sends out a single byte
    ///
    /// NOTE: blocks if the output FIFO buffer is full
    pub fn write(&mut self, byte: u8) -> Result<(), Error> {
        // ..
    }
}
```
Let's say we want to extend the `Serial1` API to (a) asynchronously send out a buffer and (b) asynchronously fill a buffer.

We'll start with a memory unsafe API and iterate on it until it's completely memory safe. At each step we'll show how the API can be broken, to make you aware of the issues that need to be addressed when dealing with asynchronous memory operations.
## A first stab
For starters, let's try to use the `Write::write_all` API as a reference. To keep things simple let's ignore all error handling.
```rust
/// A singleton that represents serial port #1
pub struct Serial1 {
    // NOTE: we extend this struct by adding the DMA channel singleton
    dma: Dma1Channel1,
    // ..
}

impl Serial1 {
    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<'a>(mut self, buffer: &'a [u8]) -> Transfer<&'a [u8]> {
        self.dma.set_destination_address(USART1_TX, false);
        self.dma.set_source_address(buffer.as_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        self.dma.start();

        Transfer { buffer }
    }
}

/// A DMA transfer
pub struct Transfer<B> {
    buffer: B,
}

impl<B> Transfer<B> {
    /// Returns `true` if the DMA transfer has finished
    pub fn is_done(&self) -> bool {
        !Dma1Channel1::in_progress()
    }

    /// Blocks until the transfer is done and returns the buffer
    pub fn wait(self) -> B {
        // Busy wait until the transfer is done
        while !self.is_done() {}

        self.buffer
    }
}
```
> **NOTE:** `Transfer` could expose a futures- or generator-based API instead of the API shown above. That's an API design question that has little bearing on the memory safety of the overall API so we won't delve into it in this text.
We can also implement an asynchronous version of `Read::read_exact`.
```rust
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<'a>(&mut self, buffer: &'a mut [u8]) -> Transfer<&'a mut [u8]> {
        self.dma.set_source_address(USART1_RX, false);
        self.dma
            .set_destination_address(buffer.as_mut_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        self.dma.start();

        Transfer { buffer }
    }
}
```
Here's how to use the `write_all` API:
```rust
fn write(serial: Serial1) {
    // fire and forget
    serial.write_all(b"Hello, world!\n");

    // do other stuff
}
```
And here's an example of using the `read_exact` API:
```rust
fn read(mut serial: Serial1) {
    let mut buf = [0; 16];
    let t = serial.read_exact(&mut buf);

    // do other stuff

    t.wait();

    match buf.split(|b| *b == b'\n').next() {
        Some(b"some-command") => { /* do something */ }
        _ => { /* do something else */ }
    }
}
```
## `mem::forget`
`mem::forget` is a safe API. If our API is truly safe then we should be able to use both together without running into undefined behavior. However, that's not the case; consider the following example:
```rust
fn unsound(mut serial: Serial1) {
    start(&mut serial);

    bar();
}

#[inline(never)]
fn start(serial: &mut Serial1) {
    let mut buf = [0; 16];

    // start a DMA transfer and forget the returned `Transfer` value
    mem::forget(serial.read_exact(&mut buf));
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
```
Here we start a DMA transfer, in `start`, to fill an array allocated on the stack and then `mem::forget` the returned `Transfer` value. Then we proceed to return from `start` and execute the function `bar`.

This series of operations results in undefined behavior. The DMA transfer writes to stack memory, but that memory is released when `start` returns and is then reused by `bar` to allocate variables like `x` and `y`. At runtime this could result in variables `x` and `y` changing their values at random times. The DMA transfer could also overwrite the state (e.g. the link register) pushed onto the stack by the prologue of function `bar`.
Note that if we had used `mem::drop` instead of `mem::forget`, it would have been possible to make `Transfer`'s destructor stop the DMA transfer, and then the program would have been safe. But one can not rely on destructors running to enforce memory safety, because `mem::forget` and memory leaks (see `Rc` cycles) are safe in Rust.
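As a brief illustration of that last point, here's a minimal sketch (not part of the DMA API, and using `std` types purely for illustration) of leaking memory without any `unsafe` code: an `Rc` reference cycle keeps both allocations alive forever, so no destructor ever runs.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn leak() {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(a.clone())) });

    // close the cycle: `a` points to `b` and `b` points to `a`
    *a.next.borrow_mut() = Some(b);

    // when `a` goes out of scope both reference counts stay above zero,
    // so neither `Node` is ever dropped -- all in safe code
}
```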
We can fix this particular problem by changing the lifetime of the buffer from `'a` to `'static` in both APIs.
```rust
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact(&mut self, buffer: &'static mut [u8]) -> Transfer<&'static mut [u8]> {
        // .. same as before ..
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all(mut self, buffer: &'static [u8]) -> Transfer<&'static [u8]> {
        // .. same as before ..
    }
}
```
If we try to replicate the previous problem we note that `mem::forget` no longer causes problems.
```rust
#[allow(dead_code)]
fn sound(mut serial: Serial1, buf: &'static mut [u8; 16]) {
    // NOTE `buf` is moved into `foo`
    foo(&mut serial, buf);

    bar();
}

#[inline(never)]
fn foo(serial: &mut Serial1, buf: &'static mut [u8]) {
    // start a DMA transfer and forget the returned `Transfer` value
    mem::forget(serial.read_exact(buf));
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
```
As before, the DMA transfer continues after `mem::forget`-ing the `Transfer` value. This time that's not an issue because `buf` is statically allocated (e.g. a `static mut` variable) and not on the stack.
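For illustration, here's a minimal, hedged sketch of one way to obtain such a `&'static mut [u8]` from a `static mut` variable; the helper name is made up and the function is only sound if it's called at most once.

```rust
static mut BUFFER: [u8; 16] = [0; 16];

/// Returns the statically allocated buffer.
///
/// # Safety
///
/// Must be called at most once; a second call would create a second,
/// aliasing `&'static mut` reference to the same memory.
unsafe fn take_buffer() -> &'static mut [u8] {
    &mut BUFFER
}
```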
## Overlapping use
Our API doesn't prevent the user from using the `Serial` interface while the DMA transfer is in progress. This could make the transfer fail or cause data to be lost.
There are several ways to prevent overlapping use. One way is to have `Transfer` take ownership of `Serial1` and return it back when `wait` is called.
```rust
/// A DMA transfer
pub struct Transfer<B> {
    buffer: B,
    // NOTE: added
    serial: Serial1,
}

impl<B> Transfer<B> {
    /// Blocks until the transfer is done and returns the buffer
    // NOTE: the return value has changed
    pub fn wait(self) -> (B, Serial1) {
        // Busy wait until the transfer is done
        while !self.is_done() {}

        (self.buffer, self.serial)
    }

    // ..
}

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    // NOTE we now take `self` by value
    pub fn read_exact(mut self, buffer: &'static mut [u8]) -> Transfer<&'static mut [u8]> {
        // .. same as before ..

        Transfer {
            buffer,
            // NOTE: added
            serial: self,
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    // NOTE we now take `self` by value
    pub fn write_all(mut self, buffer: &'static [u8]) -> Transfer<&'static [u8]> {
        // .. same as before ..

        Transfer {
            buffer,
            // NOTE: added
            serial: self,
        }
    }
}
```
The move semantics statically prevent access to `Serial1` while the transfer is in progress.
```rust
fn read(serial: Serial1, buf: &'static mut [u8; 16]) {
    let t = serial.read_exact(buf);

    // let byte = serial.read(); //~ ERROR: `serial` has been moved

    // .. do stuff ..

    let (serial, buf) = t.wait();

    // .. do more stuff ..
}
```
There are other ways to prevent overlapping use. For example, a (`Cell`) flag that indicates whether a DMA transfer is in progress could be added to `Serial1`. When the flag is set, `read`, `write`, `read_exact` and `write_all` would all return an error (e.g. `Error::InUse`) at runtime. The flag would be set when `write_all` / `read_exact` is used and cleared in `Transfer.wait`.
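Here's a rough sketch of that runtime-checked alternative, reusing the `Serial1`, `Dma1Channel1` and `Error` types from earlier; the `in_use` field and the `Error::InUse` variant are illustrative names, not part of the API developed in this text.

```rust
use core::cell::Cell;

pub struct Serial1 {
    dma: Dma1Channel1,
    // NOTE: added -- `true` while a DMA transfer is in progress
    in_use: Cell<bool>,
    // ..
}

impl Serial1 {
    /// Reads out a single byte
    pub fn read(&mut self) -> Result<u8, Error> {
        if self.in_use.get() {
            // a DMA transfer currently owns the peripheral
            return Err(Error::InUse);
        }

        // .. same blocking read as before ..
    }

    /// Receives data into the given `buffer` until it's filled
    pub fn read_exact(
        &mut self,
        buffer: &'static mut [u8],
    ) -> Result<Transfer<&'static mut [u8]>, Error> {
        if self.in_use.replace(true) {
            return Err(Error::InUse);
        }

        // .. configure and start the DMA transfer as before ..
    }
}

// `Transfer::wait` would clear the flag, e.g. `serial.in_use.set(false)`,
// before handing the buffer back to the caller.
```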
## Compiler (mis)optimizations
The compiler is free to re-order and merge non-volatile memory operations to better optimize a program. With our current API, this freedom can lead to undefined behavior. Consider the following example:
```rust
fn reorder(serial: Serial1, buf: &'static mut [u8]) {
    // zero the buffer (for no particular reason)
    buf.iter_mut().for_each(|byte| *byte = 0);

    let t = serial.read_exact(buf);

    // ... do other stuff ..

    let (buf, serial) = t.wait();

    buf.reverse();

    // .. do stuff with `buf` ..
}
```
Here the compiler is free to move `buf.reverse()` before `t.wait()`, which would result in a data race: both the processor and the DMA would end up modifying `buf` at the same time. Similarly, the compiler can move the zeroing operation to after `read_exact`, which would also result in a data race.
To prevent these problematic reorderings we can use a `compiler_fence`:
```rust
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact(mut self, buffer: &'static mut [u8]) -> Transfer<&'static mut [u8]> {
        self.dma.set_source_address(USART1_RX, false);
        self.dma
            .set_destination_address(buffer.as_mut_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        // NOTE: added
        atomic::compiler_fence(Ordering::Release);

        // NOTE: this is a volatile *write*
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all(mut self, buffer: &'static [u8]) -> Transfer<&'static [u8]> {
        self.dma.set_destination_address(USART1_TX, false);
        self.dma.set_source_address(buffer.as_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        // NOTE: added
        atomic::compiler_fence(Ordering::Release);

        // NOTE: this is a volatile *write*
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }
}

impl<B> Transfer<B> {
    /// Blocks until the transfer is done and returns the buffer
    pub fn wait(self) -> (B, Serial1) {
        // NOTE: this is a volatile *read*
        while !self.is_done() {}

        // NOTE: added
        atomic::compiler_fence(Ordering::Acquire);

        (self.buffer, self.serial)
    }

    // ..
}
```
We use `Ordering::Release` in `read_exact` and `write_all` to prevent all preceding memory operations from being moved after `self.dma.start()`, which performs a volatile write.

Likewise, we use `Ordering::Acquire` in `Transfer.wait` to prevent all subsequent memory operations from being moved before `self.is_done()`, which performs a volatile read.
To better visualize the effect of the fences here's a slightly tweaked version of the example from the previous section. We have added the fences and their orderings in the comments.
```rust
fn reorder(serial: Serial1, buf: &'static mut [u8], x: &mut u32) {
    // zero the buffer (for no particular reason)
    buf.iter_mut().for_each(|byte| *byte = 0);

    *x += 1;

    let t = serial.read_exact(buf); // compiler_fence(Ordering::Release) ▲

    // NOTE: the processor can't access `buf` between the fences
    // ... do other stuff ..
    *x += 2;

    let (buf, serial) = t.wait(); // compiler_fence(Ordering::Acquire) ▼

    *x += 3;

    buf.reverse();

    // .. do stuff with `buf` ..
}
```
The zeroing operation can not be moved after `read_exact` due to the `Release` fence. Similarly, the `reverse` operation can not be moved before `wait` due to the `Acquire` fence. The memory operations between both fences can be freely reordered across the fences, but none of those operations involves `buf`, so such reorderings do not result in undefined behavior.
Note that `compiler_fence` is a bit stronger than what's required. For example, the fences will prevent the operations on `x` from being merged even though we know that `buf` doesn't overlap with `x` (due to Rust's aliasing rules). However, there exists no intrinsic that's more fine grained than `compiler_fence`.
## Don't we need a memory barrier?
That depends on the target architecture. In the case of Cortex-M0 to M4F cores, AN321 says:
> 3.2 Typical usages
>
> (..)
>
> The use of DMB is rarely needed in Cortex-M processors because they do not reorder memory transactions. However, it is needed if the software is to be reused on other ARM processors, especially multi-master systems. For example:
>
> - DMA controller configuration. A barrier is required between a CPU memory access and a DMA operation.
>
> (..)
>
> 4.18 Multi-master systems
>
> (..)
>
> Omitting the DMB or DSB instruction in the examples in Figure 41 on page 47 and Figure 42 would not cause any error because the Cortex-M processors:
>
> - do not re-order memory transfers
> - do not permit two write transfers to be overlapped.
Where Figure 41 shows a DMB (memory barrier) instruction being used before starting a DMA transaction.
In the case of Cortex-M7 cores you'll need memory barriers (DMB/DSB) if you are using the data cache (DCache), unless you manually invalidate the buffer used by the DMA. Even with the data cache disabled, memory barriers might still be required to avoid reordering in the store buffer.
If your target is a multi-core system then it's very likely that you'll need memory barriers.
If you do need the memory barrier then you need to use `atomic::fence` instead of `compiler_fence`. That should generate a DMB instruction on Cortex-M devices.
## Generic buffer
Our API is more restrictive than it needs to be. For example, the following program won't be accepted even though it's valid.
```rust
fn reuse(serial: Serial1, msg: &'static mut [u8]) {
    // send a message
    let t1 = serial.write_all(msg);

    // ..

    let (msg, serial) = t1.wait(); // `msg` is now `&'static [u8]`

    msg.reverse();

    // now send it in reverse
    let t2 = serial.write_all(msg);

    // ..

    let (buf, serial) = t2.wait();

    // ..
}
```
To accept such a program we can make the buffer argument generic.
```rust
// as-slice = "0.1.0"
use as_slice::{AsMutSlice, AsSlice};

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: B) -> Transfer<B>
    where
        B: AsMutSlice<Element = u8>,
    {
        // NOTE: added
        let slice = buffer.as_mut_slice();
        let (ptr, len) = (slice.as_mut_ptr(), slice.len());

        self.dma.set_source_address(USART1_RX, false);

        // NOTE: tweaked
        self.dma.set_destination_address(ptr as usize, true);
        self.dma.set_transfer_length(len);

        atomic::compiler_fence(Ordering::Release);
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: B) -> Transfer<B>
    where
        B: AsSlice<Element = u8>,
    {
        // NOTE: added
        let slice = buffer.as_slice();
        let (ptr, len) = (slice.as_ptr(), slice.len());

        self.dma.set_destination_address(USART1_TX, false);

        // NOTE: tweaked
        self.dma.set_source_address(ptr as usize, true);
        self.dma.set_transfer_length(len);

        atomic::compiler_fence(Ordering::Release);
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }
}
```
> **NOTE:** `AsRef<[u8]>` (`AsMut<[u8]>`) could have been used instead of `AsSlice<Element = u8>` (`AsMutSlice<Element = u8>`).
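A hedged sketch of that alternative, keeping everything else the same and only swapping the trait bounds and the conversion calls:

```rust
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: B) -> Transfer<B>
    where
        B: AsMut<[u8]>,
    {
        let slice = buffer.as_mut();
        let (ptr, len) = (slice.as_mut_ptr(), slice.len());

        // .. same as before ..
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: B) -> Transfer<B>
    where
        B: AsRef<[u8]>,
    {
        let slice = buffer.as_ref();
        let (ptr, len) = (slice.as_ptr(), slice.len());

        // .. same as before ..
    }
}
```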
Now the `reuse` program will be accepted.
## Immovable buffers
With this modification the API will also accept arrays by value (e.g. `[u8; 16]`). However, using arrays can result in pointer invalidation. Consider the following program.
```rust
fn invalidate(serial: Serial1) {
    let t = start(serial);

    bar();

    let (buf, serial) = t.wait();
}

#[inline(never)]
fn start(serial: Serial1) -> Transfer<[u8; 16]> {
    // array allocated in this frame
    let buffer = [0; 16];

    serial.read_exact(buffer)
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
```
The `read_exact` operation will use the address of the `buffer` local to the `start` function. That local `buffer` will be freed when `start` returns and the pointer used in `read_exact` will become invalidated. You'll end up with a situation similar to the `unsound` example.
To avoid this problem we require that the buffer used with our API retains its memory location even when it's moved. The `Pin` newtype provides such a guarantee. We can update our API to require that all buffers are "pinned" first.
> **NOTE:** To compile all the programs below this point you'll need Rust `>=1.33.0`. As of time of writing (2019-01-04) that means using the nightly channel.
```rust
/// A DMA transfer
pub struct Transfer<B> {
    // NOTE: changed
    buffer: Pin<B>,
    serial: Serial1,
}

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: bounds changed
        B: DerefMut,
        B::Target: AsMutSlice<Element = u8> + Unpin,
    {
        // .. same as before ..
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: bounds changed
        B: Deref,
        B::Target: AsSlice<Element = u8>,
    {
        // .. same as before ..
    }
}
```
> **NOTE:** We could have used the `StableDeref` trait instead of the `Pin` newtype but opted for `Pin` since it's provided in the standard library.
With this new API we can use `&'static mut` references, `Box`-ed slices, `Rc`-ed slices, etc.
```rust
fn static_mut(serial: Serial1, buf: &'static mut [u8]) {
    let buf = Pin::new(buf);

    let t = serial.read_exact(buf);

    // ..

    let (buf, serial) = t.wait();

    // ..
}

fn boxed(serial: Serial1, buf: Box<[u8]>) {
    let buf = Pin::new(buf);

    let t = serial.read_exact(buf);

    // ..

    let (buf, serial) = t.wait();

    // ..
}
```
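The `Rc`-ed slices mentioned above only give shared access, so they satisfy the bounds of `write_all` but not those of `read_exact`. A hedged sketch, assuming an allocator is available:

```rust
fn reference_counted(serial: Serial1, buf: Rc<[u8]>) {
    let buf = Pin::new(buf);

    // `Rc<[u8]>: Deref<Target = [u8]>`, so only sending is possible
    let t = serial.write_all(buf);

    // ..

    let (buf, serial) = t.wait();

    // ..
}
```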
## `'static` bound
Does pinning let us safely use stack allocated arrays? The answer is no. Consider the following example.
```rust
fn unsound(serial: Serial1) {
    start(serial);

    bar();
}

// pin-utils = "0.1.0-alpha.4"
use pin_utils::pin_mut;

#[inline(never)]
fn start(serial: Serial1) {
    let buffer = [0; 16];

    // pin the `buffer` to this stack frame
    // `buffer` now has type `Pin<&mut [u8; 16]>`
    pin_mut!(buffer);

    mem::forget(serial.read_exact(buffer));
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
```
As seen many times before, the above program runs into undefined behavior due to stack frame corruption.
The API is unsound for buffers of type `Pin<&'a mut [u8]>` where `'a` is not `'static`. To prevent the problem we have to add a `'static` bound in some places.
```rust
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: added 'static bound
        B: DerefMut + 'static,
        B::Target: AsMutSlice<Element = u8> + Unpin,
    {
        // .. same as before ..
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: added 'static bound
        B: Deref + 'static,
        B::Target: AsSlice<Element = u8>,
    {
        // .. same as before ..
    }
}
```
Now the problematic program will be rejected.
## Destructors
Now that the API accepts `Box`-es and other types that have destructors, we need to decide what to do when `Transfer` is early-dropped.

Normally, `Transfer` values are consumed using the `wait` method, but it's also possible to, implicitly or explicitly, `drop` the value before the transfer is over. For example, dropping a `Transfer<Box<[u8]>>` value will cause the buffer to be deallocated. This can result in undefined behavior if the transfer is still in progress, as the DMA would end up writing to deallocated memory.
In such a scenario one option is to make `Transfer.drop` stop the DMA transfer. The other option is to make `Transfer.drop` wait for the transfer to finish. We'll pick the former option as it's cheaper.
```rust
/// A DMA transfer
pub struct Transfer<B> {
    // NOTE: always `Some` variant
    inner: Option<Inner<B>>,
}

// NOTE: previously named `Transfer<B>`
struct Inner<B> {
    buffer: Pin<B>,
    serial: Serial1,
}

impl<B> Transfer<B> {
    /// Blocks until the transfer is done and returns the buffer
    pub fn wait(mut self) -> (Pin<B>, Serial1) {
        while !self.is_done() {}

        atomic::compiler_fence(Ordering::Acquire);

        let inner = self
            .inner
            .take()
            .unwrap_or_else(|| unsafe { hint::unreachable_unchecked() });

        (inner.buffer, inner.serial)
    }
}

impl<B> Drop for Transfer<B> {
    fn drop(&mut self) {
        if let Some(inner) = self.inner.as_mut() {
            // NOTE: this is a volatile write
            inner.serial.dma.stop();

            // we need a read here to make the Acquire fence effective
            // we do *not* need this if `dma.stop` does a RMW operation
            unsafe {
                ptr::read_volatile(&0);
            }

            // we need a fence here for the same reason we need one in `Transfer.wait`
            atomic::compiler_fence(Ordering::Acquire);
        }
    }
}

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: Pin<B>) -> Transfer<B>
    where
        B: DerefMut + 'static,
        B::Target: AsMutSlice<Element = u8> + Unpin,
    {
        // .. same as before ..

        Transfer {
            inner: Some(Inner {
                buffer,
                serial: self,
            }),
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: Pin<B>) -> Transfer<B>
    where
        B: Deref + 'static,
        B::Target: AsSlice<Element = u8>,
    {
        // .. same as before ..

        Transfer {
            inner: Some(Inner {
                buffer,
                serial: self,
            }),
        }
    }
}
```
Now the DMA transfer will be stopped before the buffer is deallocated.
```rust
fn reuse(serial: Serial1) {
    let buf = Pin::new(Box::new([0; 16]));

    let t = serial.read_exact(buf); // compiler_fence(Ordering::Release) ▲

    // ..

    // this stops the DMA transfer and frees memory
    mem::drop(t); // compiler_fence(Ordering::Acquire) ▼

    // this likely reuses the previous memory allocation
    let mut buf = Box::new([0; 16]);

    // .. do stuff with `buf` ..
}
```
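For comparison, here's a minimal sketch of the other option mentioned earlier: a destructor that waits for the transfer to finish instead of stopping it. It would replace the `Drop` implementation shown above.

```rust
impl<B> Drop for Transfer<B> {
    fn drop(&mut self) {
        if self.inner.is_some() {
            // busy wait until the transfer is done instead of stopping it
            // NOTE: `is_done` performs a volatile read
            while !self.is_done() {}

            // same reasoning as the fence in `Transfer.wait`
            atomic::compiler_fence(Ordering::Acquire);
        }
    }
}
```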
## Summary
To sum it up, we need to consider all the following points to achieve memory safe DMA transfers:
- Use immovable buffers plus indirection: `Pin<B>`. Alternatively, you can use the `StableDeref` trait.
- The ownership of the buffer must be passed to the DMA: `B: 'static`.
- Do not rely on destructors running for memory safety. Consider what happens if `mem::forget` is used with your API.
- Do add a custom destructor that stops the DMA transfer, or waits for it to finish. Consider what happens if `mem::drop` is used with your API.
This text leaves out several details required to build a production grade DMA abstraction, like configuring the DMA channels (e.g. streams, circular vs one-shot mode, etc.), alignment of buffers, error handling, how to make the abstraction device-agnostic, etc. All those aspects are left as an exercise for the reader / community (`:P`).