Accessing physical memory from userspace on Linux

Published on January 10, 2023

Why access physical memory?

The ability to read from and write to arbitrary physical addresses is a powerful tool. It gives us insight into what the operating system is doing under the hood, it lets us interface from userspace with devices that use memory-mapped I/O, such as PCI devices, and it is generally useful for (hardware) reverse engineering. We are therefore going to look at how to access arbitrary physical memory on Linux using the /dev/mem interface. Of course, with great power comes great responsibility: full access to /dev/mem poses a huge security risk, so you should not be doing this on a production system.

Furthermore, since we are trying to access physical memory, we also need to know where the memory of our process is located in physical memory. So we will first be looking at how to map a given virtual address to its physical counterpart.

Looking up physical addresses

Before we can use the interface to access physical memory, we first need a way to look up the physical address for any given virtual address. Fortunately, since version 2.6.25 the Linux kernel provides such an interface through the /proc/<pid>/pagemap file. This file contains one 64-bit entry per virtual page. Each entry describes attributes such as whether the page is present or swapped out and, if the page is present, its physical page frame number (PFN), as described in the documentation.

Thus, to obtain the physical address for a given virtual address, we first open this file. We then divide the virtual address by the page size of the platform, which is typically but not necessarily 4 KiB, rounding down to get the virtual page number, and multiply that number by 8, the size of one entry in bytes, to get the file offset to seek to. Finally, we read the 64-bit entry corresponding to that virtual address.
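
As a quick sanity check of this arithmetic, the offset computation can be sketched on its own. The function name pagemap_offset is hypothetical and not part of any library, and the page size is passed in explicitly rather than queried from the system:

```rust
/// Size in bytes of one entry in /proc/<pid>/pagemap.
const PAGEMAP_ENTRY_SIZE: u64 = 8;

/// Returns the byte offset into the pagemap file of the entry that
/// describes the page containing `virtual_address`.
/// (Hypothetical helper for illustration.)
fn pagemap_offset(virtual_address: u64, page_size: u64) -> u64 {
    // Divide first to get the virtual page number, then scale by the
    // entry size, so the result is always a multiple of 8.
    (virtual_address / page_size) * PAGEMAP_ENTRY_SIZE
}
```

With 4 KiB pages, the address 0x7f0000001234 lies in virtual page 0x7f0000001, so its entry starts at file offset 0x3f80000008.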

Let's start with an abstraction for this entry. Since a page can be swapped out, present or neither, we model the entry as an enum:

use bitflags::bitflags;
use simple_bits::BitsExt as _;

#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
pub struct SwapType(pub u8);

const SWAPPED: u64 = 1 << 62;
const PRESENT: u64 = 1 << 63;

bitflags! {
    pub struct PageEntryFlags: u64 {
        const SOFT_DIRTY = 1 << 55;
        const EXCLUSIVE = 1 << 56;
        const SHARED = 1 << 61;
    }
}

#[derive(Clone, Debug)]
pub enum PageEntry {
    Unmapped,
    Present {
        pfn: u64,
        flags: PageEntryFlags,
    },
    Swapped {
        ty: SwapType,
        offset: u64,
        flags: PageEntryFlags,
    },
}

Now given the 64-bit entry, we need to map the 64-bit value to the appropriate enum variant and extract the various fields. To achieve that, we implement the From<u64> trait for our enum:

impl From<u64> for PageEntry {
    fn from(value: u64) -> Self {
        let flags = PageEntryFlags::from_bits_truncate(value);

        if value & SWAPPED == SWAPPED {
            Self::Swapped {
                // Bits 0-4 hold the swap type, bits 5-54 the swap offset.
                ty: SwapType(value.extract_bits(0..5)),
                offset: value.extract_bits(5..55),
                flags,
            }
        } else if value & PRESENT == PRESENT {
            Self::Present {
                pfn: value.extract_bits(0..55),
                flags,
            }
        } else {
            Self::Unmapped
        }
    }
}
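
To make the bit layout concrete, here is a dependency-free sketch that decodes a raw entry using only shifts and masks. The helper names and the sample values in the note below are made up for illustration; the bit positions are the ones given in the kernel's pagemap documentation.

```rust
/// Bit 63: the page is present in RAM.
const PRESENT_BIT: u64 = 1 << 63;
/// Bit 62: the page is swapped out.
const SWAPPED_BIT: u64 = 1 << 62;
/// Bits 0-54: the page frame number when the page is present.
const LOW_55_BITS: u64 = (1 << 55) - 1;

/// Returns the page frame number if the entry describes a present page.
/// (Hypothetical helper for illustration.)
fn present_pfn(raw: u64) -> Option<u64> {
    if raw & SWAPPED_BIT == 0 && raw & PRESENT_BIT != 0 {
        Some(raw & LOW_55_BITS)
    } else {
        None
    }
}

/// Splits a swapped entry into (swap type, swap offset):
/// bits 0-4 hold the type and bits 5-54 the offset.
/// (Hypothetical helper for illustration.)
fn swap_fields(raw: u64) -> Option<(u8, u64)> {
    if raw & SWAPPED_BIT != 0 {
        Some(((raw & 0x1f) as u8, (raw & LOW_55_BITS) >> 5))
    } else {
        None
    }
}
```

For instance, the raw value 0x8100000000123456 has bits 63 (present) and 56 (exclusive) set, and its low 55 bits give the frame number 0x123456.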

In addition, since we get the physical page frame number, we implement a helper function to query the page size and use it to calculate the actual physical address:

use nix::unistd::{sysconf, SysconfVar::PAGE_SIZE};

impl PageEntry {
    fn physical_address(&self) -> Option<u64> {
        match self {
            Self::Present { pfn, .. } => {
                let page_size = sysconf(PAGE_SIZE)
                    .unwrap_or(None)
                    .unwrap_or(4096) as u64;

                Some(pfn * page_size)
            }
            _ => None,
        }
    }
}
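
Note that the frame number alone only yields the address of the start of the page. That is all we need for a page-aligned mapping, but for an arbitrary virtual address the offset within the page has to be carried over as well. A minimal sketch, where translate is a hypothetical free function that takes the page size as a parameter:

```rust
/// Combines a page frame number with the in-page offset of
/// `virtual_address` to form the full physical address.
/// (Hypothetical helper for illustration.)
fn translate(pfn: u64, virtual_address: u64, page_size: u64) -> u64 {
    // Pages are the unit of translation, so the offset within the page
    // is the same for the virtual and the physical address.
    pfn * page_size + virtual_address % page_size
}
```

With 4 KiB pages, frame 0x123456 and virtual address 0x7f0000001234 translate to physical address 0x123456234.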

Finally, instead of opening the pagemap interface ourselves and using the pread system call, we can abstract the pagemap interface with functions to open it for the current process or a given process ID as well as a function to read the entry for a given virtual address as follows:

use libc::off_t;
use nix::sys::uio::pread;
use std::fs::File;
use std::os::unix::io::AsRawFd;

#[derive(Debug)]
pub struct PageMap(File);

impl PageMap {
    fn with_self() -> Result<Self, std::io::Error> {
        Ok(Self(File::open("/proc/self/pagemap")?))
    }

    fn with_pid(pid: u32) -> Result<Self, std::io::Error> {
        Ok(Self(File::open(format!("/proc/{pid}/pagemap"))?))
    }

    fn read_entry(&self, address: off_t) -> Result<PageEntry, std::io::Error> {
        let page_size = sysconf(PAGE_SIZE)
            .unwrap_or(None)
            .unwrap_or(4096) as off_t;

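        // Each entry is 8 bytes: divide first to get the virtual page
        // number, then scale by the entry size.
        let mut bytes = [0u8; 8];
        pread(self.0.as_raw_fd(), &mut bytes, (address / page_size) * 8)?;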

        Ok(PageEntry::from(u64::from_ne_bytes(bytes)))
    }
}

To showcase our Rust implementation, we use the mmap-rs crate to map in a page and use the pagemap interface to look up its physical address.

use mmap_rs::{MmapFlags, MmapOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut mapping = MmapOptions::new(MmapOptions::page_size().0)
        .with_flags(MmapFlags::COPY_ON_WRITE)
        .map_mut()?;

    mapping.fill(0xff);

    let pagemap = PageMap::with_self()?;

    let entry = pagemap.read_entry(mapping.as_ptr() as _)?;
    println!("entry: {:x?}", entry.physical_address());

    Ok(())
}

If we just try to run the example using cargo r, we see the following output:

entry: Some(0)

This is because we need the CAP_SYS_ADMIN capability to actually observe the physical addresses: since Linux 4.2, the kernel zeroes the PFN field for callers without this capability, as information about physical addresses is useful for Rowhammer attacks. We thus need to build the above example and run it as the root user to see the physical address:

cargo b
sudo ./target/debug/pagemap
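
Before trusting a result of zero, it can be handy to check whether the process actually holds the capability. One way, sketched below, is to parse the CapEff line of /proc/self/status, a hexadecimal bitmask in which CAP_SYS_ADMIN is bit 21. The function name is hypothetical, and the status text is passed in as a string so that the parsing can be tried without touching /proc:

```rust
/// Bit index of CAP_SYS_ADMIN in the capability bitmasks.
const CAP_SYS_ADMIN: u32 = 21;

/// Parses the contents of /proc/<pid>/status and reports whether the
/// effective capability set contains CAP_SYS_ADMIN. Returns `None` if
/// no well-formed `CapEff:` line is found.
/// (Hypothetical helper for illustration.)
fn has_cap_sys_admin(status: &str) -> Option<bool> {
    let line = status.lines().find(|line| line.starts_with("CapEff:"))?;
    let hex = line["CapEff:".len()..].trim();
    let mask = u64::from_str_radix(hex, 16).ok()?;

    Some(mask & (1 << CAP_SYS_ADMIN) != 0)
}
```

In a real program the input would come from something like std::fs::read_to_string("/proc/self/status").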

Accessing physical memory

To access physical memory, we can use the /dev/mem interface provided by the Linux kernel. To read from or write to physical memory, we simply use the physical address as the file offset to seek to and then perform a read/write. More specifically, we can again use the pread and pwrite system calls to read/write at a given offset without having to seek. In fact, we can provide a similar abstraction as we did for the pagemap interface in Rust:

use libc::off_t;
use nix::sys::uio::{pread, pwrite};
use std::fs::{File, OpenOptions};
use std::os::unix::io::AsRawFd;

#[derive(Debug)]
pub struct PhysicalMemory(File);

impl PhysicalMemory {
    fn open() -> Result<Self, std::io::Error> {
        let file = OpenOptions::new()
            .read(true)
            .write(true)
            .create(false)
            .open("/dev/mem")?;

        Ok(Self(file))
    }

    fn read(&self, bytes: &mut [u8], address: u64) -> Result<usize, std::io::Error> {
        let n = pread(self.0.as_raw_fd(), bytes, address as _)?;
        Ok(n)
    }

    fn write(&self, bytes: &[u8], address: u64) -> Result<usize, std::io::Error> {
        let n = pwrite(self.0.as_raw_fd(), bytes, address as _)?;
        Ok(n)
    }
}

We can then extend our example from before such that we first look up the physical address of our mapped page through the pagemap interface, and then use that address to read from physical memory. This should hopefully give us the same bytes we wrote to our page. The example now looks like this:

use mmap_rs::{MmapFlags, MmapOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut mapping = MmapOptions::new(MmapOptions::page_size().0)
        .with_flags(MmapFlags::COPY_ON_WRITE)
        .map_mut()?;

    mapping.fill(0xff);
    mapping[0..4].copy_from_slice(b"test");

    let pagemap = PageMap::with_self()?;

    let entry = pagemap.read_entry(mapping.as_ptr() as _)?;

    if let Some(physical_address) = entry.physical_address() {
        let physical_memory = PhysicalMemory::open()?;
        let mut bytes = [0u8; 4];
        let _ = physical_memory.read(&mut bytes, physical_address)?;

        println!("{:x?}", bytes);
    }

    Ok(())
}

We then run the above example, and... we get a permission denied error:

Error: Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" }

Let's have a look at the Linux kernel source code to see what is actually going on. When we try to read from /dev/mem, we end up in the read_mem() function in drivers/char/mem.c, which checks at lines 152-154 whether we are allowed to access the page(s) we are trying to read:

        allowed = page_is_allowed(p >> PAGE_SHIFT);
        if (!allowed)
            goto failed;

There are two possible definitions for the page_is_allowed() function depending on whether the Linux kernel has been compiled with CONFIG_STRICT_DEVMEM or not. If not, then this function always returns 1 and as a result we can access any physical address. Otherwise, this function ends up calling the devmem_is_allowed() function, which is defined differently depending on the platform (for x86, the implementation lives in arch/x86/mm/init.c).
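
To find out which of the two definitions applies to a running kernel, one can look for the option in the kernel build configuration. Many distributions ship it as /boot/config-<release>, and some kernels expose it as /proc/config.gz; both locations are assumptions about the system at hand. The sketch below only does the parsing, uses a hypothetical function name, and takes the configuration text as a string:

```rust
/// Reports whether `option` (e.g. "CONFIG_STRICT_DEVMEM") is set to `y`
/// in the given kernel configuration text. Disabled options appear as
/// comments ("# CONFIG_FOO is not set") and are therefore not matched.
/// (Hypothetical helper for illustration.)
fn config_enabled(config: &str, option: &str) -> bool {
    config.lines().any(|line| {
        line.trim()
            .strip_prefix(option)
            .map_or(false, |rest| rest == "=y")
    })
}
```

Requiring the remainder of the line to be exactly "=y" ensures that CONFIG_STRICT_DEVMEM does not accidentally match a longer option name that merely starts with the same text.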

On x86 the first megabyte of our physical memory is always accessible, as that traditionally contains the BIOS code and data, which some applications on Linux need access to for emulation purposes. However, since the first megabyte may contain actual free physical memory that can be used by the Linux kernel and/or userspace, accessing any such region simply results in zero filling the buffer for reads. In addition, not all physical memory is backed by DRAM, but may instead be backed by memory-mapped I/O resources. Thus, the /dev/mem interface also provides access to memory-mapped I/O regions, ACPI regions and BIOS regions that live above the first megabyte depending on whether iomem=relaxed or iomem=strict is passed as a kernel command line argument.
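
The following is a deliberately simplified model of the x86 policy just described, not the kernel's implementation. The enum, the function name and the two boolean inputs are hypothetical; in the kernel, the corresponding facts are looked up in its resource tree, and the iomem= argument influences which I/O regions count as exclusive. The model also assumes 4 KiB pages, so that the first megabyte spans 256 page frames.

```rust
/// Possible outcomes of a /dev/mem access in this simplified model.
#[derive(Debug, PartialEq)]
enum Access {
    /// The access goes through to the physical page.
    Allowed,
    /// Reads succeed but return zeroes instead of the page contents.
    ZeroFilled,
    /// The access fails with a permission error.
    Denied,
}

/// Simplified model of the strict /dev/mem policy on x86.
/// (Hypothetical helper for illustration.)
fn devmem_policy(pfn: u64, is_system_ram: bool, is_exclusive_io: bool) -> Access {
    // 256 frames of 4 KiB cover the first megabyte.
    let in_first_megabyte = pfn < 256;

    if is_system_ram {
        // RAM is off limits; in the first megabyte reads see zeroes.
        return if in_first_megabyte {
            Access::ZeroFilled
        } else {
            Access::Denied
        };
    }

    if is_exclusive_io && !in_first_megabyte {
        // I/O memory claimed exclusively by a driver is protected.
        return Access::Denied;
    }

    Access::Allowed
}
```

In this model, the page we mapped in our example is ordinary RAM above the first megabyte, which is the case that yields the permission error we observed.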

Unrestricted access to /dev/mem

Warning: unrestricted access to /dev/mem is dangerous and poses a huge security risk, as it lets you access all physical memory in the system, including memory in use by the operating system and by other applications, as well as all PCI devices. Be aware of these risks and don't run this on a production system.

First, to make sure we can access any I/O resources, we can simply add the iomem=relaxed kernel command line argument to our GRUB configuration in /etc/default/grub:

GRUB_CMDLINE_LINUX="iomem=relaxed"

Then run sudo grub-mkconfig -o /boot/grub/grub.cfg and reboot the system. The iomem=relaxed setting is sometimes necessary for userspace applications that access I/O devices directly (e.g. flashrom).
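
After the reboot, the active kernel command line can be read back from /proc/cmdline to confirm that the argument took effect. A small sketch of such a check, with a hypothetical function name, taking the command line as a string and assuming the iomem= argument appears at most once:

```rust
/// Reports whether the kernel command line contains `iomem=relaxed`
/// as a separate, whitespace-delimited argument.
/// (Hypothetical helper for illustration.)
fn iomem_relaxed(cmdline: &str) -> bool {
    cmdline.split_whitespace().any(|arg| arg == "iomem=relaxed")
}
```

In a real program the input would come from something like std::fs::read_to_string("/proc/cmdline").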

However, the above setting doesn't give us access to any physical memory backed by DRAM. One option is to build our own Linux kernel with CONFIG_STRICT_DEVMEM=n. Alternatively, we can write a Linux kernel module that uses kernel probes to make the devmem_is_allowed() function always return 1. We hook this function rather than page_is_allowed() or range_is_allowed() because those two are typically inlined, so we cannot rely on kernel probes to hook them.

Before we can build a Linux kernel module, we need to make sure that we have the appropriate build tools and kernel header files installed:

sudo apt install linux-headers-$(uname -r) build-essential

Then we can simply use the following Makefile to build our kernel module:

obj-m += mem_bypass.o

mem_bypass-objs += source/main.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

The actual kernel module code in source/main.c looks like this:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kprobes.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Stephan van Schaik");
MODULE_DESCRIPTION("Bypass /dev/mem checks.");
MODULE_VERSION("1.0");

static int
bypass(struct kretprobe_instance *probe, struct pt_regs *regs)
{
    regs->ax = 1;

    return 0;
}

static struct kretprobe probe = {
    .handler = bypass,
    .maxactive = 20,
};

static int __init mem_bypass_init(void)
{
    probe.kp.symbol_name = "devmem_is_allowed";

    /* Fail module loading if the probe cannot be registered. */
    return register_kretprobe(&probe);
}

static void __exit mem_bypass_exit(void)
{
    unregister_kretprobe(&probe);
}

module_init(mem_bypass_init);
module_exit(mem_bypass_exit);

Since we simply want to override the return value of the function so that it is always 1, we register a kretprobe. Whenever the function is about to return, the kernel invokes our bypass() handler, which overwrites the return value. On x86 the return value is passed in the ax register (rax on x86-64), which is why the handler writes to regs->ax. This way we are always allowed to access any physical address we want.

We compile and load the above kernel module as follows:

make
sudo insmod mem_bypass.ko

Then we can run our example from before as the root user. If everything worked, we should see the following output, which is the string "test" that we wrote to our page, printed as hexadecimal ASCII codes:

[74, 65, 73, 74]

Of course, if you are done playing with /dev/mem, then make sure to remove the kernel module:

sudo rmmod mem_bypass
