CCM RAM usage, Mbed linker magic

I have an Mbed app with a web server running on an F407VE with 128 kB RAM + 64 kB CCM RAM. This works fine and I have some free RAM. Now I have moved the same app to another custom board with an F407VG and the same RAM settings, but there the app crashes with an ‘out of memory’ exception after trying to allocate some dynamic memory.
I have compared the linker scripts, and the settings are almost the same for the F407VE and the F407VG. The VE only has an addition for MBED_APP_START, but this makes no difference.
It looks like the CCM RAM is not used on the F407VG, and I cannot find where it would be used. In the linker script there is no section for CCM. The size is declared, but it is used nowhere.
In the BUILD dir, there is a generated .profile-ld where I can see two RAM sections:

{
    "flags": [
        "-DMBED_BOOT_STACK_SIZE=1024", 
        "-DMBED_RAM1_SIZE=0x10000", 
        "-DMBED_RAM1_START=0x10000000", 
        "-DMBED_RAM_SIZE=0x20000", 
        "-DMBED_RAM_START=0x20000000", 
        "-DMBED_ROM_SIZE=0x100000", 
        "-DMBED_ROM_START=0x8000000", 
        "-DXIP_ENABLE=0", 

It looks like this is some secret knowledge from the CMSIS packs, but where is this additional RAM used?
Does it have something to do with the bootloader?

And interestingly, the .profile-ld for the F407VE looks like this:

{
    "flags": [
        "-DMBED_BOOT_STACK_SIZE=1024", 
        "-DXIP_ENABLE=0", 

and there, I have more RAM available.

Very strange. Who can explain this magic to me?
Is @theotherjimmy no longer working on mbed-os?

I found some problems:
The main problem was a missing (deleted) setting in mbed_app.json, so one buffer was too large with its default setting. This was eating up my memory.

By digging into that, I found the reason for the different .profile-ld defines:
They are generated from the CMSIS meta-information in mbed-os/tools/arm_pack_manager/index.json. When the target_name does not match an entry in this file, the additional RAM/ROM START/SIZE macros are not generated. I’m not sure whether this has a further impact, as there have been many changes in the past.
Maybe someone from the Mbed team can explain?
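
To make the behaviour described above concrete, here is a toy model in C++. It is not the Mbed tooling (those tools are Python scripts); `MemoryInfo`, `define_flag` and `linker_flags` are hypothetical names, and the index is reduced to a plain map. It only shows the effect: a target found in the index gets the extra memory macros, and a target that is not found keeps just the base flags.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the pack index.
struct MemoryInfo {
    std::uint32_t rom_start, rom_size;
    std::uint32_t ram_start, ram_size;
};

// Format one "-DNAME=0x..." flag.
static std::string define_flag(const std::string &name, std::uint32_t value)
{
    std::ostringstream out;
    out << "-D" << name << "=0x" << std::hex << value;
    return out.str();
}

// Build the flag list for a target. The memory macros are only added when
// the target name is found in the index.
std::vector<std::string> linker_flags(const std::map<std::string, MemoryInfo> &index,
                                      const std::string &target_name)
{
    assert(!target_name.empty());
    std::vector<std::string> flags = {"-DMBED_BOOT_STACK_SIZE=1024"};
    const auto entry = index.find(target_name);
    if (entry != index.end()) {
        const MemoryInfo &m = entry->second;
        flags.push_back(define_flag("MBED_RAM_SIZE", m.ram_size));
        flags.push_back(define_flag("MBED_RAM_START", m.ram_start));
        flags.push_back(define_flag("MBED_ROM_SIZE", m.rom_size));
        flags.push_back(define_flag("MBED_ROM_START", m.rom_start));
    }
    flags.push_back("-DXIP_ENABLE=0");
    return flags;
}
```

With an index that contains only one known name, a lookup for that name yields six flags and a lookup for any other name yields two, which mirrors the two .profile-ld files quoted earlier.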

And one more thing to note:
The F407 has two RAM regions, the SRAM and the CCM. It looks like the CCM is not used, and the two RAM regions are not contiguous. But there is now support for split memory in the Mbed heap manager; some targets already use it via a modified linker script and the MBED_SPLIT_HEAP setting.
I will check whether it works for this MCU as well.
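
As a small worked check of why a single heap cannot simply span both regions, the sketch below uses the start and size values from the .profile-ld flags quoted earlier. `Region`, `contiguous` and `gap_bytes` are illustrative names, not Mbed APIs.

```cpp
#include <cassert>
#include <cstdint>

struct Region {
    std::uint32_t start;
    std::uint32_t size;
    // Address one past the last byte of the region.
    constexpr std::uint32_t end() const { return start + size; }
};

// Values taken from the generated .profile-ld flags.
constexpr Region kSram{0x20000000u, 0x20000u};  // MBED_RAM_START / MBED_RAM_SIZE
constexpr Region kCcm{0x10000000u, 0x10000u};   // MBED_RAM1_START / MBED_RAM1_SIZE

// Two regions are contiguous when the lower one ends where the upper starts.
constexpr bool contiguous(Region lower, Region upper)
{
    return lower.end() == upper.start;
}

// Size of the unmapped hole between two regions.
inline std::uint32_t gap_bytes(Region lower, Region upper)
{
    assert(lower.end() <= upper.start);
    return upper.start - lower.end();
}

static_assert(kCcm.end() == 0x10010000u, "CCM ends at 0x10010000");
static_assert(kSram.end() == 0x20020000u, "SRAM ends at 0x20020000");
static_assert(!contiguous(kCcm, kSram), "CCM and SRAM are separate ranges");
```

The hole between the end of the CCM and the start of the SRAM is 0x0FFF0000 bytes, so any allocator that wants to use both has to treat them as two separate ranges.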

Hello Johannes,

Thank you for sharing this info. It would be great to make the MBED_SPLIT_HEAP work.

Below is a modified GCC_ARM linker script
mbed-os\targets\TARGET_STM\TARGET_STM32F4\TARGET_STM32F407xE\device\TOOLCHAIN_GCC_ARM\STM32F407XG.ld
which allows the CCM to be used for static variables:

/* Linker script to configure memory regions. */

#if !defined(MBED_APP_START)
  #define MBED_APP_START 0x08000000
#endif

#if !defined(MBED_APP_SIZE)
  #define MBED_APP_SIZE 0x80000
#endif

#if !defined(MBED_BOOT_STACK_SIZE)
    #define MBED_BOOT_STACK_SIZE 0x400
#endif

STACK_SIZE = MBED_BOOT_STACK_SIZE;

M_CRASH_DATA_RAM_SIZE = 0x100;

CCM_SIZE = 64k;

MEMORY
{ 
  FLASH (rx) : ORIGIN = MBED_APP_START, LENGTH = MBED_APP_SIZE
  CCM (rwx) : ORIGIN = 0x10000000, LENGTH = CCM_SIZE
  RAM (rwx) : ORIGIN = 0x20000188, LENGTH = 128k - 0x188 
}

/* Linker script to place sections and symbol values. Should be used together
 * with other linker script that defines memory regions FLASH and RAM.
 * It references following symbols, which must be defined in code:
 *   Reset_Handler : Entry of reset handler
 * 
 * It defines following symbols, which code can use without definition:
 *   __exidx_start
 *   __exidx_end
 *   __etext
 *   __data_start__
 *   __preinit_array_start
 *   __preinit_array_end
 *   __init_array_start
 *   __init_array_end
 *   __fini_array_start
 *   __fini_array_end
 *   __data_end__
 *   __bss_start__
 *   __bss_end__
 *   __end__
 *   end
 *   __HeapLimit
 *   __StackLimit
 *   __StackTop
 *   __stack
 *   _estack
 */
ENTRY(Reset_Handler)

SECTIONS
{
    .text :
    {
        KEEP(*(.isr_vector))
        *(.text*)
        KEEP(*(.init))
        KEEP(*(.fini))

        /* .ctors */
        *crtbegin.o(.ctors)
        *crtbegin?.o(.ctors)
        *(EXCLUDE_FILE(*crtend?.o *crtend.o) .ctors)
        *(SORT(.ctors.*))
        *(.ctors)

        /* .dtors */
        *crtbegin.o(.dtors)
        *crtbegin?.o(.dtors)
        *(EXCLUDE_FILE(*crtend?.o *crtend.o) .dtors)
        *(SORT(.dtors.*))
        *(.dtors)

        *(.rodata*)

        KEEP(*(.eh_frame*))
    } > FLASH

    .ARM.extab :
    {
        *(.ARM.extab* .gnu.linkonce.armextab.*)
    } > FLASH

    __exidx_start = .;
    .ARM.exidx :
    {
        *(.ARM.exidx* .gnu.linkonce.armexidx.*)
    } > FLASH
    __exidx_end = .;

    __etext = .;
    _sidata = .;

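    /* NOLOAD: contents are neither loaded nor zeroed at startup */
    .ccm (NOLOAD) :
    {
        . = ALIGN(8);
        __CCM__ = .;
        __CCM_START__ = .; /* Create a global symbol at ccm start */
        KEEP(*(.keep.ccm))
        *(.m_ccm)     /* This is a user defined section */
        *(.ccm*)      /* Variables declared with section(".ccm") */
        . = ALIGN(8);
        __CCM_END__ = .; /* Define a global symbol at the end of the used ccm */
    } > CCM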

    .crash_data_ram :
    {
        . = ALIGN(8);
        __CRASH_DATA_RAM__ = .;
        __CRASH_DATA_RAM_START__ = .; /* Create a global symbol at data start */
        KEEP(*(.keep.crash_data_ram))
        *(.m_crash_data_ram)     /* This is a user defined section */
        . += M_CRASH_DATA_RAM_SIZE;
        . = ALIGN(8);
        __CRASH_DATA_RAM_END__ = .; /* Define a global symbol at data end */
    } > RAM 

    .data : AT (__etext)
    {
        __data_start__ = .;
        _sdata = .;
        *(vtable)
        *(.data*)

        . = ALIGN(8);
        /* preinit data */
        PROVIDE_HIDDEN (__preinit_array_start = .);
        KEEP(*(.preinit_array))
        PROVIDE_HIDDEN (__preinit_array_end = .);

        . = ALIGN(8);
        /* init data */
        PROVIDE_HIDDEN (__init_array_start = .);
        KEEP(*(SORT(.init_array.*)))
        KEEP(*(.init_array))
        PROVIDE_HIDDEN (__init_array_end = .);


        . = ALIGN(8);
        /* finit data */
        PROVIDE_HIDDEN (__fini_array_start = .);
        KEEP(*(SORT(.fini_array.*)))
        KEEP(*(.fini_array))
        PROVIDE_HIDDEN (__fini_array_end = .);

        KEEP(*(.jcr*))
        . = ALIGN(8);
        /* All data end */
        __data_end__ = .;
        _edata = .;

    } > RAM

    .bss :
    {
        . = ALIGN(8);
        __bss_start__ = .;
        _sbss = .;
        *(.bss*)
        *(COMMON)
        . = ALIGN(8);
        __bss_end__ = .;
        _ebss = .;
    } > RAM

    .heap (COPY):
    {
        __end__ = .;
        end = __end__;
        *(.heap*)
        . = ORIGIN(RAM) + LENGTH(RAM) - STACK_SIZE;
        __HeapLimit = .;
    } > RAM

    /* .stack_dummy section doesn't contains any symbols. It is only
     * used for linker to calculate size of stack sections, and assign
     * values to stack symbols later */
    .stack_dummy (COPY):
    {
        *(.stack*)
    } > RAM

    /* Set stack top to end of RAM, and stack limit move down by
     * size of stack_dummy section */
    __StackTop = ORIGIN(RAM) + LENGTH(RAM);
    _estack = __StackTop;
    __StackLimit = __StackTop - STACK_SIZE;
    PROVIDE(__stack = __StackTop);

    /* Check if data + heap + stack exceeds RAM limit */
    ASSERT(__StackLimit >= __HeapLimit, "region RAM overflowed with stack")
}

Then it could be used in main.cpp as:

...
static uint8_t bigBuffer[0x10000] __attribute__((section(".ccm")));   // 64kB buffer
...

bigBuffer[0] = 0x51;
...
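
A slightly fuller sketch of the same idea follows. It assumes that the startup code only initializes .data and .bss, so a variable placed in a custom section such as ".ccm" should be cleared explicitly before first use. That assumption is not confirmed in this thread. The `CCM_DATA` macro and the helper functions are hypothetical names; the macro expands to nothing on non-ARM builds so the code can also be compiled on a desktop host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Place a variable in the ".ccm" section on ARM builds only.
#if defined(__arm__)
#define CCM_DATA __attribute__((section(".ccm")))
#else
#define CCM_DATA
#endif

static std::uint8_t bigBuffer[0x10000] CCM_DATA;  // 64 kB buffer

// Clear the buffer explicitly instead of relying on startup code.
void ccm_buffer_init()
{
    std::memset(bigBuffer, 0, sizeof(bigBuffer));
}

// Bounds-checked write, equivalent to bigBuffer[index] = value.
void ccm_store(std::size_t index, std::uint8_t value)
{
    assert(index < sizeof(bigBuffer));
    bigBuffer[index] = value;
}

std::uint8_t *ccm_buffer() { return bigBuffer; }
std::size_t ccm_buffer_size() { return sizeof(bigBuffer); }
```

Calling `ccm_buffer_init()` once early in `main()` and then `ccm_store(0, 0x51)` corresponds to the `bigBuffer[0] = 0x51;` line above.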

Thanks Zoltan,

your linker script is also useful. I have now copied and modified a script from an L4 device and added MBED_SPLIT_HEAP to my target definition. The compiler is running…
I’m not sure if there are side effects: the CCM is accessible by the CPU core only. This may affect memory that should be used with DMA; I guess that will not work. I have not yet checked whether the L4 SRAM1 and SRAM2 differ from the F4 RAM and CCM.

Compiling and linking are OK, but it does not work; it looks like an early crash.

This is the linker script (for an F407VG):

/* Linker script to configure memory regions. */

#if !defined(MBED_APP_START)
  #define MBED_APP_START 0x08000000
#endif

#if !defined(MBED_APP_SIZE)
  #define MBED_APP_SIZE 1024k
#endif

#if !defined(MBED_BOOT_STACK_SIZE)
    #define MBED_BOOT_STACK_SIZE 0x400
#endif

STACK_SIZE = MBED_BOOT_STACK_SIZE;

/* Linker script to configure memory regions. */
MEMORY
{
  FLASH (rx)   : ORIGIN = MBED_APP_START, LENGTH = MBED_APP_SIZE
  SRAM2 (rwx)  : ORIGIN = 0x10000188, LENGTH = 64k - 0x188
  SRAM1 (rwx)  : ORIGIN = 0x20000000, LENGTH = 128k
}

/* Linker script to place sections and symbol values. Should be used together
 * with other linker script that defines memory regions FLASH and RAM.
 * It references following symbols, which must be defined in code:
 *   Reset_Handler : Entry of reset handler
 *
 * It defines following symbols, which code can use without definition:
 *   __exidx_start
 *   __exidx_end
 *   __etext
 *   __data_start__
 *   __preinit_array_start
 *   __preinit_array_end
 *   __init_array_start
 *   __init_array_end
 *   __fini_array_start
 *   __fini_array_end
 *   __data_end__
 *   __bss_start__
 *   __bss_end__
 *   __end__
 *   end
 *   __HeapLimit
 *   __StackLimit
 *   __StackTop
 *   __stack
 *   _estack
 */
ENTRY(Reset_Handler)

SECTIONS
{
    .text :
    {
        KEEP(*(.isr_vector))
        *(.text*)
        KEEP(*(.init))
        KEEP(*(.fini))

        /* .ctors */
        *crtbegin.o(.ctors)
        *crtbegin?.o(.ctors)
        *(EXCLUDE_FILE(*crtend?.o *crtend.o) .ctors)
        *(SORT(.ctors.*))
        *(.ctors)

        /* .dtors */
        *crtbegin.o(.dtors)
        *crtbegin?.o(.dtors)
        *(EXCLUDE_FILE(*crtend?.o *crtend.o) .dtors)
        *(SORT(.dtors.*))
        *(.dtors)

        *(.rodata*)

        KEEP(*(.eh_frame*))
    } > FLASH

    .ARM.extab :
    {
        *(.ARM.extab* .gnu.linkonce.armextab.*)
    } > FLASH

    __exidx_start = .;
    .ARM.exidx :
    {
        *(.ARM.exidx* .gnu.linkonce.armexidx.*)
    } > FLASH
    __exidx_end = .;

    __etext = .;
    _sidata = .;

    /* .stack section doesn't contains any symbols. It is only
     * used for linker to reserve space for the isr stack section
     * WARNING: .stack should come immediately after the last secure memory
     * section.  This provides stack overflow detection. */
    .stack (NOLOAD):
    {
        __StackLimit = .;
        *(.stack*);
        . += STACK_SIZE - (. - __StackLimit);
    } > SRAM2

    /* Set stack top to end of RAM, and stack limit move down by
     * size of stack_dummy section */
    __StackTop = ADDR(.stack) + SIZEOF(.stack);
    _estack = __StackTop;
    __StackLimit = ADDR(.stack);
    PROVIDE(__stack = __StackTop);

    /* Place holder for additional heap */
    .heap_0 (COPY):
    {
        __mbed_sbrk_start_0 = .;
        . += (ORIGIN(SRAM2) + LENGTH(SRAM2) - .);
        __mbed_krbs_start_0 = .;
    } > SRAM2

    /* Check if heap exceeds SRAM2 */
    ASSERT(__mbed_krbs_start_0 <= (ORIGIN(SRAM2)+LENGTH(SRAM2)), "Heap is too big for SRAM2")

    .data : AT (__etext)
    {
        __data_start__ = .;
        _sdata = .;
        *(vtable)
        *(.data*)

        . = ALIGN(8);
        /* preinit data */
        PROVIDE_HIDDEN (__preinit_array_start = .);
        KEEP(*(.preinit_array))
        PROVIDE_HIDDEN (__preinit_array_end = .);

        . = ALIGN(8);
        /* init data */
        PROVIDE_HIDDEN (__init_array_start = .);
        KEEP(*(SORT(.init_array.*)))
        KEEP(*(.init_array))
        PROVIDE_HIDDEN (__init_array_end = .);

        . = ALIGN(8);
        /* finit data */
        PROVIDE_HIDDEN (__fini_array_start = .);
        KEEP(*(SORT(.fini_array.*)))
        KEEP(*(.fini_array))
        PROVIDE_HIDDEN (__fini_array_end = .);

        KEEP(*(.jcr*))
        . = ALIGN(8);
        /* All data end */
        __data_end__ = .;
        _edata = .;

    } > SRAM1

    /* Check if data exceeds SRAM1 */
    ASSERT(__data_end__ <= (ORIGIN(SRAM1)+LENGTH(SRAM1)), ".data is too big for SRAM1")

    .bss :
    {
        . = ALIGN(8);
        __bss_start__ = .;
        _sbss = .;
        *(.bss*)
        *(COMMON)
        . = ALIGN(8);
        __bss_end__ = .;
        _ebss = .;
    } > SRAM1

    /* Check if bss exceeds SRAM1 */
    ASSERT(__bss_end__ <= (ORIGIN(SRAM1)+LENGTH(SRAM1)), "BSS is too big for SRAM1")

    /* Placeholder for default single heap */
    .heap (COPY):
    {
        __end__ = .;
        end = __end__;
        __mbed_sbrk_start = .;
        *(.heap*)
        . += (ORIGIN(SRAM1) + LENGTH(SRAM1) - .);
        __mbed_krbs_start = .;
        __HeapLimit = .;
    } > SRAM1

    /* Check if heap exceeds SRAM1 */
    ASSERT(__HeapLimit <= (ORIGIN(SRAM1)+LENGTH(SRAM1)), "Heap is too big for SRAM1")
}
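
A note on the 0x188 offset that appears in the MEMORY blocks of these scripts: it matches the size of a vector table with 98 entries of 4 bytes each. The sketch below works through that arithmetic. The reading that this space is reserved for a RAM copy of the vector table is an assumption, not something stated in this thread, and the count of 82 device interrupts is taken from the STM32F407 interrupt table rather than from the scripts. Note that the first script applies the offset to the SRAM at 0x20000000, while the script above applies it to the CCM at 0x10000000. Whether that difference is related to the early crash is not established here.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kCoreExceptions = 16;  // Cortex-M system exception slots
constexpr std::uint32_t kDeviceIrqs     = 82;  // STM32F407 interrupt lines (assumed)
constexpr std::uint32_t kVectorBytes    = (kCoreExceptions + kDeviceIrqs) * 4u;

static_assert(kVectorBytes == 0x188, "98 vectors * 4 bytes = 0x188");

// First usable address after the reserved vector area.
constexpr std::uint32_t usable_origin(std::uint32_t region_start)
{
    return region_start + kVectorBytes;
}

// Remaining length of a region after reserving the vector area.
inline std::uint32_t usable_length(std::uint32_t region_size)
{
    assert(region_size >= kVectorBytes);
    return region_size - kVectorBytes;
}
```

For the 128 kB SRAM this gives an origin of 0x20000188 and a length of 0x1FE78 bytes, which is what `ORIGIN = 0x20000188, LENGTH = 128k - 0x188` expresses.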

Not successful yet. I also tried swapping RAM1/2, but then I get an ‘unable to allocate thread stack’ exception.
The CCM cannot be used for DMA; I have checked this. The EMAC code does not force the Rx/Tx buffers into specific sections, so this would also have to be modified for Ethernet usage.
Using your solution for static memory looks easier.
Also, the L4 SRAM1/2 bus matrix is different from the F4 architecture.
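
Since buffers in the CCM cannot be used for DMA, a cheap guard is to check a buffer address against the CCM window before handing it to a DMA-driven driver. The sketch below uses the CCM start and size from the flags quoted earlier. `overlaps_ccm` and `dma_capable` are hypothetical helpers, not part of Mbed OS or the STM32 HAL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t kCcmStart = 0x10000000u;
constexpr std::uintptr_t kCcmSize  = 0x10000u;

// True if any byte of [addr, addr + len) lies inside the CCM window.
bool overlaps_ccm(std::uintptr_t addr, std::size_t len)
{
    if (len == 0) {
        return false;
    }
    const std::uintptr_t last = addr + (len - 1);
    assert(last >= addr);  // the range must not wrap around the address space
    return addr < kCcmStart + kCcmSize && last >= kCcmStart;
}

// A buffer is considered usable for DMA only if it stays out of the CCM.
bool dma_capable(const void *buffer, std::size_t len)
{
    return !overlaps_ccm(reinterpret_cast<std::uintptr_t>(buffer), len);
}
```

On the target, a driver wrapper could assert `dma_capable(buf, len)` before starting a transfer, so that a buffer accidentally placed in the CCM is caught early instead of failing silently.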

Another alternative which could be handy too is to use the CCM for the stack.
Below is the modified GCC_ARM linker script
mbed-os\targets\TARGET_STM\TARGET_STM32F4\TARGET_STM32F407xE\device\TOOLCHAIN_GCC_ARM\STM32F407XG.ld:

/* Linker script to configure memory regions. */

#if !defined(MBED_APP_START)
  #define MBED_APP_START 0x08000000
#endif

#if !defined(MBED_APP_SIZE)
  #define MBED_APP_SIZE 0x80000
#endif

#if !defined(MBED_BOOT_STACK_SIZE)
    #define MBED_BOOT_STACK_SIZE 0x400
#endif

STACK_SIZE = MBED_BOOT_STACK_SIZE;

M_CRASH_DATA_RAM_SIZE = 0x100;

CCM_SIZE = 64k;

MEMORY
{ 
  FLASH (rx) : ORIGIN = MBED_APP_START, LENGTH = MBED_APP_SIZE
  CCM (rwx) : ORIGIN = 0x10000000, LENGTH = CCM_SIZE
  RAM (rwx) : ORIGIN = 0x20000188, LENGTH = 128k - 0x188 
}

/* Linker script to place sections and symbol values. Should be used together
 * with other linker script that defines memory regions FLASH and RAM.
 * It references following symbols, which must be defined in code:
 *   Reset_Handler : Entry of reset handler
 * 
 * It defines following symbols, which code can use without definition:
 *   __exidx_start
 *   __exidx_end
 *   __etext
 *   __data_start__
 *   __preinit_array_start
 *   __preinit_array_end
 *   __init_array_start
 *   __init_array_end
 *   __fini_array_start
 *   __fini_array_end
 *   __data_end__
 *   __bss_start__
 *   __bss_end__
 *   __end__
 *   end
 *   __HeapLimit
 *   __StackLimit
 *   __StackTop
 *   __stack
 *   _estack
 */
ENTRY(Reset_Handler)

SECTIONS
{
    .text :
    {
        KEEP(*(.isr_vector))
        *(.text*)
        KEEP(*(.init))
        KEEP(*(.fini))

        /* .ctors */
        *crtbegin.o(.ctors)
        *crtbegin?.o(.ctors)
        *(EXCLUDE_FILE(*crtend?.o *crtend.o) .ctors)
        *(SORT(.ctors.*))
        *(.ctors)

        /* .dtors */
        *crtbegin.o(.dtors)
        *crtbegin?.o(.dtors)
        *(EXCLUDE_FILE(*crtend?.o *crtend.o) .dtors)
        *(SORT(.dtors.*))
        *(.dtors)

        *(.rodata*)

        KEEP(*(.eh_frame*))
    } > FLASH

    .ARM.extab :
    {
        *(.ARM.extab* .gnu.linkonce.armextab.*)
    } > FLASH

    __exidx_start = .;
    .ARM.exidx :
    {
        *(.ARM.exidx* .gnu.linkonce.armexidx.*)
    } > FLASH
    __exidx_end = .;

    __etext = .;
    _sidata = .;
    
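    /* NOLOAD: contents are neither loaded nor zeroed at startup */
    .ccm (NOLOAD) :
    {
        . = ALIGN(8);
        __CCM__ = .;
        __CCM_START__ = .; /* Create a global symbol at ccm start */
        KEEP(*(.keep.ccm))
        *(.m_ccm)     /* This is a user defined section */
        *(.ccm*)      /* Variables declared with section(".ccm") */
        . = ALIGN(8);
        __CCM_END__ = .; /* Define a global symbol at the end of the used ccm */
    } > CCM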

    .crash_data_ram :
    {
        . = ALIGN(8);
        __CRASH_DATA_RAM__ = .;
        __CRASH_DATA_RAM_START__ = .; /* Create a global symbol at data start */
        KEEP(*(.keep.crash_data_ram))
        *(.m_crash_data_ram)     /* This is a user defined section */
        . += M_CRASH_DATA_RAM_SIZE;
        . = ALIGN(8);
        __CRASH_DATA_RAM_END__ = .; /* Define a global symbol at data end */
    } > RAM 

    .data : AT (__etext)
    {
        __data_start__ = .;
        _sdata = .;
        *(vtable)
        *(.data*)

        . = ALIGN(8);
        /* preinit data */
        PROVIDE_HIDDEN (__preinit_array_start = .);
        KEEP(*(.preinit_array))
        PROVIDE_HIDDEN (__preinit_array_end = .);

        . = ALIGN(8);
        /* init data */
        PROVIDE_HIDDEN (__init_array_start = .);
        KEEP(*(SORT(.init_array.*)))
        KEEP(*(.init_array))
        PROVIDE_HIDDEN (__init_array_end = .);


        . = ALIGN(8);
        /* finit data */
        PROVIDE_HIDDEN (__fini_array_start = .);
        KEEP(*(SORT(.fini_array.*)))
        KEEP(*(.fini_array))
        PROVIDE_HIDDEN (__fini_array_end = .);

        KEEP(*(.jcr*))
        . = ALIGN(8);
        /* All data end */
        __data_end__ = .;
        _edata = .;

    } > RAM

    .bss :
    {
        . = ALIGN(8);
        __bss_start__ = .;
        _sbss = .;
        *(.bss*)
        *(COMMON)
        . = ALIGN(8);
        __bss_end__ = .;
        _ebss = .;
    } > RAM

    .heap (COPY):
    {
        __end__ = .;
        end = __end__;
        *(.heap*)
        . = ORIGIN(RAM) + LENGTH(RAM);
        __HeapLimit = .;
    } > RAM

    /* .stack_dummy section doesn't contains any symbols. It is only
     * used for linker to calculate size of stack sections, and assign
     * values to stack symbols later */
    .stack_dummy (COPY):
    {
        *(.stack*)
    } > CCM

    /* Set stack top to end of CCM, and stack limit move down by
     * size of stack_dummy section */
    __StackTop = ORIGIN(CCM) + LENGTH(CCM);
    _estack = __StackTop;
    __StackLimit = __StackTop - STACK_SIZE;
    PROVIDE(__stack = __StackTop);
}

Edit: Unfortunately, this works only with Mbed OS-5 Bare-metal and Mbed OS-2. The Mbed OS-5 RTOS moves the stack to the SRAM.

Yes, that also seems better than leaving the additional 64 kB of RAM unused.

I will try your first suggestion and write a small memory pool manager: an object that takes the large chunk and hands out smaller blocks of memory, similar to malloc. This can be used for static buffers that need to be allocated only once. It is handy for my web server project, where I have some worker threads that each need their own buffers and thread stack. As I understand it, the Mbed Thread class can use user-supplied memory for the stack.
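
A minimal sketch of such a pool manager could look like the following. It is a simple bump allocator: blocks are handed out once and never freed, which fits buffers that are allocated once at startup. `StaticPool` is a hypothetical class, not an Mbed API, and it is not thread-safe as written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hands out aligned blocks from one large, caller-supplied chunk.
class StaticPool {
public:
    StaticPool(void *memory, std::size_t size)
        : base_(static_cast<std::uint8_t *>(memory)), size_(size), used_(0) {}

    // Returns `bytes` of storage aligned to `align` (a power of two),
    // or nullptr when the remaining space is too small.
    void *allocate(std::size_t bytes, std::size_t align = 8)
    {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(base_) + used_;
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        const std::size_t padding = static_cast<std::size_t>(((start + mask) & ~mask) - start);
        if (padding > size_ - used_ || bytes > size_ - used_ - padding) {
            return nullptr;
        }
        used_ += padding + bytes;
        return base_ + (used_ - bytes);
    }

    // Bytes not yet handed out (alignment padding counts as used).
    std::size_t remaining() const { return size_ - used_; }

private:
    std::uint8_t *base_;
    std::size_t size_;
    std::size_t used_;
};
```

On the target, the chunk passed to the constructor could be the 64 kB buffer placed in the CCM, and a block obtained from `allocate()` could then serve as a worker buffer or as the user-supplied thread stack mentioned above. Blocks from a CCM-backed pool would still be unsuitable for DMA.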

Different linker files for one target are not supported yet. I think it would be a nice extension to Mbed if the linker script were selectable in mbed_app.json.

Indeed, Core-Coupled Memory (CCM) RAM in the Cortex-M4 is useful when the core wants to use one more bus for accessing data without contention. Some practical usage notes can be found in the OpenSTM32 user guide "Using CCM Memory".
As a side note, if we were talking about Tightly Coupled Memories (TCM) for very low interrupt latency: TCM RAM for data or code is read or written by the core with zero wait states, no matter which core system clock frequency is set. So it is useful for the stack and critical data, but also for fast-executing code such as critical-section code.