Arm Mbed OS support forum

STM32F769 JPEG decode with DMA bug

I am trying to use the JPEG decoder with DMA to decode a jpeg image.
The example from STMCubeF7 named JPEG_DecodingUsingFs_DMA works perfectly on the board.

But when I try integrating it into my Mbed OS project, the program gets stuck when enabling the DMA for the JPEG input, which happens when calling HAL_JPEG_Decode_DMA.
And more precisely, it is stuck when setting the IDMAEN (input DMA enable) bit of the JPEG control register.

// in JPEG_DMA_StartProcess (in stm32f7xx_hal_jpeg.c), macro JPEG_ENABLE_DMA
hjpeg->Instance->CR |= JPEG_CR_ODMAEN; // works for out DMA enable
hjpeg->Instance->CR |= JPEG_CR_IDMAEN; // freezes here

The DMA works, the JPEG decoder also works (in polling mode), but using JPEG decoder in DMA mode fails.

I did implement HAL_JPEG_MspInit, JPEG_IRQHandler, DMA2_Stream3_IRQHandler and DMA2_Stream4_IRQHandler.

The DMA2 is correctly initialized (all registers values are correct). The JPEG codec is also correctly initialized (registers values are correct).

I have no clue why setting a bit in the JPEG control register would completely freeze the program.
I think it may be because of some configuration in Mbed OS but I can’t think of anything precisely, I checked almost all config files without results.

Tools used

mbed os : 6.11.0-rc1
mbed-cli : 1.10.5
arm-none-eabi-g++ : 11.1.0


Do you start all expected clocks ?

Src/stm32f7xx_hal_msp.c: __HAL_RCC_JPEG_CLK_ENABLE();
Src/stm32f7xx_hal_msp.c: __HAL_RCC_DMA2_CLK_ENABLE();

Yes I do, in HAL_JPEG_MspInit.
JPEG works in polling mode (I can display an image, even a video) but the performance is very low. That’s why I want to use the JPEG decoder in DMA mode, to be able to play a video at a correct frame rate.

The F7 is more complicated with different busses for memory and cache. To check if this is a cache problem, you can disable the D-cache at the beginning of your main() with

    SCB_DisableDCache();
    printf("disable D-Cache\n");

This is the sledgehammer method, but it is also used in Mbed for the Ethernet EMAC on the F7. It also reveals that better support for the MPU is missing: it’s a global resource without management in the OS. But the MPU is also used for controlling cache settings, and this is very important when you use DMA with M7 cores.
It is also necessary to control the memory layout in the linker script: to adjust the cache settings for SRAM in the MPU, the memory used needs to be aligned.

Mbed also needs an easier method for using a custom linker file, instead of creating a custom target. For better control, the RAM used must be placed explicitly in the linker script. The readme for the example also has a short section about this:
@Note If the user code size exceeds the DTCM-RAM size or starts from internal cacheable memories (SRAM1 and SRAM2),that is shared between several processors,
      then it is highly recommended to enable the CPU cache and maintain its coherence at application level.
      The address and the size of cacheable buffers (shared between CPU and other masters)  must be properly updated to be aligned to cache line size (32 bytes).

@Note It is recommended to enable the cache and maintain its coherence, but depending on the use case
      It is also possible to configure the MPU as "Write through", to guarantee the write access coherence.
      In that case, the MPU must be configured as Cacheable/Bufferable/Not Shareable.
      Even though the user must manage the cache coherence for read accesses.
      Please refer to the AN4838 “Managing memory protection unit (MPU) in STM32 MCUs”
      Please refer to the AN4839 “Level 1 cache on STM32F7 Series”
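
To make the 32-byte alignment requirement from the note concrete, here is a minimal host-side sketch. It only shows the address arithmetic: the names kCacheLineSize, CacheRange, cache_aligned_range and jpeg_in_buffer are made up for illustration and come from neither the ST example nor Mbed OS. On the target, the rounded range is what you would hand to the CMSIS by-address cache maintenance calls before and after a DMA transfer.

```cpp
#include <cstddef>
#include <cstdint>

// Cortex-M7 L1 data cache line size in bytes, as stated in the ST note.
constexpr std::size_t kCacheLineSize = 32;

// A span made of whole cache lines (hypothetical helper type).
struct CacheRange {
    std::uintptr_t start;   // rounded down to a cache-line boundary
    std::size_t    length;  // multiple of the cache-line size
};

// Expand [addr, addr + size) so that it covers only complete cache lines.
constexpr CacheRange cache_aligned_range(std::uintptr_t addr, std::size_t size)
{
    const std::uintptr_t mask  = kCacheLineSize - 1;
    const std::uintptr_t start = addr & ~mask;
    const std::uintptr_t end   = (addr + size + mask) & ~mask;
    return CacheRange{start, static_cast<std::size_t>(end - start)};
}

// Hypothetical DMA input buffer: aligned to a cache line and sized as a
// multiple of it, so it never shares a line with unrelated data.
alignas(kCacheLineSize) std::uint8_t jpeg_in_buffer[4096];
```

For example, a 40-byte region starting at 0x20000010 touches two cache lines, so the range to maintain starts at 0x20000000 and is 64 bytes long.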

Thanks for your reply @JojoS

I tried disabling DCache, didn’t change anything.

I started looking into AN4838 and AN4839 to learn more about MPU and cache on F7.

I also noticed that the linker script used by Mbed OS is very different from the one used in the ST example.
I have never had to write a .ld file so I don’t understand everything, but I noticed the .ld file in my Mbed OS project has ALIGN(8) commands, while the one used in the ST example has mostly ALIGN(4).

Another thing I noticed, is the size of RAM and FLASH, very different in MbedOS.

// .ld file used in Mbed OS project
STACK_SIZE = 0x400;
  FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 0x200000
  RAM (rwx) : ORIGIN = (0x20000000 + (((126 * 4) + 7) & 0xFFFFFFF8)), LENGTH = (0x60000 + 0x20000 - (((126 * 4) + 7) & 0xFFFFFFF8))


// .ld file used in ST example
/* Highest address of the user mode stack */
_estack = 0x20080000;    /* end of RAM */
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size = 0x400;      /* required amount of heap  */
_Min_Stack_Size = 0x1000; /* required amount of stack */

/* Specify the memory areas */
RAM (xrw)      : ORIGIN = 0x20000000, LENGTH = 512K
FLASH (rx)      : ORIGIN = 0x8000000, LENGTH = 2048K

The alignment of 8 is ok, and Mbed uses the beginning of RAM for the ISR vector table, that’s the calculated offset.
Then it will be some other reason for the freeze.
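
To check the numbers from the two linker scripts, here is the arithmetic written out as compile-time constants. The values are copied from the scripts quoted in this thread; the constant names are only for illustration, and reading 126 as the number of 4-byte vector table entries is inferred from the (126 * 4) term.

```cpp
#include <cstdint>

// Values copied from the Mbed OS linker script quoted in this thread.
constexpr std::uint32_t kVectorCount = 126;                 // 4-byte entries
constexpr std::uint32_t kSramBase    = 0x20000000u;
constexpr std::uint32_t kSramSize    = 0x60000u + 0x20000u;

// Vector table size in bytes, rounded up to a multiple of 8.
constexpr std::uint32_t kVectorBytes = ((kVectorCount * 4u) + 7u) & 0xFFFFFFF8u;

// RAM region as seen by the Mbed OS linker script.
constexpr std::uint32_t kRamOrigin = kSramBase + kVectorBytes;
constexpr std::uint32_t kRamLength = kSramSize - kVectorBytes;

static_assert(kVectorBytes == 0x1F8u, "126 * 4 = 504 bytes, already 8-aligned");
static_assert(kRamOrigin == 0x200001F8u, "RAM starts right after the vectors");
static_assert(kRamLength == 0x7FE08u, "512K minus the vector table");
static_assert(kSramSize == 512u * 1024u, "same 512K as the ST script");
static_assert(kRamOrigin + kRamLength == 0x20080000u, "same end as _estack");
```

So both scripts describe the same 512K of RAM ending at 0x20080000, and FLASH is 0x200000 = 2048K in both. The Mbed OS script only reserves the first 0x1F8 bytes for the vector table.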

Have you included these ISR handlers in your Mbed code? And remember to use extern "C" when these lines are used in a .cpp unit.

Yes, I did include them. Except I use DMA2_Stream3 and DMA2_Stream4, which are also streams for JPEG_In and JPEG_Out.

with extern “C” linkage when they are in a .cpp file?

Like this

#ifndef __STM32F7xx_IT_H
#define __STM32F7xx_IT_H

#ifdef __cplusplus
extern "C" {
#endif

/* Handlers must keep C linkage so they match the vector table symbols */
void JPEG_IRQHandler(void);
void DMA2_Stream3_IRQHandler(void);
void DMA2_Stream4_IRQHandler(void);

#ifdef __cplusplus
}
#endif

#endif /* __STM32F7xx_IT_H */

I didn’t put extern "C" in the .cpp, only in the header.

that looks good.

I have the same board and could also try later to debug when you put the project in some public place.

Hello again,
Thanks for your precious help.

I tried a few more things, and I was very surprised to see that stepping through the program line by line with the debugger did not trigger the strange freeze.
In fact, I was able to decode a JPEG image in DMA mode when stepping with the debugger.

However, I noticed that from time to time, the program enters the function mbed_die(), which only does a while(1); and I think this is why my program completely freezes: it is stuck in an infinite loop …

Do you have any idea on why it sometimes works in debugger but never in normal run ?

I’ll see if I can set up a GitHub repo to share a minimal working example.

Thanks a lot.

One big difference between debug and normal mode is deepsleep.

To check this, you can add sleep_manager_lock_deep_sleep(); at the beginning of your application.

Then, if this is the problem and you want deepsleep back, you need to add some clock checks in your program, because the clocks may have been reset during deepsleep.
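
If locking deep sleep for the whole application is too coarse, one option is to hold the lock only while a decode is running. The following is a minimal sketch under some assumptions: ScopedDeepSleepLock and decode_without_deep_sleep are hypothetical names, the __MBED__ macro is assumed to be defined by the Mbed build, and the #else branch only provides host-side stand-ins so the sketch compiles off-target.

```cpp
#if defined(__MBED__)
#include "mbed.h"  // sleep_manager_lock_deep_sleep() / sleep_manager_unlock_deep_sleep()
#else
// Host-side stand-ins (not the Mbed implementation): they only count locks.
static int g_deep_sleep_locks = 0;
static void sleep_manager_lock_deep_sleep()   { ++g_deep_sleep_locks; }
static void sleep_manager_unlock_deep_sleep() { --g_deep_sleep_locks; }
#endif

// Holds the deep-sleep lock for as long as the object is alive (hypothetical helper).
class ScopedDeepSleepLock {
public:
    ScopedDeepSleepLock()  { sleep_manager_lock_deep_sleep(); }
    ~ScopedDeepSleepLock() { sleep_manager_unlock_deep_sleep(); }
    ScopedDeepSleepLock(const ScopedDeepSleepLock &) = delete;
    ScopedDeepSleepLock &operator=(const ScopedDeepSleepLock &) = delete;
};

// Runs the given decode routine with deep sleep locked, then releases the lock.
template <typename DecodeFn>
void decode_without_deep_sleep(DecodeFn decode)
{
    ScopedDeepSleepLock lock;
    decode();  // e.g. start HAL_JPEG_Decode_DMA and wait for completion
}
```

The lock is released when the wrapper returns, so the decode routine passed in has to wait until the DMA transfer has really finished. Otherwise deep sleep could again be entered in the middle of a decode.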