How much faster can DMA make an SPI-based LCD driver (ILI9341)?

Hi,

I am trying to figure out a way to improve the speed of an SPI-based LCD driver for the ILI9341. Updating a portion of the LCD takes more than 300 ms on an STM32F401RE with the SPI clock at 50 MHz.

The options seem to be either DMA or a parallel port. Can anyone offer a ballpark estimate of how much faster DMA or a parallel port is than SPI without DMA?

@hudakz, can you offer some input from your work on:

Thanks.

Hello Zhiyong,

When running the Simple pong game (built offline with GCC Arm using Mbed OS 2) on an STM32F407VE black board, the tft_boxfill function was about five times faster with DMA:

I modified Racket.cpp to time the call:

...
void Racket::paint(uint16_t clr /*= TFT_WHITE*/ )
{
    Timer timer;

    timer.start();
    tft_boxfill(xPos - width / 2, yPos - height / 2, xPos, yPos + height / 2, clr);
    timer.stop();
    printf("elapsed time = %dus\r\n", timer.read_us());
}
...

The tft.cpp without DMA:

...
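void tft_boxfill(int x1, int y1, int x2, int y2, uint16_t color)
{
    // One row of pixels. The fill loop below writes x2 - x1 + 1 entries,
    // so the buffer must hold that many.
    uint16_t  pixel[x2 - x1 + 1];
    int i;

    tft_set_window(x1, y1, x2, y2);

    // Fill the row buffer with the requested color.
    for (i = 0; i < (x2 - x1 + 1); i++) {
        pixel[i] = color;
    }

    // Send the same row once per line; each call blocks until the row is sent.
    for (i = 0; i < (y2 - y1 + 1); i++) {
        while (SpiHandle.State != HAL_SPI_STATE_READY){}
        HAL_SPI_Transmit(&SpiHandle, (uint8_t*)pixel, (x2 - x1) * 2, 100);
//        HAL_SPI_Transmit_DMA(&SpiHandle, (uint8_t*)pixel, (x2 - x1) * 2);
    }
}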
...

took 2447 us to execute.

The tft.cpp using DMA:

...
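void tft_boxfill(int x1, int y1, int x2, int y2, uint16_t color)
{
    // One row of pixels. The fill loop below writes x2 - x1 + 1 entries,
    // so the buffer must hold that many.
    uint16_t  pixel[x2 - x1 + 1];
    int i;

    tft_set_window(x1, y1, x2, y2);

    // Fill the row buffer with the requested color.
    for (i = 0; i < (x2 - x1 + 1); i++) {
        pixel[i] = color;
    }

    // Start one DMA transfer per line; wait only until the previous one is done.
    for (i = 0; i < (y2 - y1 + 1); i++) {
        while (SpiHandle.State != HAL_SPI_STATE_READY){}
//        HAL_SPI_Transmit(&SpiHandle, (uint8_t*)pixel, (x2 - x1) * 2, 100);
        HAL_SPI_Transmit_DMA(&SpiHandle, (uint8_t*)pixel, (x2 - x1) * 2);
    }
    // Note: the last DMA transfer may still be running when this function
    // returns, and pixel lives on the stack. Wait for HAL_SPI_STATE_READY
    // here if the buffer must stay valid until the transfer finishes.
}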
...

took 487 us to execute.

2447 / 487 ~= 5

That’s quite an improvement.

Any idea about parallel port vs. SPI + DMA? Assuming the bottleneck is the transmission to the LCD, are we looking at another severalfold speedup, or would it be roughly on par?
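
One way to judge whether the wire itself is the bottleneck is to compute the minimum time the SPI bus needs to shift the pixels out. The sketch below is only that arithmetic: the name spi_min_transfer_us is made up for illustration, and it assumes RGB565 (16 bits per pixel) with no gaps between bytes and no command overhead.

```cpp
#include <cassert>
#include <cstdint>

// Lower bound, in microseconds, on the time needed to clock a
// width x height block of RGB565 pixels out over SPI.
// Assumption: 16 bits per pixel, back-to-back bits, no command bytes.
double spi_min_transfer_us(uint32_t width, uint32_t height, double spi_clock_hz)
{
    assert(spi_clock_hz > 0.0);
    const double bits = static_cast<double>(width) * height * 16.0;
    return bits / spi_clock_hz * 1e6;
}
```

For example, spi_min_transfer_us(240, 320, 50e6) comes to about 24.6 ms for a full 240x320 frame. If a partial update takes far longer than this figure, most of the time is probably being spent in software rather than on the wire, and that is the part DMA can help with.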

Thanks.

Unfortunately, I have no experience with parallel ILI9341 yet. Maybe someone else can share his/her experience?

A pretty fast interface for TFT displays with a built-in controller is FSMC, but after a quick check I see that the F401 does not have it.
I’m using FSMC with an F407: you just initialize the FSMC in the desired mode, and then you have two addresses, one for writing commands and one for data. Setting pixels means writing a ‘set address window’ command and then writing continuous pixel data to the data address. The FSMC generates the signal sequence. Writing one RGB565 pixel takes about 100 ns when a block of many pixels is written. This can also be combined with DMA.
The FSMC uses 16-bit parallel data plus control signals; it may also be usable with 8-bit data.
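
The two-address scheme described above could look roughly like the sketch below. It is not taken from a working project: LcdBus, write_range and lcd_fill are hypothetical names, and the pointers are passed in rather than hard-coded because the actual addresses depend on which FSMC bank and address line the board uses. The command codes 0x2A, 0x2B and 0x2C are the ILI9341 column address set, page address set and memory write commands.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical handle for a memory-mapped LCD: one address latches
// commands, the other latches data (selected by an address line wired to D/C).
struct LcdBus {
    volatile uint16_t* cmd;
    volatile uint16_t* data;
};

// ILI9341 command codes.
constexpr uint16_t CMD_COLUMN_ADDR = 0x2A;
constexpr uint16_t CMD_PAGE_ADDR   = 0x2B;
constexpr uint16_t CMD_MEM_WRITE   = 0x2C;

// Send a start/end coordinate pair as four 8-bit parameters.
static void write_range(const LcdBus& lcd, uint16_t cmd, uint16_t start, uint16_t end)
{
    *lcd.cmd  = cmd;
    *lcd.data = static_cast<uint16_t>(start >> 8);
    *lcd.data = static_cast<uint16_t>(start & 0xFF);
    *lcd.data = static_cast<uint16_t>(end >> 8);
    *lcd.data = static_cast<uint16_t>(end & 0xFF);
}

// Fill the inclusive window (x1, y1)-(x2, y2) with one RGB565 color.
// Returns the number of pixels written.
uint32_t lcd_fill(const LcdBus& lcd, uint16_t x1, uint16_t y1,
                  uint16_t x2, uint16_t y2, uint16_t color)
{
    assert(x2 >= x1 && y2 >= y1);
    write_range(lcd, CMD_COLUMN_ADDR, x1, x2);   // set address window: columns
    write_range(lcd, CMD_PAGE_ADDR, y1, y2);     // set address window: rows
    *lcd.cmd = CMD_MEM_WRITE;                    // start pixel data
    const uint32_t count = static_cast<uint32_t>(x2 - x1 + 1) *
                           static_cast<uint32_t>(y2 - y1 + 1);
    for (uint32_t i = 0; i < count; ++i) {
        *lcd.data = color;                       // one bus write per pixel
    }
    return count;
}
```

On an F407 the two pointers would point into the FSMC NOR/SRAM address space (bank 1 starts at 0x60000000), with the data address offset by whichever address line is wired to the display's D/C pin. Passing them in also lets the routine be exercised on a PC with two ordinary variables standing in for the bus.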

How long does it take to draw the entire screen at either 240x320 or a higher resolution with FSMC on the F407?
My estimate is that SPI + DMA should be enough for one of our projects, which only needs 240x320. But for the other project, which needs 800x600, SPI + DMA might not be adequate.

I have not measured it exactly; I can do some measurements later. It should be about 20-30 ms without DMA.
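
As a rough cross-check, the per-pixel figure quoted earlier in the thread (about 100 ns per RGB565 pixel in a long block) can be scaled to a full frame. This is only a sketch of the arithmetic: the name parallel_fill_time_us is invented for illustration, and the result is a lower bound that ignores command writes and loop overhead.

```cpp
#include <cassert>
#include <cstdint>

// Estimated time, in microseconds, to write width x height pixels
// when each pixel costs ns_per_pixel nanoseconds on the bus.
// Assumption: a constant per-pixel cost and no setup overhead.
double parallel_fill_time_us(uint32_t width, uint32_t height, double ns_per_pixel)
{
    assert(ns_per_pixel > 0.0);
    return static_cast<double>(width) * height * ns_per_pixel / 1000.0;
}
```

At 100 ns per pixel this gives about 7.7 ms for 240x320 and 48 ms for 800x600, so a real measurement that includes overhead would be expected to come out higher.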

Thank you, Johannes, for the info! I hadn’t heard about the FSMC driver before, but it sounds like something worth trying. So here is my trial program along with some test results.