UART losing data

Hi All,

Acknowledgement - something wrong, not sure what:
I’ve noticed a number of threads talking about issues with UARTs losing data, and here’s another… I know that people insist this is fixed, so I’m not saying this is an mbed bug, but it is a bug I can’t track down to my code, my config or the hardware signals, so more brains would be appreciated!

Problem summary
I’m doing everything I can to get reliable communication over the UART of a custom STM32L432-based board, but it just won’t behave nicely!

I’m losing bytes in a minimal test sending from a USB UART adapter to the device and reporting back the received bytes and the number of bytes the device received (main firmware code below).

Baud rate matters
At 115200 baud, it loses 0, 1 or 2 bytes in a 37-byte test message (I estimate that about 80% of messages lose at least one byte).
At 19200 baud, it receives no data at all, and the chip doesn’t produce any pulses on the TX line. It may actually be receiving, but more likely some setting is invalid at 19200 baud; this could be a separate bug.
At 9600 baud, it receives 100% of the data over 50+ tests.
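For a sense of scale, here is how long one frame lasts at these baud rates. This is a host-side C++ sketch with hypothetical helper names (not mbed code), assuming the usual frame of one start bit, the data bits, an optional parity bit and the stop bits.

```cpp
#include <cassert>

// Duration of one UART frame in microseconds.
// A frame is 1 start bit + data bits + optional parity bit + stop bits.
double frame_time_us(double baud, int data_bits, int parity_bits, int stop_bits)
{
    assert(baud > 0.0);
    const int bits_per_frame = 1 + data_bits + parity_bits + stop_bits;
    return 1e6 * bits_per_frame / baud;
}

// Number of back-to-back frames that arrive during a given interval.
double frames_per_interval(double baud, int data_bits, int parity_bits,
                           int stop_bits, double interval_us)
{
    return interval_us / frame_time_us(baud, data_bits, parity_bits, stop_bits);
}
```

At 115200-8-N-1 one byte takes about 86.8 µs, so the 37-byte message arrives in about 3.2 ms, whereas at 9600 baud each byte takes about 1.04 ms. At 115200 baud roughly 11.5 bytes arrive per 1 ms poll, which is roughly consistent with the 11 to 12 character blocks between the “Total:” lines in the sample output below.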

Excluded USB dongle
To rule out the sender, I’ve swapped the USB dongle and get the same results. Sending from one USB dongle to the other works flawlessly at all baud rates.

mbed config settings of interest
I’m using:

  • the ‘stdin’ port (which I believe is an LPUART; it would be great if someone knows a way to confirm this)
  • platform.stdio-buffered-serial = 1
  • drivers.uart-serial-rxbuf-size = 256
  • target.console-uart = 1
  • target.lpuart_clock_source = {USE_LPUART_CLK_PCLK1, USE_LPUART_CLK_LSE} (tried both settings, issues persist)

Are there other settings that I should investigate? Or is there something up with my code?

Checked on a scope
I have looked at the waveforms and they seem fine: nice square waves, and the scope decodes the data correctly. I’ve identified a “lost” byte on the scope and confirmed that there do not appear to be any anomalies.

Sample erroneous output:

started.
ABCdefghijkTotal: 11
lmnopqrstuvwTotal: 23
xyz123456780Total: 35

Total: 36
Active...
ATotal: 1
CdefghijklmnTotal: 13
opqrstuvwxyzTotal: 25
1234567890
Total: 36

NB: The missing byte in the first block is ‘9’, the missing byte in the second block is ‘B’

The missing ‘9’ sits between other characters that were captured successfully, so it doesn’t seem possible that my code is responsible.

Is that a reasonable conclusion?
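One mechanism that produces exactly this signature (a single byte missing between correctly received neighbours) is a receive overrun: if the RX interrupt is held off for longer than about one frame time, the next byte has nowhere to go. The sketch below is a simplified host-side model with a hypothetical function name, not mbed or STM32 HAL code, and it assumes a receiver with a single data register and no FIFO.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified model of a UART receiver with one data register (RDR), no FIFO.
// Byte i finishes arriving at time (i + 1) * frame_us. The RX interrupt
// normally empties RDR immediately, but it cannot run during
// [blackout_start_us, blackout_end_us), e.g. while interrupts are masked.
// A byte that completes while RDR still holds an unread byte is lost.
std::string simulate_rx(const std::string &sent, double frame_us,
                        double blackout_start_us, double blackout_end_us)
{
    assert(frame_us > 0.0);
    std::string received;
    bool rdr_full = false;
    char rdr_value = 0;
    double rdr_service_time = 0.0; // when the ISR gets to read RDR

    for (std::size_t i = 0; i < sent.size(); ++i) {
        const double now = static_cast<double>(i + 1) * frame_us;

        // The ISR had a chance to empty RDR before this byte completed.
        if (rdr_full && rdr_service_time <= now) {
            received += rdr_value;
            rdr_full = false;
        }

        if (rdr_full) {
            continue; // overrun: this byte is dropped
        }

        rdr_full = true;
        rdr_value = sent[i];
        const bool in_blackout = now >= blackout_start_us && now < blackout_end_us;
        rdr_service_time = in_blackout ? blackout_end_us : now;
    }

    if (rdr_full) {
        received += rdr_value; // the ISR eventually reads the last byte
    }
    return received;
}
```

With a frame time of 87 µs and interrupts blocked from 400 µs to 600 µs, sending "1234567890" yields "123457890": only the ‘6’ disappears, and its neighbours arrive intact. Since interrupts can be held off by code outside the UART driver (long critical sections, other ISRs, wake-up latency), a dropped middle byte may not by itself rule out software elsewhere in the system.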

Code

#include "mbed.h"
#include "events/EventQueue.h"

#define MAX_NUMBER_OF_EVENTS            20

#define USE_UART


EventQueue eq(MAX_NUMBER_OF_EVENTS * EVENTS_EVENT_SIZE);


 #ifdef USE_UART
 FileHandle
     *serial_in,
     *serial_out;

 int total = 0;

 void processChars()
 {
     char buffer[100];
     bool report = false;

     while(serial_in->readable())
     {
         //docs say read() does not block while 'readable()' returns true.
         ssize_t read = serial_in->read(buffer, sizeof(buffer));

         if(read > 0) //a negative value is an error code, so only echo real data.
         {
             serial_out->write(buffer, read); //echo the received bytes back
             total += read;
             report = true;
         }
     }

     if(report)
         printf("Total: %d\n",total);
 }
 #endif


 void report()
 {
     printf("Active...\n");
 #ifdef USE_UART
     total=0;
 #endif
 }


 void initSerialMonitor()
 {

 #ifdef USE_UART
     //get access to stdin/stdout and poll stdin every 1 ms from the event queue.
     serial_in = mbed::mbed_file_handle(0);
     serial_out = mbed::mbed_file_handle(1);

     //serial_in is already a BufferedSerial object because "platform.stdio-buffered-serial": 1 is set.
     eq.call_every(1ms,processChars);
 #endif
     eq.call_every(10s,report);
 }


 int main()
 {
     initSerialMonitor();

     printf("started.\n");
     eq.dispatch_forever();
     printf("ended.\n");

     return 0;
 }

Thanks for reading!

Will.

Hi All,

I discovered the answer to my question (in part): in its current configuration the board uses the LPUART, as specified in the target’s PeripheralPins.c file (i.e. check the mbed-os/targets/TARGET_xxx/xxx/xxx directories if you are not using a custom target).

At almost the same time, I was told that the LPUART peripheral has significant baud rate limits when clocked from the LSE, so I guess it switches to PCLK1 automatically above those limits.

I know there is no HSE on this board, so there could be timing accuracy issues in the clock chain (though I guess it calibrates against the LSE), and that could explain some of the issues here.

I understand the full USART/UART peripherals are more reliable as they can use 16x oversampling, so I’ll try that tomorrow and see if I get more reliable results.
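To check the rate limits mentioned above, here is a host-side sketch of the baud-rate divider arithmetic. The constraints encoded are my reading of the STM32L4 reference manual (RM0394): for the LPUART, the kernel clock must lie between 3× and 4096× the baud rate, and BRR = 256 × fck / baud must be at least 0x300 and fit in 20 bits; for a USART with 16× oversampling, BRR = fck / baud. Please verify these against the manual for your part. The 80 MHz PCLK1 used in the usage note is an assumption (the STM32L432’s maximum); the thread does not state the actual clock. Function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// LPUART: is this baud rate reachable from kernel clock fck_hz?
bool lpuart_baud_ok(double fck_hz, double baud)
{
    assert(fck_hz > 0.0 && baud > 0.0);
    if (fck_hz < 3.0 * baud || fck_hz > 4096.0 * baud) {
        return false; // outside the allowed clock/baud ratio
    }
    const double brr = std::round(256.0 * fck_hz / baud);
    return brr >= 0x300 && brr <= 0xFFFFF; // minimum value, 20-bit register
}

// LPUART: percentage error between requested and generated baud rate.
double lpuart_baud_error_pct(double fck_hz, double baud)
{
    const double brr = std::round(256.0 * fck_hz / baud);
    const double actual = 256.0 * fck_hz / brr;
    return 100.0 * (actual - baud) / baud;
}

// USART with 16x oversampling: BRR = fck / baud.
double usart16_baud_error_pct(double fck_hz, double baud)
{
    const double brr = std::round(fck_hz / baud);
    const double actual = fck_hz / brr;
    return 100.0 * (actual - baud) / baud;
}
```

Under these assumptions, 9600 baud from the 32.768 kHz LSE and 115200 baud from an 80 MHz PCLK1 are both in range, but 19200 baud is out of range for both sources (too fast for the LSE, too slow for 80 MHz), which may explain the 19200 baud behaviour reported at the start of the thread. Ignoring oscillator inaccuracy, the divider error at 115200 is tiny for the LPUART and about 0.064% for a USART at 80 MHz, so divider rounding alone seems an unlikely cause of the dropped bytes.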

Good catch. Try adding something like this to your local mbed_app.json:

    "target_overrides": {
        "XXX": {
            "target.lpuart_clock_source": "USE_LPUART_CLK_HSI"
        }
    }

Hi @jeromecoutant , thanks for your feedback!

Testing USE_LPUART_CLK_HSI

I gave that a go and I got output that looks like the frequency is incorrect, so I guess there is some clock scaling that isn’t being picked up correctly.
The periodic ‘Active...’ messages are received as follows (raw data piped into a hex dump so that non-renderable characters are visible):

00000000  a1 2c 57 cb 95 b9 2e 2e  0d 0a a1 2c 57 cb 95 b9  |.,W........,W...|
00000010  2e 2e 0d 0a a1 2c 57 cb  95 b9 2e 2e 0d 0a a1 2c  |.....,W........,|
00000020  57 cb 95 b9 2e 2e 0d 0a  a1 2c 57 cb 95 b9 2e 2e  |W........,W.....|
00000030  0d 0a a1 2c 57 cb 95 b9  2e 2e 0d 0a a1 2c 57 cb  |...,W........,W.|

Use UART2 instead of LPUART
Undeterred, I changed the setting back (for completeness) and changed the peripheral pinmap to use UART2 on the same pins. Still no joy!
I only changed the pinmap (disabling the LPUART option on the target pins and enabling UART2 on them instead, as the only UART RX/TX option). I don’t know whether anything more is required to switch the output to UART2 (any thoughts appreciated), but I believe it was using UART2 and got exactly the same results. (I confirmed that leaving USE_LPUART_CLK_HSI in place while using UART2 did not produce invalid output, which gives me confidence that I really am using UART2.)

This is the output using UART-2@115200 baud:

 ABCTotal: 3
 defghijklnTotal: 13
 opqrstuvwxyTotal: 24
 z1234567890
 Total: 36
 Active...
 ABCdefTotal: 6
 ghijklmnopqrTotal: 18
 stuvwxyz13Total: 28
 4567890
 Total: 36
 ABCdefghiTotal: 45
 jklmnopqrstuTotal: 57
 vwxyz12346Total: 67
 7890
 Total: 72
 ABCdefghTotal: 80
 ijklmnopqrstTotal: 92
 uvwxyz1235Total: 102
 67890
 Total: 108
 ABCdefghijkmTotal: 120
 nopqrstuvwxzTotal: 132
 123456780Total: 141

 Total: 142
 ABCdefgiTotal: 150
 jklmnopqrstTotal: 161
 uvwxyz1234Total: 171
 567890
 Total: 178
 Active...
 Active...
 Active...
 Active...
 ABCdefghijkmTotal: 12
 nopqrstuvwxTotal: 23
 yz1234567890Total: 35

 Total: 36
 ABCdefghijTotal: 46
 klmnopqrstvTotal: 57
 wxyz123457Total: 67
 890
 Total: 71
 Active...

An Odd Pattern
Looking at the chain of missing bytes:
defghijklnTotal: 13 missing m
stuvwxyz13Total: 28 missing 2
vwxyz12346Total: 67 missing 5
uvwxyz1235Total: 102 missing 4
ABCdefghijkmTotal: 120 missing l
nopqrstuvwxzTotal: 132 missing y
ABCdefgiTotal: 150 missing h
ABCdefghijkmTotal: 12 missing l
nopqrstuvwxTotal: 23 missing y
klmnopqrstvTotal: 57 missing u
wxyz123457Total: 67 missing 6

What is striking is that it is the penultimate byte that gets dropped from the ‘block’ every time, if any byte is missing.

I’m now leaning back towards this being a software issue…

Testing different polling pattern
Well, that ‘odd pattern’ was short-lived (good!). I changed the polling interval to 2 ms: the missing character can now be partway through the ‘block’, but it seems more reliable than before (most of the time it doesn’t lose any data at all).

Current conclusions
Standing around scratching my head!
It seems somewhere between hardware and software, so perhaps hardware configuration?
That would make sense in some respects as other people made this board and I’m just modifying the firmware for new functionality.

Is there anything else I should check in the board configuration?

Thanks for reading

Will.

For the record, I’ve just instantiated a DeepSleepLock in the main function to ensure the system is not sleeping: no impact - bytes are still lost.

By the way:

custom board’s UART based on STM32L432

Don’t hesitate to share your custom boards in the GitHub repository ARMmbed/stm32customtargets, which enables support for custom boards in mbed-os 6.

Thx

Maybe this 1 ms period should be tuned for high baud rates?

Maybe reading is also paused while writing?

I didn’t check whether this code is up to date, but you can have a look at:
https://os.mbed.com/cookbook/Serial-Interrupts

Jerome

@jeromecoutant Thanks for flagging the repo to share custom boards, I was unaware of it.
I’d love to share this custom board - but it’s not mine to share.

Thanks also for the suggestions! I’ve been using the BufferedSerial class which I understood was the best way to get serial data reliably. Unfortunately this seems to be incorrect!

I’ve just used the UnbufferedSerial and it works flawlessly, then returned to BufferedSerial and I’m losing data again. I’m certain I’m not overflowing the buffers on BufferedSerial, so I guess some other interference is going on.

This time I’ve disabled the stdin/stdout and used Unbuffered/Buffered Serial directly, so I can exclude anything complex on the stdin/stdout console. It simply seems to lose data as part of the buffering!

I’ll strip this down to a minimal example of what doesn’t work and what does, but I am pretty sure that it’s not hardware now.

I’ve just made sure I’m on the latest mbed release (6.14), and this code gives terrible results:

#include "mbed.h"

#define MAX_NUMBER_OF_EVENTS            20

EventQueue eq(MAX_NUMBER_OF_EVENTS * EVENTS_EVENT_SIZE);

BufferedSerial buf(PA_2,PA_3,115200);

int total = 0;

void processChars()
{
    char buffer[100];

    while(buf.readable())
    {
        ssize_t read = buf.read(buffer, sizeof(buffer));
        if(read > 0) //a negative value is an error code, not data
        {
            buf.write(buffer, read); //echo the data read straight back
        }
    }
}

void initSerialMonitor()
{
    eq.call_every(1ms,processChars);
}



int main()
{
    DeepSleepLock lock;
    initSerialMonitor();

    printf("started.\n");
    eq.dispatch_forever();
    printf("ended.\n");

    return 0;
}

I’m sending data every ~0.1s so there’s some good ‘down time’. I’m convinced this is not exceeding any buffer sizes…

Output:

*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvxyz1234567890
*abdefghijklmnopqrstuvwxyz1234567890
*abcdefghijkmnopqrstuvwxyz123456780
*abcdefghijklmnopqrstuvwxyz123456789
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*acdefghijklmnopqrstuvwxyz1234567890
*abcefghijklmnpqrstuvwxy1234567890
*abcdefghijklmnopqrstuvwxyz123456890
*abcdefghiklmnopqrstvwxyz1234567890
*bcdefghijklnopqrstuvwxyz1234567890
*abcdefghijklmnopqrsuvwxyz1234567890
*abcdefghijklmnopqrtuvwxyz123567890
*abcdefghijklmnopqrstuvwxyz123467890
*acdefghijklmopqrstuvwxz1234567890
*abcdefghijklmnopqrstuvwxyz134567890
*abcdefghijlmnopqrstuvwxyz1234567890
*---
*abcdefghijklmnopqrsuvwxyz1234567890
*abcdefghijklmnopqrstuvwxz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijlmnopqrstuvwyz1234567890
*abcdefghijklmnopqrstuvwyz123456780
*abcdefghijklmnopqrstvwxyz1234567890
*abcdefghijklmnopqrstuvwxy1234567890
*acdefghijklmopqrstuvwxyz123456789090
*abcdefghijlmnopqrstuvwxyz1234567890
*abcdefghijkmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstvwxyz123457890
*abcdfghijklmnopqrstuvwxyz1234567890
*abcdefghjklmnopqrstuvwxyz123457890
*abcdefghijklmnopqrstuvwxz1234567890
*abcdefhijklmnopqstuvwxyz1234567890
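Classifying these echoed lines by hand gets tedious. A small host-side helper (hypothetical name `dropped_chars`, plain C++11) can list which characters each line is missing, and flag lines that are not pure deletions: for example, the line ending in “123456789090” contains extra characters, which dropping bytes alone cannot produce. The leading ‘*’ should be stripped first, or included in the expected string, depending on what the sender actually transmits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Collects into `missing` the characters of `expected` that are absent from
// `received`, assuming the link only drops bytes. Returns false when
// `received` is not an in-order subsequence of `expected`, i.e. when
// something other than dropping happened (extra or reordered characters).
// Matching is greedy, so with repeated characters the reported positions
// can be ambiguous; the test message here has no repeats.
bool dropped_chars(const std::string &expected, const std::string &received,
                   std::string &missing)
{
    missing.clear();
    std::size_t e = 0;
    for (char c : received) {
        while (e < expected.size() && expected[e] != c) {
            missing += expected[e++]; // skipped over: treat as dropped
        }
        if (e == expected.size()) {
            return false; // extra or out-of-order character
        }
        ++e; // matched c
    }
    missing += expected.substr(e); // anything never reached was dropped
    return true;
}
```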

(I’ve just had a post sent for moderation, so this may not make much sense/be out of order - my next post may come before this one…)

Conversely, this code works perfectly:

 #include "mbed.h"

 #define MAX_NUMBER_OF_EVENTS            20

 EventQueue eq(MAX_NUMBER_OF_EVENTS * EVENTS_EVENT_SIZE);

 UnbufferedSerial unbuf(PA_2,PA_3,115200);

 void on_rx_interrupt()
 {
     uint8_t c;

     if (unbuf.read(&c, 1)) {
         unbuf.write(&c, 1);
     }
 }


 void initSerialMonitor()
 {
     unbuf.attach(on_rx_interrupt);
 }

 int main()
 {
     DeepSleepLock lock;
     initSerialMonitor();

     eq.dispatch_forever();

     return 0;
 }

Output:

*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*---
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890
*abcdefghijklmnopqrstuvwxyz1234567890

I guess I’ll use a circular buffer for TX and RX and avoid the BufferedSerial class…
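A minimal single-producer/single-consumer sketch of such a circular buffer, assuming the RX interrupt handler is the only producer and one event-queue handler is the only consumer. `SpscRing` is a hypothetical name, and the code is plain C++11 so it can be tried on a host first; mbed OS also provides its own `mbed::CircularBuffer` template, which may be simpler than hand-rolling one.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Single-producer/single-consumer ring buffer. One slot is always left
// empty so that "full" and "empty" can be told apart without a counter,
// giving N - 1 usable slots.
template <std::size_t N>
class SpscRing {
public:
    bool push(std::uint8_t byte) // producer side (e.g. RX interrupt)
    {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) {
            return false; // full: the caller can count a software overflow
        }
        buf_[head] = byte;
        head_.store(next, std::memory_order_release); // publish the byte
        return true;
    }

    bool pop(std::uint8_t &byte) // consumer side (event-loop handler)
    {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) {
            return false; // empty
        }
        byte = buf_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release); // free the slot
        return true;
    }

private:
    static_assert(N >= 2, "need at least one usable slot");
    std::array<std::uint8_t, N> buf_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};
```

On the target, the attach() callback would read a byte and push() it, and draining could be deferred to the event queue with `eq.call(...)`. A false return from push() gives a place to count software overflows separately from hardware overruns. This relies on `std::atomic<std::size_t>` being lock-free on the MCU, which I assume holds on a Cortex-M4 but is worth checking with `is_lock_free()`.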

Curious about the cause, but I’ve lost too many hours to this already… If anyone has any theories to test I’d be game, but I’m not going digging any further.

For the record, I posted two more posts last night (they’ve been flagged as spam so should be released by a human at some point): they demonstrate that UnbufferedSerial echo works reliably, but BufferedSerial echoing does not.

Given all the encouragement I’ve read about using the BufferedSerial over all the other options, this is very surprising, particularly when BufferedSerial works at lower baud, but not reliably at 115200.

I guess this has something to do with a DMA-specific issue (if it uses DMA at all) or with the device configuration in some way, but it is clearly software, not hardware. I’ve confirmed that the LPUART1 and USART2 peripherals both operate very well at 115200 (though LPUART1 still won’t operate at 19200, which I suspect is a hardware issue).

I’d suggest that people having trouble with serial data loss run a very simple echo test using the two different methods of controlling serial (UnbufferedSerial and BufferedSerial) and see whether that makes a difference in their situation. Do not simply accept the assertion that BufferedSerial is ‘better’ for your situation!

Hi @jeromecoutant ,

I spotted a very relevant GitHub issue, “BufferedSerial skips characters in some circumstances” #13422, and you are the last person to have commented on it. Should I add these details there as ‘potentially relevant’, or start a new issue?

Thanks,

Will.

Or to ask the question another way:

What is easier for the team to work with:
having two issues that turn out to be one, or one issue that needs to be split into two?

This is a very quick (almost cryptic) update in case it gives someone else a useful clue:

  1. The system was being tested from (and powered by) a Raspberry Pi. Moving testing to a fully powered x86 machine seemed to make it more stable (probably because of the better power source), though I’m not entirely sure what the explanation is.
  2. The BufferedSerial, UnbufferedSerial and STM32 HAL implementations all work very well, IF (and only if) the MCU is in a ‘busy waiting for events’ loop. Without the busy-waiting event loop (i.e. using dispatch_forever) they all have issues and produce very poor results. I suspect holding the deep-sleep lock isn’t enough to prevent some level of sleep that is adversely affecting the system.

These implementations seem to give an error rate of:

  • STM32-HAL: 0.0012%
  • UnbufferedSerial: 0.0000%
  • BufferedSerial: <not measured accurately, but anecdotally: ‘reasonable’>

The STM32-HAL implementation simply reads the data and stores it in a circular buffer in the interrupt handler, then posts an event to the event loop. The UnbufferedSerial implementation does the same. I’m not sure how the STM32-HAL implementation can fail very slightly more often than the mbed implementation (which is built on the STM32 HAL), but there is a minor difference somewhere along the way.


P.S. For completeness on methodology: I inserted a ‘failure packet’ once every 2000 packets and then sent data from the host PC over the USB converter to the device as quickly as possible. After a fair few million packets (115200-8-N-1), the error rates were calculated, accounting for the ‘expected’ failures given the number of successes.
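The post doesn’t give the exact formula, so the following is one plausible reading of the methodology, not necessarily the one used: subtract the deliberately injected failures (one per 2000 packets) before dividing. `unexpected_error_rate_pct` is a hypothetical helper, and the rate it computes is per packet, not per byte.

```cpp
#include <cassert>
#include <cstdint>

// Per-packet error rate (%) after removing the failures that were injected
// on purpose, one every `inject_every` packets.
double unexpected_error_rate_pct(std::uint64_t packets_sent,
                                 std::uint64_t failures_observed,
                                 std::uint64_t inject_every)
{
    assert(packets_sent > 0 && inject_every > 1);
    const std::uint64_t injected = packets_sent / inject_every;
    assert(failures_observed >= injected); // injected packets always fail
    const std::uint64_t genuine_packets = packets_sent - injected;
    const std::uint64_t genuine_failures = failures_observed - injected;
    return 100.0 * static_cast<double>(genuine_failures) /
           static_cast<double>(genuine_packets);
}
```

With illustrative numbers (not the measurements above), 510 failures in 1,000,000 packets would mean 500 injected and 10 genuine failures across 999,500 genuine packets, about 0.0010%.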

Hi All,

This has just bitten me again (yesterday).

Despite having tested the above code and found that using a ‘while busy’ loop to wait for events prevents the issue, I’ve integrated it into another mbed app and the issue resurfaced at 115200-8-N-1 (it works well at 19200-8-E-2).

The other mbed app is using threads (unfortunately), so I guess this is something to do with that?

Whatever the coding context, I can’t see how any of this can be user code, other than if it were going into a deep sleep. As I understand it, the UART peripheral should accept data entirely independently and then trigger an interrupt (or DMA?) when data is available. The ISR then removes the data from the peripheral’s buffer and stores it elsewhere, and the MCU gets back to whatever it was doing or wants to do next.

I’ll try and do some more digging, but time is a luxury I don’t have right now.

This is very much an outstanding issue though - all thoughts on possible explanations appreciated!