What are the possible causes that an EventQueue stops working?

Hi,

One of the EventQueues in our Mbed-based application recently stopped working. By design, when the system starts, the main thread starts another thread that dispatches an EventQueue forever.

Something like the following:

#include "mbed.h"

EventQueue radioRoutineEvQue;
Thread radioThread;

void radioRoutine()
{
    // do something mundane
}

int main()
{
    Watchdog &watchdog = Watchdog::get_instance();
    watchdog.start(30000);  // Max allowed: 32.8 seconds
    printf("%ld: System boot\r\n", (long)time(nullptr));  // cast: time_t is not guaranteed to match %d

    radioRoutineEvQue.call_every(10, &radioRoutine);
    radioThread.start(callback(&radioRoutineEvQue, &EventQueue::dispatch_forever)); 

    while(true) {
        // do something
        watchdog.kick();
        ThisThread::sleep_for(1000);
    }
}

In one of the test units, radioRoutine() was found to be no longer called. Because the main thread was still running and kicking the watchdog, this went on for several days, which is very bad.

My questions are: 1) What can cause an EventQueue to stop working? Memory overflow? 2) Is there a way to gracefully handle an EventQueue that has stopped working?
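
On the second question, one common pattern is to make the watchdog kick conditional on evidence that the queue is still dispatching, so that a stalled radioThread leads to a reset instead of going unnoticed for days. Below is a minimal sketch in portable C++. `HeartbeatMonitor` and its method names are hypothetical and not part of Mbed; the assumption is that radioRoutine() would call beat() and the main loop would kick the watchdog only when check() returns true.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not an Mbed API): tracks whether a periodic task
// has made progress since the last time the supervisor looked.
class HeartbeatMonitor {
public:
    // Called from the monitored task, e.g. at the top of radioRoutine().
    void beat() { counter_.fetch_add(1, std::memory_order_relaxed); }

    // Called from the supervising loop. Returns true if at least one beat
    // arrived since the previous call, false if the task appears stalled.
    bool check() {
        const uint32_t now = counter_.load(std::memory_order_relaxed);
        const bool alive = (now != last_seen_);
        last_seen_ = now;
        return alive;
    }

private:
    std::atomic<uint32_t> counter_{0};
    uint32_t last_seen_ = 0;  // touched only by the supervising thread
};

// Intended use, shown as comments because it depends on the Mbed objects
// from the snippet above (assumed wiring, not tested on hardware):
//
//   HeartbeatMonitor radioHeartbeat;
//   void radioRoutine() { radioHeartbeat.beat(); /* ... */ }
//
//   while (true) {
//       ThisThread::sleep_for(1000);
//       if (radioHeartbeat.check()) {
//           watchdog.kick();   // kick only while the queue is dispatching
//       }                      // otherwise let the watchdog reset the system
//   }
```

With a 10 ms event period and a 1 s check interval, a healthy queue produces many beats per check, so a single check without any beat is already a strong hint that dispatching has stopped.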

Thanks in advance.

Maybe a timer overflow? The old Ticker class had a long-standing problem with handling timer overflows. I think the overflow happened after about 3 weeks of runtime for a µs counter.
Is the real code using a static function for radioRoutine(), or is it a member function? I’m not sure whether calling a member function adds some dynamic memory usage.
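
To put rough numbers on the timer-overflow theory: how long a free-running counter lasts before it wraps depends on its width and tick unit. The sketch below is plain C++ arithmetic, not anything taken from the Mbed sources, and the helper names are made up for illustration. It does not establish which counter was involved in the old Ticker problem. It also shows the usual wrap-safe way to measure elapsed ticks with unsigned subtraction.

```cpp
#include <cstdint>

// Hypothetical helper: seconds until a counter with `bits` usable bits
// wraps, given its tick rate.
constexpr double seconds_until_wrap(unsigned bits, double ticks_per_second) {
    return static_cast<double>(1ULL << bits) / ticks_per_second;
}

// Unsigned 32-bit microsecond counter: 2^32 us, roughly 71.6 minutes.
constexpr double us32_minutes = seconds_until_wrap(32, 1e6) / 60.0;

// Signed 32-bit millisecond value (31 usable bits): roughly 24.9 days.
constexpr double ms31_days = seconds_until_wrap(31, 1e3) / 86400.0;

// Unsigned 32-bit millisecond counter: roughly 49.7 days.
constexpr double ms32_days = seconds_until_wrap(32, 1e3) / 86400.0;

// Wrap-safe elapsed time: unsigned subtraction is modulo 2^32, so the
// result stays correct across a single wrap of the counter.
constexpr uint32_t elapsed_ticks(uint32_t now, uint32_t earlier) {
    return static_cast<uint32_t>(now - earlier);
}
```

Comparing raw timestamps with `<` or `>` instead of comparing elapsed differences is the typical way such code breaks at the wrap point.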

It calls a member function. It is hard to imagine how EventQueue itself just stops working. Is it possible that radioThread somehow crashed or stopped running? Or that it got stuck waiting on something (a Mutex, EventFlag, etc.)?

This seems to be a very rare situation; so far I have seen it happen only once.

We’ve had this issue before. It is very hard to diagnose, and using cortex debug often makes the issue go away.

In our case it was an interaction between ISR / Mutex / EventQueue.

Memory overflow can also be the issue – what is happening in radioRoutine?

The routine itself just transforms and moves byte arrays around. I printed out the memory usage of each thread and found that radioThread typically uses 3896B, which is very close to the reserved 4096B. My speculation is that at some point the system was restarted by the watchdog and then didn’t have enough RAM in that thread to start radioRoutineEvQue.

Is this using a mutex? Is it waiting on some operation? Using SPI, I2C or something else?

When the watchdog restarts the system, memory usage goes back to zero and the system starts from scratch.

That being said, context switching and stuff can clearly overflow the reserved space when you’re so close. Try using more memory to see if the issue persists.

There is a mutex, but it is very unlikely that was the cause.

This is a weird part that I don’t fully understand. The memory usage of that thread is usually 3096B when the binary is compiled with ARMC6. If the binary is compiled with GCC, the memory usage is over 4KB outright. There is some other stuff going on in that thread, and depending on factors I don’t fully understand, its memory usage may be slightly higher or lower. Because it is close to the reserved 4KB, in certain scenarios it will fail to start the EventQueue. I increased the reserved memory to 8KB; hopefully that will prevent it from happening again.
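
The margin discussed here can be made explicit with a small check. The sketch below is portable C++ with hypothetical names. On Mbed the inputs would presumably come from the thread's own statistics, for example `Thread::max_stack()` and `Thread::stack_size()`, and the larger stack would be requested through the stack-size argument of the `Thread` constructor. The 20% minimum margin is an arbitrary assumption for illustration, not an Mbed recommendation.

```cpp
#include <cstdint>

// Hypothetical report type for a thread's stack headroom.
struct StackReport {
    uint32_t free_bytes;    // reserved minus peak usage
    uint32_t used_percent;  // peak usage as a percentage, rounded down
    bool     ok;            // true if free space meets the requested margin
};

// Hypothetical helper: compares peak stack usage against the reserved size
// and requires at least `min_free_percent` of the stack to remain free.
inline StackReport check_stack(uint32_t peak_used, uint32_t reserved,
                               uint32_t min_free_percent = 20) {
    if (reserved == 0 || peak_used >= reserved) {
        return {0, 100, false};  // already at or past the limit
    }
    const uint32_t free_bytes = reserved - peak_used;
    const uint32_t used_percent = static_cast<uint32_t>(
        (static_cast<uint64_t>(peak_used) * 100) / reserved);
    const bool ok = static_cast<uint64_t>(free_bytes) * 100 >=
                    static_cast<uint64_t>(reserved) * min_free_percent;
    return {free_bytes, used_percent, ok};
}
```

With the figures from this thread, check_stack(3896, 4096) reports 200 bytes free at 95% usage and fails the margin, while check_stack(3896, 8192) passes at 47% usage.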

There may be a time overflow, or there may be another event that causes it to enter the waiting queue.