ESP32/WiFi not working under certain conditions with Mbed-OS-6

Hi,

I have been trying to get both Ethernet and ESP32/WiFi to work on Mbed-OS 6. When the system boots, it tries to connect to the internet through Ethernet; if that doesn’t work, it tries to connect via WiFi/ESP32. This scheme has been working just fine on Mbed-OS-5, and a simple test program also works fine on Mbed-OS-6. The problem is that the scheme no longer works in our production code after migrating from Mbed-OS-5 to 6.
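For reference, the fallback order described above can be sketched in plain C++ without any Mbed types. Candidate, FallbackResult and connect_with_fallback are hypothetical names used only for illustration; in the real application each connect callable would presumably wrap eth.connect() or wifi.connect().

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one network interface: a label plus a blocking
// connect call that returns 0 on success or a negative error code.
struct Candidate {
    std::string name;
    std::function<int()> connect;
};

// Outcome of the fallback: which interface came up (empty if none) and the
// last error code seen (-1 here means no candidate was tried at all).
struct FallbackResult {
    std::string connected;
    int last_error;
};

// Try each candidate in order and stop at the first one that connects.
inline FallbackResult connect_with_fallback(const std::vector<Candidate> &candidates)
{
    FallbackResult result{"", -1};
    for (const Candidate &c : candidates) {
        assert(c.connect && "candidate has no connect function");
        result.last_error = c.connect();
        if (result.last_error == 0) {
            result.connected = c.name;
            return result;
        }
    }
    return result;
}
```

Keeping the order in a list makes it easy to log which interface failed and with which code before moving on to the next one.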

If I unplug the Ethernet cable when the system boots, the Ethernet connection attempt times out after about 60 seconds, and the system then tries to connect via the ESP32. At this point our production code gets a -3010 error code (NSAPI_ERROR_DHCP_FAILURE). This seems to be caused by a failure to set up DHCP on the ESP32:

int ESP32Interface::connect()
{
    if (_ap_ssid[0] == 0) {
        return NSAPI_ERROR_NO_SSID;
    }

    if (!_esp->dhcp(_dhcp, 1)) {
        return NSAPI_ERROR_DHCP_FAILURE;
    }

    if (!_dhcp) {
        if (!_esp->set_network(_ip_address.get_ip_address(), _netmask.get_ip_address(), _gateway.get_ip_address())) {
            return NSAPI_ERROR_DEVICE_ERROR;
        }
    }

    set_connection_status(NSAPI_STATUS_CONNECTING);
    if (!_esp->connect(_ap_ssid, _ap_pass)) {
        set_connection_status(NSAPI_STATUS_DISCONNECTED);
        return NSAPI_ERROR_NO_CONNECTION;
    }

    return NSAPI_ERROR_OK;
}

Occasionally the system gets past this step, but it then throws a -3012 error code:

NSAPI_ERROR_DEVICE_ERROR        = -3012,     /*!< failure interfacing with the network processor */
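When logging these failures it can help to print the symbolic name next to the number. Below is a minimal sketch covering only the codes that the connect() function quoted above can return. nsapi_error_name is a hypothetical helper, and the numeric values are the ones I believe nsapi_types.h uses, so they are worth double-checking against the header in your own Mbed OS tree.

```cpp
#include <string>

// Map a subset of nsapi_error values to their names. Only the codes that
// ESP32Interface::connect() returns in the snippet above are listed; the
// numeric values are assumed to match nsapi_types.h.
inline std::string nsapi_error_name(int code)
{
    switch (code) {
        case 0:     return "NSAPI_ERROR_OK";
        case -3004: return "NSAPI_ERROR_NO_CONNECTION";
        case -3008: return "NSAPI_ERROR_NO_SSID";
        case -3010: return "NSAPI_ERROR_DHCP_FAILURE";
        case -3012: return "NSAPI_ERROR_DEVICE_ERROR";
        default:    return "unknown (" + std::to_string(code) + ")";
    }
}
```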

If I turn on debug mode when instantiating the ESP32 interface like this:

ESP32Interface wifi(WIFI_RADIO_EN, WIFI_RADIO_BOOT, WIFI_RADIO_TX, WIFI_RADIO_RX, true, /*WIFI_RADIO_RTS*/ NC, /*WIFI_RADIO_CTS*/ NC, 115200);

I can usually see only two characters, AT, in the console, and then the system hangs.

This seems to point to a failure of UART communication between the MCU and the ESP32.

Can somebody offer some pointers on where to go from here?

Thanks,
ZL

I’d suggest attaching a debugger, getting it to hang, and then grabbing a stack trace from the ESP32 driver thread (you will probably have to use pyOCD, as it’s the only debugger that understands Mbed threads).

Also, is this with Mbed CE or ARM Mbed OS 6?

Mbed-OS-v6.17. The crash was caused by something else: I had reused the same UART port for both the application and STDIO, which only works some of the time.

The problem with the ESP32 seems to be related to DNS:

AT> AT+CIPSTART=1,"UDP","8.8.8.8",53
AT? OK%n
AT< 
AT< AT+CIPSTART=1,"UDP","8.8.8.8",53
AT< ERROR
AT(Timeout)
AT> AT+CIPSTART=1,"UDP","209.244.0.3",53
AT? OK%n
AT< AT+CIPSTART=1,"UDP","209.244.0.3",53
AT< ERROR
AT(Timeout)
AT> AT+CIPSTART=1,"UDP","84.200.69.80",53
AT? OK%n
AT< AT+CIPSTART=1,"UDP","84.200.69.80",53
AT< ERROR
AT(Timeout)

If the following line:

int ret = wifi.connect((char*)"AP-SSID", (char*)"PASSWORD", NSAPI_SECURITY_WPA_WPA2);

runs after the following line that times out after ~60s

int ret = eth.connect();

then the ESP32 times out when trying to reach the DNS servers. If I just replace the failing eth.connect() line with a simple delay:

ThisThread::sleep_for(60s);

the ESP32 works just fine.

This seems rather bizarre and I don’t know what to make of it.

Is it possible to grab a Wireshark capture of the traffic produced on the wifi network?

It turns out another thread tries to access the ESP32 instance to get the RSSI while the wifi.connect() sequence is still running. This causes a deadlock, and both sequences time out. Somehow this bug doesn’t show up when the same code base is compiled with Mbed-OS v5.15. I guess there is either a slight difference in how soon the Ethernet connect times out, or a difference in the locking mechanism between UARTSerial and UnbufferedSerial.

AT? ready%n
AT< �
AT= ready
AT> AT+RST
AT? OK%n
AT< 
AT< AT+RST
AT= OK
AT? ready%n
AT< 
AT= ready
AT> AT+UART_CUR=115200,8,1,0,0
AT? OK%n
AT< 
AT< AT+UART_CUR=115200,8,1,0,0
AT= OK
AT> AT+GMR
AT? AT version:%*hhx.%*hhx.%*hhx.%*hhx%n
AT< 
AT< AT+GMR
AT= AT version:3.4.0.0
AT? OK%n
AT< (c31b833 - ESP32S2 - Jun  7 2024 03:46:38)
AT< SDK version:v5.0.6-dirty
AT< compile time(70ff5889):Jun  7 2024 04:50:14
AT< Bin version:v3.4.0.0(MINI)
AT= OK
AT> AT+CWMODE=1
AT? OK%n
AT< 
AT< AT+CWMODE=1
AT= OK
AT> AT+CIPMUX=1
AT? OK%n
AT< 
AT< AT+CIPMUX=1
AT= OK
AT> AT+CWAUTOCONN=0
AT? OK%n
AT< 
AT< AT+CWAUTOCONN=0
AT= OK
AT> AT+CWQAP
AT? OK%n
AT< 
AT< AT+CWQAP
AT= OK
// Another thread tries to get WiFi RSSI here
AT> AT+CWJAP?
AT? +CWJAP:"%*32[^"]","%*17[^"]"%n
AT< 
AT< AT+CWJAP?
AT< OK
AT(Timeout)
AT> AT+CWDHCP=1,1
AT? OK%n
AT< �
AT< ready
AT(Timeout)
AT> AT+CWDHCP=1,1
AT? OK%n
AT< AT+CWDHCP=1,1
AT= OK
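
For anyone reading traces like the one above, it may help to sort the lines by prefix. Judging from the log itself, AT> marks a command that was sent, AT? the pattern the parser is waiting for, AT< a received line that did not match, AT= a match, and AT(Timeout) an expired wait. A minimal sketch in plain C++ follows; AtLineKind, classify_at_line and count_timeouts are hypothetical helper names, not part of the ESP32 driver.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Categories inferred from the debug trace prefixes (an assumption based on
// the log, not taken from driver documentation).
enum class AtLineKind { Sent, Expected, Received, Matched, Timeout, Other };

// Classify one line of the parser's debug output by its prefix.
inline AtLineKind classify_at_line(const std::string &line)
{
    auto starts_with = [&line](const char *prefix) {
        return line.rfind(prefix, 0) == 0;
    };
    if (starts_with("AT(Timeout)")) return AtLineKind::Timeout;
    if (starts_with("AT>")) return AtLineKind::Sent;
    if (starts_with("AT?")) return AtLineKind::Expected;
    if (starts_with("AT<")) return AtLineKind::Received;
    if (starts_with("AT=")) return AtLineKind::Matched;
    return AtLineKind::Other;
}

// Count how many waits expired in a captured trace.
inline std::size_t count_timeouts(const std::vector<std::string> &trace)
{
    std::size_t n = 0;
    for (const std::string &line : trace) {
        if (classify_at_line(line) == AtLineKind::Timeout) {
            ++n;
        }
    }
    return n;
}
```

In the trace above, the first timeout follows AT+CWJAP?, which is where the second thread's query cuts into the connect sequence.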

In theory this shouldn’t happen because there is a mutex in place:

rtos::Mutex _smutex; // Protect serial port access

But somehow a different thread was able to send out a bunch of AT commands while the mutex was held:

bool ESP32::dhcp(bool enabled, int mode)
{
    //only 3 valid modes
    if (mode < 0 || mode > 2) {
        return false;
    }

    _smutex.lock();
    _startup_wifi(); // <-- AT command to get RSSI is sent out while this sequence is still running.
    bool done = _parser.send("AT+CWDHCP=%d,%d", enabled?1:0, mode)
       && _parser.recv("OK");
    _smutex.unlock();

    return done;
}
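
One possible application-level workaround, independent of the driver's own mutex, is to make the RSSI poll non-blocking: if a multi-command sequence is in progress, the poll is skipped instead of being sent to the modem. The sketch below uses an atomic busy flag in plain C++ rather than rtos::Mutex, and GuardedModem with all its members is a hypothetical stand-in, not the ESP32 driver API. The `between` hook exists only to simulate a second caller arriving in the middle of the sequence.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the modem: records commands in the order sent.
class GuardedModem {
public:
    // Run a multi-command sequence while marking the port busy for its whole
    // duration. Returns false if another sequence already owns the port.
    bool run_sequence(const std::vector<std::string> &commands,
                      std::function<void()> between = {})
    {
        bool expected = false;
        if (!_busy.compare_exchange_strong(expected, true)) {
            return false;
        }
        for (const std::string &cmd : commands) {
            _sent.push_back(cmd);
            if (between) {
                between(); // simulated interruption point
            }
        }
        _busy.store(false);
        return true;
    }

    // Non-blocking status query: returns false and sends nothing when the
    // port is busy, so it cannot interleave with a running sequence.
    bool try_get_rssi(int &rssi_out)
    {
        bool expected = false;
        if (!_busy.compare_exchange_strong(expected, true)) {
            return false;
        }
        _sent.push_back("AT+CWJAP?");
        rssi_out = -60; // placeholder value, no real radio involved
        _busy.store(false);
        return true;
    }

    std::size_t sent_count() const { return _sent.size(); }

private:
    std::atomic<bool> _busy{false};
    std::vector<std::string> _sent;
};
```

With this shape, the monitoring thread simply retries on its next tick, so the connect sequence never sees a foreign command in the middle of its exchange.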