ESP8266 constantly loses connection with WiFi router?

Hi,

This is more likely a problem with the ESP8266 itself, but I am posting it here in case something in the ESP8266 driver is involved.

With Mbed-OS-5.15, the ESP8266 mostly works fine. But on careful inspection, I noticed that it constantly loses its connection to the router and then reconnects by itself shortly after.

Some posts claim that the problem goes away once sleep mode is disabled. In Mbed's case, disabling sleep mode at the time of connecting to the ESP8266 doesn't seem to help much, if at all.

// ESP8266.cpp
nsapi_error_t ESP8266::connect(const char *ap, const char *passPhrase)
{
    nsapi_error_t ret = NSAPI_ERROR_OK;

    _smutex.lock();
    set_timeout(ESP8266_CONNECT_TIMEOUT);

    bool res = _parser.send("AT+CWJAP_CUR=\"%s\",\"%s\"", ap, passPhrase);
    if (!res || !_parser.recv("OK\n")) {
        if (_fail) {
            if (_connect_error == 1) {
                ret = NSAPI_ERROR_CONNECTION_TIMEOUT;
            } else if (_connect_error == 2) {
                ret = NSAPI_ERROR_AUTH_FAILURE;
            } else if (_connect_error == 3) {
                ret = NSAPI_ERROR_NO_SSID;
            } else {
                ret = NSAPI_ERROR_NO_CONNECTION;
            }
            _fail = false;
            _connect_error = 0;
        }
    }
    // The following lines were added to ensure sleep mode is turned off.
    // time_t is cast to long so that it matches the %ld format specifier.
    res = _parser.send("AT+SLEEP=0");
    if (!res || !_parser.recv("OK\n")) {
        printf("%ld: ESP8266 set sleep mode failed\r\n", (long)time(nullptr));
    } else {
        printf("%ld: ESP8266 sleep mode set to 0\r\n", (long)time(nullptr));
    }

    set_timeout();
    _smutex.unlock();

    return ret;
}

With sleep mode turned off, I see a 30 mA increase in the current draw of the whole system, which can presumably be attributed to the ESP8266. But the ESP8266 still keeps dropping the connection.

If we use the ESP8266 to connect to a WebSocket server, the connection drops and reconnects, mostly within 10 minutes, sometimes longer (around 40 minutes), depending on the time of day or something else. If we repeatedly use the ESP8266 to make HTTP calls, we sometimes see:

NSAPI_ERROR_CONNECTION_LOST     = -3016,     /*!< connection lost */

Such problems don't happen if we use an Ethernet or cellular connection.

I assume the only solution is to customize the ESP8266 firmware. Right now we use AT-nonos-1.7.4. Is there anything we can do on the Mbed side?

Thanks.

I think the root problem is that the ESP8266/32, being an inexpensive WiFi solution, is just not powerful enough to handle concurrent or consecutive network requests. When it is busy, it appears to be non-responsive, and therefore Mbed thinks it has lost the connection.


Try increasing the serial baud rate to the ESP8266; the default of 115200 is way too slow.

You will need to update the ESP8266's default UART configuration (which it saves in flash) with this command:

uint32_t    baud=230400;
uint32_t    databits=8;
uint32_t    stopbits=1;
uint32_t    parity=0;
uint32_t    flowcontrol=0;
"AT+UART_DEF=%d,%d,%d,%d,%d",baud,databits,stopbits,parity,flowcontrol

Then set

"esp8266.serial-baudrate"           : 230400,

in mbed_app.json
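
For illustration, the AT+UART_DEF command line above could be built as follows. This is only a sketch: `make_uart_def_command` is a hypothetical helper, not part of the Mbed ESP8266 driver, and how the string gets sent to the module (a terminal, or a one-off program) is left open.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of the Mbed ESP8266 driver): formats the
// AT+UART_DEF command from the five UART parameters used in the post.
std::string make_uart_def_command(uint32_t baud, uint32_t databits,
                                  uint32_t stopbits, uint32_t parity,
                                  uint32_t flowcontrol)
{
    assert(baud > 0);
    char buf[96];
    // Cast to unsigned long so the arguments match the %lu specifier
    // regardless of how uint32_t is defined on the target.
    std::snprintf(buf, sizeof(buf), "AT+UART_DEF=%lu,%lu,%lu,%lu,%lu",
                  static_cast<unsigned long>(baud),
                  static_cast<unsigned long>(databits),
                  static_cast<unsigned long>(stopbits),
                  static_cast<unsigned long>(parity),
                  static_cast<unsigned long>(flowcontrol));
    return std::string(buf);
}
```

Calling `make_uart_def_command(230400, 8, 1, 0, 0)` yields the string `AT+UART_DEF=230400,8,1,0,0`. The new rate has to match `esp8266.serial-baudrate`, otherwise the driver can no longer talk to the module.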

You can go up to 460800 on faster platforms like the NUCLEO_F767ZI

Possibly not the best WiFi solution for Mbed nowadays.
Using my own ESP8266 bare-metal driver, it does stay connected at speeds up to 921600, but you lose the Mbed OS functionality.

I will give this a try. Meanwhile, could you elaborate on why a slow baud rate can cause such a problem?

During my tests, loss of connection happens more often when there is high traffic on the local network, such as when kids are streaming videos over Wi-Fi. The other factor that increases the frequency of connection loss is the amount of data sent over the ESP8266. When I limited WS messages sent over the ESP8266 to 1 message every 5 minutes, plus WS ping-pong every 10 seconds, I saw only one connection loss in 10 hours. On the other hand, if I ramp up the message count to 1 message per minute while kids are streaming videos, I see a connection loss every 10 minutes to 1 hour.

Newer chips like the ESP32 seem to do better and lose connection less often, but the problem is still there.

May I ask what the nature of your application is? The ESP8266 only becomes a problem when I use WebSocket and continuously run ping-pong every 10 seconds plus application-layer traffic. Also, what would you recommend as a WiFi solution? The newer ESP32 seems to do better but still has problems.

It depends on what you are doing. As a web server, I couldn't send detailed web pages reliably at 115200; the favicon file wouldn't get to the web client fast enough and very rarely loaded. At 460800 it worked okay but was still a bit patchy. At 921600 it would be acceptable.
It works much better if you can get the data throughput high enough, and it will then work with more than one connection. Try 230400 first.
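
To put rough numbers on the throughput point, here is a back-of-the-envelope sketch. It assumes 8N1 framing (10 bits on the wire per byte, matching databits=8 and stopbits=1 above) and ignores AT command overhead, so real throughput will be lower. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Best-case payload rate of an 8N1 serial link: every byte costs
// 10 bits on the wire (1 start + 8 data + 1 stop).
constexpr uint32_t bytes_per_second(uint32_t baud)
{
    return baud / 10;
}

// Milliseconds needed to move payload_bytes over the link, rounded up.
// AT command framing and any flow-control stalls are NOT included.
inline uint32_t transfer_ms(uint32_t payload_bytes, uint32_t baud)
{
    assert(baud >= 10);
    const uint64_t bps = bytes_per_second(baud);
    const uint64_t ms = (static_cast<uint64_t>(payload_bytes) * 1000 + bps - 1) / bps;
    return static_cast<uint32_t>(ms);
}
```

Under these assumptions 115200 baud tops out at about 11520 bytes/s, while 921600 baud gives about 92160 bytes/s. A 15,000-byte page (an assumed size, not a measurement) would spend roughly 1.3 s on the serial link alone at 115200, versus about 0.16 s at 921600.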

I'm using an Mbed TLS web socket connection to Google Firebase.
I had a similar issue two years ago and couldn't resolve it.
But this was on both Ethernet and WiFi; you say GSM and Ethernet are working okay, so it is probably not a related issue.

So as a workaround I would try to send data and, if that failed, reconnect the TLS socket and send again.
At that time I couldn't find a method to test whether the TLS connection was live.
I didn't actually need a permanent connection; I use Firebase as a central database go-between for several devices.
That worked okay for me, so I left it at that.
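
The send, reconnect, send-again workaround could look roughly like this. It is a generic sketch, not the code I actually ran: `send_with_reconnect` and its two callbacks are hypothetical stand-ins for whatever wraps your TLS socket's send and connect calls.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: try to send; if that fails, reconnect and try again,
// up to max_attempts sends in total. Both callbacks return true on success.
inline bool send_with_reconnect(const std::function<bool()> &send,
                                const std::function<bool()> &reconnect,
                                int max_attempts = 2)
{
    assert(max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (send()) {
            return true;            // data went out, nothing more to do
        }
        // Only reconnect if another send attempt is still allowed.
        if (attempt + 1 < max_attempts && !reconnect()) {
            return false;           // could not re-establish the connection
        }
    }
    return false;                   // every send attempt failed
}
```

The point of the design is that the socket is never probed for liveness; a failed send is the only signal, which sidesteps the lack of a "is the TLS connection alive" check.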

I haven't tried it recently; I might dig it out and see if anything has improved.

However, I've reluctantly moved over to the 16MB flash/4MB PSRAM ESP32 for now. The built-in event-driven WiFi works really well, and we needed a sensible firmware OTA update option, GSM and MQTT. This works out of the box on the ESP32: one tiny device, add a modem, and it does it all.
Dreadful IDE, and not ARM cores.
But I wouldn't say it's perfect. I still have to test the MQTT broker connection on GSM, as the operators will drop a TCP connection after a while if there's no traffic. The modem indicates it's connected to the cell, yet you can't reach it.
This doesn't happen on WiFi; we have had continuous MQTT connections uninterrupted for months.

Thanks very much for the detailed response.

MQTT seems to use a much more permissive keep-alive mechanism, and I think that is the key. Once I loosened the ping-pong requirement in WebSocket to allow 2 errors in a 5-minute period, the WS connection stays up for tens of hours until I disconnect the devices for other reasons.
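
A minimal sketch of that kind of tolerant keep-alive check is below. It is not my production code: the class name is made up, and it assumes "allow 2 errors" means the third missed pong inside the window is what triggers a disconnect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical sliding-window counter for missed WebSocket pongs.
// The connection is only declared lost once more than `allowed`
// failures fall inside the last `window_s` seconds.
class PingFailureWindow {
public:
    PingFailureWindow(uint32_t window_s, std::size_t allowed)
        : _window_s(window_s), _allowed(allowed)
    {
        assert(window_s > 0);
    }

    // Record a missed pong at time now_s (seconds, monotonically increasing).
    // Returns true if the connection should now be treated as lost.
    bool record_failure(uint32_t now_s)
    {
        // Forget failures that have aged out of the window.
        while (!_failures.empty() && now_s - _failures.front() >= _window_s) {
            _failures.pop_front();
        }
        _failures.push_back(now_s);
        return _failures.size() > _allowed;
    }

private:
    uint32_t _window_s;
    std::size_t _allowed;
    std::deque<uint32_t> _failures;
};
```

With `PingFailureWindow window(300, 2);`, two missed pongs within 5 minutes are tolerated and a third one in the same window reports the connection as lost; isolated misses simply age out.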

I also use an Mbed TLS socket to connect to my own device gateway. It is not in production yet, but so far everything seems to work well.

I agree that using the ESP32 as a host looks like a very attractive solution now. If the project I am working on had not been started 3 years ago, I probably would have switched to the ESP32 in a heartbeat. Paying up to $40/piece for an STM32F4 vs $4/piece for an ESP32 that I would have to buy anyway is quite an easy decision.