LWIP application with two TCPSocket servers in separate threads

I have an application I’ve written that has two network servers running, each in a Thread.

Thread 1 - HTTP Server bound to port 80
Thread 2 - TCP Server bound to port 9100 (custom protocol)
Each thread initializes its own TCPSocket to act as a server and listens for connection attempts.

After I start both threads, so that both server TCPSockets have been initialized and are accepting connections, I can only interact with one server at a time. I can connect to the HTTP server on Thread 1 and get back the correct HTML, and I can also connect to the TCP server on Thread 2 and do whatever is needed.

However, if the TCP server on Thread 2 is handling a connection (sending and receiving data) and I attempt to access the HTTP server on Thread 1, I encounter a HardFault and the board resets.

The two server threads aren’t sharing any resources with the exception of the EthernetInterface.
If I pass the same EthernetInterface to both threads the threads are initialized correctly.

However, if I instantiate two EthernetInterface objects like this:

EthernetInterface  httpnet = EthernetInterface();
EthernetInterface  ecnet = EthernetInterface();

and then instantiate my two Server threads like this:

    NetServer server( &ecnet, 9100, jobQueue);
    HTTPServer webServer( &httpnet, 80 );

    server.start();
    webServer.start();
    processor.start();
    watchdog.start();

The HTTPServer thread fails to initialize the TCPSocket.
But if I pass the same EthernetInterface object to the constructors of both server threads, they are both initialized and the servers can accept connections, just not simultaneously.

Both threads do the following to initialize and get the server socket ready to accept connections.

// Initialize network socket, bind and start listening for connections
bool NetServer::Init()
{
    bool retCode = false;
    SocketAddress server_address;
    nsapi_error_t result;

    m_interface->attach(&util::net::NetworkStatus_callback);

    result = NSAPI_ERROR_OK;
    if( NSAPI_STATUS_DISCONNECTED == m_interface->get_connection_status() )
    {
        result = m_interface->connect();
    }

    if( NSAPI_ERROR_OK == result )
    {
        if( NSAPI_ERROR_OK == m_interface->get_ip_address(&server_address))
        {
            m_ipAddress = std::string(server_address.get_ip_address());
            printf("\nServer IP address: %s\n", m_ipAddress.c_str());  

            server_address.set_port(m_port);
            if( NSAPI_ERROR_OK == m_serverSock.open(m_interface))
            {
                if( NSAPI_ERROR_OK == m_serverSock.bind( server_address ))
                {
                    if( NSAPI_ERROR_OK == m_serverSock.listen(5))
                    {
                        retCode = true;
                    }
                }
            }
        }
    }
    return (retCode);
}

Note that m_interface is the EthernetInterface passed in the constructor.

Any idea what I might be doing wrong?
I could share more of the code if needed.

My hardware is the NXP RT1050 EVKB with 100 Mbit Ethernet.

Nobody?
Is this board not capable of supporting multiple simultaneous server sockets listening on different ports and serving both ports concurrently?

Have you tried increasing the number of sockets and TCP sockets? Or using a smaller backlog number in listen() (it should also work with 0 or 1)? A web browser tries to establish multiple connections, and I’m not sure whether a higher backlog number also requires a higher number of sockets.
Your first attempt with one instance of EthernetInterface was ok and should work. I have used this on a STM32 board with different servers in different threads and it works.

Settings for the max sockets in mbed_app.json:

{
    "target_overrides": {
        "*": {
            "lwip.socket-max": 10,
            "lwip.tcp-socket-max": 10
        }
    }
} 

Hi @JojoS ,

Thank you for the reply.
Here is my mbed_app.json file. Do you see any issues?

{
    "macros": [
        "MBED_HEAP_STATS_ENABLED=1"
    ],
    "target_overrides": {
        "*": {
            "drivers.uart-serial-rxbuf-size": 512,
            "drivers.uart-serial-txbuf-size": 512,
            "platform.stdio-baud-rate": 115200,
            "platform.stdio-buffered-serial": 1,
            "target.printf_lib": "std",
            "platform.cpu-stats-enabled": false,

            "lwip.tcpip-thread-stacksize": 1328,
            "lwip.default-thread-stacksize": 640,
            "lwip.memp-num-tcp-seg": 32,
            "lwip.tcp-mss": 1440,
            "lwip.tcp-snd-buf": "(8 * TCP_MSS)",
            "lwip.tcp-wnd": "(TCP_MSS * 8)",
            "lwip.pbuf-pool-size": 16,
            "lwip.mem-size": 51200,
            "lwip.tcp-server-max": 6,
            "lwip.tcp-socket-max": 6,
            "target.macros_remove" : ["HYPERFLASH_BOOT"]
        }
    }
}

I just noticed I don’t have socket max set. I will try that.

@JojoS

Even with those changes I’m still having issues. I guess I will have to do a deep dive into what Mbed OS is doing with its LWIP wrapper classes.

I can’t give you a definitive answer, but I can confirm that LwIP is working as expected. Maybe you should reuse the EthernetInterface instance instead of creating a new one for each server (I really don’t know the consequences of creating two interfaces in Mbed OS). For your reference, this example works using K64F Ethernet:

#include "mbed.h"
#include "mbed_trace.h"
#define TRACE_GROUP "main"

NetworkInterface *nw = NetworkInterface::get_default_instance();
Thread thread1;
Thread thread2;

static rtos::Mutex trace_mutex;


static void trace_wait() 
{
    trace_mutex.lock();
}

static void trace_release() 
{
    trace_mutex.unlock();
}

static void trace_open()
{
    mbed_trace_init();
    mbed_trace_mutex_wait_function_set(trace_wait);
    mbed_trace_mutex_release_function_set(trace_release);
}

static void trace_close()
{
    mbed_trace_free();
}

void start_server(uint32_t port) 
{
    nsapi_error_t error;
    SocketAddress address;
    TCPSocket sock;
    Socket *client;

    if (NSAPI_ERROR_OK == nw->get_ip_address(&address)) {
        tr_info("%s [%d] - addr: %s:%d", __FUNCTION__, (int)rtos::ThisThread::get_id(), address.get_ip_address(), port);
        address.set_port(port);

        if (NSAPI_ERROR_OK == sock.open(nw)) {
            if (NSAPI_ERROR_OK == sock.bind(address)) {
                if (NSAPI_ERROR_OK == sock.listen(5)) {
                    while (true) {
                        tr_info("%s [%d] - accepting", __FUNCTION__, (int)rtos::ThisThread::get_id());
                        client = sock.accept(&error);
                        if (error == NSAPI_ERROR_WOULD_BLOCK) {
                            tr_info("%s [%d] - Timeout", __FUNCTION__, (int)rtos::ThisThread::get_id());
                            continue;
                        } else if (error != NSAPI_ERROR_OK) {
                            tr_err("%s [%d] - error %d", __FUNCTION__, (int)rtos::ThisThread::get_id(), error);
                            continue;
                        } else if (error == NSAPI_ERROR_OK) {
                            tr_info("%s [%d] - OK!", __FUNCTION__, (int)rtos::ThisThread::get_id());
                            client->close();
                            break;
                        }
                    }
                }
            }
        }
    }
}

void start_server1() {
    start_server(80);
}

void start_server2() {
    start_server(9000);
}


int main(void)
{
    printf("\033[2J"); // Clean screen
    trace_open();

    tr_info("%s - Connecting Network", __FUNCTION__);
    if (nw->connect() != 0) {
        tr_err("%s - Network not connected", __FUNCTION__);
        return 1;
    }
    tr_info("%s - Network is connected", __FUNCTION__);
    thread1.start(callback(start_server1));
    thread2.start(callback(start_server2));
    thread1.join();
    thread2.join();
    return 0;
}

Output:

[INFO][main]: main - Connecting Network
[INFO][main]: main - Network is connected
[INFO][main]: start_server [536874204] - addr: xxx.xxx.xxx.xxx:80
[INFO][main]: start_server [536874392] - addr: xxx.xxx.xxx.xxx:9000
[INFO][main]: start_server [536874204] - accepting
[INFO][main]: start_server [536874392] - accepting
[INFO][main]: start_server [536874204] - OK!
[INFO][main]: start_server [536874392] - OK!

@davidAlonso ,

Thank you for the reply. I can get both servers up and running with the same default NetworkInterface instance. Both servers can even handle requests.

However, the problem is that as soon as both servers attempt to handle requests at the same time a hardfault occurs.

So I can hit the HTTP server all day long. And I can hit the other server all day long. But hit them both simultaneously and boom… HardFault.

Probably time to figure out how to decode the HardFault register information.
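As a starting point for that decoding, here is a minimal sketch that turns a Cortex-M Configurable Fault Status Register (CFSR) value into the names of the flags that are set. It assumes the value has already been obtained, for example by reading SCB->CFSR (address 0xE000ED28) in a debugger. The names `FaultBit` and `decode_cfsr` are hypothetical helpers for illustration; the bit positions follow the ARMv7-M layout used by the Cortex-M7 on this board.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One CFSR flag: its bit mask and its architectural name.
struct FaultBit {
    uint32_t mask;
    const char *name;
};

// Returns the names of all fault flags set in a CFSR value, lowest bit first.
// Hypothetical helper for illustration, not part of Mbed OS.
std::vector<std::string> decode_cfsr(uint32_t cfsr)
{
    static const FaultBit kBits[] = {
        // MemManage fault status (bits 0-7)
        {1u << 0,  "IACCVIOL"},    {1u << 1,  "DACCVIOL"},
        {1u << 3,  "MUNSTKERR"},   {1u << 4,  "MSTKERR"},
        {1u << 5,  "MLSPERR"},     {1u << 7,  "MMARVALID"},
        // BusFault status (bits 8-15)
        {1u << 8,  "IBUSERR"},     {1u << 9,  "PRECISERR"},
        {1u << 10, "IMPRECISERR"}, {1u << 11, "UNSTKERR"},
        {1u << 12, "STKERR"},      {1u << 13, "LSPERR"},
        {1u << 15, "BFARVALID"},
        // UsageFault status (bits 16-31)
        {1u << 16, "UNDEFINSTR"},  {1u << 17, "INVSTATE"},
        {1u << 18, "INVPC"},       {1u << 19, "NOCP"},
        {1u << 24, "UNALIGNED"},   {1u << 25, "DIVBYZERO"},
    };

    std::vector<std::string> flags;
    for (const FaultBit &bit : kBits) {
        if (cfsr & bit.mask) {
            flags.push_back(bit.name);
        }
    }
    return flags;
}
```

For example, a value of 0x00008200 decodes to PRECISERR and BFARVALID, which would mean a precise bus fault whose faulting address is available in BFAR. A write through a null or wild pointer is one common way to end up there.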

I am working on an application that runs in a multithreaded environment with a network connection, and I haven’t had any HardFault problems. Maybe your problem is interacting with an unexpectedly closed socket or something similar.

I suggest following Mbed OS Error Handling and enabling all the available reporting information from Mbed, such as thread stats, file name and line number, and so on.


@davidAlonso ,

Thank you for the link. I’ll give it a look.

I will also try your example code and see if I can get both servers to work simultaneously. If I can then I will know that I’m just doing something wrong in my code.

I have two “server” classes that incorporate their own Threads… so perhaps something there is wrong. It’s just weird that individually they work, but simultaneously they don’t. And they don’t share any resources so I don’t expect any of the usual thread resource sharing issues to apply.

I will go back to basics and see how your code works on my platform.

Thank you again. I appreciate you taking the time to help out.

Just for reference, this is how each server thread is structured.

#include <cstddef>
#include <cstdint>
#include <stdio.h>

#include "TACT_util.h"
#include "TACT_Events.h"
#include "TACT_NetServer.h"


NetServer::NetServer( NetworkInterface *interface, int port, Queue<Job, QUEUE_SIZE> &queue) : m_interface(interface),  m_port( port ), m_jobQueue(queue) {}

NetServer::~NetServer()
{
    m_serverSock.close();
    m_interface->disconnect();
}

void NetServer::start()
{
    m_running = false;
    if( true == this->Init() )
    {
        m_thread.start(callback(this, &NetServer::Server));
    }
}

std::string NetServer::ipAddress()
{
    return( m_ipAddress );
}

bool NetServer::isRunning()
{
    return( m_running );
}

// Initialize network socket, bind and start listening for connections
bool NetServer::Init()
{
    bool retCode = false;
    SocketAddress server_address;
    nsapi_error_t result;

    m_interface->attach(&util::net::NetworkStatus_callback);

    result = NSAPI_ERROR_OK;
    if( NSAPI_STATUS_DISCONNECTED == m_interface->get_connection_status() )
    {
        result = m_interface->connect();
    }

    if( NSAPI_ERROR_OK == result )
    {
        if( NSAPI_ERROR_OK == m_interface->get_ip_address(&server_address))
        {
            m_ipAddress = std::string(server_address.get_ip_address());
            printf("\nServer IP address: %s\n", m_ipAddress.c_str());  

            server_address.set_port(m_port);
            if( NSAPI_ERROR_OK == m_serverSock.open(m_interface))
            {
                if( NSAPI_ERROR_OK == m_serverSock.bind( server_address ))
                {
                    if( NSAPI_ERROR_OK == m_serverSock.listen(5))
                    {
                        retCode = true;
                    }
                }
            }
        }
    }
    return (retCode);
}

void NetServer::Server()
{
    int jobID = 0;
    TCPSocket *client_sock;
    SocketAddress clientAddress;
    nsapi_error_t error;
    int bytes;
    int size = 1550;
    Job *job = nullptr;
    int connections = 0;

    m_running = true;

    while( true )
    {
        util::PrintHeapInfo();
        printf("EC Server listening for connections....\n");
        printf("====================================\n");

        util::SetGreenLed(OFF);
        client_sock = m_serverSock.accept( &error );
        if( NSAPI_ERROR_OK == error )
        {
            // client connected
            client_sock->getpeername(&clientAddress);
            printf("\n\n[%d]Accepted %s:%d\n", connections++,clientAddress.get_ip_address(), clientAddress.get_port()); 
            do
            {
                util::SetGreenLed(ON);
                uint8_t *printData = (uint8_t *)malloc(size);
                memset(printData, 0x00, size);

                bytes = client_sock->recv(static_cast<void *>(printData), size);
                util::SetGreenLed(ON);
                if( bytes <= 0 )
                {
                    free(printData);
                    if( bytes < 0 )
                    {
                        printf("\nSocket Read Error: %s\n", util::NSAPISErrorMap[bytes].c_str());
                    }
                }
                else
                {
                    job = new Job( client_sock, (char *)printData, bytes, jobID++ );
                    if( job )
                    {
                        do
                        {
                            m_watchdog.kick();
                        }while( !m_jobQueue.try_put_for(10ms,job) );
                        //printf("NetServer queue job id %d\n", jobID++);
                    }
                }
            }while (bytes > 0);
            util::SetGreenLed(OFF);
            printf("NetServer() read everything\n\n");
            client_sock->close();
        }
    }
}

The Job class just encapsulates an element of a Queue. Another thread pulls jobs off the Queue and does some processing. The thread that pulls from the Queue also frees the malloc’ed memory. This way I don’t tightly couple the network server with the processing of the incoming requests. This NetServer class on its own can run for days without issue.
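The ownership handoff described above (the producer allocates, the consumer frees) can be sketched on a host machine as follows. This is not the Mbed `Queue` API: `Job` and `JobQueue` here are hypothetical stand-ins built on `std::unique_ptr`, `std::vector` and `std::mutex`, swapped in for the raw malloc/free pair so that a job which fails to enqueue is still owned, and eventually released, by the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the thread's Job class: it owns its payload.
struct Job {
    int id;
    std::vector<uint8_t> payload;
};

// Minimal bounded queue; a stand-in for illustration, not Mbed's Queue.
class JobQueue {
public:
    explicit JobQueue(std::size_t capacity) : m_capacity(capacity) {}

    // Takes ownership only on success. If the queue is full, the caller's
    // pointer is left untouched, so nothing leaks and nothing is freed twice.
    bool try_put(std::unique_ptr<Job> &job)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_items.size() >= m_capacity) {
            return false;
        }
        m_items.push_back(std::move(job));
        return true;
    }

    // Hands ownership to the consumer, or returns nullptr if the queue is empty.
    std::unique_ptr<Job> try_get()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_items.empty()) {
            return nullptr;
        }
        std::unique_ptr<Job> job = std::move(m_items.front());
        m_items.pop_front();
        return job;
    }

private:
    std::size_t m_capacity;
    std::mutex m_mutex;
    std::deque<std::unique_ptr<Job>> m_items;
};
```

With this shape the consumer never calls free explicitly: the payload is released when the `std::unique_ptr<Job>` returned by `try_get()` goes out of scope. The queue capacity also puts an upper bound on how much payload memory can be in flight at once.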

The HTTP server class is the same except that after accept() the processing is different and doesn’t use a Queue. It just parses out the HTTP headers and handles each request as needed.

I would recommend removing or tracing the possible HardFault points. I can see some possible failure points, such as:

In the receive loop, the code does not check whether malloc() actually returned memory before printData is passed to memset() and recv().

I don’t know the definition of NSAPISErrorMap, but if it is a string array, then bytes is always negative on that path and you are accessing a negative index, e.g. util::NSAPISErrorMap[-3012].

Anyway, you can use mbed-trace to trace your error by adding much more logging or use PyOCD + GDB to debug.


Yes, you are correct I should be checking the result of malloc. Good call. I guess it’s possible that when both servers are handling requests malloc might start to fail. My bad for not checking the return.
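A minimal sketch of what that check could look like, written so it can be exercised on a host machine. The allocator and the receive call are passed in as callables: in the real server they would be malloc() and the socket's recv(), here they are stand-ins. `RxResult`, `AllocFn`, `RecvFn` and `receive_chunk` are hypothetical names, and the sketch assumes the allocator returns memory that can be released with free().

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <functional>

// Outcome of one receive attempt.
enum class RxResult { Data, Closed, Error, NoMemory };

// Stand-ins for malloc() and the socket's recv() (hypothetical signatures).
using AllocFn = std::function<void *(std::size_t)>;
using RecvFn  = std::function<int(void *, std::size_t)>;

// Allocates a buffer, receives into it, and hands the buffer to the caller
// only when data actually arrived. On every other path the buffer is freed
// here and *out stays nullptr.
RxResult receive_chunk(const AllocFn &alloc, const RecvFn &recv,
                       std::size_t size, uint8_t **out, int *out_len)
{
    *out = nullptr;
    *out_len = 0;

    void *mem = alloc(size);
    if (mem == nullptr) {
        // Allocation failed: report it instead of writing through a null pointer.
        return RxResult::NoMemory;
    }
    std::memset(mem, 0, size);

    int n = recv(mem, size);
    if (n <= 0) {
        std::free(mem);
        return (n == 0) ? RxResult::Closed : RxResult::Error;
    }

    *out = static_cast<uint8_t *>(mem);
    *out_len = n;
    return RxResult::Data;
}
```

On `NoMemory` the server loop could close the client socket, or wait and retry, rather than fault. Which of those is right depends on the protocol, so that choice is left open here.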

The util::NSAPISErrorMap[bytes].c_str() isn’t a problem. It’s just a map as shown below.
bytes represents a possible error code. The std::map just allows me to easily print text associated with an error code.

std::map<int, std::string> NSAPISErrorMap =
    {
        {NSAPI_ERROR_OK,"NSAPI_ERROR_OK"},
        {NSAPI_ERROR_WOULD_BLOCK,"NSAPI_ERROR_WOULD_BLOCK"},
        {NSAPI_ERROR_UNSUPPORTED,"NSAPI_ERROR_UNSUPPORTED"},
        {NSAPI_ERROR_PARAMETER,"NSAPI_ERROR_PARAMETER"},
        {NSAPI_ERROR_NO_CONNECTION,"NSAPI_ERROR_NO_CONNECTION"},
        {NSAPI_ERROR_NO_SOCKET,"NSAPI_ERROR_NO_SOCKET"},
        {NSAPI_ERROR_NO_ADDRESS,"NSAPI_ERROR_NO_ADDRESS"},
        {NSAPI_ERROR_NO_MEMORY,"NSAPI_ERROR_NO_MEMORY"},
        {NSAPI_ERROR_NO_SSID,"NSAPI_ERROR_NO_SSID"},
        {NSAPI_ERROR_DNS_FAILURE,"NSAPI_ERROR_DNS_FAILURE"},
        {NSAPI_ERROR_DHCP_FAILURE,"NSAPI_ERROR_DHCP_FAILURE"},
        {NSAPI_ERROR_AUTH_FAILURE,"NSAPI_ERROR_AUTH_FAILURE"},
        {NSAPI_ERROR_DEVICE_ERROR,"NSAPI_ERROR_DEVICE_ERROR"},
        {NSAPI_ERROR_IN_PROGRESS,"NSAPI_ERROR_IN_PROGRESS"},
        {NSAPI_ERROR_ALREADY,"NSAPI_ERROR_ALREADY"},
        {NSAPI_ERROR_IS_CONNECTED,"NSAPI_ERROR_IS_CONNECTED"},
        {NSAPI_ERROR_CONNECTION_LOST,"NSAPI_ERROR_CONNECTION_LOST"},
        {NSAPI_ERROR_CONNECTION_TIMEOUT,"NSAPI_ERROR_CONNECTION_TIMEOUT"},
        {NSAPI_ERROR_ADDRESS_IN_USE,"NSAPI_ERROR_ADDRESS_IN_USE"},
        {NSAPI_ERROR_TIMEOUT,"NSAPI_ERROR_TIMEOUT"},
        {NSAPI_ERROR_BUSY,"NSAPI_ERROR_BUSY"}
    };
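One side note on looking codes up in a map like this: `operator[]` on a `std::map` inserts a default (empty) entry when the key is missing, which means a heap allocation on the error path. A lookup through `find()` avoids that and can report unknown codes explicitly. The sketch below is a hedged illustration: `error_name` and `kErrorNames` are hypothetical, and the two numeric codes are illustrative stand-ins rather than a copy of the real table.

```cpp
#include <map>
#include <string>

// Small stand-in table; the real map would list every NSAPI error code.
const std::map<int, std::string> kErrorNames = {
    {0,     "NSAPI_ERROR_OK"},
    {-3001, "NSAPI_ERROR_WOULD_BLOCK"},
};

// Looks up a code without modifying the table. Unknown codes yield a
// placeholder string instead of silently inserting an empty entry.
std::string error_name(const std::map<int, std::string> &table, int code)
{
    auto it = table.find(code);
    if (it == table.end()) {
        return "UNKNOWN(" + std::to_string(code) + ")";
    }
    return it->second;
}
```

Because the table is taken by const reference, it can also be declared `const`, which rules out accidental insertion at compile time.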

But yes, I will instrument the code with mbed-trace and use GDB. That’s the only way I will find the issue.

Tracing your available heap memory can be a good idea too; you can also use the Mbed API to do that.


@davidAlonso ,

I trace heap usage. It doesn’t grow once it reaches its maximum usage.

@davidAlonso @JojoS ,

I’ve got the situation sorted out. Thank you both for the help.
I reduced memory usage and as a “side effect” I no longer have the issue.

So I do believe the unchecked malloc() was at one point failing, though I don’t have actual evidence of that. However, both servers are now able to work concurrently without issue.

Cheers!