Reusing TCP Sockets may fail if disconnect handshake is not completed

I’m writing an application to read/write data to a Modbus/TCP server (slave). The program I’m using to emulate the server is Mod_RSSim (Modbus simulator download | SourceForge.net). I’m testing this on a Nucleo F767ZI board.
Now, this emulator (mod_rssim) for some reason does not complete the disconnect handshake when the socket is closed from the board; I know this from inspecting the traffic with Wireshark. I would not expect this to be a problem for the board, but it is. It looks like the board closes the connection but the socket is never fully released back to mbed, and each call to open() hands me a new socket from the pool: after 4 successful connections and disconnections, open() returns error -3005 (NSAPI_ERROR_NO_SOCKET), successive calls return -3003 (NSAPI_ERROR_PARAMETER), and it never recovers (it keeps returning -3003 forever).
However, if I sleep the thread for 5 seconds or longer after closing the socket, the error does not happen and I can happily open and close the connection as many times as I want. The problem is that I have to access different servers periodically, so I need to close the socket to reuse it for a different connection, and I cannot afford to sit idle for 5 seconds after every close.
I think this should be considered a bug, but since I’m new to mbed maybe I’m missing something. Any ideas?
Here is a minimal test program:

#include <mbed.h>
#include <EthernetInterface.h>

TCPSocket testSocket;
EthernetInterface eth;

int main() {

  eth.connect();

  int cycleCounter = 0;
  nsapi_size_or_error_t retCode;

  SocketAddress sockAddr("192.168.5.125", 502);
  
  while(1) {
    //Keep track of how many cycles
    printf("Cycle %d\r\n", cycleCounter++);
    //open the socket
    retCode = testSocket.open(&eth);
    printf("Open returned %d\r\n", retCode);
    
    if (retCode != NSAPI_ERROR_OK ){
      //open() failed. No point in trying to connect
      ThisThread::sleep_for(1s);
      continue;
    }
    //connect to target
    retCode = testSocket.connect(sockAddr);
    printf("Connect returned %d\r\n", retCode);
    //disconnect and close socket
    retCode = testSocket.close();
    printf("Close returned %d\r\n", retCode);

    // ThisThread::sleep_for(5s);//required, otherwise testSocket.open() will fail after 4 loops if disconnect handshake is not completed

  }

}

I’ve seen behavior similar to this in the past on another system that uses the lwIP stack. lwIP keeps a pool of sockets and has a timeout after which a socket stuck in “shutting down” mode is reclaimed into the pool. You can change that timeout via the lwip.tcp-close-timeout config option. So, if you make your mbed_app.json something like:

{
    "target_overrides": {
        "*": {
            "lwip.tcp-close-timeout": 10
        }
    }
}

That will reduce the timeout to 10 milliseconds (it defaults to one second), so the sockets should become available again quite soon after the connection is closed.

It is intended behaviour that a socket is locked for some time after closing, but usually that happens on the server side. There is a socket option SO_REUSEADDR to set this timeout to zero. Can you check the server side for this?

@MultipleMonomials Thanks for the suggestion. I tried it, but it somehow had kind of the opposite effect: it now fails after 8 cycles, even with the 5-second delay. Not sure why it fails after 8 cycles instead of 4 (that part seems like an improvement), but it is still not working for me.

@JojoS this makes sense, thank you. I’ve found an interesting explanation here:

This post also explains the risks of doing so, but I’ll give it a try. To clarify, I only have control over the client side, not the server, but let’s see if this works. I’ll post back here.

@JojoS I tried setting NSAPI_REUSEADDR on the client side, but it still fails after 4 tries if I don’t use the delay:

#include <mbed.h>
#include <EthernetInterface.h>

TCPSocket testSocket;
EthernetInterface eth;

int main() {

  eth.connect();

  int cycleCounter = 0;
  nsapi_size_or_error_t retCode;

  SocketAddress sockAddr("192.168.5.125", 502);

  const int one = 1;
  
  while(1) {
    //Keep track of how many cycles
    printf("Cycle %d\r\n", cycleCounter++);
    //open the socket
    retCode = testSocket.open(&eth);
    printf("Open returned %d\r\n", retCode);

    if (retCode != NSAPI_ERROR_OK ){
      //open() failed. No point in trying to connect
      ThisThread::sleep_for(1s);
      continue;
    }

    //allow the local address/port to be reused while the old connection is still shutting down
    retCode = testSocket.setsockopt(NSAPI_SOCKET, NSAPI_REUSEADDR, &one, sizeof(one) );
    printf("setSockOpt returned %d\r\n", retCode);
    //connect to target
    retCode = testSocket.connect(sockAddr);
    printf("Connect returned %d\r\n", retCode);
    //disconnect and close socket
    retCode = testSocket.close();
    printf("Close returned %d\r\n", retCode);

    ThisThread::sleep_for(5s); //with this delay it works; without it, open() still fails after 4 cycles

  }

}

Yes, SO_REUSEADDR will work only on the server side.

Have you also tested Jamie’s suggestion? That sounds reasonable as well.

Like I mentioned above, @MultipleMonomials 's suggestion has some weird effect: it fails even with the delay, but after 8 cycles instead of 4.

Ok, I had missed that.
This sounds like the sockets are not being freed and are getting used up. The default should be 10 sockets, but it can be increased, in this case just to test whether that is the problem.
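To increase it for a test, something like this in mbed_app.json should work (I believe the relevant lwIP options are lwip.socket-max and lwip.tcp-socket-max; the values are just an example, each extra socket costs some pre-allocated RAM):

{
    "target_overrides": {
        "*": {
            "lwip.socket-max": 8,
            "lwip.tcp-socket-max": 8
        }
    }
}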

The default (at least in mbed 6) seems to be 4 sockets. I already tried increasing the number, and it works: it looks like if I give the stack enough time (by having a larger pool of sockets), the problem goes away. Another workaround that I found was to actually call delete on the socket and reallocate it with new; this is the only way I could find to recover after getting error -3005. But that is something I’d prefer not to do. If I’m going to be using an object for the whole duration of the application, I strongly prefer to allocate it statically.
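For reference, the delete/new workaround looks roughly like this (just a sketch: testSocket has to become a pointer, and the error handling and Modbus traffic are left out):

#include <mbed.h>
#include <EthernetInterface.h>

EthernetInterface eth;                   //eth.connect() is assumed to be called once in main()
TCPSocket *testSocket = new TCPSocket();

void pollServer(const SocketAddress &sockAddr) {
  nsapi_size_or_error_t retCode = testSocket->open(&eth);
  if (retCode == NSAPI_ERROR_NO_SOCKET) {
    //the stack still holds the old socket; destroy it and allocate a fresh one
    delete testSocket;
    testSocket = new TCPSocket();
    retCode = testSocket->open(&eth);
  }
  if (retCode != NSAPI_ERROR_OK) {
    return;
  }
  testSocket->connect(sockAddr);
  //...read/write Modbus registers here...
  testSocket->close();
}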
I think I understand the idea of the socket staying around in case it later receives the disconnect handshake and can respond to it to complete a clean disconnect. But I also believe that the owner of the socket (particularly on a resource-constrained platform) should be able to decide when it has waited long enough and to fully reclaim and reuse that socket.

It sounds like you’ve delved deep into socket management within mbed 6. Increasing the number of sockets seems to offer a workaround, allowing more time for recovery from the -3005 error. The approach of deleting and reallocating the socket works but isn’t ideal for long-term usage.

Your preference for statically allocating the socket makes sense, especially considering resource constraints. Having control over when to reclaim and reuse sockets seems crucial for efficient resource utilization on platforms with limitations.
