Call to mbedtls_ssl_handshake is crashing hard

We have an application where we are using the mbedtls SSL libraries to run a server that communicates via SSL sockets. This code has been working for a while now, but has recently started crashing.

MbedTLS version 2.7.0
Operating system is Ubuntu 16.04 LTS
This is the SERVER side of the SSL connection. We are running an SSL server.

As the title says, the call to mbedtls_ssl_handshake crashes. First the SSL context is initialized and set up with code like the following:

mbedtls_ssl_config_defaults(&cx->config, MBEDTLS_SSL_IS_SERVER,MBEDTLS_SSL_TRANSPORT_STREAM,0);
mbedtls_ssl_conf_min_version(&cx->config, MBEDTLS_SSL_MAJOR_VERSION_3,MBEDTLS_SSL_MINOR_VERSION_0);
mbedtls_ssl_conf_session_cache(&cx->config,&sd->cache, mbedtls_ssl_cache_get,mbedtls_ssl_cache_set);
I’ve cut out all the error checking for clarity, but this is the gist of the code. The vast majority of the time, the above code works, and the SSL communications are all successful. Every once in a while, the last line of code (mbedtls_ssl_handshake) crashes. The problem is how to troubleshoot this.

As you can see, I’ve set up a callback to read and write to the physical socket (with mbedtls_ssl_set_bio). I’ve put debug/log statements into those callback functions to see what is being sent and received during the handshake. Basically the following happens:

  1. Read 5 bytes from the client.
  2. Read another 192 bytes from the client.
  3. Write 96 bytes to the client.
  4. Write another 1743 bytes to the client.
  5. Application CRASHES at this time, before returning from mbedtls_ssl_handshake.
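The 5-byte read followed by a larger read matches how TLS records are framed: a 5-byte record header (content type, protocol version, payload length), then the payload. As a troubleshooting aid, the logging in the read callback could decode that header so the log shows what kind of record arrived just before a crash. A minimal sketch; `tls_record_header` and `parse_tls_record_header` are hypothetical helper names, not part of Mbed TLS:

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded form of the 5-byte TLS record header (hypothetical helper type). */
typedef struct {
    uint8_t  content_type;   /* 20 = change_cipher_spec, 21 = alert,
                                22 = handshake, 23 = application_data */
    uint8_t  version_major;  /* 3 for SSL 3.0 and all TLS 1.x versions */
    uint8_t  version_minor;
    uint16_t length;         /* payload length, big-endian on the wire */
} tls_record_header;

/* Parse the first 5 bytes of buf into out.
 * Returns 0 on success, -1 if the arguments are invalid or len < 5. */
int parse_tls_record_header(const unsigned char *buf, size_t len,
                            tls_record_header *out)
{
    if (buf == NULL || out == NULL || len < 5)
        return -1;

    out->content_type  = buf[0];
    out->version_major = buf[1];
    out->version_minor = buf[2];
    out->length        = (uint16_t) ((buf[3] << 8) | buf[4]);
    return 0;
}
```

For example, the header bytes `16 03 01 00 C0` decode to a handshake record (type 22), version 3.1, with a 192-byte payload.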

So the question is, any suggestions on troubleshooting this? I was thinking about running it in Valgrind to look for memory errors, but the problem only occurs on our production server where there are lots of SSL connections being established, and I don’t want Valgrind to slow down our production server. I can’t reproduce it in a test environment.

One interesting side note is that all the crashes I’ve cataloged appear to come from hackers in China/Russia. We wouldn’t have any legitimate users from those areas. But because this is a public server listening on port 443, hackers everywhere can have a crack at it. Perhaps the hackers are sending malformed SSL packets? I’ve never seen a crash with a legitimate user…

I believe there may be a newer version of mbedtls. However, I’m not sure it is going to help with the crash. Could you provide any debug details, such as a crash dump, a Wireshark capture of the packet exchange, etc., from when you see this happen? Thanks
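One low-overhead way to obtain a crash dump without running under Valgrind is to let the process write a core file when it crashes. A sketch, assuming a Linux/POSIX system; `enable_core_dumps` is a hypothetical helper that the server would call once at startup:

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit for this process.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    rl.rlim_cur = rl.rlim_max;  /* an unprivileged process may not exceed the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Where the core file ends up depends on the system's `core_pattern` setting (/proc/sys/kernel/core_pattern); on Ubuntu it may be piped to Apport by default. The core file can then be opened in gdb to get a backtrace of the crashing thread. If the hard limit is 0, this call changes nothing.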

In addition to @vijhar01’s request, would you be able to enable the logs by building with MBEDTLS_DEBUG_C and calling mbedtls_ssl_conf_dbg and mbedtls_debug_set_threshold?
That should narrow down the location where the crash happens.
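A sketch of what such a debug callback could look like. The function name `my_debug` and the choice of passing a `FILE *` as the context are assumptions for illustration; the parameter list is the one mbedtls_ssl_conf_dbg expects. The output is flushed after every message so that the last lines before a crash are not lost in a buffer:

```c
#include <stdio.h>

/* Debug callback for Mbed TLS. ctx is whatever pointer was registered with
 * the callback; here it is assumed to be a FILE * such as stderr.
 * The library's messages normally end in a newline already. */
void my_debug(void *ctx, int level, const char *file, int line, const char *str)
{
    FILE *out = (FILE *) ctx;

    fprintf(out, "%s:%04d: |%d| %s", file, line, level, str);
    fflush(out);  /* make sure the message is written before a possible crash */
}

/* Registration, in the setup code of a library built with MBEDTLS_DEBUG_C
 * (needs mbedtls/ssl.h and mbedtls/debug.h):
 *
 *     mbedtls_ssl_conf_dbg(&cx->config, my_debug, stderr);
 *     mbedtls_debug_set_threshold(4);   // 0 = off ... 4 = verbose
 */
```

With the threshold at 4 the log is very chatty, so on a busy server it may be worth writing it to a file rather than to the console.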

I have added the call to mbedtls_debug_set_threshold to increase the debugging level.

Also, I recompiled the library with MBEDTLS_THREADING_C and MBEDTLS_THREADING_PTHREAD, since this is a multi-threaded application. Some contexts exist once per thread, but others are shared between threads. The contexts that are shared by multiple threads are:

  • mbedtls_ssl_cache_context
  • mbedtls_entropy_context

I don’t know if sharing the above contexts without MBEDTLS_THREADING_PTHREAD enabled will cause a crash. Maybe someone can comment on this.

Thanks, -Eric

If MBEDTLS_THREADING_C is defined, these contexts should be protected with a mutex, assuming you have pthread in your system. If you don’t have pthread in your system, you will need to implement your own mutex and set it through MBEDTLS_THREADING_ALT.
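For reference, on a system with pthreads the two options are plain compile-time defines. A sketch, assuming the stock configuration file of the 2.x series (include/mbedtls/config.h) is the one in use:

```c
/* include/mbedtls/config.h (Mbed TLS 2.x): enable both, then rebuild the library */
#define MBEDTLS_THREADING_C        /* turn on the threading abstraction layer */
#define MBEDTLS_THREADING_PTHREAD  /* back it with pthread mutexes */
```

The application should be rebuilt against the same configuration, since these options add a mutex member to the shared context structures.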

I guess my question was, can you share the mbedtls_ssl_cache_context or the mbedtls_entropy_context in concurrent use with no threading options defined?

As I mentioned in my initial post, our platform is Ubuntu 16.04, so I have pthreads.

In any case, after recompiling the library with both MBEDTLS_THREADING_C and MBEDTLS_THREADING_PTHREAD, the crash seems to have gone away. It’s been running for 5 days now with no problem. Previously, it would typically crash within a day. So it seems the answer to my question is that you CANNOT have multiple threads share a mbedtls_ssl_cache_context or mbedtls_entropy_context unless you define the threading options.

I will continue to monitor and see if the crash recurs.

Sorry, I missed that you are using Ubuntu in my last answer.
Yes, if you are using a multithreaded environment, you must enable the threading options; otherwise, you will get concurrency-related failures and crashes like the ones you encountered.
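To illustrate the general idea behind the threading options, here is a self-contained sketch of several threads updating one shared structure, with every access serialized by a mutex. It does not use Mbed TLS at all; `shared_ctx`, `cache_worker` and `run_workers` are hypothetical names, and the counter merely stands in for the internal state of a shared context such as the session cache:

```c
#include <pthread.h>

#define MAX_THREADS 16

/* Stand-in for a context shared by all connection threads. */
typedef struct {
    pthread_mutex_t lock;
    long entries;
} shared_ctx;

typedef struct {
    shared_ctx *ctx;
    int iterations;
} worker_arg;

/* Each thread updates the shared state, taking the lock around every access. */
void *cache_worker(void *p)
{
    worker_arg *arg = (worker_arg *) p;

    for (int i = 0; i < arg->iterations; i++) {
        pthread_mutex_lock(&arg->ctx->lock);
        arg->ctx->entries++;
        pthread_mutex_unlock(&arg->ctx->lock);
    }
    return NULL;
}

/* Run up to MAX_THREADS workers and return the final entry count,
 * or -1 if the mutex could not be initialized. */
long run_workers(int nthreads, int iterations)
{
    shared_ctx ctx;
    worker_arg arg;
    pthread_t tids[MAX_THREADS];
    int started = 0;

    if (nthreads < 0)
        nthreads = 0;
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    if (pthread_mutex_init(&ctx.lock, NULL) != 0)
        return -1;

    ctx.entries = 0;
    arg.ctx = &ctx;
    arg.iterations = iterations;

    /* Stop launching threads at the first failure; join only those started. */
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, cache_worker, &arg) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&ctx.lock);
    return ctx.entries;
}
```

With the lock in place, the final count is always the number of started threads times `iterations`. Without the lock/unlock pair the increments would race; in a real context with pointers and linked structures, the same kind of race can corrupt memory and crash rather than just lose an update.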