mbedtls_ssl_handshake() segfault after ~1000 iterations

Meanwhile, I could confirm that the segfault is triggered by the way my code stores the sockets (and the mbedtls contexts along with them).

The “same key” theory is also ruled out.

I’ll update when I know the actual reason.

Hello Ron,

Here is the mbedtls_ssl_context tlsCtx data passed to mbedtls_ssl_handshake(&tlsCtx). It's from the debugger and should be from the call that triggers the segfault:

{
 conf = 0x7fb6a0095120,
 state = 2,
 renego_status = 0,
 renego_records_seen = 0,
 major_ver = 3,
 minor_ver = 1,
 badmac_seen = 0,
 f_send = 0x7fb6a9fa2870 ,
 f_recv = 0x0,
 f_recv_timeout = 0x7fb6a9fa2770 ,
 p_bio = 0x7fb6a0003400,
 session_in = 0x0,
 session_out = 0x0,
 session = 0x0,
 session_negotiate = 0x7fb6a000c830,
 handshake = 0x7fb6a000c8d0,
 transform_in = 0x0,
 transform_out = 0x0,
 transform = 0x0,
 transform_negotiate = 0x7fb6a000c6f0,
 p_timer = 0x0,
 f_set_timer = 0x0,
 f_get_timer = 0x0,
 in_buf = 0x7fb6a0004430 "",
 in_ctr = 0x7fb6a0004430 "",
 in_hdr = 0x7fb6a0004438 "\026\003\003",
 in_len = 0x7fb6a000443b "",
 in_iv = 0x7fb6a000443d "",
 in_msg = 0x7fb6a000443d "",
 in_offt = 0x0,
 in_msgtype = 0,
 in_msglen = 0,
 in_left = 0,
 in_epoch = 0,
 next_record_offset = 0,
 in_window_top = 0,
 in_window = 0,
 in_hslen = 0,
 nb_zero = 0,
 keep_current_message = 0,
 disable_datagram_packing = 0 '\000',
 out_buf = 0x7fb6a0008590 "",
 out_ctr = 0x7fb6a0008590 "",
 out_hdr = 0x7fb6a0008598 "\026\003\001\001u\001",
 out_len = 0x7fb6a000859b "\001u\001",
 out_iv = 0x7fb6a000859d "\001",
 out_msg = 0x7fb6a000859d "\001",
 out_msgtype = 22,
 out_msglen = 373,
 out_left = 0,
 cur_out_ctr = "\000\000\000\000\000\000\000\001",
 mtu = 0,
 split_done = 0 '\000',
 client_auth = 0,
 hostname = 0x0,
 alpn_chosen = 0x0,
 cli_id = 0x0,
 cli_id_len = 0,
 secure_renegotiation = 0,
 verify_data_len = 0,
 own_verify_data = '\000' ,
 peer_verify_data = '\000' 
}

Can you spot anything odd?

mbedtls_ssl_read_record() and mbedtls_ssl_fetch_input() are at the bottom of the backtrace.

The mbedtls_net_context netCtx.fd has a value of 1037 when I last see it (passed to mbedtls_ssl_set_bio() right before the call to mbedtls_ssl_handshake(&tlsCtx)).
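
One thing that may be worth ruling out here (an assumption, not a confirmed diagnosis): if f_recv_timeout points to mbedtls_net_recv_timeout(), that function waits on the socket with select(), and on POSIX systems select() is only defined for descriptors below FD_SETSIZE (commonly 1024). A minimal sketch of a guard that could run before mbedtls_ssl_set_bio(); the helper name fd_fits_in_fd_set is hypothetical and not part of mbedtls:

```c
#include <sys/select.h>  /* FD_SETSIZE */

/* Hypothetical helper (not an mbedtls API): returns 1 if fd can be
 * stored safely in an fd_set for select(), 0 otherwise. */
int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Possible use before wiring the socket into the TLS context:
 *
 *   if (!fd_fits_in_fd_set(netCtx.fd)) {
 *       // refuse the socket instead of letting FD_SET() write
 *       // outside the fd_set
 *   }
 */
```

With a descriptor value of 1037 this check would fail wherever FD_SETSIZE is 1024, which would at least narrow down whether the select() path is involved.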

Unfortunately, I can’t see anything strange at the moment. Perhaps there is some data corruption that overwrites some of the pointers.
As for the fd value of 1037: is this reasonable on your system? How many file descriptors can be open at any one moment on your system?
Regards

I think I am able to open many more:

$ ulimit -n
1048576

At any single moment, nowhere near 1037 descriptors should be in use, because I am running the test with only two threads. I think the number just climbs that high because the [Linux] kernel does not immediately reissue the numbers.

(I do see FD numbers being reused when I run 20000 iterations with a single thread; then there is a lot of time between the uses.)
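
A quick way to check the assumption about descriptor reuse is a sketch like the following. POSIX specifies that open() returns the lowest-numbered descriptor not currently open in the process, so in a single thread, closing a descriptor and opening again should hand back the same number. If the numbers keep climbing instead, that may hint at descriptors that are still open somewhere. The function name and the use of /dev/null are only illustrative choices:

```c
#include <fcntl.h>   /* open, O_RDONLY */
#include <unistd.h>  /* close */

/* Returns 1 if a descriptor number is reused right after close(),
 * 0 if a different number came back, -1 if open() failed. */
int fd_is_reused_after_close(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;
    close(first);

    int second = open("/dev/null", O_RDONLY);
    if (second < 0)
        return -1;
    close(second);

    return first == second;
}
```

With more than one thread the result can differ, because another thread may take the freed number between the close() and the second open().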