Hi,
Since my last post, we’ve installed a HughesNet satellite dish at our office so that I can reproduce the problem and I’ve enabled detailed logging of the TLS handshake process. I’ve compared the output for both a failed handshake over the satellite connection and a successful handshake over a landline connection, and have found that the handshakes are in lockstep right up to step 12 (MBEDTLS_SSL_SERVER_CHANGE_CIPHER_SPEC). As previously suspected, that step hangs up when it never receives the expected response from the remote server.
In the following log fragments, the lines that start with a timestamp are my own application-layer diagnostic messages and lines starting with ‘//’ are my own comments. Here is a fragment of the log output from a successfull connection:
// Step 11 beginning...
06/04 07:51:41.888 I SysLog [DIAG] IOT TLS handshake step (2830187) Data: 11/0
06/04 07:51:41.888 I LogNVM Write: ChipSelect: 01, VAddr: 257C, PAddr: 257C, Count:6
ssl_cli.c:3510: client state: 11
ssl_tls.c:2760: => flush output
ssl_tls.c:2772: <= flush output
ssl_tls.c:6419: => write finished
ssl_tls.c:6293: => calc finished tls sha384
ssl_tls.c:6317: dumping 'calc finished result' (12 bytes)
ssl_tls.c:6317: 0000: 51 70 e3 62 d9 72 48 ec 2e f5 c2 00 Qp.b.rH.....
ssl_tls.c:6323: <= calc finished
ssl_tls.c:6464: switching to new transform spec for outbound data
06/04 07:51:42.406 I SysLog [DIAG] IOT cipher (2830239) Data: 16/1
06/04 07:51:42.406 I LogNVM Write: ChipSelect: 01, VAddr: 2581, PAddr: 2581, Count:7
ssl_tls.c:3189: => write handshake message
ssl_tls.c:3348: => write record
ssl_tls.c:1447: => encrypt buf
ssl_tls.c:1620: before encrypt: msglen = 24, including 0 bytes of padding
ssl_tls.c:1783: <= encrypt buf
ssl_tls.c:3428: output record: msgtype = 22, version = [3:3], msglen = 40
ssl_tls.c:2760: => flush output
ssl_tls.c:2779: message length: 45, out_left: 45
ssl_tls.c:2784: ssl->f_send() returned 45 (-0xffffffd3)
ssl_tls.c:2812: <= flush output
ssl_tls.c:3481: <= write record
ssl_tls.c:3325: <= write handshake message
ssl_tls.c:6534: <= write finished
06/04 07:51:42.414 I SysLog [DIAG] IOT TLS handshake step (2830240) Data: 12/0
06/04 07:51:42.415 I LogNVM Write: ChipSelect: 01, VAddr: 2587, PAddr: 2587, Count:6
ssl_cli.c:3510: client state: 12
ssl_tls.c:2760: => flush output
ssl_tls.c:2772: <= flush output
ssl_tls.c:5907: => parse change cipher spec
ssl_tls.c:4316: => read record
ssl_tls.c:2538: => fetch input
ssl_tls.c:2701: in_left: 0, nb_want: 5
ssl_tls.c:2726: in_left: 0, nb_want: 5
ssl_tls.c:2727: ssl->f_recv(_timeout)() returned 5 (-0xfffffffb)
ssl_tls.c:2747: <= fetch input
ssl_tls.c:4061: input record: msgtype = 20, version = [3:3], msglen = 1
ssl_tls.c:2538: => fetch input
ssl_tls.c:2701: in_left: 5, nb_want: 6
ssl_tls.c:2726: in_left: 5, nb_want: 6
ssl_tls.c:2727: ssl->f_recv(_timeout)() returned 1 (-0xffffffff)
ssl_tls.c:2747: <= fetch input
ssl_tls.c:4390: <= read record
ssl_tls.c:5940: switching to new transform spec for inbound data
06/04 07:51:42.922 I SysLog [DIAG] IOT cipher (2830291) Data: 16/2
06/04 07:51:42.922 I LogNVM Write: ChipSelect: 01, VAddr: 258C, PAddr: 258C, Count:7
ssl_tls.c:5987: <= parse change cipher spec
06/04 07:51:42.928 I SysLog [DIAG] IOT TLS handshake step (2830291) Data: 13/0
06/04 07:51:42.928 I LogNVM Write: ChipSelect: 01, VAddr: 2592, PAddr: 2592, Count:6
ssl_cli.c:3510: client state: 13
ssl_tls.c:2760: => flush output
ssl_tls.c:2772: <= flush output
ssl_tls.c:6551: => parse finished
ssl_tls.c:6293: => calc finished tls sha384
ssl_tls.c:6317: dumping 'calc finished result' (12 bytes)
ssl_tls.c:6317: 0000: 8f f9 3e 9a 82 cb 1a a7 86 ff af a2 ..>.........
ssl_tls.c:6323: <= calc finished
ssl_tls.c:4316: => read record
ssl_tls.c:2538: => fetch input
ssl_tls.c:2701: in_left: 0, nb_want: 5
ssl_tls.c:2726: in_left: 0, nb_want: 5
ssl_tls.c:2727: ssl->f_recv(_timeout)() returned 5 (-0xfffffffb)
ssl_tls.c:2747: <= fetch input
ssl_tls.c:4061: input record: msgtype = 22, version = [3:3], msglen = 40
ssl_tls.c:2538: => fetch input
ssl_tls.c:2701: in_left: 5, nb_want: 45
ssl_tls.c:2726: in_left: 5, nb_want: 45
ssl_tls.c:2727: ssl->f_recv(_timeout)() returned 40 (-0xffffffd8)
ssl_tls.c:2747: <= fetch input
ssl_tls.c:1796: => decrypt buf
ssl_tls.c:2378: <= decrypt buf
ssl_tls.c:3631: handshake message: msglen = 16, type = 20, hslen = 16
ssl_tls.c:4390: <= read record
ssl_tls.c:6619: <= parse finished
// Step 13 completed, handshake continues to completion...
Here is a log fragment from a failed handshake over the satellite connection:
// Step 11 beginning... Up to this point the output exactly matches a successful connection
// except where expected due to randomized data.
06/04 07:54:18.897 I SysLog [DIAG] IOT TLS handshake step (2845888) Data: 11/0
06/04 07:54:18.897 I LogNVM Write: ChipSelect: 01, VAddr: 2631, PAddr: 2631, Count:6
ssl_cli.c:3510: client state: 11
ssl_tls.c:2760: => flush output
ssl_tls.c:2772: <= flush output
ssl_tls.c:6419: => write finished
ssl_tls.c:6293: => calc finished tls sha384
ssl_tls.c:6317: dumping 'calc finished result' (12 bytes)
ssl_tls.c:6317: 0000: 4c c3 3c 85 22 c4 ca 7c 89 4c ab 97 L.<."..|.L..
ssl_tls.c:6323: <= calc finished
ssl_tls.c:6464: switching to new transform spec for outbound data
06/04 07:54:19.412 I SysLog [DIAG] IOT cipher (2845940) Data: 16/1
06/04 07:54:19.412 I LogNVM Write: ChipSelect: 01, VAddr: 2636, PAddr: 2636, Count:7
ssl_tls.c:3189: => write handshake message
ssl_tls.c:3348: => write record
ssl_tls.c:1447: => encrypt buf
ssl_tls.c:1620: before encrypt: msglen = 24, including 0 bytes of padding
ssl_tls.c:1783: <= encrypt buf
ssl_tls.c:3428: output record: msgtype = 22, version = [3:3], msglen = 40
ssl_tls.c:2760: => flush output
ssl_tls.c:2779: message length: 45, out_left: 45
// The transmission in step 11 appears successful:
ssl_tls.c:2784: ssl->f_send() returned 45 (-0xffffffd3)
ssl_tls.c:2812: <= flush output
ssl_tls.c:3481: <= write record
ssl_tls.c:3325: <= write handshake message
ssl_tls.c:6534: <= write finished
06/04 07:54:19.420 I SysLog [DIAG] IOT TLS handshake step (2845941) Data: 12/0
06/04 07:54:19.420 I LogNVM Write: ChipSelect: 01, VAddr: 263C, PAddr: 263C, Count:6
ssl_cli.c:3510: client state: 12
ssl_tls.c:2760: => flush output
ssl_tls.c:2772: <= flush output
ssl_tls.c:5907: => parse change cipher spec
ssl_tls.c:4316: => read record
ssl_tls.c:2538: => fetch input
ssl_tls.c:2701: in_left: 0, nb_want: 5
ssl_tls.c:2726: in_left: 0, nb_want: 5
// At this point, mbedtls_ssl_fetch_input() returns WANT_READ and the handshake step exits
// and retries. We never see any data from the server again and the connection attempt
// eventually times out.
ssl_tls.c:8172: <= handshake
ssl_tls.c:8122: => handshake
ssl_cli.c:3510: client state: 12
ssl_tls.c:2760: => flush output
ssl_tls.c:2772: <= flush output
ssl_tls.c:5907: => parse change cipher spec
ssl_tls.c:4316: => read record
ssl_tls.c:2538: => fetch input
ssl_tls.c:2701: in_left: 0, nb_want: 5
ssl_tls.c:2726: in_left: 0, nb_want: 5
ssl_tls.c:8172: <= handshake
// The previous 10 lines between '=> handshake' and '<= handshake' repeat every 5 seconds (the
// defined timeout for a read) until my code aborts the handshake after a minute)
In both cases, my code is the client and Amazon AWS IOT Core is the server. I see three logical explanations for what I’m seeing:
- The data transmitted in client step 11 is never received at AWS.
- AWS receives the client cipher spec change message in step 11 but never responds.
- AWS sends the data expected by my client in step 12 but it is never received by the client.
I view theory 2 as being very unlikely and assume that the satellite carrier is dropping packets for some reason. Can anyone offer suggestions on what I should be digging into next? How can I determine where the data loss is happening?
Thanks!
Patrick