Slow ECDHE handshake

We have a range of IoT devices running mbedTLS on STM32F4 processors. We are currently using 2048-bit RSA keys on these devices, and performance is tolerable for HTTPS access, though we would like initial handshakes to be faster. On-board key generation is understandably slow, often taking in excess of five minutes to complete.

In light of trends in the industry and the impending launch of TLS 1.3, we are interested in switching to ECC for our secure communications. I recently ran an experiment and was disappointed to find that ECDHE is the only ECC key exchange supported by current browsers. Generating the ephemeral key required by this protocol takes a few seconds on our hardware, which makes the TLS handshake slower than it is with RSA. We have session caching enabled, so additional sockets are set up quickly, but the first handshake is problematic under ECDHE.

Based on past profiling, it seems like the bulk of the key generation time is spent on the Montgomery multiplication used for primality testing. Without a SIMD unit available, we're sort of stuck with the portable C implementation, which appears to be reasonably optimized.

Is there any avenue we can pursue to speed up the TLS handshake with ECC? This seems like a major showstopper if RSA key exchange is no longer to be supported in TLS 1.3 and ECDHE is too processor intensive to be of practical use on limited hardware.

Hi @kthibedeau
I think that this post and this question may assist you.
In addition, since key generation is the part that consumes the most time for you, if your platform has HW acceleration for AES (assuming you are using ctr_drbg as your f_rng function), it might help speed up the operation.
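
As a rough sketch of what that could look like in the build configuration (this is only an illustration; it assumes you supply your own `aes_alt.h` and an AES driver for the STM32 crypto peripheral, neither of which is shown here, and that your part actually has an AES engine):

```c
/* Hypothetical excerpt from a custom Mbed TLS configuration header.
 * Assumption: the platform provides aes_alt.h plus an implementation of
 * the mbedtls_aes_* functions backed by the hardware AES engine. */

#define MBEDTLS_AES_C        /* AES module; required by CTR_DRBG */
#define MBEDTLS_AES_ALT      /* replace the software AES with the platform's version */
#define MBEDTLS_CTR_DRBG_C   /* AES-based DRBG, typically passed as f_rng */
```

With `MBEDTLS_AES_ALT` defined, the CTR_DRBG calls go through the alternative AES implementation, so random number generation during the handshake would use the hardware engine. Whether this makes a noticeable difference depends on how much of the key generation time is really spent in the RNG, so it is worth profiling before and after.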
Regards,
Mbed TLS Team member
Ron