Handshake duration - how to speed it up?


I’m using mbedtls on a Cortex-M4 MCU (STM32F417, 168 MHz). At the moment the handshake step takes 3 seconds.

The STM32F417 integrates a crypto/hash processor providing hardware acceleration for AES-128, AES-192, AES-256, Triple DES, and hashing (MD5, SHA-1).

In my setup, the client and server authenticate each other. Each uses an elliptic curve key (prime256v1), with AES-256 for the private key.

How can we improve this step? Is it possible to complete it in less than a second?

How do I make sure I’m using the MCU’s hardware acceleration?
Is it possible to add a coprocessor?
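
For reference, these are the mbedtls compile-time options that look relevant to me. This is only a sketch assuming a 2.x-style config.h; the values are illustrative and not copied from my build, and the *_ALT options only help if a replacement implementation using the STM32 crypto/hash peripheral is also provided.

```c
/* Sketch of config.h options (mbedtls 2.x naming assumed; values illustrative). */

/* Replace the software AES/SHA-1/MD5 with alternative implementations.
 * Each *_ALT macro requires a matching *_alt.h and source file supplied
 * by the user, e.g. one wrapping the STM32 CRYP/HASH peripheral. */
#define MBEDTLS_AES_ALT
#define MBEDTLS_SHA1_ALT
#define MBEDTLS_MD5_ALT

/* Software ECC tuning for the curve used here (prime256v1 = secp256r1). */
#define MBEDTLS_HAVE_ASM                 /* allow assembly in bignum code */
#define MBEDTLS_ECP_DP_SECP256R1_ENABLED /* enable only the curve needed */
#define MBEDTLS_ECP_NIST_OPTIM           /* fast reduction for NIST primes */
#define MBEDTLS_ECP_WINDOW_SIZE 6        /* larger window: faster, more RAM */
#define MBEDTLS_ECP_FIXED_POINT_OPTIM 1  /* precompute for fixed base point */
```

As far as I can tell from the feature list above, the peripheral does not cover elliptic curve operations, so I am not sure how much of the 3 seconds these options could recover.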

Best regards,