How much can Arm hardware, cryptography accelerators and TRNGs speed up a TLS handshake?

I’ve designed a system on a Microchip PIC32MX470F512H to query an AWS-type payment gateway using Mbed TLS. I spent about a month playing with configurations, but I can’t get the transaction time below 20 seconds. curl --verbose reports that the connection uses ECDHE-RSA-AES128-GCM-SHA256. I’ve even attached the GSM modem to my Linux PC and run my entire program from there to prove that it is my hardware that slows things down.

I currently use an 8-bit PIC18F47K40 with a SIM800C flashed with TLS 1.2 firmware instead of Bluetooth, and the complete transaction, including a 2-second poll period, takes a maximum of 5 seconds. The actual TLS 1.2 exchange is completed in less than 2 seconds.

I’m now designing a new PCB and am looking at a PIC32CM2532LS00064, which has:
Arm® Cortex®-M23 core
Arm TrustZone for flexible hardware isolation of memories and peripherals
AES-256/192/128, SHA-256, and GCM cryptography accelerators
Device Identity Composition Engine (DICE) security standard support
One True Random Generator (TRNG)

I had my PIC32MX470 running flat out at 120 MHz, but there was very little time difference compared with running at 64 MHz. I wrote an entropy routine that uses Unix time to seed a getentropy function, but that is quick, so maybe the TRNG won’t have much effect apart from making the code simpler.
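For reference, a TRNG would normally be hooked into Mbed TLS through an entropy source callback. In Mbed TLS 2.x/3.x, defining MBEDTLS_ENTROPY_HARDWARE_ALT makes the library call a user-supplied mbedtls_hardware_poll() with the parameter list used below. The following is only a minimal sketch: trng_read_word() is a hypothetical stand-in for the real peripheral read (it is a placeholder generator, not random), and trng_entropy_poll is an illustrative name. A timestamp seed is fast but predictable, so the TRNG is likely to matter more for security than for handshake speed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for reading one 32-bit word from the TRNG data
 * register. On real hardware this would wait for the data-ready flag and
 * read the peripheral. This stub is NOT random and must be replaced. */
static uint32_t trng_read_word(void)
{
    static uint32_t fake = 0x12345678u;
    fake = fake * 1664525u + 1013904223u; /* placeholder only */
    return fake;
}

/* Same parameter list as the Mbed TLS entropy source callback
 * (and as mbedtls_hardware_poll). Fills `output` with `len` bytes,
 * reports the number of bytes written in *olen, returns 0 on success. */
int trng_entropy_poll(void *data, unsigned char *output,
                      size_t len, size_t *olen)
{
    (void)data; /* unused context pointer */

    size_t written = 0;
    while (written < len) {
        uint32_t word = trng_read_word();
        size_t chunk = len - written;
        if (chunk > sizeof word) {
            chunk = sizeof word;
        }
        memcpy(output + written, &word, chunk);
        written += chunk;
    }

    *olen = written;
    return 0;
}
```

With the real register read in place, the body of this function could be used as mbedtls_hardware_poll(), or the function could be registered with mbedtls_entropy_add_source().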
99% of my slowdown is in the mbedtls_ssl_handshake( &ssl ) call. It runs for a long period that is too complex to follow in a debugger; after that, the reads and writes run at normal execution speed.
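One way to see where the time goes without single-stepping in a debugger is to drive the handshake one step at a time and accumulate the elapsed time per state. Mbed TLS exposes mbedtls_ssl_handshake_step() for this kind of loop. The sketch below keeps the timing logic generic so it compiles on its own: the step callback, the millisecond clock, and the demo_* stubs are all hypothetical stand-ins, not Mbed TLS APIs.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_STATES 32

/* Accumulated milliseconds spent in each handshake state. */
typedef struct {
    uint32_t ms_in_state[MAX_STATES];
} handshake_profile;

typedef uint32_t (*now_ms_fn)(void);
/* Performs one handshake step, updates *state, returns 0 on success.
 * A real wrapper around mbedtls_ssl_handshake_step() would also map
 * "want read/write" results to 0 so that the loop retries. */
typedef int (*step_fn)(void *ctx, int *state);

int profile_handshake(step_fn step, void *ctx, now_ms_fn now_ms,
                      int done_state, handshake_profile *out)
{
    int state = 0;

    for (size_t i = 0; i < MAX_STATES; i++) {
        out->ms_in_state[i] = 0;
    }

    while (state != done_state) {
        int before = state;
        uint32_t t0 = now_ms();
        int ret = step(ctx, &state);
        uint32_t dt = now_ms() - t0;

        /* Charge the elapsed time to the state we were in before the step. */
        if (before >= 0 && before < MAX_STATES) {
            out->ms_in_state[before] += dt;
        }
        if (ret != 0) {
            return ret;
        }
    }
    return 0;
}

/* Demo stubs (hypothetical): a fake clock and a fake step function in
 * which state 1 costs 900 ms and every other state costs 10 ms. */
uint32_t demo_time_ms;

uint32_t demo_now_ms(void)
{
    return demo_time_ms;
}

int demo_step(void *ctx, int *state)
{
    (void)ctx;
    demo_time_ms += (*state == 1) ? 900u : 10u;
    (*state)++;
    return 0;
}
```

Printing the ms_in_state array after a real handshake should show whether the time is going into the key exchange and certificate steps (CPU-bound public-key math) or into steps that wait on the modem and server.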

How much can I expect the hardware features to speed up mbedtls_ssl_handshake?
Are there any other ways to accelerate mbedtls_ssl_handshake?

For H/W crypto acceleration, you could choose the platform from mbed-os/connectivity/drivers/mbedtls at master · ARMmbed/mbed-os · GitHub.
Mbed TLS H/W crypto can really improve encrypt/decrypt performance, by several times at least. In my experience at Pelion, though, the whole TLS handshake may only improve by about 50%, because part of the time is spent blocked waiting on response times in the cloud.
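The gap between "several times faster crypto" and a much smaller overall gain is what Amdahl's law predicts: only the accelerated fraction of the handshake shrinks. A rough estimate can be sketched as below; the function name and the numbers in the usage note are illustrative assumptions, not measurements.

```c
/* Overall handshake speedup when a fraction `crypto_fraction` (0..1) of
 * the total handshake time is accelerated by `accel_factor`, and the
 * rest (network waits, unaccelerated math) is unchanged. */
double handshake_speedup(double crypto_fraction, double accel_factor)
{
    return 1.0 / ((1.0 - crypto_fraction) + crypto_fraction / accel_factor);
}
```

For example, if half of the handshake time were spent in operations that the hardware makes 4 times faster, handshake_speedup(0.5, 4.0) gives 1.6, so a 20 second handshake would drop to about 12.5 seconds rather than 5.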

In an effort to enhance the performance of the mbedtls_ssl_handshake process on our Microchip PIC32CM2532LS00064-based system, we are leveraging the hardware features provided by this advanced microcontroller. With dedicated cryptography accelerators for AES and SHA, we’ve configured mbedtls to make optimal use of these hardware capabilities, significantly speeding up cryptographic operations during the TLS handshake. The inclusion of a True Random Number Generator (TRNG) is ensuring a high-quality seed for our entropy pool, contributing to the efficiency and security of our cryptographic processes. Additionally, we are mindful of our memory access patterns, aligning data structures to maximize cache usage.

Through careful selection of cipher suites and TLS versions, we aim to strike a balance between security and efficiency. By continually optimizing our firmware, exploring parallel processing where applicable, and staying updated with the latest mbedtls releases, we are committed to achieving a swift and secure TLS handshake process on our new PIC32CM2532LS00064-based platform. As suggested above, for H/W crypto acceleration you could choose a platform from mbed-os/connectivity/drivers/mbedtls at master · ARMmbed/mbed-os · GitHub.
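As a rough illustration, the hardware offload and cipher suite choices might look like the fragment below in a custom Mbed TLS configuration header. The option names are from the Mbed TLS 2.x/3.x configuration file. Two assumptions are worth stating loudly: the *_ALT options only work if the port supplies the matching alternative implementations (for example an aes_alt.h backed by the accelerator), and the window size value is a placeholder to tune against available RAM. Note also that for an ECDHE-RSA suite the handshake cost is typically dominated by elliptic-curve and RSA math, which AES, SHA and GCM accelerators do not offload, so the ECP options may matter more for handshake time than the symmetric ones.

```c
/* Fragment for a custom Mbed TLS config header (2.x / 3.x option names). */

/* Route symmetric crypto and hashing to port-provided implementations.
 * Assumes the port supplies aes_alt.h, gcm_alt.h and sha256_alt.h. */
#define MBEDTLS_AES_ALT
#define MBEDTLS_GCM_ALT
#define MBEDTLS_SHA256_ALT

/* Take entropy from a user-supplied mbedtls_hardware_poll() (the TRNG). */
#define MBEDTLS_ENTROPY_HARDWARE_ALT
#define MBEDTLS_NO_PLATFORM_ENTROPY

/* Software speedups for the elliptic-curve part of the handshake. */
#define MBEDTLS_ECP_NIST_OPTIM            /* faster reduction on NIST curves */
#define MBEDTLS_ECP_FIXED_POINT_OPTIM 1   /* precompute fixed-point multiples */
#define MBEDTLS_ECP_WINDOW_SIZE 4         /* placeholder: larger trades RAM for speed */

/* Offer only the suite reported by curl --verbose in the question. */
#define MBEDTLS_SSL_CIPHERSUITES MBEDTLS_TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
```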

Thank you