Allocation requirements during handshake

I’m trying to use mbedTLS/lwIP with NO_SYS=1 (no operating system), so there is a single thread.

For now I’m experimenting on Windows with success: I can compile and run the project and it works. However, my target platform will be an LPC1769 with 32+32kB RAM. That seems too low, but I want to try anyway.

I need only one MQTT TLS connection (to Amazon AWS, Google Cloud, Microsoft Azure or similar servers). I’m using MBEDTLS_MEMORY_BUFFER_ALLOC_C to avoid fragmentation, so I declared a static buffer and called mbedtls_memory_buffer_alloc_init().

I defined MBEDTLS_SSL_IN_CONTENT_LEN as 16384 and MBEDTLS_SSL_OUT_CONTENT_LEN as 2048 in config.h.

I thought a static buffer of around 16384+2048=18432 bytes (18kB) would be sufficient, but this isn’t true in my tests under Windows. I need to declare a static buffer of around 30kB for mbedtls_calloc(), otherwise allocations fail.
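To make the numbers concrete, here is a minimal sketch of the configuration and static heap described so far. NAIVE_HEAP_ESTIMATE, TLS_HEAP_SIZE and tls_heap are hypothetical names chosen for illustration, and 30 kB is simply the size that worked in this first Windows test.

```c
/* config.h fragment: record content sizes quoted in the post. */
#define MBEDTLS_SSL_IN_CONTENT_LEN   16384
#define MBEDTLS_SSL_OUT_CONTENT_LEN  2048

/* Naive estimate: only the two record buffers (hypothetical name). */
#define NAIVE_HEAP_ESTIMATE \
    (MBEDTLS_SSL_IN_CONTENT_LEN + MBEDTLS_SSL_OUT_CONTENT_LEN)

/* Roughly the size that worked in the first test (hypothetical name). */
#define TLS_HEAP_SIZE (30 * 1024)

/* Static arena for the buffer allocator. At startup it would be passed to
 * mbedtls_memory_buffer_alloc_init(tls_heap, sizeof(tls_heap)). */
static unsigned char tls_heap[TLS_HEAP_SIZE];
```

The gap between the naive estimate and the working size is what the rest of the thread tries to explain.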

I noticed that during connection initialization mbedTLS allocates 16717 bytes (for the input buffer), 2381 (for the output buffer), 208 (for the mbedtls_ssl_transform struct), 128 (for mbedtls_ssl_session) and 1968 (for mbedtls_ssl_handshake_params). Now we are at 21402 bytes.

During the handshake, ssl_parse_certificate_chain() starts allocating many additional bytes. So I need to define a static buffer of at least 38000 bytes for mbedtls_calloc() to avoid running out of memory. It seems I need an additional 38000-21402=16598 bytes!!
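The arithmetic above can be checked with a short sketch. The sizes are the ones reported in the post; the array and function names are hypothetical. Note that both record buffers come out 333 bytes larger than their configured content length (16717-16384 and 2381-2048), which suggests a fixed per-buffer overhead in this build.

```c
#include <stddef.h>

/* Allocation sizes observed during connection setup, in bytes:
 * input buffer, output buffer, transform, session, handshake params. */
static const size_t setup_allocs[] = { 16717, 2381, 208, 128, 1968 };
#define SETUP_ALLOC_COUNT (sizeof(setup_allocs) / sizeof(setup_allocs[0]))

/* Sum a list of allocation sizes. */
size_t sum_bytes(const size_t *sizes, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += sizes[i];
    }
    return total;
}

/* Extra bytes needed beyond setup for a heap of heap_size bytes to be
 * just large enough (here: the certificate chain parsing and the rest
 * of the handshake). */
size_t handshake_extra(size_t heap_size, size_t setup_total)
{
    return heap_size - setup_total;
}
```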

Is this normal? What are the exact allocation requirements of ssl_parse_certificate_chain()? How should I size the static buffer? Is it possible to reduce it in some way?

Hi @pozzugno
Thank you for your question

The bottleneck in the TLS handshake is usually the certificate message. Its size depends on how many certificates the message contains and on the signature algorithm used to sign them, as RSA keys are much bigger than ECDSA keys at the same security strength.
Since you are planning to connect to only one MQTT server, it is safe to assume you can reduce the size of MBEDTLS_SSL_IN_CONTENT_LEN as well. Have you tried reducing it to 4096 bytes (depending on the size of the certificate message)?

I would suggest you disable RSA in your configuration, assuming the MQTT server you wish to connect to supports ECDSA based ciphersuites.

Please report the new memory figures once you have applied my suggestions.
Regards,
Mbed TLS Team member
Ron

Since you are planning to connect only to one MQTT server, it is safe to assume you can reduce the size of MBEDTLS_SSL_IN_CONTENT_LEN as well. Have you tried reducing it to 4096 bytes(depending on the size of the certificate message)?

This is a good suggestion. 4096 is too low, so I set it to 8192.

I would suggest you disable RSA in your configuration, assuming the MQTT server you wish to connect to supports ECDSA based ciphersuites.

Even though AWS says it supports ECDSA, when I disable the RSA ciphersuites, leaving only ECDSA, the AWS server replies with a fatal alert (Handshake Failure). Enabling RSA works well. I will try to check this with AWS directly (maybe you or someone else in the forum knows something?).

Now back to my small RAM (two separate 32kB banks). I increased complexity by adding a root CA for server authentication and a client certificate (with its private key) to let the server authenticate the client (it seems this is mandatory for the AWS IoT server).

So I call altcp_tls_create_config_client_2wayauth (an lwIP function), which parses and imports certificates and keys. This function alone allocates around 8kB!!
I declared three static const unsigned char arrays for the root CA, the client certificate and the client private key. They are in PEM format. I think mbedTLS parses and imports them into suitable data structures. Is it possible to create those const data structures at compile time so I can put them in ROM, saving RAM?

Connection setup allocates around 10kB (8kB for input buffer and 2kB for output buffer). This is ok.

Lastly, during the handshake, mbedTLS allocates another 32kB!!! After establishing the MQTT connection (in the Windows build), the static allocator has reached around 50kB of allocation. This is too much for my poor 32kB+32kB RAM MCU.

Of course, this 50kB figure doesn’t account for fragmentation that could appear in the static buffer. It is calculated as the greatest pointer address returned by mbedtls_calloc, relative to the origin of the static buffer.
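The measurement described here can be sketched as a small high-water-mark tracker. This is only an illustration under assumptions: hwm_tracker, hwm_init and hwm_record are hypothetical names, the tracker would have to be called from a wrapper around the allocator, and unlike the description above it also adds the allocation size, so the peak is the end of the highest block rather than its start address.

```c
#include <assert.h>
#include <stddef.h>

/* Tracks the highest offset reached inside a static arena. */
typedef struct {
    const unsigned char *base; /* origin of the static buffer */
    size_t peak;               /* highest end offset seen so far, in bytes */
} hwm_tracker;

void hwm_init(hwm_tracker *t, const unsigned char *base)
{
    t->base = base;
    t->peak = 0;
}

/* Record one allocation of `size` bytes at `ptr` and return the peak.
 * `ptr` must point into the arena that starts at t->base. */
size_t hwm_record(hwm_tracker *t, const void *ptr, size_t size)
{
    const unsigned char *p = (const unsigned char *)ptr;
    assert(p >= t->base);

    size_t end = (size_t)(p - t->base) + size;
    if (end > t->peak) {
        t->peak = end;
    }
    return t->peak;
}
```

If memory serves, Mbed TLS can also report peak usage itself through mbedtls_memory_buffer_alloc_max_get() when MBEDTLS_MEMORY_DEBUG is enabled, which may be simpler than tracking pointers by hand.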

Even if I will be able to switch from RSA to ECDSA, do you think I will have big improvements on RAM usage?

Since altcp_tls_create_config_client_2wayauth is an lwIP function, there is not much I can help with on that matter. However, do you really need this function? Can’t you just use the Mbed TLS functions
mbedtls_ssl_conf_ca_chain() and mbedtls_ssl_conf_own_cert()? What’s the added value of using the lwIP function?

Lastly, during handshake, mbedTLS allocates other 32kB !!!

Do you happen to know what modules utilize the heap?

As you can see from this PR, we are working on reducing RAM usage, specifically in the x509 module.

If you use ECDSA instead of RSA, the certificate message would be smaller (perhaps even 4096 would be enough), and the RAM usage during the handshake would probably be smaller as well, as certificates signed with ECDSA are significantly smaller than those signed with RSA at the same security strength.

Are you using your certificates in PEM or DER format?

Since altcp_tls_create_config_client_2wayauth is an lwIP function, there is not much I can help with on that matter. However, do you really need this function? Can’t you just use the Mbed TLS functions mbedtls_ssl_conf_ca_chain() and mbedtls_ssl_conf_own_cert()? What’s the added value of using the lwIP function?

The lwIP function altcp_tls_create_config_client_2wayauth actually calls the mbedTLS functions mbedtls_ssl_conf_ca_chain() and mbedtls_ssl_conf_own_cert(). I don’t think there would be much improvement if I called the mbedTLS functions directly.
lwIP includes an adaptation layer between its TCP/IP stack and the mbedTLS library, named altcp_tls_mbedtls, and I’m using it.

Do you happen to know what modules utilize the heap?

Parsing certificates and initial setup, 6kB.
Connection 8+2kB (input and output buffer).
Handshake: mbedtls_ssl_parse_certificate (10kB); ssl_parse_server_key_exchange (6kB); ssl_write_client_key_exchange (13kB); a few other kilobytes…

I’m using a static buffer for calloc. The allocation figures above were measured as the difference between the greatest pointer returned by calloc and the previous greatest pointer.
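As a rough budget check, the per-phase figures above can be tabulated and compared against a single 32 kB bank. The struct and function names are hypothetical, and the figures are the approximate kilobyte values from the post, so the result is only indicative.

```c
#include <stddef.h>

/* One measured phase and its approximate heap growth in kB. */
typedef struct {
    const char *phase;
    size_t kb;
} phase_cost;

static const phase_cost phases[] = {
    { "certificate parsing and initial setup", 6 },
    { "input buffer",                          8 },
    { "output buffer",                         2 },
    { "handshake: parse certificate",         10 },
    { "handshake: server key exchange",        6 },
    { "handshake: client key exchange",       13 },
};
#define PHASE_COUNT (sizeof(phases) / sizeof(phases[0]))

/* Total heap growth over all listed phases, in kB. */
size_t total_kb(const phase_cost *p, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += p[i].kb;
    }
    return total;
}

/* Returns 1 if the budget fits in a bank of bank_kb kilobytes, else 0. */
int fits_in_bank(size_t used_kb, size_t bank_kb)
{
    return used_kb <= bank_kb;
}
```

The listed phases add up to 45 kB, which leaves about 5 kB for the "few other kilobytes" and is consistent with the roughly 50 kB peak reported earlier.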

As you can see from this PR, we are working on reducing RAM usage, specifically in the x509 module.

Yes, I hope this can be improved. I was thinking of another solution. Why not create a tool that “converts” a certificate to a binary blob that can be linked into the program, so that it goes in Flash and not RAM? This binary blob should be identical to the result of parsing the certificate at run time. In this case, at run time we “attach” the binary blob as the certificate, and we don’t need the original certificate.

If you use ECDSA instead of RSA, The certificate message would be smaller ( perhaps even 4096 would be enough), and probably the RAM usage during handshake would be smaller as well, as certificates signed with ECDSA instead of RSA are significantly smaller, with same security strength.

I tried to enable only MBEDTLS_TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, but the server replies with a fatal error immediately after the Client Hello message. It seems the server doesn’t support this ciphersuite, however here Amazon says it supports this (I tried to ask in their forum, but without an answer until now).
So I enable MBEDTLS_TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 too.
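For illustration, a ciphersuite preference list of the kind discussed here could be written as below. This is a sketch under assumptions: the SUITE_* macros and preferred_suites are hypothetical names, and the values are the IANA code points from RFC 5289, which Mbed TLS exposes under the MBEDTLS_TLS_ECDHE_*_WITH_AES_128_GCM_SHA256 names. Such a zero-terminated int array is what mbedtls_ssl_conf_ciphersuites() expects at run time.

```c
/* IANA TLS ciphersuite code points (RFC 5289). */
#define SUITE_ECDHE_ECDSA_AES128_GCM_SHA256 0xC02B
#define SUITE_ECDHE_RSA_AES128_GCM_SHA256   0xC02F

/* Preference-ordered, zero-terminated list. It is static because the
 * TLS configuration keeps a pointer to it rather than copying it. */
static const int preferred_suites[] = {
    SUITE_ECDHE_ECDSA_AES128_GCM_SHA256, /* tried first, if the server allows */
    SUITE_ECDHE_RSA_AES128_GCM_SHA256,   /* fallback that the server accepts */
    0                                    /* terminator */
};
```

Offering both suites lets the server pick the one it supports, at the cost of keeping the RSA code compiled in.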

At the moment I’m using AWS-generated certificates for the client, and they use RSA. I know it’s possible to use a personal certificate, but I haven’t checked yet.

The CA root certificates for AWS server authentication are here. As you can see, the Amazon root CAs are RSA- or ECC-based certificates. I’m using RSA. I tried to use the ECC-based certificates, but the handshake fails in mbedtls_x509_crt_verify_restartable. I don’t know why; it seems the flags aren’t zero at the end of the function, so the error code MBEDTLS_ERR_X509_CERT_VERIFY_FAILED is raised.

I compared RSA CA1 and ECC CA3 with MBEDTLS_SSL_VERIFY_NONE, and the difference in heap allocation after a successful connection is around 1kB.

Are you using your certificates in PEM or DER format?

I was using PEM-encoded certificates. After your question, I tried converting the PEM certificates to DER. I didn’t notice big differences in RAM allocation during parsing in TLS connection setup. I observed a difference of only around 30 bytes.

Thank you for your information.

May I know what Mbed TLS version you are using?
I believe that this merged PR would help you in your quest for RAM reduction, as long as you have MBEDTLS_SSL_KEEP_PEER_CERTIFICATE undefined.

I was thinking at another solution. Why not creating a tool that “converts” certificate to a binary blob that can be linked in the program, so going in Flash and not RAM?

This could work for the “Parsing certificates and initial setup” part, and you can probably try it in your code (storing the contexts in binary format in your flash and passing their address). However, it will not work for parsing the server certificates and key exchanges, as that has to happen during the negotiation, and you should not assume in advance what the server certificates are.

Regards,
Ron

May I know what Mbed TLS version you are using?

I’m using mbedTLS 2.16.0 as downloaded from your download web page.

I believe that this merged PR would help you in your quest for RAM reduction, as long as you have MBEDTLS_SSL_KEEP_PEER_CERTIFICATE undefined.

I downloaded the source code from GitHub and undefined the MBEDTLS_SSL_KEEP_PEER_CERTIFICATE macro in config.h. Now the RAM requirement drops from around 50kB to 36kB. It doesn’t fit in a 32kB RAM bank of my MCU, but it’s a big improvement.

I was thinking at another solution. Why not creating a tool that “converts” certificate to a binary blob that can be linked in the program, so going in Flash and not RAM?

This could work for the “Parsing certificates and initial setup”, and you can probably try this in your code( having the contexts stored in binary format in your flash and send an address to them), however for parsing the server certificates and key exchanges will not do, as it should be during the negotiation, and you should not assume in advance what the server certificates are.

Maybe with this “trick” I could save 4-5kB and stay below my upper 32kB limit. I will try.

@pozzugno thank you for your input. 14KB RAM reduction is a lot!

Another API that will assist you in reducing RAM for your own certificates was introduced in this merged PR.

It introduces the API mbedtls_x509_crt_parse_der_nocopy(), which parses a certificate without keeping a local copy of it. This requires your certificate to be in DER format, though.

This API is used by my previous reference for parsing the peer certificate when MBEDTLS_SSL_KEEP_PEER_CERTIFICATE is undefined (which already saved you 14 KB). Now I am suggesting you use this API on your local CA certificate list and own certificate (setup phase).

I hope this will decrease even further your RAM usage, below your 32 KB limit.
Regards,
Ron

I tested mbedtls_x509_crt_parse_der_nocopy() instead of the simpler mbedtls_x509_crt_parse() and, after converting the certificate to DER format, it works. I can connect to the server.

Unfortunately the RAM requirement only drops from 36kB to 34kB. It’s a small improvement, but better than nothing.

Are there any other improvements for mbedtls_pk_parse_key()?

Hi @pozzugno

If mbedtls_x509_crt_parse_der_nocopy() in your initialization and setup reduces the RAM from 6KB to 4KB, it probably means that the certificate is 2 KB in size and the other 4 KB are unrelated (for example, the public key).
Have you tried reducing the value of MBEDTLS_SSL_IN_CONTENT_LEN to 6 KB?

In addition, depending on the RSA key size that you are using, you can modify the value of MBEDTLS_MPI_MAX_SIZE. The default is 1024 bytes, which fits an 8192-bit RSA key. However, if the certificates are signed with at most a 4096-bit key, you can change this value to 512 (and even to 256 if it is a 2048-bit key). This should reduce the memory footprint some more.
This article can give you some more information.
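The relation between RSA key size and MBEDTLS_MPI_MAX_SIZE is simply bits divided by 8, rounded up. A minimal sketch follows; mpi_bytes_for_bits is a hypothetical helper, and the 2048-bit choice is an assumption that holds only if no larger RSA key appears anywhere in the handshake, including the server's certificate chain.

```c
#include <stddef.h>

/* config.h fragment: enough for RSA keys of up to 2048 bits (assumption). */
#define MBEDTLS_MPI_MAX_SIZE 256

/* Bytes needed to hold an integer of `bits` bits, rounded up. */
size_t mpi_bytes_for_bits(size_t bits)
{
    return (bits + 7) / 8;
}
```

For example, 8192 bits gives the default of 1024 bytes, 4096 bits gives 512 and 2048 bits gives 256.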

Is there some other improvements on mbedtls_pk_parse_key() ?

Unfortunately, not at the moment.

Regards,
Ron

After many other tests I found the following.

First of all, the AWS IoT gateway can be contacted through two different endpoints: Verisign and ATS. The first is deprecated, because its certificate is issued by a CA that was deprecated by most modern browsers in October 2018. The second is the recommended endpoint.

However, the ATS endpoint doesn’t support ECDHE_ECDSA ciphersuites. Indeed, if I disable the ECDHE_RSA ciphersuites and keep only ECDHE_ECDSA, the server immediately answers with a Handshake Failure. If I enable an ECDHE_RSA ciphersuite, it works.

Regarding the client certificate (the Thing in IoT), the one created in the AWS Management Console is RSA-based. It’s very big for many resource-constrained devices. Luckily it’s possible to use the AWS CLI (Command Line Interface tool) to create and attach an EC-based certificate to a Thing.
With an EC-based client certificate I’m able to connect to the server with my small RAM.

However I don’t understand two things.

First of all, even with EC-based client certificate, I can’t disable RSA algorithm completely from mbedTLS configuration. If I do, the parsing of client certificate gives an error.
What? Isn’t the certificate EC-based? Yes, however RSA is needed. Maybe because the digital signature algorithm of the certificate is always RSA, only the public key is ECC (I see this by opening the .crt file in Windows).
Is it possible to create a completely EC-based certificate without RSA, so I can disable it completely? I don’t know, this should be a question for Amazon.

Another fun thing.

mbedTLS needs 31656 bytes during handshake with ATS endpoint (RSA), but needs 32264 bytes during handshake with Verisign (ECDSA).

I understood ECDSA was much lighter in terms of RAM requirements.

Thank you for your input.

I understood ECDSA was much lighter in terms of RAM requirements.

Yes, ECDSA should be lighter in terms of RAM requirements, as the key size is smaller. Do you happen to have additional information about the RAM usage bottleneck?
Regards