Hi, I’m currently working on a STM32L552 (clock frequency 110MHz, I use the TRNG of the MCU).
I’m working on a project that needs the generation of a RSA key pair.
During the development I noticed that the time used to generate the key pair is not always the same.
I have measured the execution times for keys of 512, 1024 and 2048 bits (I’ve measured the RSA keys generation 10 times for each key size).
These are the times (for RSA keys generation):
- 512 bit keys takes from 2 to 4 sec
- 1024 bit keys takes from 10 to 50 sec
- 2048 bit keys takes from 48sec to 8min
Why this important difference in times (specially for 2048 bit keys)?
Could be due to avoid side-channel attacks? (but it seems strange such an important difference in time).
Note: these times are calculated in debug mode with -O0 optimization.
You might check how many random number operations are done in each case.
You might want to seed a pseudo-random number generator with the TRNG and then use the PRNG from then on. mbedTLS has a way of doing that where you give the PRNG a function to call for your TRNG. See mbedtls_ctr_drbg_init() and mbedtls_ctr_drbg_seed().
I think the key generation is looking for prime numbers by getting a random number and then checking for primes. That would explain part of the variation in times. Generating prime numbers may also be a variable time operation if they check for and reject long runs of 0s or 1s.
Of course, you might have already done these things. I hope this is useful.
Thanks for the reply.
I confirm that I seed the pseudo-random number generator with the TRNG.
I’ve produced the following table:
[Table 1]
The column “Time” refers to the time (in ms) used to generate the rsa key pair.
The column “Operations” refers to the number of operations executed in order to find the prime numbers P, Q of the rsa key pair. I have obtained this number by the adding a counter before the invocation of
MBEDTLS_MPI_CHK( mbedtls_mpi_fill_random( X, n * ciL, f_rng, p_rng ) ); inside the method mbedtls_mpi_gen_prime of the file bignum.c.
From this data it’s possible to see that the more operations are executed to find the prime numbers, the more is the time used to generate the key pair. The only exceptions are data in the table with “Order” 7 and 4 which are inverted (with a negligible difference).
[Table 2]
For the sake of clarity:
- Data listed in Table 1 were obtained by executing the code in debug mode and with -O0 compile optimization
- Data listed in Table 2 were obtained by executing the code NOT in debug mode and with -O3 compile optimization
I hope this data can be helpful.