TINYAES128 Template attack failure for unknown reasons

Hi everyone.

I use the CWLite-32 ARM (STM32F), and my goal is to perform a template attack on TINYAES128 to recover the full 16-byte secret key. I do not know the key: encryption is done by a given .hex file that has the key baked in (that file always uses this key and is otherwise identical to the regular TINYAES128 implementation).
I have read material on template attacks from a few sources, including the deprecated ChipWhisperer wiki, and written my own code to perform the attack on all 16 bytes.

After completing the attack and recovering a best guess for the key, I tried encrypting bytearray([0]*16) first using the recovered key, and then using the secret key (I made sure to reprogram the target before each of these encryptions to use the correct implementation).
I noticed that the two ciphertexts are different, so the recovered key is almost certainly incorrect.

I started changing things in the code to figure out what the problem is, with no luck.
Here’s a list of what I tried and what I’ve noticed so far:

  • Pick POIs once based on SAD and once based on SNR
  • Use 1, 2, …, up to 10 POIs for each subkey/byte (instead of 5)
  • Make sure attack traces are captured immediately after the profiling traces
  • Use 5000 to 30000 profiling traces and 100 to 5000 attack traces
  • In the profiling phase, capture either 1 trace per random key-plaintext pair, or 10 traces per pair averaged into a mean trace (to reduce noise)
  • Capture 3000 to 5000 samples per trace (in different attempts)
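For reference, my SAD-based POI picking is roughly the sketch below (variable names are my own; `class_means` holds the mean trace of each Hamming-weight class):

```python
import numpy as np

def select_pois_sad(class_means, num_pois=5, min_spacing=5):
    """Pick POIs by summing absolute differences between all pairs of class means,
    then repeatedly taking the highest peak while blanking a window around it
    so the chosen POIs don't cluster on one peak."""
    num_classes, num_samples = class_means.shape
    sad = np.zeros(num_samples)
    for i in range(num_classes):
        for j in range(i + 1, num_classes):
            sad += np.abs(class_means[i] - class_means[j])
    pois = []
    for _ in range(num_pois):
        p = int(np.argmax(sad))
        pois.append(p)
        lo = max(0, p - min_spacing)
        hi = min(num_samples, p + min_spacing + 1)
        sad[lo:hi] = 0  # suppress neighbors of the chosen point
    return pois
```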

After all of these, I still end up getting an incorrect key.
I have noticed a few things that might indicate the source of the problem, but I couldn’t make use of them:
The covariance matrices (created in the profiling phase) are filled with very small values, usually ranging from 1e-4 to 1e-8.
Example image, covariance matrix for first subkey/byte, hamming-weight 0:

Using SciPy’s multivariate normal, the .logpdf() function works, but trying to use np.log(multivariate_normal.pdf()) throws a division-by-zero error. I assume this is because the pdf values underflow to zero, but I don’t know if that’s normal. Maximum log-likelihood values are usually on the scale of (negative) millions.
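The underflow itself seems explainable: with log-likelihoods in the negative thousands or millions, the pdf is e^(-thousands), far below the smallest representable double (~1e-308), so pdf() returns exactly 0.0 and np.log(0) complains. A small sketch that reproduces it (toy numbers, not my actual templates):

```python
import numpy as np
from scipy.stats import multivariate_normal

# a template whose covariance has very small entries, like in the attack
mean = np.zeros(5)
cov = 1e-6 * np.eye(5)
rv = multivariate_normal(mean, cov)

x = np.full(5, 0.05)       # a point far from the mean in Mahalanobis terms
print(rv.logpdf(x))        # finite, large negative value (around -6000)
print(rv.pdf(x))           # underflows to exactly 0.0
with np.errstate(divide="raise"):
    try:
        np.log(rv.pdf(x))  # log(0) -> divide-by-zero
    except FloatingPointError as e:
        print("np.log(pdf) failed:", e)
```

So working with .logpdf() directly (rather than np.log of the pdf) is the numerically sane option here.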
Additionally, each time I capture new traces and run the exact same code, the recovered key changes. Within a single run, the best guess for each subkey/key-byte does become dominant once enough attack traces are used, but acquiring a fresh set of attack traces yields a different dominant value.

One important note I’d add: I even tried copy-pasting the code from the tutorial in the CW wiki verbatim into a loop of 16 iterations to recover each byte, and still got an incorrect key (with covariance matrices of the same scale as before), which makes me think the problem is not in the code.
Turning off my antivirus, using a different USB port and even a different computer also did not help.

At that point, I’m quite clueless on what could be the root of the problem, so any assistance will be greatly appreciated!

Can you do a sanity check attack with known key? That may give some hints about what’s wrong.

Hey.

So after I made the post, I did try setting the unknown key myself to a known value to see what I get.
The result still doesn’t match the key, and it changes every time I capture new attack traces.
If it means anything, I’ve noticed that usually either no subkey matches the correct key, or just 1 or 2 subkeys match the correct key.

Well, at least that’s consistent :slight_smile: So what exactly do you do? Template attack on the key itself, or template-based DPA on some AES intermediate? Do you target Hamming weight or exact value?

I do the template attack on each subkey separately, attacking the output of the first round’s SBOX. The templates are assigned based on the Hamming weight of that output instead of the exact value, so I used 10000 traces in the profiling phase (each captured 10 times and averaged, to reduce noise), then 1000 traces in the attack phase.
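To be concrete, this is roughly how I label the templates (a sketch; here the S-box is generated on the fly from the GF(2^8) inverse and affine map rather than hard-coded):

```python
def _gf_mul(a, b):
    """Multiply in GF(2^8) with the AES modulus x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p

def _gf_inv(a):
    # brute-force inverse; fine for building a 256-entry table once
    if a == 0:
        return 0
    return next(x for x in range(1, 256) if _gf_mul(a, x) == 1)

def _affine(b):
    rot = lambda x, n: ((x << n) | (x >> (8 - n))) & 0xFF
    return b ^ rot(b, 1) ^ rot(b, 2) ^ rot(b, 3) ^ rot(b, 4) ^ 0x63

SBOX = [_affine(_gf_inv(i)) for i in range(256)]

def hw_class(pt_byte, key_byte):
    """Template class: Hamming weight of the first-round S-box output."""
    return bin(SBOX[pt_byte ^ key_byte]).count("1")
```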

I would check the process step by step…

  • Does the position of the POIs look reasonable compared to the shape of the traces?
  • If you do alignment, are both sets (for training and for attack) aligned using the same process?
  • If you don’t use alignment, do the traces look well aligned, or could alignment help?

Additional thoughts:

  • Are your covariance matrices with small values computed from the averaged traces? If yes, that might explain the relatively small values, as averaging decreases the variance of traces.
  • If you use one covariance matrix per template, you could try to use pooled covariance matrix instead: Efficient Template Attacks
  • Try targeting the exact value instead of HW.
  • If one or two subkeys are correctly identified, maybe you just need more traces for training. It definitely makes sense to run your analysis on known-key traces first, then you can fine tune the attack and run it on unknown key.
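To make the pooled-matrix idea concrete, a minimal numpy sketch (variable names are mine): center every trace by the mean of its own class, then compute one covariance over all the residuals and share it across all templates.

```python
import numpy as np

def pooled_covariance(poi_traces, labels):
    """One covariance shared by all templates: center each trace by its
    class mean, then compute a single covariance over all residuals."""
    poi_traces = np.asarray(poi_traces, dtype=float)
    labels = np.asarray(labels)
    residuals = np.empty_like(poi_traces)
    class_means = {}
    for c in np.unique(labels):
        idx = labels == c
        class_means[c] = poi_traces[idx].mean(axis=0)
        residuals[idx] = poi_traces[idx] - class_means[c]
    return class_means, np.cov(residuals, rowvar=False)
```

Each template then keeps its own mean vector but uses this single covariance matrix.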

The position of the POIs generally corresponds to where the SBOX calculation is performed, so I believe it looks fine. Plotting the SNR shows very high peaks at a few select points and very low values elsewhere, which I think is the expected outcome.
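For reference, the SNR I plot is computed roughly like this (my own helper: variance of the class means over the mean of the class variances, per sample):

```python
import numpy as np

def snr(traces, labels):
    """Per-sample SNR: var of class means / mean of class variances."""
    traces = np.asarray(traces, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    means = np.array([traces[labels == c].mean(axis=0) for c in classes])
    noise = np.array([traces[labels == c].var(axis=0) for c in classes])
    return means.var(axis=0) / noise.mean(axis=0)
```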

I don’t use alignment, but the traces look well aligned. I will re-check that to be completely sure.

Also, for now, I do compute the covariance matrices from the averaged traces, but I also used non-averaged traces for the profiling phase in earlier attempts and calculated the covariance matrices using those. Both have led to such small values.

Sadly I can’t target the exact value instead of HW, but I will check out the link you’ve added, try capturing many more traces this time, and update afterwards.

Thanks!


Cool, I’d be curious to hear an update :slight_smile: Btw, to pinpoint the important part of the paper: the pooled matrix is just one matrix computed from all traces (of all classes) and shared by the template of every class. This approach often gives better results.

Hey - so a quick update:
I didn’t have much time, so I haven’t yet tried the pooled covariance matrix instead of the per-class covariance matrices, but I’ve now captured 250k random profiling traces, this time without averaging 10 traces per key-plaintext pair. Then I captured 1k attack traces again and attempted the attack.
I got nearly identical POIs to the previous case where I used 5k averaged profiling traces; the SNR peaks for each subkey appear at the same sample indices, and the values didn’t change much (they did decrease a little, which is probably expected).

I performed the attack on the known key that I set after we talked previously, and only 1 recovered byte was actually correct, similarly to before.

But - I noticed something strange. Here’s a plot of two traces captured for the same plaintext: one after programming the target with the .hex file containing the constant “unknown” key, and the other after programming the target with the .hex file that allows setting the key, where I set the key to the “unknown” key used in the other file. Here are the results:

This happens for every other choice of plaintext I’ve attempted, and does not seem to be a result of random noise. I assume that somehow, having the option to set the key affects the target’s power consumption, even though the traces are only captured over the encrypt function.
I could try to standardize the traces into [0, 1], but I’m not sure if that would lose information or if the problem lies elsewhere.
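What I have in mind for the standardization is something like this sketch (min-max per trace, or alternatively a z-score); the same scaling would of course have to be applied to both the profiling and the attack traces:

```python
import numpy as np

def minmax_scale(traces):
    """Scale each trace independently into [0, 1]."""
    traces = np.asarray(traces, dtype=float)
    lo = traces.min(axis=1, keepdims=True)
    hi = traces.max(axis=1, keepdims=True)
    return (traces - lo) / (hi - lo)

def zscore(traces):
    """Alternative: remove each trace's mean and divide by its std."""
    traces = np.asarray(traces, dtype=float)
    mu = traces.mean(axis=1, keepdims=True)
    sd = traces.std(axis=1, keepdims=True)
    return (traces - mu) / sd
```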

Also, since I built the firmware for both files myself, I know for a fact that the only difference between the two is using a constant key vs. having a set_key() function with no default key.