TINYAES128 Template attack failure for unknown reasons

Roei · June 14, 2025, 5:02pm

Hi everyone.

I use the CWLite-32 ARM (STM32F), and my goal is to perform a template attack on TINYAES128 to recover the full secret key (16 bytes). I do not know the key, as encryption using the secret key is done with a given .hex file which used it (that file always uses this key and is otherwise identical to the regular TINYAES128 implementation).
I have read material on template attacks from a few sources, including the deprecated ChipWhisperer wiki, and written my own code to perform the attack on all 16 bytes.

After completing the attack and recovering a best guess for the key, I tried encrypting bytearray([0]*16) first using the recovered key, and then using the secret key (I made sure to reprogram the target before each of these encryptions to use the correct implementation).
I noticed that the 2 ciphertexts obtained are different, so the recovered key is probably incorrect.

I started changing stuff in the code trying to figure out what the problem is with no luck.
Here’s a list of what I tried to do, and what I’ve noticed so far:

Pick POIs once based off SAD and then based off SNR
Use 1, 2, …, up to 10 POIs for each subkey/byte (instead of 5)
Make sure attack traces are captured immediately after the profiling traces
Use 5000 to 30000 profiling traces and 100 to 5000 attack traces
In the profiling phase I have attempted both capturing 1 trace for each random key-plaintext pair, and capturing 10 traces for each pair and saving the mean trace (to reduce noise)
Capturing 3000 to 5000 samples per trace (at different attempts)

After all of these, I still end up getting an incorrect key.
I have noticed a few things that might indicate the source of the problem, but I couldn’t make use of them:
The covariance matrices (created in the profiling phase) are filled with very low-scaled values, usually ranging from 1e-4 to 1e-8.
Example image, covariance matrix for first subkey/byte, hamming-weight 0:

Using SciPy’s multivariate normal, the .logpdf() function works, but trying to use np.log(multivariate_normal.pdf()) throws a division by 0 error. I assume this is because the values are incredibly small, but I do not know if it’s usual. Maximum log-likelihood values are usually in the scale of (negative) millions.
Additionally, each time I capture new traces and run the exact same code, the recovered key changes, yet the best guess of each subkey/key-byte appears to be dominant after enough attack traces, even though acquiring new attack traces yields a different value.

One important note I’d add is that I even tried completely copy-pasting the code from the tutorial in the CW wiki into a loop of 16 iterations to recover each byte, and still got an incorrect key (and the covariance matrices are of the sane scale as previously), which makes me think the problem is not in the code.
Turning off my antivirus, using a different USB port and even a different computer also did not help.

At that point, I’m quite clueless on what could be the root of the problem, so any assistance will be greatly appreciated!

TurangaLeela · June 22, 2025, 8:15am

Can you do a sanity check attack with known key? That may give some hints about what’s wrong.

Roei · June 22, 2025, 9:08am

Hey.

So after I’ve made the post I did try to set the unknown key myself, to a known value, and see what I get.
It seems like the result still doesn’t match the key and changes every time I capture new attack traces.
If it means anything, I’ve noticed that usually either no subkey matches the correct key, or just 1 or 2 subkeys match the correct key.

TurangaLeela · June 22, 2025, 9:27am

Well, at least that’s consistent So what exactly do you do? Template attack on the key itself, or template-based DPA on some AES intermediate? Do you target Hamming weight or exact value?

Roei · June 22, 2025, 10:50am

I do the template attack on each subkey separately, but I’m attacking the output of the first round’s SBOX. The templates are assigned based off the hamming-weight of said output, instead of exact value, so I used 10000 traces in the profiling phase (and captured each one 10 times and averaged, to reduce noise), then used 1000 traces in the attack phase.

TurangaLeela · June 22, 2025, 2:00pm

I would check the process step by step…

Does the position of the POIs look reasonable compared to the shape of the traces?
If you do alignment, are both sets (for training and for attack) aligned using the same process?
If you don’t use alignment, do the traces look well aligned, or could alignment help?

Additional thoughts:

Are your covariance matrices with small values computed from the averaged traces? If yes, that might explain the relatively small values, as averaging decreases the variance of traces.
If you use one covariance matrix per template, you could try to use pooled covariance matrix instead: Efficient Template Attacks
Try targeting the exact value instead of HW.
If one or two subkeys are correctly identified, maybe you just need more traces for training. It definitely makes sense to run your analysis on known-key traces first, then you can fine tune the attack and run it on unknown key.

Roei · June 22, 2025, 5:14pm

The position of the POIs generally corresponds to where the SBOX calculation is performed so I believe it looks fine. Plotting the SNR does show very high peaks at a few select points, and is very low at the rest, which I think is the expected outcome.

I don’t use alignment, but the traces look well aligned. I will re-check that to be completely sure.

Also, for now, I do compute the covariance matrices from the averaged traces, but I also used non-averaged traces for the profiling phase in earlier attempts and calculated the covariance matrices using those. Both have led to such small values.

Sadly I can’t target the exact value instead of HW, but I will check out the link you’ve added and try capturing much more traces this time, and update afterwards.

Thanks!

TurangaLeela · June 22, 2025, 7:24pm

Cool, I’d be curious to hear an update Btw, to pinpoint the important from the paper - pooled matrix is just one matrix computed from all traces (of all classes) and shared in template of every class. This approach often gives better results.

Roei · June 26, 2025, 11:17am

Hey - so a quick update:
I didn’t have much time so I didn’t attempt using the pooled covariance matrix instead of the per-group covariance matrix, but, I’ve now captured 250k random profiling traces, this time without averaging each key-plaintext pair using 10 traces. Then, captured 1k attack traces again, and attempted the attack.
I’ve gotten the identical / near identical POIs to the previous case where I used 5k averaged profiling traces, and the SNR peaks of each subkey appear to be in the same sample indexes, and values didn’t change by much (but still decreases a little, which is probably expected).

I performed the attack on the known key that I set after we talked previously, and only 1 recovered byte was actually correct, similarly to before.

But - I noticed something strange. Here’s a plot of 2 traces captured for the same plaintext, one by programming the target to use the .hex file with the constant “unknown” key, and the other by programming the target to use the .hex file which allows setting the key, and I set the key to be the “unknown” key used in the other file. Here are the results:

This does happen for any other choice of plaintext that I’ve attempted, and does not seem to be a result of random noise. I assume that somehow, having the option to set the key affects the target’s power consumption, even though the traces are only captured over the encrypt function.
I can try and standardize the traces to be between [0, 1] but I’m not sure if it could result in loss of data or if maybe the problem even lies elsewhere.

Also, as I built the firmware for both files myself, I know for a fact that the only difference between the 2 is using a constant key vs having a set_key() function with no default key.

jpthibault · June 26, 2025, 10:20pm

That is a larger difference than I would expect to see, but it is possible for the sending of the key to affect the power trace. If you are using the slower SS version 1 protocol, then it takes quite a while to send the key, which spaces out the encryptions more. This makes it so that when the target begins the encryption, it has been idle for longer than it would have been if the key wasn’t sent, and this can lead to differences in the power trace.

You can add always_send_key=True to cw.capture_trace() to eliminate this variation.

Roei · June 27, 2025, 11:18am

I received captured traces from my instructor that he captured himself to test if the code itself works properly, and after checking it seems like the issue is indeed with the traces.
We thought that the chip itself may be faulty or cause the problem as I capture my traces exactly like him, but after digging some more, I’ve noticed that capturing traces from the compiled hex file we’re given seems to be fine, and the problem is with file we compile ourself to capture profiling traces.

Since I do know the actual correct key now, I took the C source file and replaced the erased secret key with the one I found, which means this code is identical to the one of the compiled hex file we’re given. I compiled the code on my own, captured traces (constant key - only encryptions without sending new keys), and a similar notable difference can be observed again.

My thought is that the way the hex file we’re given was compiled is different from the way I compile the same C source code file (maybe the make-file we’re given is problematic, or a different OS, etc…), but this isn’t an issue related to ChipWhisperer itself in that case, so I’ll have to wait and update if that was indeed the issue after talking with my instructor again.

Either way - setting always_send_key=True is something I will take into account the next time I average traces of the same input, it will probably improve the results.

jpthibault · June 27, 2025, 3:43pm

Different compiler versions can absolutely have an impact.