Masked AES implementation for CW targets

Hi,

I was wondering if someone knows of any masked AES implementations that are ready-to-use on a CW target?

Thanks!

Yes, several implementations have been contributed by @jmichel:

You’ll find them here: https://github.com/newaetech/chipwhisperer/tree/develop/hardware/victims/firmware/crypto

Thanks! But I’m having trouble cloning the repository, I get an error that it’s not found:

Does it mean I don’t have permission to clone it?

That’s not how git works, you can’t clone an arbitrary subdir of a repo.

First, clone the entire ChipWhisperer repo, if you haven’t already:
git clone https://github.com/newaetech/chipwhisperer.git

Then, cd to hardware/victims/firmware/crypto and grab the submodules you want, e.g.:
git submodule update --init Higher-Order-Masked-AES-128/

Thank you for the prompt reply. I’m quite a newbie with these things, and I’m also unsure how to adjust the lines for compiling and programming. I therefore need a little help with how to change these lines in order to use the masked implementation:

I’m sorry for very elementary questions, I’m just not so versed in these things yet. I appreciate your help.

As an alternative: for our research, I’m trying to keep pre-compiled firmware for all targets and all AES implementations in a GitHub repository. That’s the only way I can ensure reproducibility of our results.
You can find/download all ELF + HEX files there: https://github.com/jmichelp/chipwhisperer-firmware
This way you only care about flashing the correct file.

Thank you! This was invaluable, I really appreciate it. I tried the “MASKEDAES_ANSSI” firmware for the STM32F3, and the number of leakage points in a TVLA dropped from 16350/20000 with the unmasked TinyAES128 implementation down to 708/20000. I was wondering if you could either explain the different implementations or direct me to documents where I can read about them. In particular, I am curious about the various implementations for the STM32F3, the ones named
“MASKEDAES_ANSSI”
“MASKEDAES_ANSSI+KEYSCHEDULE”
“MASKEDAES_ANSSI+UNROLLED”
“MASKEDAES_ANSSI+UNROLLED+KEYSCHEDULE”
“MBEDTLS”

I assume all are the 128-bit version?

And I assume that the “TINYAES128C” is just the unmasked version?

Thanks!

Yes, they are all AES-128 implementations.

MBEDTLS is basically what it says: the unprotected AES implementation from the MbedTLS library :). It’s faster than TinyAES but shouldn’t be much harder to attack.

MaskedAES ANSSI is the implementation from ANSSI coming from their Github account: https://github.com/ANSSI-FR/SecAESSTM32. The documentation is available there.
The keywords after the implementation are the options the firmware was compiled with:

  • KEYSCHEDULE will include the key scheduling as part of the trace (typically on ChipWhisperer, the trigger happens right after key scheduling). This is useful here because this implementation has the capability of also masking the key schedule. Capturing it therefore lets you assess the efficiency of the protection.
  • UNROLLED will unroll most of the loops and therefore is faster, yielding shorter traces at the expense of requiring more flash.
  • UNROLLED+KEYSCHEDULE means that both of the above options were set when compiling the firmware :)

Hi,

Thank you! I will look into the theory of the masked implementations a bit more. But I’m wondering, how many traces should I expect to collect for extracting the key using CPA? First I did a TVLA and there is not much leakage:

[image: TVLA plot]

(there are 708/20000 leakage points)
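For anyone who wants to reproduce this kind of count (flagged points out of total samples), here is a minimal sketch of a per-sample Welch t-test in Python with NumPy. The thread doesn't say how the two trace groups were formed or which threshold was used, so the fixed-vs-random split, the |t| > 4.5 threshold (the value commonly used for TVLA), the function names, and the synthetic traces are all assumptions for illustration.

```python
import numpy as np

def welch_t(group_a, group_b):
    """Per-sample Welch t-statistic between two trace sets (rows = traces)."""
    mean_a, mean_b = group_a.mean(axis=0), group_b.mean(axis=0)
    var_a = group_a.var(axis=0, ddof=1)
    var_b = group_b.var(axis=0, ddof=1)
    return (mean_a - mean_b) / np.sqrt(var_a / len(group_a) + var_b / len(group_b))

def count_leaky_points(t_stat, threshold=4.5):
    """Number of samples whose |t| exceeds the (assumed) TVLA threshold."""
    return int(np.sum(np.abs(t_stat) > threshold))

# Synthetic stand-in for captured traces: pure noise, plus a small
# data-dependent offset injected at samples 50..59 of the "fixed" group.
rng = np.random.default_rng(1)
n_traces, n_samples = 2000, 200
fixed_group = rng.normal(0.0, 1.0, (n_traces, n_samples))
random_group = rng.normal(0.0, 1.0, (n_traces, n_samples))
fixed_group[:, 50:60] += 0.5

t_stat = welch_t(fixed_group, random_group)
leaky = count_leaky_points(t_stat)
print(f"{leaky}/{n_samples} leakage points")
```

With real captures you would replace the two synthetic arrays with your own trace groups. Keep in mind that a low count only says first-order leakage is hard to detect at that number of traces, not that the implementation is secure.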

Then I did a CPA with my own code, attacking only the first byte, and ended up with this, using 29999 traces (I got a time-out for the first trace):

[image: CPA result for the first key byte]

To make sure it wasn’t my own code that caused issues, I ran the data set through the CW CPA tutorial and ended up with this guessed key compared to the actual key:

[image: guessed key vs. actual key from the CW CPA tutorial]

I’m just wondering what I should expect, and what experience you have with this. I’ve used the regular SMA connection for collecting the traces (I’m planning on running the dataset through some machine learning code, but haven’t had time yet).

Thanks!

Here you can find the paper published by the ANSSI team about how they attacked their own implementation: https://eprint.iacr.org/2021/592.pdf

Basically this is the kind of implementation that you attack with template attacks. CPA shouldn’t be able to recover the key.
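To make the CPA point concrete, here is a small simulation sketch in Python with NumPy. It is a toy model, not the ANSSI code: it assumes a plain Boolean mask (the ANSSI implementation uses a different, stronger scheme), a noisy Hamming-weight leakage, and hypothetical helper names (`simulate`, `cpa`). It runs the same first-order CPA on an unmasked and a masked S-box output, then on a second-order combination (centered product) of the masked value and the mask.

```python
import numpy as np

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def build_sbox():
    """AES S-box from its definition: field inverse, then affine transform."""
    sbox = np.zeros(256, dtype=np.int64)
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = s ^ 0x63
    return sbox

SBOX = build_sbox()
HW = np.array([bin(v).count("1") for v in range(256)])

def simulate(n, key_byte, rng, noise=0.5):
    """Toy leakage (assumption): Hamming weight plus Gaussian noise."""
    pt = rng.integers(0, 256, n)
    mask = rng.integers(0, 256, n)          # fresh random mask per trace
    v = SBOX[pt ^ key_byte]
    unmasked = HW[v] + rng.normal(0.0, noise, n)
    masked = HW[v ^ mask] + rng.normal(0.0, noise, n)
    mask_leak = HW[mask] + rng.normal(0.0, noise, n)
    return pt, unmasked, masked, mask_leak

def cpa(pt, leak):
    """First-order CPA on one leakage sample; returns (best guess, |corr|)."""
    hyp = HW[SBOX[pt[None, :] ^ np.arange(256)[:, None]]].astype(float)
    hyp_c = hyp - hyp.mean(axis=1, keepdims=True)
    leak_c = leak - leak.mean()
    corr = hyp_c @ leak_c / (np.linalg.norm(hyp_c, axis=1) * np.linalg.norm(leak_c))
    corr = np.abs(corr)
    return int(np.argmax(corr)), corr

rng = np.random.default_rng(0)
KEY = 0x2B
pt, unmasked, masked, mask_leak = simulate(5000, KEY, rng)

guess_plain, corr_plain = cpa(pt, unmasked)
guess_masked, corr_masked = cpa(pt, masked)
# Second-order preprocessing: centered product of the two leakage samples.
combined = (masked - masked.mean()) * (mask_leak - mask_leak.mean())
guess_2nd, corr_2nd = cpa(pt, combined)

print("unmasked, 1st order:", hex(guess_plain), corr_plain.max())
print("masked,   1st order:", hex(guess_masked), corr_masked.max())
print("masked,   2nd order:", hex(guess_2nd), corr_2nd.max())
```

In this toy model the masked sample is statistically independent of the unmasked S-box output, so first-order CPA has nothing to correlate with. The key only reappears once two leakage points are combined. That is the general idea behind attacks that look at several points jointly, but it is only an illustration and not the attack from the paper.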

We’re using this implementation in our own research to study how much more efficient deep learning is against protected implementations (see: https://elie.net/blog/security/hacker-guide-to-deep-learning-side-channel-attacks-the-theory/ and the links to our 2 presentations which are contained in this blog post).

Hi!

Thank you. That’s sort of relieving that I wasn’t supposed to be able to extract the key with CPA. I thought I had blundered in some way. I actually read the blog post and watched your presentation earlier this week. Very interesting stuff. I’m very new to machine learning and deep learning (just started looking into it a few weeks ago), and at the moment I find it quite overwhelming, but on the other hand I also find it quite intriguing how it can be used for SCA.

I’ve been trying to play around with Gaussian Naive Bayes, SVM, MLP, and Random Forest, but I find that the results fluctuate and diverge very easily depending on the parameters I choose. Would you say that those techniques are insufficient to break the ANSSI implementation and that stronger deep learning methods are the way to go, or should I be able to extract the key with simpler machine learning algorithms?

It’s rather hard to give a clear and definitive answer here as this is an ongoing research field.

In our research we decided to have everything automated and therefore to use complex models which are able to process the traces of the full AES encryption (all 10 rounds). Other researchers use much smaller machine learning models after pre-processing the traces.

Hi,

Thank you. I just have two follow-up questions:

  1. In the TVLA plot I provided above, why is there such a big “tail” at the beginning, about the first 1000 samples?

  2. Why does CPA not work on that masked implementation, while deep learning methods do?

Thanks!