iCE40 AES SCA attack

NewDwarf · March 11, 2024, 3:24pm

Luckily, I have a chance to run an SCA attack against AES on an iCE40-based board.
I am trying to review the AES implementation chipwhisperer/firmware/fpgas/aes/icestorm/makefile at develop · newaetech/chipwhisperer · GitHub
Do I understand correctly that this project uses a protected (masked) implementation of the AES S-boxes?
chipwhisperer/firmware/fpgas/cryptosrc/aes_googlevault/aes_sbox.v at develop · newaetech/chipwhisperer · GitHub
And is the reason we can break it that we use the leakage model ‘last_round_state_diff’, which is actually not protected…

jpthibault · March 11, 2024, 5:02pm

No, this implementation is not masked or protected in any way.
We use a different leakage model (compared to our software AES targets) because this implementation does a complete AES round in a single clock cycle; only the final result of each round gets stored in flops. This limits what we can target for a side-channel attack. In contrast, in a pure software AES implementation, the result of each round component (subbytes, shiftrows, …) gets stored somewhere.
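To make the point about the round register concrete, here is a minimal Python sketch of a last-round state-difference leakage for one key byte. It is not taken from the ChipWhisperer source: the helper names (`reg_pos`, `last_round_state_diff`) are hypothetical, and it assumes a column-major AES state and a state register that is overwritten in place by the ciphertext at the end of the last round.

```python
# Sketch of a last-round Hamming-distance leakage model for hardware AES.
# Assumptions (not confirmed by the thread): column-major state layout,
# and the round register at position j goes from st9[j] to ct[j].

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def sbox_byte(x):
    """AES S-box: field inverse (0 maps to 0) followed by the affine map."""
    y = 0
    if x:
        y = 1
        for _ in range(254):      # x^254 is the inverse of x in GF(2^8)
            y = gf_mul(y, x)
    s = y
    for k in range(1, 5):         # XOR with y rotated left by 1..4 bits
        s ^= ((y << k) | (y >> (8 - k))) & 0xFF
    return s ^ 0x63

SBOX = [sbox_byte(x) for x in range(256)]
INV_SBOX = [0] * 256
for x, s in enumerate(SBOX):
    INV_SBOX[s] = x

def reg_pos(i):
    """Register position whose round-9 byte feeds ciphertext byte i (ShiftRows)."""
    r, c = i % 4, i // 4
    return r + 4 * ((c + r) % 4)

def last_round_state_diff(ct, key_guess, i):
    """Hamming distance seen by one register byte during the last round."""
    st9 = INV_SBOX[ct[i] ^ key_guess]   # register value before the last round
    st10 = ct[reg_pos(i)]               # value that overwrites the same flops
    return bin(st9 ^ st10).count("1")
```

For a correct guess of last-round key byte `i`, the returned value is the number of flops that toggle at one byte of the state register, which is the only intermediate a one-round-per-cycle design actually stores.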

You can’t run it on the iCE40 target, but our pipelined HW AES demo explains how to match a leakage model to an implementation.

NewDwarf · March 11, 2024, 5:16pm

…hmm. But this page chipwhisperer/firmware/fpgas at develop · newaetech/chipwhisperer · GitHub claims iCE40 can run it:

jpthibault · March 11, 2024, 5:18pm

I was referring to “pipelined AES” (second row).
The “regular” AES (first row) can indeed run on the iCE40.

NewDwarf · March 11, 2024, 5:33pm

I meant the pure AES implementation. And the makefile (develop/firmware/fpgas/aes/icestorm/makefile) also refers to this (pure AES) project.
I am not familiar with how Verilog/VHDL code is actually instantiated in hardware, so some of my questions may seem odd.
My expectation of a “clean” S-box implementation is something like this one: AES-VHDL/AES-ENC/RTL/sbox.vhd at master · hadipourh/AES-VHDL · GitHub

lut : process (input_byte) is
	begin
		case input_byte is
			when x"00" => output_byte <= x"63";
			when x"01" => output_byte <= x"7c";
			when x"02" => output_byte <= x"77";
			when x"03" => output_byte <= x"7b";

but in the case of the iCE40 AES FPGA implementation, the S-box transformation looks like it is masked by XORing:

newaetech/chipwhisperer/blob/develop/firmware/fpgas/cryptosrc/aes_googlevault/aes_sbox.v#L93C2-L101C35

begin : encrypt_top
		reg T5, T7, T11, T12, T18, T21;
		
		T1 = U0 ^ U3; /* T1 = U0 + U3 */
		T2 = U0 ^ U5; /* T2 = U0 + U5 */
		T3 = U0 ^ U6; /* T3 = U0 + U6 */
		T4 = U3 ^ U5; /* T4 = U3 + U5 */
		T5 = U4 ^ U6; /* T5 = U4 + U6 */
		T6 = T1 ^ T5; /* T6 = T1 + T5 */
...

Or is this not obfuscation, but just a tricky/optimized way to implement AES?

jpthibault · March 11, 2024, 5:55pm

Sorry for the confusion - to be clear, the notebook for the pipelined AES target (which cannot run on iCE40) contains useful info about leakage models for hardware AES implementations. You can run the “normal” AES target and notebook on iCE40; however, that notebook doesn’t explain very much about leakage models, which is why I pointed you to the pipelined AES notebook: even if you can’t run it, you might find it useful to read.

If you look at the source code for the sbox module (that you linked above), you’ll see it has an 8-bit input, an 8-bit output, and a single-bit “dec” control input which stands for “decrypt mode” (i.e. it determines whether the sbox or the inverse-sbox transform should be applied). There is no other input with which to mask or obfuscate.

It’s actually a very common approach for hardware sbox implementations; it comes from the mathematical definition of the sbox (see section 5.1.1 here). In hardware, this kind of approach can potentially require significantly fewer gates than a (ROM-based) lookup table (depending on the desired clock frequency). It’s usually referred to as a Galois Field-based implementation (vs a LUT-based implementation).
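As a rough illustration of that mathematical definition, the sketch below computes S-box outputs in Python from a multiplicative inverse in GF(2^8) followed by the affine transform, with no lookup table and no mask input. The function names are hypothetical, and this is a plain reference computation rather than the optimized XOR/AND gate network used in aes_sbox.v.

```python
# Sketch: AES S-box computed from its definition instead of a lookup table.
# Every operation here is a fixed function of the input byte; there is no
# random value involved, which is why this style is not a masking scheme.

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    r = 1
    for _ in range(254):          # a^254 == a^-1 since a^255 == 1
        r = gf_mul(r, a)
    return r

def sbox(x):
    """Field inverse, then the affine transform (XORs of rotations, plus 0x63)."""
    y = gf_inv(x)
    s = y
    for k in range(1, 5):
        s ^= ((y << k) | (y >> (8 - k))) & 0xFF
    return s ^ 0x63

# The first entries agree with the LUT-style VHDL quoted earlier in the thread.
print([hex(sbox(x)) for x in range(4)])
```

The output is a function of the input byte alone, so the XORs are just arithmetic in the field (addition in GF(2) is XOR, as the `/* T1 = U0 + U3 */` comments suggest), not a countermeasure.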

NewDwarf · March 11, 2024, 5:58pm

Understood. Thanks for the detailed explanation. Have a good day!