I followed the tutorial (PA_HW_CW305_1-Attacking_AES_on_an_FPGA) to implement the CPA attack, but I can not get all correct key when I use CPA to attack 32-bit AES. Not being able to just use CPA on non-128-bit AES may have something to do with how the cpa algorithm decides points of interest.
I am confused about this line of code how to determine which point in the power trace is the point of interest required by the CPA algorithm.
I searched for some information in /software/chipwhisperer/analyzer/attacks.
In AES128_8bit.py, the leakage model gives us hammering distance.I don’t think it has anything to do with points of interest, but maybe I’m wrong.
In progressive.py, it uses different traces to get output stats, but I don’t understand how it figures out points of interest in power trace.
The only selection of “points of interest” in a CPA attack is just selecting the point with the highest correlation. I’d recommend running through courses/sca101, in particular Lab 4_2, as it explains how CPA attacks work. It also does all the calculations through numpy, so it’s much more explicit than using Analyzer.
Also, when you say 32-bit AES, do you mean doing AES via T-Tables, or AES-256?
Thanks for your suggestion. I referenced PA_HW_CW305_1-Attacking_AES_on_an_FPGA earlier because I was learning to use CW305 at the same time. I neglected other tutorials on CPA but I will learn more about how CPA attack works.
Sorry I didn’t explain clearly. That is for 128-bit AES and the data path is 32-bit. As shown below is its power trace, each round will do 4 times of s-boxes, that’s why I want to understand how the algorithm of the CPA attack analyzes the power trace.
Is this a software or hardware AES implementation? Do you mean that, each cycle, 4 of the 16 bytes are processed, or something else?
For a hardware AES implementation, power analysis information is usually most visible when that information is stored in internal registers. It also leaks in the form of a difference between the last two states of that internal register. For the normal CW305 bitstream, the AES state is stored at the end of each round. This, plus the absence of MixColumns, is why we attack using the last_round_state_diff leakage model.
I know this post is a bit old, but I actually have a very similar question. Is there documentation (a whitepaper, a journal article, etc.) on what is happening when attacking specifically the 128-bit hardware AES implementation? Clearly, we can’t just try every possible key, because that would take waaayyyyy too long, but unlike how the firmware operates on 8 bits at a time, the hardware AES implementation used in PA_HW_CW305_1-Attacking_AES is doing lookups on all 16 bytes in the same cycle via the Google AES core–what optimizations are made to allow an attack on this?
I understand CPA on 8-bit firmware implementations well enough to have written my own CPA analysis tool. I have also written my own code to analyze a hardware AES implementation that performs SubBytes on 4 bytes per cycle (effectively 32 bits), but with my lack of understanding the optimizations made for the 128-bit AES analysis, my implementation on the 32-bit AES hardware is to simply try all 2^32 possible combinations x4 to account for the other SubBytes 12 bytes–which is VERY slow even when using an implementation with all math done through NumPy.
If we are able to able to solve the 128-bit hardware AES in our lifetimes, I am assuming it would be possible to do something similar to drastically speed up operations on smaller subsets of the key (2, 4, 8-bytes) for AES implementations that perhaps focus more on area or power consumption minimization, rather than speed. I have been diving into the code that performs the analysis, but I keep coming across classes and functions labeled CPAProgressiveOneSubkey or AES128_8bit, which is leaving me very confused as it leads me to believe the algorithm is somehow still operating on 8 bits of the key at a time.
The algorithm is indeed operating on 1 byte at a time, even if the device is doing sub bytes on all 16. The individual power consumption of each byte is still there, so you can still perform CPA as normal. The power consumption of the bytes you’re not currently analyzing is effectively just additional random noise and ends up not having very much of an effect.
With hardware AES, you can have additional complications as leakage is often the hamming distance between the old and new values of the AES state register, but this is unrelated to how many bytes are being operated on at a time.