DPA Lab 3_3 XMEGA Thoughts and questions

I am looking to better understand a specific facet of DPA in the LAB 3_3 example

Given the following code used in DPA on AES from chipwhisperer

def calculate_diffs(guess, byteindex=0, bitnum=0):

one_list = []
zero_list = []

for trace_index in range(numtraces):
hypothetical_leakage = aes_internal(guess, textin_array[trace_index][byteindex])

if hypothetical_leakage & (1<<bitnum):
one_list.append(trace_array[trace_index])
else:
zero_list.append(trace_array[trace_index])

one_avg = np.asarray(one_list).mean(axis=0)
zero_avg = np.asarray(zero_list).mean(axis=0)
return abs(one_avg - zero_avg)

how could this accurately determine the highest power consumption and correlate it to the correct key, this code assumes that the highest power consumption value could only be achieved with the correct key? But the issue I am thinking of is couldn’t other key values result in bits that would result in a higher absolute value than that of the real key. Even with thousands of traces, it seems like it’s pure coincidence if it lines up or not. Now I understand the reasoning behind the response to this being “you would need more traces to circumvent this issue”. The issue I am thinking about is even if you had a billion traces, the specific 1 or 0 value of a bit changes with all bytes (meaning many keys could lead to a “1” bit on the targeted bit, this all depends on the plaintext the keys are xored against). So there are bytes where that specific targeted bit may have more 1 outputs than the real key for that bit, even for a billion traces. It seems purely like it’s coincidence. To explain further let’s say that the real key is 57, which across a billion traces resulted in 10 million ones on the targeted bit across the traces. But during testing, other key values used, for example, 65,21,32, and 250+ others are used. Let’s say given a billion samples of random text, these keys each alone give higher upwards of 12 million ones on the targeted bit across the traces. Wouldn’t the result look like these had higher power consumption? It seems like the amount of ones a key would generate given the input leads to the highest power consumption value therefore incorrect keys vs the correct key seems simiply left up to whatever key + plaintext would result in more ones (which could be more than the actual key)? Can you explain how the single bit can be correlated to the highest power consumption ONLY with the correct key and not the other 1 byte keys. In the single bit lab 3_2 it made sense as there was something to compare the bits against (a fixed value) here, you’re comparing based off of the highest value resulting from whatever gives you the most "1"s. If only the correct key can show the highest power consumption values and not other keys (that could generate more 1’s), why?

I think you may be misunderstanding the attack. Guessing the correct key will allow you to sort your traces into a group with leakage of [1, 8] and [0, 7]. These groups have averages approaching 4.5 and 3.5, so if you sum them up and subtract the averages, you get a measurable difference. If your key guess is incorrect, your groups will both have leakages of [0, 8] and therefore the average of both groups should be almost the same.

Can you elaborate on this especially on what you mean when referring to [1,8] and [0,7]? Are these the bit indexes? And can you describe the change between [0,8] and [1,8] how that works at a granular level?

I am able to go through the lab, get the sub-key bytes, and for all the wrong estimations plot them to see the incorrect peaks and where the correct peak is to derive the actual AES key. But after being able to do the attack, I feel I don’t understand it at as much of a granular level as I would like.

My understanding of the attack was

1.) Supply input and capture traces
2.) Loop through input texts with key guesses 0x00-0xFF
3.) Choose a target bit in crypto output, separate the corresponding trace index into a one or zero array depending on the bit value
4.) average each array then subtract each other to get the difference, the higher the difference the more likely that is the correct key value

this is the part I wanted to understand more, if you could explain in detail how the separating into both arrays via the target bit, averaging the finding the difference works, I understand that it does work in the lab but I am not understanding why it works and how everything correlates together, I feel like I am missing details

I did not get how parts 3 and 4 could accurately get the key, can you explain each step in great detail, and why each step works the way it does? I’ve read a few papers on the topic and this is what I assumed they were describing.

Based on lab 3_1, we know that the device leaks the Hamming weight of its data (the number of 1’s), which can have a value between 0 and 8. If one of these bits is always 1 (our 1’s group), the Hamming weight and therefore the leakage will always be between 1 and 8. If this bit is instead always 0, the leakage will instead be between 0 and 7. If your key guess is correct, then you will correctly sort your traces and this difference in leakage will be measurable. If your key guess is incorrect, the grouping will be effectively random and you will get very little difference between the two groups.

To make sure I understand, with hamming weight being the measure of how much power is consumed via how many 1’s are in a byte, when you say we can have a value between 0 and 8 you’re saying the amount of possible 1’s we can have so zero ones or a hamming weight of zero would be for a byte like 0x0, a hamming weight of one would be for a byte like 0x01, and a value of 3 for a 0x15. so by the bit we are choosing because that value will always be between 1 and 8 due to confirming via the targeted bit whereas the bit being 0 the value will always be between 0 and 7. So what happens if the amount of ones in both traces even for thousands are equivalent, take the following for example:

So for instance let’s say we target the first bit in 0x80 (where in 10000000, the first bit would be 1)

but let’s say a majority of the traces captures look like the following simply due to coincidence of the randomly generated data:

ones = [11110111, 11001111, 10011011, 10111010, 10101101, …]
zeros = [01111111, 00111111, 01011011, 01111010, 01101101, …]

so even if the key was correct wouldn’t the power consumption even out not giving a measurable difference and the attack assumes that the data won’t have an equal number of ones and zeros because statistically it’s less likely, and is that the reason for the large amount of traces to avoid a situation like this?

No, we collect a large amount of traces to increase the difference and average out noise. For reference, a quick python program with 10k trials and 20 traces had your situation happen 21 times