CW Husky and CW Lite capture different traces

Hi.

I just captured traces from the STM32F4 (with HW AES) target board with the CW Husky and was surprised that the AES key could not be recovered.
Then I decided to capture traces for a static AES-128 key and static plaintext with both the CW Husky and the CW Lite.
The traces have different shapes for CW Husky and CW Lite.

Here is the set of 30 consecutive traces captured by the CW Husky (with the static AES-128 key/static plaintext):

And here is the set of 30 consecutive traces captured by the CW Lite (also with the static AES-128 key/static plaintext):

The traces are synchronized in both cases, but the CW Husky traces look noisier (maybe because of the 12-bit quantization?).
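Here is a quick back-of-the-envelope check on the quantization question. It assumes ideal ADCs whose samples are normalized to the same ±0.5 span: 10-bit on the CW-Lite and 12-bit on the Husky. More bits means a smaller step, so quantization alone should make the Husky traces cleaner, not noisier. The helper name `quantization_noise` is illustrative.

```python
import math


def quantization_noise(bits, full_scale=1.0):
    """Return (LSB size, RMS quantization noise) for an ideal uniform ADC.

    full_scale is the total input span in normalized units; 1.0 matches
    samples reported in the range -0.5 .. +0.5.
    """
    lsb = full_scale / (2 ** bits)
    # Uniformly distributed rounding error has RMS value LSB / sqrt(12)
    return lsb, lsb / math.sqrt(12)


for name, bits in [("CW-Lite (10-bit)", 10), ("CW-Husky (12-bit)", 12)]:
    lsb, rms = quantization_noise(bits)
    print(f"{name}: LSB = {lsb:.6f}, RMS quantization noise = {rms:.6f}")
```

The 12-bit step is four times smaller. If the Husky traces look noisier, the analog path (gain, clipping, sampling phase) is a more likely explanation than the ADC resolution.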

As a result, the CW Husky traces cannot recover the last round key. For key recovery I used samples 709 to 712, because that is exactly where the state is pushed into the registers.
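Restricting the CPA to a fixed sample window can be sketched as follows. This is a generic illustration, not the ChipWhisperer analyzer API. `cpa_window` is a hypothetical helper. The caller supplies the hypothesis matrix from whatever leakage model fits the target; for a last-round attack on hardware AES, that could be a Hamming distance between round states.

```python
import numpy as np


def cpa_window(traces, hypotheses, start, stop):
    """Max |Pearson correlation| per key guess over samples [start, stop).

    traces:     (n_traces, n_samples) array of power samples.
    hypotheses: (n_traces, n_guesses) array of modeled leakage for each
                key guess (the leakage model is the caller's choice).
    """
    t = np.asarray(traces, dtype=float)[:, start:stop]
    h = np.asarray(hypotheses, dtype=float)
    # Center both sides so the dot product becomes a covariance
    t = t - t.mean(axis=0)
    h = h - h.mean(axis=0)
    num = h.T @ t  # shape (n_guesses, window_length)
    den = np.outer(np.sqrt((h ** 2).sum(axis=0)), np.sqrt((t ** 2).sum(axis=0)))
    corr = num / den
    # Score each guess by its strongest correlation anywhere in the window
    return np.abs(corr).max(axis=1)
```

Python slices are half-open, so the inclusive window 709 to 712 corresponds to `cpa_window(traces, hyp, 709, 713)`, and the best guess is `np.argmax` of the result. A constant hypothesis column or a constant sample would cause a division by zero.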

The last round key from the traces captured by the CW Husky (the key is not recovered; the recovered bytes have very low PGE values):

And here is the last round key, completely recovered from the traces captured by the CW Lite (the recovered key bytes have good PGE values):

I used exactly the same script (for both CW Husky and CW Lite) to collect the power traces.
In both cases, ‘scope.adc.trig_count’ returned the same value, ‘1148’, so we can say the same ‘scope.clock.clkgen_freq’ and ‘scope.clock.adc_freq’ values were used.

Another weird thing is that the CW Husky always reports the “gain too low error” regardless of the “scope.gain.db” value. Setting “scope.adc.lo_gain_errors_disabled” also doesn’t stop the “ADC” and “Glitch” LEDs from blinking.

What could be the reason for this behavior (unreliable results with the CW Husky)?

Clearly there is clipping in the first plot (gain too high).
(There is also clipping on the CW-Lite, just not as much.)
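Clipping can be measured instead of judged by eye. Below is a minimal sketch that assumes samples are normalized to roughly ±0.5, which is how ChipWhisperer returns them. The `tol` parameter allows for the top ADC code landing slightly below +0.5. `clipping_fraction` is a hypothetical helper.

```python
import numpy as np


def clipping_fraction(traces, rail=0.5, tol=1e-3):
    """Fraction of samples in each trace that sit at (or within tol of) an ADC rail."""
    t = np.asarray(traces, dtype=float)
    # A sample counts as clipped if its magnitude reaches either rail
    return (np.abs(t) >= rail - tol).mean(axis=-1)
```

If you feed it the `trace.wave` arrays returned by `cw.capture_trace` for each scope, it shows how often each capture saturates at a given gain.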

This is strange; do you have a notebook I can use to reproduce the issue?
Also, which version of CW are you using? What is the output of:

print(cw.__version__)
print(scope.fpga_buildtime)

I wouldn’t worry about the CPA attack until the gain issue is resolved – if the gain is too low or too high, side-channel attacks will suffer.

Another thing to note is that gain too low/too high errors persist until they are cleared with scope.errors.clear(); this may be what’s confusing you in setting an appropriate gain.

Thanks. I captured more traces with different gain settings.
Here is the trace captured with 10 dB of gain (in my opinion it looks much better). One thing that worries me is the amplitude of the traces.

The next screenshot is of the same capture, but taken at 20 dB. It looks noisier.

When I significantly decreased the gain, the LED blinking stopped.
Could you please confirm that 10 dB is the better setting?

…in the meantime, I will capture more traces with the CW Husky and run the attack.

@jpthibault The result for the “CW Husky 10 dB” traces is almost the same as for the traces captured at 35 dB :frowning:

Here is the result:

@jpthibault I use the following script, so you can easily reproduce my problem (inability to recover the correct last round key using the CW Husky) if you have the same (STM32F4 with HW AES) target board.

import chipwhisperer as cw
import time
from tqdm import tqdm

# Connect to the scope and the SimpleSerial v2 target
scope = cw.scope()
target = cw.target(scope, cw.targets.SimpleSerial2)

time.sleep(0.05)
scope.default_setup()

# Reset the target by pulsing nRST low, then releasing it
time.sleep(0.05)
scope.io.nrst = 'low'
time.sleep(0.05)
scope.io.nrst = 'high_z'

scope.gain.db = 10

# Capture 1148 samples starting at the rising edge of the trigger
scope.adc.samples = 1148
scope.adc.offset = 0
scope.adc.basic_mode = "rising_edge"

scope.clock.reset_adc()
assert (scope.clock.adc_locked), "ADC failed to lock"

project = cw.create_project("traces/STM32F4_HW_AES_husky_static_20000_10db.cwp", overwrite=True)

ktp = cw.ktp.Basic()
N = 20000

# Capture N traces, skipping any capture that fails (returns None)
for i in tqdm(range(N)):
    key, text = ktp.next()
    trace = cw.capture_trace(scope, target, text, key)
    if trace is None:
        continue
    project.traces.append(trace)

print(scope.adc.trig_count)
print(cw.__version__)
print(scope.fpga_buildtime)

project.save()

scope.dis()
target.dis()

The 10 dB trace looks like the gain is too low. The full range is +/- 0.5, so you’re only using about 15% of the ADC’s dynamic range there, whereas 35 dB was definitely too high.
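One way to put a number on dynamic-range usage is to divide the peak-to-peak span of the captured samples by the full ±0.5 span. The sketch below uses a hypothetical helper, `range_used`. The exact percentage depends on whether you measure peak-to-peak or peak amplitude, so treat it as a rough indicator.

```python
import numpy as np


def range_used(traces, full_scale=1.0):
    """Fraction of the ADC span covered by the captured samples.

    full_scale = 1.0 corresponds to the -0.5 .. +0.5 sample range.
    """
    t = np.asarray(traces, dtype=float)
    # Peak-to-peak over all traces, relative to the total span
    return (t.max() - t.min()) / full_scale
```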

You’re worrying too much about whether the traces “look” noisy, which can be very subjective… Just set the gain so that you get close to the full dynamic range without clipping.
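The advice above can be turned into a simple gain sweep. This sketch makes the following assumptions:

- `scope.gain.db` and `scope.errors.clear()` are used as discussed in this thread.
- `capture_batch` is a hypothetical function you provide. It captures a handful of traces at the current settings and returns them as a 2-D array.
- Clipping is detected from the sample values rather than from the error flags.
- Raising the gain never reduces clipping, so the sweep stops at the first gain that clips.

```python
import numpy as np


def pick_gain(scope, capture_batch, gains, rail=0.5, tol=1e-3):
    """Return the highest gain (dB) whose test batch never reaches the ADC rails.

    Returns None if even the lowest gain in `gains` clips.
    `capture_batch` is a caller-supplied (hypothetical) capture function.
    """
    best = None
    for db in sorted(gains):
        scope.gain.db = db
        scope.errors.clear()  # errors are sticky until cleared
        traces = np.asarray(capture_batch(), dtype=float)
        if np.any(np.abs(traces) >= rail - tol):
            break  # this gain clips, so higher gains will too
        best = db
    return best
```

In practice you would then back off slightly from the returned value to leave some headroom for traces that were not in the test batch.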

And if you’re still not getting full key recovery, capture more traces!

Side-channel attacks are not an exact science, and the CW-Lite and Husky have different analog front-ends. While Husky’s is better, it may be that on the Lite a slightly different sampling clock phase happens to catch a bit more leakage in this particular case, allowing the attack to work with fewer traces. But at the end of the day, whether an attack succeeds in 500 traces or 600 traces, it’s still a successful attack, and most people would not care much about this kind of difference.

One more question: at 15 dB the “ADC” and “Glitch” LEDs don’t blink, but at 20 dB they already do. Should I ignore the error and use 20 dB, or should I use 15 dB?

Yes, I understand this, but I expected the CW Husky to capture traces at least as well as the CW Lite, so getting the opposite result surprised me.

Clipping = information is lost.
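Here is a toy model of why clipping discards information. The numbers are purely hypothetical. The leakage, a Hamming weight from 0 to 8, shifts the analog level in small steps close to the positive rail:

```python
import numpy as np

# Hypothetical leakage: Hamming weights 0..8 of some intermediate byte,
# each adding 0.02 to a 0.45 baseline level.
hw = np.arange(9)
analog = 0.45 + 0.02 * hw

# The ADC can only represent roughly +/-0.5; anything beyond saturates.
clipped = np.clip(analog, -0.5, 0.5)

print("distinct analog levels:   ", len(np.unique(analog)))
print("distinct digitized levels:", len(np.unique(clipped)))
```

Here the nine distinct analog levels collapse to four after clipping. Every Hamming weight from 3 to 8 produces the same saturated sample, so no attack can tell them apart from that sample.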

@jpthibault 15 dB and 60000 traces captured by the CW Husky gave even worse results :exploding_head:

The first time, I completely ignored the “ADC clipping error” at 35 dB and was still able to partly recover the last round key. This suggests that no information was lost. :thinking:

I stand by my statement; information was lost. However, it’s certainly possible that clipping is not occurring on the power samples that contain the leakage used by the side-channel attack.

In other words, it’s possible that no information relevant to your side-channel attack was lost.
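Whether clipping actually touches the leakage can be checked directly by comparing the clipped sample indices against the attack window. This is a minimal sketch with hypothetical helpers. It again assumes samples normalized to ±0.5 and a half-open window `[start, stop)`.

```python
import numpy as np


def clipped_samples(traces, rail=0.5, tol=1e-3):
    """Sorted sample indices that clip in at least one trace."""
    t = np.asarray(traces, dtype=float)
    return np.flatnonzero((np.abs(t) >= rail - tol).any(axis=0))


def window_is_clean(traces, start, stop, rail=0.5, tol=1e-3):
    """True if no trace clips anywhere inside the attack window [start, stop)."""
    idx = clipped_samples(traces, rail, tol)
    return not np.any((idx >= start) & (idx < stop))
```

A capture can show clipping errors and still return True for the window used by the attack, which would be consistent with partial key recovery at 35 dB.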

@jpthibault …returning to my original question, what is wrong with capturing with the CW Husky?
Adjusting “scope.gain.db” doesn’t help.
Increasing the number of captured traces to 100K doesn’t work either; it gives even worse results.
It looks like the CW Lite works much better, as it needs just 10K traces to recover the whole key correctly.

I cannot agree with this statement. Any electronic system consumes power deterministically.
Our goal is to set up conditions that capture the consumed power as accurately as possible.
If the capture hardware distorts the captured data, more power traces are required. And even a large number of collected traces doesn’t guarantee key recovery if the original analog power trace was poorly digitized.
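A common rule of thumb from the DPA literature (Mangard, CT-RSA 2004) makes the link between signal quality and trace count concrete. The number of traces needed is roughly n = 3 + 8 (z_{1-α} / ln((1+ρ)/(1−ρ)))², where ρ is the correlation between the correct hypothesis and the leaking sample. In the sketch below, the function name and the default α are illustrative choices.

```python
import math
from statistics import NormalDist


def traces_needed(rho, alpha=1e-4):
    """Rough number of traces for CPA to single out the correct key.

    rho:   correlation of the correct key hypothesis with the leaking sample,
           which must lie strictly between 0 and 1.
    alpha: error probability; uses the normal quantile z_{1-alpha}.
    """
    if not 0 < rho < 1:
        raise ValueError("rho must be in (0, 1)")
    z = NormalDist().inv_cdf(1 - alpha)
    return 3 + 8 * (z / math.log((1 + rho) / (1 - rho))) ** 2
```

With α = 10⁻⁴, halving ρ from 0.1 to 0.05 raises the estimate from roughly 2,750 to roughly 11,000 traces. Any front-end effect that lowers the effective correlation therefore shows up quadratically in the trace count.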

In my case I still see a significant difference between the traces captured by the CW Lite and the CW Husky.
I am not sure whether I have a bad CW Husky unit or whether any Husky gives such results.

I already answered this above:

What you could do is increase scope.clock.adc_mul to get more samples per clock cycle (something you can’t do with the Lite). I think our AES HW lab uses 4 samples per clock by default; try 8. Or, if you’re averse to that, play with scope.clock.adc_phase, which is what I suspect is causing the difference here.
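The phase suggestion can be scripted as a sweep. This sketch assumes `scope.clock.adc_phase` is writable, as it is used later in this thread. `score` is a hypothetical callback you write: it captures a batch at the current phase and returns a figure of merit where higher is better, for example the CPA correlation of a known key byte. The list of phases to try is up to you, since the valid range is not covered here.

```python
def sweep_adc_phase(scope, score, phases):
    """Try each ADC phase and return (best_phase, scores_by_phase).

    `score` is a caller-supplied (hypothetical) callback that captures at the
    current phase and returns a number where higher is better.
    The scope is left at the best phase when the sweep finishes.
    """
    results = {}
    for p in phases:
        scope.clock.adc_phase = p
        results[p] = score()
    best = max(results, key=results.get)
    scope.clock.adc_phase = best  # leave the scope at the winning phase
    return best, results
```

If you instead try `scope.clock.adc_mul = 8`, the same time window spans about twice as many samples. In that case, `scope.adc.samples` and the attack window need to be scaled accordingly.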

@jpthibault Without words, just results:

I was thinking more that the CW Husky’s front-end doesn’t work for SCA attacks against HW crypto.
Perhaps my Husky unit is broken.

The CW Lite works perfectly with any HW crypto implementation.

I took some time to investigate this; the TL;DR is that Husky works perfectly well against HW crypto.
(As an aside, all of our most recently introduced FPGA target notebooks – 6 by my count – have been developed using Husky.)

My results on the sca201/Lab 2_2 notebook, using 15k traces:
CW-Lite: 15 out of 16 key bytes recovered
CW-Husky: 14 out of 16 key bytes recovered

YMMV, but these are not cherry-picked results.
I used samples 700-705 for the attack with CW-Lite.
With Husky, I shifted the window to 703-708 because Husky’s sampling latency is 3 samples less (as documented here).
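As a cross-check on the documented latency difference, the offset between the two scopes could also be estimated from their mean traces by cross-correlation. This sketch assumes the two front-ends produce similarly shaped waveforms. They do differ, so treat the result as approximate. `estimate_lag` is a hypothetical helper, and a positive lag means features in `other` appear later than in `reference`.

```python
import numpy as np


def estimate_lag(reference, other):
    """Lag (in samples) that best aligns `other` to `reference`.

    Both inputs are 1-D arrays, e.g. the mean trace from each scope.
    A positive result means features in `other` appear later.
    """
    a = np.asarray(reference, dtype=float)
    b = np.asarray(other, dtype=float)
    # Normalize so amplitude differences between front-ends matter less
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    corr = np.correlate(b, a, mode="full")
    # Index len(a) - 1 of the "full" output corresponds to zero lag
    return int(np.argmax(corr)) - (len(a) - 1)
```

For example, `estimate_lag(lite_mean, husky_mean)` takes the mean of each scope's waves as input.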

With CW-Lite, I used the default setup, with sampling set to clkgen_x4.
With CW-Husky, I used scope.clock.adc_mul = 4 (for fairness) and found better results with scope.clock.adc_phase = 131.

I did come across this issue when playing with adc_mul and adc_phase; this may have been the cause of your poor results above.

To conclude, it’s simply luck that the Lite’s default settings give better results than Husky’s default settings. With a different target you may well find the opposite.

@jpthibault Hi.
Thanks for looking at this issue.
I also tried similar settings several times, but nothing helped.
Here is the script I used to collect the traces:

import chipwhisperer as cw
import time
from tqdm import tqdm

# Connect to the scope and the SimpleSerial v2 target
scope = cw.scope()
target = cw.target(scope, cw.targets.SimpleSerial2)

time.sleep(0.05)
scope.default_setup()

# Reset the target by pulsing nRST low, then releasing it
time.sleep(0.05)
scope.io.nrst = 'low'
time.sleep(0.05)
scope.io.nrst = 'high_z'

scope.gain.db = 15

# Capture 1148 samples starting at the rising edge of the trigger
scope.adc.samples = 1148
scope.adc.offset = 0
scope.adc.basic_mode = "rising_edge"
# 4 ADC samples per target clock cycle, with the ADC sampling phase set to 131
scope.clock.adc_mul = 4
scope.clock.adc_phase = 131

scope.clock.pll.recal()
scope.clock.clkgen_src = 'internal'
scope.clock.clkgen_freq = 7370000

#scope.clock.reset_adc()
#assert (scope.clock.adc_locked), "ADC failed to lock"

project = cw.create_project("traces/STM32F4_HW_AES_husky_recap.cwp", overwrite=True)

ktp = cw.ktp.Basic()
N = 20000

# Capture N traces, skipping any capture that fails (returns None)
for i in tqdm(range(N)):
    key, text = ktp.next()
    trace = cw.capture_trace(scope, target, text, key)
    if trace is None:
        continue
    project.traces.append(trace)

project.save()

scope.dis()
target.dis()

And here is the result of guessing the key from the traces collected by the above script:


…too far from perfect. :upside_down_face:

I would kindly ask you to collect traces with your Husky using the above script and then share them.
It would be nice to compare your traces with mine.