CPA attack succeeds on hardware AES (AES_100t) but fails on software AES running on RISC-V (CV32E40P) on CW305

ishir23 · June 10, 2026, 4:02pm

Hi,

I am a student working on a side-channel analysis project. I am using a CW305 (Artix-7 100t) with a custom CV32E40P (RISC-V) soft-core processor running AES in software.

My setup is:

CW305 with Artix-7 XC7A100T
CV32E40P RISC-V core synthesized as a soft-core
Software AES-128 implemented in C (SubBytes/MixColumns/ShiftRows in pure C)
ChipWhisperer Python 6.0.0, scope: CW-Lite
Clock: 50MHz, ADC: extclk_x4 (200MHz sample rate)
target.clkusbautooff = True, target.clksleeptime = 1
Using cw.capture_trace() helper

Using AES_100t.bit (NewAE’s hardware AES), CPA with last_round_state_diff recovers the full 16-byte key in 5000 traces with max correlation ~0.24. This confirms the measurement setup is correct.

The attack fails when I flash our custom RISC-V bitstream running software AES: all correlations sit at ~0.07 (noise floor) for both the last_round_state_diff and sbox_output models, with 5000 and with 10000 traces. The trace variance plot shows AES activity spread across ~2000 samples at the 200MHz sample rate.

I got some help and tried the following:

sbox_output and last_round_state_diff leakage models
5000, 10000 traces
Different sample windows (0-100, 0-500, 500-1000, 1000-2000)
Gain settings from 25dB to 45dB
SMA cable on X3 and X4

Questions:

Is CPA on software AES feasible on CW305 with a RISC-V soft-core, or is the noise from the FPGA fabric too high?
How many traces are typically needed for software AES on an FPGA soft-core?
Is there a better leakage model or attack strategy for this scenario?
Would a different approach like template attacks or TVLA be more appropriate?

Any guidance would be greatly appreciated. Thank you.

jpthibault · June 10, 2026, 4:29pm

Definitely not! We have run the lowRISC Ibex and Arm DesignStart soft cores on the CW305: chipwhisperer/firmware/fpgas at develop · newaetech/chipwhisperer · GitHub

The appropriate leakage model will depend on the implementation; in our examples with TINYAES, we use the sbox_output model.

It sounds like you’ve tried a lot of different settings haphazardly; take a more focused approach:

if your leakage model is targeting the first round, then identify the first round on the power trace and use a window of samples around that for your attack
set the gain so that you are using a good dynamic range (close to [-0.5, +0.5] but not clipping)
look at how the PGE evolves as you collect more traces. If it’s getting lower as you use more traces, you are on the right track!
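
The PGE-tracking suggestion above can be sketched on simulated data. This is a minimal sketch, not the ChipWhisperer analyzer: it builds the AES S-box from its definition, simulates 80-sample traces with a single Hamming-weight leak of the first-round S-box output at a hypothetical sample index (`leak_at`), and reports the PGE of one key byte as the trace count grows. The noise level, leak amplitude, key byte, and the helper names `cpa_byte` and `pge` are all assumptions chosen for illustration.

```python
import numpy as np

def aes_sbox():
    """Build the AES S-box from its definition: GF(2^8) inverse, then affine map."""
    def gmul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B  # reduce modulo the AES polynomial
            b >>= 1
        return r

    def rotl8(x, s):
        return ((x << s) | (x >> (8 - s))) & 0xFF

    box = np.zeros(256, dtype=np.uint8)
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gmul(x, y) == 1)
        box[x] = inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63
    return box

SBOX = aes_sbox()
HW = np.array([bin(v).count("1") for v in range(256)])

def cpa_byte(traces, pt_byte):
    """Return, for each of the 256 key guesses, the max |correlation| over all samples."""
    t = traces - traces.mean(axis=0)
    t_norm = np.sqrt((t ** 2).sum(axis=0))
    guesses = np.arange(256, dtype=np.uint8)
    hyp = HW[SBOX[pt_byte[:, None] ^ guesses[None, :]]].astype(float)  # (n, 256)
    h = hyp - hyp.mean(axis=0)
    h_norm = np.sqrt((h ** 2).sum(axis=0))
    corr = (h.T @ t) / (h_norm[:, None] * t_norm[None, :])  # (256, samples)
    return np.abs(corr).max(axis=1)

def pge(scores, true_key):
    """Rank of the true key among all guesses, 0 meaning it is ranked first."""
    order = np.argsort(scores)[::-1]
    return int(np.where(order == true_key)[0][0])

# Simulated capture (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
true_key, leak_at = 0x2B, 37
n_total, n_samples = 5000, 80
pt = rng.integers(0, 256, n_total, dtype=np.uint8)
traces = rng.normal(0.0, 4.0, (n_total, n_samples))
traces[:, leak_at] += 0.5 * HW[SBOX[pt ^ true_key]]

# Re-run the attack on growing subsets and watch the rank of the true key.
history = {n: pge(cpa_byte(traces[:n], pt[:n]), true_key) for n in (200, 1000, 5000)}
for n, rank in history.items():
    print(f"n={n}: PGE={rank}")
```

On real captures the same loop would run on the recorded traces and plaintext bytes instead of the simulated arrays; a PGE that trends towards 0 as n grows is the sign to keep going with that model and window.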

ishir23 · June 11, 2026, 10:02pm

Thank you for the response. We followed your advice and made good progress. We found the following:

Correct gain: 30dB gives signal range ±0.38, no clipping
Variance plot shows highest activity in samples 0-80
PGE tracking with sbox_output model on samples 0-80 shows:
- n=500: PGE=75
- n=1000: PGE=61
- n=3000: PGE=22
- n=5000: PGE=27

PGE is generally decreasing, which I believe means we are on the right track. However, we haven’t reached PGE=0 yet with 5000 traces.

Our setup:

CW305 Artix-7 100t with CV32E40P RISC-V soft-core at 50MHz
Software AES-128 in C (standard SubBytes/MixColumns loop implementation)
ADC at extclk_x4 (200MHz), clkusbautooff=True, cw.capture_trace()
sbox_output leakage model, window samples 0-80
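
For reference, the sample counts in this setup can be converted to clock cycles and time. The sketch below uses only the numbers listed above (50MHz clock, extclk_x4); the function names are made up for illustration. It shows that a window of 80 samples covers 20 target clock cycles, and the full 2000-sample capture covers 500 cycles. Whether the first-round SubBytes of this particular firmware fits in 20 cycles is not something the thread establishes, so the cycle count of the C implementation is worth checking against the window.

```python
CLOCK_HZ = 50e6      # target clock from the setup above
ADC_MULTIPLIER = 4   # extclk_x4: four ADC samples per target clock cycle

def samples_to_cycles(n_samples):
    """Number of target clock cycles spanned by n_samples ADC samples."""
    return n_samples / ADC_MULTIPLIER

def samples_to_microseconds(n_samples):
    """Duration of n_samples ADC samples in microseconds."""
    return n_samples / (CLOCK_HZ * ADC_MULTIPLIER) * 1e6

for window in (80, 2000):
    print(f"{window} samples = {samples_to_cycles(window):.0f} cycles "
          f"= {samples_to_microseconds(window):.2f} us")
```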

Specific questions:

With your Ibex soft-core on CW305, approximately how many traces were needed to reach PGE=0 with software AES?
Our PGE fluctuates (it went from 22 at n=3000 to 27 at n=5000). Is that normal? Should we keep adding traces?
Is there a better window selection strategy? Should we attack each byte separately with its own optimal window?
We only have one more lab session tomorrow. What is the minimum number of traces you’d recommend capturing to have a good chance of success?

This is the Jupyter notebook for your reference:

Setup

import chipwhisperer as cw
import chipwhisperer.analyzer as cwa

scope.gain.db = 30
scope.adc.samples = 2000
scope.clock.adc_src = "extclk_x4"
target.pll.pll_outfreq_set(50E6, 1)
target.clkusbautooff = True
target.clksleeptime = 1

Capture

for i in range(N):
    key, text = ktp.next()
    ret = cw.capture_trace(scope, target, text, key)
    if not ret:
        continue
    proj.traces.append(ret)

Attack with window 0-80

proj_n = cw.create_project("attack", overwrite=True)
for t in proj.traces:
    trimmed = cw.Trace(t.wave[0:80], t.textin, t.textout, t.key)
    proj_n.traces.append(trimmed)

attack = cwa.cpa(proj_n, cwa.leakage_models.sbox_output)
results = attack.run()
pge = results.find_maximums()[0][0][1]
corr = results.find_maximums()[0][0][2]
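
A note on the last two lines, offered as something to verify rather than a confirmed diagnosis: as far as I understand the ChipWhisperer documentation, each entry returned by find_maximums() is a (key guess, sample index of the peak, correlation) tuple, sorted best guess first. If that holds for the installed version, index [1] is the position of the peak inside the window, not the rank of the correct key. The sketch below works on a mock list with that assumed layout (the numbers in it are invented) and derives the PGE as the rank of the known key byte; `pge_from_maximums` is a hypothetical helper, not a library function.

```python
def pge_from_maximums(maxes, known_key):
    """Rank of the known key byte per subkey.

    Assumes maxes[b] is a list of (guess, sample_index, correlation)
    entries sorted from strongest to weakest correlation.
    """
    pges = []
    for b, ranked in enumerate(maxes):
        guesses = [int(entry[0]) for entry in ranked]
        pges.append(guesses.index(known_key[b]))
    return pges

# Mock data for one subkey, truncated to three guesses (values are invented).
mock_maxes = [[
    (0xE5, 0, 0.031),   # best guess, peak at sample 0
    (0x10, 12, 0.030),
    (0xF7, 44, 0.029),  # the known key byte is only ranked third here
]]
known_key = [0xF7]

print("sample index of top guess:", mock_maxes[0][0][1])
print("PGE of known key byte:", pge_from_maximums(mock_maxes, known_key)[0])
```

In this mock, the second tuple field of the top guess is 0 while the known key byte is ranked third, so reading that field as a PGE would be misleading.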

jpthibault · June 12, 2026, 2:50pm

You are absolutely on the right track. If you were not, then PGE would tend to stay around 128.

It’s impossible for me to tell you how many traces you need. Collect more until you succeed.

sscotto · June 15, 2026, 9:06am

CPA on software AES on CW305 with CV32E40P is possible; the failure is caused by improper time windowing and misalignment rather than noise.

Correlation collapses because your large sample range mixes unrelated instructions. CPA will still need more traces than for hardware AES, but it should begin to work once the real first-round S-box execution region is isolated and the traces are correctly aligned.
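
If the traces do turn out to be misaligned (the thread does not show whether they are), one common way to realign them is to slide each trace against a reference pattern and keep the shift with the best match. The following is a minimal NumPy sketch under that assumption; `align_to_reference`, the window bounds, and the synthetic pulse are illustrative choices, not part of the ChipWhisperer API.

```python
import numpy as np

def align_to_reference(traces, ref, ref_window, max_shift):
    """Shift each trace so that ref[start:stop] matches best within +/- max_shift."""
    start, stop = ref_window
    pattern = ref[start:stop] - ref[start:stop].mean()
    aligned = np.empty_like(traces)
    shifts = []
    for i, tr in enumerate(traces):
        best_shift, best_score = 0, -np.inf
        for s in range(-max_shift, max_shift + 1):
            lo, hi = start + s, stop + s
            if lo < 0 or hi > len(tr):
                continue  # candidate window would fall outside the trace
            seg = tr[lo:hi] - tr[lo:hi].mean()
            score = float(np.dot(seg, pattern))
            if score > best_score:
                best_shift, best_score = s, score
        # np.roll wraps samples around the ends; trim the edges afterwards if that matters.
        aligned[i] = np.roll(tr, -best_shift)
        shifts.append(best_shift)
    return aligned, np.array(shifts)

# Synthetic demo: one smooth pulse, copied with known shifts of 0, +3 and -5 samples.
ref = np.zeros(200)
ref[90:110] = np.hanning(20)
demo_traces = np.stack([np.roll(ref, s) for s in (0, 3, -5)])

aligned, shifts = align_to_reference(demo_traces, ref, ref_window=(80, 120), max_shift=10)
print("recovered shifts:", shifts)
```

A histogram of the recovered shifts on real captures is a quick check: if nearly all of them are 0, alignment is probably not the limiting factor and the window or leakage model deserves attention first.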

ishir23 · June 17, 2026, 4:50am

Thank you both for the guidance. It’s been very helpful.

Update on our progress:

Following your advice, we:

Increased traces to 60,000 (up from 5,000-15,000 previously)
Isolated window 0-80 samples based on variance plot showing highest data-dependent activity there
Verified plaintext/ciphertext pairs, all correct against PyCryptodome reference implementation
Calibrated gain to 30dB, signal range ±0.38, no clipping

Current results with sbox_output, window 0-80, 60k traces:

PGEs: [69, 38, 59, 55, 23, 59, 29, 64, 34, 0, 0, 3, 66, 17, 15, 50]
Recovered: f7 62 0f d2 2d a9 b2 43 e4 e5 3d bd de 4d 53 03
Correct:   2b 7e 15 16 28 ae d2 a6 ab f7 15 88 09 cf 4f 3c
Bytes correct: 0/16

PGEs are clearly dropping (bytes 9,10 at PGE=0, several under 30) confirming we are on the right track as you said.

However, even though bytes 9 and 10 show PGE=0, the recovered byte doesn’t match the correct key byte. For example, byte 9: PGE=0 but got=0xe5 while correct=0xf7.

Is it possible that PGE=0 means the best correlating guess has highest correlation, but it’s still the wrong byte due to insufficient traces, maybe a noise peak temporarily outranking the correct key? If so, we understand we just need more traces until the correct key consistently ranks #1.

We plan to capture 200,000+ traces next session. Does that sound like the right approach, or is there something else we should check first?