I’ve tried to play with the uEcc trace lab in the courses/sca205 section.
I’m using a Husky wired to the CW313 board and a CW308_STM32F415 target board plugged in via the CW308-to-CW312 adapter. Three jumper wires connect the signals between the 30-pin HDR5 header and the user I/Os on the Husky.
I am using the Python lief library to compute the addresses of the required symbols, so they’re no longer hard-coded in the notebook. Compiling the firmware for STM32F3 or STM32F4 gives the same symbol addresses:
XYcZ_add is at 0x08000d4d
XYcZ_addC is at 0x08000c31
The notebook runs perfectly fine until I reach the cell that captures 50 power traces. The capture runs fine, but the cell fails at the assertion assert len(times_both_markers) == 510 because the array is empty.
Looking at the raws variable, which contains the raw packets captured from the I/O pins, each of the 50 captures contains the same 101 bytearray objects. There are only 2 distinct packets among them:
>>> print(repr({bytes(x) for x in raws[0]}))
{b'\xff\xff\x92', b'\xff\xff\x86'}
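A quick way to see how those distinct packets are distributed within one capture is to count them. The sketch below is only an illustration: summarize_packets is a hypothetical helper, and fake_capture is a made-up stand-in for raws[0] (the 50/51 split is invented; only the two packet values come from the output above). In the notebook, the real raws[0] would be passed instead.

```python
from collections import Counter


def summarize_packets(packets):
    """Count how often each distinct raw packet occurs in one capture."""
    return Counter(bytes(p) for p in packets)


# Hypothetical stand-in for raws[0]: 101 packets, only two distinct values.
# The 50/51 split is invented for illustration.
fake_capture = ([bytearray(b"\xff\xff\x92")] * 50
                + [bytearray(b"\xff\xff\x86")] * 51)

for packet, count in summarize_packets(fake_capture).most_common():
    print(packet.hex(), count)
```

If only these two values ever show up, no packet carries a matched PC address, which fits the empty times_both_markers array.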
I also tried plugging an Orbtrace Mini into the small Cortex JTAG connector on the CW313 board to see if the probe saw anything different, but no luck so far. I ran orbuculum to connect to the probe, then orbcat -c 0,"%c" and orbtop -e firmware/mcus/simpleserial-ecc/simpleserial-ecc-CW308_STM32F4.elf in separate terminals while running the capture, and saw no output.
Hmm, I’ve not tried trace with an STM32F4, so my best guess is that something else is needed to switch the target into SWD mode. I don’t have access to an F4 at the moment; I’ll look into it next week. If you have a debug probe that can get your target into SWD mode, you could use that first, and then run the notebook (for example I’ve done this with a Segger before).
Jumper wires between Husky User I/O port and J8 header to replicate the picture from the notebook
Flashing works and capturing works, but the issue is the same: none of the packets captured over SWO match the configured rules (the packets are still the same ones as in my original message).
I also used a Segger J-Link to put the target into SWD mode before rerunning the capture cell. Same result.
In case that’s useful, here are the modifications I’ve made to the cell in order to extract the function addresses from the firmware:
from pathlib import Path
import lief
elf_path = Path(fw_path).with_suffix(".elf")
elf = lief.ELF.parse(elf_path)
uecc_add = 0
uecc_addc = 0
for symb in elf.symbols:
    if symb.is_function:
        match symb.name:
            case "XYcZ_addC":
                uecc_addc = symb.value
            case "XYcZ_add":
                uecc_add = symb.value
            case _:
                pass
print(f"uECC XYcZ_addC function found at 0x{uecc_addc:08x}")
print(f"uECC XYcZ_add function found at 0x{uecc_add:08x}")
trace.set_isync_matches(addr0=uecc_addc, addr1=uecc_add, match='both')
When I execute the cell that moves from JTAG to SWD, there’s activity on the SWCLK and SWDIO lines. SWCLK runs at around 6.3kHz. That’s the only activity that happens on these 2 wires.
Then I start seeing some periodic activity on the SWO line, at 5MHz (the shortest pulse I see is 100ns). I see a burst of data being transmitted with a pause of 1.66 seconds between 2 bursts. A burst lasts about 16us.
This activity stops when DWCTRL register value is changed, which is working as intended as it stops sending synchronization frames.
When capturing, no activity happens on any of the 3 lines.
So I guess it means the PC addresses are never matched by DWT.
Following that hypothesis, I checked with GDB that the symbol addresses I extracted in python were correct and they were.
Then I put the hard-coded addresses back in the notebook, and not only did I see some SWO activity while capturing, but I now had 256 traces in the buffer. It was still not working, as the cell expects 510 trace packets, but I was onto something.
So I opened the firmware with IDA to look at the code at the addresses. And it seems that the input for the comparison is NOT the address of the functions but instead the PC address of BL XYcZ_addC and BL XYcZ_add inside the for-loop of function EccPoint_mult (which requires much more complex python code to compute than reading the ELF symbol table). Once I put these 2 addresses, the notebook successfully completed.
I’ll try to update this thread with the correct Python snippet to extract these once I’ve finished writing it.
Ah ha, I’m happy you’ve figured it out! Now that you say this I do remember seeing some similar behaviour around addresses like this. I never did find any Arm documentation around why this might happen or how to avoid it. I remember just trial and error! I should update the notebooks and documentation to make a note of this and save others from going through this.
On ARM, odd PC addresses indicate thumb mode. It seems DWT only triggers on even addresses. So I tried again by stripping the LSB of the symbol addresses and the notebook almost worked.
I say almost because I get 512 packets instead of the required 510. But extracting the required addresses is now trivial and no longer requires disassembling the firmware, so tweaking the notebook to work with these 2 extra packets should be a better fix. It’s probably just a matter of stripping the extra 2 packets away.
I tried for ~1 hour to get the code to work with the 256 bits being collected instead of 255, with no luck so far. Stripping the extra calls after the capture isn’t enough.
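The LSB stripping described above can be sketched in a few lines. This is a minimal illustration, not code from the notebook: dwt_match_address is a hypothetical helper name, and the two symbol values are the ones printed at the top of the thread.

```python
THUMB_BIT = 0x1  # bit 0 of a function symbol value marks Thumb state


def dwt_match_address(symbol_value):
    """Clear bit 0 so the value is the address of the first instruction."""
    return symbol_value & ~THUMB_BIT


# Symbol values reported earlier in this thread.
symbols = {"XYcZ_add": 0x08000D4D, "XYcZ_addC": 0x08000C31}

for name, value in symbols.items():
    print(f"{name}: symbol 0x{value:08x} -> "
          f"instruction address 0x{dwt_match_address(value):08x}")
```

An address that is already even passes through unchanged, so the helper is safe to apply to any symbol value.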
So here is the python code that automatically extracts the correct addresses for setting the DWT:
from pathlib import Path
import lief
from capstone import *
from capstone.arm import *
elf_path = Path(fw_path).with_suffix(".elf")
elf = lief.ELF.parse(elf_path)
uecc_addc = elf.get_function_address("XYcZ_addC")
uecc_add = elf.get_function_address("XYcZ_add")
uecc_point_mult = elf.get_symbol("EccPoint_mult")
print(f"XYcZ_addC() at 0x{uecc_addc:08x}")
print(f"XYcZ_add() at 0x{uecc_add:08x}")
print(f"EccPoint_mult() at 0x{uecc_point_mult.value:08x}")
# Search for instructions calling XYcZ_add() and XYcZ_addC().
# We should find 2 pairs of these.
calls = []
cs = Cs(CS_ARCH_ARM, CS_MODE_ARM)
if uecc_point_mult.value & 1 == 1:
    cs.mode = CS_MODE_THUMB
func_code = bytes(elf.get_content_from_virtual_address(uecc_point_mult.value & 0xffff_fffe, size=uecc_point_mult.size))
for ins in cs.disasm(func_code, uecc_point_mult.value):
    if ins.id == ARM_INS_BL:
        if ins.op_str[1:] in (hex(uecc_add), hex(uecc_addc)):
            calls.append(ins)
assert len(calls) == 4
addr0 = 0
addr1 = 0
# The pair of calls we are looking for is the one with the smallest distance between the calls
if calls[1].address - calls[0].address < calls[3].address - calls[2].address:
    addr0 = calls[0].address & 0xffff_fffe
    addr1 = calls[1].address & 0xffff_fffe
else:
    addr0 = calls[2].address & 0xffff_fffe
    addr1 = calls[3].address & 0xffff_fffe
print(f"Setting addr0=0x{addr0:08x}, addr1=0x{addr1:08x}")
trace.set_isync_matches(addr0=addr0, addr1=addr1, match='both')
Using this code instead of the cell that sets hard-coded addresses should also work on STM32F3 and K82F targets.
It obviously adds 2 dependencies:
pip install lief to parse ELF file and extract symbols/data
pip install capstone for disassembling. I tried to do without it, but ARM and Thumb/Thumb-2 encoding is too messy to decode manually
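The pair-selection step at the end of the snippet above can be tried on its own, without lief, capstone or a target attached. The sketch below restates that logic as a pure function; pick_closest_pair is a hypothetical name and the four call-site addresses are made up for illustration.

```python
def pick_closest_pair(call_addresses):
    """Return the pair (first two or last two) with the smaller gap.

    Expects the four BL call-site addresses in ascending order and
    clears bit 0 of each returned address.
    """
    assert len(call_addresses) == 4
    first, second = call_addresses[:2], call_addresses[2:]
    if first[1] - first[0] < second[1] - second[0]:
        pair = first
    else:
        pair = second
    return tuple(addr & 0xFFFF_FFFE for addr in pair)


# Hypothetical call-site addresses (odd because the base carried the Thumb bit).
example_calls = [0x08001001, 0x08001041, 0x08001101, 0x08001109]
addr0, addr1 = pick_closest_pair(example_calls)
print(f"addr0=0x{addr0:08x}, addr1=0x{addr1:08x}")
```

Here the second pair is only 8 bytes apart versus 0x40 for the first, so it is the one selected. As in the snippet above, a tie falls through to the second pair.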
Likely the compiler did something different? That’s why we recommend people start with the provided pre-compiled firmware. I don’t want to have to update the attack for all the possible ways that the compiler could complicate things; the notebook is already pretty long and complicated.