Getting started with power analysis on Husky+313+312(Artix-7)

Goals

To get a minimal working example for collecting power traces from

Husky (capture/control) + CW313 (interposer) + CW312 (Artix-7 target).

The simplest example comes from looking at the Vivado project

chipwhisperer/hardware/victims/cw308_ufo_target/xc7a35/vivado/ss2_cw305_aes.xpr

(no other way to get at the design hierarchy AFAIK) along with the demo

chipwhisperer/jupyter/demos/PA_HW_CW305_1-Attacking_AES_on_an_FPGA.ipynb.

It’s a lot to digest for me, so below I walk through these to the best of my ability, with a couple bolded questions and a few more at the end. Any comments, suggestions, insights, or references would be appreciated. I’m looking forward to the day when I can hit enter and produce a pile of traces to analyze.

The demo

Target and scope parameters

Some of the parameters set include:

import chipwhisperer as cw
scope = cw.scope()
scope.adc.samples = 129
scope.adc.offset = 0
scope.adc.basic_mode = "rising_edge"
scope.trigger.triggers = "tio4"
scope.io.tio1 = "serial_rx"
scope.io.tio2 = "serial_tx"
scope.io.hs2 = "disabled"
# ...
if TARGET_PLATFORM == 'CW312T_A35':
    scope.gain.db = 45 # this is a good setting for the inductive shunt; if using another, adjust as needed
    scope.io.hs2 = 'clkgen'
    fpga_id = 'cw312t_a35'
    platform = 'ss2'
# ...
target = cw.target(scope, cw.targets.CW305, force=True, fpga_id=fpga_id, platform=platform)
# ...
if TARGET_PLATFORM == 'CW312T_A35':
    scope.clock.clkgen_freq = 7.37e6
    scope.io.hs2 = 'clkgen'
    if scope._is_husky:
        scope.clock.clkgen_src = 'system'
        scope.clock.adc_mul = 4
        scope.clock.reset_dcms()
    # ...

which sets the I/O pins for UART and clock, specifies the clock source and frequency (the Husky drives the clock), and sets the trigger pin and capture parameters. The scope API seems well-documented at https://chipwhisperer.readthedocs.io. Questions about triggering, moving data to/from the FPGA, and the ss2 protocol will be considered later.
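To get a feel for what these numbers mean, here is a small worked calculation using only the values from the demo settings above. It assumes the Husky ADC clock is clkgen_freq multiplied by adc_mul, which is my reading of the scope.clock documentation; the variable names are my own.

```python
# Values copied from the demo settings above.
clkgen_freq_hz = 7.37e6   # scope.clock.clkgen_freq
adc_mul = 4               # scope.clock.adc_mul (Husky only)
adc_samples = 129         # scope.adc.samples

# Assumption: ADC sample rate = clkgen_freq * adc_mul.
adc_rate_hz = clkgen_freq_hz * adc_mul

# Number of target clock cycles covered by one capture.
cycles_covered = adc_samples / adc_mul

# Duration of one capture window in seconds.
capture_window_s = adc_samples / adc_rate_hz

print(f"ADC rate: {adc_rate_hz / 1e6:.2f} MS/s")
print(f"Target clock cycles covered: {cycles_covered:.2f}")
print(f"Capture window: {capture_window_s * 1e6:.2f} us")
```

So under that assumption, 129 samples cover about 32 target clock cycles after the trigger, which would need to grow for a design that runs longer than the demo AES core.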

Programming a bitstream

In the demo, a bitstream is loaded via cw.target() which with the parameters above calls cw.targets.CW305._con() in CW305.py, a chunk of which is devoted to loading default bitstreams.

elif platform == 'ss2':
    # ...
    self.fpga = CW312T_XC7A35T(scope)
    # ...
    if bsfile is None:
        from chipwhisperer.hardware.firmware.xc7a35 import getsome
        if self.target_name == 'AES':
            bsfile = getsome(f"AES_{fpga_id}.bit")
        elif self.target_name == 'Cryptech ecdsa256-v1 pmul':
            bsfile = getsome(f"ECDSA256v1_pmul_{fpga_id}.bit")
        elif self.target_name == 'Pipelined AES':
            if version is None:
                version = 0
            bsfile = getsome(f"Pipelined_AES_{fpga_id}_half{version}.bit")
        else:
            raise ValueError('Unknown target!')

    self.fpga.program(bsfile, sck_speed=prog_speed)

There doesn’t seem to be a way to pass in bsfile; perhaps in **kwargs to cw.target, but the docstring says “rarely needed.” Assuming I do this or otherwise modify CW305.py, the self.fpga.program() call is to cw.hardware.naeusb.programmer_targetfpga.CW312T_XC7A35T.program(), which is the XilinxGeneric.program(). There are some notes in the docstring for the CW312T_XC7A35T class that might be annoying but I’ll ignore them for now.

Question: In summary, it seems I could load a bitstream with cw.target(scope, cw.targets.CW305, force=True, platform='ss2', bsfile='path/to/my.bit'). However, the default bitstreams are loaded using cw.hardware.firmware.xc7a35.getsome(). Do I need to use this as well? What does it do?

Capturing traces

cw.capture_trace() and the *.cwp format don’t seem very general at first glance, but can perhaps be modified or used as templates. For cw.capture_trace(), it seems a simple “data in/data out” would suffice, with the particulars left to the user and specific to the UUT, but the parameters are broken into key, plaintext, options for a static key, etc. This is reasonable for certain contexts (avoiding unnecessary key scheduling or key generation), but I assume this doesn’t matter much and I can load whatever I want and use the static option if desired; TBD.

In cw.capture_trace() there seems to be some distinction between key and plaintext, and the trace, response, and somewhat redundant plaintext/key are returned in some structure (cw.common.traces.Trace):

if key:
    target.set_key(key, ack=ack, always_send=always_send_key)
# ...
if plaintext:
    target.simpleserial_write('p', plaintext)
# ...
if len(wave) >= 1:
    return Trace(wave, plaintext, response, key)
# ...

The key and plaintext are both sent through CW305.simpleserial_write(), but there are differences in cmd

def simpleserial_write(self, cmd, data, end=None):
    # ...
    if cmd == 'p':
        self.loadInput(data)
        self.go()
    elif cmd == 'k':
        self.loadEncryptionKey(data)
    # ...

but both do an fpga_write() to some registers defined elsewhere… perhaps important details for later.
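For the "pile of traces" goal, a capture loop might look roughly like the sketch below. The helper collect_traces and the fake capture function are my own inventions so that the loop can be run without hardware; the only thing taken from the library is the shape of the returned Trace (wave, plaintext, response, key) shown above. I am also assuming a failed capture comes back as None.

```python
import os
from collections import namedtuple


def collect_traces(capture_fn, n_traces, key, block_len=16):
    """Call capture_fn(plaintext, key) n_traces times and keep successful captures.

    capture_fn is injected so the loop can be exercised without hardware.
    """
    traces = []
    for _ in range(n_traces):
        plaintext = bytearray(os.urandom(block_len))  # fresh random input each time
        trace = capture_fn(plaintext, key)
        if trace is None:
            # Assumption: a failed or timed-out capture is reported as None.
            continue
        traces.append(trace)
    return traces


# Hypothetical stand-in for the real hardware capture, for a dry run only.
FakeTrace = namedtuple("FakeTrace", ["wave", "textin", "textout", "key"])


def fake_capture(plaintext, key):
    return FakeTrace([0.0] * 129, plaintext, bytearray(16), key)


demo_key = bytearray(range(16))
demo_traces = collect_traces(fake_capture, 5, demo_key)
print(len(demo_traces), len(demo_traces[0].wave))
```

With a connected scope and target, the injected function would presumably be something like lambda pt, k: cw.capture_trace(scope, target, pt, k).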

Question: Are there important differences between “key” and “plaintext” inputs besides static options for the key?

The UUT

Obviously I want to put my own designs on the FPGA and gather traces; that’s why I bought this stuff.

Necessary files

The Verilog source files for the demo are in a few places in two repos: chipwhisperer/hardware/victims/cw308_ufo_target/xc7a35/hdl/, chipwhisperer/hardware/victims/cw305_artixtarget/fpga/, and fpga-common/hdl/.

Constraints

cw312_ss2_aes.xdc is the constraint file associated with the demo above, although there are a couple others with a cw312 prefix. Clocks, I/O 1-4, HDR 1-10 (?), LEDs, some other stuff. Not sure if there’s anything I need to worry about here.

Wrapper, USB, FIFO, UART, UUT

The demo has a wrapper ss2_aes_wrapper which instantiates ss2 and cw305_top.

  • ss2 instantiates crc, uart_core, and fifo_sync. The FIFO is set at 256 bytes. This is all probably fine, to be treated as a black box, but who knows?

  • cw305_top instantiates cw305_usb_reg_fe, cw305_reg_aes, clocks, and aes_core.

    • The aes_core looks like a black box (good) with triggering based on its busy signal.
    • clocks I’m not going to worry about, maybe managing different clock domains.
    • I assume cw305_usb_reg_fe is USB stuff I don’t have to worry about, but cw305_reg_aes interacts with cw305_usb_reg_fe and grabs some parameters from cw305_defines.v. I think maybe these are just holding/passing data from the UUT to the USB? I also don’t know how this is interacting with ss2 and UART.

Question: Some of this is confusing to me, but it looks like I’d have to write something like cw305_reg_aes for whatever I’m putting on the FPGA. This looks like the most important module to understand and modify to roll my own. Yes/no/maybeso?

Trigger, data, clock

  • ss2_aes_wrapper has UART rx/tx ports, clks going in and out, and the trigger io4.
  • cw305_top has USB ports, trigger io4, and some other stuff.
  • Inside cw305_top, busy_o from the AES core is assigned to the trigger io4.
  • The important AES signals (plaintext, key, ciphertext, busy, load) go through cw305_reg_aes.
  • I have no idea what a/the “block interface” is. Hopefully it doesn’t matter.

Question: I’m not sure I understand triggering. I assume there are two signals: one to start trace acquisition and another to stop it. It looks like io4 is an output from ss2_aes_wrapper, but I lose track of the AES “load” signal in cw305_reg_aes, where it probably goes through cw305_usb_reg_fe. Do two signals need to be controlled, or does the scope.adc.samples parameter control how long acquisition runs, so one only needs to start the capture?

Questions

  • The basic question is “What’s required to roll my own?” I am trying my best to get going, but a well-documented minimal working example would really hit the spot. Obviously my ignorance of hardware and the fact that I didn’t build this myself is making this difficult.

  • Can someone describe the paths for data and trigger? A diagram between cw.capture_trace() and something like

    entity my_thingy is
        port(
            clk_i : in std_logic;
            -- start, e.g. high for one cycle
            start_i : in std_logic;
            -- input, e.g. plaintext, key, random
            input_i : in std_logic_vector(WIDTH_IN - 1 downto 0);
            -- output, e.g. ciphertext
            output_o : out std_logic_vector(WIDTH_OUT - 1 downto 0);
            -- done, e.g. output valid when high
            valid_o : out std_logic
        );
    end entity my_thingy;
    

    probably by way of a modified version of cw305_reg_aes? One answer is “look at the example above,” which I will continue to do, but it’s still lying half-eaten with the other half poorly digested.

  • The top-level communication in ss2_aes_wrapper and ss2 is UART, but the USB communication is buried in cw305_top. I’m not sure exactly what’s going on or how this works *shrug*.

  • Is modifying the above the best or easiest way to get started? If not, what endpoints do I need to understand?

The cw.target(..., bsfile='path/to/bitfile.bit') is the correct way to load this. I’ve updated the documentation for cw.target so that **kwargs points people to the target.con() method for their target class.
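As a rough sketch of that call, the helper below only checks the bitstream path and builds the keyword arguments, so it can be tried without hardware attached. The helper name ss2_target_kwargs and the placeholder file are hypothetical; the force, platform, and bsfile arguments are the ones from the question, and fpga_id is left optional because the demo passes it but the call in the question does not.

```python
import tempfile
from pathlib import Path


def ss2_target_kwargs(bsfile, fpga_id=None):
    """Validate a bitstream path and build keyword arguments for cw.target()."""
    path = Path(bsfile)
    if path.suffix != ".bit":
        raise ValueError(f"expected a .bit file, got {path.name!r}")
    if not path.is_file():
        raise FileNotFoundError(path)
    kwargs = {"force": True, "platform": "ss2", "bsfile": str(path)}
    if fpga_id is not None:
        kwargs["fpga_id"] = fpga_id  # e.g. 'cw312t_a35' as in the demo
    return kwargs


# Dry run with an empty placeholder file (not a real bitstream).
with tempfile.TemporaryDirectory() as tmp:
    demo_bit = Path(tmp) / "my_design.bit"
    demo_bit.write_bytes(b"")
    demo_kwargs = ss2_target_kwargs(demo_bit)

print(demo_kwargs["platform"], Path(demo_kwargs["bsfile"]).name)
```

With a scope connected, this would be used as target = cw.target(scope, cw.targets.CW305, **ss2_target_kwargs('path/to/my.bit')).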

The important difference for this target is that different registers are used for plaintext and key. Also, target.set_key(), called by capture_trace(), avoids sending the same key more than once: it only sends the key if it differs from the last one sent.
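The key-caching behaviour can be illustrated with a toy model. This is not the library's code: the class, the write counter, and the _write_key method are made up for illustration, and only the set_key name and the always_send option come from the snippets quoted earlier.

```python
class KeyCachingTarget:
    """Toy model of a target that skips redundant key writes (illustrative only)."""

    def __init__(self):
        self.last_key = None
        self.key_writes = 0  # counts how often the key register is actually written

    def _write_key(self, key):
        # Stand-in for the real register write to the FPGA.
        self.key_writes += 1

    def set_key(self, key, always_send=False):
        # Only write when forced or when the key changed since the last write.
        if always_send or self.last_key != key:
            self._write_key(key)
            self.last_key = bytearray(key)  # keep a copy, not a reference


demo_target = KeyCachingTarget()
fixed_key = bytearray(range(16))
for _ in range(3):
    demo_target.set_key(fixed_key)      # written once, then skipped twice
demo_target.set_key(bytearray(16))      # different key, written again
print(demo_target.key_writes)
```

The plaintext has no such caching in this model because it is expected to change on every capture.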

I can’t really help too much with the FPGA questions, but hopefully https://github.com/newaetech/chipwhisperer/blob/develop/hardware/victims/cw308_ufo_target/xc7a35/README.md can answer some of that. My understanding is that all the ssv2* stuff is basically translating UART commands into register reads/writes, so you shouldn’t have to worry about any UART stuff.

You just need to start the capture. scope.adc.samples will control how long the trace is. O_start should be the load signal, triggered by a write to the GO register, and it gets passed to GOOGLE_VAULT_AES in cw305_top.v
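A tiny simulation of that behaviour, as a sketch only: the scope waits for a rising edge on the trigger line and then records a fixed number of samples, regardless of when the trigger falls again. The function name and the sample data are invented, and the model assumes one trigger value per ADC sample, which is a simplification.

```python
def capture_window(trigger, adc, samples, offset=0):
    """Return `samples` ADC points starting `offset` points after the first rising edge."""
    for i in range(1, len(trigger)):
        if trigger[i - 1] == 0 and trigger[i] == 1:
            start = i + offset
            return adc[start:start + samples]
    return None  # no rising edge seen, i.e. the capture would time out


# The busy/trigger line goes high at index 3 and low again at index 7.
trigger_line = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
adc_stream = list(range(100, 110))

short_window = capture_window(trigger_line, adc_stream, samples=4)
long_window = capture_window(trigger_line, adc_stream, samples=6)
print(short_window)
print(long_window)
```

The longer window runs past the point where the trigger goes low, which is the point of the answer: only the start of the capture is signalled, and scope.adc.samples decides the length.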

Maybe ss2.xpr would be the best place to start? I’m not sure of your end goal, but that example should almost be a blank slate - it’s basically just some registers and what’s required to read/write to them.
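To make the "just some registers" idea concrete, here is a purely conceptual Python model of a register-mapped design: the host writes an input register, writes a GO register to start the computation (which is also where the trigger would fire), then reads the output register. The register addresses, class name, and compute function are all hypothetical and do not reflect the actual register map in the HDL.

```python
# Hypothetical register addresses, not the real map.
REG_INPUT, REG_OUTPUT, REG_GO = 0x00, 0x01, 0x02


class RegisterMappedUUT:
    """Conceptual model: registers in front of a start/input/output style core."""

    def __init__(self, compute, width=16):
        self.compute = compute
        self.regs = {REG_INPUT: bytes(width), REG_OUTPUT: bytes(width)}
        self.trigger_pulses = 0

    def write(self, addr, data):
        if addr == REG_GO:
            # Writing GO plays the role of the start/load pulse and the trigger.
            self.trigger_pulses += 1
            self.regs[REG_OUTPUT] = self.compute(self.regs[REG_INPUT])
        elif addr == REG_INPUT:
            self.regs[REG_INPUT] = bytes(data)
        else:
            raise ValueError(f"register {addr:#x} is not writable")

    def read(self, addr):
        return self.regs[addr]


# Toy "core" that inverts every input byte.
uut = RegisterMappedUUT(lambda block: bytes(b ^ 0xFF for b in block))
uut.write(REG_INPUT, bytes(range(16)))
uut.write(REG_GO, b"\x01")
result = uut.read(REG_OUTPUT)
print(result.hex(), uut.trigger_pulses)
```

In this picture, a custom design would replace the compute function, and the register block would be the part to adapt to its input and output widths.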

Basically, there are two different targets we have with the A35 FPGA - the older CW305 and the new CW312-Artix-7. The CW305 uses an 8-bit parallel interface, while the CW312 uses a serial interface. There’s some code reuse between the two, so you end up seeing references to both in the older CW305 stuff.

Alex