Reasonable first SCA setup for an unknown SoC?

Hello everyone,

For learning purposes I’ve been planning to run side-channel analysis, and possibly glitching, on an unknown Bluetooth + USB SoC in a consumer device from 2020. I don’t have access to a datasheet since the SoC is unknown and likely behind an NDA, so I’ve been trying to investigate it on my own before pulling the trigger on a Husky (or HuskyPlus).

I’m completely new to SCA, and still relatively new to embedded/electrical engineering in general, but I’ve been reading and preparing for a few months now.

What I’ve found so far:

  • The SoC is a 140+ ball BGA.

  • It seems to have an internal buck generating about 1.15 V. I have found no other rail (or decoupling cap) in the 0.9-1.2 V range.

  • That ~1.15 V rail leaves the SoC (in a very noisy state), goes through an inductor, and then comes back into the SoC through 4 pins with nearby local decoupling.

  • My guess is that this is some kind of main low-voltage internal rail, likely digital logic and maybe more. This is my main (and only) candidate for SCA work.

  • I haven’t found any exposed clock signal for synchronous sampling, but it does have a crystal.

  • Both the crystal and the 1.15 V rail are running even when the device is “sleeping” (not explicitly powered on).

I designed a small PCB to act as a frontend for this target:

  • 20-pin connector for ChipWhisperer

  • SMA connectors for ShuntLow, ShuntHigh, and Crowbar

  • XTAL_OUT tap circuit that biases and digitizes the crystal output using a Schmitt trigger, then feeds it to CLOCK_IN

  • Low-noise LDO (LT3045EMSE) to supply the ~1.15 V rail externally through a shunt, so I can bypass the original buck output for a cleaner signal.

What I’ve tested so far:

  • I removed the inductor and supplied that rail externally. The device still boots and works normally.

  • The XTAL_OUT tap works, although with my logic analyzer at 100 MS/s I only see what looks like a ~20 MHz clock with an ugly duty cycle (varying between 80/20 and 60/40). I assume that is just because my sampling rate is too low to show the real waveform properly.

  • I inserted a 1.0 Ω shunt in series with the injected rail.

    • I ended up with 1.164 V on the supply side and 1.122 V on the load side during normal operation. That is a 42 mV drop, which works out to about 42 mA of average current in that state.

    • The device still boots and behaves correctly with that shunt in place.
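As a sanity check on those readings, Ohm's law gives the current from the drop across the shunt, and the same numbers give the power the shunt dissipates. Here is a minimal Python sketch using the voltages quoted above:

```python
# Readings across the 1.0 ohm series shunt, taken from the post above.
V_SUPPLY = 1.164  # volts, LDO side of the shunt
V_LOAD = 1.122    # volts, SoC side of the shunt
R_SHUNT = 1.0     # ohms


def shunt_current(v_high: float, v_low: float, r_shunt: float) -> float:
    """Ohm's law: current through the shunt, in amperes."""
    return (v_high - v_low) / r_shunt


i_avg = shunt_current(V_SUPPLY, V_LOAD, R_SHUNT)
p_shunt = i_avg ** 2 * R_SHUNT                    # dissipation in the shunt, watts
drop_pct = 100 * (V_SUPPLY - V_LOAD) / V_SUPPLY   # share of the rail lost across the shunt

print(f"{i_avg * 1e3:.1f} mA, {p_shunt * 1e3:.2f} mW, {drop_pct:.1f} % drop")
```

The roughly 3.6 % drop scales linearly with current, so it will grow in busier modes.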

I don’t own an oscilloscope (yet), so the only analog measurements I have are from a Saleae Logic, which is obviously very limited here. But even with that, I can already see:

  • different current/activity depending on mode (USB vs Bluetooth)

  • differences between Bluetooth paired vs not paired

  • a repeating pattern where it draws more current almost exactly every 1 ms, likely some periodic task?

So that makes me think I’m at least looking at a rail with real workload-dependent activity, not just some static support rail.
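You can measure the ~1 ms burst instead of eyeballing it by autocorrelating a capture. The strongest peak away from lag zero gives the repetition period. The minimal NumPy sketch below runs on synthetic data, and its sample rate and burst shape are made up for illustration. USB full-speed hosts send a start-of-frame packet every 1 ms, so in USB mode that is one plausible, unverified explanation for the pattern.

```python
import numpy as np


def estimate_period(samples: np.ndarray, sample_rate: float, min_lag: int) -> float:
    """Estimate the dominant repetition period (in seconds) via autocorrelation.

    min_lag (in samples) must exceed the width of a single burst, otherwise
    the peak around lag 0 is picked instead of the repetition.
    """
    x = samples - samples.mean()
    n = len(x)
    # Zero-pad to 2n so the FFT gives a linear rather than circular autocorrelation.
    spectrum = np.fft.rfft(x, 2 * n)
    acf = np.fft.irfft(spectrum * np.conj(spectrum), 2 * n)[:n]
    lag = min_lag + int(np.argmax(acf[min_lag:]))
    return lag / sample_rate


# Synthetic stand-in for a capture: 10 ms at 100 kS/s, a 0.1 ms burst every 1 ms.
rng = np.random.default_rng(0)
fs = 100e3
trace = np.zeros(1000)
for start in range(0, trace.size, 100):
    trace[start:start + 10] = 1.0
trace += 0.1 * rng.standard_normal(trace.size)

period = estimate_period(trace, fs, min_lag=20)
print(f"estimated period: {period * 1e3:.3f} ms")
```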

I’m trying to build as much confidence as possible before buying a Husky. I know starting with an unknown chip means I’m skipping quite a few steps, and I do plan to practice on known targets first, but this is my long-term goal and the main reason I started learning this.

So given all that:

  • Does this sound like a reasonable setup?

  • Does the rail work I’ve done so far sound like meaningful evidence that I’m on a useful rail for SCA?

  • An imperfect duty-cycle should be fine for the purpose of synchronous sampling, right?

  • Are there more things I could verify before pulling the trigger on a Husky?

  • Any obvious red flags in this setup?

Sorry for the long post, and thanks in advance for any feedback. I’ve mostly been learning this on my own from books (The Hardware Hacking Handbook is great!) and by bouncing ideas off LLMs, so I’d really appreciate a sanity check from people with real hands-on SCA experience.

Wow, for someone new to SCA it looks like you’re off to a very solid start!

Looks like you’ve got a very solid understanding of the basics.

This does sound like a promising setup.

Regarding the imperfect duty cycle: I think that should be fine, but obviously it would be good to confirm the clock speed with a faster scope. The duty cycle should not be a problem for ChipWhisperers: we don’t use the clock directly; we pass it through a PLL. Knowing the clock would also help decide on a Lite/Husky/Plus, since they each have different maximum sampling clocks.
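The duty-cycle readings in the original post fit with sampling quantization. At 100 MS/s a ~20 MHz clock gets only about 5 samples per period, so a logic analyzer can only report the duty cycle in steps of roughly 20%. Because the clock is not synchronized to the sample clock, its phase drifts, and the reading jumps between neighboring steps. The sketch below simulates that. The 70% true duty cycle and the 20.05 MHz frequency are illustrative assumptions, not measurements of the target.

```python
import numpy as np


def measured_duty_cycles(f_clk_hz: int, duty: float, f_sample_hz: int,
                         n_samples: int) -> np.ndarray:
    """Sample an ideal square wave and return the duty cycle seen in each period.

    Integer arithmetic keeps the clock phase exact; periods are delimited by
    rising edges in the *sampled* data, i.e. what a logic analyzer would show.
    """
    k = np.arange(n_samples, dtype=np.int64)
    phase = (k * f_clk_hz % f_sample_hz) / f_sample_hz  # position within the clock period, 0..1
    bits = (phase < duty).astype(np.int64)               # 1 while the clock is high
    rising = np.flatnonzero(np.diff(bits) == 1) + 1
    return np.array([bits[a:b].mean() for a, b in zip(rising[:-1], rising[1:])])


# Illustrative assumptions: a clock with a true 70% duty cycle, running slightly
# off 20 MHz so its phase drifts relative to a 100 MS/s logic analyzer.
duties = measured_duty_cycles(20_050_000, 0.70, 100_000_000, 20_000)
print(sorted({round(float(d), 2) for d in duties}))
```

With these settings the readings alternate between 60% and 80%, plus an occasional 75% when a period spans only 4 samples.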

And of course regardless of the XTAL frequency, the SoC is likely multiplying that and running at a higher clock. Again it would be good to borrow a fast scope to get a better idea of the internal clock rate.

Thanks a lot for the feedback, it’s nice to hear that I haven’t been fooled by the LLMs!

I was hoping that the Husky’s logic analyzer (LA) with 300 MS/s sampling could give me a better indication of the real (digitized) crystal frequency. From there, I’d figure out the core clock by trial and error, checking which PLL multiplier gives the best signal when I look for correlations with known data sent in through USB (using a Cynthion with an FPGA-based trigger to trigger the Husky). However, that assumes the SoC’s PLL multiplier for its core clock is an integer (which I just found out is not guaranteed), but the ChipWhisperers only do integer multipliers anyway, correct?

Assuming the clock is in the 80-160 MHz range (which seems realistic since it’s a battery-powered device, likely running a Cortex-M4 or similar), what bandwidth/sampling specs do you think an oscilloscope would need to find the clock frequency? And would that be done on the power rail or with an H-probe? Sadly, the makerspaces around me don’t seem to have very high-speed oscilloscopes.
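If the core clock is an integer multiple of the crystal, the trial-and-error search space is small. The sketch below lists the candidates, assuming a ~20 MHz crystal and the 80-160 MHz range above. With those inputs it yields five candidates, x4 through x8 (80, 100, 120, 140 and 160 MHz).

```python
def candidate_core_clocks(f_ref_hz: int, f_min_hz: int, f_max_hz: int) -> dict[int, int]:
    """Integer multiples of the reference clock that fall inside [f_min, f_max].

    Only integer multipliers are considered; a fractional PLL on the SoC
    would not show up here.
    """
    m_lo = -(-f_min_hz // f_ref_hz)   # ceiling division
    m_hi = f_max_hz // f_ref_hz
    return {m: m * f_ref_hz for m in range(m_lo, m_hi + 1)}


# Assumptions: ~20 MHz crystal (from the logic-analyzer reading) and the
# 80-160 MHz range guessed above.
candidates = candidate_core_clocks(20_000_000, 80_000_000, 160_000_000)
for mult, freq in candidates.items():
    print(f"x{mult}: {freq / 1e6:.0f} MHz")
```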

Also, I realized that the shunt I’m using is placed quite far away from the SoC’s leads. It’s located on my PCB, and after it the shunt-low net goes through a ~7-8 mm long wire from my PCB to the inductor’s pad (which itself is maybe 10 mm from the leads). Given that I’m removing as much of the original local decoupling as possible, how important is it that the shunt is placed as close as possible to the SoC’s leads?

Image of shunt resistor (R1), the (black) wire and the inductor’s pad. Please ignore the messy soldering and that the wire is currently much longer than it needs to be.
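One way to reason about the placement question is to compare the shunt resistance with the impedance of the path between the shunt and the SoC. The back-of-the-envelope sketch below assumes the common rule of thumb of about 1 nH per mm of wire or narrow trace, and it ignores any decoupling left on the load side. At 120 MHz, ~18 nH works out to roughly 13.6 Ω of reactance, more than ten times the 1 Ω shunt. That suggests, but does not prove, that the fastest current transients may be supplied by whatever capacitance sits between that inductance and the die rather than flowing through the shunt.

```python
import math

# Rough rule-of-thumb assumption: ~1 nH per mm for a thin wire or narrow trace
# (the real value depends on geometry and the return path).
NH_PER_MM = 1.0
WIRE_MM = 8.0          # shunt-low wire to the inductor pad (upper estimate from the post)
PAD_TO_SOC_MM = 10.0   # inductor pad to the SoC (estimate from the post)
F_HZ = 120e6           # example frequency inside the 80-160 MHz range guessed above
R_SHUNT = 1.0          # ohms

inductance_h = (WIRE_MM + PAD_TO_SOC_MM) * NH_PER_MM * 1e-9
x_l = 2 * math.pi * F_HZ * inductance_h  # inductive reactance, ohms

print(f"~{inductance_h * 1e9:.0f} nH -> {x_l:.1f} ohm at {F_HZ / 1e6:.0f} MHz "
      f"(shunt: {R_SHUNT} ohm)")
```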

Correct.

Hard to answer in the general sense, but if you sample sufficiently fast, then you get very clear periodic pulses from each clock edge transition.

Again, hard to answer in the general sense! There is no hard and fast rule. See if you get good results from what you have.

Alright, I ordered a Husky now, will give it a try before I iterate. Thanks again!

An update on my strategy to figure out the clock rate, in case anyone finds this later and has the same issue:

I set up what I thought would be an OK setup to capture the device processing a USB packet, and then did a coarse sweep over ADC phase values from 0 to 100 (step size 10) and every relevant adc_mul. I recorded 300 traces for each combination and looked at how well the traces correlated with each other. My computer somehow ended up needing 58 GB of memory in total, and swap saved me because I only have 32 GB!
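On the memory problem: if each setting only needs a single score, the raw traces don't have to stay in memory. The sketch below takes the score to be the mean pairwise Pearson correlation between traces, which is one reasonable reading of "how well the traces correlated with each other". The `capture` function is a hypothetical placeholder for whatever code sets adc_mul/adc_phase and arms the scope. The data here is synthetic.

```python
import itertools

import numpy as np


def mean_pairwise_correlation(traces: np.ndarray) -> float:
    """Average Pearson correlation over all distinct pairs of traces.

    traces has shape (n_traces, n_samples); a higher value means the
    captures repeat more consistently at this setting.
    """
    c = np.corrcoef(traces)               # (n_traces, n_traces)
    iu = np.triu_indices_from(c, k=1)     # upper triangle, diagonal excluded
    return float(c[iu].mean())


def sweep(capture, adc_muls, adc_phases, n_traces):
    """Score every (adc_mul, adc_phase) pair, keeping only the score.

    capture(adc_mul, adc_phase, n_traces) is a hypothetical user-supplied
    function returning an (n_traces, n_samples) array. Raw traces are dropped
    after scoring, so memory holds one setting's traces at a time.
    """
    scores = {}
    for mul, phase in itertools.product(adc_muls, adc_phases):
        traces = np.asarray(capture(mul, phase, n_traces), dtype=np.float32)
        scores[(mul, phase)] = mean_pairwise_correlation(traces)
    return max(scores, key=scores.get), scores


# Synthetic stand-in: only (6, 50) reproduces the same waveform cleanly.
rng = np.random.default_rng(0)
template = rng.standard_normal(2000)


def fake_capture(mul, phase, n):
    noise = 0.3 if (mul, phase) == (6, 50) else 3.0
    return template + noise * rng.standard_normal((n, template.size))


best, scores = sweep(fake_capture, range(1, 9), range(0, 101, 10), 20)
print(best, round(scores[best], 3))
```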

It seems to have worked well! This is the result, which makes me confident that it’s running at an adc_mul of 6 (which means 120 MHz in my case), and that the adc_phase should be around 50.


I made it one step forward and two steps back, and so I’m back with my findings and another few questions.

After my previous finding, I assumed that the 6x multiplier was correct. To my knowledge this could still be the case, but I’m becoming much less confident.

To start off, I have made my setup trigger on a certain USB packet. After the capture, it also goes through all of the USB packets seen during it and adds metadata about when certain USB events happened, so the final trace plot looks like this after applying a rolling average:

The OUT packet is the start of the transaction that contains the data. I’m still not sure whether the AES operation happens before the ACK marker or after it, but it’s definitely before the GET marker, because that’s when the result of the operation is fetched. There are definitely consistent patterns near the events. For example, the ~20 µs after the OUT packet always looks quite similar. However, I can’t see any difference between traces where the operation is rejected before AES and ones where it fails due to an invalid AES-CMAC, so I think that is likely just the USB operations.
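To test whether the "rejected before AES" and "invalid AES-CMAC" cases really look the same, a per-sample Welch's t-test between the two groups is more sensitive than comparing plots. This is the statistic used in TVLA-style leakage assessment, where |t| > 4.5 is a common threshold. The sketch assumes the traces are already aligned to a shared event such as the OUT packet, and the data is synthetic.

```python
import numpy as np


def welch_t(group_a: np.ndarray, group_b: np.ndarray) -> np.ndarray:
    """Per-sample Welch's t-statistic between two groups of aligned traces.

    Each group has shape (n_traces, n_samples); the result has n_samples entries.
    """
    mean_a, mean_b = group_a.mean(axis=0), group_b.mean(axis=0)
    var_a, var_b = group_a.var(axis=0, ddof=1), group_b.var(axis=0, ddof=1)
    return (mean_a - mean_b) / np.sqrt(var_a / len(group_a) + var_b / len(group_b))


# Synthetic example: the two outcomes differ only in samples 500-519.
rng = np.random.default_rng(1)
rejected = rng.standard_normal((200, 1000))
bad_cmac = rng.standard_normal((200, 1000))
bad_cmac[:, 500:520] += 1.0

t = welch_t(rejected, bad_cmac)
print("samples with |t| > 4.5:", np.flatnonzero(np.abs(t) > 4.5))
```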

What I have found is that when I zoom in on the raw 120 MS/s traces, I see this pattern at all times (except in maybe 30% of the captures, which look completely different; more on that below).

While not as obvious in this trace as it sometimes is, you can see that every 3rd cycle it gains height and then drops over the next 2, so there is likely something happening at 40 MHz (120 MHz / 3) as well. There are no other obvious patterns or distinctive features in the “flat” sections of the trace.
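A spectrum of a raw trace shows which frequencies dominate, which is easier to judge than the every-third-sample pattern by eye. At 120 MS/s, a pattern that repeats every 3 samples lands at 120/3 = 40 MHz. At this sample rate the spectrum only extends to 60 MHz (Nyquist), so anything above that folds back. The synthetic trace below is a stand-in for a real capture.

```python
import numpy as np


def peak_frequency(trace: np.ndarray, sample_rate: float) -> float:
    """Frequency (Hz) of the strongest non-DC component in a Hann-windowed spectrum."""
    x = trace - trace.mean()
    mag = np.abs(np.fft.rfft(x * np.hanning(x.size)))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sample_rate)
    return float(freqs[1 + np.argmax(mag[1:])])  # skip the DC bin


# Synthetic stand-in: rise on one sample, fall over the next two, plus noise.
rng = np.random.default_rng(2)
fs = 120e6
trace = np.tile([0.0, 1.0, 0.5], 400) + 0.2 * rng.standard_normal(1200)
print(f"peak at {peak_frequency(trace, fs) / 1e6:.1f} MHz")
```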

I thought this might just be noise from my measuring setup. I realized too late that I didn’t do a great job with the high-frequency filtering before the shunt. Also, the 20 MHz digital clock, with potentially fast edges (dampened by a 33 Ω resistor after a Nexperia 74LVC1G17GW), sits on the same board, right next to 3V3_IN on the 20-pin connector. So I checked whether a differential measurement across the shunt would give better results than a single-ended measurement, but it didn’t.

So, back to the 30% of traces that look different. This is what they look like close up at 240 MS/s (yes, I bought a HuskyPro because I wanted to be able to oversample and I’m a sucker for metal cases):

I first thought this might be some artifact, because I can’t think of a good reason for it. In these graphs it’s very hard to see any kind of correlation between USB operations and the waveform. They’re so noisy that they need a very aggressive 1024-sample rolling-average window to be visualized properly:

So with all this said (sorry for the long post again!), I’m out of ideas. Should I try to improve the capture setup by filtering out the high-frequency noise before the LDO and moving the shunt much closer to the device, or is there something else I could try? I’m even starting to doubt this is the correct rail, but then I probably wouldn’t be able to see the USB operations so clearly? Should I give up and get an H-probe?

Super thankful for replies! I understand that this forum might not be the right place for these types of in-depth project-specific questions, but I don’t really know where else to ask them.

I don’t have much to offer unfortunately, just a few thoughts:

  • rolling average can be useful to help visualize things, but be careful to not focus too much on having “nice traces”
  • averaging traces can be really useful to improve SNR (but obviously requires synchronized traces, and the ability to make the target do the exact same thing more than once)
  • an H-probe can be very effective, but results will be highly dependent on placement, and SNR can easily be worse unless you have a probe that gives you very localized measurements (and very good placement)
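The first two bullets describe two different kinds of averaging. A rolling average smooths one trace in time. It acts as a low-pass filter, so it also blurs fast features. Averaging N synchronized captures of the same operation reduces uncorrelated noise by about √N while keeping the full time resolution. Here is a small NumPy sketch of both, on synthetic data:

```python
import numpy as np


def rolling_mean(trace: np.ndarray, window: int) -> np.ndarray:
    """Moving average over `window` samples (valid region only), via a cumulative sum."""
    c = np.cumsum(np.insert(trace.astype(np.float64), 0, 0.0))
    return (c[window:] - c[:-window]) / window


# Synthetic comparison: one fixed waveform, 100 captures with unit-variance noise.
rng = np.random.default_rng(3)
signal = np.sin(np.linspace(0, 20 * np.pi, 5000))
captures = signal + rng.standard_normal((100, signal.size))

single_residual = np.std(captures[0] - signal)
averaged_residual = np.std(captures.mean(axis=0) - signal)  # ~1/sqrt(100) of the single-trace noise
print(f"noise std: single {single_residual:.2f}, averaged {averaged_residual:.2f}")
```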

Thanks! No, I understand if there isn’t a lot to say - there are too many details, and I’ve skimped on a lot of them as well (the whole target-interface PCB and schematic, to start with). Any hint is helpful; I appreciate all I can get. :slight_smile:

I think I’m starting to get a little bit more clarity on the issues:

  • After spending almost a whole day chasing the “artifacty” traces and what might be causing them, and finally becoming quite confident that they must be due to noise or interferences (or maybe some bad reference somewhere?), they suddenly completely stopped happening without any (intentional) change to the setup and I can’t replicate them anymore.
  • The 20/40 MHz pattern might still be due to noise from the crystal tap switching and my lack of an HF filter in the LDO/shunt path.
    • I tried “hooking into” the CW313’s LP filter through the JP6 header to see if I can get the pattern to disappear. That’s probably not what it was intended for, but it seems like it should work, and it can’t be worse than the original unfiltered rail. However, my device seems to end up in a partially functioning state when supplied through that power rail. Or it could just be me having no idea what I’m doing.

Regarding the averaging: I’ve tried it, but I had problems finding good alignment between the traces, so I started looking at single traces instead to figure out what I could improve there. I think the issues were partly due to background activities happening in the SoC (I’ll need to be more selective about which traces I use), and partly due to noise and low SNR.
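For the alignment problem, a common starting point is to choose a distinctive feature in one reference trace, for example the activity right after the OUT packet. Every other trace is then shifted to match that feature as closely as possible. Below is a minimal sum-of-absolute-differences sketch. The `window` and `max_shift` values are placeholders to tune, and the data is synthetic.

```python
import numpy as np


def align_to_reference(traces, ref, window, max_shift):
    """Shift each trace so its `window` segment best matches the same segment of `ref`.

    The match uses the sum of absolute differences (SAD) over shifts in
    [-max_shift, max_shift]. np.roll wraps samples around the ends, which is
    only acceptable when shifts are small compared with the trace length.
    """
    start, stop = window
    assert start - max_shift >= 0 and stop + max_shift <= traces.shape[1]
    target = ref[start:stop]
    aligned = np.empty_like(traces)
    shifts = np.empty(len(traces), dtype=int)
    for i, tr in enumerate(traces):
        sads = [np.abs(tr[start + s:stop + s] - target).sum()
                for s in range(-max_shift, max_shift + 1)]
        shifts[i] = int(np.argmin(sads)) - max_shift
        aligned[i] = np.roll(tr, -shifts[i])
    return aligned, shifts


# Synthetic check: copies of a reference, randomly delayed by up to 5 samples.
rng = np.random.default_rng(4)
ref = rng.standard_normal(1000)
true_shifts = rng.integers(-5, 6, size=30)
traces = np.stack([np.roll(ref, k) + 0.1 * rng.standard_normal(ref.size) for k in true_shifts])
aligned, found = align_to_reference(traces, ref, window=(300, 500), max_shift=10)
print("all shifts recovered:", np.array_equal(found, true_shifts))
```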

So, I know my next steps for now. I’ll work on a rev2 of my “target-interface board” PCB, so that it has a much shorter distance from the shunt to the SoC’s pads, has better HF filtering pre-shunt/LDO and keeps the 20 MHz clock signal away from the power rails and ground returns I care about.

Regarding the H-probe (CW505): How important is it to use the LNA (CW502) with it if one has a Husky/HuskyPlus? Just getting the CW502 alone is quite cheap compared to the package with CW501, CW502, CW503, CW505.

We’d recommend the LNA (and power supply), but you can try without and see what you get. There’s no single answer to these questions because it depends on your whole setup. Crank the gain and see whether that gets you a decent portion of Husky’s ADC’s dynamic range.
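After changing the gain, you can judge how much of the ADC's dynamic range a capture uses by checking the span of the samples. Also check whether any samples sit at the rails, which would indicate clipping. This sketch assumes samples are normalized to a -0.5..+0.5 full scale, which is assumed here to match how ChipWhisperer returns captures, so verify against your own data and adjust `full_scale` if needed. The 1% rail margin is arbitrary.

```python
import numpy as np


def range_report(trace: np.ndarray, full_scale=(-0.5, 0.5), rail_margin=0.01):
    """Return (fraction of full scale spanned, fraction of samples near a rail).

    full_scale is an assumption about how samples are normalized; samples
    within rail_margin (as a fraction of the span) of either end count as
    possibly clipped.
    """
    lo, hi = full_scale
    span = hi - lo
    used = (trace.max() - trace.min()) / span
    near_rail = np.mean((trace <= lo + rail_margin * span) | (trace >= hi - rail_margin * span))
    return float(used), float(near_rail)


# Synthetic example: a trace swinging +/-0.05 uses only ~10% of the range.
trace = 0.05 * np.sin(np.linspace(0, 2 * np.pi, 1001))
used, clipped = range_report(trace)
print(f"range used: {used:.0%}, near rails: {clipped:.1%}")
```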
