Questions regarding Arm Trace frames

Hello,

I am trying to convert the frames captured by TraceWhisperer into another format, but I can’t figure out at which layer of encoding they sit.

More precisely, I am running the notebook pc_sample_annotate.ipynb on a K82F target, and when I print the first few frames, I see that they have varying lengths (either 6 or 8 bytes), as shown below:

03 17 ba 3e 00 00 
03 17 18 3e 00 00 
03 17 90 00 3c 00 01 00 
03 17 7c 3d 00 00 
03 17 92 3d 00 03 01 00 

According to the Arm CoreSight Architecture Specification v3.0, PC sampling packets use header value 0x15 (1-byte payload) or 0x17 (4-byte payload). I don’t know what the leading byte (0x03) means, nor what to do with the extra bytes at the end. When I strip them away, the remaining values don’t match the PC values.

When I run the capture cell with an OrbTrace mini probe plugged in instead of Husky, I see only PC sample packets, properly encoded as 0x17 followed by exactly 4 bytes (containing the PC value) before the next 0x17 header:

17b63e0000
176c3d0000
17883d0000
17763d0000
17a83d0000

And I could validate that the PC values here are correct.
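For reference, the decoding described above can be sketched as follows. This is a minimal illustration, not code from TraceWhisperer or orbuculum: the function name `decode_pc_samples` is hypothetical, and it assumes the stream contains only the two packet types mentioned (0x17 with a 4-byte little-endian PC, and 0x15 with a 1-byte payload that carries no PC).

```python
def decode_pc_samples(stream: bytes) -> list:
    """Return the PC values found in a stream of periodic PC sample packets.

    Assumption: the stream holds only 0x17 (4-byte payload) and
    0x15 (1-byte payload) packets, with no other layer of framing.
    """
    pcs = []
    i = 0
    while i < len(stream):
        header = stream[i]
        if header == 0x17:
            payload = stream[i + 1:i + 5]
            if len(payload) < 4:
                break  # truncated packet at the end of the capture
            # The payload is the PC value, least significant byte first.
            pcs.append(int.from_bytes(payload, "little"))
            i += 5
        elif header == 0x15:
            i += 2  # 1-byte payload, no PC value to record
        else:
            raise ValueError(f"unexpected header 0x{header:02x} at offset {i}")
    return pcs


# The first three OrbTrace lines quoted above.
samples = bytes.fromhex("17b63e0000" "176c3d0000" "17883d0000")
print([hex(pc) for pc in decode_pc_samples(samples)])
# prints ['0x3eb6', '0x3d6c', '0x3d88']
```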

Which leads me to the question: at which encoding layer does TraceWhisperer stop?

When I call trace.write_raw_capture(frames, 'raw.bin'), the code also prefixes the whole file with 8 long synchronization packets with value 0x7fffffff, but the ITM documentation states that a synchronization packet at this layer is:

A Synchronization packet is at least forty-seven 0 bits followed by single 1 bit.

That’s almost a bit-wise negation of the value I see.
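One possible reading: 0x7fffffff is the value the CoreSight trace formatter (TPIU) uses for its full-frame synchronization packet, which would place the prefix at the TPIU layer and not at the ITM layer. The sketch below only compares the two byte patterns; the constant and function names are hypothetical, and it assumes both packets are serialized least significant byte first.

```python
# ITM synchronization packet: forty-seven 0 bits followed by a single 1 bit.
ITM_SYNC = (1 << 47).to_bytes(6, "little")            # 00 00 00 00 00 80

# TPIU formatter full-frame synchronization value (assumed LSB first).
TPIU_FULL_SYNC = (0x7FFFFFFF).to_bytes(4, "little")   # ff ff ff 7f


def count_leading_tpiu_syncs(data: bytes) -> int:
    """Count how many full-frame sync packets sit at the start of data."""
    n = 0
    while data.startswith(TPIU_FULL_SYNC, 4 * n):
        n += 1
    return n


print(ITM_SYNC.hex(" "))        # prints 00 00 00 00 00 80
print(TPIU_FULL_SYNC.hex(" "))  # prints ff ff ff 7f
```

Running `count_leading_tpiu_syncs()` on the first bytes of raw.bin would show whether the file really starts with eight of these packets.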

The frames don’t match the TPIU encoding either because the frames should be 16 bytes long.

Any help?

TraceWhisperer doesn’t do any decoding of the trace data. It captures the raw trace data (which is not a nice format at all IMO!). In the notebook you point to, if you keep going you’ll see that the raw trace data is then fed to orbuculum for decoding.

This is explained in the notebooks and also here.

Except that, as correctly stated in the notebook, starting with v2.2.0, orbuculum stopped parsing this format in favor of their more robust and bandwidth-efficient OFLOW protocol. The problem is that ETMv4 is only supported with orbuculum v2.2, so I need to stick to it for my experiments and am therefore working on converting traces out of ChipWhisperer into OFLOW.

The format you point to is the TPIU formatter, but then frames are expected to have a fixed size of 16 bytes. So I guess they’re either truncated or split in a special way?
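To make the 16-byte expectation concrete, here is a minimal sketch of how one TPIU formatter frame can be unpacked, based on the frame layout in the CoreSight specification: bytes 0, 2, ..., 14 carry either a new stream ID (bit 0 set) or data (bit 0 clear, with the real bit 0 stored in byte 15), odd bytes always carry data, and byte 15 holds one auxiliary bit per byte pair. The function name `decode_tpiu_frame` is hypothetical, and this is not the parser used by orbuculum or ChipWhisperer.

```python
def decode_tpiu_frame(frame: bytes, current_id: int = 0):
    """Unpack one 16-byte TPIU frame into (stream_id, data_byte) tuples.

    Returns the list of tuples and the stream ID in effect at the end
    of the frame, so it can be passed on to the next frame.
    """
    if len(frame) != 16:
        raise ValueError("a TPIU formatter frame is exactly 16 bytes")
    aux = frame[15]
    out = []
    for pair in range(8):
        even = frame[2 * pair]
        # Byte 15 itself is the auxiliary byte, so pair 7 has no odd byte.
        odd = frame[2 * pair + 1] if pair < 7 else None
        aux_bit = (aux >> pair) & 1
        if even & 1:
            # Even byte announces a new stream ID in bits 7:1.
            new_id = even >> 1
            if aux_bit and odd is not None:
                # Delayed change: the odd byte still belongs to the old ID.
                out.append((current_id, odd))
                current_id = new_id
            else:
                current_id = new_id
                if odd is not None:
                    out.append((current_id, odd))
        else:
            # Even byte is data; its bit 0 is stored in the auxiliary byte.
            out.append((current_id, (even & 0xFE) | aux_bit))
            if odd is not None:
                out.append((current_id, odd))
    return out, current_id


# Hand-built example frame: ID 1, then a 0x17 PC sample packet, zero padding.
frame = bytes([0x03, 0x17, 0xB6, 0x3E] + [0x00] * 12)
data, last_id = decode_tpiu_frame(frame)
print(last_id, [hex(b) for _, b in data[:5]])
```

The example frame is made up for illustration; it is not taken from a real capture.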

TraceWhisperer captures only raw timestamped trace bytes (assuming trace.capture.raw = True; otherwise it captures timestamped rule match events). This is what is returned by trace.read_capture_data(). The format of this raw data is defined here. There is no parsing of the trace data whatsoever at this level.

In our notebooks this raw data is fed to trace.get_raw_trace_packets(). As per the API, there is an option here to suppress sync frames and use them as markers that delimit “pseudo-frames”, which are the 6 / 8 byte lines that you report seeing. Again, this is not proper parsing; it’s very basic. It works well with the versions of Orbuculum that are mentioned in the notebooks.
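The idea of using sync frames as delimiters can be illustrated roughly as below. This is not the implementation of trace.get_raw_trace_packets(): it ignores the timestamps in the raw capture format, the name `split_on_sync` is hypothetical, and it assumes the delimiter is the 4-byte full-frame sync pattern ff ff ff 7f.

```python
FULL_SYNC = bytes.fromhex("ffffff7f")  # assumed delimiter pattern


def split_on_sync(raw: bytes, sync: bytes = FULL_SYNC) -> list:
    """Split a byte stream on sync patterns and drop the empty pieces.

    The pieces are "pseudo-frames": whatever bytes happened to arrive
    between two sync patterns, with no guarantee about their length.
    """
    return [chunk for chunk in raw.split(sync) if chunk]


# Two of the pseudo-frames quoted earlier, with sync patterns around them.
raw = (FULL_SYNC + bytes.fromhex("0317ba3e0000")
       + FULL_SYNC + FULL_SYNC + bytes.fromhex("031790003c000100"))
for chunk in split_on_sync(raw):
    print(len(chunk), chunk.hex(" "))
```

Because the split points depend only on where sync patterns appear, the resulting lengths need not be multiples of 16.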

Thanks for the pointers, I’ll look at that and see how far I can go from here.

In any case, hats off for the TraceWhisperer implementation. That’s a lot of documentation to digest!

Thank you! A lot of work went into it in the hope that it would be an incredibly useful addition to ChipWhisperer. I think that its complexity (and very awkward formatting!) is what holds it back from being more commonly used. But it can be a really useful tool nevertheless.

I managed to write a TPIU frame parser in Python, which now allows the notebook and TraceWhisperer to work with orbuculum v2.2.0 :)

I even used their Python package pyorb (pip install python-orbuculum) to parse the ITM messages directly with liborb, without having to call an external binary and then parse its output.

I may have a decent hypothesis about why the pseudo-frames from TraceWhisperer weren’t summing up to a multiple of 16 bytes (which should be the case, as TPIU frames are always 16 bytes long): the notebook sets the trace capture to only happen while the trigger signal is held high, which could truncate the last TPIU frame because there’s a bit of buffering if the frame isn’t complete yet.
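A quick way to check this hypothesis on a capture is to cut the sync-stripped byte stream into 16-byte frames and look at what is left over. A minimal sketch, with a hypothetical function name and assuming the sync packets have already been removed:

```python
def split_tpiu_frames(stream: bytes, frame_len: int = 16):
    """Cut a byte stream into complete frames plus a truncated remainder."""
    usable = len(stream) - (len(stream) % frame_len)
    frames = [stream[i:i + frame_len] for i in range(0, usable, frame_len)]
    leftover = stream[usable:]
    return frames, leftover


# 38 bytes of placeholder data: two complete frames and 6 stray bytes.
frames, leftover = split_tpiu_frames(bytes(38))
print(len(frames), len(leftover))  # prints 2 6
```

A non-empty remainder at the end of a triggered capture would be consistent with a final frame that was cut short.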

I’ll clean up my code, add unit tests to make sure it works with and without the synchronization bytes (long and short), and add documentation before making a PR. I won’t update the notebooks unless you want me to, but I can provide a rough “diff” here showing how to adapt them. It would add one dependency (pyorb). The alternative, running the orbuculum binaries, would require writing OFLOW files, which also adds a dependency (cobs this time, to encode the ITM frames to OFLOW). So I think pyorb is the best long-term solution: it creates proper classes per packet, which allows better support than parsing text output.

ooh nice, that would be awesome. I will definitely take a look.

I’ve created a pull request: Add TPIU pure-python parser by jmichelp · Pull Request #558 · newaetech/chipwhisperer · GitHub

I’m open to comments to improve it.

A bigger rework might be needed though because the current TraceWhisperer implementation assumes an ETMv3 is present, which may be true for Cortex-M3/M4 chips but newer Cortex-M33 for example use ETMv4. While most of the register maps are backward compatible (TPIU, ITM, DWT), ETM is very different and it’s not easy to map features from ETMv3 with ETMv4 registers. I’m almost done getting the pc_sample_annotate notebook to run correctly.

The main blocking point is the getreg() and setreg() commands over SimpleSerial, because they only take a register ID as a uint8_t.

I am thinking about changing that to something much more flexible: the uint8_t would be used for the peripheral ID (0: DWT, 1: ETM, 2: TPIU, 3: ITM) and a uint16_t would represent the offset within the peripheral, as each of them fits in 0x10000 bytes. The end of the memory space of each peripheral contains ID registers which make it possible to tell whether the peripheral is present, which version it is, etc. The Python code could read the PID/CID registers and then determine which SVD file to load to create all the definitions.
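A rough sketch of what such a register reference could look like on the Python side. Everything here is hypothetical: the 3-byte little-endian payload layout, the function names, and the base addresses, which are the usual Cortex-M private peripheral bus locations and would need to be checked against the target’s reference manual. The ID register offsets are the standard CoreSight PIDR/CIDR locations at the top of each 4 KB block.

```python
import struct

# Peripheral IDs as proposed above; base addresses are assumptions.
PERIPHERAL_BASE = {
    0: 0xE0001000,  # DWT
    1: 0xE0041000,  # ETM
    2: 0xE0040000,  # TPIU
    3: 0xE0000000,  # ITM
}

# CoreSight identification registers, as offsets within a peripheral.
ID_REGISTER_OFFSETS = {
    "PIDR4": 0xFD0, "PIDR5": 0xFD4, "PIDR6": 0xFD8, "PIDR7": 0xFDC,
    "PIDR0": 0xFE0, "PIDR1": 0xFE4, "PIDR2": 0xFE8, "PIDR3": 0xFEC,
    "CIDR0": 0xFF0, "CIDR1": 0xFF4, "CIDR2": 0xFF8, "CIDR3": 0xFFC,
}


def pack_reg_ref(peripheral: int, offset: int) -> bytes:
    """Encode (uint8 peripheral ID, uint16 offset) as a 3-byte payload."""
    if peripheral not in PERIPHERAL_BASE:
        raise ValueError(f"unknown peripheral ID {peripheral}")
    if not 0 <= offset <= 0xFFFF or offset % 4:
        raise ValueError("offset must be a word-aligned 16-bit value")
    return struct.pack("<BH", peripheral, offset)


def absolute_address(payload: bytes) -> int:
    """Resolve a 3-byte register reference to an absolute address."""
    peripheral, offset = struct.unpack("<BH", payload)
    return PERIPHERAL_BASE[peripheral] + offset


ref = pack_reg_ref(1, ID_REGISTER_OFFSETS["PIDR0"])
print(ref.hex(), hex(absolute_address(ref)))  # prints 01e00f 0xe0041fe0
```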

Unless you have a better idea. But without changes, at least the TraceWhisperer.set_isync_matches() function would be broken on more modern Arm chips using ETMv4.

That seems a good solution and I wish I had done it that way originally!

But I am reluctant to have changes that aren’t backwards-compatible, because I would prefer to avoid recompiling and updating the sca205 notebook series.

Maybe the best way, as ugly as it is, would be to keep the current commands, and add two new commands that are for writing/reading arbitrary memory locations?
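As a rough illustration of that alternative, the payloads for two such commands could be as simple as a 32-bit address, plus a 32-bit value for writes. The layout and the function names below are hypothetical; no such commands exist in the current firmware, and the existing getreg()/setreg() commands would stay untouched.

```python
import struct


def _check_word(name: str, value: int, aligned: bool = False) -> None:
    """Validate that value fits in 32 bits (and is word aligned if asked)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"{name} must fit in 32 bits")
    if aligned and value % 4:
        raise ValueError(f"{name} must be word aligned")


def pack_mem_read(address: int) -> bytes:
    """Payload for a hypothetical 'read word' command: 4-byte address."""
    _check_word("address", address, aligned=True)
    return struct.pack("<I", address)


def pack_mem_write(address: int, value: int) -> bytes:
    """Payload for a hypothetical 'write word' command: address, value."""
    _check_word("address", address, aligned=True)
    _check_word("value", value)
    return struct.pack("<II", address, value)


print(pack_mem_write(0xE0041FE0, 1).hex(" "))
# prints e0 1f 04 e0 01 00 00 00
```

Compared with a peripheral ID plus offset, this keeps the firmware side trivial at the cost of moving all address knowledge into the Python code.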

That deals with the firmware side; the other half of the problem is how to support this on the Python side in TraceWhisperer.py. I don’t have an answer here either, and Arm trace is complex enough that we have no interest in supporting all past/present/future versions of ETM and other debug components. However, if there is some not-too-painful way to make it easier for users to adapt TraceWhisperer to different ETM implementations, that would be worth considering. I don’t know if that’s possible.

I need to think about it some more…