Sorting and doing operations on very large trace datasets


I’m starting to run into memory issues when collecting large numbers of traces and doing subsequent operations on them (a few hundred thousand traces). For example, in the archived “PA_Profiling_1-Template_Attacks_HW_Assumption” tutorial, the sorting algorithm is written as

# Make 9 blank lists - one for each Hamming weight
tempTracesHW = [[] for _ in range(9)]

# Fill them up
for i in range(len(project_template.traces)):
    HW = tempHW[i]
    tempTracesHW[HW].append(project_template.traces[i].wave)

# Switch to numpy arrays
tempTracesHW = [np.array(tempTracesHW[HW]) for HW in range(9)]

Now this works fine for a few thousand traces when doing the attack on the HW assumption, but I get memory allocation issues when collecting larger numbers of traces for template attacks modelling each AES sub-byte directly (I’m currently trying 300 000 traces). So this is just one example of when I need to sort many traces, but I guess it’s a general issue. How should I go about sorting very large datasets like this? Also, up until now, I’ve been saving and loading the traces as single variables, and doing operations on them as single variables. This doesn’t seem very sustainable in the long run with very large trace datasets. Any suggestions on how to take this a step further? (I’m not a computer science guy, and I’ve never had to deal with huge datasets before.)


Yeah, that code definitely isn’t optimized. I’d recommend preallocating the trace memory and writing your traces into that memory instead of storing things in chipwhisperer projects (np.zeros() is the function you want). I think if you put everything in numpy arrays, you can then do something like:

temp_traces_HW = []
for hw in range(9):
    temp_traces_HW.append(trace_array[temp_hw == hw])

I haven’t tested that though, so you might need to do some modifications to it.
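To make the preallocation idea concrete, here’s a rough sketch of what I mean (untested; `num_traces`, `num_samples`, and the capture lines are placeholders you’d replace with your own capture code):

```python
import numpy as np

num_traces = 1000    # placeholder: number of traces you plan to capture
num_samples = 5000   # placeholder: samples per trace

# Preallocate once up front, then write each captured trace into its row
trace_array = np.zeros((num_traces, num_samples), dtype=np.float64)
temp_hw = np.zeros(num_traces, dtype=np.int64)

for i in range(num_traces):
    # placeholders standing in for your capture + Hamming-weight label code
    trace_array[i] = np.random.rand(num_samples)
    temp_hw[i] = np.random.randint(0, 9)

# Boolean-mask indexing then groups the traces per Hamming weight
# without building up Python lists of lists
temp_traces_HW = [trace_array[temp_hw == hw] for hw in range(9)]
```

The key point is that the big array is allocated exactly once, so you avoid the repeated reallocation and copying that growing Python lists cause.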


Hi Alex,

Thank you! It seems to be working, at least I don’t get any memory allocation problems anymore. Just needed a slight modification to your code since I’m working with numpy arrays, so the code looks like this:

    labels = template_keys[:,subkey_to_find]
    template_sorted_traces = []
    # Sort traces according to label
    for i in range(n_classes):
        template_sorted_traces = np.append(template_sorted_traces, template_traces[labels == i])

Here labels is just the column of AES sub-keys for all the plaintexts used during the profiling stage, at the sub-key position of interest, and n_classes is in this case 256 (doing template attacks directly on the sub-keys, as recommended in the CW tutorial, in order to recover the correct sub-key with just one attack trace). After this I can of course reshape the template_sorted_traces array, but that brings me to another Python-specific question: since later in the code I’m doing operations on the groups of traces associated with specific sub-key values, how would I do this elegantly? (e.g. taking the trace sample averages for each sub-key value, or other group operations of interest). I could of course create a 3D array as before, but is there a better way? (The labels array of sub-key values would of course also be sorted to align with the sorted traces.)
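For concreteness, what I mean by group operations is something like this per-class mean (just a sketch with random stand-in data; `template_traces` and `labels` would be my real profiling arrays):

```python
import numpy as np

# Stand-in data: in practice these come from the profiling capture
rng = np.random.default_rng(0)
n_traces, n_samples, n_classes = 1024, 50, 256
template_traces = rng.standard_normal((n_traces, n_samples))
labels = rng.integers(0, n_classes, n_traces)

# One boolean mask per sub-key value; no need to physically sort the traces
mean_traces = np.zeros((n_classes, n_samples))
for k in range(n_classes):
    mask = labels == k
    if mask.any():  # a class can be empty when traces are scarce
        mean_traces[k] = template_traces[mask].mean(axis=0)
```

The nice part is that the same `labels == k` mask works for any group operation (variances, covariance matrices, etc.), so the traces never need to be reordered or reshaped into a 3D array at all.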

An additional way to reduce the memory usage could be to scale the original float64 trace values and convert them to int16 (since the CW-Lite ADC is only 10 bits). Furthermore, I’m selecting the points of interest based on principal component analysis, so I’m wondering if anyone has had instability or rounding issues when doing the associated singular value decomposition with int16 instead of float64 representations.

Also, I’m not using the CW project structure for this; I’m storing the collected traces, plaintexts, keys, and ciphertexts in numpy arrays and saving these in an h5 file format. In that connection, I’m wondering whether that’s a good idea for very big data files, because it means I’ll be loading a huge variable into Python for processing. Does anyone have any experience with using e.g. SQLite or similar and processing the data trace by trace, for example? (Or would this be very slow?) I have no experience with this kind of processing, so I’m all ears for ideas on how to treat potentially huge datasets efficiently down the road.
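One low-effort option I’ve been looking at, for those curious, is memory-mapping a .npy file so only the slices you index are actually read from disk (h5py datasets support similar partial reads on h5 files). A rough sketch with placeholder data and a temporary file path:

```python
import os
import tempfile
import numpy as np

# Placeholder dataset written once; in practice this is the capture output
path = os.path.join(tempfile.mkdtemp(), "traces.npy")
data = np.random.rand(1000, 64)
np.save(path, data)

# mmap_mode="r" opens the file without loading it into RAM
traces = np.load(path, mmap_mode="r")

# Process in chunks: each slice reads only that part of the file
chunk = 256
sums = np.zeros(traces.shape[1])
for start in range(0, traces.shape[0], chunk):
    block = np.asarray(traces[start:start + chunk])
    sums += block.sum(axis=0)
mean_trace = sums / traces.shape[0]
```

Peak memory is then set by the chunk size rather than the dataset size, which seems like it would scale to hundreds of thousands of traces.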