How to choose datasets for leakage assessment?

zyj123 · December 30, 2023, 9:17pm

I’ve gained insights from the dataset in “Test Vector Leakage Assessment (TVLA) Derived Test Requirements (DTR) with AES.” However, in other papers, such as those discussing chi-square tests and DL-LA, there’s mention of a “fixed vs. fixed” dataset type. Does this refer to fixed plaintext or fixed key? I’m having some difficulty understanding and don’t know which type of dataset to choose.
Can anyone explain this to me? Thanks a lot!

jpthibault · January 2, 2024, 2:30pm

I don’t know which paper you’re referring to; all the ones I’ve come across, if they are proper academic papers, will properly define what is fixed in the TVLA test.

TurangaLeela · January 2, 2024, 5:49pm

My understanding of fixed-vs-fixed is that both key and plaintext are fixed within each set, but the key and plaintext differ between the two sets. Some papers suggest this requires fewer traces to detect leakage than fixed-vs-random, unless you are unlucky enough to pick inputs for the two sets that lead to very similar leakage behaviour.

zyj123 · January 3, 2024, 6:49am

Thank you for your reply. The paper is “DL-LA: Deep Learning Leakage Assessment: A modern roadmap for SCA evaluations”.

zyj123 · January 3, 2024, 7:04am

Thank you, but I still don’t understand. Is there any paper that clearly defines this kind of dataset? For example, for AES-128, the first fixed key is 0x0000 0000 0000 0000 0000 0000 0000 0000, the second fixed key is 0xFFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF, and the plaintext corresponding to the two keys is randomly generated but the same. Can this example be called a fixed-vs-fixed dataset?

TurangaLeela · January 6, 2024, 9:51am

Yes, I consider this to be an example of a fixed-vs-fixed dataset.
As for a clear definition of datasets: there is no instruction on exactly how to set the input data so that it works ‘optimally’ in all cases (that is, minimizes the number of traces needed to detect leakage for different implementations or across different platforms). You just have two datasets that somehow differ, and you can use different approaches to construct them: fixed-vs-fixed, fixed-vs-random, semifixed-vs-random… Then you decide which data you fix or randomize (plaintext, key), and if you fix some data, to what value. There is no general answer here. You may have a look at this doc, which gives some examples: https://www.rambus.com/wp-content/uploads/2015/08/TVLA-DTR-with-AES.pdf
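
To make the distinction concrete, here is a minimal Python sketch of how the inputs for two of the constructions mentioned above could be generated for AES-128. It is only an illustration under stated assumptions, not a procedure taken from any of the papers: the names `make_inputs` and `random_block` are hypothetical, the fixed values are drawn from a seeded generator rather than prescribed, the two sets are interleaved at random (a common precaution that the thread itself does not discuss), and the fixed-vs-random branch keeps the same key for both sets and randomizes only the plaintext, which is one possible choice among several.

```python
import random

BLOCK_BYTES = 16  # AES-128 uses 16-byte keys and 16-byte blocks
MODES = ("fixed-vs-fixed", "fixed-vs-random")


def random_block(rng):
    """Return 16 pseudo-random bytes from the given generator."""
    return bytes(rng.getrandbits(8) for _ in range(BLOCK_BYTES))


def make_inputs(n_traces, mode, seed=0):
    """Return a list of (group, key, plaintext) tuples, one per trace.

    group 0 always uses one fixed (key, plaintext) pair.
    group 1 uses a second, different fixed pair in "fixed-vs-fixed",
    or the same key with a fresh random plaintext in "fixed-vs-random".
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    rng = random.Random(seed)  # seeded so the input list is reproducible
    fixed_a = (random_block(rng), random_block(rng))  # (key, plaintext), set 0
    fixed_b = (random_block(rng), random_block(rng))  # (key, plaintext), set 1

    inputs = []
    for _ in range(n_traces):
        group = rng.getrandbits(1)  # assumption: interleave the sets at random
        if group == 0:
            key, plaintext = fixed_a
        elif mode == "fixed-vs-fixed":
            key, plaintext = fixed_b
        else:  # fixed-vs-random: same key, fresh plaintext for every trace
            key, plaintext = fixed_a[0], random_block(rng)
        inputs.append((group, key, plaintext))
    return inputs
```

Each tuple would then be sent to the device under test, with the recorded trace labelled by `group` so that the two sets can be compared afterwards. Replacing `fixed_a` and `fixed_b` with hand-picked values, such as the all-zero and all-0xFF keys from the example above, stays within the fixed-vs-fixed scheme.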

zyj123 · January 7, 2024, 12:07pm

I get it. Thank you very much!