I revisited a CPA I performed on the ATSAMD21 and can’t figure out something fundamental, so I was hoping for some guidance. The target performs AES-128 decryption.

Basically, when I performed the CPA on the InvSubBytesAndXOR step I was getting very low PGE on the actual target. However, when I performed it on the InvMixColumns step, the PGE was very high and I was able to find the solution without knowing the key in advance.

What I can’t figure out is why this would be. In inverse AES, by the time InvMixColumns runs, the block has been XOR’d with the round 9 key, not the round 10 key. I confirmed on my target that I’m correctly identifying the various inverse AES steps in the power trace. I also found code that I feel is likely the source they used.

All I can think of is that some sort of cache write-back is causing this, but that still doesn’t make sense, as that memory block is overwritten by the InvMixColumns step.

Any thoughts? Here is the code. I marked with “<--” where my CPAs took place. Take a look at the comments in InvMixColumns for where the CPA worked.

Thank you in advance.

```
void InvCipher( unsigned char * block, unsigned char * expandedKey )
{
    unsigned char round = ROUNDS-1;
    int count;

    expandedKey += BLOCKSIZE * ROUNDS;
    XORBytes( block, expandedKey, 16 );
    expandedKey -= BLOCKSIZE;

    do {
        InvShiftRows( block );
        InvSubBytesAndXOR( block, expandedKey, 16 ); // <-- Low PGE
        expandedKey -= BLOCKSIZE;
        InvMixColumns( block ); // <-- Very High PGE, see InvMixColumns for detail
    } while( --round );

    InvShiftRows( block );
    InvSubBytesAndXOR( block, expandedKey, 16 );
}
```
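To make the round key bookkeeping explicit, here is a minimal sketch that mirrors the pointer arithmetic in `InvCipher` and records which round key each key addition uses. It assumes `ROUNDS` is 10 and `BLOCKSIZE` is 16 (AES-128) and that round key 0 sits at offset 0 of the expanded key. `trace_round_keys` is a hypothetical helper, not part of the original source.

```c
#define ROUNDS    10   /* assumed: AES-128 */
#define BLOCKSIZE 16   /* assumed: 16-byte state */

/*
 * Mirror the expandedKey pointer arithmetic of InvCipher.
 * used[0] is the round key index for the initial XORBytes,
 * used[1..ROUNDS] are the indices for the InvSubBytesAndXOR calls.
 */
void trace_round_keys(int used[ROUNDS + 1])
{
    int offset = BLOCKSIZE * ROUNDS;      /* expandedKey += BLOCKSIZE * ROUNDS */
    int n = 0;
    unsigned char round = ROUNDS - 1;

    used[n++] = offset / BLOCKSIZE;       /* XORBytes */
    offset -= BLOCKSIZE;

    do {
        used[n++] = offset / BLOCKSIZE;   /* InvSubBytesAndXOR */
        offset -= BLOCKSIZE;
        /* InvMixColumns would run here */
    } while (--round);

    used[n] = offset / BLOCKSIZE;         /* final InvSubBytesAndXOR */
}
```

Under those assumptions the initial `XORBytes` uses round key 10, and the first `InvSubBytesAndXOR` (the one before the first `InvMixColumns`) uses round key 9.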

The combined InvSubBytes and Round key step

```
void InvSubBytesAndXOR( unsigned char * bytes, unsigned char * key, unsigned char count )
{
    do {
        // *bytes = sBoxInv[ *bytes ] ^ *key; // Inverse substitute every byte in state and add key.
        *bytes = block2[ *bytes ] ^ *key; // Use block2 directly. Increases speed.
        bytes++;
        key++;
    } while( --count );
}
```
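For the first call, the statement in that loop produces two distinct intermediate values per byte: the table output, which depends only on a ciphertext byte and a round 10 key byte, and the value written back to the state, which is additionally XOR'd with a round 9 key byte. Below is a minimal sketch of both, with a Hamming weight helper of the kind typically used as a CPA leakage model. All function names are hypothetical, and the inverse S-box is generated at run time only to keep the example short. The sketch does not settle which of the two values actually leaks on the target.

```c
#include <stdint.h>

/* Rotate an 8-bit value left by n bits (0 < n < 8). */
uint8_t rotl8(uint8_t x, unsigned n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* Build the AES inverse S-box from the forward S-box definition. */
void build_inv_sbox(uint8_t inv_sbox[256])
{
    uint8_t sbox[256];
    uint8_t p = 1, q = 1;

    do {
        /* p steps through all non-zero field elements (multiply by 3). */
        p = (uint8_t)(p ^ (p << 1) ^ ((p & 0x80) ? 0x1B : 0));
        /* q tracks the multiplicative inverse of p (divide by 3). */
        q ^= (uint8_t)(q << 1);
        q ^= (uint8_t)(q << 2);
        q ^= (uint8_t)(q << 4);
        q ^= (q & 0x80) ? 0x09 : 0;
        /* The affine transform of the inverse gives the forward S-box entry. */
        sbox[p] = (uint8_t)(q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3)
                              ^ rotl8(q, 4) ^ 0x63);
    } while (p != 1);
    sbox[0] = 0x63;

    for (int i = 0; i < 256; i++)
        inv_sbox[sbox[i]] = (uint8_t)i;
}

/* Number of set bits in one byte. */
unsigned hamming_weight(uint8_t v)
{
    unsigned n = 0;
    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/* Table output: depends only on the ciphertext byte and a round 10 key guess. */
uint8_t lookup_output(const uint8_t inv_sbox[256], uint8_t ct, uint8_t k10_guess)
{
    return inv_sbox[ct ^ k10_guess];
}

/* Value stored back into the state: also XOR'd with a round 9 key byte. */
uint8_t stored_output(const uint8_t inv_sbox[256], uint8_t ct,
                      uint8_t k10_guess, uint8_t k9_byte)
{
    return (uint8_t)(inv_sbox[ct ^ k10_guess] ^ k9_byte);
}
```

A CPA hypothesis for one key guess would then be `hamming_weight(lookup_output(inv, ct, guess))` for each trace, where `ct` is the ciphertext byte that InvShiftRows moves into the position under attack.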

InvMixColumns

```
void InvMixColumns( unsigned char * state )
{
    // <--- Align here and I get Key Round 10 bytes 0, 13, 10, 7 (InvShiftRows order)
    InvMixColumn( state + 0*4 );
    // <--- Align here and I get Key Round 10 bytes 4, 1, 14, 11
    InvMixColumn( state + 1*4 );
    // <--- Align here and I get Key Round 10 bytes 8, 5, 2, 15
    InvMixColumn( state + 2*4 );
    // <--- Align here and I get Key Round 10 bytes 12, 9, 6, 3
    InvMixColumn( state + 3*4 );
}
void InvMixColumn( unsigned char * column )
{
    unsigned char r0, r1, r2, r3;

    // Terms with coefficient bit x1.
    r0 = column[1] ^ column[2] ^ column[3];
    r1 = column[0] ^ column[2] ^ column[3];
    r2 = column[0] ^ column[1] ^ column[3];
    r3 = column[0] ^ column[1] ^ column[2];

    // Multiply each byte by 2 in GF(2^8).
    column[0] = (column[0] << 1) ^ (column[0] & 0x80 ? BPOLY : 0);
    column[1] = (column[1] << 1) ^ (column[1] & 0x80 ? BPOLY : 0);
    column[2] = (column[2] << 1) ^ (column[2] & 0x80 ? BPOLY : 0);
    column[3] = (column[3] << 1) ^ (column[3] & 0x80 ? BPOLY : 0);

    // Terms with coefficient bit x2.
    r0 ^= column[0] ^ column[1];
    r1 ^= column[1] ^ column[2];
    r2 ^= column[2] ^ column[3];
    r3 ^= column[0] ^ column[3];

    // Multiply by 2 again (now 4 times the original bytes).
    column[0] = (column[0] << 1) ^ (column[0] & 0x80 ? BPOLY : 0);
    column[1] = (column[1] << 1) ^ (column[1] & 0x80 ? BPOLY : 0);
    column[2] = (column[2] << 1) ^ (column[2] & 0x80 ? BPOLY : 0);
    column[3] = (column[3] << 1) ^ (column[3] & 0x80 ? BPOLY : 0);

    // Terms with coefficient bit x4.
    r0 ^= column[0] ^ column[2];
    r1 ^= column[1] ^ column[3];
    r2 ^= column[0] ^ column[2];
    r3 ^= column[1] ^ column[3];

    // Multiply by 2 once more (now 8 times the original bytes).
    column[0] = (column[0] << 1) ^ (column[0] & 0x80 ? BPOLY : 0);
    column[1] = (column[1] << 1) ^ (column[1] & 0x80 ? BPOLY : 0);
    column[2] = (column[2] << 1) ^ (column[2] & 0x80 ? BPOLY : 0);
    column[3] = (column[3] << 1) ^ (column[3] & 0x80 ? BPOLY : 0);

    // Terms with coefficient bit x8: every byte contributes to every output.
    column[0] ^= column[1] ^ column[2] ^ column[3];
    r0 ^= column[0];
    r1 ^= column[0];
    r2 ^= column[0];
    r3 ^= column[0];

    // Write the result back to the state column.
    column[0] = r0;
    column[1] = r1;
    column[2] = r2;
    column[3] = r3;
}
```
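The byte groups in the InvMixColumns comments (0, 13, 10, 7 and so on) match what InvShiftRows would feed into each column. Here is a minimal sketch of that index mapping, assuming the usual column-major state layout (index = 4*col + row) and a right rotation of row r by r positions. The `InvShiftRows` source is not shown above, so that layout is an assumption, and both function names are hypothetical.

```c
#include <stdint.h>

/*
 * After InvShiftRows, the byte at (row, col) came from column
 * (col - row) mod 4 of the same row. Returns its original state index.
 */
uint8_t inv_shift_rows_source(unsigned row, unsigned col)
{
    unsigned src_col = (col + 4u - row) % 4u;
    return (uint8_t)(4u * src_col + row);
}

/* Fill out[0..3] with the pre-InvShiftRows byte indices feeding one column. */
void column_sources(unsigned col, uint8_t out[4])
{
    for (unsigned row = 0; row < 4; row++)
        out[row] = inv_shift_rows_source(row, col);
}
```

For columns 0 through 3 this gives 0, 13, 10, 7, then 4, 1, 14, 11, then 8, 5, 2, 15, then 12, 9, 6, 3, which are the same groups as in the comments.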