Hi Yuval,
Hamming weight is used instead of Hamming distance because microcontrollers typically reset the bus lines to a middle value between data updates to lower average power consumption. That means you're effectively finding the Hamming distance between the SBox output and 0, which is just the Hamming weight.
The full Hamming distance model is a lot more useful when attacking hardware AES, where this isn't the case. Using it makes the attack harder, though, since you need to consider what the data lines were before the operation as well as after it.
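To make the difference concrete, here is a minimal Python sketch of the two leakage models. The function names are made up for illustration, and the "bus resets to 0" behaviour is a modelling assumption rather than something every device does:

```python
def hamming_weight(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return hamming_weight(a ^ b)


def hw_leakage(sbox_out: int) -> int:
    # Assumption: the bus starts from a fixed state modelled as 0,
    # so only the new value matters.
    return hamming_distance(0x00, sbox_out)


def hd_leakage(prev_state: int, new_state: int) -> int:
    # Full model: you need the value before the operation as well as after.
    return hamming_distance(prev_state, new_state)


# With a reset bus, the distance model collapses to the weight model.
assert all(hw_leakage(v) == hamming_weight(v) for v in range(256))
```

For example, `hw_leakage(0x63)` only needs the SBox output, while `hd_leakage(0x63, 0x7C)` also needs whatever was on the lines beforehand, which is the extra unknown in the hardware case.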
Alex