When we are performing unaligned stack accesses in the 32-64B window
we have to do a read-modify-write cycle. E.g. for reading 8 bytes
from address 17:
The load on line 4 is unnecessary, because tmp already contains data
from stack[20].
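
The redundancy described above can be illustrated with a small host-side
C model. This is only a sketch, not the JIT code: it assumes a
little-endian stack made of 4-byte words, and the names stack_words,
load_word, read32, read64 and load_count are hypothetical helpers
introduced for illustration. Each 32-bit half of the result is built
from two aligned word loads, and the cached variant skips a load when
tmp already holds the requested word.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16                      /* hypothetical 64-byte stack */

static uint32_t stack_words[STACK_WORDS];   /* stack as aligned 4-byte words */
static uint32_t tmp;                        /* models the temporary register */
static int cached_idx = -1;                 /* index of the word held in tmp */
static int load_count;                      /* number of word loads performed */

/* Load one aligned word into tmp; with use_cache set, skip the load
 * when tmp already holds that word. */
static uint32_t load_word(unsigned int idx, int use_cache)
{
	if (use_cache && cached_idx == (int)idx)
		return tmp;
	tmp = stack_words[idx];
	cached_idx = (int)idx;
	load_count++;
	return tmp;
}

/* Read 4 bytes starting at byte address addr, which may be unaligned. */
static uint32_t read32(unsigned int addr, int use_cache)
{
	unsigned int shift = (addr % 4) * 8;
	uint32_t val;

	assert(addr / 4 + 1 < STACK_WORDS);
	val = load_word(addr / 4, use_cache) >> shift;
	if (shift)  /* unaligned: merge in the low bytes of the next word */
		val |= load_word(addr / 4 + 1, use_cache) << (32 - shift);
	return val;
}

/* Read 8 bytes as two 32-bit halves (gprLo, gprHi), counting the loads. */
static uint64_t read64(unsigned int addr, int use_cache)
{
	uint64_t lo, hi;

	load_count = 0;
	cached_idx = -1;
	lo = read32(addr, use_cache);
	hi = read32(addr + 4, use_cache);
	return lo | (hi << 32);
}
```

In this model read64(17, 0) performs four word loads (offsets 16, 20, 20
and 24), while read64(17, 1) performs three and returns the same value.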
For writes we can optimize away both the loads and the writebacks.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>