
// Align DST to 16 byte alignment so that we don't cross cache line
// boundaries on both loads and stores. There are at least 96 bytes
- // to copy, so copy 16 bytes unaligned and then align. The loop
+ // to copy, so copy 16 bytes unaligned and then align. The loop
// copies 64 bytes per iteration and prefetches one iteration ahead.

.p2align 4
subs count, count, 64
b.hi 1b

- // Write the last full set of 64 bytes. The remainder is at most 64
+ // Write the last full set of 64 bytes. The remainder is at most 64
// bytes, so it is safe to always copy 64 bytes from the end even if
// there is just 1 byte left.
2: