Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...

diff --git a/Documentation/unaligned-memory-access.txt b/Documentation/unaligned-memory-access.txt
index 6223eace3c09f5492901dbc0d898981ae88a6021..a445da098bc6e5aa733cd55ca2ee8b4a5f04dc2c 100644
--- a/Documentation/unaligned-memory-access.txt
+++ b/Documentation/unaligned-memory-access.txt
@@ -57,7 +57,7 @@ here; a summary of the common scenarios is presented below:
     unaligned access to be corrected.
   - Some architectures are not capable of unaligned memory access, but will
     silently perform a different memory access to the one that was requested,
-   resulting a a subtle code bug that is hard to detect!
+   resulting in a subtle code bug that is hard to detect!
  
  It should be obvious from the above that if your code causes unaligned
  memory accesses to happen, your code will not work correctly on certain
@@ -137,24 +137,34 @@ Code that causes unaligned access
  =================================
  
  With the above in mind, let's move onto a real life example of a function
-that can cause an unaligned memory access. The following function adapted
+that can cause an unaligned memory access. The following function taken
  from include/linux/etherdevice.h is an optimized routine to compare two
  ethernet MAC addresses for equality.
  
-unsigned int compare_ether_addr(const u8 *addr1, const u8 *addr2)
+bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
  {
-       const u16 *a = (const u16 *) addr1;
-       const u16 *b = (const u16 *) addr2;
+#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+       u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
+                  ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));
+
+       return fold == 0;
+#else
+       const u16 *a = (const u16 *)addr1;
+       const u16 *b = (const u16 *)addr2;
-       return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) != 0;
+       return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
+#endif
  }
  
-In the above function, the reference to a[0] causes 2 bytes (16 bits) to
-be read from memory starting at address addr1. Think about what would happen
-if addr1 was an odd address such as 0x10003. (Hint: it'd be an unaligned
-access.)
+In the above function, when the hardware has efficient unaligned access
+capability, there is no issue with this code.  But when the hardware isn't
+able to access memory on arbitrary boundaries, the reference to a[0] causes
+2 bytes (16 bits) to be read from memory starting at address addr1.
+
+Think about what would happen if addr1 was an odd address such as 0x10003.
+(Hint: it'd be an unaligned access.)
  
  Despite the potential unaligned access problems with the above function, it
-is included in the kernel anyway but is understood to only work on
+is included in the kernel anyway but is understood to only work normally on
  16-bit-aligned addresses. It is up to the caller to ensure this alignment or
  not use this function at all. This alignment-unsafe function is still useful
  as it is a decent optimization for the cases when you can ensure alignment,
@@ -209,7 +219,7 @@ memory and you wish to avoid unaligned access, its usage is as follows:
  
         u32 value = get_unaligned((u32 *) data);
  
-These macros work work for memory accesses of any length (not just 32 bits as
+These macros work for memory accesses of any length (not just 32 bits as
  in the examples above). Be aware that when compared to standard access of
  aligned memory, using these macros to access unaligned memory can be costly in
  terms of performance.
@@ -218,9 +228,35 @@ If use of such macros is not convenient, another option is to use memcpy(),
  where the source or destination (or both) are of type u8* or unsigned char*.
  Due to the byte-wise nature of this operation, unaligned accesses are avoided.
  
+
+Alignment vs. Networking
+========================
+
+On architectures that require aligned loads, networking requires that the IP
+header is aligned on a four-byte boundary to optimise the IP stack. For
+regular ethernet hardware, the constant NET_IP_ALIGN is used. On most
+architectures this constant has the value 2 because the normal ethernet
+header is 14 bytes long, so in order to get proper alignment one needs to
+DMA to an address which can be expressed as 4*n + 2. One notable exception
+here is powerpc which defines NET_IP_ALIGN to 0 because DMA to unaligned
+addresses can be very expensive and dwarf the cost of unaligned loads.
+
+For some ethernet hardware that cannot DMA to unaligned addresses like
+4*n+2 or non-ethernet hardware, this can be a problem, and it is then
+required to copy the incoming frame into an aligned buffer. Because this is
+unnecessary on architectures that can do unaligned accesses, the code can be
+made dependent on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS like so:
+
+#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+       skb = original skb
+#else
+       skb = copy skb
+#endif
+
  --
-Author: Daniel Drake <dsd@gentoo.org>
+Authors: Daniel Drake <dsd@gentoo.org>,
+         Johannes Berg <johannes@sipsolutions.net>
  With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt,
-Johannes Berg, Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock,
-Uli Kunitz, Vadim Lobanov
+Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz,
+Vadim Lobanov
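
Review note: the patched text names memcpy() as a way to avoid unaligned accesses and explains why a frame start of the form 4*n + 2 puts the IP header on a four-byte boundary. The following is a minimal userspace sketch of both points, not kernel code. The names load_u16(), ether_addr_equal_any_align() and ip_header_misalignment() are hypothetical and do not appear in the kernel; ETH_ALEN and ETH_HLEN are redefined locally with the usual values of 6 and 14.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6   /* bytes in an ethernet MAC address */
#define ETH_HLEN 14  /* bytes in a normal ethernet header */

/*
 * Hypothetical helper: read 16 bits from an arbitrary address.
 * memcpy() works byte-wise, so no unaligned 16-bit load is issued.
 */
static uint16_t load_u16(const uint8_t *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Hypothetical alignment-safe analogue of ether_addr_equal():
 * returns true when the two 6-byte addresses are identical,
 * whatever the alignment of addr1 and addr2.
 */
static bool ether_addr_equal_any_align(const uint8_t *addr1,
				       const uint8_t *addr2)
{
	unsigned int fold = 0;
	size_t i;

	/* XOR each 16-bit pair and OR the results together. */
	for (i = 0; i < ETH_ALEN; i += 2)
		fold |= (unsigned int)(load_u16(addr1 + i) ^
				       load_u16(addr2 + i));

	return fold == 0;
}

/*
 * Hypothetical helper: distance of the IP header from the previous
 * four-byte boundary when the ethernet frame starts at frame_start.
 * A result of 0 means the IP header is four-byte aligned.
 */
static unsigned int ip_header_misalignment(uintptr_t frame_start)
{
	return (unsigned int)((frame_start + ETH_HLEN) % 4);
}
```

With a frame start such as 0x1002 (4*n + 2) the IP header lands at 0x1010, which is four-byte aligned; with a frame start of 0x1000 it lands at 0x100e, two bytes past a boundary. The comparison helper trades some speed for safety, which matches the patched text's warning that alignment-safe access can be costly.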