x86/csum: Rewrite/optimize csum_partial()
author Eric Dumazet <edumazet@google.com>
Fri, 12 Nov 2021 16:19:50 +0000 (08:19 -0800)
committer Borislav Petkov <bp@suse.de>
Wed, 8 Dec 2021 10:26:09 +0000 (11:26 +0100)
commit 3411506550b1f714a52b5db087666c08658d2698
tree e96cd40dcbad1e8346aa181e51537bcab2ff2a41
parent 0fcfb00b28c0b7884635dacf38e46d60bf3d4eb1

With more NICs supporting CHECKSUM_COMPLETE and IPv6 being widely
used, csum_partial() is frequently called on small buffers and
consumes many cycles.

IPv6 header size, for instance, is 40 bytes.

Another thing to consider is that NET_IP_ALIGN is 0 on x86, meaning
that network headers are not word-aligned, unless the driver forces
this.

This means that csum_partial() fetches one u16 to 'align the buffer',
then performs three u64 additions with carry in a loop, then a
remaining u32, then a remaining u16.

With this new version, it loops only over the 64-byte blocks, then
bisects the remainder.
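
The loop-then-bisect structure described above could be sketched in
portable C roughly as follows. This is only an illustration, not the
code from this patch: the names csum_partial_sketch(), add_words(),
add64_with_carry(), load64() and fold_to_16() are hypothetical, plain C
stands in for the add-with-carry instructions, and a zero-padded
memcpy() handles the last 1 to 7 bytes where the patch relies on
load_unaligned_zeropad(). Odd buffer addresses get no special handling
here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Ones' complement addition: add b to a and wrap the carry back in. */
static uint64_t add64_with_carry(uint64_t a, uint64_t b)
{
	a += b;
	return a + (a < b);
}

/* Alignment-agnostic 64-bit load (hypothetical helper). */
static uint64_t load64(const unsigned char *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Accumulate nwords consecutive 64-bit words starting at p. */
static uint64_t add_words(uint64_t sum, const unsigned char *p, int nwords)
{
	for (int i = 0; i < nwords; i++)
		sum = add64_with_carry(sum, load64(p + 8 * i));
	return sum;
}

/* Sketch only: a loop for 64-byte blocks, then the remainder is bisected. */
uint32_t csum_partial_sketch(const void *buff, size_t len, uint32_t seed)
{
	const unsigned char *p = buff;
	uint64_t sum = seed;

	while (len >= 64) {
		sum = add_words(sum, p, 8);
		p += 64;
		len -= 64;
	}
	/* At most 63 bytes remain: test one bit of len per step. */
	if (len & 32) {
		sum = add_words(sum, p, 4);
		p += 32;
	}
	if (len & 16) {
		sum = add_words(sum, p, 2);
		p += 16;
	}
	if (len & 8) {
		sum = add_words(sum, p, 1);
		p += 8;
	}
	if (len & 7) {
		uint64_t tail = 0;

		/* Copy the last 1..7 bytes into a zero-padded word. */
		memcpy(&tail, p, len & 7);
		sum = add64_with_carry(sum, tail);
	}
	/* Fold the 64-bit accumulator down to 32 bits. */
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffffffu) + (sum >> 32);
	return (uint32_t)sum;
}

/* Fold a 32-bit partial sum to 16 bits (not inverted). */
uint16_t fold_to_16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

For a 40-byte IPv6 header, this sketch skips the 64-byte loop and takes
only the 32-byte and 8-byte steps.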

Testing on various CPUs shows a big reduction in csum_partial() cost
on all of them (by 50 to 80%).

Before:
4.16%  [kernel]       [k] csum_partial
After:
0.83%  [kernel]       [k] csum_partial

If run in a loop 1,000,000 times:

Before:
26,922,913      cycles                    # 3846130.429 GHz
80,302,961      instructions              #    2.98  insn per cycle
21,059,816      branches                  # 3008545142.857 M/sec
     2,896      branch-misses             #    0.01% of all branches
After:
17,960,709      cycles                    # 3592141.800 GHz
41,292,805      instructions              #    2.30  insn per cycle
11,058,119      branches                  # 2211623800.000 M/sec
     2,997      branch-misses             #    0.03% of all branches
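
Divided by the 1,000,000 iterations, the totals above work out to
roughly 27 cycles per iteration before and 18 after, assuming the
counters cover only that loop. A small sketch of the arithmetic, with
hypothetical helper names:

```c
/* Fractional drop from before to after, e.g. 0.25 for a 25% reduction. */
double reduction(double before, double after)
{
	return (before - after) / before;
}

/* Average cost of one iteration given a total over `iterations` runs. */
double per_iteration(double total, double iterations)
{
	return total / iterations;
}
```

For example, reduction(26922913, 17960709) gives the relative drop in
cycles, and reduction(80302961, 41292805) the drop in instructions.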

 [ bp: Massage, merge in subsequent fixes into a single patch:
   - um compilation error due to missing load_unaligned_zeropad():
     - Reported-by: kernel test robot <lkp@intel.com>
     - Link: https://lkml.kernel.org/r/20211118175239.1525650-1-eric.dumazet@gmail.com
   - Fix initial seed for odd buffers
     - Reported-by: Noah Goldstein <goldstein.w.n@gmail.com>
     - Link: https://lkml.kernel.org/r/20211125141817.3541501-1-eric.dumazet@gmail.com
 ]

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20211112161950.528886-1-eric.dumazet@gmail.com
arch/x86/lib/csum-partial_64.c