git.proxmox.com Git - mirror_ubuntu-eoan-kernel.git/log

perf/x86/64: Simplify regs_user->abi setting code in get_regs_user()

user_64bit_mode(regs) basically checks regs->cs to point to a
64-bit segment. This check used to be unreliable here because
regs->cs was not always correct in syscalls.

Now regs->cs is always correct: in syscalls, in interrupts, in
exceptions. No need to emply heuristics here.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1428671219-29341-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86/64: Do report user_regs->cx while we are in syscall, in get_regs_user()

Yes, it is true that cx contains return address.
It's not clear why we trash it.
Stop doing that.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1428671219-29341-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86/64: Do not guess user_regs->cs, ss, sp in get_regs_user()

After recent changes to syscall entry points,
user_regs->{cs,ss,sp} are always correct. (They used to be
undefined while in syscalls).

We can report them reliably, without guessing.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1428671219-29341-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/32: Tidy up JNZ instructions after TESTs

After TESTs, use logically correct JNZ mnemonic instead of JNE.

This doesn't change code:

  md5:
   c3005b39a11fe582b7df7908561ad4ee  entry_32.o.before.asm
   c3005b39a11fe582b7df7908561ad4ee  entry_32.o.after.asm

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428689620-21881-1-git-send-email-dvlasenk@redhat.com
[ Added object file comparison. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Reduce padding in execve stubs

execve stubs are 7 bytes only. Padding them to 16 bytes is a
waste.

   text    data     bss     dec     hex filename
  12594       0       0   12594    3132 entry_64.o.before
  12530       0       0   12530    30f2 entry_64.o

Run-tested.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-8-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Remove GET_THREAD_INFO() in ret_from_fork

It used to be used to check for _TIF_IA32, but the check has
been removed.

Remove GET_THREAD_INFO() too.

Run-tested.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-7-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Simplify jumps in ret_from_fork

Replace
        test
        jz  1f
        jmp label
    1:

with
        test
        jnz label

Run-tested.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-6-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Remove a redundant jump

Jumping to the very next instruction is not very useful:

jmp label
label:

Removing the jump.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-5-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Optimize [v]fork/clone stubs

Replace "call func; ret" with "jmp func".

Run-tested.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-4-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry: Zero EXTRA_REGS for stub32_execve() too

The change which affected how execve clears EXTRA_REGS missed
32-bit execve syscalls.

Fix this by using 64-bit execve stub epilogue for them too.

Run-tested.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Move stub_x32_execvecloser() to stub_execveat()

This is a preparatory patch for moving stub32_execve[at]() to this
file. It makes sense to have all execve stubs in one place, so
that they can reuse code.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Use common code for rt_sigreturn() epilogue

Similarly to stub_execve, we can reuse the epilogue in
stub_rt_sigreturn() and stub_x32_rt_sigreturn().

Add a comment explaining why we can't eliminage SAVE_EXTRA_REGS
here.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Add forgotten CFI annotation

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428424967-14460-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layout

Interrupt entry points are handled with the following code,
each 32-byte code block contains seven entry points:

...
[push][jump 22] // 4 bytes
[push][jump 18] // 4 bytes
[push][jump 14] // 4 bytes
[push][jump 10] // 4 bytes
[push][jump  6] // 4 bytes
[push][jump  2] // 4 bytes
[push][jump common_interrupt][padding] // 8 bytes

[push][jump]
[push][jump]
[push][jump]
[push][jump]
[push][jump]
[push][jump]
[push][jump common_interrupt][padding]

[padding_2]
common_interrupt:

And there is a table which holds pointers to every entry point,
IOW: to every push.

In cold cache, two jumps are still costlier than one, even
though we get the benefit of them residing in the same
cacheline.

This change replaces short jumps with near ones to
'common_interrupt', and pads every push+jump pair to 8 bytes. This
way, each interrupt takes only one jump.

This change replaces ".p2align CONFIG_X86_L1_CACHE_SHIFT" before
dispatch table with ".align 8" - we do not need anything
stronger than that.

The table of entry addresses (the interrupt[] array) is no
longer necessary, the address of entries can be easily
calculated as (irq_entries_start + i*8).

   text    data     bss     dec     hex filename
  12546       0       0   12546    3102 entry_64.o.before
  11626       0       0   11626    2d6a entry_64.o

The size decrease is because 1656 bytes of .init.rodata are
gone. That's initdata, though. The resident size does go up a
bit.

Run-tested (32 and 64 bits).

Acked-and-Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428090553-7283-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Move opportunistic sysret code to syscall code path

This change does two things:

Copy-pastes "retint_swapgs:" code into syscall handling code,
the copy is under "syscall_return:" label. The code is unchanged
apart from some label renames.

Removes "opportunistic sysret" code from "retint_swapgs:" code
block, since now it won't be reached by syscall return. This in
fact removes most of the code in question.

   text    data     bss     dec     hex filename
  12530       0       0   12530    30f2 entry_64.o.before
  12562       0       0   12562    3112 entry_64.o

Run-tested.

Acked-and-Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427993219-7291-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'v4.0-rc7' into x86/asm, to resolve conflicts

Conflicts:
arch/x86/kernel/entry_64.S

Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86, selftests: Add sigreturn selftest

This is my sigreturn test, added mostly unchanged from its old
home. It exercises the sigreturn(2) syscall, specifically
focusing on its interactions with various IRET corner cases.

It tests for correct behavior in several areas that were
historically dangerously buggy. For example, it exercises espfix
on kernels of both bitnesses under various conditions, and it
contains testcases for several now-fixed bugs in IRET error
handling.

If you run it on older kernels without the fixes, your system will
crash. It probably won't eat your data in the process.

There is no released kernel on which the sigreturn_64 test will
pass, but it passes on tip:x86/asm.

I plan to switch to lib.mk for Linux 4.2.

I'm not using the ksft_ helpers at all yet. I can do that later.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuah.kh@samsung.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/89d10b76b92c7202d8123654dc8d36701c017b3d.1428386971.git.luto@kernel.org
[ Fixed empty format string GCC build warning in trivial_32bit_program.c ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) In TCP, don't register an FRTO for cumulatively ACK'd data that was
    previously SACK'd, from Neal Cardwell.

2) Need to hold RNL mutex in ipv4 multicast code namespace cleanup,
    from Cong WANG.

3) Similarly we have to hold RNL mutex for fib_rules_unregister(), also
    from Cong WANG.

4) Revert and rework netns nsid allocation fix, from Nicolas Dichtel.

5) When we encapsulate for a tunnel device, skb->sk still points to the
    user socket.  So this leads to cases where we retraverse the
    ipv4/ipv6 output path with skb->sk being of some other address
    family (f.e. AF_PACKET).  This can cause things to crash since the
    ipv4 output path is dereferencing an AF_PACKET socket as if it were
    an ipv4 one.

    The short term fix for 'net' and -stable is to elide these socket
    checks once we've entered an encapsulation sequence by testing
    xmit_recursion.

    Longer term we have a better solution wherein we pass the tunnel's
    socket down through the output paths, but that is way too invasive
    for 'net' and -stable.

    From Hannes Frederic Sowa.

6) l2tp_init() failure path forgets to unregister per-net ops, from
    Cong WANG.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net/mlx4_core: Fix error message deprecation for ConnectX-2 cards
  net: dsa: fix filling routing table from OF description
  l2tp: unregister l2tp_net_ops on failure path
  mvneta: dont call mvneta_adjust_link() manually
  ipv6: protect skb->sk accesses from recursive dereference inside the stack
  netns: don't allocate an id for dead netns
  Revert "netns: don't clear nsid too early on removal"
  ip6mr: call del_timer_sync() in ip6mr_free_table()
  net: move fib_rules_unregister() under rtnl lock
  ipv4: take rtnl_lock and mark mrt table as freed on namespace cleanup
  tcp: fix FRTO undo on cumulative ACK of SACKed range
  xen-netfront: transmit fully GSO-sized packets

net/mlx4_core: Fix error message deprecation for ConnectX-2 cards

Commit 1daa4303b4ca ("net/mlx4_core: Deprecate error message at
ConnectX-2 cards startup to debug") did the deprecation only for port 1
of the card. Need to deprecate for port 2 as well.

Fixes: 1daa4303b4ca ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: fix filling routing table from OF description

According to description in 'include/net/dsa.h', in cascade switches
configurations where there are more than one interconnected devices,
'rtable' array in 'dsa_chip_data' structure is used to indicate which
port on this switch should be used to send packets to that are destined
for corresponding switch.

However, dsa_of_setup_routing_table() fills 'rtable' with port numbers
of the _target_ switch, but not current one.

This commit removes redundant devicetree parsing and adds needed port
number as a function argument. So dsa_of_setup_routing_table() now just
looks for target switch number by parsing parent of 'link' device node.

To remove possible misunderstandings with the way of determining target
switch number, a corresponding comment was added to the source code and
to the DSA device tree bindings documentation file.

This was tested on a custom board with two Marvell 88E6095 switches with
following corresponding routing tables: { -1, 10 } and { 8, -1 }.

Signed-off-by: Pavel Nakonechny <pavel.nakonechny@skitlab.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
"Updates for the input subsystem - two more tweaks for ALPS driver to
  work out kinks after splitting the touchpad, trackstick, and potential
  external PS/2 mouse into separate input devices.

  Changes to support ALPS SS4 devices (protocol V8) will be coming in
  4.1..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: alps - document stick behavior for protocol V2
  Input: alps - report V2 Dualpoint Stick events via the right evdev node
  Input: alps - report interleaved bare PS/2 packets via dev3

l2tp: unregister l2tp_net_ops on failure path

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mvneta: dont call mvneta_adjust_link() manually

mvneta_adjust_link() is a callback for of_phy_connect() and should
not be called directly. The result of calling it directly is as below:

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: protect skb->sk accesses from recursive dereference inside the stack

We should not consult skb->sk for output decisions in xmit recursion
levels > 0 in the stack. Otherwise local socket settings could influence
the result of e.g. tunnel encapsulation process.

ipv6 does not conform with this in three places:

1) ip6_fragment: we do consult ipv6_npinfo for frag_size

2) sk_mc_loop in ipv6 uses skb->sk and checks if we should
loop the packet back to the local socket

3) ip6_skb_dst_mtu could query the settings from the user socket and
force a wrong MTU

Furthermore:
In sk_mc_loop we could potentially land in WARN_ON(1) if we use a
PF_PACKET socket ontop of an IPv6-backed vxlan device.

Reuse xmit_recursion as we are currently only interested in protecting
tunnel devices.

Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86/alternatives: Guard NOPs optimization

Take a look at the first instruction byte before optimizing the NOP -
there might be something else there already, like the ALTERNATIVE_2()
in rdtsc_barrier() which NOPs out on AMD even though we just
patched in an MFENCE.

This happens because the alternatives sees X86_FEATURE_MFENCE_RDTSC,
AMD CPUs set it, we patch in the MFENCE and right afterwards it sees
X86_FEATURE_LFENCE_RDTSC which AMD CPUs don't set and we blindly
optimize the NOP.

Checking whether at least the first byte is 0x90 prevents that.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1428181662-18020-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry: Clear EXTRA_REGS for all executable formats

On failure, sys_execve() does not clobber EXTRA_REGS, so we can
just return to userpsace without saving/restoring them.

On success, ELF_PLAT_INIT() in sys_execve() clears all these
registers.

On other executable formats:

  - binfmt_flat.c has similar FLAT_PLAT_INIT, but x86 (and everyone
    else except sh) doesn't define it.

  - binfmt_elf_fdpic.c has ELF_FDPIC_PLAT_INIT, but x86 (and most
    others) doesn't define it.

  - There are no such hooks in binfmt_aout.c et al. We inherit
    EXTRA_REGS from the prior executable.

This inconsistency was not intended.

This change removes SAVE/RESTORE_EXTRA_REGS in stub_execve,
removes register clearing in ELF_PLAT_INIT(),
and instead simply clears them on success in stub_execve.

Run-tested.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428173719-7637-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/signal: Remove pax argument from restore_sigcontext

The 'pax' argument is unnecesary. Instead, store the RAX value
directly in regs.

This pattern goes all the way back to 2.1.106pre1, when restore_sigcontext()
was changed to return an error code instead of EAX directly:

https://git.kernel.org/cgit/linux/kernel/git/history/history.git/diff/arch/i386/kernel/signal.c?id=9a8f8b7ca3f319bd668298d447bdf32730e51174

In 2007 sigaltstack syscall support was added, where the return
value of restore_sigcontext() was changed to carry the memory-copying
failure code.

But instead of putting 'ax' into regs->ax directly, it was carried
in via a pointer and then returned, where the generic syscall return
code copied it to regs->ax.

So there was never any deeper reason for this suboptimal pattern, it
was simply never noticed after being introduced.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1428152303-17154-1-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Input: alps - document stick behavior for protocol V2

Document that protocol V2 uses standard (bare) PS/2 mouse packets for the
DualPoint stick.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-By: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Input: alps - report V2 Dualpoint Stick events via the right evdev node

On V2 devices the DualPoint Stick reports bare packets, these should be
reported via the "AlpsPS/2 ALPS DualPoint Stick" dev2 evdev node, which also
has the INPUT_PROP_POINTING_STICK propbit set.

Note that since there is no way to distinguish these packets from an external
PS/2 mouse (insofar as these laptops have an external PS/2 port) this means
that we will be reporting PS/2 mouse events via this evdev node too, as we've
been doing in kernel 3.19 and older.

This has been tested on a Dell Latitude D620 and a Dell Latitude E6400,
which both have a V2 touchpad + a DualPoint Stick which reports bare packets.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Input: alps - report interleaved bare PS/2 packets via dev3

Bare packets should be reported via the same evdev device independent on
whether they are detected on the beginning of a packet or in the middle
of a packet.

This has been tested on a Dell Latitude E6400, where the DualPoint Stick
reports bare packets, which get reported via dev3 when the touchpad is
idle, and via dev2 when the touchpad and stick are used simultaneously.

This commit fixes this inconsistency by always reporting bare packets via
dev3. Note that since the come from a DualPoint Stick they really should be
reported via dev2, this gets fixed in a later commit.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'usb-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB fixes and new device ids for 4.0-rc6.  Nothing
  major, some xhci fixes for reported problems, and some usb-serial
  device ids.

  All have been in linux-next for a while"

* tag 'usb-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: ftdi_sio: Use jtag quirk for SNAP Connect E10
  usb: isp1760: fix spin unlock in the error path of isp1760_udc_start
  usb: xhci: apply XHCI_AVOID_BEI quirk to all Intel xHCI controllers
  usb: xhci: handle Config Error Change (CEC) in xhci driver
  USB: keyspan_pda: add new device id
  USB: ftdi_sio: Added custom PID for Synapse Wireless product

Merge tag 'staging-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are some staging driver fixes, well, really all just IIO driver
  fixes, for 4.0-rc6.  They fix issues that have been reported with
  these drivers.

  All of these patches have been in linux-next for a while"

* tag 'staging-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: imu: Use iio_trigger_get for indio_dev->trig assignment
  iio: adc: vf610: use ADC clock within specification
  iio/adc/cc10001_adc.c: Fix !HAS_IOMEM build
  iio: core: Fix double free.
  iio:inv-mpu6050: Fix inconsistency for the scale channel
  staging: iio: dummy: Fix undefined symbol build error
  iio: inv_mpu6050: Clear timestamps fifo while resetting hardware fifo
  staging: iio: hmc5843: Set iio name property in sysfs
  iio: bmc150: change sampling frequency
  iio: fix drivers that check buffer->scan_mask

Merge tag 'tty-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
"Here are 3 serial driver fixes for 4.0-rc6.  They fix some reported
  issues with the samsung and fsl_lpuart drivers.

  All have been in linux-next for a while"

* tag 'tty-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: serial: fsl_lpuart: clear receive flag on FIFO flush
  tty: serial: fsl_lpuart: specify transmit FIFO size
  serial: samsung: Clear operation mode on UART shutdown

x86/alternatives: Fix ALTERNATIVE_2 padding generation properly

Quentin caught a corner case with the generation of instruction
padding in the ALTERNATIVE_2 macro: if len(orig_insn) <
len(alt1) < len(alt2), then not enough padding gets added and
that is not good(tm) as we could overwrite the beginning of the
next instruction.

Luckily, at the time of this writing, we don't have
ALTERNATIVE_2() invocations which have that problem and even if
we did, a simple fix would be to prepend the instructions with
enough prefixes so that that corner case doesn't happen.

However, best it would be if we fixed it properly. See below for
a simple, abstracted example of what we're doing.

So what we ended up doing is, we compute the

max(len(alt1), len(alt2)) - len(orig_insn)

and feed that value to the .skip gas directive. The max() cannot
have conditionals due to gas limitations, thus the fancy integer
math.

With this patch, all ALTERNATIVE_2 sites get padded correctly;
generating obscure test cases pass too:

  #define alt_max_short(a, b)    ((a) ^ (((a) ^ (b)) & -(-((a) < (b)))))

  #define gen_skip(orig, alt1, alt2, marker) \
   .skip -((alt_max_short(alt1, alt2) - (orig)) > 0) * \
   (alt_max_short(alt1, alt2) - (orig)),marker

   .pushsection .text, "ax"
  .globl main
  main:
   gen_skip(1, 2, 4, 0x09)
   gen_skip(4, 1, 2, 0x10)
   ...
   .popsection

Thanks to Quentin for catching it and double-checking the fix!

Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150404133443.GE21152@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input subsystem fixes from Dmitry Torokhov:
"A fix for ALPS driver for issue introduced in the latest update and a
  tweak for yet another Lenovo box in Synaptics.

  There will be more ALPS tweaks coming.."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: define INPUT_PROP_ACCELEROMETER behavior
  Input: synaptics - fix min-max quirk value for E440
  Input: synaptics - add quirk for Thinkpad E440
  Input: ALPS - fix max coordinates for v5 and v7 protocols
  Input: add MT_TOOL_PALM

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block layer fix from Jens Axboe:
"Just one patch in this pull request, fixing a regression caused by a
'mathematically correct' change to lcm()"

* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix blk_stack_limits() regression due to lcm() change

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"Misc fixes: a SYSRET single-stepping fix, a dmi-scan robustization
  fix, a reboot quirk and a kgdb fixlet"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kgdb/x86: Fix reporting of 'si' in kgdb on x86_64
  x86/asm/entry/64: Disable opportunistic SYSRET if regs->flags has TF set
  x86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirk
  MAINTAINERS: Change the x86 microcode loader maintainer
  firmware: dmi_scan: Prevent dmi_num integer overflow

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Two x86 Intel PMU constraint handling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix Haswell CYCLE_ACTIVITY.* counter constraints
perf/x86/intel: Filter branches for PEBS event

Merge tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux

Pull devicetree fix from Grant Likely:
"Simple bugfix for bad device tree data on the PA-Semi platform"

* tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux:
drivers/of: Add empty ranges quirk for PA-Semi

Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6

Pull CIFS fixes from Steve French:
"A set of small cifs fixes fixing a memory leak, kernel oops, and
  infinite loop (and some spotted by Coverity)"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  Fix warning
  Fix another dereference before null check warning
  CIFS: session servername can't be null
  Fix warning on impossible comparison
  Fix coverity warning
  Fix dereference before null check warning
  Don't ignore errors on encrypting password in SMBTcon
  Fix warning on uninitialized buftype
  cifs: potential memory leaks when parsing mnt opts
  cifs: fix use-after-free bug in find_writable_file
  cifs: smb2_clone_range() - exit on unhandled error

netns: don't allocate an id for dead netns

First, let's explain the problem.
Suppose you have an ipip interface that stands in the netns foo and its link
part in the netns bar (so the netns bar has an nsid into the netns foo).
Now, you remove the netns bar:
- the bar nsid into the netns foo is removed
- the netns exit method of ipip is called, thus our ipip iface is removed:
   => a netlink message is built in the netns foo to advertise this deletion
   => this netlink message requests an nsid for bar, thus a new nsid is
      allocated for bar and never removed.

This patch adds a check in peernet2id() so that an id cannot be allocated for
a netns which is currently destroyed.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "netns: don't clear nsid too early on removal"

This reverts
commit 4217291e592d ("netns: don't clear nsid too early on removal").

This is not the right fix, it introduces races.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86/asm/entry/64: Use a define for an invalid segment selector

... instead of a naked number, for better readability.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428054130-25847-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Fix MSR_IA32_SYSENTER_CS MSR value

Commit:

  d56fe4bf5f3c ("x86/asm/entry/64: Always set up SYSENTER MSRs")

missed to add "ULL" to the 0 and wrmsrl_safe() complains:

  arch/x86/kernel/cpu/common.c: In function ‘syscall_init’:
  arch/x86/kernel/cpu/common.c:1226:2: warning: right shift count >= width of type wrmsrl_safe(MSR_IA32_SYSENTER_CS, 0);

Fix it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428054130-25847-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/mm/KASLR: Propagate KASLR status to kernel proper

Commit:

e2b32e678513 ("x86, kaslr: randomize module base load address")

made module base address randomization unconditional and didn't regard
disabled KKASLR due to CONFIG_HIBERNATION and command line option
"nokaslr". For more info see (now reverted) commit:

f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")

In order to propagate KASLR status to kernel proper, we need a single bit
in boot_params.hdr.loadflags and we've chosen bit 1 thus leaving the
top-down allocated bits for bits supposed to be used by the bootloader.

Originally-From: Jiri Kosina <jkosina@suse.cz>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry: Drop now unused ENABLE_INTERRUPTS_SYSEXIT32

Commit:

4214a16b0297 ("x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER")

removed the last user of ENABLE_INTERRUPTS_SYSEXIT32. Kill the
macro now too.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1428049714-829-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER

SYSEXIT is scary on 64-bit kernels -- SYSEXIT must be invoked
with usergs and IRQs on.  That means that we rely on STI to
correctly mask interrupts for one instruction.  This is okay by
itself, but the semantics with respect to NMIs are unclear.

Avoid the whole issue by using SYSRETL instead.  For background,
Intel CPUs don't allow SYSCALL from compat mode, but they do
allow SYSRETL back to compat mode.  Go figure.

To avoid doing too much at once, this doesn't revamp the calling
convention.  We still return with EBP, EDX, and ECX on the user
stack.

Oddly this seems to be 30 cycles or so faster.  Avoiding POPFQ
and STI will account for under half of that, I think, so my best
guess is that Intel just optimizes SYSRET much better than
SYSEXIT.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/57a0bf1b5230b2716a64ebe48e9bc1110f7ab433.1428019097.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/32: Stop caching MSR_IA32_SYSENTER_ESP in tss.sp1

We write a stack pointer to MSR_IA32_SYSENTER_ESP exactly once,
and we unnecessarily cache the value in tss.sp1. We never
read the cached value.

Remove all of the caching. It serves no purpose.

Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/05a0163eb33ef5208363f0015496855da7cebadd.1428002830.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/32: Improve a TOP_OF_KERNEL_STACK_PADDING comment

At Denys' request, clean up the comment describing stack padding
in the 32-bit sysenter path.

No code changes.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/41fee7bb8490ae840fe7ef2699f9c2feb932e729.1428002830.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm: Add support for the CLWB instruction

Add support for the new CLWB (cache line write back)
instruction.  This instruction was announced in the document
"Intel Architecture Instruction Set Extensions Programming
Reference" with reference number 319433-022.

  https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

The CLWB instruction is used to write back the contents of
dirtied cache lines to memory without evicting the cache lines
from the processor's cache hierarchy.  This should be used in
favor of clflushopt or clflush in cases where you require the
cache line to be written to memory but plan to access the data
again in the near future.

One of the main use cases for this is with persistent memory
where CLWB can be used with PCOMMIT to ensure that data has been
accepted to memory and is durable on the DIMM.

This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH
and PCOMMIT with appropriate fencing:

void flush_and_commit_buffer(void *vaddr, unsigned int size)
{
void *vend = vaddr + size - 1;

for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
clwb(vaddr);

/* Flush any possible final partial cacheline */
clwb(vend);

/*
* Use SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes.
* (MFENCE via mb() also works)
*/
wmb();

/* PCOMMIT and the required SFENCE for ordering */
pcommit_sfence();
}

After this function completes the data pointed to by vaddr is
has been accepted to memory and will be durable if the vaddr
points to persistent memory.

Regarding the details of how the alternatives assembly is set
up, we need one additional byte at the beginning of the CLFLUSH
so that we can flip it into a CLFLUSHOPT by changing that byte
into a 0x66 prefix.  Two options are to either insert a 1 byte
ASM_NOP1, or to add a 1 byte NOP_DS_PREFIX.  Both have no
functional effect with the plain CLFLUSH, but I've been told
that executing a CLFLUSH + prefix should be faster than
executing a CLFLUSH + NOP.

We had to hard code the assembly for CLWB because, lacking the
ability to assemble the CLWB instruction itself, the next
closest thing is to have an xsaveopt instruction with a 0x66
prefix.  Unfortunately XSAVEOPT itself is also relatively new,
and isn't included by all the GCC versions that the kernel needs
to support.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1422377631-8986-3-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

ip6mr: call del_timer_sync() in ip6mr_free_table()

We need to wait for the flying timers, since we
are going to free the mrtable right after it.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: move fib_rules_unregister() under rtnl lock

We have to hold rtnl lock for fib_rules_unregister()
otherwise the following race could happen:

fib_rules_unregister(): fib_nl_delrule():
... ...
... ops = lookup_rules_ops();
list_del_rcu(&ops->list);
list_for_each_entry(ops->rules) {
fib_rules_cleanup_ops(ops); ...
list_del_rcu(); list_del_rcu();
}

Note, net->rules_mod_lock is actually not needed at all,
either upper layer netns code or rtnl lock guarantees
we are safe.

Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: take rtnl_lock and mark mrt table as freed on namespace cleanup

This is the IPv4 part for commit 905a6f96a1b1
(ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanup).

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"One drm core fix, one exynos regression fix, two sets of radeon fixes
  (Alex was a bit behind last week), and two i915 fixes.

  Nothing too serious we seem to have calmed down i915 since last week"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon: fix wait in radeon_mn_invalidate_range_start
  drm/radeon: add extra check in radeon_ttm_tt_unpin_userptr
  drm: Exynos: Respect framebuffer pitch for FIMD/Mixer
  drm/i915: Reject the colorkey ioctls for primary and cursor planes
  drm/i915: Skip allocating shadow batch for 0-length batches
  drm/radeon: programm the VCE fw BAR as well
  drm/radeon: always dump the ring content if it's available
  radeon: Do not directly dereference pointers to BIOS area.
  drm/radeon/dpm: fix 120hz handling harder
  drm/edid: set ELD for firmware and debugfs override EDIDs

Merge tag 'irqchip-fixes-4.0-2' of git://git.infradead.org/users/jcooper/linux

Pull irqchip fixes from Jason Cooper:
"This is the second round of fixes for irqchip.  It contains some fixes
  found while the arm64 guys were writing the kvm gicv3 its emulation.

  GICv3 ITS:
    - Small batch of fixes discovered while writing the kvm ITS emulation"

* tag 'irqchip-fixes-4.0-2' of git://git.infradead.org/users/jcooper/linux:
  irqchip: gicv3-its: Use non-cacheable accesses when no shareability
  irqchip: gicv3-its: Fix PROP/PEND and BASE/CBASE confusion
  irqchip: gicv3-its: Fix device ID encoding
  irqchip: gicv3-its: Fix encoding of collection's target redistributor

Merge branch 'drm-fixes-4.0' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

Just two small fixes for radeon, both destined for stable.

* 'drm-fixes-4.0' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: fix wait in radeon_mn_invalidate_range_start
drm/radeon: add extra check in radeon_ttm_tt_unpin_userptr

Merge branch 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes

Fix display on issue to Exynos5250 based Snow(1366x768) board.

* 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm: Exynos: Respect framebuffer pitch for FIMD/Mixer

Merge tag 'drm-intel-fixes-2015-04-02' of git://anongit.freedesktop.org/drm-intel into drm-fixes

one oops fixes and a 0-length allocation fix from next backported.

* tag 'drm-intel-fixes-2015-04-02' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Reject the colorkey ioctls for primary and cursor planes
drm/i915: Skip allocating shadow batch for 0-length batches

Merge tag 'topic/drm-fixes-2015-04-02' of git://anongit.freedesktop.org/drm-intel into drm-fixes

Here's a single drm core fix, cc: stable, that affects i915
users.

* tag 'topic/drm-fixes-2015-04-02' of git://anongit.freedesktop.org/drm-intel:
drm/edid: set ELD for firmware and debugfs override EDIDs

Merge tag 'stable/for-linus-4.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen regression fixes from David Vrabel:
"Fix two regressions in the balloon driver's use of memory hotplug when
  used in a PV guest"

* tag 'stable/for-linus-4.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/balloon: before adding hotplugged memory, set frames to invalid
  x86/xen: prepare p2m list for memory hotplug

Merge tag 'powerpc-4.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc fix from Michael Ellerman:
"Fix memory corruption by pnv_alloc_idle_core_states"

* tag 'powerpc-4.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
powerpc: fix memory corruption by pnv_alloc_idle_core_states

tcp: fix FRTO undo on cumulative ACK of SACKed range

On processing cumulative ACKs, the FRTO code was not checking the
SACKed bit, meaning that there could be a spurious FRTO undo on a
cumulative ACK of a previously SACKed skb.

The FRTO code should only consider a cumulative ACK to indicate that
an original/unretransmitted skb is newly ACKed if the skb was not yet
SACKed.

The effect of the spurious FRTO undo would typically be to make the
connection think that all previously-sent packets were in flight when
they really weren't, leading to a stall and an RTO.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Fixes: e33099f96d99c ("tcp: implement RFC5682 F-RTO")
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma fix from Roland Dreier:
"Fix for exploitable integer overflow in uverbs interface"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic

xen-netfront: transmit fully GSO-sized packets

xen-netfront limits transmitted skbs to be at most 44 segments in size. However,
GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448
bytes each. This slight reduction in the size of packets means a slight loss in
efficiency.

Since c/s 9ecd1a75d, xen-netfront sets gso_max_size to
    XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER,
where XEN_NETIF_MAX_TX_SIZE is 65535 bytes.

The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s
6c09fa09d) in determining when to split an skb into two is
    sk->sk_gso_max_size - 1 - MAX_TCP_HEADER.

So the maximum permitted size of an skb is calculated to be
    (XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER) - 1 - MAX_TCP_HEADER.

Intuitively, this looks like the wrong formula -- we don't need two TCP headers.
Instead, there is no need to deviate from the default gso_max_size of 65536 as
this already accommodates the size of the header.

Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments
of 1448 bytes each), as observed via tcpdump. This patch makes netfront send
skbs of up to 65160 bytes (45 segments of 1448 bytes each).

Similarly, the maximum allowable mtu does not need to subtract MAX_TCP_HEADER as
it relates to the size of the whole packet, including the header.

Fixes: 9ecd1a75d977 ("xen-netfront: reduce gso_max_size to account for max TCP header")
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
"This time we have addition of caps for jz4740 which fixes intentional
  warning at boot.  Then we have memory leak issues in drivers using
  virt-dma by Peter on few drive"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: moxart-dma: Fix memory leak when stopping a running transfer
  dmaengine: bcm2835-dma: Fix memory leak when stopping a running transfer
  dmaengine: omap-dma: Fix memory leak when terminating running transfer
  dmaengine: edma: fix memory leak when terminating running transfers
  dmaengine: jz4740: Define capabilities

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix use-after-free with mac80211 RX A-MPDU reorder timer, from
    Johannes Berg.

2) iwlwifi leaks memory every module load/unload cycles, fix from Larry
    Finger.

3) Need to use for_each_netdev_safe() in rtnl_group_changelink()
    otherwise we can crash, from WANG Cong.

4) mlx4 driver does register_netdev() too early in the probe sequence,
    from Ido Shamay.

5) Don't allow router discovery hop limit to decrease the interface's
    hop limit, from D.S. Ljungmark.

6) tx_packets and tx_bytes improperly accounted for certain classes of
    USB network devices, fix from Ben Hutchings.

7) ip{6}mr_rules_init() mistakenly use plain kfree to release the ipmr
    tables in the error path, they must instead use ip{6}mr_free_table().
    Fix from WANG Cong.

8) cxgb4 doesn't properly quiesce all RX activity before unregistering
    the netdevice.  Fix from Hariprasad Shenai.

9) Fix hash corruptions in ipvlan driver, from Jiri Benc.

10) nla_memcpy(), like a real memcpy, should fully initialize the
    destination buffer, even if the source attribute is smaller.  Fix
    from Jiri Benc.

11) Fix wrong error code returned from iucv_sock_sendmsg().  We should
    use whatever sock_alloc_send_skb() put into 'err'.  From Eugene
    Crosser.

12) Fix slab object leak on module unload in TIPC, from Ying Xue.

13) Need a READ_ONCE() when reading the cached RX socket route in
    tcp_v{4,6}_early_demux().  From Michal Kubecek.

14) Still too many problems with TPC support in the ath9k driver, so
    disable it for now.  From Felix Fietkau.

15) When in AP mode the rtlwifi driver can leak DMA mappings, fix from
    Larry Finger.

16) Missing kzalloc() failure check in gs_usb CAN driver, from Colin Ian
    King.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
  cxgb4: Fix to dump devlog, even if FW is crashed
  cxgb4: Firmware macro changes for fw verison 1.13.32.0
  bnx2x: Fix kdump when iommu=on
  bnx2x: Fix kdump on 4-port device
  mac80211: fix RX A-MPDU session reorder timer deletion
  MAINTAINERS: Update Intel Wired Ethernet Driver info
  tipc: fix a slab object leak
  net/usb/r8152: add device id for Lenovo TP USB 3.0 Ethernet
  af_iucv: fix AF_IUCV sendmsg() errno
  openvswitch: Return vport module ref before destruction
  netlink: pad nla_memcpy dest buffer with zeroes
  bonding: Bonding Overriding Configuration logic restored.
  ipvlan: fix check for IP addresses in control path
  ipvlan: do not use rcu operations for address list
  ipvlan: protect against concurrent link removal
  ipvlan: fix addr hash list corruption
  net: fec: setup right value for mdio hold time
  net: tcp6: fix double call of tcp_v6_fill_cb()
  cxgb4vf: Fix sparse warnings
  netns: don't clear nsid too early on removal
  ...

IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic

Properly verify that the resulting page aligned end address is larger
than both the start address and the length of the memory area requested.

Both the start and length arguments for ib_umem_get are controlled by
the user. A misbehaving user can provide values which will cause an
integer overflow when calculating the page aligned end address.

This overflow can cause also miscalculation of the number of pages
mapped, and additional logic issues.

Addresses: CVE-2014-8159
Cc: <stable@vger.kernel.org>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

perf/x86/intel: Fix Haswell CYCLE_ACTIVITY.* counter constraints

Some of the CYCLE_ACTIVITY.* events can only be scheduled on
counter 2. Due to a typo Haswell matched those with
INTEL_EVENT_CONSTRAINT, which lead to the events never
matching as the comparison does not expect anything
in the umask too. Fix the typo.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1425925222-32361-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86/intel: Filter branches for PEBS event

For supporting Intel LBR branches filtering, Intel LBR sharing logic
mechanism is introduced from commit b36817e88630 ("perf/x86: Add Intel
LBR sharing logic"). It modifies __intel_shared_reg_get_constraints() to
config lbr_sel, which is finally used to set LBR_SELECT.

However, the intel_shared_regs_constraints() function is called after
intel_pebs_constraints(). The PEBS event will return immediately after
intel_pebs_constraints(). So it's impossible to filter branches for PEBS
events.

This patch moves intel_shared_regs_constraints() ahead of
intel_pebs_constraints().

We can safely do that because the intel_shared_regs_constraints() function
only returns empty constraint if its rejecting the event, otherwise it
returns NULL such that we continue calling intel_pebs_constraints() and
x86_get_event_constraint().

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1427467105-9260-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

drm/radeon: fix wait in radeon_mn_invalidate_range_start

We need to wait for all fences, not just the exclusive one.

Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: add extra check in radeon_ttm_tt_unpin_userptr

We somehow try to free the SG table twice.

Bugs: https://bugs.freedesktop.org/show_bug.cgi?id=89734

Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm: Exynos: Respect framebuffer pitch for FIMD/Mixer

When performing a modeset, use the framebuffer pitch value to set FIMD
IMG_SIZE and Mixer SPAN registers. These are both defined as pitch - the
distance between contiguous lines (bytes for FIMD, pixels for mixer).

Fixes display on Snow (1366x768).

Signed-off-by: Daniel Stone <daniels@collabora.com>
Tested-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Inki Dae <inki.dae@samsung.com>

x86/cpu: Factor out common CPU initialization code, fix 32-bit Xen PV guests

Some of x86 bare-metal and Xen CPU initialization code is common
between the two and therefore can be factored out to avoid code
duplication.

As a side effect, doing so will also extend the fix provided by
commit a7fcf28d431e ("x86/asm/entry: Replace this_cpu_sp0() with
current_top_of_stack() to x86_32") to 32-bit Xen PV guests.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: konrad.wilk@oracle.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1427897534-5086-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/boot/64: Use __BOOT_TSS instead of literal $0x20

__BOOT_TSS = (GDT_ENTRY_BOOT_TSS * 8)
GDT_ENTRY_BOOT_TSS = (GDT_ENTRY_BOOT_CS + 2)
GDT_ENTRY_BOOT_CS = 2

(2 + 2) * 8 = 4 * 8 = 32 = 0x20

No code changes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427899858-7165-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Fold the 'test_in_nmi' macro into its only user

No code changes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427899858-7165-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

kgdb/x86: Fix reporting of 'si' in kgdb on x86_64

This patch fixes an error in kgdb for x86_64 which would report
the value of dx when asked to give the value of si.

Signed-off-by: Steffen Liebergeld <steffen.liebergeld@kernkonzept.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Disable opportunistic SYSRET if regs->flags has TF set

When I wrote the opportunistic SYSRET code, I missed an important difference
between SYSRET and IRET.

Both instructions are capable of setting EFLAGS.TF, but they behave differently
when doing so:

- IRET will not issue a #DB trap after execution when it sets TF.
   This is critical -- otherwise you'd never be able to make forward progress when
   returning to userspace.

- SYSRET, on the other hand, will trap with #DB immediately after
   returning to CPL3, and the next instruction will never execute.

This breaks anything that opportunistically SYSRETs to a user
context with TF set.  For example, running this code with TF set
and a SIGTRAP handler loaded never gets past 'post_nop':

extern unsigned char post_nop[];
asm volatile ("pushfq\n\t"
      "popq %%r11\n\t"
      "nop\n\t"
      "post_nop:"
      : : "c" (post_nop) : "r11");

In my defense, I can't find this documented in the AMD or Intel manual.

Fix it by using IRET to restore TF.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2a23c6b8a9c4 ("x86_64, entry: Use sysret to return to userspace when possible")
Link: http://lkml.kernel.org/r/9472f1ca4c19a38ecda45bba9c91b7168135fcfa.1427923514.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

drm/i915: Reject the colorkey ioctls for primary and cursor planes

The legcy colorkey ioctls are only implemented for sprite planes, so
reject the ioctl for primary/cursor planes. If we want to support
colorkeying with these planes (assuming we have hw support of course)
we should just move ahead with the colorkey property conversion.

Testcase: kms_legacy_colorkey
Cc: Tommi Rantala <tt.rantala@gmail.com>
Cc: stable@vger.kernel.org
Reference: http://mid.gmane.org/CA+ydwtr+bCo7LJ44JFmUkVRx144UDFgOS+aJTfK6KHtvBDVuAw@mail.gmail.com
Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

Merge tag 'wireless-drivers-for-davem-2015-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
iwlwifi:

* fix a memory leak, we leaked memory each time the module
was loaded.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cxgb4-net'

Hariprasad Shenai says:

====================
cxgb4 FW macro changes for new FW

Fix to dump device log even in the case of firmware crash. Also
incorporates changes for new FW.

This patch series has been created against net tree and includes patches on
cxgb4 driver.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Fix to dump devlog, even if FW is crashed

Add new Common Code routines to retrieve Firmware Device Log
parameters from PCIE_FW_PF[7]. The firmware initializes its Device Log very
early on and stores the parameters for its location/size in that register.
Using the parameters from the register allows us to access the Firmware
Device Log even when the firmware crashes very early on or we're not
attached to the firmware

Based on original work by Casey Leedom <leedom@chelsio.com>

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Firmware macro changes for fw verison 1.13.32.0

Adds new macro and few macro changes for fw version 1.13.32.0 also
changes version string in driver to match 1.13.32.0

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mac80211-for-davem-2015-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
This contains just a single fix for a crash I happened to randomly
run into today during testing. It's clearly been around for a while,
but is pretty hard to trigger, even when I tried explicitly (and
modified the code to make it more likely) it rarely did.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'iommu-fixes-v4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
"This contains fixes for:

   - a VT-d issue where hardware domain-ids might be freed while still
     in use.

   - an ipmmu-vmsa issue where where the device-table was not zero
     terminated

   - unchecked register access issue in the arm-smmu driver"

* tag 'iommu-fixes-v4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Remove unused variable
  iommu: ipmmu-vmsa: Add terminating entry for ipmmu_of_ids
  iommu/vt-d: Detach domain *only* from attached iommus
  iommu/arm-smmu: fix ARM_SMMU_FEAT_TRANS_OPS condition

lguest: now needs PCI_DIRECT.

Since commit 8e7094694396 ("lguest: add a dummy PCI host bridge.")
lguest uses PCI, but it needs you to frob the ports directly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'lazytime_fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull lazytime fixes from Ted Ts'o:
"This fixes a problem in the lazy time patches, which can cause
  frequently updated inods to never have their timestamps updated.

  These changes guarantee that no timestamp on disk will be stale by
  more than 24 hours"

* tag 'lazytime_fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  fs: add dirtytime_expire_seconds sysctl
  fs: make sure the timestamps for lazytime inodes eventually get written

Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
"Two main issues:

   - We found that turning on pNFS by default (when it's configured at
     build time) was too aggressive, so we want to switch the default
     before the 4.0 release.

   - Recent client changes to increase open parallelism uncovered a
     serious bug lurking in the server's open code.

  Also fix a krb5/selinux regression.

  The rest is mainly smaller pNFS fixes"

* 'for-4.0' of git://linux-nfs.org/~bfields/linux:
  sunrpc: make debugfs file creation failure non-fatal
  nfsd: require an explicit option to enable pNFS
  NFSD: Fix bad update of layout in nfsd4_return_file_layout
  NFSD: Take care the return value from nfsd4_encode_stateid
  NFSD: Printk blocklayout length and offset as format 0x%llx
  nfsd: return correct lockowner when there is a race on hash insert
  nfsd: return correct openowner when there is a race to put one in the hash
  NFSD: Put exports after nfsd4_layout_verify fail
  NFSD: Error out when register_shrinker() fail
  NFSD: Take care the return value from nfsd4_decode_stateid
  NFSD: Check layout type when returning client layouts
  NFSD: restore trace event lost in mismerge

Merge branch 'bnx2'

Yuval Mintz says:

====================
bnx2x: kdump related fixes

This patch series aims to fix bnx2x driver issues when loading in kdump kernel.
Both issues fixed here would be fatal to the device, requiring full reset of
the system in order to recover, preventing the device from serving its purpose
in the kdump environment.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Fix kdump when iommu=on

When IOMM-vtd is active, once main kernel crashes unfinished DMAE transactions
will be blocked, putting the HW in an error state which will cause further
transactions to timeout.

Current employed logic uses wrong macros, causing the first function to be the
only function that cleanups that error state during its probe/load.

This patch allows all the functions to successfully re-load in kdump kernel.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Fix kdump on 4-port device

When running in a kdump kernel, it's very likely that due to sync. loss with
management firmware the first PCI function to probe and reach the previous
unload flow would decide it can reset the chip and continue onward. While doing
so, it will only close its own Rx port.

On a 4-port device where 2nd port on engine is a 1g-port, the 2nd port would
allow ingress traffic after the chip is reset [assuming it was active on the
first kernel]. This would later cause a HW attention.

This changes driver flow to close both ports' 1g capabilities during the
previous driver unload flow prior to the chip reset.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac80211: fix RX A-MPDU session reorder timer deletion

There's an issue with the way the RX A-MPDU reorder timer is
deleted that can cause a kernel crash like this:

* tid_rx is removed - call_rcu(ieee80211_free_tid_rx)
* station is destroyed
* reorder timer fires before ieee80211_free_tid_rx() runs,
accessing the station, thus potentially crashing due to
the use-after-free

The station deletion is protected by synchronize_net(), but
that isn't enough -- ieee80211_free_tid_rx() need not have
run when that returns (it deletes the timer.) We could use
rcu_barrier() instead of synchronize_net(), but that's much
more expensive.

Instead, to fix this, add a field tracking that the session
is being deleted. In this case, the only re-arming of the
timer happens with the reorder spinlock held, so make that
code not rearm it if the session is being deleted and also
delete the timer after setting that field. This ensures the
timer cannot fire after ___ieee80211_stop_rx_ba_session()
returns, which fixes the problem.

Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

x86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirk

The ASRock Q1900DC-ITX mainboard (Baytrail-D) hangs randomly in
both BIOS and UEFI mode while rebooting unless reboot=pci is
used. Add a quirk to reboot via the pci method.

The problem is very intermittent and hard to debug, it might succeed
rebooting just fine 40 times in a row - but fails half a dozen times
the next day. It seems to be slightly less common in BIOS CSM mode
than native UEFI (with the CSM disabled), but it does happen in either
mode. Since I've started testing this patch in late january, rebooting
has been 100% reliable.

Most of the time it already hangs during POST, but occasionally it
might even make it through the bootloader and the kernel might even
start booting, but then hangs before the mode switch. The same symptoms
occur with grub-efi, gummiboot and grub-pc, just as well as (at least)
kernel 3.16-3.19 and 4.0-rc6 (I haven't tried older kernels than 3.16).
Upgrading to the most current mainboard firmware of the ASRock
Q1900DC-ITX, version 1.20, does not improve the situation.

( Searching the web seems to suggest that other Bay Trail-D mainboards
might be affected as well. )
--
Signed-off-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: <stable@vger.kernel.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/20150330224427.0fb58e42@mir
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'iio-fixes-for-4.0d' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus

Jonathan writes:

IIO fixes for 4.0 set 4

A couple more IIO fixes.

* Fix check for HAS_IOMEM in the cc100001_adc driver to avoid build errors.
  Rather curiously it was ORed with Regulator and clock support.
* vf610 driver was trying to use an ADC clock outside the possible
  spec on some boards.  The driver assumed a fixed clock speed previously
  across all boards, but that is not true.  This fix ensures that the
  reported frequency is correct on all boards.
* The adis imu common code directly set the current trigger to the
  driver supplied one.  Unfortunately this didn't increase the use count
  leading to a double free via a particular path of changing the trigger
  then removing the driver.

Merge tag 'usb-serial-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB-serial fixes for v4.0-rc6

Here are a few new device IDs.

Signed-off-by: Johan Hovold <johan@kernel.org>

x86/asm/entry/64: Use local label to skip around sycall dispatch

Logically, we just want to jump around the following instruction
and its prologue/epilogue:

  call *sys_call_table(,%rax,8)

if the syscall number is too big - we do not specifically target
the "int_ret_from_sys_call" label.

Use a local, numerical label for this jump, for more clarity.

This also makes the code smaller:

-ffffffff8187756b:      0f 87 0f 00 00 00       ja     ffffffff81877580 <int_ret_from_sys_call>
+ffffffff8187756b:      77 0f                   ja     ffffffff8187757c <int_ret_from_sys_call>

because jumps to global labels are never translated to short jump
instructions by GAS.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-9-git-send-email-dvlasenk@redhat.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm: Replace "MOVQ $imm, %reg" with MOVL

There is no reason to use MOVQ to load a non-negative immediate
constant value into a 64-bit register. MOVL does the same, since
the upper 32 bits are zero-extended by the CPU.

This makes the code a bit smaller, while leaving functionality
unchanged.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-8-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Simplify looping around preempt_schedule_irq()

At the 'exit_intr' label we test whether interrupt/exception was in
kernel. If it did, we jump to the preemption check. If preemption
does happen (IOW if we call preempt_schedule_irq()), we go back to
'exit_intr'.

But it's pointless, we already know that the test succeeded last
time, preemption doesn't change the fact that interrupt/exception
was in the kernel.

We can go back directly to checking PER_CPU_VAR(__preempt_count) instead.

This makes the 'exit_intr' label unused, drop it.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-5-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Remove redundant DISABLE_INTERRUPTS()

At this location, we already have interrupts off, always.
To be more specific, we already disabled them here:

ret_from_intr:
DISABLE_INTERRUPTS(CLBR_NONE)

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-4-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Simplify retint_kernel label usage, make retint_restore_args label local

Get rid of #define obfuscation of retint_kernel in
CONFIG_PREEMPT case by defining retint_kernel label always, not
only for CONFIG_PREEMPT.

Strip retint_kernel of .global-ness (ENTRY macro) - it has no
users outside of this file.

This looks like cosmetics, but it is not:
"je LABEL" can be optimized to short jump by assember
only if LABEL is not global, for global labels jump is always
a near one with relocation.

Convert retint_restore_args to a local numeric label, making it
clearer that it is not used elsewhere in the file.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>