]> git.proxmox.com Git - mirror_ubuntu-eoan-kernel.git/log
mirror_ubuntu-eoan-kernel.git
5 years agoselftests/kselftest/runner.sh: Add 45 second timeout per test
Kees Cook [Thu, 19 Sep 2019 18:06:44 +0000 (11:06 -0700)]
selftests/kselftest/runner.sh: Add 45 second timeout per test

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 852c8cbf34d3b3130a05c38064dd98614f97d3a8 ]

Commit a745f7af3cbd ("selftests/harness: Add 30 second timeout per
test") solves the problem of kselftest_harness.h-using binary tests
possibly hanging forever. However, scripts and other binaries can still
hang forever. This adds a global timeout to each test script run.

To make this configurable (e.g. as needed in the "rtc" test case),
include a new per-test-directory "settings" file (similar to "config")
that can contain kselftest-specific settings. The first recognized field
is "timeout".

Additionally, this splits the reporting for timeouts into a specific
"TIMEOUT" not-ok (and adds exit code reporting in the remaining case).

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agokselftest: exclude failed TARGETS from runlist
Cristian Marussi [Thu, 26 Sep 2019 17:52:19 +0000 (18:52 +0100)]
kselftest: exclude failed TARGETS from runlist

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 131b30c94fbc0adb15f911609884dd39dada8f00 ]

A TARGET which failed to be built/installed should not be included in the
runlist generated inside the run_kselftest.sh script.

Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoHID: hyperv: Use in-place iterator API in the channel callback
Dexuan Cui [Tue, 20 Aug 2019 02:56:34 +0000 (02:56 +0000)]
HID: hyperv: Use in-place iterator API in the channel callback

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 6a297c90efa68b2864483193b8bfb0d19478600c ]

Simplify the ring buffer handling with the in-place API.

Also avoid the dynamic allocation and the memory leak in the channel
callback function.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoRDMA/iwcm: Fix a lock inversion issue
Bart Van Assche [Mon, 30 Sep 2019 23:16:54 +0000 (16:16 -0700)]
RDMA/iwcm: Fix a lock inversion issue

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit b66f31efbdad95ec274345721d99d1d835e6de01 ]

This patch fixes the lock inversion complaint:

============================================
WARNING: possible recursive locking detected
5.3.0-rc7-dbg+ #1 Not tainted
--------------------------------------------
kworker/u16:6/171 is trying to acquire lock:
00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm]

but task is already holding lock:
00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&id_priv->handler_mutex);
  lock(&id_priv->handler_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/u16:6/171:
 #0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0
 #1: 000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0
 #2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

stack backtrace:
CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
Call Trace:
 dump_stack+0x8a/0xd6
 __lock_acquire.cold+0xe1/0x24d
 lock_acquire+0x106/0x240
 __mutex_lock+0x12e/0xcb0
 mutex_lock_nested+0x1f/0x30
 rdma_destroy_id+0x78/0x4a0 [rdma_cm]
 iw_conn_req_handler+0x5c9/0x680 [rdma_cm]
 cm_work_handler+0xe62/0x1100 [iw_cm]
 process_one_work+0x56d/0xac0
 worker_thread+0x7a/0x5d0
 kthread+0x1bc/0x210
 ret_from_fork+0x24/0x30

This is not a bug as there are actually two lock classes here.

Link: https://lore.kernel.org/r/20190930231707.48259-3-bvanassche@acm.org
Fixes: de910bd92137 ("RDMA/cma: Simplify locking needed for serialization of callbacks")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoRDMA/iw_cxgb4: fix SRQ access from dump_qp()
Potnuri Bharat Teja [Mon, 30 Sep 2019 07:41:19 +0000 (13:11 +0530)]
RDMA/iw_cxgb4: fix SRQ access from dump_qp()

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 91724c1e5afe45b64970036170659726e7dc5cff ]

dump_qp() is wrongly trying to dump SRQ structures as QP when SRQ is used
by the application. This patch matches the QPID before dumping them.  Also
removes unwanted SRQ id addition to QP id xarray.

Fixes: 2f43129127e6 ("cxgb4: Convert qpidr to XArray")
Link: https://lore.kernel.org/r/20190930074119.20046-1-bharat@chelsio.com
Signed-off-by: Rahul Kundu <rahul.kundu@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoRDMA/hfi1: Prevent memory leak in sdma_init
Navid Emamdoost [Wed, 25 Sep 2019 14:45:42 +0000 (09:45 -0500)]
RDMA/hfi1: Prevent memory leak in sdma_init

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 34b3be18a04ecdc610aae4c48e5d1b799d8689f6 ]

In sdma_init if rhashtable_init fails the allocated memory for
tmp_sdma_rht should be released.

Fixes: 5a52a7acf7e2 ("IB/hfi1: NULL pointer dereference when freeing rhashtable")
Link: https://lore.kernel.org/r/20190925144543.10141-1-navid.emamdoost@gmail.com
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoRDMA/siw: Fix serialization issue in write_space()
Krishnamraju Eraparaju [Mon, 23 Sep 2019 10:11:12 +0000 (15:41 +0530)]
RDMA/siw: Fix serialization issue in write_space()

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit df791c54d627bae53c9be3be40a69594c55de487 ]

In siw_qp_llp_write_space(), 'sock' members should be accessed with
sk_callback_lock held, otherwise, it could race with
siw_sk_restore_upcalls(). And this could cause "NULL deref" panic.  Below
panic is due to the NULL cep returned from sk_to_cep(sk):

  Call Trace:
   <IRQ>    siw_qp_llp_write_space+0x11/0x40 [siw]
   tcp_check_space+0x4c/0xf0
   tcp_rcv_established+0x52b/0x630
   tcp_v4_do_rcv+0xf4/0x1e0
   tcp_v4_rcv+0x9b8/0xab0
   ip_protocol_deliver_rcu+0x2c/0x1c0
   ip_local_deliver_finish+0x44/0x50
   ip_local_deliver+0x6b/0xf0
   ? ip_protocol_deliver_rcu+0x1c0/0x1c0
   ip_rcv+0x52/0xd0
   ? ip_rcv_finish_core.isra.14+0x390/0x390
   __netif_receive_skb_one_core+0x83/0xa0
   netif_receive_skb_internal+0x73/0xb0
   napi_gro_frags+0x1ff/0x2b0
   t4_ethrx_handler+0x4a7/0x740 [cxgb4]
   process_responses+0x2c9/0x590 [cxgb4]
   ? t4_sge_intr_msix+0x1d/0x30 [cxgb4]
   ? handle_irq_event_percpu+0x51/0x70
   ? handle_irq_event+0x41/0x60
   ? handle_edge_irq+0x97/0x1a0
   napi_rx_handler+0x14/0xe0 [cxgb4]
   net_rx_action+0x2af/0x410
   __do_softirq+0xda/0x2a8
   do_softirq_own_stack+0x2a/0x40
   </IRQ>
   do_softirq+0x50/0x60
   __local_bh_enable_ip+0x50/0x60
   ip_finish_output2+0x18f/0x520
   ip_output+0x6e/0xf0
   ? __ip_finish_output+0x1f0/0x1f0
   __ip_queue_xmit+0x14f/0x3d0
   ? __slab_alloc+0x4b/0x58
   __tcp_transmit_skb+0x57d/0xa60
   tcp_write_xmit+0x23b/0xfd0
   __tcp_push_pending_frames+0x2e/0xf0
   tcp_sendmsg_locked+0x939/0xd50
   tcp_sendmsg+0x27/0x40
   sock_sendmsg+0x57/0x80
   siw_tx_hdt+0x894/0xb20 [siw]
   ? find_busiest_group+0x3e/0x5b0
   ? common_interrupt+0xa/0xf
   ? common_interrupt+0xa/0xf
   ? common_interrupt+0xa/0xf
   siw_qp_sq_process+0xf1/0xe60 [siw]
   ? __wake_up_common_lock+0x87/0xc0
   siw_sq_resume+0x33/0xe0 [siw]
   siw_run_sq+0xac/0x190 [siw]
   ? remove_wait_queue+0x60/0x60
   kthread+0xf8/0x130
   ? siw_sq_resume+0xe0/0xe0 [siw]
   ? kthread_bind+0x10/0x10
   ret_from_fork+0x35/0x40

Fixes: f29dd55b0236 ("rdma/siw: queue pair methods")
Link: https://lore.kernel.org/r/20190923101112.32685-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agostaging: rtl8188eu: fix null dereference when kzalloc fails
Connor Kuehl [Fri, 27 Sep 2019 21:44:15 +0000 (14:44 -0700)]
staging: rtl8188eu: fix null dereference when kzalloc fails

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 955c1532a34305f2f780b47f0c40cc7c65500810 ]

If kzalloc() returns NULL, the error path doesn't stop the flow of
control from entering rtw_hal_read_chip_version() which dereferences the
null pointer. Fix this by adding a 'goto' to the error path to more
gracefully handle the issue and avoid proceeding with initialization
steps that we're no longer prepared to handle.

Also update the debug message to be more consistent with the other debug
messages in this function.

Addresses-Coverity: ("Dereference after null check")

Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Link: https://lore.kernel.org/r/20190927214415.899-1-connor.kuehl@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoperf annotate: Don't return -1 for error when doing BPF disassembly
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 19:04:21 +0000 (16:04 -0300)]
perf annotate: Don't return -1 for error when doing BPF disassembly

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 11aad897f6d1a28eae3b7e5b293647c522d65819 ]

Return errno when open_memstream() fails and add two new speciall error
codes for when an invalid, non BPF file or one without BTF is passed to
symbol__disassemble_bpf(), so that its callers can rely on
symbol__strerror_disassemble() to convert that to a human readable error
message that can help figure out what is wrong, with hints even.

Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Song Liu <songliubraving@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-usevw9r2gcipfcrbpaueurw0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoperf annotate: Return appropriate error code for allocation failures
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 18:53:33 +0000 (15:53 -0300)]
perf annotate: Return appropriate error code for allocation failures

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 16ed3c1e91159e28b02f11f71ff4ce4cbc6f99e4 ]

We should return errno or the annotation extra range understood by
symbol__strerror_disassemble() instead of -1, fix it, returning ENOMEM
instead.

Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-8of1cmj3rz0mppfcshc9bbqq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoperf annotate: Fix arch specific ->init() failure errors
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 18:48:12 +0000 (15:48 -0300)]
perf annotate: Fix arch specific ->init() failure errors

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 42d7a9107d83223a5fcecc6732d626a6c074cbc2 ]

They are called from symbol__annotate() and to propagate errors that can
help understand the problem make them return what
symbol__strerror_disassemble() known, i.e. errno codes and other
annotation specific errors in a special, out of errnos, range.

Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-pqx7srcv7tixgid251aeboj6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoperf annotate: Propagate the symbol__annotate() error return
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 18:44:13 +0000 (15:44 -0300)]
perf annotate: Propagate the symbol__annotate() error return

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 211f493b611eef012841f795166c38ec7528738d ]

We were just returning -1 in symbol__annotate() when symbol__annotate()
failed, propagate its error as it is used later to pass to
symbol__strerror_disassemble() to present a error message to the user,
that in some cases were getting:

  "Invalid -1 error code"

Fix it to propagate the error.

Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-0tj89rs9g7nbcyd5skadlvuu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoperf annotate: Fix the signedness of failure returns
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 18:11:47 +0000 (15:11 -0300)]
perf annotate: Fix the signedness of failure returns

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 28f4417c3333940b242af03d90214f713bbef232 ]

Callers of symbol__annotate() expect a errno value or some other
extended error value range in symbol__strerror_disassemble() to
convert to a proper error string, fix it when propagating a failure to
find the arch specific annotation routines via arch__find(arch_name).

Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-o0k6dw7cas0vvmjjvgsyvu1i@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoperf annotate: Propagate perf_env__arch() error
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 18:06:01 +0000 (15:06 -0300)]
perf annotate: Propagate perf_env__arch() error

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit a66fa0619a0ae3585ef09e9c33ecfb5c7c6cb72b ]

The callers of symbol__annotate2() use symbol__strerror_disassemble() to
convert its failure returns into a human readable string, so
propagate error values from functions it calls, starting with
perf_env__arch() that when fails the right thing to do is to look at
'errno' to see why its possible call to uname() failed.

Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-it5d83kyusfhb1q1b0l4pxzs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoperf tools: Propagate get_cpuid() error
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 13:55:34 +0000 (10:55 -0300)]
perf tools: Propagate get_cpuid() error

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit f67001a4a08eb124197ed4376941e1da9cf94b42 ]

For consistency, propagate the exact cause for get_cpuid() to have
failed.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-9ig269f7ktnhh99g4l15vpu2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoperf jevents: Fix period for Intel fixed counters
Andi Kleen [Fri, 27 Sep 2019 23:35:45 +0000 (16:35 -0700)]
perf jevents: Fix period for Intel fixed counters

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 6bdfd9f118bd59cf0f85d3bf4b72b586adea17c1 ]

The Intel fixed counters use a special table to override the JSON
information.

During this override the period information from the JSON file got
dropped, which results in inst_retired.any and similar running with
frequency mode instead of a period.

Just specify the expected period in the table.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20190927233546.11533-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoperf script brstackinsn: Fix recovery from LBR/binary mismatch
Andi Kleen [Fri, 27 Sep 2019 23:35:44 +0000 (16:35 -0700)]
perf script brstackinsn: Fix recovery from LBR/binary mismatch

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit e98df280bc2a499fd41d7f9e2d6733884de69902 ]

When the LBR data and the instructions in a binary do not match the loop
printing instructions could get confused and print a long stream of
bogus <bad> instructions.

The problem was that if the instruction decoder cannot decode an
instruction it ilen wasn't initialized, so the loop going through the
basic block would continue with the previous value.

Harden the code to avoid such problems:

- Make sure ilen is always freshly initialized and is 0 for bad
  instructions.

- Do not overrun the code buffer while printing instructions

- Print a warning message if the final jump is not on an instruction
  boundary.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20190927233546.11533-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoperf map: Fix overlapped map handling
Steve MacLean [Sat, 28 Sep 2019 01:39:00 +0000 (01:39 +0000)]
perf map: Fix overlapped map handling

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit ee212d6ea20887c0ef352be8563ca13dbf965906 ]

Whenever an mmap/mmap2 event occurs, the map tree must be updated to add a new
entry. If a new map overlaps a previous map, the overlapped section of the
previous map is effectively unmapped, but the non-overlapping sections are
still valid.

maps__fixup_overlappings() is responsible for creating any new map entries from
the previously overlapped map. It optionally creates a before and an after map.

When creating the after map the existing code failed to adjust the map.pgoff.
This meant the new after map would incorrectly calculate the file offset
for the ip. This results in incorrect symbol name resolution for any ip in the
after region.

Make maps__fixup_overlappings() correctly populate map.pgoff.

Add an assert that new mapping matches old mapping at the beginning of
the after map.

Committer-testing:

Validated correct parsing of libcoreclr.so symbols from .NET Core 3.0 preview9
(which didn't strip symbols).

Preparation:

  ~/dotnet3.0-preview9/dotnet new webapi -o perfSymbol
  cd perfSymbol
  ~/dotnet3.0-preview9/dotnet publish
  perf record ~/dotnet3.0-preview9/dotnet \
      bin/Debug/netcoreapp3.0/publish/perfSymbol.dll
  ^C

Before:

  perf script --show-mmap-events 2>&1 | grep -e MMAP -e unknown |\
     grep libcoreclr.so | head -n 4
        dotnet  1907 373352.698780: PERF_RECORD_MMAP2 1907/1907: \
            [0x7fe615726000(0x768000) @ 0 08:02 5510620 765057155]: \
            r-xp .../3.0.0-preview9-19423-09/libcoreclr.so
        dotnet  1907 373352.701091: PERF_RECORD_MMAP2 1907/1907: \
            [0x7fe615974000(0x1000) @ 0x24e000 08:02 5510620 765057155]: \
            rwxp .../3.0.0-preview9-19423-09/libcoreclr.so
        dotnet  1907 373352.701241: PERF_RECORD_MMAP2 1907/1907: \
            [0x7fe615c42000(0x1000) @ 0x51c000 08:02 5510620 765057155]: \
            rwxp .../3.0.0-preview9-19423-09/libcoreclr.so
        dotnet  1907 373352.705249:     250000 cpu-clock: \
             7fe6159a1f99 [unknown] \
             (.../3.0.0-preview9-19423-09/libcoreclr.so)

After:

  perf script --show-mmap-events 2>&1 | grep -e MMAP -e unknown |\
     grep libcoreclr.so | head -n 4
        dotnet  1907 373352.698780: PERF_RECORD_MMAP2 1907/1907: \
            [0x7fe615726000(0x768000) @ 0 08:02 5510620 765057155]: \
            r-xp .../3.0.0-preview9-19423-09/libcoreclr.so
        dotnet  1907 373352.701091: PERF_RECORD_MMAP2 1907/1907: \
            [0x7fe615974000(0x1000) @ 0x24e000 08:02 5510620 765057155]: \
            rwxp .../3.0.0-preview9-19423-09/libcoreclr.so
        dotnet  1907 373352.701241: PERF_RECORD_MMAP2 1907/1907: \
            [0x7fe615c42000(0x1000) @ 0x51c000 08:02 5510620 765057155]: \
            rwxp .../3.0.0-preview9-19423-09/libcoreclr.so

All the [unknown] symbols were resolved.

Signed-off-by: Steve MacLean <Steve.MacLean@Microsoft.com>
Tested-by: Brian Robbins <brianrob@microsoft.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Cc: John Keeping <john@metanate.com>
Cc: John Salem <josalem@microsoft.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom McDonald <thomas.mcdonald@microsoft.com>
Link: http://lore.kernel.org/lkml/BN8PR21MB136270949F22A6A02335C238F7800@BN8PR21MB1362.namprd21.prod.outlook.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoperf tests: Avoid raising SEGV using an obvious NULL dereference
Ian Rogers [Wed, 25 Sep 2019 19:59:24 +0000 (12:59 -0700)]
perf tests: Avoid raising SEGV using an obvious NULL dereference

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit e3e2cf3d5b1fe800b032e14c0fdcd9a6fb20cf3b ]

An optimized build such as:

  make -C tools/perf CLANG=1 CC=clang EXTRA_CFLAGS="-O3

will turn the dereference operation into a ud2 instruction, raising a
SIGILL rather than a SIGSEGV. Use raise(..) for correctness and clarity.

Similar issues were addressed in Numfor Mbiziwo-Tiapo's patch:

  https://lkml.org/lkml/2019/7/8/1234

Committer testing:

Before:

  [root@quaco ~]# perf test hooks
  55: perf hooks                                            : Ok
  [root@quaco ~]# perf test -v hooks
  55: perf hooks                                            :
  --- start ---
  test child forked, pid 17092
  SIGSEGV is observed as expected, try to recover.
  Fatal error (SEGFAULT) in perf hook 'test'
  test child finished with 0
  ---- end ----
  perf hooks: Ok
  [root@quaco ~]#

After:

  [root@quaco ~]# perf test hooks
  55: perf hooks                                            : Ok
  [root@quaco ~]# perf test -v hooks
  55: perf hooks                                            :
  --- start ---
  test child forked, pid 17909
  SIGSEGV is observed as expected, try to recover.
  Fatal error (SEGFAULT) in perf hook 'test'
  test child finished with 0
  ---- end ----
  perf hooks: Ok
  [root@quaco ~]#

Fixes: a074865e60ed ("perf tools: Introduce perf hooks")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/20190925195924.152834-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agolibsubcmd: Make _FORTIFY_SOURCE defines dependent on the feature
Ian Rogers [Wed, 25 Sep 2019 19:59:23 +0000 (12:59 -0700)]
libsubcmd: Make _FORTIFY_SOURCE defines dependent on the feature

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 4b0b2b096da9d296e0e5668cdfba8613bd6f5bc8 ]

Unconditionally defining _FORTIFY_SOURCE can break tools that don't work
with it, such as memory sanitizers:

  https://github.com/google/sanitizers/wiki/AddressSanitizer#faq

Fixes: 4b6ab94eabe4 ("perf subcmd: Create subcmd library")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20190925195924.152834-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoiio: fix center temperature of bmc150-accel-core
Pascal Bouwmann [Thu, 29 Aug 2019 05:29:41 +0000 (07:29 +0200)]
iio: fix center temperature of bmc150-accel-core

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 6c59a962e081df6d8fe43325bbfabec57e0d4751 ]

The center temperature of the supported devices stored in the constant
BMC150_ACCEL_TEMP_CENTER_VAL is not 24 degrees but 23 degrees.

It seems that some datasheets were inconsistent on this value leading
to the error.  For most usecases will only make minor difference so
not queued for stable.

Signed-off-by: Pascal Bouwmann <bouwmann@tau-tec.de>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoiio: adc: meson_saradc: Fix memory allocation order
Remi Pommarel [Sun, 1 Sep 2019 10:54:10 +0000 (12:54 +0200)]
iio: adc: meson_saradc: Fix memory allocation order

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit de10ac47597e7a3596b27631d0d5ce5f48d2c099 ]

meson_saradc's irq handler uses priv->regmap so make sure that it is
allocated before the irq get enabled.

This also fixes crash when CONFIG_DEBUG_SHIRQ is enabled, as device
managed resources are freed in the inverted order they had been
allocated, priv->regmap was freed before the spurious fake irq that
CONFIG_DEBUG_SHIRQ adds called the handler.

Fixes: 3af109131b7eb8 ("iio: adc: meson-saradc: switch from polling to interrupt mode")
Reported-by: Elie Roudninski <xademax@gmail.com>
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Tested-by: Elie ROUDNINSKI <xademax@gmail.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agobtrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents()
Qu Wenruo [Mon, 14 Oct 2019 06:34:51 +0000 (14:34 +0800)]
btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents()

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 8702ba9396bf7bbae2ab93c94acd4bd37cfa4f09 ]

[Background]
Btrfs qgroup uses two types of reserved space for METADATA space,
PERTRANS and PREALLOC.

PERTRANS is metadata space reserved for each transaction started by
btrfs_start_transaction().
While PREALLOC is for delalloc, where we reserve space before joining a
transaction, and finally it will be converted to PERTRANS after the
writeback is done.

[Inconsistency]
However there is inconsistency in how we handle PREALLOC metadata space.

The most obvious one is:
In btrfs_buffered_write():
btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes, true);

We always free qgroup PREALLOC meta space.

While in btrfs_truncate_block():
btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, (ret != 0));

We only free qgroup PREALLOC meta space when something went wrong.

[The Correct Behavior]
The correct behavior should be the one in btrfs_buffered_write(), we
should always free PREALLOC metadata space.

The reason is, the btrfs_delalloc_* mechanism works by:
- Reserve metadata first, even it's not necessary
  In btrfs_delalloc_reserve_metadata()

- Free the unused metadata space
  Normally in:
  btrfs_delalloc_release_extents()
  |- btrfs_inode_rsv_release()
     Here we do calculation on whether we should release or not.

E.g. for 64K buffered write, the metadata rsv works like:

/* The first page */
reserve_meta: num_bytes=calc_inode_reservations()
free_meta: num_bytes=0
total: num_bytes=calc_inode_reservations()
/* The first page caused one outstanding extent, thus needs metadata
   rsv */

/* The 2nd page */
reserve_meta: num_bytes=calc_inode_reservations()
free_meta: num_bytes=calc_inode_reservations()
total: not changed
/* The 2nd page doesn't cause new outstanding extent, needs no new meta
   rsv, so we free what we have reserved */

/* The 3rd~16th pages */
reserve_meta: num_bytes=calc_inode_reservations()
free_meta: num_bytes=calc_inode_reservations()
total: not changed (still space for one outstanding extent)

This means, if btrfs_delalloc_release_extents() determines to free some
space, then those space should be freed NOW.
So for qgroup, we should call btrfs_qgroup_free_meta_prealloc() other
than btrfs_qgroup_convert_reserved_meta().

The good news is:
- The callers are not that hot
  The hottest caller is in btrfs_buffered_write(), which is already
  fixed by commit 336a8bb8e36a ("btrfs: Fix wrong
  btrfs_delalloc_release_extents parameter"). Thus it's not that
  easy to cause false EDQUOT.

- The trans commit in advance for qgroup would hide the bug
  Since commit f5fef4593653 ("btrfs: qgroup: Make qgroup async transaction
  commit more aggressive"), when btrfs qgroup metadata free space is slow,
  it will try to commit transaction and free the wrongly converted
  PERTRANS space, so it's not that easy to hit such bug.

[FIX]
So to fix the problem, remove the @qgroup_free parameter for
btrfs_delalloc_release_extents(), and always pass true to
btrfs_inode_rsv_release().

Reported-by: Filipe Manana <fdmanana@suse.com>
Fixes: 43b18595d660 ("btrfs: qgroup: Use separate meta reservation type for delalloc")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoBtrfs: fix inode cache block reserve leak on failure to allocate data space
Filipe Manana [Thu, 4 Jul 2019 15:24:19 +0000 (16:24 +0100)]
Btrfs: fix inode cache block reserve leak on failure to allocate data space

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 29d47d00e0ae61668ee0c5d90bef2893c8abbafa ]

If we failed to allocate the data extent(s) for the inode space cache, we
were bailing out without releasing the previously reserved metadata. This
was triggering the following warnings when unmounting a filesystem:

  $ cat -n fs/btrfs/inode.c
  (...)
  9268  void btrfs_destroy_inode(struct inode *inode)
  9269  {
  (...)
  9276          WARN_ON(BTRFS_I(inode)->block_rsv.reserved);
  9277          WARN_ON(BTRFS_I(inode)->block_rsv.size);
  (...)
  9281          WARN_ON(BTRFS_I(inode)->csum_bytes);
  9282          WARN_ON(BTRFS_I(inode)->defrag_bytes);
  (...)

Several fstests test cases triggered this often, such as generic/083,
generic/102, generic/172, generic/269 and generic/300 at least, producing
stack traces like the following in dmesg/syslog:

  [82039.079546] WARNING: CPU: 2 PID: 13167 at fs/btrfs/inode.c:9276 btrfs_destroy_inode+0x203/0x270 [btrfs]
  (...)
  [82039.081543] CPU: 2 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
  [82039.081912] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
  [82039.082673] RIP: 0010:btrfs_destroy_inode+0x203/0x270 [btrfs]
  (...)
  [82039.083913] RSP: 0018:ffffac0b426a7d30 EFLAGS: 00010206
  [82039.084320] RAX: ffff8ddf77691158 RBX: ffff8dde29b34660 RCX: 0000000000000002
  [82039.084736] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8dde29b34660
  [82039.085156] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
  [82039.085578] R10: ffffac0b426a7c90 R11: ffffffffb9aad768 R12: ffffac0b426a7db0
  [82039.086000] R13: ffff8ddf5fbec0a0 R14: dead000000000100 R15: 0000000000000000
  [82039.086416] FS:  00007f8db96d12c0(0000) GS:ffff8de036b00000(0000) knlGS:0000000000000000
  [82039.086837] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [82039.087253] CR2: 0000000001416108 CR3: 00000002315cc001 CR4: 00000000003606e0
  [82039.087672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [82039.088089] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [82039.088504] Call Trace:
  [82039.088918]  destroy_inode+0x3b/0x70
  [82039.089340]  btrfs_free_fs_root+0x16/0xa0 [btrfs]
  [82039.089768]  btrfs_free_fs_roots+0xd8/0x160 [btrfs]
  [82039.090183]  ? wait_for_completion+0x65/0x1a0
  [82039.090607]  close_ctree+0x172/0x370 [btrfs]
  [82039.091021]  generic_shutdown_super+0x6c/0x110
  [82039.091427]  kill_anon_super+0xe/0x30
  [82039.091832]  btrfs_kill_super+0x12/0xa0 [btrfs]
  [82039.092233]  deactivate_locked_super+0x3a/0x70
  [82039.092636]  cleanup_mnt+0x3b/0x80
  [82039.093039]  task_work_run+0x93/0xc0
  [82039.093457]  exit_to_usermode_loop+0xfa/0x100
  [82039.093856]  do_syscall_64+0x162/0x1d0
  [82039.094244]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [82039.094634] RIP: 0033:0x7f8db8fbab37
  (...)
  [82039.095876] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  [82039.096290] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
  [82039.096700] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
  [82039.097110] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
  [82039.097522] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
  [82039.097937] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
  [82039.098350] irq event stamp: 0
  [82039.098750] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  [82039.099150] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
  [82039.099545] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
  [82039.099925] softirqs last disabled at (0): [<0000000000000000>] 0x0
  [82039.100292] ---[ end trace f2521afa616ddccc ]---
  [82039.100707] WARNING: CPU: 2 PID: 13167 at fs/btrfs/inode.c:9277 btrfs_destroy_inode+0x1ac/0x270 [btrfs]
  (...)
  [82039.103050] CPU: 2 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
  [82039.103428] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
  [82039.104203] RIP: 0010:btrfs_destroy_inode+0x1ac/0x270 [btrfs]
  (...)
  [82039.105461] RSP: 0018:ffffac0b426a7d30 EFLAGS: 00010206
  [82039.105866] RAX: ffff8ddf77691158 RBX: ffff8dde29b34660 RCX: 0000000000000002
  [82039.106270] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8dde29b34660
  [82039.106673] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
  [82039.107078] R10: ffffac0b426a7c90 R11: ffffffffb9aad768 R12: ffffac0b426a7db0
  [82039.107487] R13: ffff8ddf5fbec0a0 R14: dead000000000100 R15: 0000000000000000
  [82039.107894] FS:  00007f8db96d12c0(0000) GS:ffff8de036b00000(0000) knlGS:0000000000000000
  [82039.108309] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [82039.108723] CR2: 0000000001416108 CR3: 00000002315cc001 CR4: 00000000003606e0
  [82039.109146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [82039.109567] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [82039.109989] Call Trace:
  [82039.110405]  destroy_inode+0x3b/0x70
  [82039.110830]  btrfs_free_fs_root+0x16/0xa0 [btrfs]
  [82039.111257]  btrfs_free_fs_roots+0xd8/0x160 [btrfs]
  [82039.111675]  ? wait_for_completion+0x65/0x1a0
  [82039.112101]  close_ctree+0x172/0x370 [btrfs]
  [82039.112519]  generic_shutdown_super+0x6c/0x110
  [82039.112988]  kill_anon_super+0xe/0x30
  [82039.113439]  btrfs_kill_super+0x12/0xa0 [btrfs]
  [82039.113861]  deactivate_locked_super+0x3a/0x70
  [82039.114278]  cleanup_mnt+0x3b/0x80
  [82039.114685]  task_work_run+0x93/0xc0
  [82039.115083]  exit_to_usermode_loop+0xfa/0x100
  [82039.115476]  do_syscall_64+0x162/0x1d0
  [82039.115863]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [82039.116254] RIP: 0033:0x7f8db8fbab37
  (...)
  [82039.117463] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  [82039.117882] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
  [82039.118330] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
  [82039.118743] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
  [82039.119159] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
  [82039.119574] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
  [82039.119987] irq event stamp: 0
  [82039.120387] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  [82039.120787] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
  [82039.121182] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
  [82039.121563] softirqs last disabled at (0): [<0000000000000000>] 0x0
  [82039.121933] ---[ end trace f2521afa616ddccd ]---
  [82039.122353] WARNING: CPU: 2 PID: 13167 at fs/btrfs/inode.c:9278 btrfs_destroy_inode+0x1bc/0x270 [btrfs]
  (...)
  [82039.124606] CPU: 2 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
  [82039.125008] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
  [82039.125801] RIP: 0010:btrfs_destroy_inode+0x1bc/0x270 [btrfs]
  (...)
  [82039.126998] RSP: 0018:ffffac0b426a7d30 EFLAGS: 00010202
  [82039.127399] RAX: ffff8ddf77691158 RBX: ffff8dde29b34660 RCX: 0000000000000002
  [82039.127803] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8dde29b34660
  [82039.128206] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
  [82039.128611] R10: ffffac0b426a7c90 R11: ffffffffb9aad768 R12: ffffac0b426a7db0
  [82039.129020] R13: ffff8ddf5fbec0a0 R14: dead000000000100 R15: 0000000000000000
  [82039.129428] FS:  00007f8db96d12c0(0000) GS:ffff8de036b00000(0000) knlGS:0000000000000000
  [82039.129846] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [82039.130261] CR2: 0000000001416108 CR3: 00000002315cc001 CR4: 00000000003606e0
  [82039.130684] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [82039.131142] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [82039.131561] Call Trace:
  [82039.131990]  destroy_inode+0x3b/0x70
  [82039.132417]  btrfs_free_fs_root+0x16/0xa0 [btrfs]
  [82039.132844]  btrfs_free_fs_roots+0xd8/0x160 [btrfs]
  [82039.133262]  ? wait_for_completion+0x65/0x1a0
  [82039.133688]  close_ctree+0x172/0x370 [btrfs]
  [82039.134157]  generic_shutdown_super+0x6c/0x110
  [82039.134575]  kill_anon_super+0xe/0x30
  [82039.134997]  btrfs_kill_super+0x12/0xa0 [btrfs]
  [82039.135415]  deactivate_locked_super+0x3a/0x70
  [82039.135832]  cleanup_mnt+0x3b/0x80
  [82039.136239]  task_work_run+0x93/0xc0
  [82039.136637]  exit_to_usermode_loop+0xfa/0x100
  [82039.137029]  do_syscall_64+0x162/0x1d0
  [82039.137418]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [82039.137812] RIP: 0033:0x7f8db8fbab37
  (...)
  [82039.139059] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  [82039.139475] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
  [82039.139890] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
  [82039.140302] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
  [82039.140719] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
  [82039.141138] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
  [82039.141597] irq event stamp: 0
  [82039.142043] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  [82039.142443] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
  [82039.142839] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
  [82039.143220] softirqs last disabled at (0): [<0000000000000000>] 0x0
  [82039.143588] ---[ end trace f2521afa616ddcce ]---
  [82039.167472] WARNING: CPU: 3 PID: 13167 at fs/btrfs/extent-tree.c:10120 btrfs_free_block_groups+0x30d/0x460 [btrfs]
  (...)
  [82039.173800] CPU: 3 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
  [82039.174847] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
  [82039.177031] RIP: 0010:btrfs_free_block_groups+0x30d/0x460 [btrfs]
  (...)
  [82039.180397] RSP: 0018:ffffac0b426a7dd8 EFLAGS: 00010206
  [82039.181574] RAX: ffff8de010a1db40 RBX: ffff8de010a1db40 RCX: 0000000000170014
  [82039.182711] RDX: ffff8ddff4380040 RSI: ffff8de010a1da58 RDI: 0000000000000246
  [82039.183817] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
  [82039.184925] R10: ffff8de036404380 R11: ffffffffb8a5ea00 R12: ffff8de010a1b2b8
  [82039.186090] R13: ffff8de010a1b2b8 R14: 0000000000000000 R15: dead000000000100
  [82039.187208] FS:  00007f8db96d12c0(0000) GS:ffff8de036b80000(0000) knlGS:0000000000000000
  [82039.188345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [82039.189481] CR2: 00007fb044005170 CR3: 00000002315cc006 CR4: 00000000003606e0
  [82039.190674] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [82039.191829] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [82039.192978] Call Trace:
  [82039.194160]  close_ctree+0x19a/0x370 [btrfs]
  [82039.195315]  generic_shutdown_super+0x6c/0x110
  [82039.196486]  kill_anon_super+0xe/0x30
  [82039.197645]  btrfs_kill_super+0x12/0xa0 [btrfs]
  [82039.198696]  deactivate_locked_super+0x3a/0x70
  [82039.199619]  cleanup_mnt+0x3b/0x80
  [82039.200559]  task_work_run+0x93/0xc0
  [82039.201505]  exit_to_usermode_loop+0xfa/0x100
  [82039.202436]  do_syscall_64+0x162/0x1d0
  [82039.203339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [82039.204091] RIP: 0033:0x7f8db8fbab37
  (...)
  [82039.206360] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  [82039.207132] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
  [82039.207906] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
  [82039.208621] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
  [82039.209285] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
  [82039.209984] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
  [82039.210642] irq event stamp: 0
  [82039.211306] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  [82039.211971] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
  [82039.212643] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
  [82039.213304] softirqs last disabled at (0): [<0000000000000000>] 0x0
  [82039.213875] ---[ end trace f2521afa616ddccf ]---

Fix this by releasing the reserved metadata on failure to allocate data
extent(s) for the inode cache.

Fixes: 69fe2d75dd91d0 ("btrfs: make the delalloc block rsv per inode")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agodm snapshot: rework COW throttling to fix deadlock
Mikulas Patocka [Wed, 2 Oct 2019 10:15:53 +0000 (06:15 -0400)]
dm snapshot: rework COW throttling to fix deadlock

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit b21555786f18cd77f2311ad89074533109ae3ffa ]

Commit 721b1d98fb517a ("dm snapshot: Fix excessive memory usage and
workqueue stalls") introduced a semaphore to limit the maximum number of
in-flight kcopyd (COW) jobs.

The implementation of this throttling mechanism is prone to a deadlock:

1. One or more threads write to the origin device causing COW, which is
   performed by kcopyd.

2. At some point some of these threads might reach the s->cow_count
   semaphore limit and block in down(&s->cow_count), holding a read lock
   on _origins_lock.

3. Someone tries to acquire a write lock on _origins_lock, e.g.,
   snapshot_ctr(), which blocks because the threads at step (2) already
   hold a read lock on it.

4. A COW operation completes and kcopyd runs dm-snapshot's completion
   callback, which ends up calling pending_complete().
   pending_complete() tries to resubmit any deferred origin bios. This
   requires acquiring a read lock on _origins_lock, which blocks.

   This happens because the read-write semaphore implementation gives
   priority to writers, meaning that as soon as a writer tries to enter
   the critical section, no readers will be allowed in, until all
   writers have completed their work.

   So, pending_complete() waits for the writer at step (3) to acquire
   and release the lock. This writer waits for the readers at step (2)
   to release the read lock and those readers wait for
   pending_complete() (the kcopyd thread) to signal the s->cow_count
   semaphore: DEADLOCK.

The above was thoroughly analyzed and documented by Nikos Tsironis as
part of his initial proposal for fixing this deadlock, see:
https://www.redhat.com/archives/dm-devel/2019-October/msg00001.html

Fix this deadlock by reworking COW throttling so that it waits without
holding any locks. Add a variable 'in_progress' that counts how many
kcopyd jobs are running. A function wait_for_in_progress() will sleep if
'in_progress' is over the limit. It drops _origins_lock in order to
avoid the deadlock.

Reported-by: Guruswamy Basavaiah <guru2018@gmail.com>
Reported-by: Nikos Tsironis <ntsironis@arrikto.com>
Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com>
Tested-by: Nikos Tsironis <ntsironis@arrikto.com>
Fixes: 721b1d98fb51 ("dm snapshot: Fix excessive memory usage and workqueue stalls")
Cc: stable@vger.kernel.org # v5.0+
Depends-on: 4a3f111a73a8c ("dm snapshot: introduce account_start_copy() and account_end_copy()")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agodm snapshot: introduce account_start_copy() and account_end_copy()
Mikulas Patocka [Wed, 2 Oct 2019 10:14:17 +0000 (06:14 -0400)]
dm snapshot: introduce account_start_copy() and account_end_copy()

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit a2f83e8b0c82c9500421a26c49eb198b25fcdea3 ]

This simple refactoring moves code for modifying the semaphore cow_count
into separate functions to prepare for changes that will extend these
methods to provide for a more sophisticated mechanism for COW
throttling.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoio_uring: fix up O_NONBLOCK handling for sockets
Jens Axboe [Thu, 17 Oct 2019 15:20:46 +0000 (09:20 -0600)]
io_uring: fix up O_NONBLOCK handling for sockets

BugLink: https://bugs.launchpad.net/bugs/1851550
[ Upstream commit 491381ce07ca57f68c49c79a8a43da5b60749e32 ]

We've got two issues with the non-regular file handling for non-blocking
IO:

1) We don't want to re-do a short read in full for a non-regular file,
   as we can't just read the data again.
2) For non-regular files that don't support non-blocking IO attempts,
   we need to punt to async context even if the file is opened as
   non-blocking. Otherwise the caller always gets -EAGAIN.

Add two new request flags to handle these cases. One is just a cache
of the inode S_ISREG() status, the other tells io_uring that we always
need to punt this request to async context, even if REQ_F_NOWAIT is set.

Cc: stable@vger.kernel.org
Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Tested-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoPCI: Fix missing inline for pci_pr3_present()
Takashi Iwai [Wed, 23 Oct 2019 14:25:28 +0000 (22:25 +0800)]
PCI: Fix missing inline for pci_pr3_present()

BugLink: https://bugs.launchpad.net/bugs/1840835
The inline prefix was missing in the dummy function pci_pr3_present()
definition.  Fix it.

Reported-by: kbuild test robot <lkp@intel.com>
Fixes: 52525b7a3cf8 ("PCI: Add a helper to check Power Resource Requirements _PR3 existence")
Link: https://lore.kernel.org/r/201910212111.qHm6OcWx%lkp@intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 46b4bff6572b0552b1ee062043621e4b252638d8 linux-next)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoALSA: hda: Allow HDA to be runtime suspended when dGPU is not bound to a driver
Kai-Heng Feng [Wed, 23 Oct 2019 14:25:26 +0000 (22:25 +0800)]
ALSA: hda: Allow HDA to be runtime suspended when dGPU is not bound to a driver

BugLink: https://bugs.launchpad.net/bugs/1840835
Nvidia proprietary driver doesn't support runtime power management, so
when a user only wants to use the integrated GPU, it's a common practice
to let dGPU not to bind any driver, and let its upstream port to be
runtime suspended. At the end of runtime suspension the port uses
platform power management to disable power through _OFF method of power
resource, which is listed by _PR3.

After commit b516ea586d71 ("PCI: Enable NVIDIA HDA controllers"), when
the dGPU comes with an HDA function, the HDA won't be suspended if the
dGPU is unbound, so the power resource can't be turned off by its
upstream port driver.

Commit 37a3a98ef601 ("ALSA: hda - Enable runtime PM only for
discrete GPU") only allows HDA to be runtime suspended once GPU is
bound, to keep APU's HDA working.

However, HDA on dGPU isn't that useful if dGPU is not bound to any
driver.  So let's relax the runtime suspend requirement for dGPU's HDA
function, to disable the power source to save lots of power.

BugLink: https://bugs.launchpad.net/bugs/1840835
Fixes: b516ea586d71 ("PCI: Enable NVIDIA HDA controllers")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Link: https://lore.kernel.org/r/20191018073848.14590-2-kai.heng.feng@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(backported from commit bacd861452d2be86a4df341b12e32db7dac8021e linux-next)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoPCI: Add a helper to check Power Resource Requirements _PR3 existence
Kai-Heng Feng [Wed, 23 Oct 2019 14:25:25 +0000 (22:25 +0800)]
PCI: Add a helper to check Power Resource Requirements _PR3 existence

BugLink: https://bugs.launchpad.net/bugs/1840835
A driver may want to know the existence of _PR3, to choose different
runtime suspend behavior. A user will be add in next patch.

This is mostly the same as nouveau_pr3_present().

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20191018073848.14590-1-kai.heng.feng@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 52525b7a3cf82adec5c6cf0ecbd23ff228badc94 linux-next)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agox86/cpu: Add Comet Lake to the Intel CPU models header
Kan Liang [Tue, 22 Oct 2019 21:07:17 +0000 (00:07 +0300)]
x86/cpu: Add Comet Lake to the Intel CPU models header

BugLink: http://bugs.launchpad.net/bugs/1843794
Comet Lake is the new 10th Gen Intel processor. Add two new CPU model
numbers to the Intel family list.

The CPU model numbers are not published in the SDM yet but they come
from an authoritative internal source.

 [ bp: Touch up commit message. ]

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: ak@linux.intel.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1570549810-25049-2-git-send-email-kan.liang@linux.intel.com
(cherry picked from commit 8d7c6ac3b2371eb1cbc9925a88f4d10efff374de)
Signed-off-by: Timo Aaltonen <timo.aaltonen@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoUBUNTU: SAUCE: shiftfs: drop CAP_SYS_RESOURCE from effective capabilities
Christian Brauner [Wed, 23 Oct 2019 12:23:50 +0000 (14:23 +0200)]
UBUNTU: SAUCE: shiftfs: drop CAP_SYS_RESOURCE from effective capabilities

BugLink: https://bugs.launchpad.net/bugs/1849483
Currently shiftfs allows to exceed project quota and reserved space on
e.g. ext2. See [1] and especially [2] for a bug report. This is very
much not what we want. Quotas and reserverd space settings set on the
host need to respected. The cause for this issue is overriding the
credentials with the superblock creator's credentials whenever we
perform operations such as fallocate() or writes while retaining
CAP_SYS_RESOURCE.

The fix is to drop CAP_SYS_RESOURCE from the effective capability set
after we have made a copy of the superblock creator's credential at
superblock creation time. This very likely gives us more security than
we had before and the regression potential seems limited. I would like
to try this apporach first before coming up with something potentially
more sophisticated. I don't see why CAP_SYS_RESOURCE should become a
limiting factor in most use-cases.

[1]: https://github.com/lxc/lxd/issues/6333
[2]: https://github.com/lxc/lxd/issues/6333#issuecomment-545154838
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoUBUNTU: SAUCE: shiftfs: setup correct s_maxbytes limit
Christian Brauner [Wed, 23 Oct 2019 12:22:28 +0000 (14:22 +0200)]
UBUNTU: SAUCE: shiftfs: setup correct s_maxbytes limit

BugLink: https://bugs.launchpad.net/bugs/1849482
Set the s_maxbytes limit to MAX_LFS_FILESIZE.
Currently shiftfs limits the maximum size for fallocate() needlessly
causing calls such as fallocate --length 2GB ./file to fail. This
limitation is arbitrary since it's not caused by the underlay but
rather by shiftfs itself capping the s_maxbytes. This causes bugs such
as the one reported in [1].

[1]: https://github.com/lxc/lxd/issues/6333
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoBluetooth: hidp: Fix assumptions on the return value of hidp_send_message
Dan Elkouby [Tue, 29 Oct 2019 16:25:55 +0000 (09:25 -0700)]
Bluetooth: hidp: Fix assumptions on the return value of hidp_send_message

BugLink: https://bugs.launchpad.net/bugs/1850443
hidp_send_message was changed to return non-zero values on success,
which some other bits did not expect. This caused spurious errors to be
propagated through the stack, breaking some drivers, such as hid-sony
for the Dualshock 4 in Bluetooth mode.

As pointed out by Dan Carpenter, hid-microsoft directly relied on that
assumption as well.

Fixes: 48d9cc9d85dd ("Bluetooth: hidp: Let hidp_send_message return number of queued bytes")
Signed-off-by: Dan Elkouby <streetwalkermc@gmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
(cherry picked from commit 8bb3537095f107ed55ad51f6241165b397aaafac)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoUBUNTU: SAUCE: ALSA: hda/realtek - Fix 2 front mics of codec 0x623
Aaron Ma [Wed, 30 Oct 2019 06:56:01 +0000 (14:56 +0800)]
UBUNTU: SAUCE: ALSA: hda/realtek - Fix 2 front mics of codec 0x623

BugLink: https://bugs.launchpad.net/bugs/1850599
These 2 ThinkCentres installed a new realtek codec ID 0x623,
it has 2 front mics with the same location on pin 0x18 and 0x19.

Apply fixup ALC283_FIXUP_HEADSET_MIC to change 1 front mic
location to right, then pulseaudio can handle them.
One "Front Mic" and one "Mic" will be shown, and audio output works
fine.

Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191024114439.31522-1-aaron.ma@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 8a6c55d0f883e9a7e7c91841434f3b6bbf932bb2
https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git)
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoUBUNTU: SAUCE: ALSA: hda/realtek - Add support for ALC623
Kailang Yang [Wed, 30 Oct 2019 06:56:00 +0000 (14:56 +0800)]
UBUNTU: SAUCE: ALSA: hda/realtek - Add support for ALC623

BugLink: https://bugs.launchpad.net/bugs/1850599
Support new codec ALC623.

Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/ed97b6a8bd9445ecb48bc763d9aaba7a@realtek.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit f0778871a13889b86a65d4ad34bef8340af9d082
https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git)
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoHID: core: fix dmesg flooding if report field larger than 32bit
Joshua Clayton [Wed, 30 Oct 2019 07:16:52 +0000 (15:16 +0800)]
HID: core: fix dmesg flooding if report field larger than 32bit

BugLink: https://bugs.launchpad.net/bugs/1850600
Only warn once of oversize hid report value field

On HP spectre x360 convertible the message:
hid-sensor-hub 001F:8087:0AC2.0002: hid_field_extract() called with n (192) > 32! (kworker/1:2)
is continually printed many times per second, crowding out all else.
Protect dmesg by printing the warning only one time.

The size of the hid report field data structure should probably be increased.
The data structure is treated as a u32 in Linux, but an unlimited number
of bits in the USB hid spec, so there is some rearchitecture needed now that
devices are sending more than 32 bits.

Signed-off-by: Joshua Clayton <stillcompiling@gmail.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
(cherry picked from commit 0af10eed9b7308187c7865024248b2a2a5aa382a)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoHID: core: Add printk_once variants to hid_warn() etc
Joshua Clayton [Wed, 30 Oct 2019 07:16:51 +0000 (15:16 +0800)]
HID: core: Add printk_once variants to hid_warn() etc

BugLink: https://bugs.launchpad.net/bugs/1850600
hid_warn_once() is needed. Add the others as part of the block.

Signed-off-by: Joshua Clayton <stillcompiling@gmail.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
(cherry picked from commit aaeabb121a6271775cef35ec3197e9ef124fdb2a)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoHID: core: reformat and reduce hid_printk macros
Joshua Clayton [Wed, 30 Oct 2019 07:16:50 +0000 (15:16 +0800)]
HID: core: reformat and reduce hid_printk macros

BugLink: https://bugs.launchpad.net/bugs/1850600
Reformat hid_printk macros to use standard __VA_ARGS__ syntax.
Per Joe Perches hid_printk(), hid_emerg(), hid_crit(), and hid_alert() are
unlikely ever to be used. Remove them.

Signed-off-by: Joshua Clayton <stillcompiling@gmail.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
(cherry picked from commit 337c22ab1d4f316f37e7a7c78bdea3d768270542)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoUBUNTU: SAUCE: ovl: Restore vm_file value when lower fs mmap fails
Seth Forshee [Thu, 7 Nov 2019 16:08:25 +0000 (10:08 -0600)]
UBUNTU: SAUCE: ovl: Restore vm_file value when lower fs mmap fails

BugLink: https://bugs.launchpad.net/bugs/1850994
ovl_mmap() overwrites vma->vm_file before calling the lower
filesystem mmap but does not restore the original value on
failure. This means it is giving a pointer to the lower fs file
back to the caller with no reference, which is a bad practice.
However, it does not lead to any issues with upstream kernels as
no caller accesses vma->vm_file after call_mmap().

With the aufs patches applied the story is different. Whereas
mmap_region() previously fput a local variable containing the
file it assigned to vm_file, it now calls vma_fput() which will
fput vm_file, for which it has no reference, and the reference
for the original vm_file is not put.

Fix this by restoring vma->vm_file to the original value when the
mmap call into the lower fs fails.

CVE-2019-15794

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoUBUNTU: SAUCE: shiftfs: Restore vm_file value when lower fs mmap fails
Seth Forshee [Thu, 7 Nov 2019 16:08:24 +0000 (10:08 -0600)]
UBUNTU: SAUCE: shiftfs: Restore vm_file value when lower fs mmap fails

BugLink: https://bugs.launchpad.net/bugs/1850994
shiftfs_mmap() overwrites vma->vm_file before calling the lower
filesystem mmap but does not restore the original value on
failure. This means it is giving a pointer to the lower fs file
back to the caller with no reference, which is a bad practice.
However, it does not lead to any issues with upstream kernels as
no caller accesses vma->vm_file after call_mmap().

With the aufs patches applied the story is different. Whereas
mmap_region() previously fput a local variable containing the
file it assigned to vm_file, it now calls vma_fput() which will
fput vm_file, for which it has no reference, and the reference
for the original vm_file is not put.

Fix this by restoring vma->vm_file to the original value when the
mmap call into the lower fs fails.

CVE-2019-15794

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoUBUNTU: SAUCE: fs: Move SB_I_NOSUID to the top of s_iflags
Seth Forshee [Thu, 7 Nov 2019 17:21:53 +0000 (11:21 -0600)]
UBUNTU: SAUCE: fs: Move SB_I_NOSUID to the top of s_iflags

BugLink: https://bugs.launchpad.net/bugs/1851677
SB_I_NOSUID was added by a sauce patch, and over time it has come
to occpy the same bit in s_iflags as SB_I_USERNS_VISIBLE without
being noticed. overlayfs will set SB_I_NOSUID when any lower
mount is nosuid. When this happens for a user namespace mount,
mount_too_revealing() will perform additional, unnecessary checks
which may block mounting when it should be allowed.

Move SB_I_NOSUID to prevent this conflict, and move it to the top
of s_iflags to make future conflicts less likely.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Sultan Alsawaf <sultan.alsawaf@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoUBUNTU: SAUCE: (efi-lockdown) Really don't allow lifting lockdown from userspace
Seth Forshee [Tue, 5 Nov 2019 20:35:05 +0000 (14:35 -0600)]
UBUNTU: SAUCE: (efi-lockdown) Really don't allow lifting lockdown from userspace

BugLink: https://bugs.launchpad.net/bugs/1851380
"UBUNTU: SAUCE: (efi-lockdown) Add a SysRq option to lift kernel
lockdown" adds a sysrq key to lift kernel lockdown, which is
meant to only allow a physically present user to lift lockdown
using a keyboard. However, the code has a bug which also allows
root to lift lockdown through /proc/sysrq-trigger. Fix this bug
to make this work as intended.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoUBUNTU: [Debian]: do not skip tests for linux-hwe-edge
Thadeu Lima de Souza Cascardo [Fri, 25 Oct 2019 12:14:00 +0000 (14:14 +0200)]
UBUNTU: [Debian]: do not skip tests for linux-hwe-edge

Ignore: yes

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Acked-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoUBUNTU: upstream stable to v5.3.8
Connor Kuehl [Tue, 29 Oct 2019 15:26:28 +0000 (08:26 -0700)]
UBUNTU: upstream stable to v5.3.8

BugLink: https://bugs.launchpad.net/bugs/1850456
Ignore: yes
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoLinux 5.3.8
Greg Kroah-Hartman [Tue, 29 Oct 2019 08:22:48 +0000 (09:22 +0100)]
Linux 5.3.8

BugLink: https://bugs.launchpad.net/bugs/1850456
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoRDMA/cxgb4: Do not dma memory off of the stack
Greg KH [Tue, 1 Oct 2019 16:56:11 +0000 (18:56 +0200)]
RDMA/cxgb4: Do not dma memory off of the stack

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 3840c5b78803b2b6cc1ff820100a74a092c40cbb upstream.

Nicolas pointed out that the cxgb4 driver is doing dma off of the stack,
which is generally considered a very bad thing.  On some architectures it
could be a security problem, but odds are none of them actually run this
driver, so it's just a "normal" bug.

Resolve this by allocating the memory for a message off of the heap
instead of the stack.  kmalloc() always will give us a proper memory
location that DMA will work correctly from.

Link: https://lore.kernel.org/r/20191001165611.GA3542072@kroah.com
Reported-by: Nicolas Waisman <nico@semmle.com>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoblk-rq-qos: fix first node deletion of rq_qos_del()
Tejun Heo [Tue, 15 Oct 2019 15:49:27 +0000 (08:49 -0700)]
blk-rq-qos: fix first node deletion of rq_qos_del()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 307f4065b9d7c1e887e8bdfb2487e4638559fea1 upstream.

rq_qos_del() incorrectly assigns the node being deleted to the head if
it was the first on the list in the !prev path.  Fix it by iterating
with ** instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Fixes: a79050434b45 ("blk-rq-qos: refactor out common elements of blk-wbt")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoof: reserved_mem: add missing of_node_put() for proper ref-counting
Chris Goldsworthy [Sun, 20 Oct 2019 01:57:24 +0000 (18:57 -0700)]
of: reserved_mem: add missing of_node_put() for proper ref-counting

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 5dba51754b04a941a1064f584e7a7f607df3f9bc upstream.

Commit d698a388146c ("of: reserved-memory: ignore disabled memory-region
nodes") added an early return in of_reserved_mem_device_init_by_idx(), but
didn't call of_node_put() on a device_node whose ref-count was incremented
in the call to of_parse_phandle() preceding the early exit.

Fixes: d698a388146c ("of: reserved-memory: ignore disabled memory-region nodes")
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: stable@vger.kernel.org
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoopp: of: drop incorrect lockdep_assert_held()
Viresh Kumar [Thu, 10 Oct 2019 10:25:33 +0000 (15:55 +0530)]
opp: of: drop incorrect lockdep_assert_held()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit f2edbb6699b0bc6e4f789846b99007200546c6c2 upstream.

_find_opp_of_np() doesn't traverse the list of OPP tables but instead
just the entries within an OPP table and so only requires to lock the
OPP table itself.

The lockdep_assert_held() was added there by mistake and isn't really
required.

Fixes: 5d6d106fa455 ("OPP: Populate required opp tables from "required-opps" property")
Cc: v5.0+ <stable@vger.kernel.org> # v5.0+
Reported-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoPCI: PM: Fix pci_power_up()
Rafael J. Wysocki [Mon, 14 Oct 2019 11:25:00 +0000 (13:25 +0200)]
PCI: PM: Fix pci_power_up()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 45144d42f299455911cc29366656c7324a3a7c97 upstream.

There is an arbitrary difference between the system resume and
runtime resume code paths for PCI devices regarding the delay to
apply when switching the devices from D3cold to D0.

Namely, pci_restore_standard_config() used in the runtime resume
code path calls pci_set_power_state() which in turn invokes
__pci_start_power_transition() to power up the device through the
platform firmware and that function applies the transition delay
(as per PCI Express Base Specification Revision 2.0, Section 6.6.1).
However, pci_pm_default_resume_early() used in the system resume
code path calls pci_power_up() which doesn't apply the delay at
all and that causes issues to occur during resume from
suspend-to-idle on some systems where the delay is required.

Since there is no reason for that difference to exist, modify
pci_power_up() to follow pci_set_power_state() more closely and
invoke __pci_start_power_transition() from there to call the
platform firmware to power up the device (in case that's necessary).

Fixes: db288c9c5f9d ("PCI / PM: restore the original behavior of pci_set_power_state()")
Reported-by: Daniel Drake <drake@endlessm.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Link: https://lore.kernel.org/linux-pm/CAD8Lp44TYxrMgPLkHCqF9hv6smEurMXvmmvmtyFhZ6Q4SE+dig@mail.gmail.com/T/#m21be74af263c6a34f36e0fc5c77c5449d9406925
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoxen/netback: fix error path of xenvif_connect_data()
Juergen Gross [Fri, 18 Oct 2019 07:45:49 +0000 (09:45 +0200)]
xen/netback: fix error path of xenvif_connect_data()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 3d5c1a037d37392a6859afbde49be5ba6a70a6b3 upstream.

xenvif_connect_data() calls module_put() in case of error. This is
wrong as there is no related module_get().

Remove the superfluous module_put().

Fixes: 279f438e36c0a7 ("xen-netback: Don't destroy the netdev until the vif is shut down")
Cc: <stable@vger.kernel.org> # 3.12
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoceph: just skip unrecognized info in ceph_reply_info_extra
Jeff Layton [Thu, 26 Sep 2019 20:05:11 +0000 (16:05 -0400)]
ceph: just skip unrecognized info in ceph_reply_info_extra

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 1d3f87233e26362fc3d4e59f0f31a71b570f90b9 upstream.

In the future, we're going to want to extend the ceph_reply_info_extra
for create replies. Currently though, the kernel code doesn't accept an
extra blob that is larger than the expected data.

Change the code to skip over any unrecognized fields at the end of the
extra blob, rather than returning -EIO.

Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agocpufreq: Avoid cpufreq_suspend() deadlock on system shutdown
Rafael J. Wysocki [Tue, 8 Oct 2019 23:29:10 +0000 (01:29 +0200)]
cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 65650b35133ff20f0c9ef0abd5c3c66dbce3ae57 upstream.

It is incorrect to set the cpufreq syscore shutdown callback pointer
to cpufreq_suspend(), because that function cannot be run in the
syscore stage of system shutdown for two reasons: (a) it may attempt
to carry out actions depending on devices that have already been shut
down at that point and (b) the RCU synchronization carried out by it
may not be able to make progress then.

The latter issue has been present since commit 45975c7d21a1 ("rcu:
Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds"),
but the former one has been there since commit 90de2a4aa9f3 ("cpufreq:
suspend cpufreq governors on shutdown") regardless.

Fix that by dropping cpufreq_syscore_ops altogether and making
device_shutdown() call cpufreq_suspend() directly before shutting
down devices, which is along the lines of what system-wide power
management does.

Fixes: 45975c7d21a1 ("rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds")
Fixes: 90de2a4aa9f3 ("cpufreq: suspend cpufreq governors on shutdown")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.0+ <stable@vger.kernel.org> # 4.0+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomemstick: jmb38x_ms: Fix an error handling path in 'jmb38x_ms_probe()'
Christophe JAILLET [Sat, 5 Oct 2019 11:21:01 +0000 (13:21 +0200)]
memstick: jmb38x_ms: Fix an error handling path in 'jmb38x_ms_probe()'

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 28c9fac09ab0147158db0baeec630407a5e9b892 upstream.

If 'jmb38x_ms_count_slots()' returns 0, we must undo the previous
'pci_request_regions()' call.

Goto 'err_out_int' to fix it.

Fixes: 60fdd931d577 ("memstick: add support for JMicron jmb38x MemoryStick host controller")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoKVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use
Greg Kurz [Fri, 27 Sep 2019 11:53:43 +0000 (13:53 +0200)]
KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 12ade69c1eb9958b13374edf5ef742ea20ccffde upstream.

Connecting a vCPU to a XIVE KVM device means establishing a 1:1
association between a vCPU id and the offset (VP id) of a VP
structure within a fixed size block of VPs. We currently try to
enforce the 1:1 relationship by checking that a vCPU with the
same id isn't already connected. This is good but unfortunately
not enough because we don't map VP ids to raw vCPU ids but to
packed vCPU ids, and the packing function kvmppc_pack_vcpu_id()
isn't bijective by design. We got away with it because QEMU passes
vCPU ids that fit well in the packing pattern. But nothing prevents
userspace to come up with a forged vCPU id resulting in a packed id
collision which causes the KVM device to associate two vCPUs to the
same VP. This greatly confuses the irq layer and ultimately crashes
the kernel, as shown below.

Example: a guest with 1 guest thread per core, a core stride of
8 and 300 vCPUs has vCPU ids 0,8,16...2392. If QEMU is patched to
inject at some point an invalid vCPU id 348, which is the packed
version of itself and 2392, we get:

genirq: Flags mismatch irq 199. 00010000 (kvm-2-2392) vs. 00010000 (kvm-2-348)
CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38
Call Trace:
[c000003f7f9937e0] [c000000000c0110c] dump_stack+0xb0/0xf4 (unreliable)
[c000003f7f993820] [c0000000001cb480] __setup_irq+0xa70/0xad0
[c000003f7f9938d0] [c0000000001cb75c] request_threaded_irq+0x13c/0x260
[c000003f7f993940] [c00800000d44e7ac] kvmppc_xive_attach_escalation+0x104/0x270 [kvm]
[c000003f7f9939d0] [c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm]
[c000003f7f993ac0] [c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm]
[c000003f7f993b90] [c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm]
[c000003f7f993d00] [c0000000004840f0] do_vfs_ioctl+0xe0/0xc30
[c000003f7f993db0] [c000000000484d44] ksys_ioctl+0x104/0x120
[c000003f7f993e00] [c000000000484d88] sys_ioctl+0x28/0x80
[c000003f7f993e20] [c00000000000b278] system_call+0x5c/0x68
xive-kvm: Failed to request escalation interrupt for queue 0 of VCPU 2392
------------[ cut here ]------------
remove_proc_entry: removing non-empty directory 'irq/199', leaking at least 'kvm-2-348'
WARNING: CPU: 24 PID: 88176 at /home/greg/Work/linux/kernel-kvm-ppc/fs/proc/generic.c:684 remove_proc_entry+0x1ec/0x200
Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm]
CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38
NIP:  c00000000053b0cc LR: c00000000053b0c8 CTR: c0000000000ba3b0
REGS: c000003f7f9934b0 TRAP: 0700   Not tainted  (5.3.0-xive-nr-servers-5.3-gku+)
MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 48228222  XER: 20040000
CFAR: c000000000131a50 IRQMASK: 0
GPR00: c00000000053b0c8 c000003f7f993740 c0000000015ec500 0000000000000057
GPR04: 0000000000000001 0000000000000000 000049fb98484262 0000000000001bcf
GPR08: 0000000000000007 0000000000000007 0000000000000001 9000000000001033
GPR12: 0000000000008000 c000003ffffeb800 0000000000000000 000000012f4ce5a1
GPR16: 000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000
GPR20: 000000012f45d918 c000003f863758b0 c000003f86375870 0000000000000006
GPR24: c000003f86375a30 0000000000000007 c0002039373d9020 c0000000014c4a48
GPR28: 0000000000000001 c000003fe62a4f6b c00020394b2e9fab c000003fe62a4ec0
NIP [c00000000053b0cc] remove_proc_entry+0x1ec/0x200
LR [c00000000053b0c8] remove_proc_entry+0x1e8/0x200
Call Trace:
[c000003f7f993740] [c00000000053b0c8] remove_proc_entry+0x1e8/0x200 (unreliable)
[c000003f7f9937e0] [c0000000001d3654] unregister_irq_proc+0x114/0x150
[c000003f7f993880] [c0000000001c6284] free_desc+0x54/0xb0
[c000003f7f9938c0] [c0000000001c65ec] irq_free_descs+0xac/0x100
[c000003f7f993910] [c0000000001d1ff8] irq_dispose_mapping+0x68/0x80
[c000003f7f993940] [c00800000d44e8a4] kvmppc_xive_attach_escalation+0x1fc/0x270 [kvm]
[c000003f7f9939d0] [c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm]
[c000003f7f993ac0] [c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm]
[c000003f7f993b90] [c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm]
[c000003f7f993d00] [c0000000004840f0] do_vfs_ioctl+0xe0/0xc30
[c000003f7f993db0] [c000000000484d44] ksys_ioctl+0x104/0x120
[c000003f7f993e00] [c000000000484d88] sys_ioctl+0x28/0x80
[c000003f7f993e20] [c00000000000b278] system_call+0x5c/0x68
Instruction dump:
2c230000 41820008 3923ff78 e8e900a0 3c82ff69 3c62ff8d 7fa6eb78 7fc5f378
3884f080 3863b948 4bbf6925 60000000 <0fe000004bffff7c fba10088 4bbf6e41
---[ end trace b925b67a74a1d8d1 ]---
BUG: Kernel NULL pointer dereference at 0x00000010
Faulting instruction address: 0xc00800000d44fc04
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm]
CPU: 24 PID: 88176 Comm: qemu-system-ppc Tainted: G        W         5.3.0-xive-nr-servers-5.3-gku+ #38
NIP:  c00800000d44fc04 LR: c00800000d44fc00 CTR: c0000000001cd970
REGS: c000003f7f9938e0 TRAP: 0300   Tainted: G        W          (5.3.0-xive-nr-servers-5.3-gku+)
MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24228882  XER: 20040000
CFAR: c0000000001cd9ac DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0
GPR00: c00800000d44fc00 c000003f7f993b70 c00800000d468300 0000000000000000
GPR04: 00000000000000c7 0000000000000000 0000000000000000 c000003ffacd06d8
GPR08: 0000000000000000 c000003ffacd0738 0000000000000000 fffffffffffffffd
GPR12: 0000000000000040 c000003ffffeb800 0000000000000000 000000012f4ce5a1
GPR16: 000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000
GPR20: 000000012f45d918 00007ffffe0d9a80 000000012f4f5df0 000000012ef8c9f8
GPR24: 0000000000000001 0000000000000000 c000003fe4501ed0 c000003f8b1d0000
GPR28: c0000033314689c0 c000003fe4501c00 c000003fe4501e70 c000003fe4501e90
NIP [c00800000d44fc04] kvmppc_xive_cleanup_vcpu+0xfc/0x210 [kvm]
LR [c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm]
Call Trace:
[c000003f7f993b70] [c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm] (unreliable)
[c000003f7f993bd0] [c00800000d450bd4] kvmppc_xive_release+0xdc/0x1b0 [kvm]
[c000003f7f993c30] [c00800000d436a98] kvm_device_release+0xb0/0x110 [kvm]
[c000003f7f993c70] [c00000000046730c] __fput+0xec/0x320
[c000003f7f993cd0] [c000000000164ae0] task_work_run+0x150/0x1c0
[c000003f7f993d30] [c000000000025034] do_notify_resume+0x304/0x440
[c000003f7f993e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74
Instruction dump:
3bff0008 7fbfd040 419e0054 847e0004 2fa30000 419effec e93d0000 8929203c
2f890000 419effb8 4800821d e8410018 <e9230010e9490008 9b2a0039 7c0004ac
---[ end trace b925b67a74a1d8d2 ]---

Kernel panic - not syncing: Fatal exception

This affects both XIVE and XICS-on-XIVE devices since the beginning.

Check the VP id instead of the vCPU id when a new vCPU is connected.
The allocation of the XIVE CPU structure in kvmppc_xive_connect_vcpu()
is moved after the check to avoid the need for rollback.

Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agobtrfs: tracepoints: Fix bad entry members of qgroup events
Qu Wenruo [Thu, 17 Oct 2019 02:38:37 +0000 (10:38 +0800)]
btrfs: tracepoints: Fix bad entry members of qgroup events

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 1b2442b4ae0f234daeadd90e153b466332c466d8 upstream.

[BUG]
For btrfs:qgroup_meta_reserve event, the trace event can output garbage:

  qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=DATA diff=2
  qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=0x258792 diff=2

The @type can be completely garbage, as DATA type is not possible for
trace_qgroup_meta_reserve() trace event.

[CAUSE]
Ther are several problems related to qgroup trace events:
- Unassigned entry member
  Member entry::type of trace_qgroup_update_reserve() and
  trace_qgourp_meta_reserve() is not assigned

- Redundant entry member
  Member entry::type is completely useless in
  trace_qgroup_meta_convert()

Fixes: 4ee0d8832c2e ("btrfs: qgroup: Update trace events for metadata reservation")
CC: stable@vger.kernel.org # 4.10+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agobtrfs: tracepoints: Fix wrong parameter order for qgroup events
Qu Wenruo [Thu, 17 Oct 2019 02:38:36 +0000 (10:38 +0800)]
btrfs: tracepoints: Fix wrong parameter order for qgroup events

BugLink: https://bugs.launchpad.net/bugs/1850456
commit fd2b007eaec898564e269d1f478a2da0380ecf51 upstream.

[BUG]
For btrfs:qgroup_meta_reserve event, the trace event can output garbage:

  qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=DATA diff=2

The diff should always be alinged to sector size (4k), so there is
definitely something wrong.

[CAUSE]
For the wrong @diff, it's caused by wrong parameter order.
The correct parameters are:

  struct btrfs_root, s64 diff, int type.

However the parameters used are:

  struct btrfs_root, int type, s64 diff.

Fixes: 4ee0d8832c2e ("btrfs: qgroup: Update trace events for metadata reservation")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoBtrfs: check for the full sync flag while holding the inode lock during fsync
Filipe Manana [Wed, 16 Oct 2019 15:28:52 +0000 (16:28 +0100)]
Btrfs: check for the full sync flag while holding the inode lock during fsync

BugLink: https://bugs.launchpad.net/bugs/1850456
commit ba0b084ac309283db6e329785c1dc4f45fdbd379 upstream.

We were checking for the full fsync flag in the inode before locking the
inode, which is racy, since at that that time it might not be set but
after we acquire the inode lock some other task set it. One case where
this can happen is on a system low on memory and some concurrent task
failed to allocate an extent map and therefore set the full sync flag on
the inode, to force the next fsync to work in full mode.

A consequence of missing the full fsync flag set is hitting the problems
fixed by commit 0c713cbab620 ("Btrfs: fix race between ranged fsync and
writeback of adjacent ranges"), BUG_ON() when dropping extents from a log
tree, hitting assertion failures at tree-log.c:copy_items() or all sorts
of weird inconsistencies after replaying a log due to file extents items
representing ranges that overlap.

So just move the check such that it's done after locking the inode and
before starting writeback again.

Fixes: 0c713cbab620 ("Btrfs: fix race between ranged fsync and writeback of adjacent ranges")
CC: stable@vger.kernel.org # 5.2+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoBtrfs: fix qgroup double free after failure to reserve metadata for delalloc
Filipe Manana [Tue, 15 Oct 2019 09:54:39 +0000 (10:54 +0100)]
Btrfs: fix qgroup double free after failure to reserve metadata for delalloc

BugLink: https://bugs.launchpad.net/bugs/1850456
commit c7967fc1499beb9b70bb9d33525fb0b384af8883 upstream.

If we fail to reserve metadata for delalloc operations we end up releasing
the previously reserved qgroup amount twice, once explicitly under the
'out_qgroup' label by calling btrfs_qgroup_free_meta_prealloc() and once
again, under label 'out_fail', by calling btrfs_inode_rsv_release() with a
value of 'true' for its 'qgroup_free' argument, which results in
btrfs_qgroup_free_meta_prealloc() being called again, so we end up having
a double free.

Also if we fail to reserve the necessary qgroup amount, we jump to the
label 'out_fail', which calls btrfs_inode_rsv_release() and that in turns
calls btrfs_qgroup_free_meta_prealloc(), even though we weren't able to
reserve any qgroup amount. So we freed some amount we never reserved.

So fix this by removing the call to btrfs_inode_rsv_release() in the
failure path, since it's not necessary at all as we haven't changed the
inode's block reserve in any way at this point.

Fixes: c8eaeac7b73434 ("btrfs: reserve delalloc metadata differently")
CC: stable@vger.kernel.org # 5.2+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agobtrfs: don't needlessly create extent-refs kernel thread
David Sterba [Sat, 12 Oct 2019 16:42:10 +0000 (18:42 +0200)]
btrfs: don't needlessly create extent-refs kernel thread

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 80ed4548d0711d15ca51be5dee0ff813051cfc90 upstream.

The patch 32b593bfcb58 ("Btrfs: remove no longer used function to run
delayed refs asynchronously") removed the async delayed refs but the
thread has been created, without any use. Remove it to avoid resource
consumption.

Fixes: 32b593bfcb58 ("Btrfs: remove no longer used function to run delayed refs asynchronously")
CC: stable@vger.kernel.org # 5.2+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoBtrfs: add missing extents release on file extent cluster relocation error
Filipe Manana [Wed, 9 Oct 2019 16:43:45 +0000 (17:43 +0100)]
Btrfs: add missing extents release on file extent cluster relocation error

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 44db1216efe37bf670f8d1019cdc41658d84baf5 upstream.

If we error out when finding a page at relocate_file_extent_cluster(), we
need to release the outstanding extents counter on the relocation inode,
set by the previous call to btrfs_delalloc_reserve_metadata(), otherwise
the inode's block reserve size can never decrease to zero and metadata
space is leaked. Therefore add a call to btrfs_delalloc_release_extents()
in case we can't find the target page.

Fixes: 8b62f87bad9c ("Btrfs: rework outstanding_extents")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agobtrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()
Qu Wenruo [Thu, 10 Oct 2019 02:39:26 +0000 (10:39 +0800)]
btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 4b654acdae850f48b8250b9a578a4eaa518c7a6f upstream.

In btrfs_read_block_groups(), if we have an invalid block group which
has mixed type (DATA|METADATA) while the fs doesn't have MIXED_GROUPS
feature, we error out without freeing the block group cache.

This patch will add the missing btrfs_put_block_group() to prevent
memory leak.

Note for stable backports: the file to patch in versions <= 5.3 is
fs/btrfs/extent-tree.c

Fixes: 49303381f19a ("Btrfs: bail out if block group has different mixed flag")
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agopinctrl: armada-37xx: swap polarity on LED group
Patrick Williams [Tue, 1 Oct 2019 15:51:38 +0000 (10:51 -0500)]
pinctrl: armada-37xx: swap polarity on LED group

BugLink: https://bugs.launchpad.net/bugs/1850456
commit b835d6953009dc350d61402a854b5a7178d8c615 upstream.

The configuration registers for the LED group have inverted
polarity, which puts the GPIO into open-drain state when used in
GPIO mode.  Switch to '0' for GPIO and '1' for LED modes.

Fixes: 87466ccd9401 ("pinctrl: armada-37xx: Add pin controller support for Armada 37xx")
Signed-off-by: Patrick Williams <alpawi@amazon.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191001155154.99710-1-alpawi@amazon.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agopinctrl: armada-37xx: fix control of pins 32 and up
Patrick Williams [Tue, 1 Oct 2019 15:46:31 +0000 (10:46 -0500)]
pinctrl: armada-37xx: fix control of pins 32 and up

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 20504fa1d2ffd5d03cdd9dc9c9dd4ed4579b97ef upstream.

The 37xx configuration registers are only 32 bits long, so
pins 32-35 spill over into the next register.  The calculation
for the register address was done, but the bitmask was not, so
any configuration to pin 32 or above resulted in a bitmask that
overflowed and performed no action.

Fix the register / offset calculation to also adjust the offset.

Fixes: 5715092a458c ("pinctrl: armada-37xx: Add gpio support")
Signed-off-by: Patrick Williams <alpawi@amazon.com>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191001154634.96165-1-alpawi@amazon.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agopinctrl: cherryview: restore Strago DMI workaround for all versions
Dmitry Torokhov [Tue, 24 Sep 2019 02:49:58 +0000 (19:49 -0700)]
pinctrl: cherryview: restore Strago DMI workaround for all versions

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 260996c30f4f3a732f45045e3e0efe27017615e4 upstream.

This is essentially a revert of:

e3f72b749da2 pinctrl: cherryview: fix Strago DMI workaround
86c5dd6860a6 pinctrl: cherryview: limit Strago DMI workarounds to version 1.0

because even with 1.1 versions of BIOS there are some pins that are
configured as interrupts but not claimed by any driver, and they
sometimes fire up and result in interrupt storms that cause touchpad
stop functioning and other issues.

Given that we are unlikely to qualify another firmware version for a
while it is better to keep the workaround active on all Strago boards.

Reported-by: Alex Levin <levinale@chromium.org>
Fixes: 86c5dd6860a6 ("pinctrl: cherryview: limit Strago DMI workarounds to version 1.0")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Alex Levin <levinale@chromium.org>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agox86/hyperv: Make vapic support x2apic mode
Roman Kagan [Thu, 10 Oct 2019 12:33:05 +0000 (12:33 +0000)]
x86/hyperv: Make vapic support x2apic mode

BugLink: https://bugs.launchpad.net/bugs/1850456
commit e211288b72f15259da86eed6eca680758dbe9e74 upstream.

Now that there's Hyper-V IOMMU driver, Linux can switch to x2apic mode
when supported by the vcpus.

However, the apic access functions for Hyper-V enlightened apic assume
xapic mode only.

As a result, Linux fails to bring up secondary cpus when run as a guest
in QEMU/KVM with both hv_apic and x2apic enabled.

According to Michael Kelley, when in x2apic mode, the Hyper-V synthetic
apic MSRs behave exactly the same as the corresponding architectural
x2apic MSRs, so there's no need to override the apic accessors.  The
only exception is hv_apic_eoi_write, which benefits from lazy EOI when
available; however, its implementation works for both xapic and x2apic
modes.

Fixes: 29217a474683 ("iommu/hyper-v: Add Hyper-V stub IOMMU driver")
Fixes: 6b48cb5f8347 ("X86/Hyper-V: Enlighten APIC access")
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191010123258.16919-1-rkagan@virtuozzo.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agox86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu
Sean Christopherson [Tue, 1 Oct 2019 20:50:19 +0000 (13:50 -0700)]
x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 7a22e03b0c02988e91003c505b34d752a51de344 upstream.

Check that the per-cpu cluster mask pointer has been set prior to
clearing a dying cpu's bit.  The per-cpu pointer is not set until the
target cpu reaches smp_callin() during CPUHP_BRINGUP_CPU, whereas the
teardown function, x2apic_dead_cpu(), is associated with the earlier
CPUHP_X2APIC_PREPARE.  If an error occurs before the cpu is awakened,
e.g. if do_boot_cpu() itself fails, x2apic_dead_cpu() will dereference
the NULL pointer and cause a panic.

  smpboot: do_boot_cpu failed(-22) to wakeup CPU#1
  BUG: kernel NULL pointer dereference, address: 0000000000000008
  RIP: 0010:x2apic_dead_cpu+0x1a/0x30
  Call Trace:
   cpuhp_invoke_callback+0x9a/0x580
   _cpu_up+0x10d/0x140
   do_cpu_up+0x69/0xb0
   smp_init+0x63/0xa9
   kernel_init_freeable+0xd7/0x229
   ? rest_init+0xa0/0xa0
   kernel_init+0xa/0x100
   ret_from_fork+0x35/0x40

Fixes: 023a611748fd5 ("x86/apic/x2apic: Simplify cluster management")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191001205019.5789-1-sean.j.christopherson@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agox86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area
Steve Wahl [Tue, 24 Sep 2019 21:03:55 +0000 (16:03 -0500)]
x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 2aa85f246c181b1fa89f27e8e20c5636426be624 upstream.

Our hardware (UV aka Superdome Flex) has address ranges marked
reserved by the BIOS. Access to these ranges is caught as an error,
causing the BIOS to halt the system.

Initial page tables mapped a large range of physical addresses that
were not checked against the list of BIOS reserved addresses, and
sometimes included reserved addresses in part of the mapped range.
Including the reserved range in the map allowed processor speculative
accesses to the reserved range, triggering a BIOS halt.

Used early in booting, the page table level2_kernel_pgt addresses 1
GiB divided into 2 MiB pages, and it was set up to linearly map a full
 1 GiB of physical addresses that included the physical address range
of the kernel image, as chosen by KASLR.  But this also included a
large range of unused addresses on either side of the kernel image.
And unlike the kernel image's physical address range, this extra
mapped space was not checked against the BIOS tables of usable RAM
addresses.  So there were times when the addresses chosen by KASLR
would result in processor accessible mappings of BIOS reserved
physical addresses.

The kernel code did not directly access any of this extra mapped
space, but having it mapped allowed the processor to issue speculative
accesses into reserved memory, causing system halts.

This was encountered somewhat rarely on a normal system boot, and much
more often when starting the crash kernel if "crashkernel=512M,high"
was specified on the command line (this heavily restricts the physical
address of the crash kernel, in our case usually within 1 GiB of
reserved space).

The solution is to invalidate the pages of this table outside the kernel
image's space before the page table is activated. It fixes this problem
on our hardware.

 [ bp: Touchups. ]

Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: dimitri.sivanich@hpe.com
Cc: Feng Tang <feng.tang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: mike.travis@hpe.com
Cc: russ.anderson@hpe.com
Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Link: https://lkml.kernel.org/r/9c011ee51b081534a7a15065b1681d200298b530.1569358539.git.steve.wahl@hpe.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoirqchip/sifive-plic: Switch to fasteoi flow
Marc Zyngier [Sun, 15 Sep 2019 14:17:45 +0000 (15:17 +0100)]
irqchip/sifive-plic: Switch to fasteoi flow

BugLink: https://bugs.launchpad.net/bugs/1850456
commit bb0fed1c60cccbe4063b455a7228818395dac86e upstream.

The SiFive PLIC interrupt controller seems to have all the HW
features to support the fasteoi flow, but the driver seems to be
stuck in a distant past. Bring it into the 21st century.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Palmer Dabbelt <palmer@sifive.com> (QEMU Boot)
Tested-by: Darius Rad <darius@bluespec.com> (on 2 HW PLIC implementations)
Tested-by: Paul Walmsley <paul.walmsley@sifive.com> (HiFive Unleashed)
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/8636gxskmj.wl-maz@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agodm cache: fix bugs when a GFP_NOWAIT allocation fails
Mikulas Patocka [Wed, 16 Oct 2019 13:21:50 +0000 (09:21 -0400)]
dm cache: fix bugs when a GFP_NOWAIT allocation fails

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 13bd677a472d534bf100bab2713efc3f9e3f5978 upstream.

GFP_NOWAIT allocation can fail anytime - it doesn't wait for memory being
available and it fails if the mempool is exhausted and there is not enough
memory.

If we go down this path:
  map_bio -> mg_start -> alloc_migration -> mempool_alloc(GFP_NOWAIT)
we can see that map_bio() doesn't check the return value of mg_start(),
and the bio is leaked.

If we go down this path:
  map_bio -> mg_start -> mg_lock_writes -> alloc_prison_cell ->
  dm_bio_prison_alloc_cell_v2 -> mempool_alloc(GFP_NOWAIT) ->
  mg_lock_writes -> mg_complete
the bio is ended with an error - it is unacceptable because it could
cause filesystem corruption if the machine ran out of memory
temporarily.

Change GFP_NOWAIT to GFP_NOIO, so that the mempool code will properly
wait until memory becomes available. mempool_alloc with GFP_NOIO can't
fail, so remove the code paths that deal with allocation failure.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agofs/dax: Fix pmd vs pte conflict detection
Dan Williams [Mon, 21 Oct 2019 16:29:20 +0000 (09:29 -0700)]
fs/dax: Fix pmd vs pte conflict detection

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 6370740e5f8ef12de7f9a9bf48a0393d202cd827 upstream.

Users reported a v5.3 performance regression and inability to establish
huge page mappings. A revised version of the ndctl "dax.sh" huge page
unit test identifies commit 23c84eb78375 "dax: Fix missed wakeup with
PMD faults" as the source.

Update get_unlocked_entry() to check for NULL entries before checking
the entry order, otherwise NULL is misinterpreted as a present pte
conflict. The 'order' check needs to happen before the locked check as
an unlocked entry at the wrong order must fallback to lookup the correct
order.

Reported-by: Jeff Smits <jeff.smits@intel.com>
Reported-by: Doug Nelson <doug.nelson@intel.com>
Cc: <stable@vger.kernel.org>
Fixes: 23c84eb78375 ("dax: Fix missed wakeup with PMD faults")
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Link: https://lore.kernel.org/r/157167532455.3945484.11971474077040503994.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agotracing: Fix race in perf_trace_buf initialization
Prateek Sood [Tue, 15 Oct 2019 06:17:25 +0000 (11:47 +0530)]
tracing: Fix race in perf_trace_buf initialization

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 6b1340cc00edeadd52ebd8a45171f38c8de2a387 upstream.

A race condition exists while initialiazing perf_trace_buf from
perf_trace_init() and perf_kprobe_init().

      CPU0                                        CPU1
perf_trace_init()
  mutex_lock(&event_mutex)
    perf_trace_event_init()
      perf_trace_event_reg()
        total_ref_count == 0
buf = alloc_percpu()
        perf_trace_buf[i] = buf
        tp_event->class->reg() //fails       perf_kprobe_init()
goto fail                              perf_trace_event_init()
                                                 perf_trace_event_reg()
        fail:
  total_ref_count == 0

                                                   total_ref_count == 0
                                                   buf = alloc_percpu()
                                                   perf_trace_buf[i] = buf
                                                   tp_event->class->reg()
                                                   total_ref_count++

          free_percpu(perf_trace_buf[i])
          perf_trace_buf[i] = NULL

Any subsequent call to perf_trace_event_reg() will observe total_ref_count > 0,
causing the perf_trace_buf to be always NULL. This can result in perf_trace_buf
getting accessed from perf_trace_buf_alloc() without being initialized. Acquiring
event_mutex in perf_kprobe_init() before calling perf_trace_event_init() should
fix this race.

The race caused the following bug:

 Unable to handle kernel paging request at virtual address 0000003106f2003c
 Mem abort info:
   ESR = 0x96000045
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000045
   CM = 0, WnR = 1
 user pgtable: 4k pages, 39-bit VAs, pgdp = ffffffc034b9b000
 [0000003106f2003c] pgd=0000000000000000, pud=0000000000000000
 Internal error: Oops: 96000045 [#1] PREEMPT SMP
 Process syz-executor (pid: 18393, stack limit = 0xffffffc093190000)
 pstate: 80400005 (Nzcv daif +PAN -UAO)
 pc : __memset+0x20/0x1ac
 lr : memset+0x3c/0x50
 sp : ffffffc09319fc50

  __memset+0x20/0x1ac
  perf_trace_buf_alloc+0x140/0x1a0
  perf_trace_sys_enter+0x158/0x310
  syscall_trace_enter+0x348/0x7c0
  el0_svc_common+0x11c/0x368
  el0_svc_handler+0x12c/0x198
  el0_svc+0x8/0xc

Ramdumps showed the following:
  total_ref_count = 3
  perf_trace_buf = (
      0x0 -> NULL,
      0x0 -> NULL,
      0x0 -> NULL,
      0x0 -> NULL)

Link: http://lkml.kernel.org/r/1571120245-4186-1-git-send-email-prsood@codeaurora.org
Cc: stable@vger.kernel.org
Fixes: e12f03d7031a9 ("perf/core: Implement the 'perf_kprobe' PMU")
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoperf/aux: Fix AUX output stopping
Alexander Shishkin [Tue, 22 Oct 2019 07:39:40 +0000 (10:39 +0300)]
perf/aux: Fix AUX output stopping

BugLink: https://bugs.launchpad.net/bugs/1850456
commit f3a519e4add93b7b31a6616f0b09635ff2e6a159 upstream.

Commit:

  8a58ddae2379 ("perf/core: Fix exclusive events' grouping")

allows CAP_EXCLUSIVE events to be grouped with other events. Since all
of those also happen to be AUX events (which is not the case the other
way around, because arch/s390), this changes the rules for stopping the
output: the AUX event may not be on its PMU's context any more, if it's
grouped with a HW event, in which case it will be on that HW event's
context instead. If that's the case, munmap() of the AUX buffer can't
find and stop the AUX event, potentially leaving the last reference with
the atomic context, which will then end up freeing the AUX buffer. This
will then trip warnings:

Fix this by using the context's PMU context when looking for events
to stop, instead of the event's PMU context.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191022073940.61814-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoCIFS: Fix use after free of file info structures
Pavel Shilovsky [Wed, 23 Oct 2019 22:37:19 +0000 (15:37 -0700)]
CIFS: Fix use after free of file info structures

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 1a67c415965752879e2e9fad407bc44fc7f25f23 upstream.

Currently the code assumes that if a file info entry belongs
to lists of open file handles of an inode and a tcon then
it has non-zero reference. The recent changes broke that
assumption when putting the last reference of the file info.
There may be a situation when a file is being deleted but
nothing prevents another thread to reference it again
and start using it. This happens because we do not hold
the inode list lock while checking the number of references
of the file info structure. Fix this by doing the proper
locking when doing the check.

Fixes: 487317c99477d ("cifs: add spinlock for the openFileList to cifsInodeInfo")
Fixes: cb248819d209d ("cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic")
Cc: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agocifs: Fix missed free operations
Chuhong Yuan [Mon, 14 Oct 2019 07:15:31 +0000 (15:15 +0800)]
cifs: Fix missed free operations

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 783bf7b8b641167fb6f3f4f787f60ae62bad41b3 upstream.

cifs_setattr_nounix has two paths which miss free operations
for xid and fullpath.
Use goto cifs_setattr_exit like other paths to fix them.

CC: Stable <stable@vger.kernel.org>
Fixes: aa081859b10c ("cifs: flush before set-info if we have writeable handles")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoCIFS: avoid using MID 0xFFFF
Roberto Bergantinos Corpas [Mon, 14 Oct 2019 08:59:23 +0000 (10:59 +0200)]
CIFS: avoid using MID 0xFFFF

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 03d9a9fe3f3aec508e485dd3dcfa1e99933b4bdb upstream.

According to MS-CIFS specification MID 0xFFFF should not be used by the
CIFS client, but we actually do. Besides, this has proven to cause races
leading to oops between SendReceive2/cifs_demultiplex_thread. On SMB1,
MID is a 2 byte value easy to reach in CurrentMid which may conflict with
an oplock break notification request coming from server

Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoarm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
Marc Zyngier [Fri, 13 Sep 2019 09:57:50 +0000 (10:57 +0100)]
arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 603afdc9438ac546181e843f807253d75d3dbc45 upstream.

Allow the user to select the workaround for TX2-219, and update
the silicon-errata.rst file to reflect this.

Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoarm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
Marc Zyngier [Tue, 9 Apr 2019 15:26:21 +0000 (16:26 +0100)]
arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 93916beb70143c46bf1d2bacf814be3a124b253b upstream.

It appears that the only case where we need to apply the TX2_219_TVM
mitigation is when the core is in SMT mode. So let's condition the
enabling on detecting a CPU whose MPIDR_EL1.Aff0 is non-zero.

Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoarm64: Avoid Cavium TX2 erratum 219 when switching TTBR
Marc Zyngier [Tue, 9 Apr 2019 15:22:24 +0000 (16:22 +0100)]
arm64: Avoid Cavium TX2 erratum 219 when switching TTBR

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 9405447ef79bc93101373e130f72e9e6cbf17dbb upstream.

As a PRFM instruction racing against a TTBR update can have undesirable
effects on TX2, NOP-out such PRFM on cores that are affected by
the TX2-219 erratum.

Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoarm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
Marc Zyngier [Thu, 7 Feb 2019 16:01:21 +0000 (16:01 +0000)]
arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set

BugLink: https://bugs.launchpad.net/bugs/1850456
commit d3ec3a08fa700c8b46abb137dce4e2514a6f9668 upstream.

In order to workaround the TX2-219 erratum, it is necessary to trap
TTBRx_EL1 accesses to EL2. This is done by setting HCR_EL2.TVM on
guest entry, which has the side effect of trapping all the other
VM-related sysregs as well.

To minimize the overhead, a fast path is used so that we don't
have to go all the way back to the main sysreg handling code,
unless the rest of the hypervisor expects to see these accesses.

Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoEDAC/ghes: Fix Use after free in ghes_edac remove path
James Morse [Mon, 14 Oct 2019 17:19:18 +0000 (18:19 +0100)]
EDAC/ghes: Fix Use after free in ghes_edac remove path

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 1e72e673b9d102ff2e8333e74b3308d012ddf75b upstream.

ghes_edac models a single logical memory controller, and uses a global
ghes_init variable to ensure only the first ghes_edac_register() will
do anything.

ghes_edac is registered the first time a GHES entry in the HEST is
probed. There may be multiple entries, so subsequent attempts to
register ghes_edac are silently ignored as the work has already been
done.

When a GHES entry is unregistered, it calls ghes_edac_unregister(),
which free()s the memory behind the global variables in ghes_edac.

But there may be multiple GHES entries, the next call to
ghes_edac_unregister() will dereference the free()d memory, and attempt
to free it a second time.

This may also be triggered on a platform with one GHES entry, if the
driver is unbound/re-bound and unbound. The re-bind step will do
nothing because of ghes_init, the second unbind will then do the same
work as the first.

Doing the unregister work on the first call is unsafe, as another
CPU may be processing a notification in ghes_edac_report_mem_error(),
using the memory we are about to free.

ghes_init is already half of the reference counting. We only need
to do the register work for the first call, and the unregister work
for the last. Add the unregister check.

This means we no longer free ghes_edac's memory while there are
GHES entries that may receive a notification.

This was detected by KASAN and DEBUG_TEST_DRIVER_REMOVE.

 [ bp: merge into a single patch. ]

Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20191014171919.85044-2-james.morse@arm.com
Link: https://lkml.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoparisc: Fix vmap memory leak in ioremap()/iounmap()
Helge Deller [Fri, 4 Oct 2019 17:23:37 +0000 (19:23 +0200)]
parisc: Fix vmap memory leak in ioremap()/iounmap()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 513f7f747e1cba81f28a436911fba0b485878ebd upstream.

Sven noticed that calling ioremap() and iounmap() multiple times leads
to a vmap memory leak:
vmap allocation for size 4198400 failed:
use vmalloc=<size> to increase size

It seems we missed calling vunmap() in iounmap().

Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: Sven Schnelle <svens@stackframe.org>
Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agolib/vdso: Make clock_getres() POSIX compliant again
Thomas Gleixner [Mon, 21 Oct 2019 10:07:15 +0000 (12:07 +0200)]
lib/vdso: Make clock_getres() POSIX compliant again

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 1638b8f096ca165965189b9626564c933c79fe63 upstream.

A recent commit removed the NULL pointer check from the clock_getres()
implementation causing a test case to fault.

POSIX requires an explicit NULL pointer check for clock_getres() aside of
the validity check of the clock_id argument for obscure reasons.

Add it back for both 32bit and 64bit.

Note, this is only a partial revert of the offending commit which does not
bring back the broken fallback invocation in the the 32bit compat
implementations of clock_getres() and clock_gettime().

Fixes: a9446a906f52 ("lib/vdso/32: Remove inconsistent NULL pointer checks")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1910211202260.1904@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agos390/kaslr: add support for R_390_GLOB_DAT relocation type
Gerald Schaefer [Mon, 21 Oct 2019 17:56:00 +0000 (19:56 +0200)]
s390/kaslr: add support for R_390_GLOB_DAT relocation type

BugLink: https://bugs.launchpad.net/bugs/1850456
commit ac49303d9ef0ad98b79867a380ef23480e48870b upstream.

Commit "bpf: Process in-kernel BTF" in linux-next introduced an undefined
__weak symbol, which results in an R_390_GLOB_DAT relocation type. That
is not yet handled by the KASLR relocation code, and the kernel stops with
the message "Unknown relocation type".

Add code to detect and handle R_390_GLOB_DAT relocation types and undefined
symbols.

Fixes: 805bc0bc238f ("s390/kernel: build a relocatable kernel")
Cc: <stable@vger.kernel.org> # v5.2+
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agos390/zcrypt: fix memleak at release
Johan Hovold [Thu, 10 Oct 2019 13:13:33 +0000 (15:13 +0200)]
s390/zcrypt: fix memleak at release

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 388bb19be8eab4674a660e0c97eaf60775362bc7 upstream.

If a process is interrupted while accessing the crypto device and the
global ap_perms_mutex is contented, release() could return early and
fail to free related resources.

Fixes: 00fab2350e6b ("s390/zcrypt: multiple zcrypt device nodes support")
Cc: <stable@vger.kernel.org> # 4.19
Cc: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoxtensa: fix change_bit in exclusive access option
Max Filippov [Wed, 16 Oct 2019 04:51:43 +0000 (21:51 -0700)]
xtensa: fix change_bit in exclusive access option

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 775fd6bfefc66a8c33e91dd9687ed530643b954d upstream.

change_bit implementation for XCHAL_HAVE_EXCLUSIVE case changes all bits
except the one required due to copy-paste error from clear_bit.

Cc: stable@vger.kernel.org # v5.2+
Fixes: f7c34874f04a ("xtensa: add exclusive atomics support")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoxtensa: drop EXPORT_SYMBOL for outs*/ins*
Max Filippov [Mon, 14 Oct 2019 22:48:19 +0000 (15:48 -0700)]
xtensa: drop EXPORT_SYMBOL for outs*/ins*

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 8b39da985194aac2998dd9e3a22d00b596cebf1e upstream.

Custom outs*/ins* implementations are long gone from the xtensa port,
remove matching EXPORT_SYMBOLs.
This fixes the following build warnings issued by modpost since commit
15bfc2348d54 ("modpost: check for static EXPORT_SYMBOL* functions"):

  WARNING: "insb" [vmlinux] is a static EXPORT_SYMBOL
  WARNING: "insw" [vmlinux] is a static EXPORT_SYMBOL
  WARNING: "insl" [vmlinux] is a static EXPORT_SYMBOL
  WARNING: "outsb" [vmlinux] is a static EXPORT_SYMBOL
  WARNING: "outsw" [vmlinux] is a static EXPORT_SYMBOL
  WARNING: "outsl" [vmlinux] is a static EXPORT_SYMBOL

Cc: stable@vger.kernel.org
Fixes: d38efc1f150f ("xtensa: adopt generic io routines")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agozram: fix race between backing_dev_show and backing_dev_store
Chenwandun [Sat, 19 Oct 2019 03:20:14 +0000 (20:20 -0700)]
zram: fix race between backing_dev_show and backing_dev_store

BugLink: https://bugs.launchpad.net/bugs/1850456
commit f7daefe4231e57381d92c2e2ad905a899c28e402 upstream.

CPU0:        CPU1:
backing_dev_show        backing_dev_store
    ......    ......
    file = zram->backing_dev;
    down_read(&zram->init_lock);    down_read(&zram->init_init_lock)
    file_path(file, ...);    zram->backing_dev = backing_dev;
    up_read(&zram->init_lock);    up_read(&zram->init_lock);

gets the value of zram->backing_dev too early in backing_dev_show, which
resultin the value being NULL at the beginning, and not NULL later.

backtrace:
  d_path+0xcc/0x174
  file_path+0x10/0x18
  backing_dev_show+0x40/0xb4
  dev_attr_show+0x20/0x54
  sysfs_kf_seq_show+0x9c/0x10c
  kernfs_seq_show+0x28/0x30
  seq_read+0x184/0x488
  kernfs_fop_read+0x5c/0x1a4
  __vfs_read+0x44/0x128
  vfs_read+0xa0/0x138
  SyS_read+0x54/0xb4

Link: http://lkml.kernel.org/r/1571046839-16814-1-git-send-email-chenwandun@huawei.com
Signed-off-by: Chenwandun <chenwandun@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more...
Jane Chu [Mon, 14 Oct 2019 21:12:29 +0000 (14:12 -0700)]
mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 3d7fed4ad8ccb691d217efbb0f934e6a4df5ef91 upstream.

Mmap /dev/dax more than once, then read the poison location using
address from one of the mappings.  The other mappings due to not having
the page mapped in will cause SIGKILLs delivered to the process.
SIGKILL succeeds over SIGBUS, so user process loses the opportunity to
handle the UE.

Although one may add MAP_POPULATE to mmap(2) to work around the issue,
MAP_POPULATE makes mapping 128GB of pmem several magnitudes slower, so
isn't always an option.

Details -

  ndctl inject-error --block=10 --count=1 namespace6.0

  ./read_poison -x dax6.0 -o 5120 -m 2
  mmaped address 0x7f5bb6600000
  mmaped address 0x7f3cf3600000
  doing local read at address 0x7f3cf3601400
  Killed

Console messages in instrumented kernel -

  mce: Uncorrected hardware memory error in user-access at edbe201400
  Memory failure: tk->addr = 7f5bb6601000
  Memory failure: address edbe201: call dev_pagemap_mapping_shift
  dev_pagemap_mapping_shift: page edbe201: no PUD
  Memory failure: tk->size_shift == 0
  Memory failure: Unable to find user space address edbe201 in read_poison
  Memory failure: tk->addr = 7f3cf3601000
  Memory failure: address edbe201: call dev_pagemap_mapping_shift
  Memory failure: tk->size_shift = 21
  Memory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page
    => to deliver SIGKILL
  Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption
    => to deliver SIGBUS

Link: http://lkml.kernel.org/r/1565112345-28754-3-git-send-email-jane.chu@oracle.com
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agohugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()
David Hildenbrand [Sat, 19 Oct 2019 03:20:05 +0000 (20:20 -0700)]
hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit f231fe4235e22e18d847e05cbe705deaca56580a upstream.

Uninitialized memmaps contain garbage and in the worst case trigger
kernel BUGs, especially with CONFIG_PAGE_POISONING.  They should not get
touched.

Let's make sure that we only consider online memory (managed by the
buddy) that has initialized memmaps.  ZONE_DEVICE is not applicable.

page_zone() will call page_to_nid(), which will trigger
VM_BUG_ON_PGFLAGS(PagePoisoned(page), page) with CONFIG_PAGE_POISONING
and CONFIG_DEBUG_VM_PGFLAGS when called on uninitialized memmaps.  This
can be the case when an offline memory block (e.g., never onlined) is
spanned by a zone.

Note: As explained by Michal in [1], alloc_contig_range() will verify
the range.  So it boils down to the wrong access in this function.

[1] http://lkml.kernel.org/r/20180423000943.GO17484@dhcp22.suse.cz

Link: http://lkml.kernel.org/r/20191015120717.4858-1-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm: memblock: do not enforce current limit for memblock_phys* family
Mike Rapoport [Sat, 19 Oct 2019 03:20:01 +0000 (20:20 -0700)]
mm: memblock: do not enforce current limit for memblock_phys* family

BugLink: https://bugs.launchpad.net/bugs/1850456
commit f3057ad767542be7bbac44e548cb44017178a163 upstream.

Until commit 92d12f9544b7 ("memblock: refactor internal allocation
functions") the maximal address for memblock allocations was forced to
memblock.current_limit only for the allocation functions returning
virtual address.  The changes introduced by that commit moved the limit
enforcement into the allocation core and as a result the allocation
functions returning physical address also started to limit allocations
to memblock.current_limit.

This caused breakage of etnaviv GPU driver:

  etnaviv etnaviv: bound 130000.gpu (ops gpu_ops)
  etnaviv etnaviv: bound 134000.gpu (ops gpu_ops)
  etnaviv etnaviv: bound 2204000.gpu (ops gpu_ops)
  etnaviv-gpu 130000.gpu: model: GC2000, revision: 5108
  etnaviv-gpu 130000.gpu: command buffer outside valid memory window
  etnaviv-gpu 134000.gpu: model: GC320, revision: 5007
  etnaviv-gpu 134000.gpu: command buffer outside valid memory window
  etnaviv-gpu 2204000.gpu: model: GC355, revision: 1215
  etnaviv-gpu 2204000.gpu: Ignoring GPU with VG and FE2.0

Restore the behaviour of memblock_phys* family so that these functions
will not enforce memblock.current_limit.

Link: http://lkml.kernel.org/r/1570915861-17633-1-git-send-email-rppt@kernel.org
Fixes: 92d12f9544b7 ("memblock: refactor internal allocation functions")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Adam Ford <aford173@gmail.com>
Tested-by: Adam Ford <aford173@gmail.com> [imx6q-logicpd]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size
Honglei Wang [Sat, 19 Oct 2019 03:19:58 +0000 (20:19 -0700)]
mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size

BugLink: https://bugs.launchpad.net/bugs/1850456
commit b11edebbc967ebf5c55b8f9e1d5bb6d68ec3a7fd upstream.

Commit 1a61ab8038e72 ("mm: memcontrol: replace zone summing with
lruvec_page_state()") has made lruvec_page_state to use per-cpu counters
instead of calculating it directly from lru_zone_size with an idea that
this would be more effective.

Tim has reported that this is not really the case for their database
benchmark which is showing an opposite results where lruvec_page_state
is taking up a huge chunk of CPU cycles (about 25% of the system time
which is roughly 7% of total cpu cycles) on 5.3 kernels.  The workload
is running on a larger machine (96cpus), it has many cgroups (500) and
it is heavily direct reclaim bound.

Tim Chen said:

: The problem can also be reproduced by running simple multi-threaded
: pmbench benchmark with a fast Optane SSD swap (see profile below).
:
:
: 6.15%     3.08%  pmbench          [kernel.vmlinux]            [k] lruvec_lru_size
:             |
:             |--3.07%--lruvec_lru_size
:             |          |
:             |          |--2.11%--cpumask_next
:             |          |          |
:             |          |           --1.66%--find_next_bit
:             |          |
:             |           --0.57%--call_function_interrupt
:             |                     |
:             |                      --0.55%--smp_call_function_interrupt
:             |
:             |--1.59%--0x441f0fc3d009
:             |          _ops_rdtsc_init_base_freq
:             |          access_histogram
:             |          page_fault
:             |          __do_page_fault
:             |          handle_mm_fault
:             |          __handle_mm_fault
:             |          |
:             |           --1.54%--do_swap_page
:             |                     swapin_readahead
:             |                     swap_cluster_readahead
:             |                     |
:             |                      --1.53%--read_swap_cache_async
:             |                                __read_swap_cache_async
:             |                                alloc_pages_vma
:             |                                __alloc_pages_nodemask
:             |                                __alloc_pages_slowpath
:             |                                try_to_free_pages
:             |                                do_try_to_free_pages
:             |                                shrink_node
:             |                                shrink_node_memcg
:             |                                |
:             |                                |--0.77%--lruvec_lru_size
:             |                                |
:             |                                 --0.76%--inactive_list_is_low
:             |                                           |
:             |                                            --0.76%--lruvec_lru_size
:             |
:              --1.50%--measure_read
:                        page_fault
:                        __do_page_fault
:                        handle_mm_fault
:                        __handle_mm_fault
:                        do_swap_page
:                        swapin_readahead
:                        swap_cluster_readahead
:                        |
:                         --1.48%--read_swap_cache_async
:                                   __read_swap_cache_async
:                                   alloc_pages_vma
:                                   __alloc_pages_nodemask
:                                   __alloc_pages_slowpath
:                                   try_to_free_pages
:                                   do_try_to_free_pages
:                                   shrink_node
:                                   shrink_node_memcg
:                                   |
:                                   |--0.75%--inactive_list_is_low
:                                   |          |
:                                   |           --0.75%--lruvec_lru_size
:                                   |
:                                    --0.73%--lruvec_lru_size

The likely culprit is the cache traffic the lruvec_page_state_local
generates.  Dave Hansen says:

: I was thinking purely of the cache footprint.  If it's reading
: pn->lruvec_stat_local->count[idx] is three separate cachelines, so 192
: bytes of cache *96 CPUs = 18k of data, mostly read-only.  1 cgroup would
: be 18k of data for the whole system and the caching would be pretty
: efficient and all 18k would probably survive a tight page fault loop in
: the L1.  500 cgroups would be ~90k of data per CPU thread which doesn't
: fit in the L1 and probably wouldn't survive a tight page fault loop if
: both logical threads were banging on different cgroups.
:
: It's just a theory, but it's why I noted the number of cgroups when I
: initially saw this show up in profiles

Fix the regression by partially reverting the said commit and calculate
the lru size explicitly.

Link: http://lkml.kernel.org/r/20190905071034.16822-1-honglei.wang@oracle.com
Fixes: 1a61ab8038e72 ("mm: memcontrol: replace zone summing with lruvec_page_state()")
Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: <stable@vger.kernel.org> [5.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm, compaction: fix wrong pfn handling in __reset_isolation_pfn()
Vlastimil Babka [Mon, 14 Oct 2019 21:12:07 +0000 (14:12 -0700)]
mm, compaction: fix wrong pfn handling in __reset_isolation_pfn()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit a2e9a5afce080226edbf1882d63d99bf32070e9e upstream.

Florian and Dave reported [1] a NULL pointer dereference in
__reset_isolation_pfn().  While the exact cause is unclear, staring at
the code revealed two bugs, which might be related.

One bug is that if zone starts in the middle of pageblock, block_page
might correspond to different pfn than block_pfn, and then the
pfn_valid_within() checks will check different pfn's than those accessed
via struct page.  This might result in acessing an unitialized page in
CONFIG_HOLES_IN_ZONE configs.

The other bug is that end_page refers to the first page of next
pageblock and not last page of current pageblock.  The online and valid
check is then wrong and with sections, the while (page < end_page) loop
might wander off actual struct page arrays.

[1] https://lore.kernel.org/linux-xfs/87o8z1fvqu.fsf@mid.deneb.enyo.de/

Link: http://lkml.kernel.org/r/20191008152915.24704-1-vbabka@suse.cz
Fixes: 6b0868c820ff ("mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Florian Weimer <fw@deneb.enyo.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release
Roman Gushchin [Sat, 19 Oct 2019 03:19:44 +0000 (20:19 -0700)]
mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release

BugLink: https://bugs.launchpad.net/bugs/1850456
commit b749ecfaf6c53ce79d6ab66afd2fc34189a073b1 upstream.

Karsten reported the following panic in __free_slab() happening on a s390x
machine:

  Unable to handle kernel pointer dereference in virtual kernel address space
  Failing address: 0000000000000000 TEID: 0000000000000483
  Fault in home space mode while using kernel ASCE.
  AS:00000000017d4007 R3:000000007fbd0007 S:000000007fbff000 P:000000000000003d
  Oops: 0004 ilc:3 Ý#1¨ PREEMPT SMP
  Modules linked in: tcp_diag inet_diag xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_at nf_nat
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-05872-g6133e3e4bada-dirty #14
  Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
  Krnl PSW : 0704d00180000000 00000000003cadb6 (__free_slab+0x686/0x6b0)
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
  Krnl GPRS: 00000000f3a32928 0000000000000000 000000007fbf5d00 000000000117c4b8
             0000000000000000 000000009e3291c1 0000000000000000 0000000000000000
             0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
             0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
             000000000117ba00 000003e000057db0 00000000003cabcc 000003e000057c78
  Krnl Code: 00000000003cada6e310a1400004        lg      %r1,320(%r10)
             00000000003cadacc0e50046c286        brasl   %r14,ca32b8
            #00000000003cadb2a7f4fe36            brc     15,3caa1e
            >00000000003cadb6e32060800024        stg     %r2,128(%r6)
             00000000003cadbca7f4fd9e            brc     15,3ca8f8
             00000000003cadc0c0e50046790c        brasl   %r14,c99fd8
             00000000003cadc6a7f4fe2c            brc     15,3caa
             00000000003cadc6a7f4fe2c            brc     15,3caa1e
             00000000003cadcaecb1ffff00d9        aghik   %r11,%r1,-1
  Call Trace:
  (<00000000003cabcc> __free_slab+0x49c/0x6b0)
   <00000000001f5886> rcu_core+0x5a6/0x7e0
   <0000000000ca2dea> __do_softirq+0xf2/0x5c0
   <0000000000152644> irq_exit+0x104/0x130
   <000000000010d222> do_IRQ+0x9a/0xf0
   <0000000000ca2344> ext_int_handler+0x130/0x134
   <0000000000103648> enabled_wait+0x58/0x128
  (<0000000000103634> enabled_wait+0x44/0x128)
   <0000000000103b00> arch_cpu_idle+0x40/0x58
   <0000000000ca0544> default_idle_call+0x3c/0x68
   <000000000018eaa4> do_idle+0xec/0x1c0
   <000000000018ee0e> cpu_startup_entry+0x36/0x40
   <000000000122df34> arch_call_rest_init+0x5c/0x88
   <0000000000000000> 0x0
  INFO: lockdep is turned off.
  Last Breaking-Event-Address:
   <00000000003ca8f4> __free_slab+0x1c4/0x6b0
  Kernel panic - not syncing: Fatal exception in interrupt

The kernel panics on an attempt to dereference the NULL memcg pointer.
When shutdown_cache() is called from the kmem_cache_destroy() context, a
memcg kmem_cache might have empty slab pages in a partial list, which are
still charged to the memory cgroup.

These pages are released by free_partial() at the beginning of
shutdown_cache(): either directly or by scheduling a RCU-delayed work
(if the kmem_cache has the SLAB_TYPESAFE_BY_RCU flag).  The latter case
is when the reported panic can happen: memcg_unlink_cache() is called
immediately after shrinking partial lists, without waiting for scheduled
RCU works.  It sets the kmem_cache->memcg_params.memcg pointer to NULL,
and the following attempt to dereference it by __free_slab() from the
RCU work context causes the panic.

To fix the issue, let's postpone the release of the memcg pointer to
destroy_memcg_params().  It's called from a separate work context by
slab_caches_to_rcu_destroy_workfn(), which contains a full RCU barrier.
This guarantees that all scheduled page release RCU works will complete
before the memcg pointer will be zeroed.

Big thanks for Karsten for the perfect report containing all necessary
information, his help with the analysis of the problem and testing of the
fix.

Link: http://lkml.kernel.org/r/20191010160549.1584316-1-guro@fb.com
Fixes: fb2f2b0adb98 ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Karsten Graul <kgraul@linux.ibm.com>
Tested-by: Karsten Graul <kgraul@linux.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Karsten Graul <kgraul@linux.ibm.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm/memunmap: don't access uninitialized memmap in memunmap_pages()
Aneesh Kumar K.V [Sat, 19 Oct 2019 03:19:39 +0000 (20:19 -0700)]
mm/memunmap: don't access uninitialized memmap in memunmap_pages()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 77e080e7680e1e615587352f70c87b9e98126d03 upstream.

Patch series "mm/memory_hotplug: Shrink zones before removing memory",
v6.

This series fixes the access of uninitialized memmaps when shrinking
zones/nodes and when removing memory.  Also, it contains all fixes for
crashes that can be triggered when removing certain namespace using
memunmap_pages() - ZONE_DEVICE, reported by Aneesh.

We stop trying to shrink ZONE_DEVICE, as it's buggy, fixing it would be
more involved (we don't have SECTION_IS_ONLINE as an indicator), and
shrinking is only of limited use (set_zone_contiguous() cannot detect
the ZONE_DEVICE as contiguous).

We continue shrinking !ZONE_DEVICE zones, however, I reduced the amount
of code to a minimum.  Shrinking is especially necessary to keep
zone->contiguous set where possible, especially, on memory unplug of
DIMMs at zone boundaries.

--------------------------------------------------------------------------

Zones are now properly shrunk when offlining memory blocks or when
onlining failed.  This allows to properly shrink zones on memory unplug
even if the separate memory blocks of a DIMM were onlined to different
zones or re-onlined to a different zone after offlining.

Example:

  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  0
          present  0
          managed  0
  :/# echo "online_movable" > /sys/devices/system/memory/memory41/state
  :/# echo "online_movable" > /sys/devices/system/memory/memory43/state
  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  98304
          present  65536
          managed  65536
  :/# echo 0 > /sys/devices/system/memory/memory43/online
  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  32768
          present  32768
          managed  32768
  :/# echo 0 > /sys/devices/system/memory/memory41/online
  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  0
          present  0
          managed  0

This patch (of 10):

With an altmap, the memmap falling into the reserved altmap space are not
initialized and, therefore, contain a garbage NID and a garbage zone.
Make sure to read the NID/zone from a memmap that was initialized.

This fixes a kernel crash that is observed when destroying a namespace:

  kernel BUG at include/linux/mm.h:1107!
  cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
      pc: c0000000004b9728: memunmap_pages+0x238/0x340
      lr: c0000000004b9724: memunmap_pages+0x234/0x340
  ...
      pid   = 3669, comm = ndctl
  kernel BUG at include/linux/mm.h:1107!
    devm_action_release+0x30/0x50
    release_nodes+0x268/0x2d0
    device_release_driver_internal+0x174/0x240
    unbind_store+0x13c/0x190
    drv_attr_store+0x44/0x60
    sysfs_kf_write+0x70/0xa0
    kernfs_fop_write+0x1ac/0x290
    __vfs_write+0x3c/0x70
    vfs_write+0xe4/0x200
    ksys_write+0x7c/0x140
    system_call+0x5c/0x68

The "page_zone(pfn_to_page(pfn)" was introduced by 69324b8f4833 ("mm,
devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support"), however, I
think we will never have driver reserved memory with
MEMORY_DEVICE_PRIVATE (no altmap AFAIKS).

[david@redhat.com: minimze code changes, rephrase description]
Link: http://lkml.kernel.org/r/20191006085646.5768-2-david@redhat.com
Fixes: 2c2a5af6fed2 ("mm, memory_hotplug: add nid parameter to arch_remove_memory")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Damian Tometzki <damian.tometzki@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Rich Felker <dalias@libc.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo
Qian Cai [Sat, 19 Oct 2019 03:19:29 +0000 (20:19 -0700)]
mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo

BugLink: https://bugs.launchpad.net/bugs/1850456
commit a26ee565b6cd8dc2bf15ff6aa70bbb28f928b773 upstream.

Uninitialized memmaps contain garbage and in the worst case trigger
kernel BUGs, especially with CONFIG_PAGE_POISONING.  They should not get
touched.

For example, when not onlining a memory block that is spanned by a zone
and reading /proc/pagetypeinfo with CONFIG_DEBUG_VM_PGFLAGS and
CONFIG_PAGE_POISONING, we can trigger a kernel BUG:

  :/# echo 1 > /sys/devices/system/memory/memory40/online
  :/# echo 1 > /sys/devices/system/memory/memory42/online
  :/# cat /proc/pagetypeinfo > test.file
   page:fffff2c585200000 is uninitialized and poisoned
   raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
   raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
   page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
   There is not page extension available.
   ------------[ cut here ]------------
   kernel BUG at include/linux/mm.h:1107!
   invalid opcode: 0000 [#1] SMP NOPTI

Please note that this change does not affect ZONE_DEVICE, because
pagetypeinfo_showmixedcount_print() is called from
mm/vmstat.c:pagetypeinfo_showmixedcount() only for populated zones, and
ZONE_DEVICE is never populated (zone->present_pages always 0).

[david@redhat.com: move check to outer loop, add comment, rephrase description]
Link: http://lkml.kernel.org/r/20191011140638.8160-1-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm/slub: fix a deadlock in show_slab_objects()
Qian Cai [Mon, 14 Oct 2019 21:11:51 +0000 (14:11 -0700)]
mm/slub: fix a deadlock in show_slab_objects()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit e4f8e513c3d353c134ad4eef9fd0bba12406c7c8 upstream.

A long time ago we fixed a similar deadlock in show_slab_objects() [1].
However, it is apparently due to the commits like 01fb58bcba63 ("slab:
remove synchronous synchronize_sched() from memcg cache deactivation
path") and 03afc0e25f7f ("slab: get_online_mems for
kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back by
just reading files in /sys/kernel/slab which will generate a lockdep
splat below.

Since the "mem_hotplug_lock" here is only to obtain a stable online node
mask while racing with NUMA node hotplug, in the worst case, the results
may me miscalculated while doing NUMA node hotplug, but they shall be
corrected by later reads of the same files.

  WARNING: possible circular locking dependency detected
  ------------------------------------------------------
  cat/5224 is trying to acquire lock:
  ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at:
  show_slab_objects+0x94/0x3a8

  but task is already holding lock:
  b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (kn->count#45){++++}:
         lock_acquire+0x31c/0x360
         __kernfs_remove+0x290/0x490
         kernfs_remove+0x30/0x44
         sysfs_remove_dir+0x70/0x88
         kobject_del+0x50/0xb0
         sysfs_slab_unlink+0x2c/0x38
         shutdown_cache+0xa0/0xf0
         kmemcg_cache_shutdown_fn+0x1c/0x34
         kmemcg_workfn+0x44/0x64
         process_one_work+0x4f4/0x950
         worker_thread+0x390/0x4bc
         kthread+0x1cc/0x1e8
         ret_from_fork+0x10/0x18

  -> #1 (slab_mutex){+.+.}:
         lock_acquire+0x31c/0x360
         __mutex_lock_common+0x16c/0xf78
         mutex_lock_nested+0x40/0x50
         memcg_create_kmem_cache+0x38/0x16c
         memcg_kmem_cache_create_func+0x3c/0x70
         process_one_work+0x4f4/0x950
         worker_thread+0x390/0x4bc
         kthread+0x1cc/0x1e8
         ret_from_fork+0x10/0x18

  -> #0 (mem_hotplug_lock.rw_sem){++++}:
         validate_chain+0xd10/0x2bcc
         __lock_acquire+0x7f4/0xb8c
         lock_acquire+0x31c/0x360
         get_online_mems+0x54/0x150
         show_slab_objects+0x94/0x3a8
         total_objects_show+0x28/0x34
         slab_attr_show+0x38/0x54
         sysfs_kf_seq_show+0x198/0x2d4
         kernfs_seq_show+0xa4/0xcc
         seq_read+0x30c/0x8a8
         kernfs_fop_read+0xa8/0x314
         __vfs_read+0x88/0x20c
         vfs_read+0xd8/0x10c
         ksys_read+0xb0/0x120
         __arm64_sys_read+0x54/0x88
         el0_svc_handler+0x170/0x240
         el0_svc+0x8/0xc

  other info that might help us debug this:

  Chain exists of:
    mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(kn->count#45);
                                 lock(slab_mutex);
                                 lock(kn->count#45);
    lock(mem_hotplug_lock.rw_sem);

   *** DEADLOCK ***

  3 locks held by cat/5224:
   #0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8
   #1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0
   #2: b8ff009693eee398 (kn->count#45){++++}, at:
  kernfs_seq_start+0x44/0xf0

  stack backtrace:
  Call trace:
   dump_backtrace+0x0/0x248
   show_stack+0x20/0x2c
   dump_stack+0xd0/0x140
   print_circular_bug+0x368/0x380
   check_noncircular+0x248/0x250
   validate_chain+0xd10/0x2bcc
   __lock_acquire+0x7f4/0xb8c
   lock_acquire+0x31c/0x360
   get_online_mems+0x54/0x150
   show_slab_objects+0x94/0x3a8
   total_objects_show+0x28/0x34
   slab_attr_show+0x38/0x54
   sysfs_kf_seq_show+0x198/0x2d4
   kernfs_seq_show+0xa4/0xcc
   seq_read+0x30c/0x8a8
   kernfs_fop_read+0xa8/0x314
   __vfs_read+0x88/0x20c
   vfs_read+0xd8/0x10c
   ksys_read+0xb0/0x120
   __arm64_sys_read+0x54/0x88
   el0_svc_handler+0x170/0x240
   el0_svc+0x8/0xc

I think it is important to mention that this doesn't expose the
show_slab_objects to use-after-free.  There is only a single path that
might really race here and that is the slab hotplug notifier callback
__kmem_cache_shrink (via slab_mem_going_offline_callback) but that path
doesn't really destroy kmem_cache_node data structures.

[1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html

[akpm@linux-foundation.org: add comment explaining why we don't need mem_hotplug_lock]
Link: http://lkml.kernel.org/r/1570192309-10132-1-git-send-email-cai@lca.pw
Fixes: 01fb58bcba63 ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path")
Fixes: 03afc0e25f7f ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm/memory-failure.c: don't access uninitialized memmaps in memory_failure()
David Hildenbrand [Sat, 19 Oct 2019 03:19:23 +0000 (20:19 -0700)]
mm/memory-failure.c: don't access uninitialized memmaps in memory_failure()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 96c804a6ae8c59a9092b3d5dd581198472063184 upstream.

We should check for pfn_to_online_page() to not access uninitialized
memmaps.  Reshuffle the code so we don't have to duplicate the error
message.

Link: http://lkml.kernel.org/r/20191009142435.3975-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agommc: sdhci-omap: Fix Tuning procedure for temperatures < -20C
Faiz Abbas [Thu, 10 Oct 2019 10:52:30 +0000 (16:22 +0530)]
mmc: sdhci-omap: Fix Tuning procedure for temperatures < -20C

BugLink: https://bugs.launchpad.net/bugs/1850456
commit feb40824d78eac5e48f56498dca941754dff33d7 upstream.

According to the App note[1] detailing the tuning algorithm, for
temperatures < -20C, the initial tuning value should be min(largest value
in LPW - 24, ceil(13/16 ratio of LPW)). The largest value in LPW is
(max_window + 4 * (max_len - 1)) and not (max_window + 4 * max_len) itself.
Fix this implementation.

[1] http://www.ti.com/lit/an/spraca9b/spraca9b.pdf

Fixes: 961de0a856e3 ("mmc: sdhci-omap: Workaround errata regarding SDR104/HS200 tuning failures (i929)")
Cc: stable@vger.kernel.org
Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>