brcmfmac: assure SSID length from firmware is limited
The SSID length as received from firmware should not exceed
IEEE80211_MAX_SSID_LEN as that would result in heap overflow.
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
CVE-2019-9500
(cherry picked from commit 1b5e2423164b3670e8bc9174e4762d297990deff) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
brcmfmac: add subtype check for event handling in data path
For USB there is no separate channel being used to pass events
from firmware to the host driver and as such are passed over the
data path. In order to detect mock event messages an additional
check is needed on event subtype. This check is added conditionally
using unlikely() keyword.
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
CVE-2019-9503
(cherry picked from commit a4176ec356c73a46c07c181c6d04039fafa34a9f) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Alex Williamson [Thu, 18 Apr 2019 07:25:00 +0000 (09:25 +0200)]
vfio/type1: Limit DMA mappings per container
Memory backed DMA mappings are accounted against a user's locked
memory limit, including multiple mappings of the same memory. This
accounting bounds the number of such mappings that a user can create.
However, DMA mappings that are not backed by memory, such as DMA
mappings of device MMIO via mmaps, do not make use of page pinning
and therefore do not count against the user's locked memory limit.
These mappings still consume memory, but the memory is not well
associated to the process for the purpose of oom killing a task.
To add bounding on this use case, we introduce a limit to the total
number of concurrent DMA mappings that a user is allowed to create.
This limit is exposed as a tunable module option where the default
value of 64K is expected to be well in excess of any reasonable use
case (a large virtual machine configuration would typically only make
use of tens of concurrent mappings).
This fixes CVE-2019-3882.
Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
CVE-2019-3882
(cherry picked from commit 492855939bdb59c6f947b0b5b44af9ad82b7e38c) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
there is no seperate fs/autofs4 path for the module anymore.
Replace it with fs/autofs so the module will again be included
in the main (not extra) modules file.
BugLink: https://bugs.launchpad.net/bugs/1824333 Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Acked-by: You-Sheng Yang <vicamo.yang@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
misc: rtsx: Enable OCP for rts522a rts524a rts525a rts5260
BugLink: https://bugs.launchpad.net/bugs/1825487
this enables and adds OCP function for Realtek A series cardreader chips
and fixes some OCP flow in rts5260.c
Signed-off-by: RickyWu <ricky_wu@realtek.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bede03a579b3b4a036003c4862cc1baa4ddc351f) Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-by: You-Sheng Yang <vicamo.yang@canonical.com> Acked-by: AceLan Kao <acelan.kao@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Hui Wang [Thu, 18 Apr 2019 05:52:00 +0000 (07:52 +0200)]
ALSA: hda/realtek - add two more pin configuration sets to quirk table
BugLink: https://bugs.launchpad.net/bugs/1825272
We have two Dell laptops which have the codec 10ec0236 and 10ec0256
respectively, the headset mic on them can't work, need to apply the
quirk of ALC255_FIXUP_DELL1_MIC_NO_PRESENCE. So adding their pin
configurations in the pin quirk table.
Cc: <stable@vger.kernel.org> Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit b26e36b7ef36a8a3a147b1609b2505f8a4ecf511
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git) Signed-off-by: Hui Wang <hui.wang@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: AceLan Kao <acelan.kao@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Marc Orr [Thu, 18 Apr 2019 07:33:00 +0000 (09:33 +0200)]
KVM: x86: nVMX: fix x2APIC VTPR read intercept
Referring to the "VIRTUALIZING MSR-BASED APIC ACCESSES" chapter of the
SDM, when "virtualize x2APIC mode" is 1 and "APIC-register
virtualization" is 0, a RDMSR of 808H should return the VTPR from the
virtual APIC page.
However, for nested, KVM currently fails to disable the read intercept
for this MSR. This means that a RDMSR exit takes precedence over
"virtualize x2APIC mode", and KVM passes through L1's TPR to L2,
instead of sourcing the value from L2's virtual APIC page.
This patch fixes the issue by disabling the read intercept, in VMCS02,
for the VTPR when "APIC-register virtualization" is 0.
The issue described above and fix prescribed here, were verified with
a related patch in kvm-unit-tests titled "Test VMX's virtualize x2APIC
mode w/ nested".
Signed-off-by: Marc Orr <marcorr@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Fixes: c992384bde84f ("KVM: vmx: speed up MSR bitmap merge") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CVE-2019-3887
(cherry picked from commit c73f4c998e1fd4249b9edfa39e23f4fda2b9b041) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Marc Orr [Thu, 18 Apr 2019 07:33:00 +0000 (09:33 +0200)]
KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887)
The nested_vmx_prepare_msr_bitmap() function doesn't directly guard the
x2APIC MSR intercepts with the "virtualize x2APIC mode" MSR. As a
result, we discovered the potential for a buggy or malicious L1 to get
access to L0's x2APIC MSRs, via an L2, as follows.
1. L1 executes WRMSR(IA32_SPEC_CTRL, 1). This causes the spec_ctrl
variable, in nested_vmx_prepare_msr_bitmap() to become true.
2. L1 disables "virtualize x2APIC mode" in VMCS12.
3. L1 enables "APIC-register virtualization" in VMCS12.
Now, KVM will set VMCS02's x2APIC MSR intercepts from VMCS12, and then
set "virtualize x2APIC mode" to 0 in VMCS02. Oops.
This patch closes the leak by explicitly guarding VMCS02's x2APIC MSR
intercepts with VMCS12's "virtualize x2APIC mode" control.
The scenario outlined above and fix prescribed here, were verified with
a related patch in kvm-unit-tests titled "Add leak scenario to
virt_x2apic_mode_test".
Note, it looks like this issue may have been introduced inadvertently
during a merge---see 15303ba5d1cd.
Signed-off-by: Marc Orr <marcorr@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CVE-2019-3887
(cherry picked from commit acff78477b9b4f26ecdf65733a4ed77fe837e9dc) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Xin Long [Thu, 18 Apr 2019 07:49:00 +0000 (09:49 +0200)]
sctp: implement memory accounting on rx path
sk_forward_alloc's updating is also done on rx path, but to be consistent
we change to use sk_mem_charge() in sctp_skb_set_owner_r().
In sctp_eat_data(), it's not enough to check sctp_memory_pressure only,
which doesn't work for mem_cgroup_sockets_enabled, so we change to use
sk_under_memory_pressure().
When it's under memory pressure, sk_mem_reclaim() and sk_rmem_schedule()
should be called on both RENEGE or CHUNK DELIVERY path exit the memory
pressure status as soon as possible.
Note that sk_rmem_schedule() is using datalen to make things easy there.
Reported-by: Matteo Croce <mcroce@redhat.com> Tested-by: Matteo Croce <mcroce@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
CVE-2019-3874
(cherry picked from commit 9dde27de3e5efa0d032f3c891a0ca833a0d31911 linux-next) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Xin Long [Thu, 18 Apr 2019 07:49:00 +0000 (09:49 +0200)]
sctp: implement memory accounting on tx path
Now when sending packets, sk_mem_charge() and sk_mem_uncharge() have been
used to set sk_forward_alloc. We just need to call sk_wmem_schedule() to
check if the allocated should be raised, and call sk_mem_reclaim() to
check if the allocated should be reduced when it's under memory pressure.
If sk_wmem_schedule() returns false, which means no memory is allowed to
allocate, it will block and wait for memory to become available.
Note different from tcp, sctp wait_for_buf happens before allocating any
skb, so memory accounting check is done with the whole msg_len before it
too.
Reported-by: Matteo Croce <mcroce@redhat.com> Tested-by: Matteo Croce <mcroce@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
CVE-2019-3874
(cherry picked from commit 1033990ac5b2ab6cee93734cb6d301aa3a35bcaa linux-next) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
binder: fix race between munmap() and direct reclaim
An munmap() on a binder device causes binder_vma_close() to be called
which clears the alloc->vma pointer.
If direct reclaim causes binder_alloc_free_page() to be called, there
is a race where alloc->vma is read into a local vma pointer and then
used later after the mm->mmap_sem is acquired. This can result in
calling zap_page_range() with an invalid vma which manifests as a
use-after-free in zap_page_range().
The fix is to check alloc->vma after acquiring the mmap_sem (which we
were acquiring anyway) and skip zap_page_range() if it has changed
to NULL.
(backported from commit 5cec2d2e5839f9c0fec319c523a911e0a7fd299f)
[tyhicks: Backport to 5.0:
- The write mode of mmap_sem is held in 5.0] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Before this commit we used to rely on an llseek method that was
targeted for regular files for both directories and regular files.
However, the realfile's f_pos was not correctly handled when userspace
called lseek(2) on a shiftfs directory file. Give directories their
own llseek operation so that seeking on a directory file is properly
supported.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The fix/break ratio is off, let's revisit this later with a patch
to disable modeset only on hw which is known to be broken instead
for everyone.
Signed-off-by: Timo Aaltonen <timo.aaltonen@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Before this commit we used to keep a reference to the shiftfs mark
mount's shiftfs_super_info which was stashed in the superblock of the
mark mount. The problem is that we only take a reference to the mount of
the underlay, i.e. the filesystem that is *under* the shiftfs mark
mount. This means when someone performs a shiftfs mark mount, then a
shiftfs overlay mount and then immediately unmounts the shiftfs mark
mount we muck with invalid memory since shiftfs_put_super might have
already been called freeing that memory.
Another solution would be to start reference counting. But this would be
overkill. We only care about the passthrough mount option of the mark
mount. And we only need it to verify that on remount the new passthrough
options of the shiftfs overlay are a subset of the mark mount's
passthrough options. In other scenarios we don't care. So copying up is
good enough and also only needs to happen once on mount, i.e. when a new
superblock is created and the .fill_super method is called.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Jani Nikula [Sun, 14 Apr 2019 15:07:55 +0000 (18:07 +0300)]
drm/i915/dp: revert back to max link rate and lane count on eDP
BugLink: https://bugs.launchpad.net/bugs/1824216
Commit 7769db588384 ("drm/i915/dp: optimize eDP 1.4+ link config fast
and narrow") started to optize the eDP 1.4+ link config, both per spec
and as preparation for display stream compression support.
Sadly, we again face panels that flat out fail with parameters they
claim to support. Revert, and go back to the drawing board.
v2: Actually revert to max params instead of just wide-and-slow.
UBUNTU: SAUCE: shiftfs: fix passing of attrs to underaly for setattr
BugLink: https://bugs.launchpad.net/bugs/1824717
shiftfs_setattr() makes a copy of the attrs it was passed to pass
to the lower fs. It then calls setattr_prepare() with the original
attrs, and this may make changes which are not reflected in the
attrs passed to the lower fs. To fix this, copy the attrs to the
new struct for the lower fs after calling setattr_prepare().
Additionally, notify_change() may have set ATTR_MODE when one of
ATTR_KILL_S[UG]ID is set, and passing this combination to
notify_change() will trigger a BUG(). Do as overlayfs and
ecryptfs both do, and clear ATTR_MODE if either of those bits
is set.
Yunsheng Lin [Wed, 10 Apr 2019 17:08:07 +0000 (11:08 -0600)]
net: hns3: fix for not calculating tx bd num correctly
BugLink: https://bugs.launchpad.net/bugs/1824194
When there is only one byte in a frag, the current calculation
using "(size + HNS3_MAX_BD_SIZE - 1) >> HNS3_MAX_BD_SIZE_OFFSET"
will return zero, because HNS3_MAX_BD_SIZE is 65535 and
HNS3_MAX_BD_SIZE_OFFSET is 16. So it will cause tx error when
a frag's size is one byte.
This patch fixes it by using DIV_ROUND_UP.
Fixes: 3fe13ed95dd3 ("net: hns3: avoid mult + div op in critical data path") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5f543a54eec08228ab0cc0a49cf5d79dd32c9e5e) Signed-off-by: dann frazier <dann.frazier@canonical.com> Acked-By: You-Sheng Yang <vicamo.yang@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1823862
bpfilter is needed for iptables. Include it in linux-modules so
that iptables will work with linux-virtual.
Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1824354
There is no reason shiftfs needs to be built-in. Change it to be
a module, and add the module to the generic inclusion list so
that shiftfs will be available in linux-virtual.
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
UBUNTU: SAUCE: shiftfs: use translated ids when chaning lower fs attrs
BugLink: https://bugs.launchpad.net/bugs/1824350
shiftfs_setattr() is preparing a new set of attributes with the
owner translated for the lower fs, but it then passes the
original attrs. As a result the owner is set to the untranslated
owner, which causes the shiftfs inodes to also have incorrect
ids. For example:
# mkdir dir
# touch file
# ls -lh dir file
drwxr-xr-x 2 root root 4.0K Apr 11 13:05 dir
-rw-r--r-- 1 root root 0 Apr 11 13:05 file
# chown 500:500 dir file
# ls -lh dir file
drwxr-xr-x 2 10005001000500 4.0K Apr 11 12:42 dir
-rw-r--r-- 1 10005001000500 0 Apr 11 12:42 file
Fix this to pass the correct iattr struct to notify_change().
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
UBUNTU: [Packaging] bind hv_kvp_daemon startup to hv_kvp device
BugLink: https://bugs.launchpad.net/bugs/1820063
The hv_kvp_daemon service requires the hv_kvp device and should
be started when the device appears and stopped in it disappears.
Solution is based off the one provided at
https://bugzilla.redhat.com/show_bug.cgi?id=1195029#c22.
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
UBUNTU: SAUCE: apparmor: Restore Y/N in /sys for apparmor's "enabled"
Before commit c5459b829b71 ("LSM: Plumb visibility into optional "enabled"
state"), /sys/module/apparmor/parameters/enabled would show "Y" or "N"
since it was using the "bool" handler. After being changed to "int",
this switched to "1" or "0", breaking the userspace AppArmor detection
of dbus-broker. This restores the Y/N output while keeping the LSM
infrastructure happy.
Andrea Righi [Fri, 5 Apr 2019 07:31:53 +0000 (09:31 +0200)]
openvswitch: fix flow actions reallocation
BugLink: https://bugs.launchpad.net/bugs/1813244
The flow action buffer can be resized if it's not big enough to contain
all the requested flow actions. However, this resize doesn't take into
account the new requested size, the buffer is only increased by a factor
of 2x. This might be not enough to contain the new data, causing a
buffer overflow, for example:
Fix by making sure the new buffer is properly resized to contain all the
requested data.
BugLink: https://bugs.launchpad.net/bugs/1813244 Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f28cd2af22a0c134e4aa1c64a70f70d815d473fb) Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Acked-by: Juerg Haefliger <juergh@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1820187
In order to avoid frequent system interrupts when sending and
receiving packets. we replace disable_irq_nosync/enable_irq
with hinic_set_msix_state(), hinic_set_msix_state is used to
access memory mapped hinic devices.
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 905b464ad9008905db099f90ae20f373c7051804) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1823186
Shiftfs currently only passes through a few ioctl()s to the underlay. These
are ioctl()s that are generally considered safe. Doing it for random
ioctl()s would be a security issue. Permissions for ioctl()s are not
checked before the filesystem gets involved so if we were to override
credentials we e.g. could do a btrfs tree search in the underlay which we
normally wouldn't be allowed to do.
However, the btrfs filesystem allows unprivileged users to perform various
operations through its ioctl() interface. With shiftfs these ioctl() are
currently not working. To not regress users that expect btrfs ioctl()s to
work in unprivileged containers we can create a whitelist of ioctl()s that
we allow to go through to the underlay and for which we also switch
credentials.
The main problem is how we switch credentials. Since permissions checks for
ioctl()s are
done by the actual file system and not by the vfs this would mean that any
additional capable(<cap>)-based checks done by the filesystem would
unconditonally pass after we switch credentials. So to make credential
switching safe we drop *all* capabilities when switching credentials. This
means that only inode-based permission checks will pass.
Btrfs also allows unprivileged users to delete snapshots when the
filesystem is mounted with user_subvol_rm_allowed mount option or if the
the callers is capable(CAP_SYS_ADMIN). The latter should never be the case
with unprivileged users. To make sure we only allow removal of snapshots in
the former case we drop all capabilities (see above) when switching
credentials.
Additonally, btrfs allows the creation of snapshots. To make this work we
need to be (too) clever. When doing snapshots btrfs requires that an fd to
the directory the snapshot is supposed to be created in be passed along.
This fd obviously references a shiftfs file and as such a shiftfs dentry
and inode. This will cause btrfs to yell EXDEV. To circumnavigate this
problem we need to silently temporarily replace the passed in fd with an fd
that refers to a file that references a btrfs dentry and inode.
BugLink: https://bugs.launchpad.net/bugs/1823186
/* Introduction */
The shiftfs filesystem is implemented as a stacking filesystem. Since it is
a stacking filesystem it shares concepts with overlayfs and ecryptfs.
Usually, shiftfs will be stacked upon another filesystem. The filesystem on
top - shiftfs - is referred to as "upper filesystem" or "overlay" and the
filesystem it is stacked upon is referred to as "lower filesystem" or
"underlay".
/* Marked and Unmarked shiftfs mounts */
To use shiftfs it is necessary that a given mount is marked as shiftable via
the "mark" mount option. Any mount of shiftfs without the "mark" mount option
not on top of a shiftfs mount with the "mark" mount option will be refused with
EPERM.
After a marked shiftfs mount has been performed other shiftfs mounts
referencing the marked shiftfs mount can be created. These secondary shiftfs
mounts are usually what are of interest.
The marked shiftfs mount will take a reference to the underlying mountpoint of
the directory it is marking as shiftable. Any unmarked shiftfts mounts
referencing this marked shifts mount will take a second reference to this
directory as well. This ensures that the underlying marked shiftfs mount can be
unmounted thereby dropping the reference to the underlying directory without
invalidating the mountpoint of said directory since the non-marked shiftfs
mount still holds another reference to it.
/* Stacking Depth */
Shiftfs tries to keep the stack as flat as possible to avoid hitting the
kernel enforced filesystem stacking limit.
/* Permission Model */
When the mark shiftfs mount is created shiftfs will record the credentials of
the creator of the super block and stash it in the super block. When other
non-mark shiftfs mounts are created that reference the mark shiftfs mount they
will stash another reference to the creators credentials. Before calling into
the underlying filesystem shiftfs will switch to the creators credentials and
revert to the original credentials after the underlying filesystem operation
returns.
/* Mount Options */
- mark
When set the mark mount option indicates that the mount in question is
allowed to be shifted. Since shiftfs it mountable in by user namespace root
non-initial user namespace this mount options ensures that the system
administrator has decided that the marked mount is safe to be shifted.
To mark a mount as shiftable CAP_SYS_ADMIN in the user namespace is required.
- passthrough={0,1,2,3}
This mount options functions as a bitmask. When set to a non-zero value
shiftfs will try to act as an invisible shim sitting on top of the
underlying filesystem.
- 1: Shifts will report the filesystem type of the underlay for stat-like
system calls.
- 2: Shiftfs will passthrough whitelisted ioctl() to the underlay.
- 3: Shiftfs will both use 1 and 2.
Note that mount options on a marked mount cannot be changed.
/* Extended Attributes */
Shiftfs will make sure to translate extended attributes.
/* Inodes Numbers */
Shiftfs inodes numbers are copied up from the underlying filesystem, i.e.
shiftfs inode numbers will be identical to the corresponding underlying
filesystem's inode numbers. This has the advantage that inotify and friends
should work out of the box.
(In essence, shiftfs is nothing but a 1:1 mirror of the underlying filesystem's
dentries and inodes.)
/* Device Support */
Shiftfs only supports the creation of pipe and socket devices. Character and
block devices cannot be created through shiftfs.
James Bottomley [Thu, 4 Apr 2019 13:39:11 +0000 (15:39 +0200)]
shiftfs: uid/gid shifting bind mount
BugLink: https://bugs.launchpad.net/bugs/1823186
This allows any subtree to be uid/gid shifted and bound elsewhere. It
does this by operating simlarly to overlayfs. Its primary use is for
shifting the underlying uids of filesystems used to support
unpriviliged (uid shifted) containers. The usual use case here is
that the container is operating with an uid shifted unprivileged root
but sometimes needs to make use of or work with a filesystem image
that has root at real uid 0.
The mechanism is to allow any subordinate mount namespace to mount a
shiftfs filesystem (by marking it FS_USERNS_MOUNT) but only allowing
it to mount marked subtrees (using the -o mark option as root). Once
mounted, the subtree is mapped via the super block user namespace so
that the interior ids of the mounting user namespace are the ids
written to the filesystem.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[ saf: use designated initializers for path declarations to fix errors
with struct randomization ] Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
[update: port to 5.0] Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
We don't need to send firmware data asynchronously, much simpler is just
use synchronous usb_bulk_msg().
[ stable note: this patch was originally developed as cleanup, but it
remove incorrect usage of page_frag_alloc(): alloc more than PAGE_SIZE
and create not ARCH_DMA_MINALIGN dma buffers needed at least for
performance reason. Was tested on 5.0 and 4.20, see
https://bugzilla.kernel.org/show_bug.cgi?id=202673 and
https://bugzilla.kernel.org/show_bug.cgi?id=202241 ]
Tested-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
When kzalloc() fails in push_stack(), free_verifier_state() will free
current verifier state. As push_stack() returns, dst_reg was restored
if ptr_is_dst_reg is false. However, as member of the cur_state,
dst_reg is also freed, and error occurs when dereferencing dst_reg.
Simply fix it by testing ret of push_stack() before restoring dst_reg.
Fixes: 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic") Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Most (all?) x86 platforms provide a port IO based reset mechanism, e.g.
OUT 92h or CF9h. Userspace may emulate said mechanism, i.e. reset a
vCPU in response to KVM_EXIT_IO, without explicitly announcing to KVM
that it is doing a reset, e.g. Qemu jams vCPU state and resumes running.
To avoid corruping %rip after such a reset, commit 0967b7bf1c22 ("KVM:
Skip pio instruction when it is emulated, not executed") changed the
behavior of PIO handlers, i.e. today's "fast" PIO handling to skip the
instruction prior to exiting to userspace. Full emulation doesn't need
such tricks becase re-emulating the instruction will naturally handle
%rip being changed to point at the reset vector.
Updating %rip prior to executing to userspace has several drawbacks:
- Userspace sees the wrong %rip on the exit, e.g. if PIO emulation
fails it will likely yell about the wrong address.
- Single step exits to userspace for are effectively dropped as
KVM_EXIT_DEBUG is overwritten with KVM_EXIT_IO.
- Behavior of PIO emulation is different depending on whether it
goes down the fast path or the slow path.
Rather than skip the PIO instruction before exiting to userspace,
snapshot the linear %rip and cancel PIO completion if the current
value does not match the snapshot. For a 64-bit vCPU, i.e. the most
common scenario, the snapshot and comparison has negligible overhead
as VMCS.GUEST_RIP will be cached regardless, i.e. there is no extra
VMREAD in this case.
All other alternatives to snapshotting the linear %rip that don't
rely on an explicit reset announcenment suffer from one corner case
or another. For example, canceling PIO completion on any write to
%rip fails if userspace does a save/restore of %rip, and attempting to
avoid that issue by canceling PIO only if %rip changed then fails if PIO
collides with the reset %rip. Attempting to zero in on the exact reset
vector won't work for APs, which means adding more hooks such as the
vCPU's MP_STATE, and so on and so forth.
Checking for a linear %rip match technically suffers from corner cases,
e.g. userspace could theoretically rewrite the underlying code page and
expect a different instruction to execute, or the guest hardcodes a PIO
reset at 0xfffffff0, but those are far, far outside of what can be
considered normal operation.
Fixes: 432baf60eee3 ("KVM: VMX: use kvm_fast_pio_in for handling IN I/O") Cc: <stable@vger.kernel.org> Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The CPUID flag ARCH_CAPABILITIES is unconditioinally exposed to host
userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES
regardless of hardware support under the pretense that KVM fully
emulates MSR_IA32_ARCH_CAPABILITIES. Unfortunately, only VMX hosts
handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS
also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts).
Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so
that it's emulated on AMD hosts.
Fixes: 1eaafe91a0df4 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported") Cc: stable@vger.kernel.org Reported-by: Xiaoyao Li <xiaoyao.li@linux.intel.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
KVM's API requires thats ioctls must be issued from the same process
that created the VM. In other words, userspace can play games with a
VM's file descriptors, e.g. fork(), SCM_RIGHTS, etc..., but only the
creator can do anything useful. Explicitly reject device ioctls that
are issued by a process other than the VM's creator, and update KVM's
API documentation to extend its requirements to device ioctls.
Fixes: 852b6d57dc7f ("kvm: add device control API") Cc: <stable@vger.kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The SMT disable 'nosmt' command line argument is not working properly when
CONFIG_HOTPLUG_CPU is disabled. The teardown of the sibling CPUs which are
required to be brought up due to the MCE issues, cannot work. The CPUs are
then kept in a half dead state.
As the 'nosmt' functionality has become popular due to the speculative
hardware vulnerabilities, the half torn down state is not a proper solution
to the problem.
Enforce CONFIG_HOTPLUG_CPU=y when SMP is enabled so the full operation is
possible.
Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Konrad Wilk <konrad.wilk@oracle.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Mukesh Ojha <mojha@codeaurora.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Rik van Riel <riel@surriel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Micheal Kelley <michael.h.kelley@microsoft.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190326163811.598166056@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Tianyu reported a crash in a CPU hotplug teardown callback when booting a
kernel which has CONFIG_HOTPLUG_CPU disabled with the 'nosmt' boot
parameter.
It turns out that the SMP=y CONFIG_HOTPLUG_CPU=n case has been broken
forever in case that a bringup callback fails. Unfortunately this issue was
not recognized when the CPU hotplug code was reworked, so the shortcoming
just stayed in place.
When a bringup callback fails, the CPU hotplug code rolls back the
operation and takes the CPU offline.
The 'nosmt' command line argument uses a bringup failure to abort the
bringup of SMT sibling CPUs. This partial bringup is required due to the
MCE misdesign on Intel CPUs.
With CONFIG_HOTPLUG_CPU=y the rollback works perfectly fine, but
CONFIG_HOTPLUG_CPU=n lacks essential mechanisms to exercise the low level
teardown of a CPU including the synchronizations in various facilities like
RCU, NOHZ and others.
As a consequence the teardown callbacks which must be executed on the
outgoing CPU within stop machine with interrupts disabled are executed on
the control CPU in interrupt enabled and preemptible context causing the
kernel to crash and burn. The pre state machine code has a different
failure mode which is more subtle and resulting in a less obvious use after
free crash because the control side frees resources which are still in use
by the undead CPU.
But this is not a x86 only problem. Any architecture which supports the
SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less
likely to be triggered because in 99.99999% of the cases all bringup
callbacks succeed.
The easy solution of making HOTPLUG_CPU mandatory for SMP is not working on
all architectures as the following architectures have either no hotplug
support at all or not all subarchitectures support it:
Crashing the kernel in such a situation is not an acceptable state
either.
Implement a minimal rollback variant by limiting the teardown to the point
where all regular teardown callbacks have been invoked and leave the CPU in
the 'dead' idle state. This has the following consequences:
- the CPU is brought down to the point where the stop_machine takedown
would happen.
- the CPU stays there forever and is idle
- The CPU is cleared in the CPU active mask, but not in the CPU online
mask which is a legit state.
- Interrupts are not forced away from the CPU
- All facilities which only look at online mask would still see it, but
that is the case during normal hotplug/unplug operations as well. It's
just a (way) longer time frame.
This will expose issues, which haven't been exposed before or only seldom,
because now the normally transient state of being non active but online is
a permanent state. In testing this exposed already an issue vs. work queues
where the vmstat code schedules work on the almost dead CPU which ends up
in an unbound workqueue and triggers 'preemtible context' warnings. This is
not a problem of this change, it merily exposes an already existing issue.
Still this is better than crashing fully without a chance to debug it.
This is mainly thought as workaround for those architectures which do not
support HOTPLUG_CPU. All others should enforce HOTPLUG_CPU for SMP.
Fixes: 2e1a3483ce74 ("cpu/hotplug: Split out the state walk into functions") Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Konrad Wilk <konrad.wilk@oracle.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Mukesh Ojha <mojha@codeaurora.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Rik van Riel <riel@surriel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Micheal Kelley <michael.h.kelley@microsoft.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190326163811.503390616@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The rework of the watchdog core to use cpu_stop_work broke the watchdog
cpumask on CPU hotplug.
The watchdog_enable/disable() functions are now called unconditionally from
the hotplug callback, i.e. even on CPUs which are not in the watchdog
cpumask. As a consequence the watchdog can become unstoppable.
Only invoke them when the plugged CPU is in the watchdog cpumask.
Fixes: 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work") Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903262245490.1789@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
On pseries, TLB multihit are reported as D-Cache Multihit. This is because
the wrongly populated mc_err_types[] array. Per PAPR, TLB error type is 0x04
and mc_err_types[4] points to "D-Cache" instead of "TLB" string. Fixup the
mc_err_types[] array.
It's not OK for memcmp to read past the end of the source or
destination buffers if that would cross a page boundary, because we
don't know that the next page is mapped.
As pointed out by Segher, we can read past the end of the source or
destination as long as we don't cross a 4K boundary, because that's
our minimum page size on all platforms.
The bug is in the code at the .Lcmp_rest_lt8bytes label. When we get
there we know that s1 is 8-byte aligned and we have at least 1 byte to
read, so a single 8-byte load won't read past the end of s1 and cross
a page boundary.
But we have to be more careful with s2. So check if it's within 8
bytes of a 4K boundary and if so go to the byte-by-byte loop.
Fixes: 2d9ee327adce ("powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()") Cc: stable@vger.kernel.org # v4.19+ Reported-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Tested-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
In cpu_to_drc_index() in the case when FW_FEATURE_DRC_INFO is absent,
we currently use of_read_property() to obtain the pointer to the array
corresponding to the property "ibm,drc-indexes". The elements of this
array are of type __be32, but are accessed without any conversion to
the OS-endianness, which is buggy on a Little Endian OS.
Fix this by using of_property_read_u32_index() accessor function to
safely read the elements of the array.
Fixes: e83636ac3334 ("pseries/drc-info: Search DRC properties for CPU indexes") Cc: stable@vger.kernel.org # v4.16+ Reported-by: Pavithra R. Prakash <pavrampu@in.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
[mpe: Make the WARN_ON a WARN_ON_ONCE so it's not retriggerable] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
A TSC packet can slip past MTC packets so that the timestamp appears to
go backwards. One estimate is that can be up to about 40 CPU cycles,
which is certainly less than 0x1000 TSC ticks, but accept slippage an
order of magnitude more to be on the safe side.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Fixes: 79b58424b821c ("perf tools: Add Intel PT support for decoding MTC packets") Link: http://lkml.kernel.org/r/20190325135135.18348-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Our MIPS 1004Kc SoCs were seeing random userspace crashes with SIGILL
and SIGSEGV that could not be traced back to a userspace code bug. They
had all the magic signs of an I/D cache coherency issue.
Now recently we noticed that the /proc/sys/vm/compact_memory interface
was quite efficient at provoking this class of userspace crashes.
Studying the code in mm/migrate.c there is a distinction made between
migrating a page that is mapped at the instant of migration and one that
is not mapped. Our problem turned out to be the non-mapped pages.
For the non-mapped page the code performs a copy of the page content and
all relevant meta-data of the page without doing the required D-cache
maintenance. This leaves dirty data in the D-cache of the CPU and on
the 1004K cores this data is not visible to the I-cache. A subsequent
page-fault that triggers a mapping of the page will happily serve the
process with potentially stale code.
What about ARM then, this bug should have seen greater exposure? Well
ARM became immune to this flaw back in 2010, see commit c01778001a4f
("ARM: 6379/1: Assume new page cache pages have dirty D-cache").
My proposed fix moves the D-cache maintenance inside move_to_new_page to
make it common for both cases.
Link: http://lkml.kernel.org/r/20190315083502.11849-1-larper@axis.com Fixes: 97ee0524614 ("flush cache before installing new page at migraton") Signed-off-by: Lars Persson <larper@axis.com> Reviewed-by: Paul Burton <paul.burton@mips.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Due to has_unmovable_pages() taking an incorrect irqsave flag instead of
the isolation flag in set_migratetype_isolate(), there are issues with
HWPOSION and error reporting where dump_page() is not called when there
is an unmovable page.
Link: http://lkml.kernel.org/r/20190320204941.53731-1-cai@lca.pw Fixes: d381c54760dc ("mm: only report isolation failures when offlining memory") Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Qian Cai <cai@lca.pw> Cc: <stable@vger.kernel.org> [5.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
When start_isolate_page_range() returned -EBUSY in __offline_pages(), it
calls memory_notify(MEM_CANCEL_OFFLINE, &arg) with an uninitialized
"arg". As the result, it triggers warnings below. Also, it is only
necessary to notify MEM_CANCEL_OFFLINE after MEM_GOING_OFFLINE.
While debugging something, I added a dump_page() into do_swap_page(),
and I got the splat from below. The issue happens when dereferencing
mapping->host in __dump_page():
Swap address space does not contain an inode information, and so
mapping->host equals NULL.
Although the dump_page() call was added artificially into
do_swap_page(), I am not sure if we can hit this from any other path, so
it looks worth fixing it. We can easily do that by checking
mapping->host first.
Link: http://lkml.kernel.org/r/20190318072931.29094-1-osalvador@suse.de Fixes: 1c6fb1d89e73c ("mm: print more information about mapping in __dump_page") Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
When MPOL_MF_STRICT was specified and an existing page was already on a
node that does not follow the policy, mbind() should return -EIO. But
commit 6f4576e3687b ("mempolicy: apply page table walker on
queue_pages_range()") broke the rule.
And commit c8633798497c ("mm: mempolicy: mbind and migrate_pages support
thp migration") didn't return the correct value for THP mbind() too.
If MPOL_MF_STRICT is set, ignore vma_migratable() to make sure it
reaches queue_pages_to_pte_range() or queue_pages_pmd() to check if an
existing page was already on a node that does not follow the policy.
And, non-migratable vma may be used, return -EIO too if MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL was specified.
Tested with https://github.com/metan-ucw/ltp/blob/master/testcases/kernel/syscalls/mbind/mbind02.c
[akpm@linux-foundation.org: tweak code comment] Link: http://lkml.kernel.org/r/1553020556-38583-1-git-send-email-yang.shi@linux.alibaba.com Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()") Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Reported-by: Cyril Hrubis <chrubis@suse.cz> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name> Acked-by: Rafael Aquini <aquini@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
IOMMUs using ARMv7 short-descriptor format require page tables (level 1
and 2) to be allocated within the first 4GB of RAM, even on 64-bit
systems.
For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32 is
defined (e.g. on arm64 platforms).
For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32. Note that
we do not explicitly pass GFP_DMA[32] to kmem_cache_zalloc, as this is
not strictly necessary, and would cause a warning in mm/sl*b.c, as we
did not update GFP_SLAB_BUG_MASK.
Also, print an error when the physical address does not fit in
32-bit, to make debugging easier in the future.
Link: http://lkml.kernel.org/r/20181210011504.122604-3-drinkcat@chromium.org Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32") Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: Huaisheng Ye <yehs1@lenovo.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sasha Levin <Alexander.Levin@microsoft.com> Cc: Tomasz Figa <tfiga@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Yong Wu <yong.wu@mediatek.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Patch series "iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables",
v6.
This is a followup to the discussion in [1], [2].
IOMMUs using ARMv7 short-descriptor format require page tables (level 1
and 2) to be allocated within the first 4GB of RAM, even on 64-bit
systems.
For L1 tables that are bigger than a page, we can just use
__get_free_pages with GFP_DMA32 (on arm64 systems only, arm would still
use GFP_DMA).
For L2 tables that only take 1KB, it would be a waste to allocate a full
page, so we considered 3 approaches:
1. This series, adding support for GFP_DMA32 slab caches.
2. genalloc, which requires pre-allocating the maximum number of L2 page
tables (4096, so 4MB of memory).
3. page_frag, which is not very memory-efficient as it is unable to reuse
freed fragments until the whole page is freed. [3]
This series is the most memory-efficient approach.
stable@ note:
We confirmed that this is a regression, and IOMMU errors happen on 4.19
and linux-next/master on MT8173 (elm, Acer Chromebook R13). The issue
most likely starts from commit ad67f5a6545f ("arm64: replace ZONE_DMA
with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number of Mediatek
platforms (and maybe others?).
IOMMUs using ARMv7 short-descriptor format require page tables to be
allocated within the first 4GB of RAM, even on 64-bit systems. On arm64,
this is done by passing GFP_DMA32 flag to memory allocation functions.
For IOMMU L2 tables that only take 1KB, it would be a waste to allocate
a full page using get_free_pages, so we considered 3 approaches:
1. This patch, adding support for GFP_DMA32 slab caches.
2. genalloc, which requires pre-allocating the maximum number of L2
page tables (4096, so 4MB of memory).
3. page_frag, which is not very memory-efficient as it is unable
to reuse freed fragments until the whole page is freed.
This change makes it possible to create a custom cache in DMA32 zone using
kmem_cache_create, then allocate memory using kmem_cache_alloc.
We do not create a DMA32 kmalloc cache array, as there are currently no
users of kmalloc(..., GFP_DMA32). These calls will continue to trigger a
warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK.
This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32
kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and
unnecessary).
Link: http://lkml.kernel.org/r/20181210011504.122604-2-drinkcat@chromium.org Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Sasha Levin <Alexander.Levin@microsoft.com> Cc: Huaisheng Ye <yehs1@lenovo.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Yong Wu <yong.wu@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Tomasz Figa <tfiga@google.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded
memory to zones until online") introduced move_pfn_range_to_zone() which
calls memmap_init_zone() during onlining a memory block.
memmap_init_zone() will reset pagetype flags and makes migrate type to
be MOVABLE.
However, in __offline_pages(), it also call undo_isolate_page_range()
after offline_isolated_pages() to do the same thing. Due to commit 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") changed
__first_valid_page() to skip offline pages, undo_isolate_page_range()
here just waste CPU cycles looping around the offlining PFN range while
doing nothing, because __first_valid_page() will return NULL as
offline_isolated_pages() has already marked all memory sections within
the pfn range as offline via offline_mem_sections().
Also, after calling the "useless" undo_isolate_page_range() here, it
reaches the point of no returning by notifying MEM_OFFLINE. Those pages
will be marked as MIGRATE_MOVABLE again once onlining. The only thing
left to do is to decrease the number of isolated pageblocks zone counter
which would make some paths of the page allocation slower that the above
commit introduced.
Even if alloc_contig_range() can be used to isolate 16GB-hugetlb pages
on ppc64, an "int" should still be enough to represent the number of
pageblocks there. Fix an incorrect comment along the way.
[cai@lca.pw: v4] Link: http://lkml.kernel.org/r/20190314150641.59358-1-cai@lca.pw Link: http://lkml.kernel.org/r/20190313143133.46200-1-cai@lca.pw Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [4.13+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Currently there is no check on platform_get_irq() return value
in case it fails, hence never actually reporting any errors and
causing unexpected behavior when using such value as argument
for function regmap_irq_get_virq().
Fix this by adding a proper check, a message error and return
*irq* in case platform_get_irq() fails.
Addresses-Coverity-ID: 1443899 ("Improper use of negative value") Fixes: d2061f9cc32d ("usb: typec: add driver for Intel Whiskey Cove PMIC USB Type-C PHY") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
PD 2.0 sinks are supposed to accept src-capabilities with a 3.0 header and
simply ignore any src PDOs which the sink does not understand such as PPS
but some 2.0 sinks instead ignore the entire PD_DATA_SOURCE_CAP message,
causing contract negotiation to fail.
This commit fixes such sinks not working by re-trying the contract
negotiation with PD-2.0 source-caps messages if we don't have a contract
after PD_N_HARD_RESET_COUNT hard-reset attempts.
The problem fixed by this commit was noticed with a Type-C to VGA dongle.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
When the kernel is compiled with preemption enabled, the URB completion
handler can run in parallel with the work responsible for waking up the
tty layer. If the URB handler sets the EVENT_TTY_WAKEUP bit during the
call to tty_port_tty_wakeup() to signal that there is room for additional
input, it will be cleared at the end of this call. As a result, TX traffic
on the upper layer will be blocked.
This can be seen with a kernel configured with CONFIG_PREEMPT, and a fast
modem connected with PPP running over a USB CDC-ACM port.
Use test_and_clear_bit() instead, which ensures that each wakeup requested
by the URB completion code will trigger a call to tty_port_tty_wakeup().
Commit 2f31a67f01a8 ("usb: xhci: Prevent bus suspend if a port connect
change or polling state is detected") was intended to prevent ports that
were still link training from being forced to U3 suspend state mid
enumeration.
This solved enumeration issues for devices with slow link training.
Turns out some devices are stuck in the link training/polling state,
and thus that patch will prevent suspend completely for these devices.
This is seen with USB3 card readers in some MacBooks.
Instead of preventing suspend, give some time to complete the link
training. On successful training the port will end up as connected
and enabled.
If port instead is stuck in link training the bus suspend will continue
suspending after 360ms (10 * 36ms) timeout (tPollingLFPSTimeout).
Original patch was sent to stable, this one should go there as well
Fixes: 2f31a67f01a8 ("usb: xhci: Prevent bus suspend if a port connect change or polling state is detected") Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
A suspended SS port in U3 link state will go to U0 when resumed, but
can almost immediately after that enter U1 or U2 link power save
states before host controller driver reads the port status.
Host controller driver only checks for U0 state, and might miss
the finished resume, leaving flags unclear and skip notifying usb
code of the wake.
Add U1 and U2 to the possible link states when checking for finished
port resume.
When plugging BUFFALO LUA4-U3-AGT USB3.0 to Gigabit Ethernet LAN
Adapter, warning messages filled up dmesg.
[ 101.098287] xhci-hcd ee000000.usb: WARN Successful completion on short TX for slot 1 ep 4: needs XHCI_TRUST_TX_LENGTH quirk?
[ 101.117463] xhci-hcd ee000000.usb: WARN Successful completion on short TX for slot 1 ep 4: needs XHCI_TRUST_TX_LENGTH quirk?
[ 101.136513] xhci-hcd ee000000.usb: WARN Successful completion on short TX for slot 1 ep 4: needs XHCI_TRUST_TX_LENGTH quirk?
Adding the XHCI_TRUST_TX_LENGTH quirk resolves the issue.
There are cases where multiple device tree nodes point to the
same phy node by means of the "phys" property, but we should
only consider those nodes that are marked as available rather
than just any node.
Fixes: 98bfb3946695 ("usb: of: add an api to get dr_mode by the phy node") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
In f_hidg_write() the write_spinlock is acquired before calling
usb_ep_queue() which causes a deadlock when dummy_hcd is being used.
This is because dummy_queue() callbacks into f_hidg_req_complete() which
tries to acquire the same spinlock. This is (part of) the backtrace when
the deadlock occurs:
0xffffffffc06b1410 in f_hidg_req_complete
0xffffffffc06a590a in usb_gadget_giveback_request
0xffffffffc06cfff2 in dummy_queue
0xffffffffc06a4b96 in usb_ep_queue
0xffffffffc06b1eb6 in f_hidg_write
0xffffffff8127730b in __vfs_write
0xffffffff812774d1 in vfs_write
0xffffffff81277725 in SYSC_write
Fix this by releasing the write_spinlock before calling usb_ep_queue()
Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com> Tested-by: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: stable@vger.kernel.org # 4.11+ Fixes: 749494b6bdbb ("usb: gadget: f_hid: fix: Move IN request allocation to set_alt()") Signed-off-by: Radoslav Gerganov <rgerganov@vmware.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
While only the first PHY supports mode switching, the remaining PHYs
work in USB host mode. They should support set_mode with mode=USB_HOST
instead of failing. This is especially needed now that the USB core does
set_mode for all USB ports, which was added in commit b97a31348379 ("usb:
core: comply to PHY framework").
Make set_mode with mode=USB_HOST a no-op instead of failing for the
non-OTG USB PHYs.
Current code test wrong value so it does not verify if the written
data is correctly read back. Fix it.
Also make it return -EPERM if read value does not match written bit,
just like it done for adnp_gpio_direction_output().
This patch fixes the PORT_SYNC_MODE_MASTER_SELECT macro
to correctly do the left shifting to set the port sync
master select correctly.
I have tested this fix on ICL.
When MI_FLUSH_DW post write hw status page in index mode, the index
value is in dword step and turned into address offset in cmd dword1.
As status page size is 4K, so can't exceed that.
This fixed upper bound check in cmd parser code which incorrectly
stopped VM for reason of invalid MI_FLUSH_DW write index.
v2:
- Fix upper bound as 4K page size because index value is address offset.
If drm_gem_handle_create() fails in vkms_gem_create(), then the
vkms_gem_object is freed twice: once when the reference is dropped by
drm_gem_object_put_unlocked(), and again by the extra calls to
drm_gem_object_release() and kfree().
Fix it by skipping the second release and free.
This bug was originally found in the vgem driver by syzkaller using
fault injection, but I noticed it's also present in the vkms driver.
Fixes: 559e50fd34d1 ("drm/vkms: Add dumb operations") Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com> Cc: Haneen Mohammed <hamohammed.sa@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com> Signed-off-by: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190226220858.214438-1-ebiggers@kernel.org Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
If drm_gem_handle_create() fails in vgem_gem_create(), then the
drm_vgem_gem_object is freed twice: once when the reference is dropped
by drm_gem_object_put_unlocked(), and again by __vgem_gem_destroy().
This was hit by syzkaller using fault injection.
Fix it by skipping the second free.
Reported-by: syzbot+e73f2fb5ed5a5df36d33@syzkaller.appspotmail.com Fixes: af33a9190d02 ("drm/vgem: Enable dmabuf import interfaces") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Laura Abbott <labbott@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190226214451.195123-1-ebiggers@kernel.org Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The ACPI specification states that if the "Guaranteed Performance
Register" is not implemented, the OSPM assumes guaranteed performance
to always be equal to nominal performance.
So for invalid or unimplemented guaranteed performance register, use
nominal performance as guaranteed performance.
This change will fall back to nominal_perf when guranteed_perf is
invalid. If nominal_perf is also invalid or not present, fall back
to the existing implementation, which is to read from HWP Capabilities
MSR.
Fixes: 86d333a8cc7f ("cpufreq: intel_pstate: Add base_frequency attribute") Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 4.20+ <stable@vger.kernel.org> # 4.20+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
We now wrap sbitmap waitqueues in an active counter, so we can avoid
iterating wakeups unless we have waiters there. This works as long as
everyone that's manipulating the waitqueues use the proper helpers. For
the tag wait case for shared tags, however, we add ourselves to the
waitqueue without incrementing/decrementing the ->ws_active count. This
means that wakeups can take a long time to happen.
Fix this by manually doing the inc/dec as needed for the wait queue
handling.
Makoto report a below KASAN error: zram does out-of-bounds read. Because
strscpy copies from source up to count bytes unconditionally. It could
cause out-of-bounds read on next object in slab.
To prevent it, use strlcpy which checks source's length automatically.
BUG: KASAN: slab-out-of-bounds in strscpy+0x68/0x154
Read of size 8 at addr ffffffc0c3495a00 by task system_server/1314
..
Call trace:
strscpy+0x68/0x154
idle_store+0xc4/0x34c
dev_attr_store+0x50/0x6c
sysfs_kf_write+0x98/0xb4
kernfs_fop_write+0x198/0x260
__vfs_write+0x10c/0x338
vfs_write+0x114/0x238
SyS_write+0xc8/0x168
__sys_trace_return+0x0/0x4
Allocated by task 1314:
__kmalloc+0x280/0x318
kernfs_fop_write+0xac/0x260
__vfs_write+0x10c/0x338
vfs_write+0x114/0x238
SyS_write+0xc8/0x168
__sys_trace_return+0x0/0x4
The buggy address belongs to the object at ffffffc0c3495a00
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 0 bytes inside of
128-byte region [ffffffc0c3495a00, ffffffc0c3495a80)
The buggy address belongs to the page:
page:ffffffbf030d2500 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
flags: 0x4000000000010200(slab|head)
page dumped because: kasan: bad access detected
Memory state around the buggy address: ffffffc0c3495900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0c3495980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffffc0c3495a00: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffffffc0c3495a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffffffc0c3495b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
A new dir entry can be created in get_subdir and its 'header->parent' is
set to NULL. Only after insert_header success, it will be set to 'dir',
otherwise 'header->parent' is set to NULL and drop_sysctl_table is called.
However in err handling path of get_subdir, drop_sysctl_table also be
called on 'new->header' regardless its value of parent pointer. Then
put_links is called, which triggers NULL-ptr deref when access member of
header->parent.
In fact we have multiple error paths which call drop_sysctl_table() there,
upon failure on insert_links() we also call drop_sysctl_table().And even
in the successful case on __register_sysctl_table() we still always call
drop_sysctl_table().This patch fix it.
Link: http://lkml.kernel.org/r/20190314085527.13244-1-yuehaibing@huawei.com Fixes: 0e47c99d7fe25 ("sysctl: Replace root_list with links between sysctl_table_sets") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> [3.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
As per the ACPI specification, "Guaranteed Performance Register" is
a "Buffer" field and it cannot be "Integer", so treat the "Integer"
type for "Guaranteed Performance Register" field as invalid and
ignore its value in that case.
Also save one cpc_read() call when "Guaranteed Performance Register"
is not present, which means a register defined as:
"Register(SystemMemory, 0, 0, 0, 0)".
Fixes: 29523f095397 ("ACPI / CPPC: Add support for guaranteed performance") Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 4.20+ <stable@vger.kernel.org> # 4.20+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>