With the need to discourage use of less secure dialect, SMB1 (CIFS),
we temporarily upgraded the dialect to SMB3 in 4.13, but since there
are various servers which only support SMB2.1 (2.1 is more secure
than CIFS/SMB1) but not optimal for a default dialect - add support
for multidialect negotiation. cifs.ko will now request SMB2.1
or later (ie SMB2.1 or SMB3.0, SMB3.02) and the server will
pick the latest most secure one it can support.
In addition since we are sending multidialect negotiate, add
support for secure negotiate to validate that a man in the
middle didn't downgrade us.
Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
bsg-lib now embeddeds the job structure into the request, and
req->special can't be used anymore.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Commit 0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in
/proc/PID/stat") stopped reporting eip/esp because it is
racy and dangerous for executing tasks. The comment adds:
As far as I know, there are no use programs that make any
material use of these fields, so just get rid of them.
However, existing userspace core-dump-handler applications (for
example, minicoredumper) are using these fields since they
provide an excellent cross-platform interface to these valuable
pointers. So that commit introduced a user space visible
regression.
Partially revert the change and make the readout possible for
tasks with the proper permissions and only if the target task
has the PF_DUMPCORE flag set.
Fixes: 0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in> /proc/PID/stat") Reported-by: Marco Felsch <marco.felsch@preh.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Tycho Andersen <tycho.andersen@canonical.com> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Borislav Petkov <bp@alien8.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linux API <linux-api@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/87poatfwg6.fsf@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Commit abebfbe2f731 ("dm: add ->flush() dax operation support") is
buggy. A DM device may be composed of multiple underlying devices and
all of them need to be flushed. That commit just routes the flush
request to the first device and ignores the other devices.
It could be fixed by adding more complex logic to the device mapper. But
there is only one implementation of the method pmem_dax_ops->flush - that
is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we
don't need the pmem_dax_ops->flush abstraction at all, we can call
arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush
can't ever reach anything different from arch_wb_cache_pmem().
It should be also pointed out that for some uses of persistent memory it
is needed to flush only a very small amount of data (such as 1 cacheline),
and it would be overkill if we go through that device mapper machinery for
a single flushed cache line.
Fix this by removing the pmem_dax_ops->flush abstraction and call
arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device
mapper code that forwards the flushes.
Fixes: abebfbe2f731 ("dm: add ->flush() dax operation support") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
We want to catch command execution errors when resetting the device, so
propagate errors from the Set Features when setting up the host memory
buffer. We keep ignoring memory allocation failures, as the spec
clearly says that the controller must work without a host memory buffer.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The initial chunk size for host memory buffer allocation is currently
PAGE_SIZE << MAX_ORDER. MAX_ORDER order allocation is usually failed
without CONFIG_DMA_CMA. So the HMB allocation is retried with chunk size
PAGE_SIZE << (MAX_ORDER - 1) in general, but there is no problem if the
retry allocation works correctly.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
[hch: rebased] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
nvme_alloc_host_mem currently contains two loops that are interwinded,
and the outer retry loop turns out to be broken. Fix this by untangling
the two.
Based on a report an initial patch from Akinobu Mita.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Akinobu Mita <akinobu.mita@gmail.com> Tested-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
There is a race that cause cifs reconnect in cifs_mount,
- cifs_mount
- cifs_get_tcp_session
- [ start thread cifs_demultiplex_thread
- cifs_read_from_socket: -ECONNABORTED
- DELAY_WORK smb2_reconnect_server ]
- cifs_setup_session
- [ smb2_reconnect_server ]
auth_key.response was allocated in cifs_setup_session, and
will release when the session destoried. So when session re-
connect, auth_key.response should be check and released.
Tested with my system:
CIFS VFS: Free previous auth_key.response = ffff8800320bbf80
Signed-off-by: Shu Wang <shuwang@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Shu Wang <shuwang@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
In SMB2_open there are several paths where the SendReceive2
call will return an error before it sets rsp_iov.iov_base
thus leaving iov_base uninitialized.
Thus we need to check rsp before we dereference it in
the call to get_rfc1002_length().
A report of this issue was previously reported in
http://www.spinics.net/lists/linux-cifs/msg12846.html
UBUNTU: SAUCE: PCI: Disable broken RTIT_BAR of Intel TH
BugLink: http://bugs.launchpad.net/bugs/1715833
On some intergrations of the Intel TH the reported size of RTIT_BAR
doesn't match its actual size, which leads to overlaps with other
devices' resources.
For this reason, we need to disable the RTIT_BAR on Denverton where
it would overlap with XHCI MMIO space and effectively kill usb dead.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Andy Whitcroft <apw@canonical.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Kevin Barnett [Thu, 10 Aug 2017 18:47:15 +0000 (13:47 -0500)]
scsi: smartpqi: change driver version to 1.1.2-125
BugLink: http://bugs.launchpad.net/bugs/1721381 Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b98117caa0e3d99e4aee1114bcb03ae9ad02bf22) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 557900640b06752fc6a7f6ed545ad1f8e00face9) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
During an 1.) OS shutdown or 2.) kexec outside of a kdump, the Linux
kernel will clear BME on our controller.
If BME is cleared during a controller/host PCIe transfer, the controller
will lock up.
So we perform a PQI reset in the driver's shutdown callback function to
eliminate the possibility of a controller/host PCIe transfer being
active when the kernel clears BME immediately after calling the driver's
shutdown callback.
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b6d478119edeaca964b46796fd26893b81f8a561) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Kevin Barnett [Thu, 10 Aug 2017 18:46:57 +0000 (13:46 -0500)]
scsi: smartpqi: cleanup doorbell register usage.
BugLink: http://bugs.launchpad.net/bugs/1721381 Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4f078e24080626764896055d857719cd886e6321) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 41555d540f18f72e8a52d5c4bc14c36413d09916) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 58322fe0069a2ae2a19cf29023cc0b82c7245762) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Kevin Barnett [Thu, 10 Aug 2017 18:46:39 +0000 (13:46 -0500)]
scsi: smartpqi: add pqi reset quiesce support
BugLink: http://bugs.launchpad.net/bugs/1721381 Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 336b68193165b1215d21dd05619dc262340e404b) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Benjamin Block [Tue, 3 Oct 2017 16:44:33 +0000 (13:44 -0300)]
UBUNTU: SAUCE: s390: update zfcpdump_defconfig
BugLink: http://bugs.launchpad.net/bugs/1719290
Starting with v4.13 generating a kernel config with the seed-file
zfcpdump_defconfig does not include CONFIG_DEBUG_FS anymore. Standalone
dump will not be possible w/o that option, so re-enable it again
explicitely.
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Colin King <colin.king@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Mika Westerberg [Fri, 18 Aug 2017 10:05:54 +0000 (13:05 +0300)]
pinctrl: intel: Add Intel Cannon Lake PCH-H pin controller support
BugLink: http://bugs.launchpad.net/bugs/1685729
This is desktop version Intel Cannon Lake PCH. The GPIO hardware is the
same but pin list differs a bit. Add support for this to the existing
Cannon Lake pin controller driver.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
(cherry picked from commit a663ccf0fea17609b92ecc066ce6e8dda559ca73) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Will Deacon [Tue, 3 Oct 2017 19:35:04 +0000 (14:35 -0500)]
arm64: mm: Use READ_ONCE when dereferencing pointer to pte table
On kernels built with support for transparent huge pages, different CPUs
can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk
and they must take care to use READ_ONCE to avoid value tearing or caching
of stale values by the compiler. Unfortunately, these functions call into
our pgtable macros, which don't use READ_ONCE, and compiler caching has
been observed to cause the following crash during ext4 writeback:
This is because page_vma_mapped_walk loads the PMD twice before calling
pte_offset_map: the first time without READ_ONCE (where it gets all zeroes
due to a concurrent pmdp_invalidate) and the second time with READ_ONCE
(where it sees a valid table pointer due to a concurrent pmd_populate).
However, the compiler inlines everything and caches the first value in
a register, which is subsequently used in pte_offset_phys which returns
a junk pointer that is later dereferenced when attempting to access the
relevant pte.
This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure
that a stale value is not used. Whilst this is a point fix for a known
failure (and simple to backport), a full fix moving all of our page table
accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in
page_vma_mapped_walk is in the works for a future kernel release.
Cc: Jon Masters <jcm@redhat.com> Cc: Timur Tabi <timur@codeaurora.org> Cc: <stable@vger.kernel.org> Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()") BugLink: https://launchpad.net/bugs/1721067 Tested-by: Richard Ruigrok <rruigrok@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit f069faba688701c4d56b6c3452a130f97bf02e95
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git) Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Colin Ian King [Tue, 3 Oct 2017 12:12:54 +0000 (13:12 +0100)]
UBUNTU: SAUCE: LSM stacking: check for invalid zero sized writes
BugLink: http://bugs.launchpad.net/bugs/1720779
Writing zero bytes to /proc/$pid/task/$pid/attr/context via
security_setprocattr cause an oops in memcpy_erms. Fix this by
checking for zero size and returning -EINVAL for this invalid
write size.
Detected by running stress-ng --procfs 0
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
John Johansen [Thu, 28 Sep 2017 17:47:13 +0000 (13:47 -0400)]
UBUNTU: SAUCE: LSM stacking: make sure LSM blob align on 64 bit boundaries
The security->task blob reserving the first 12 bytes means that LSM
blobs don't align on 64 byte boundaries. This is not a problem
for x86 but if an LSM stores a long or ptr in its blob, then some
architectures require it be aligned to the arch word size and
will through a fault.
Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
John Johansen [Wed, 27 Sep 2017 06:05:22 +0000 (02:05 -0400)]
UBUNTU: SAUCE: LSM stacking: provide a way to specify the default display lsm
When the system boots up the desired default display LSM maybe different
than the first LSM initialized. Allow it to be set by specifying an
LSM with
security.display=apparmor
Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The base stacking code does not provide a way for users to specify the
desired stack using the kernel boot parameters. Enable specifying an
LSM stack on the command line by providing a comma separated list
ie.
security=apparmor,selinux
Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The stack configs in the base stacking patches are confusing and
separate the selinux/smack stacking from the other LSMs with an
"extreme" stacking entry which is extremely confusing.
Switch the "extreme" stacking to a select for mutually exclusive
LSMs, which provides a better explanation of what is happening.
Fixes: 6c5100029055 ("LSM: general but not extreme module stacking") Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The patch that adds the smack subdirs also changes how set_procattr
is handled. It makes the assumption that each LSM will attempt
to handle the context written in turn and return ENOENT, if
the LSM doesn't handle the context and another should try.
This is wrong in two ways, LSMs may return ENOENT as an error for
a context that is valid but is not present in currently loaded
policy, and the code is based on earlier versions where each
LSM is tried in turn.
Under the current patchset there is the concept of the current LSM
which will be used by applications using the old interfaces, to
ensure those interfaces which were not designed for multiple LSM
input/out only ever have to deal with a single LSM.
Fixes: 86400b03f812 ("procfs: add smack subdir to attrs") Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
UBUNTU: SAUCE: LSM stacking: LSM: general but not extreme module stacking
Leverage the infrastructure management of the credential and
file security blobs to allow stacking of security modules in
all but the most extreme case. Security modules are informed
of the location of their data within the blobs at module
initialization.
Stacking is optional. If stacking is not configured the old
limit of one "major" security module applies. If stacking is
configured any combination that does not include both SELinux
and Smack is allowed.
A subdirectory has been added to /proc/.../attr for each of
SELinux and AppArmor (Smack introduced such a subdirectory earlier)
to disambiguate what data is provided in the proc/.../attr
interfaces. An entry "context" is added to /proc/.../attr and
to each of the subdirectories. The "context" entry provides
process attribute information in the form:
UBUNTU: SAUCE: LSM stacking: LSM: Infrastructure management of the remaining blobs
Move management of the inode, ipc, key, msg_msg, sock and superblock
security blobs from the security modules to the infrastructure.
Use of the blob pointers is abstracted in the security modules.
Move management of task security blobs into the security
infrastructure. Modules are required to identify the space
they require. At this time there are no modules that use
task blobs.
Move the management of file security blobs from the individual
security modules to the security infrastructure. The security modules
using file blobs have been updated accordingly. Modules are required
to identify the space they need at module initialization. In some
cases a module no longer needs to supply a blob management hook, in
which case the hook has been removed.
Move the management of credential security blobs from the
individual security modules to the security infrastructure.
The security modules using credential blobs have been updated
accordingly. Modules are required to identify the space they
require at module initialization. In some cases a module no
longer needs to supply blob management hook, in which case
the hook has been removed.
UBUNTU: SAUCE: LSM stacking: procfs: add smack subdir to attrs
Back in 2007 I made what turned out to be a rather serious
mistake in the implementation of the Smack security module.
The SELinux module used an interface in /proc to manipulate
the security context on processes. Rather than use a similar
interface, I used the same interface. The AppArmor team did
likewise. Now /proc/.../attr/current will tell you the
security "context" of the process, but it will be different
depending on the security module you're using.
This patch provides a subdirectory in /proc/.../attr for
Smack. Smack user space can use the "current" file in
this subdirectory and never have to worry about getting
SELinux attributes by mistake. Programs that use the
old interface will continue to work (or fail, as the case
may be) as before.
This patch does not include subdirectories for SELinux
or AppArmor. I do have a patch that provides those, and
will happily make it available should anyone see value
in it.
xhci: set missing SuperSpeedPlus Link Protocol bit in roothub descriptor
BugLink: http://bugs.launchpad.net/bugs/1720045
A SuperSpeedPlus roothub needs to have the Link Protocol (LP) bit set in
the bmSublinkSpeedAttr[] entry of a SuperSpeedPlus descriptor.
If the xhci controller has an optional Protocol Speed ID (PSI) table then
that will be used as a base to create the roothub SuperSpeedPlus
descriptor.
The PSI table does not however necessary contain the LP bit so we need
to set it manually.
Check the psi speed and set LP bit if speed is 10Gbps or higher.
We're not setting it for 5 to 10Gbps as USB 3.1 specification always
mention SuperSpeedPlus for 10Gbps or higher, and some SSIC USB 3.0 speeds
can be over 5Gbps, such as SSIC-G3B-L1 at 5830 Mbps
Tony Luck [Thu, 24 Aug 2017 16:26:52 +0000 (09:26 -0700)]
x86/intel_rdt: Turn off most RDT features on Skylake
BugLink: http://bugs.launchpad.net/bugs/1713619
Errata list is included in this document:
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/6th-gen-x-series-spec-update.pdf
with more details in:
https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-spec-update.html
But the tl;dr summary (using tags from first of the documents) is:
SKZ4 MBM does not accurately track write bandwidth
SKZ17 CMT counters may not count accurately
SKZ18 CAT may not restrict cacheline allocation under certain conditions
SKZ19 MBM counters may undercount
Disable all these features on Skylake models. Users who understand the
errata may re-enable using boot command line options.
Tony Luck [Thu, 24 Aug 2017 16:26:51 +0000 (09:26 -0700)]
x86/intel_rdt: Add command line options for resource director technology
BugLink: http://bugs.launchpad.net/bugs/1713619
Command line options allow us to ignore features that we don't want.
Also we can re-enable options that have been disabled on a platform
(so long as the underlying h/w actually supports the option).
[ tglx: Marked the option array __initdata and the helper function __init ]
Colin Ian King [Tue, 8 Aug 2017 09:28:59 +0000 (10:28 +0100)]
x86/intel_rdt: Remove redundant ternary operator on return
BugLink: http://bugs.launchpad.net/bugs/1591609
The use of the ternary operator is redundant as ret can never be
non-zero at that point. Instead, just return nbytes.
Detected by CoverityScan, CID#1452658 ("Logically dead code")
Vikas Shivappa [Wed, 16 Aug 2017 01:00:43 +0000 (18:00 -0700)]
x86/intel_rdt/cqm: Improve limbo list processing
BugLink: http://bugs.launchpad.net/bugs/1591609
During a mkdir, the entire limbo list is synchronously checked on each
package for free RMIDs by sending IPIs. With a large number of RMIDs (SKL
has 192) this creates a intolerable amount of work in IPIs.
Replace the IPI based checking of the limbo list with asynchronous worker
threads on each package which periodically scan the limbo list and move the
RMIDs that have:
llc_occupancy < threshold_occupancy
on all packages to the free list.
mkdir now returns -ENOSPC if the free list and the limbo list ere empty or
returns -EBUSY if there are RMIDs on the limbo list and the free list is
empty.
Getting rid of the IPIs also simplifies the data structures and the
serialization required for handling the lists.
Vikas Shivappa [Wed, 16 Aug 2017 01:00:42 +0000 (18:00 -0700)]
x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug
BugLink: http://bugs.launchpad.net/bugs/1591609
When a CPU is dying, the overflow worker is canceled and rescheduled on a
different CPU in the same domain. But if the timer is already about to
expire this essentially doubles the interval which might result in a non
detected overflow.
Cancel the overflow worker and reschedule it immediately on a different CPU
in same domain. The work could be flushed as well, but that would
reschedule it on the same CPU.
Vikas Shivappa [Wed, 9 Aug 2017 18:46:34 +0000 (11:46 -0700)]
x86/intel_rdt: Modify the intel_pqr_state for better performance
BugLink: http://bugs.launchpad.net/bugs/1591609
Currently we have pqr_state and rdt_default_state which store the cached
CLOSID/RMIDs and the user configured cpu default values respectively. We
touch both of these during context switch. Put all of them in one
structure so that we can spare a cache line.
Vikas Shivappa [Wed, 9 Aug 2017 18:46:33 +0000 (11:46 -0700)]
x86/intel_rdt/cqm: Clear the default RMID during hotcpu
BugLink: http://bugs.launchpad.net/bugs/1591609
The user configured per cpu default RMID is not cleared during cpu
hotplug. This may lead to incorrect RMID values after a cpu goes offline
and again comes back online. Clear the per cpu default RMID during cpu
offline and online handling.
x86/intel_rdt: Show bitmask of shareable resource with other executing units
BugLink: http://bugs.launchpad.net/bugs/1591609
CPUID.(EAX=0x10, ECX=res#):EBX[31:0] reports a bit mask for a resource.
Each set bit within the length of the CBM indicates the corresponding
unit of the resource allocation may be used by other entities in the
platform (e.g. an integrated graphics engine or hardware units outside
the processor core and have direct access to the resource). Each
cleared bit within the length of the CBM indicates the corresponding
allocation unit can be configured to implement a priority-based
allocation scheme without interference with other hardware agents in
the system. Bits outside the length of the CBM are reserved.
More details on the bit mask are described in x86 Software Developer's
Manual.
The bitmask is shown in "info" directory for each resource. It's
up to user to decide how to use the bitmask within a CBM in a partition
to share or isolate a resource with other executing units.
BugLink: http://bugs.launchpad.net/bugs/1591609
Set up a delayed work queue for each domain that will read all
the MBM counters of active RMIDs once per second to make sure
that they don't wrap around between reads from users.
[Tony: Added the initializations for the work structure and completed
the patch]
BugLink: http://bugs.launchpad.net/bugs/1591609
MBM counters are monotonically increasing counts representing the total
memory bytes at a particular time. In order to calculate total_bytes for
an rdtgroup, we store the value of the counter when we create an
rdtgroup or when a new domain comes online.
When the total_bytes(all memory controller bytes) or local_bytes(local
memory controller bytes) file in "mon_data" is read it shows the
total bytes for that rdtgroup since its creation. User can snapshot this
at different time intervals to obtain bytes/second.
Tony Luck [Tue, 25 Jul 2017 21:14:45 +0000 (14:14 -0700)]
x86/intel_rdt/mbm: Basic counting of MBM events (total and local)
BugLink: http://bugs.launchpad.net/bugs/1591609
Check CPUID bits for whether each of the MBM events is supported.
Allocate space for each RMID for each counter in each domain to save
previous MSR counter value and running total of data.
Create files in each of the monitor directories.