UBUNTU: SAUCE: PCI: Disable broken RTIT_BAR of Intel TH
BugLink: http://bugs.launchpad.net/bugs/1715833
On some intergrations of the Intel TH the reported size of RTIT_BAR
doesn't match its actual size, which leads to overlaps with other
devices' resources.
For this reason, we need to disable the RTIT_BAR on Denverton where
it would overlap with XHCI MMIO space and effectively kill usb dead.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Andy Whitcroft <apw@canonical.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Kevin Barnett [Thu, 10 Aug 2017 18:47:15 +0000 (13:47 -0500)]
scsi: smartpqi: change driver version to 1.1.2-125
BugLink: http://bugs.launchpad.net/bugs/1721381 Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b98117caa0e3d99e4aee1114bcb03ae9ad02bf22) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 557900640b06752fc6a7f6ed545ad1f8e00face9) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
During an 1.) OS shutdown or 2.) kexec outside of a kdump, the Linux
kernel will clear BME on our controller.
If BME is cleared during a controller/host PCIe transfer, the controller
will lock up.
So we perform a PQI reset in the driver's shutdown callback function to
eliminate the possibility of a controller/host PCIe transfer being
active when the kernel clears BME immediately after calling the driver's
shutdown callback.
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b6d478119edeaca964b46796fd26893b81f8a561) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Kevin Barnett [Thu, 10 Aug 2017 18:46:57 +0000 (13:46 -0500)]
scsi: smartpqi: cleanup doorbell register usage.
BugLink: http://bugs.launchpad.net/bugs/1721381 Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4f078e24080626764896055d857719cd886e6321) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 41555d540f18f72e8a52d5c4bc14c36413d09916) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 58322fe0069a2ae2a19cf29023cc0b82c7245762) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Kevin Barnett [Thu, 10 Aug 2017 18:46:39 +0000 (13:46 -0500)]
scsi: smartpqi: add pqi reset quiesce support
BugLink: http://bugs.launchpad.net/bugs/1721381 Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 336b68193165b1215d21dd05619dc262340e404b) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Benjamin Block [Tue, 3 Oct 2017 16:44:33 +0000 (13:44 -0300)]
UBUNTU: SAUCE: s390: update zfcpdump_defconfig
BugLink: http://bugs.launchpad.net/bugs/1719290
Starting with v4.13 generating a kernel config with the seed-file
zfcpdump_defconfig does not include CONFIG_DEBUG_FS anymore. Standalone
dump will not be possible w/o that option, so re-enable it again
explicitely.
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Colin King <colin.king@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Mika Westerberg [Fri, 18 Aug 2017 10:05:54 +0000 (13:05 +0300)]
pinctrl: intel: Add Intel Cannon Lake PCH-H pin controller support
BugLink: http://bugs.launchpad.net/bugs/1685729
This is desktop version Intel Cannon Lake PCH. The GPIO hardware is the
same but pin list differs a bit. Add support for this to the existing
Cannon Lake pin controller driver.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
(cherry picked from commit a663ccf0fea17609b92ecc066ce6e8dda559ca73) Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Will Deacon [Tue, 3 Oct 2017 19:35:04 +0000 (14:35 -0500)]
arm64: mm: Use READ_ONCE when dereferencing pointer to pte table
On kernels built with support for transparent huge pages, different CPUs
can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk
and they must take care to use READ_ONCE to avoid value tearing or caching
of stale values by the compiler. Unfortunately, these functions call into
our pgtable macros, which don't use READ_ONCE, and compiler caching has
been observed to cause the following crash during ext4 writeback:
This is because page_vma_mapped_walk loads the PMD twice before calling
pte_offset_map: the first time without READ_ONCE (where it gets all zeroes
due to a concurrent pmdp_invalidate) and the second time with READ_ONCE
(where it sees a valid table pointer due to a concurrent pmd_populate).
However, the compiler inlines everything and caches the first value in
a register, which is subsequently used in pte_offset_phys which returns
a junk pointer that is later dereferenced when attempting to access the
relevant pte.
This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure
that a stale value is not used. Whilst this is a point fix for a known
failure (and simple to backport), a full fix moving all of our page table
accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in
page_vma_mapped_walk is in the works for a future kernel release.
Cc: Jon Masters <jcm@redhat.com> Cc: Timur Tabi <timur@codeaurora.org> Cc: <stable@vger.kernel.org> Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()") BugLink: https://launchpad.net/bugs/1721067 Tested-by: Richard Ruigrok <rruigrok@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit f069faba688701c4d56b6c3452a130f97bf02e95
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git) Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Colin Ian King [Tue, 3 Oct 2017 12:12:54 +0000 (13:12 +0100)]
UBUNTU: SAUCE: LSM stacking: check for invalid zero sized writes
BugLink: http://bugs.launchpad.net/bugs/1720779
Writing zero bytes to /proc/$pid/task/$pid/attr/context via
security_setprocattr cause an oops in memcpy_erms. Fix this by
checking for zero size and returning -EINVAL for this invalid
write size.
Detected by running stress-ng --procfs 0
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
John Johansen [Thu, 28 Sep 2017 17:47:13 +0000 (13:47 -0400)]
UBUNTU: SAUCE: LSM stacking: make sure LSM blob align on 64 bit boundaries
The security->task blob reserving the first 12 bytes means that LSM
blobs don't align on 64 byte boundaries. This is not a problem
for x86 but if an LSM stores a long or ptr in its blob, then some
architectures require it be aligned to the arch word size and
will through a fault.
Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
John Johansen [Wed, 27 Sep 2017 06:05:22 +0000 (02:05 -0400)]
UBUNTU: SAUCE: LSM stacking: provide a way to specify the default display lsm
When the system boots up the desired default display LSM maybe different
than the first LSM initialized. Allow it to be set by specifying an
LSM with
security.display=apparmor
Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The base stacking code does not provide a way for users to specify the
desired stack using the kernel boot parameters. Enable specifying an
LSM stack on the command line by providing a comma separated list
ie.
security=apparmor,selinux
Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The stack configs in the base stacking patches are confusing and
separate the selinux/smack stacking from the other LSMs with an
"extreme" stacking entry which is extremely confusing.
Switch the "extreme" stacking to a select for mutually exclusive
LSMs, which provides a better explanation of what is happening.
Fixes: 6c5100029055 ("LSM: general but not extreme module stacking") Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The patch that adds the smack subdirs also changes how set_procattr
is handled. It makes the assumption that each LSM will attempt
to handle the context written in turn and return ENOENT, if
the LSM doesn't handle the context and another should try.
This is wrong in two ways, LSMs may return ENOENT as an error for
a context that is valid but is not present in currently loaded
policy, and the code is based on earlier versions where each
LSM is tried in turn.
Under the current patchset there is the concept of the current LSM
which will be used by applications using the old interfaces, to
ensure those interfaces which were not designed for multiple LSM
input/out only ever have to deal with a single LSM.
Fixes: 86400b03f812 ("procfs: add smack subdir to attrs") Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
UBUNTU: SAUCE: LSM stacking: LSM: general but not extreme module stacking
Leverage the infrastructure management of the credential and
file security blobs to allow stacking of security modules in
all but the most extreme case. Security modules are informed
of the location of their data within the blobs at module
initialization.
Stacking is optional. If stacking is not configured the old
limit of one "major" security module applies. If stacking is
configured any combination that does not include both SELinux
and Smack is allowed.
A subdirectory has been added to /proc/.../attr for each of
SELinux and AppArmor (Smack introduced such a subdirectory earlier)
to disambiguate what data is provided in the proc/.../attr
interfaces. An entry "context" is added to /proc/.../attr and
to each of the subdirectories. The "context" entry provides
process attribute information in the form:
UBUNTU: SAUCE: LSM stacking: LSM: Infrastructure management of the remaining blobs
Move management of the inode, ipc, key, msg_msg, sock and superblock
security blobs from the security modules to the infrastructure.
Use of the blob pointers is abstracted in the security modules.
Move management of task security blobs into the security
infrastructure. Modules are required to identify the space
they require. At this time there are no modules that use
task blobs.
Move the management of file security blobs from the individual
security modules to the security infrastructure. The security modules
using file blobs have been updated accordingly. Modules are required
to identify the space they need at module initialization. In some
cases a module no longer needs to supply a blob management hook, in
which case the hook has been removed.
Move the management of credential security blobs from the
individual security modules to the security infrastructure.
The security modules using credential blobs have been updated
accordingly. Modules are required to identify the space they
require at module initialization. In some cases a module no
longer needs to supply blob management hook, in which case
the hook has been removed.
UBUNTU: SAUCE: LSM stacking: procfs: add smack subdir to attrs
Back in 2007 I made what turned out to be a rather serious
mistake in the implementation of the Smack security module.
The SELinux module used an interface in /proc to manipulate
the security context on processes. Rather than use a similar
interface, I used the same interface. The AppArmor team did
likewise. Now /proc/.../attr/current will tell you the
security "context" of the process, but it will be different
depending on the security module you're using.
This patch provides a subdirectory in /proc/.../attr for
Smack. Smack user space can use the "current" file in
this subdirectory and never have to worry about getting
SELinux attributes by mistake. Programs that use the
old interface will continue to work (or fail, as the case
may be) as before.
This patch does not include subdirectories for SELinux
or AppArmor. I do have a patch that provides those, and
will happily make it available should anyone see value
in it.
xhci: set missing SuperSpeedPlus Link Protocol bit in roothub descriptor
BugLink: http://bugs.launchpad.net/bugs/1720045
A SuperSpeedPlus roothub needs to have the Link Protocol (LP) bit set in
the bmSublinkSpeedAttr[] entry of a SuperSpeedPlus descriptor.
If the xhci controller has an optional Protocol Speed ID (PSI) table then
that will be used as a base to create the roothub SuperSpeedPlus
descriptor.
The PSI table does not however necessary contain the LP bit so we need
to set it manually.
Check the psi speed and set LP bit if speed is 10Gbps or higher.
We're not setting it for 5 to 10Gbps as USB 3.1 specification always
mention SuperSpeedPlus for 10Gbps or higher, and some SSIC USB 3.0 speeds
can be over 5Gbps, such as SSIC-G3B-L1 at 5830 Mbps
Tony Luck [Thu, 24 Aug 2017 16:26:52 +0000 (09:26 -0700)]
x86/intel_rdt: Turn off most RDT features on Skylake
BugLink: http://bugs.launchpad.net/bugs/1713619
Errata list is included in this document:
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/6th-gen-x-series-spec-update.pdf
with more details in:
https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-spec-update.html
But the tl;dr summary (using tags from first of the documents) is:
SKZ4 MBM does not accurately track write bandwidth
SKZ17 CMT counters may not count accurately
SKZ18 CAT may not restrict cacheline allocation under certain conditions
SKZ19 MBM counters may undercount
Disable all these features on Skylake models. Users who understand the
errata may re-enable using boot command line options.
Tony Luck [Thu, 24 Aug 2017 16:26:51 +0000 (09:26 -0700)]
x86/intel_rdt: Add command line options for resource director technology
BugLink: http://bugs.launchpad.net/bugs/1713619
Command line options allow us to ignore features that we don't want.
Also we can re-enable options that have been disabled on a platform
(so long as the underlying h/w actually supports the option).
[ tglx: Marked the option array __initdata and the helper function __init ]
Colin Ian King [Tue, 8 Aug 2017 09:28:59 +0000 (10:28 +0100)]
x86/intel_rdt: Remove redundant ternary operator on return
BugLink: http://bugs.launchpad.net/bugs/1591609
The use of the ternary operator is redundant as ret can never be
non-zero at that point. Instead, just return nbytes.
Detected by CoverityScan, CID#1452658 ("Logically dead code")
Vikas Shivappa [Wed, 16 Aug 2017 01:00:43 +0000 (18:00 -0700)]
x86/intel_rdt/cqm: Improve limbo list processing
BugLink: http://bugs.launchpad.net/bugs/1591609
During a mkdir, the entire limbo list is synchronously checked on each
package for free RMIDs by sending IPIs. With a large number of RMIDs (SKL
has 192) this creates a intolerable amount of work in IPIs.
Replace the IPI based checking of the limbo list with asynchronous worker
threads on each package which periodically scan the limbo list and move the
RMIDs that have:
llc_occupancy < threshold_occupancy
on all packages to the free list.
mkdir now returns -ENOSPC if the free list and the limbo list ere empty or
returns -EBUSY if there are RMIDs on the limbo list and the free list is
empty.
Getting rid of the IPIs also simplifies the data structures and the
serialization required for handling the lists.
Vikas Shivappa [Wed, 16 Aug 2017 01:00:42 +0000 (18:00 -0700)]
x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug
BugLink: http://bugs.launchpad.net/bugs/1591609
When a CPU is dying, the overflow worker is canceled and rescheduled on a
different CPU in the same domain. But if the timer is already about to
expire this essentially doubles the interval which might result in a non
detected overflow.
Cancel the overflow worker and reschedule it immediately on a different CPU
in same domain. The work could be flushed as well, but that would
reschedule it on the same CPU.
Vikas Shivappa [Wed, 9 Aug 2017 18:46:34 +0000 (11:46 -0700)]
x86/intel_rdt: Modify the intel_pqr_state for better performance
BugLink: http://bugs.launchpad.net/bugs/1591609
Currently we have pqr_state and rdt_default_state which store the cached
CLOSID/RMIDs and the user configured cpu default values respectively. We
touch both of these during context switch. Put all of them in one
structure so that we can spare a cache line.
Vikas Shivappa [Wed, 9 Aug 2017 18:46:33 +0000 (11:46 -0700)]
x86/intel_rdt/cqm: Clear the default RMID during hotcpu
BugLink: http://bugs.launchpad.net/bugs/1591609
The user configured per cpu default RMID is not cleared during cpu
hotplug. This may lead to incorrect RMID values after a cpu goes offline
and again comes back online. Clear the per cpu default RMID during cpu
offline and online handling.
x86/intel_rdt: Show bitmask of shareable resource with other executing units
BugLink: http://bugs.launchpad.net/bugs/1591609
CPUID.(EAX=0x10, ECX=res#):EBX[31:0] reports a bit mask for a resource.
Each set bit within the length of the CBM indicates the corresponding
unit of the resource allocation may be used by other entities in the
platform (e.g. an integrated graphics engine or hardware units outside
the processor core and have direct access to the resource). Each
cleared bit within the length of the CBM indicates the corresponding
allocation unit can be configured to implement a priority-based
allocation scheme without interference with other hardware agents in
the system. Bits outside the length of the CBM are reserved.
More details on the bit mask are described in x86 Software Developer's
Manual.
The bitmask is shown in "info" directory for each resource. It's
up to user to decide how to use the bitmask within a CBM in a partition
to share or isolate a resource with other executing units.
BugLink: http://bugs.launchpad.net/bugs/1591609
Set up a delayed work queue for each domain that will read all
the MBM counters of active RMIDs once per second to make sure
that they don't wrap around between reads from users.
[Tony: Added the initializations for the work structure and completed
the patch]
BugLink: http://bugs.launchpad.net/bugs/1591609
MBM counters are monotonically increasing counts representing the total
memory bytes at a particular time. In order to calculate total_bytes for
an rdtgroup, we store the value of the counter when we create an
rdtgroup or when a new domain comes online.
When the total_bytes(all memory controller bytes) or local_bytes(local
memory controller bytes) file in "mon_data" is read it shows the
total bytes for that rdtgroup since its creation. User can snapshot this
at different time intervals to obtain bytes/second.
Tony Luck [Tue, 25 Jul 2017 21:14:45 +0000 (14:14 -0700)]
x86/intel_rdt/mbm: Basic counting of MBM events (total and local)
BugLink: http://bugs.launchpad.net/bugs/1591609
Check CPUID bits for whether each of the MBM events is supported.
Allocate space for each RMID for each counter in each domain to save
previous MSR counter value and running total of data.
Create files in each of the monitor directories.
BugLink: http://bugs.launchpad.net/bugs/1591609
Resource groups have a per domain directory under "mon_data". Add or
remove these directories as and when domains come online and go offline.
Also update the per cpu rmids and cache upon onlining and offlining
cpus.
BugLink: http://bugs.launchpad.net/bugs/1591609
OS associates an RMID/CLOSid to a task by writing the per CPU
IA32_PQR_ASSOC MSR when a task is scheduled in.
The sched_in code will stay as no-op unless we are running on Intel SKU
which supports either resource control or monitoring and we also enable
them by mounting the resctrl fs. The per cpu CLOSid/RMID values are
cached and the write is performed only when a task with a different
CLOSid/RMID is scheduled in.
x86/intel_rdt: Introduce rdt_enable_key for scheduling
BugLink: http://bugs.launchpad.net/bugs/1591609
Introduce the usage of rdt_enable_key in sched_in code as a preparation
to add RDT monitoring support for sched_in.
BugLink: http://bugs.launchpad.net/bugs/1591609
Add monitoring support during mount and unmount. Since root directory is
a "ctrl_mon" directory which can control and monitor resources create
the "mon_groups" directory which can hold monitor groups and a
"mon_data" directory which would hold all monitoring data like the rest
of resource groups.
The mount succeeds if either of monitoring or control/allocation is
enabled. If only monitoring is enabled user can still create monitor
groups under the "/sys/fs/resctrl/mon_groups/" and any mkdir under root
would fail. If only control/allocation is enabled all of the monitoring
related directories/files would not exist and resctrl would work in
legacy mode.
BugLink: http://bugs.launchpad.net/bugs/1591609
Resource groups (ctrl_mon and monitor groups) are represented by
directories in resctrl fs. Add support to remove the directories.
When a ctrl_mon directory is removed all the cpus and tasks are assigned
back to the root rdtgroup. When a monitor group is removed the cpus and
tasks are returned to the parent control group.
BugLink: http://bugs.launchpad.net/bugs/1591609
Re-factor the code to separate the ctrl group removal from the rmdir to
prepare to add RDT monitoring group removal.
BugLink: http://bugs.launchpad.net/bugs/1591609
Add a mon_data directory for the root rdtgroup and all other rdtgroups.
The directory holds all of the monitored data for all domains and events
of all resources being monitored.
The mon_data itself has a list of directories in the format
mon_<domain_name>_<domain_id>. Each of these subdirectories contain one
file per event in the mode "0444". Reading the file displays a snapshot
of the monitored data for the event the file represents.
For ex, on a 2 socket Broadwell with llc_occupancy being
monitored the mon_data contents look as below:
$ ls /sys/fs/resctrl/p1/mon_data/
mon_L3_00
mon_L3_01
Each domain directory has one file per event:
$ ls /sys/fs/resctrl/p1/mon_data/mon_L3_00/
llc_occupancy
To read current llc_occupancy of ctrl_mon group p1
$ cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy 33789096
[This patch idea is based on Tony's sample patches to organise data in a
per domain directory and have one file per event (and use the fp->priv to
store mon data bits)]
x86/intel_rdt: Prepare for RDT monitor data support
BugLink: http://bugs.launchpad.net/bugs/1591609
Rename the intel_rdt_schemata file to intel_rdt_ctrlmondata as we now
want to add support for RDT monitoring data for the events that are
supported in later patches.
BugLink: http://bugs.launchpad.net/bugs/1591609
The cpus file is extended to support resource monitoring. This is used
to over-ride the RMID of the default group when running on specific
CPUs. It works similar to the resource control. The "cpus" and
"cpus_list" file is present in default group, ctrl_mon groups and
monitor groups.
Each "cpus" file or cpu_list file reads a cpumask or list showing which
CPUs belong to the resource group. By default all online cpus belong to
the default root group. A CPU can be present in one "ctrl_mon" and one
"monitor" group simultaneously. They can be added to a resource group by
writing the CPU to the file. When a CPU is added to a ctrl_mon group it
is automatically removed from the previous ctrl_mon group. A CPU can be
added to a monitor group only if it is present in the parent ctrl_mon
group and when a CPU is added to a monitor group, it is automatically
removed from the previous monitor group. When CPUs go offline, they are
automatically removed from the ctrl_mon and monitor groups.
x86/intel_rdt: Prepare to add RDT monitor cpus file support
BugLink: http://bugs.launchpad.net/bugs/1591609
Separate the ctrl cpus file handling from the generic cpus file handling
and convert the per cpu closid from u32 to a struct which will be used
later to add rmid to the same struct. Also cleanup some name space.
BugLink: http://bugs.launchpad.net/bugs/1591609
The root directory, ctrl_mon and monitor groups are populated
with a read/write file named "tasks". When read, it shows all the task
IDs assigned to the resource group.
Tasks can be added to groups by writing the PID to the file. A task can
be present in one "ctrl_mon" group "and" one "monitor" group. IOW a
PID_x can be seen in a ctrl_mon group and a monitor group at the same
time. When a task is added to a ctrl_mon group, it is automatically
removed from the previous ctrl_mon group where it belonged. Similarly if
a task is moved to a monitor group it is removed from the previous
monitor group . Also since the monitor groups can only have subset of
tasks of parent ctrl_mon group, a task can be moved to a monitor group
only if its already present in the parent ctrl_mon group.
Task membership is indicated by a new field in the task_struct "u32
rmid" which holds the RMID for the task. RMID=0 is reserved for the
default root group where the tasks belong to at mount.
[tony: zero the rmid if rdtgroup was deleted when task was being moved]
BugLink: http://bugs.launchpad.net/bugs/1591609
OS associates a CLOSid(Class of service id) to a task by writing the
high 32 bits of per CPU IA32_PQR_ASSOC MSR when a task is scheduled in.
CPUID.(EAX=10H, ECX=1):EDX[15:0] enumerates the max CLOSID supported and
it is zero indexed. Hence change the type to u32 from int.
x86/intel_rdt/cqm: Add mkdir support for RDT monitoring
BugLink: http://bugs.launchpad.net/bugs/1591609
Resource control groups can be created using mkdir in resctrl
fs(rdtgroup). In order to extend the resctrl interface to support
monitoring the control groups, extend the current mkdir to support
resource monitoring also.
This allows the rdtgroup created under the root directory to be able to
both control and monitor resources (ctrl_mon group). The ctrl_mon groups
are associated with one CLOSID like the legacy rdtgroups and one
RMID(Resource monitoring ID) as well. Hardware uses RMID to track the
resource usage. Once either of the CLOSID or RMID are exhausted, the
mkdir fails with -ENOSPC. If there are RMIDs in limbo list but not free
an -EBUSY is returned. User can also monitor a subset of the ctrl_mon
rdtgroup's tasks/cpus using the monitor groups. The monitor groups are
created using mkdir under the "mon_groups" directory in every ctrl_mon
group.
[Merged Tony's code: Removed a lot of common mkdir code, a fix to handling
of the list of the child rdtgroups and some cleanups in list
traversal. Also the changes to have similar alloc and free for CLOS/RMID
and return -EBUSY when RMIDs are in limbo and not free]
x86/intel_rdt: Prepare for RDT monitoring mkdir support
BugLink: http://bugs.launchpad.net/bugs/1591609
Separate the ctrl mkdir code from the rest in order to prepare for
adding support for RDT monitoring mkdir support as well.