User Interface for Resource Allocation in Intel Resource Director Technology

Copyright (C) 2016 Intel Corporation

Fenghua Yu <fenghua.yu@intel.com>
Tony Luck <tony.luck@intel.com>
Vikas Shivappa <vikas.shivappa@intel.com>

This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig and the
X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".
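
Before mounting, a quick way to check that these flag bits are present is
to grep them out of /proc/cpuinfo (flag names as listed above):

  # grep -woE 'rdt|cat_l3|cdp_l3' /proc/cpuinfo | sort -u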

To use the feature mount the file system:

  # mount -t resctrl resctrl [-o cdp] /sys/fs/resctrl

mount options are:

"cdp": Enable code/data prioritization in L3 cache allocations.

Info directory
--------------

The 'info' directory contains information about the enabled
resources. Each resource has its own subdirectory. The subdirectory
names reflect the resource names.

Cache resource (L3/L2) subdirectory contains the following files:

"num_closids":   The number of CLOSIDs which are valid for this
                 resource. The kernel uses the smallest number of
                 CLOSIDs of all enabled resources as limit.

"cbm_mask":      The bitmask which is valid for this resource.
                 This mask is equivalent to 100%.

"min_cbm_bits":  The minimum number of consecutive bits which
                 must be set when writing a mask.
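
For example, on a system with L3 cache allocation enabled (and CDP
disabled, so the resource is named "L3"), these files can simply be read;
the values shown below are illustrative only:

  # cat /sys/fs/resctrl/info/L3/num_closids
  16
  # cat /sys/fs/resctrl/info/L3/cbm_mask
  fffff
  # cat /sys/fs/resctrl/info/L3/min_cbm_bits
  1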

Memory bandwidth (MB) subdirectory contains the following files:

"min_bandwidth":   The minimum memory bandwidth percentage which
                   a user can request.

"bandwidth_gran":  The granularity in which the memory bandwidth
                   percentage is allocated. The allocated
                   b/w percentage is rounded off to the next
                   control step available on the hardware. The
                   available bandwidth control steps are:
                   min_bandwidth + N * bandwidth_gran.

"delay_linear":    Indicates if the delay scale is linear or
                   non-linear. This field is purely informational.
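
The set of valid bandwidth settings can be derived from these files; for
example (the values 10/10 below are hypothetical, not tied to any
particular SKU):

  # min=$(cat /sys/fs/resctrl/info/MB/min_bandwidth)
  # gran=$(cat /sys/fs/resctrl/info/MB/bandwidth_gran)
  # seq $min $gran 100
  10
  20
  ...
  100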

Resource groups
---------------
Resource groups are represented as directories in the resctrl file
system. The default group is the root directory. Other groups may be
created as desired by the system administrator using the "mkdir(1)"
command, and removed using "rmdir(1)".

The following files are associated with each group (a short example
using them follows these descriptions):

"tasks": A list of tasks that belong to this group. Tasks can be
         added to a group by writing the task ID to the "tasks" file
         (which will automatically remove them from the previous
         group to which they belonged). New tasks created by fork(2)
         and clone(2) are added to the same group as their parent.
         If a task's PID is not listed in any other group, it belongs
         to the root (i.e. default) group.

"cpus": A bitmask of logical CPUs assigned to this group. Writing
        a new mask can add/remove CPUs from this group. Added CPUs
        are removed from their previous group. Removed ones are
        given to the default (root) group. You cannot remove CPUs
        from the default group.

"cpus_list": One or more CPU ranges of logical CPUs assigned to this
             group. The same rules as for the "cpus" file apply.

"schemata": A list of all the resources available to this group.
            Each resource has its own line and format - see below for
            details.

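A minimal sketch that ties these files together; the group name "rt", the
CPU range and the PID are arbitrary examples, not values mandated by the
interface:

  # mkdir /sys/fs/resctrl/rt
  # echo 2-3 > /sys/fs/resctrl/rt/cpus_list
  # echo 1234 > /sys/fs/resctrl/rt/tasks
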
When a task is running the following rules define which resources
are available to it:

1) If the task is a member of a non-default group, then the schemata
   for that group is used.

2) Else if the task belongs to the default group, but is running on a
   CPU that is assigned to some specific group, then the schemata for
   the CPU's group is used.

3) Otherwise the schemata for the default group is used.


Schemata files - general concepts
---------------------------------
Each line in the file describes one resource. The line starts with
the name of the resource, followed by specific values to be applied
in each of the instances of that resource on the system.

Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2
caches are generally just shared by the hyperthreads on a core, but this
isn't an architectural requirement. We could have multiple separate L3
caches on a socket, and multiple cores could share an L2 cache. So instead
of using "socket" or "core" to define the set of logical CPUs sharing
a resource we use a "Cache ID". At a given cache level this will be a
unique number across the whole system (but it isn't guaranteed to be a
contiguous sequence, there may be gaps). To find the ID for each logical
CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id
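
For example, the L3 cache ID of every logical CPU can be dumped in one go
(index3 is usually the L3 cache; the IDs shown here are illustrative):

  # grep . /sys/devices/system/cpu/cpu*/cache/index3/id
  /sys/devices/system/cpu/cpu0/cache/index3/id:0
  /sys/devices/system/cpu/cpu1/cache/index3/id:0
  ...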

Cache Bit Masks (CBM)
---------------------
For cache resources we describe the portion of the cache that is available
for allocation using a bitmask. The maximum value of the mask is defined
by each cpu model (and may be different for different cache levels). It
is found using CPUID, but is also provided in the "info" directory of
the resctrl file system in "info/{resource}/cbm_mask". X86 hardware
requires that these masks have all the '1' bits in a contiguous block. So
0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
and 0xA are not. On a system with a 20-bit mask each bit represents 5%
of the capacity of the cache. You could partition the cache into four
equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
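
The contiguity rule can be checked mechanically: a mask is contiguous
exactly when setting all the zero bits below its lowest '1' bit yields a
value of the form 2^n - 1. The helper below is not part of resctrl, it is
just a bash illustration of that rule:

  $ is_contig() { t=$(( ($1 | ($1 - 1)) + 1 )); (( t & (t - 1) )) && echo no || echo yes; }
  $ is_contig 0x3c0
  yes
  $ is_contig 0xa
  no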

Memory bandwidth (b/w) percentage
---------------------------------
For the memory b/w resource, the user controls the resource by indicating
the percentage of total memory b/w.

The minimum bandwidth percentage value for each cpu model is predefined
and can be looked up through "info/MB/min_bandwidth". The bandwidth
granularity that is allocated is also dependent on the cpu model and can
be looked up at "info/MB/bandwidth_gran". The available bandwidth
control steps are: min_bw + N * bw_gran. Intermediate values are rounded
to the next control step available on the hardware.
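
For example, writing an intermediate value and reading the file back shows
the effective (rounded) setting; the exact value read back depends on the
control steps of the given hardware (run from a resource group directory
of a mounted resctrl filesystem):

  # echo "MB:0=35" > schemata
  # grep ^MB schemata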

Bandwidth throttling is a core-specific mechanism on some Intel SKUs.
Using a high bandwidth and a low bandwidth setting on two threads
sharing a core will result in both threads being throttled to use the
low bandwidth.

L3 details (code and data prioritization disabled)
--------------------------------------------------
With CDP disabled the L3 schemata format is:

	L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L3 details (CDP enabled via mount option to resctrl)
----------------------------------------------------
When CDP is enabled L3 control is split into two separate resources
so you can specify independent masks for code and data like this:

	L3data:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
	L3code:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L2 details
----------
L2 cache does not support code and data prioritization, so the
schemata format is always:

	L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

Memory b/w Allocation details
-----------------------------

The memory b/w domain is the L3 cache.

	MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
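
For example, a group limited to 20% of memory b/w on cache ID 0 while left
unrestricted on cache ID 1 would carry the line:

	MB:0=20;1=100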

Reading/writing the schemata file
---------------------------------
Reading the schemata file will show the state of all resources
on all domains. When writing you only need to specify those values
which you wish to change. E.g.

  # cat schemata
  L3DATA:0=fffff;1=fffff;2=fffff;3=fffff
  L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
  # echo "L3DATA:2=3c0;" > schemata
  # cat schemata
  L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
  L3CODE:0=fffff;1=fffff;2=fffff;3=fffff

Example 1
---------
On a two socket machine (one L3 cache per socket) with just four bits
for cache bit masks, minimum b/w of 10% with a memory bandwidth
granularity of 10%.

  # mount -t resctrl resctrl /sys/fs/resctrl
  # cd /sys/fs/resctrl
  # mkdir p0 p1
  # echo -e "L3:0=3;1=c\nMB:0=50;1=50" > /sys/fs/resctrl/p0/schemata
  # echo -e "L3:0=3;1=3\nMB:0=50;1=50" > /sys/fs/resctrl/p1/schemata

The default resource group is unmodified, so we have access to all parts
of all caches (its schemata file reads "L3:0=f;1=f").

Tasks that are under the control of group "p0" may only allocate from the
"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
Tasks in group "p1" use the "lower" 50% of cache on both sockets.

Similarly, tasks that are under the control of group "p0" may use a
maximum memory b/w of 50% on socket 0 and 50% on socket 1.
Tasks in group "p1" may also use 50% memory b/w on both sockets.
Note that unlike cache masks, memory b/w cannot specify whether these
allocations can overlap or not. The allocation specifies the maximum
b/w that the group may be able to use, and the system admin can configure
the b/w accordingly.

Example 2
---------
Again two sockets, but this time with a more realistic 20-bit mask.

Two real time tasks pid=1234 running on processor 0 and pid=5678 running on
processor 1 on socket 0 on a 2-socket and dual core machine. To avoid noisy
neighbors, each of the two real-time tasks exclusively occupies one quarter
of L3 cache on socket 0.

  # mount -t resctrl resctrl /sys/fs/resctrl
  # cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0 and 50% of memory b/w cannot be used by
ordinary tasks:

  # echo -e "L3:0=3ff;1=fffff\nMB:0=50;1=100" > schemata

Next we make a resource group for our first real time task and give
it access to the "top" 25% of the cache on socket 0.

  # mkdir p0
  # echo "L3:0=f8000;1=fffff" > p0/schemata

Finally we move our first real time task into this resource group. We
also use taskset(1) to ensure the task always runs on a dedicated CPU
on socket 0. Most uses of resource groups will also constrain which
processors tasks run on.

  # echo 1234 > p0/tasks
  # taskset -cp 1 1234

Ditto for the second real time task (with the remaining 25% of cache):

  # mkdir p1
  # echo "L3:0=7c00;1=fffff" > p1/schemata
  # echo 5678 > p1/tasks
  # taskset -cp 2 5678
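
At this point the configuration can be double-checked by reading the group
files back; assuming the writes above succeeded, the task lists contain
exactly the PIDs we wrote:

  # cat p0/tasks
  1234
  # cat p1/tasks
  5678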

For the same 2 socket system with memory b/w resource and CAT L3 the
schemata would look like this (assuming min_bandwidth is 10 and
bandwidth_gran is 10):

For our first real time task this would request 20% memory b/w on socket
0.

  # echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata

For our second real time task this would request another 20% memory b/w
on socket 0.

  # echo -e "L3:0=7c00;1=fffff\nMB:0=20;1=100" > p1/schemata

Example 3
---------

A single socket system which has real-time tasks running on cores 4-7
and a non real-time workload assigned to cores 0-3. The real-time tasks
share text and data, so a per task association is not required and due to
interaction with the kernel it's desired that the kernel on these cores
shares L3 with the tasks.

  # mount -t resctrl resctrl /sys/fs/resctrl
  # cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0, and 50% of memory bandwidth on socket 0
cannot be used by ordinary tasks:

  # echo -e "L3:0=3ff\nMB:0=50" > schemata

Next we make a resource group for our real time cores and give it access
to the "top" 50% of the cache on socket 0 and 50% of memory bandwidth on
socket 0.

  # mkdir p0
  # echo -e "L3:0=ffc00\nMB:0=50" > p0/schemata

Finally we move cores 4-7 over to the new group and make sure that the
kernel and the tasks running there get 50% of the cache. They should
also get 50% of memory bandwidth assuming that the cores 4-7 are SMT
siblings and only the real time threads are scheduled on the cores 4-7.

  # echo F0 > p0/cpus
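
To confirm the move, the group's CPU list can be read back in range form
(the same CPUs disappear from the default group's "cpus_list" at the same
time):

  # cat p0/cpus_list
  4-7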

Locking between applications
----------------------------

Certain operations on the resctrl filesystem, composed of read/writes
to/from multiple files, must be atomic.

As an example, the allocation of an exclusive reservation of L3 cache
involves:

  1. Read the cbmmasks from each directory
  2. Find a contiguous set of bits in the global CBM bitmask that is not
     set in any of the directory cbmmasks
  3. Create a new directory
  4. Set the bits found in step 2 in the new directory's "schemata" file

If two applications attempt to allocate space concurrently then they can
end up allocating the same bits so the reservations are shared instead of
exclusive.

To coordinate atomic operations on the resctrlfs and to avoid the problem
above, the following locking procedure is recommended:

Locking is based on flock, which is available in libc and also as a shell
script command.

Write lock:

 A) Take flock(LOCK_EX) on /sys/fs/resctrl
 B) Read/write the directory structure.
 C) Release the lock: flock(LOCK_UN), or close the file descriptor

Read lock:

 A) Take flock(LOCK_SH) on /sys/fs/resctrl
 B) If successful, read the directory structure.
 C) Release the lock: flock(LOCK_UN), or close the file descriptor

Example with bash:

  # Atomically read directory structure
  $ flock -s /sys/fs/resctrl/ find /sys/fs/resctrl

  # Read directory contents and create new subdirectory

  $ cat create-dir.sh
  find /sys/fs/resctrl/ > output.txt
  mask=$(function-of output.txt)    # placeholder: derive a free CBM from output.txt
  mkdir /sys/fs/resctrl/newres/
  echo "$mask" > /sys/fs/resctrl/newres/schemata

  $ flock /sys/fs/resctrl/ ./create-dir.sh

Example with C:

/*
 * Example code to take advisory locks
 * before accessing the resctrl filesystem
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>

void resctrl_take_shared_lock(int fd)
{
	int ret;

	/* take shared lock on resctrl filesystem */
	ret = flock(fd, LOCK_SH);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

void resctrl_take_exclusive_lock(int fd)
{
	int ret;

	/* take exclusive lock on resctrl filesystem */
	ret = flock(fd, LOCK_EX);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

void resctrl_release_lock(int fd)
{
	int ret;

	/* release lock on resctrl filesystem */
	ret = flock(fd, LOCK_UN);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

int main(void)
{
	int fd;

	fd = open("/sys/fs/resctrl", O_RDONLY | O_DIRECTORY);
	if (fd == -1) {
		perror("open");
		exit(-1);
	}
	resctrl_take_shared_lock(fd);
	/* code to read directory contents */
	resctrl_release_lock(fd);

	resctrl_take_exclusive_lock(fd);
	/* code to read and write directory contents */
	resctrl_release_lock(fd);

	return 0;
}