Commit | Line | Data |
---|---|---|
e15fbb67 AK |
1 | CGroup Namespaces |
2 | ||
3 | CGroup Namespace provides a mechanism to virtualize the view of the | |
4 | /proc/<pid>/cgroup file. The CLONE_NEWCGROUP clone-flag can be used with | |
5 | clone() and unshare() syscalls to create a new cgroup namespace. | |
6 | The process running inside the cgroup namespace will have its /proc/<pid>/cgroup | |
7 | output restricted to cgroupns-root. cgroupns-root is the cgroup of the process | |
8 | at the time of creation of the cgroup namespace. | |
9 | ||
10 | Prior to CGroup Namespace, the /proc/<pid>/cgroup file showed the complete | |
11 | path of the cgroup of a process. In a container setup (where a set of cgroups | |
12 | and namespaces are intended to isolate processes), the /proc/<pid>/cgroup file | |
13 | may leak system-level information to the isolated processes. | |
14 | ||
15 | For example: | |
16 | $ cat /proc/self/cgroup | |
17 | 0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1 | |
18 | ||
19 | The path '/batchjobs/container_id1' can generally be considered system data, | |
20 | and it is desirable not to expose it to the isolated process. | |
21 | ||
22 | CGroup Namespaces can be used to restrict visibility of this path. | |
23 | For example: | |
24 | # Before creating cgroup namespace | |
25 | $ ls -l /proc/self/ns/cgroup | |
26 | lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835] | |
27 | $ cat /proc/self/cgroup | |
28 | 0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1 | |
29 | ||
30 | # unshare(CLONE_NEWCGROUP) and exec /bin/bash | |
31 | $ ~/unshare -c | |
32 | [ns]$ ls -l /proc/self/ns/cgroup | |
33 | lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183] | |
34 | # From within the new cgroupns, the process sees that it is in the root cgroup | |
35 | [ns]$ cat /proc/self/cgroup | |
36 | 0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/ | |
37 | ||
38 | # From global cgroupns: | |
39 | $ cat /proc/<pid>/cgroup | |
40 | 0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1 | |
41 | ||
42 | # Unshare cgroupns along with userns and mountns | |
43 | # Following calls unshare(CLONE_NEWCGROUP|CLONE_NEWUSER|CLONE_NEWNS), then | |
44 | # sets up uid/gid map and execs /bin/bash | |
45 | $ ~/unshare -c -u -m | |
46 | # Originally, we were in /batchjobs/container_id1 cgroup. Mount our own cgroup | |
47 | # hierarchy. | |
48 | [ns]$ mount -t cgroup cgroup /tmp/cgroup | |
49 | [ns]$ ls -l /tmp/cgroup | |
50 | total 0 | |
51 | -r--r--r-- 1 root root 0 2014-10-13 09:32 cgroup.controllers | |
52 | -r--r--r-- 1 root root 0 2014-10-13 09:32 cgroup.populated | |
53 | -rw-r--r-- 1 root root 0 2014-10-13 09:25 cgroup.procs | |
54 | -rw-r--r-- 1 root root 0 2014-10-13 09:32 cgroup.subtree_control | |
55 | ||
56 | The cgroupns-root (/batchjobs/container_id1 in the above example) becomes the | |
57 | filesystem root for the namespace-specific cgroupfs mount. | |
58 | ||
59 | The virtualization of the /proc/self/cgroup file, combined with restricting | |
60 | the view of the cgroup hierarchy through a namespace-private cgroupfs mount, | |
61 | should provide a completely isolated cgroup view inside the container. | |
62 | ||
63 | In its current form, the cgroup namespaces patchset provides the following | |
64 | behavior: | |
65 | ||
66 | (1) The 'cgroupns-root' for a cgroup namespace is the cgroup in which | |
67 | the process calling unshare is running. | |
68 | For example, if a process in the /batchjobs/container_id1 cgroup calls unshare, | |
69 | cgroup /batchjobs/container_id1 becomes the cgroupns-root. | |
70 | For the init_cgroup_ns, this is the real root ('/') cgroup | |
71 | (identified in code as cgrp_dfl_root.cgrp). | |
72 | ||
73 | (2) The cgroupns-root cgroup does not change even if the namespace | |
74 | creator process later moves to a different cgroup. | |
75 | $ ~/unshare -c # unshare cgroupns in some cgroup | |
76 | [ns]$ cat /proc/self/cgroup | |
77 | 0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/ | |
78 | [ns]$ mkdir sub_cgrp_1 | |
79 | [ns]$ echo 0 > sub_cgrp_1/cgroup.procs | |
80 | [ns]$ cat /proc/self/cgroup | |
81 | 0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/sub_cgrp_1 | |
82 | ||
83 | (3) Each process gets its cgroupns-specific view of /proc/<pid>/cgroup | |
84 | (a) Processes running inside the cgroup namespace will be able to see | |
85 | cgroup paths (in /proc/self/cgroup) only inside their root cgroup | |
86 | [ns]$ sleep 100000 & # From within unshared cgroupns | |
87 | [1] 7353 | |
88 | [ns]$ echo 7353 > sub_cgrp_1/cgroup.procs | |
89 | [ns]$ cat /proc/7353/cgroup | |
90 | 0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/sub_cgrp_1 | |
91 | ||
92 | (b) From global cgroupns, the real cgroup path will be visible: | |
93 | $ cat /proc/7353/cgroup | |
94 | 0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1/sub_cgrp_1 | |
95 | ||
96 | (c) From a sibling cgroupns (a cgroupns rooted at a different cgroup), the cgroup | |
97 | path relative to that sibling's own cgroupns-root will be shown: | |
98 | # ns2's cgroupns-root is at '/batchjobs/container_id2' | |
99 | [ns2]$ cat /proc/7353/cgroup | |
100 | 0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/../container_id1/sub_cgrp_1 | |
101 | ||
102 | Note that the relative path always starts with '/' to indicate that it is | |
103 | relative to the cgroupns-root of the caller. | |
104 | ||
105 | (4) Processes inside a cgroupns can move in-and-out of the cgroupns-root | |
106 | (if they have proper access to external cgroups). | |
107 | # From inside cgroupns (with cgroupns-root at /batchjobs/container_id1), and | |
108 | # assuming that the global hierarchy is still accessible inside cgroupns: | |
109 | $ cat /proc/7353/cgroup | |
110 | 0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/sub_cgrp_1 | |
111 | $ echo 7353 > batchjobs/container_id2/cgroup.procs | |
112 | $ cat /proc/7353/cgroup | |
113 | 0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/../container_id2 | |
114 | ||
115 | Note that this kind of setup is not encouraged. A task inside cgroupns | |
116 | should only be exposed to its own cgroupns hierarchy. Otherwise it makes | |
117 | the virtualization of /proc/<pid>/cgroup less useful. | |
118 | ||
119 | (5) Setns to another cgroup namespace is allowed when: | |
120 | (a) the process has CAP_SYS_ADMIN in its current userns | |
121 | (b) the process has CAP_SYS_ADMIN in the target cgroupns' userns | |
122 | No implicit cgroup changes happen when attaching to another cgroupns. It | |
123 | is expected that someone moves the attaching process under the target | |
124 | cgroupns-root. | |
125 | ||
126 | (6) When some thread from a multi-threaded process unshares its | |
127 | cgroup-namespace, the new cgroupns gets applied to the entire process (all | |
128 | the threads). For the unified-hierarchy this is expected as it only allows | |
129 | process-level containerization. For the legacy hierarchies this may be | |
130 | unexpected. So all the threads in the process will have the same cgroup. | |
131 | ||
132 | (7) The cgroup namespace is alive as long as there is at least one | |
133 | process inside it. When the last process exits, the cgroup | |
134 | namespace is destroyed. The cgroupns-root and the actual cgroups | |
135 | remain though. | |
136 | ||
137 | (8) A namespace-specific cgroup hierarchy can be mounted by a process running | |
138 | inside cgroupns: | |
139 | $ mount -t cgroup -o __DEVEL__sane_behavior cgroup $MOUNT_POINT | |
140 | ||
141 | This will mount the unified cgroup hierarchy with cgroupns-root as the | |
142 | filesystem root. The process needs CAP_SYS_ADMIN in its userns and mntns. |
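
The path translation described in items (3) and (4) can be illustrated with a
small user-space sketch. This is not the kernel's implementation; it only
mimics the visible result by computing a cgroup's real path relative to the
reader's cgroupns-root and prefixing it with '/'. The function name
cgroup_path_in_ns is hypothetical and chosen for illustration.

```python
import posixpath


def cgroup_path_in_ns(real_path: str, ns_root: str) -> str:
    """Return real_path as a reader whose cgroupns-root is ns_root would see it.

    Hypothetical helper for illustration only: both arguments are absolute
    cgroup paths in the global hierarchy, e.g. '/batchjobs/container_id1'.
    """
    # relpath yields '.', 'sub', or '../other' style results for absolute inputs.
    rel = posixpath.relpath(real_path, ns_root)
    if rel == ".":
        # The cgroup is the cgroupns-root itself, which is displayed as '/'.
        return "/"
    # The displayed path always starts with '/', even when it climbs with '..'.
    return "/" + rel


if __name__ == "__main__":
    root = "/batchjobs/container_id1"
    # Inside the namespace: the root itself and a child cgroup.
    print(cgroup_path_in_ns("/batchjobs/container_id1", root))
    print(cgroup_path_in_ns("/batchjobs/container_id1/sub_cgrp_1", root))
    # From the global cgroupns (rooted at '/'), the full path is visible.
    print(cgroup_path_in_ns("/batchjobs/container_id1/sub_cgrp_1", "/"))
    # A task moved outside the cgroupns-root, as in item (4).
    print(cgroup_path_in_ns("/batchjobs/container_id2", root))
```

Run as a script, this prints '/', '/sub_cgrp_1', the full
'/batchjobs/container_id1/sub_cgrp_1' path, and '/../container_id2', matching
the outputs shown in items (2), (3a), (3b) and (4).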