# API extensions

The changes below were introduced to the LXC API after the 3.0 API was finalized.

They are all backward compatible and can be detected by client tools by
calling the `lxc_has_api_extension` function.

## lxc\_log

This introduces a way to initialize a logging instance from the API for a given
container.

## lxc\_config\_item\_is\_supported

This introduces the `lxc_config_item_is_supported` function. It allows users to
check whether their LXC instance supports a given configuration key.

## console\_log

This adds support for the container's console log. The console log is implemented as
an efficient ringbuffer.

## reboot2

This adds `reboot2()` as a new API extension. This function properly waits
until a reboot has succeeded. It takes a timeout argument. If the timeout is
`> 0`, `reboot2()` will block until the timeout is reached. If the timeout is
`0`, `reboot2()` will not block. If the timeout is `-1`, `reboot2()` will block
indefinitely.

## mount\_injection

This adds support for injecting mounts into and removing mounts from a running
container. Two new API functions, `mount()` and `umount()`, are added. They
mirror the current mount and umount API of the kernel.

## seccomp\_allow\_nesting

This adds support for seccomp filters to be stacked regardless of whether a seccomp profile is already loaded. This allows nested containers to load their own seccomp profile.
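
As a minimal sketch, a nested container could opt in to stacking through the `lxc.seccomp.allow_nesting` key. The profile path below is a hypothetical placeholder, not a path shipped by LXC.

```
lxc.seccomp.profile=/path/to/nested.seccomp
lxc.seccomp.allow_nesting=1
```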

## seccomp\_notify

This adds "notify" as a seccomp action that will cause LXC to register a seccomp listener and retrieve a listener file descriptor from the kernel. When a syscall is made that is registered as "notify", the kernel will generate a poll event and send a message over the file descriptor.

The caller can read this message and inspect the syscall, including its arguments. Based on this information, the caller is expected to send back a message informing the kernel which action to take. Until that message is sent, the kernel will block the calling process. The format of the messages to read and send is documented in seccomp itself.

A new API function `seccomp_notify_fd()` has been added which allows callers to retrieve the notifier fd for the container's seccomp filter.

## network\_veth\_routes

This introduces the `lxc.net.[i].veth.ipv4.route` and `lxc.net.[i].veth.ipv6.route` properties
on `veth` type network interfaces. This allows adding static routes on the host to the container's
network interface.
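
Example usage, as a sketch only: the interface index `0`, the bridge name `lxcbr0`, and the documentation prefixes `192.0.2.0/24` and `2001:db8::/64` are assumptions chosen for illustration.

```
lxc.net.0.type=veth
lxc.net.0.link=lxcbr0
lxc.net.0.flags=up
lxc.net.0.veth.ipv4.route=192.0.2.0/24
lxc.net.0.veth.ipv6.route=2001:db8::/64
```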

## network\_ipvlan

This introduces the `ipvlan` network type.

Example usage:

```
lxc.net.[i].type=ipvlan
lxc.net.[i].ipvlan.mode=[l3|l3s|l2] (defaults to l3)
lxc.net.[i].ipvlan.isolation=[bridge|private|vepa] (defaults to bridge)
lxc.net.[i].link=eth0
lxc.net.[i].flags=up
```

## network\_l2proxy

This introduces the `lxc.net.[i].l2proxy` property, which can be either `0` or `1`. Defaults to `0`.
When used with `lxc.net.[i].link`, it will add IP neighbour proxy entries on the linked device
for any IPv4 and IPv6 addresses on the container's network device.

For IPv4 addresses it will check the following sysctl value and fail with an error if it is not set:

```
net.ipv4.conf.[link].forwarding=1
```

For IPv6 addresses it will check the following sysctl values and fail with an error if they are not set:

```
net.ipv6.conf.[link].proxy_ndp=1
net.ipv6.conf.[link].forwarding=1
```
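
A container configuration using this property might look like the following sketch. The interface index `0`, the parent device `eth0`, the `ipvlan` type, and the addresses are assumptions for illustration; the sysctls listed above would have to be set for `eth0` on the host.

```
lxc.net.0.type=ipvlan
lxc.net.0.link=eth0
lxc.net.0.flags=up
lxc.net.0.l2proxy=1
lxc.net.0.ipv4.address=192.0.2.10/32
lxc.net.0.ipv6.address=2001:db8::10/128
```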

## network\_gateway\_device\_route

This introduces the ability to specify `lxc.net.[i].ipv4.gateway` and/or
`lxc.net.[i].ipv6.gateway` with a value of `dev`, which will cause the default gateway
inside the container to be created as a device route, without requiring a destination gateway IP.
This is primarily intended for use with layer 3 networking devices, such as IPVLAN.
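
Example usage, as a sketch: the interface index `0`, the parent device `eth0`, and the addresses are illustrative assumptions.

```
lxc.net.0.type=ipvlan
lxc.net.0.link=eth0
lxc.net.0.flags=up
lxc.net.0.ipv4.address=192.0.2.10/32
lxc.net.0.ipv4.gateway=dev
lxc.net.0.ipv6.address=2001:db8::10/128
lxc.net.0.ipv6.gateway=dev
```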

## network\_phys\_macvlan\_mtu

This introduces the ability to specify a custom MTU for `phys` and `macvlan` devices using the
`lxc.net.[i].mtu` property.
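
Example usage, as a sketch: the interface index `0`, the parent device `eth0`, and the MTU value `1400` are illustrative assumptions.

```
lxc.net.0.type=macvlan
lxc.net.0.link=eth0
lxc.net.0.flags=up
lxc.net.0.mtu=1400
```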

## network\_veth\_router

This introduces the ability to specify a `lxc.net.[i].veth.mode` setting, which takes a value of "bridge" or "router". This defaults to "bridge".

In "router" mode static routes are created on the host for the container's IP addresses pointing to the host side veth interface. In addition to the routes, a static IP neighbour proxy is added to the host side veth interface for the IPv4 and IPv6 gateway IPs.
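
A "router" mode configuration might look like the following sketch. The interface index `0` and all addresses, including the gateway IPs, are assumptions chosen for illustration.

```
lxc.net.0.type=veth
lxc.net.0.veth.mode=router
lxc.net.0.flags=up
lxc.net.0.ipv4.address=192.0.2.10/32
lxc.net.0.ipv4.gateway=192.0.2.1
lxc.net.0.ipv6.address=2001:db8::10/128
lxc.net.0.ipv6.gateway=2001:db8::1
```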

## cgroup2\_devices

This enables `LXC` to make use of the new devices controller in the unified cgroup hierarchy. `LXC` will now create, load, and attach a BPF program to the cgroup of the container when the controller is available.

## cgroup2

This enables `LXC` to make complete use of the unified cgroup hierarchy. With this extension it is possible to run `LXC` containers on systems that use a pure unified cgroup layout.

## init\_pidfd

This adds a new API function `init_pidfd()` which allows callers to retrieve a pidfd for the container's init process. This makes process management interactions, such as sending signals, completely reliable and race-free.

## pidfd

When running on kernels that support pidfds, LXC will rely on them for most operations. This makes interacting with containers more reliable and significantly safer, and it eliminates various races inherent to PID-based kernel APIs. LXC requires that the running kernel support at least `pidfd_send_signal()`, `CLONE_PIDFD`, `P_PIDFD`, and pidfd polling. Any kernel starting with `Linux 5.4` should have full support for pidfds.

## cgroup\_advanced\_isolation

Privileged containers will usually be able to override the cgroup limits given to them. This introduces three new configuration keys: `lxc.cgroup.dir.monitor`, `lxc.cgroup.dir.container`, and `lxc.cgroup.dir.container.inner`. The `lxc.cgroup.dir.monitor` and `lxc.cgroup.dir.container` keys can be used to place the `monitor` and the `container` into different cgroups. The `lxc.cgroup.dir.container.inner` key can be set to a cgroup that is concatenated with `lxc.cgroup.dir.container`. When `lxc.cgroup.dir.container.inner` is set, the container will be placed into the `lxc.cgroup.dir.container.inner` cgroup but the limits will be set in the `lxc.cgroup.dir.container` cgroup. This way, privileged containers cannot escape their cgroup limits.
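
The three keys could be combined as in the following sketch. The cgroup names are hypothetical. With these values the limits would be set in `lxc.payload/c1`, while the container itself would be placed in the concatenated `lxc.payload/c1/ns` cgroup.

```
lxc.cgroup.dir.monitor=lxc.monitor/c1
lxc.cgroup.dir.container=lxc.payload/c1
lxc.cgroup.dir.container.inner=ns
```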

## time\_namespace

This adds time namespace support to LXC.

## seccomp\_allow\_deny\_syntax

This adds the ability to use "denylist" and "allowlist" in seccomp v2 policies.
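
A policy using the new keyword might look like the following sketch. It assumes the usual v2 policy layout, with the version number on the first line and the policy type on the second. The two rules are illustrative only.

```
2
denylist
kexec_load errno 1
open_by_handle_at errno 1
```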

## devpts\_fd

This adds the ability to allocate a file descriptor for the devpts instance of
the container.

## seccomp\_notify\_fd\_active

Retrieve the seccomp notifier fd from a running container.

## seccomp\_proxy\_send\_notify\_fd

Whether the seccomp notify proxy sends along a notify file descriptor.