Commit | Line | Data |
---|---|---|
17926a79 DH |
1 | ====================== |
2 | RxRPC NETWORK PROTOCOL | |
3 | ====================== | |
4 | ||
5 | The RxRPC protocol driver provides a reliable two-phase transport on top of UDP | |
6 | that can be used to perform RxRPC remote operations. This is done over sockets | |
7 | of AF_RXRPC family, using sendmsg() and recvmsg() with control data to send and | |
8 | receive data, aborts and errors. | |
9 | ||
10 | Contents of this document: | |
11 | ||
12 | (*) Overview. | |
13 | ||
14 | (*) RxRPC protocol summary. | |
15 | ||
16 | (*) AF_RXRPC driver model. | |
17 | ||
18 | (*) Control messages. | |
19 | ||
20 | (*) Socket options. | |
21 | ||
22 | (*) Security. | |
23 | ||
24 | (*) Example client usage. | |
25 | ||
26 | (*) Example server usage. | |
27 | ||
651350d1 DH |
28 | (*) AF_RXRPC kernel interface. |
29 | ||
17926a79 DH |
30 | |
31 | ======== | |
32 | OVERVIEW | |
33 | ======== | |
34 | ||
35 | RxRPC is a two-layer protocol. There is a session layer which provides | |
36 | reliable virtual connections using UDP over IPv4 (or IPv6) as the transport | |
37 | layer, but implements a real network protocol; and there's the presentation | |
38 | layer which renders structured data to binary blobs and back again using XDR | |
39 | (as does SunRPC): | |
40 | ||
41 | +-------------+ | |
42 | | Application | | |
43 | +-------------+ | |
44 | | XDR | Presentation | |
45 | +-------------+ | |
46 | | RxRPC | Session | |
47 | +-------------+ | |
48 | | UDP | Transport | |
49 | +-------------+ | |
50 | ||
51 | ||
52 | AF_RXRPC provides: | |
53 | ||
54 | (1) Part of an RxRPC facility for both kernel and userspace applications by | |
55 | making the session part of it a Linux network protocol (AF_RXRPC). | |
56 | ||
57 | (2) A two-phase protocol. The client transmits a blob (the request) and then | |
58 | receives a blob (the reply), and the server receives the request and then | |
59 | transmits the reply. | |
60 | ||
61 | (3) Retention of the reusable bits of the transport system set up for one call | |
62 | to speed up subsequent calls. | |
63 | ||
64 | (4) A secure protocol, using the Linux kernel's key retention facility to | |
65 | manage security on the client end. The server end must of necessity be | |
66 | more active in security negotiations. | |
67 | ||
68 | AF_RXRPC does not provide XDR marshalling/presentation facilities. That is | |
69 | left to the application. AF_RXRPC only deals in blobs. Even the operation ID | |
70 | is just the first four bytes of the request blob, and as such is beyond the | |
71 | kernel's interest. | |
72 | ||
73 | ||
74 | Sockets of AF_RXRPC family are: | |
75 | ||
76 | (1) created as type SOCK_DGRAM; | |
77 | ||
78 | (2) provided with a protocol of the type of underlying transport they're going | |
79 | to use - currently only PF_INET is supported. | |
80 | ||
81 | ||
82 | The Andrew File System (AFS) is an example of an application that uses this and | |
83 | that has both kernel (filesystem) and userspace (utility) components. | |
84 | ||
85 | ||
86 | ====================== | |
87 | RXRPC PROTOCOL SUMMARY | |
88 | ====================== | |
89 | ||
90 | An overview of the RxRPC protocol: | |
91 | ||
92 | (*) RxRPC sits on top of another networking protocol (UDP is the only option | |
93 | currently), and uses this to provide network transport. UDP ports, for | |
94 | example, provide transport endpoints. | |
95 | ||
96 | (*) RxRPC supports multiple virtual "connections" from any given transport | |
97 | endpoint, thus allowing the endpoints to be shared, even to the same | |
98 | remote endpoint. | |
99 | ||
100 | (*) Each connection goes to a particular "service". A connection may not go | |
101 | to multiple services. A service may be considered the RxRPC equivalent of | |
102 | a port number. AF_RXRPC permits multiple services to share an endpoint. | |
103 | ||
104 | (*) Client-originating packets are marked, thus a transport endpoint can be | |
105 | shared between client and server connections (connections have a | |
106 | direction). | |
107 | ||
108 | (*) Up to a billion connections may be supported concurrently between one | |
109 | local transport endpoint and one service on one remote endpoint. An RxRPC | |
110 | connection is described by seven numbers: | |
111 | ||
112 | Local address } | |
113 | Local port } Transport (UDP) address | |
114 | Remote address } | |
115 | Remote port } | |
116 | Direction | |
117 | Connection ID | |
118 | Service ID | |
119 | ||
120 | (*) Each RxRPC operation is a "call". A connection may make up to four | |
121 | billion calls, but only up to four calls may be in progress on a | |
122 | connection at any one time. | |
123 | ||
124 | (*) Calls are two-phase and asymmetric: the client sends its request data, | |
125 | which the service receives; then the service transmits the reply data | |
126 | which the client receives. | |
127 | ||
128 | (*) The data blobs are of indefinite size; the end of a phase is marked with a | |
129 | flag in the packet. The number of packets of data making up one blob may | |
130 | not exceed 4 billion, however, as this would cause the sequence number to | |
131 | wrap. | |
132 | ||
133 | (*) The first four bytes of the request data are the service operation ID. | |
134 | ||
135 | (*) Security is negotiated on a per-connection basis. The connection is | |
136 | initiated by the first data packet on it arriving. If security is | |
137 | requested, the server then issues a "challenge" and then the client | |
138 | replies with a "response". If the response is successful, the security is | |
139 | set for the lifetime of that connection, and all subsequent calls made | |
140 | upon it use that same security. In the event that the server lets a | |
141 | connection lapse before the client, the security will be renegotiated if | |
142 | the client uses the connection again. | |
143 | ||
144 | (*) Calls use ACK packets to handle reliability. Data packets are also | |
145 | explicitly sequenced per call. | |
146 | ||
147 | (*) There are two types of positive acknowledgement: hard-ACKs and soft-ACKs. | |
148 | A hard-ACK indicates to the far side that all the data received to a point | |
149 | has been received and processed; a soft-ACK indicates that the data has | |
150 | been received but may yet be discarded and re-requested. The sender may | |
151 | not discard any transmittable packets until they've been hard-ACK'd. | |
152 | ||
153 | (*) Reception of a reply data packet implicitly hard-ACKs all the data | |
154 | packets that make up the request. | |
155 | ||
156 | (*) A call is complete when the request has been sent, the reply has been | |
157 | received and the final hard-ACK on the last packet of the reply has | |
158 | reached the server. | |
159 | ||
160 | (*) A call may be aborted by either end at any time up to its completion. | |
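
The hard-ACK and soft-ACK rules in the summary above can be modelled with a small transmit window. The following is an illustrative sketch only, not the AF_RXRPC implementation: the names tx_window, tx_queue, tx_soft_ack, tx_hard_ack and tx_buffered, and the window size, are hypothetical and chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_WINDOW_SLOTS 8   /* hypothetical window size for this sketch */

struct tx_window {
    uint32_t hard_acked;                /* highest sequence number hard-ACK'd */
    uint32_t top;                       /* highest sequence number queued */
    bool soft_acked[TX_WINDOW_SLOTS];   /* per-slot soft-ACK marks */
};

/* Queue one more packet; returns its sequence number (starting at 1),
 * or 0 if the window is full and nothing can be queued. */
uint32_t tx_queue(struct tx_window *w)
{
    if (w->top - w->hard_acked >= TX_WINDOW_SLOTS)
        return 0;
    w->top++;
    w->soft_acked[w->top % TX_WINDOW_SLOTS] = false;
    return w->top;
}

/* A soft-ACK only marks the packet: the receiver may still discard it and
 * ask for it again, so the sender has to keep it buffered. */
void tx_soft_ack(struct tx_window *w, uint32_t seq)
{
    if (seq > w->hard_acked && seq <= w->top)
        w->soft_acked[seq % TX_WINDOW_SLOTS] = true;
}

/* A hard-ACK up to 'seq' means everything up to that point has been
 * received and processed, so the sender may discard those packets. */
void tx_hard_ack(struct tx_window *w, uint32_t seq)
{
    if (seq > w->hard_acked && seq <= w->top)
        w->hard_acked = seq;
}

/* Number of packets the sender must still hold for retransmission. */
uint32_t tx_buffered(const struct tx_window *w)
{
    return w->top - w->hard_acked;
}
```

In this model a soft-ACK never shrinks the number of buffered packets; only a hard-ACK does, which is the constraint the summary places on the sender.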
161 | ||
162 | ||
163 | ===================== | |
164 | AF_RXRPC DRIVER MODEL | |
165 | ===================== | |
166 | ||
167 | About the AF_RXRPC driver: | |
168 | ||
169 | (*) The AF_RXRPC protocol transparently uses internal sockets of the transport | |
170 | protocol to represent transport endpoints. | |
171 | ||
172 | (*) AF_RXRPC sockets map onto RxRPC connection bundles. Actual RxRPC | |
173 | connections are handled transparently. One client socket may be used to | |
174 | make multiple simultaneous calls to the same service. One server socket | |
175 | may handle calls from many clients. | |
176 | ||
177 | (*) Additional parallel client connections will be initiated to support extra | |
178 | concurrent calls, up to a tunable limit. | |
179 | ||
180 | (*) Each connection is retained for a certain amount of time [tunable] after | |
181 | the last call currently using it has completed in case a new call is made | |
182 | that could reuse it. | |
183 | ||
184 | (*) Each internal UDP socket is retained for a certain amount of | |
185 | time [tunable] after the last connection using it is discarded, in case a new | |
186 | connection is made that could use it. | |
187 | ||
188 | (*) A client-side connection is only shared between calls if they have | |
189 | the same key struct describing their security (and assuming the calls | |
190 | would otherwise share the connection). Non-secured calls would also be | |
191 | able to share connections with each other. | |
192 | ||
193 | (*) A server-side connection is shared if the client says it is. | |
194 | ||
195 | (*) ACK'ing is handled by the protocol driver automatically, including ping | |
196 | replying. | |
197 | ||
198 | (*) SO_KEEPALIVE automatically pings the other side to keep the connection | |
199 | alive [TODO]. | |
200 | ||
201 | (*) If an ICMP error is received, all calls affected by that error will be | |
202 | aborted with an appropriate network error passed through recvmsg(). | |
203 | ||
204 | ||
205 | Interaction with the user of the RxRPC socket: | |
206 | ||
207 | (*) A socket is made into a server socket by binding an address with a | |
208 | non-zero service ID. | |
209 | ||
210 | (*) In the client, sending a request is achieved with one or more sendmsgs, | |
211 | followed by the reply being received with one or more recvmsgs. | |
212 | ||
213 | (*) The first sendmsg for a request to be sent from a client contains a tag to | |
214 | be used in all other sendmsgs or recvmsgs associated with that call. The | |
215 | tag is carried in the control data. | |
216 | ||
217 | (*) connect() is used to supply a default destination address for a client | |
218 | socket. This may be overridden by supplying an alternate address to the | |
219 | first sendmsg() of a call (struct msghdr::msg_name). | |
220 | ||
221 | (*) If connect() is called on an unbound client, a random local port will | |
222 | be bound before the operation takes place. | |
223 | ||
224 | (*) A server socket may also be used to make client calls. To do this, the | |
225 | first sendmsg() of the call must specify the target address. The server's | |
226 | transport endpoint is used to send the packets. | |
227 | ||
228 | (*) Once the application has received the last message associated with a call, | |
229 | the tag is guaranteed not to be seen again, and so it can be used to pin | |
230 | client resources. A new call can then be initiated with the same tag | |
231 | without fear of interference. | |
232 | ||
233 | (*) In the server, a request is received with one or more recvmsgs, then the | |
234 | reply is transmitted with one or more sendmsgs, and then the final ACK | |
235 | is received with a last recvmsg. | |
236 | ||
237 | (*) When sending data for a call, sendmsg is given MSG_MORE if there's more | |
238 | data to come on that call. | |
239 | ||
240 | (*) When receiving data for a call, recvmsg flags MSG_MORE if there's more | |
241 | data to come for that call. | |
242 | ||
243 | (*) When receiving data or messages for a call, MSG_EOR is flagged by recvmsg | |
244 | to indicate the terminal message for that call. | |
245 | ||
246 | (*) A call may be aborted by adding an abort control message to the control | |
247 | data. Issuing an abort terminates the kernel's use of that call's tag. | |
248 | Any messages waiting in the receive queue for that call will be discarded. | |
249 | ||
250 | (*) Aborts, busy notifications and challenge packets are delivered by recvmsg, | |
251 | and control data messages will be set to indicate the context. Receiving | |
252 | an abort or a busy message terminates the kernel's use of that call's tag. | |
253 | ||
254 | (*) The control data part of the msghdr struct is used for a number of things: | |
255 | ||
256 | (*) The tag of the intended or affected call. | |
257 | ||
258 | (*) Sending or receiving errors, aborts and busy notifications. | |
259 | ||
260 | (*) Notifications of incoming calls. | |
261 | ||
262 | (*) Sending debug requests and receiving debug replies [TODO]. | |
263 | ||
264 | (*) When the kernel has received and set up an incoming call, it sends a | |
265 | message to the server application to let it know there's a new call awaiting | |
266 | its acceptance [recvmsg reports a special control message]. The server | |
267 | application then uses sendmsg to assign a tag to the new call. Once that | |
268 | is done, the first part of the request data will be delivered by recvmsg. | |
269 | ||
270 | (*) The server application has to provide the server socket with a keyring of | |
271 | secret keys corresponding to the security types it permits. When a secure | |
272 | connection is being set up, the kernel looks up the appropriate secret key | |
273 | in the keyring and then sends a challenge packet to the client and | |
274 | receives a response packet. The kernel then checks the authorisation of | |
275 | the packet and either aborts the connection or sets up the security. | |
276 | ||
277 | (*) The name of the key a client will use to secure its communications is | |
278 | nominated by a socket option. | |
279 | ||
280 | ||
281 | Notes on recvmsg: | |
282 | ||
283 | (1) If there's a sequence of data messages belonging to a particular call on | |
284 | the receive queue, then recvmsg will keep working through them until: | |
285 | ||
286 | (a) it meets the end of that call's received data, | |
287 | ||
288 | (b) it meets a non-data message, | |
289 | ||
290 | (c) it meets a message belonging to a different call, or | |
291 | ||
292 | (d) it fills the user buffer. | |
293 | ||
294 | If recvmsg is called in blocking mode, it will keep sleeping, awaiting the | |
295 | reception of further data, until one of the above four conditions is met. | |
296 | ||
297 | (2) MSG_PEEK operates similarly, but will return immediately if it has put any | |
298 | data in the buffer rather than sleeping until it can fill the buffer. | |
299 | ||
300 | (3) If a data message is only partially consumed in filling a user buffer, | |
301 | then the remainder of that message will be left on the front of the queue | |
302 | for the next taker. MSG_TRUNC will never be flagged. | |
303 | ||
304 | (4) If there is more data to be had on a call (it hasn't copied the last byte | |
305 | of the last data message in that phase yet), then MSG_MORE will be | |
306 | flagged. | |
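
The way an application might interpret the flags returned by recvmsg() can be sketched as below. This is a minimal illustration of the notes above; the enum and the function name classify_recv_flags are hypothetical and not part of any AF_RXRPC API.

```c
#include <sys/socket.h>

enum call_progress {
    CALL_MORE_DATA,     /* MSG_MORE: more data to come in this phase */
    CALL_PHASE_DONE,    /* neither flag: last byte of the phase was copied */
    CALL_TERMINATED,    /* MSG_EOR: terminal message for the call */
};

/* Map the msg_flags value filled in by recvmsg() onto the state of the call. */
enum call_progress classify_recv_flags(int msg_flags)
{
    if (msg_flags & MSG_EOR)
        return CALL_TERMINATED;
    if (msg_flags & MSG_MORE)
        return CALL_MORE_DATA;
    return CALL_PHASE_DONE;
}
```

A receive loop would typically keep calling recvmsg() for a call while CALL_MORE_DATA is reported, and release any resources pinned by the call's tag once CALL_TERMINATED is seen.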
307 | ||
308 | ||
309 | ================ | |
310 | CONTROL MESSAGES | |
311 | ================ | |
312 | ||
313 | AF_RXRPC makes use of control messages in sendmsg() and recvmsg() to multiplex | |
314 | calls, to invoke certain actions and to report certain conditions. These are: | |
315 | ||
316 | MESSAGE ID SRT DATA MEANING | |
317 | ======================= === =========== =============================== | |
318 | RXRPC_USER_CALL_ID sr- User ID App's call specifier | |
319 | RXRPC_ABORT srt Abort code Abort code to issue/received | |
320 | RXRPC_ACK -rt n/a Final ACK received | |
321 | RXRPC_NET_ERROR -rt error num Network error on call | |
322 | RXRPC_BUSY -rt n/a Call rejected (server busy) | |
323 | RXRPC_LOCAL_ERROR -rt error num Local error encountered | |
324 | RXRPC_NEW_CALL -r- n/a New call received | |
325 | RXRPC_ACCEPT s-- n/a Accept new call | |
326 | ||
327 | (SRT = usable in Sendmsg / delivered by Recvmsg / Terminal message) | |
328 | ||
329 | (*) RXRPC_USER_CALL_ID | |
330 | ||
331 | This is used to indicate the application's call ID. It's an unsigned long | |
332 | that the app specifies in the client by attaching it to the first data | |
333 | message or in the server by passing it in association with an RXRPC_ACCEPT | |
334 | message. recvmsg() passes it in conjunction with all messages except | |
335 | the RXRPC_NEW_CALL message. | |
336 | ||
337 | (*) RXRPC_ABORT | |
338 | ||
339 | This can be used by an application to abort a call by passing it to | |
340 | sendmsg, or it can be delivered by recvmsg to indicate a remote abort was | |
341 | received. Either way, it must be associated with an RXRPC_USER_CALL_ID to | |
342 | specify the call affected. If an abort is being sent, then error EBADSLT | |
343 | will be returned if there is no call with that user ID. | |
344 | ||
345 | (*) RXRPC_ACK | |
346 | ||
347 | This is delivered to a server application to indicate that the final ACK | |
348 | of a call was received from the client. It will be associated with an | |
349 | RXRPC_USER_CALL_ID to indicate the call that's now complete. | |
350 | ||
351 | (*) RXRPC_NET_ERROR | |
352 | ||
353 | This is delivered to an application to indicate that an ICMP error message | |
354 | was encountered in the process of trying to talk to the peer. An | |
355 | errno-class integer value will be included in the control message data | |
356 | indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call | |
357 | affected. | |
358 | ||
359 | (*) RXRPC_BUSY | |
360 | ||
361 | This is delivered to a client application to indicate that a call was | |
362 | rejected by the server due to the server being busy. It will be | |
363 | associated with an RXRPC_USER_CALL_ID to indicate the rejected call. | |
364 | ||
365 | (*) RXRPC_LOCAL_ERROR | |
366 | ||
367 | This is delivered to an application to indicate that a local error was | |
368 | encountered and that a call has been aborted because of it. An | |
369 | errno-class integer value will be included in the control message data | |
370 | indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call | |
371 | affected. | |
372 | ||
373 | (*) RXRPC_NEW_CALL | |
374 | ||
375 | This is delivered to indicate to a server application that a new call has | |
376 | arrived and is awaiting acceptance. No user ID is associated with this, | |
377 | as a user ID must subsequently be assigned by doing an RXRPC_ACCEPT. | |
378 | ||
379 | (*) RXRPC_ACCEPT | |
380 | ||
381 | This is used by a server application to attempt to accept a call and | |
382 | assign it a user ID. It should be associated with an RXRPC_USER_CALL_ID | |
383 | to indicate the user ID to be assigned. If there is no call to be | |
384 | accepted (it may have timed out, been aborted, etc.), then sendmsg will | |
385 | return error ENODATA. If the user ID is already in use by another call, | |
386 | then error EBADSLT will be returned. | |
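
The control messages listed above arrive in the ancillary data of recvmsg() and can be walked with the standard CMSG macros. The sketch below shows one way to do that. The struct rx_event and the function parse_rxrpc_cmsgs are hypothetical, and the EX_RXRPC_* values and the SOL_RXRPC fallback are assumptions meant to mirror the Linux headers; a real program should take the constants from the system headers instead.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_RXRPC
#define SOL_RXRPC 272   /* assumed value, used only if the libc lacks it */
#endif

/* Control message types; values assumed to match linux/rxrpc.h. */
enum {
    EX_RXRPC_USER_CALL_ID = 1,
    EX_RXRPC_ABORT        = 2,
    EX_RXRPC_ACK          = 3,
    EX_RXRPC_NET_ERROR    = 5,
    EX_RXRPC_BUSY         = 6,
    EX_RXRPC_LOCAL_ERROR  = 7,
    EX_RXRPC_NEW_CALL     = 8,
};

struct rx_event {
    bool has_call_id;       /* false for RXRPC_NEW_CALL */
    unsigned long call_id;  /* the application's call ID */
    int type;               /* 0 = plain data, else an EX_RXRPC_* value */
    uint32_t code;          /* abort code or errno-class value, if any */
};

/* Walk the control buffer of a received message and summarise it. */
void parse_rxrpc_cmsgs(struct msghdr *msg, struct rx_event *ev)
{
    struct cmsghdr *cmsg;

    memset(ev, 0, sizeof(*ev));
    for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level != SOL_RXRPC)
            continue;
        switch (cmsg->cmsg_type) {
        case EX_RXRPC_USER_CALL_ID:
            memcpy(&ev->call_id, CMSG_DATA(cmsg), sizeof(ev->call_id));
            ev->has_call_id = true;
            break;
        case EX_RXRPC_ABORT:
        case EX_RXRPC_NET_ERROR:
        case EX_RXRPC_LOCAL_ERROR:
            memcpy(&ev->code, CMSG_DATA(cmsg), sizeof(ev->code));
            ev->type = cmsg->cmsg_type;
            break;
        case EX_RXRPC_ACK:
        case EX_RXRPC_BUSY:
        case EX_RXRPC_NEW_CALL:
            ev->type = cmsg->cmsg_type;
            break;
        }
    }
}
```

A type of 0 together with has_call_id set corresponds to an ordinary data message for that call.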
387 | ||
388 | ||
389 | ============== | |
390 | SOCKET OPTIONS | |
391 | ============== | |
392 | ||
393 | AF_RXRPC sockets support a few socket options at the SOL_RXRPC level: | |
394 | ||
395 | (*) RXRPC_SECURITY_KEY | |
396 | ||
397 | This is used to specify the description of the key to be used. The key is | |
398 | extracted from the calling process's keyrings with request_key() and | |
399 | should be of "rxrpc" type. | |
400 | ||
401 | The optval pointer points to the description string, and optlen indicates | |
402 | how long the string is, without the NUL terminator. | |
403 | ||
404 | (*) RXRPC_SECURITY_KEYRING | |
405 | ||
406 | Similar to above but specifies a keyring of server secret keys to use (key | |
407 | type "keyring"). See the "Security" section. | |
408 | ||
409 | (*) RXRPC_EXCLUSIVE_CONNECTION | |
410 | ||
411 | This is used to request that new connections should be used for each call | |
412 | made subsequently on this socket. optval should be NULL and optlen 0. | |
413 | ||
414 | (*) RXRPC_MIN_SECURITY_LEVEL | |
415 | ||
416 | This is used to specify the minimum security level required for calls on | |
417 | this socket. optval must point to an int containing one of the following | |
418 | values: | |
419 | ||
420 | (a) RXRPC_SECURITY_PLAIN | |
421 | ||
422 | Encrypted checksum only. | |
423 | ||
424 | (b) RXRPC_SECURITY_AUTH | |
425 | ||
426 | Encrypted checksum plus packet padded and first eight bytes of packet | |
427 | encrypted - which includes the actual packet length. | |
428 | ||
429 | (c) RXRPC_SECURITY_ENCRYPTED | |
430 | ||
431 | Encrypted checksum plus entire packet padded and encrypted, including | |
432 | actual packet length. | |
433 | ||
434 | ||
435 | ======== | |
436 | SECURITY | |
437 | ======== | |
438 | ||
439 | Currently, only the Kerberos 4 equivalent protocol has been implemented | |
440 | (security index 2 - rxkad). This requires the rxkad module to be loaded and, | |
441 | on the client, tickets of the appropriate type to be obtained from the AFS | |
442 | kaserver or the Kerberos server and installed as "rxrpc" type keys. This is | |
443 | normally done using the klog program. An example simple klog program can be | |
444 | found at: | |
445 | ||
446 | http://people.redhat.com/~dhowells/rxrpc/klog.c | |
447 | ||
448 | The payload provided to add_key() on the client should be of the following | |
449 | form: | |
450 | ||
451 | struct rxrpc_key_sec2_v1 { | |
452 | uint16_t security_index; /* 2 */ | |
453 | uint16_t ticket_length; /* length of ticket[] */ | |
454 | uint32_t expiry; /* time at which expires */ | |
455 | uint8_t kvno; /* key version number */ | |
456 | uint8_t __pad[3]; | |
457 | uint8_t session_key[8]; /* DES session key */ | |
458 | uint8_t ticket[0]; /* the encrypted ticket */ | |
459 | }; | |
460 | ||
461 | Where the ticket blob is just appended to the above structure. | |
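
Building such a payload might look like the sketch below. The structure layout is copied from the definition above, with the zero-length array written as a C99 flexible array member; the helper build_rxkad_payload is hypothetical and only shows the "ticket appended to the header" arrangement.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Layout as given in the structure above. */
struct rxrpc_key_sec2_v1 {
    uint16_t security_index;    /* 2 */
    uint16_t ticket_length;     /* length of ticket[] */
    uint32_t expiry;            /* time at which expires */
    uint8_t  kvno;              /* key version number */
    uint8_t  __pad[3];
    uint8_t  session_key[8];    /* DES session key */
    uint8_t  ticket[];          /* the encrypted ticket */
};

/* Allocate header plus ticket as one blob; the caller frees the result.
 * On success *plen receives the total payload length. */
struct rxrpc_key_sec2_v1 *build_rxkad_payload(const uint8_t session_key[8],
                                              uint8_t kvno, uint32_t expiry,
                                              const void *ticket,
                                              uint16_t ticket_len,
                                              size_t *plen)
{
    size_t len = sizeof(struct rxrpc_key_sec2_v1) + ticket_len;
    struct rxrpc_key_sec2_v1 *p = calloc(1, len);

    if (!p)
        return NULL;
    p->security_index = 2;      /* rxkad */
    p->ticket_length = ticket_len;
    p->expiry = expiry;
    p->kvno = kvno;
    memcpy(p->session_key, session_key, 8);
    memcpy(p->ticket, ticket, ticket_len);
    *plen = len;
    return p;
}
```

The returned buffer and length are what would be handed to add_key() as the payload and payload length for an "rxrpc" type key.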
462 | ||
463 | ||
464 | On the server, keys of type "rxrpc_s" must be made available. | |
465 | They have a description of "<serviceID>:<securityIndex>" (eg: "52:2" for an | |
466 | rxkad key for the AFS VL service). When such a key is created, it should be | |
467 | given the server's secret key as the instantiation data (see the example | |
468 | below). | |
469 | ||
470 | add_key("rxrpc_s", "52:2", secret_key, 8, keyring); | |
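
The description string for a server key follows directly from the "<serviceID>:<securityIndex>" pattern. A small helper to build it could look like this; the name format_server_key_desc is hypothetical.

```c
#include <stdio.h>

/* Write "<serviceID>:<securityIndex>" into buf.
 * Returns 0 on success, -1 if the buffer is too small. */
int format_server_key_desc(char *buf, size_t size,
                           unsigned int service_id,
                           unsigned int security_index)
{
    int n = snprintf(buf, size, "%u:%u", service_id, security_index);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For service ID 52 and security index 2 this produces the "52:2" description used in the add_key() call above.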
471 | ||
472 | A keyring is passed to the server socket by naming it in a sockopt. The server | |
473 | socket then looks the server secret keys up in this keyring when secure | |
474 | incoming connections are made. This can be seen in an example program that can | |
475 | be found at: | |
476 | ||
477 | http://people.redhat.com/~dhowells/rxrpc/listen.c | |
478 | ||
479 | ||
480 | ==================== | |
481 | EXAMPLE CLIENT USAGE | |
482 | ==================== | |
483 | ||
484 | A client would issue an operation by: | |
485 | ||
486 | (1) An RxRPC socket is set up by: | |
487 | ||
488 | client = socket(AF_RXRPC, SOCK_DGRAM, PF_INET); | |
489 | ||
490 | Where the third parameter indicates the protocol family of the transport | |
491 | socket used - usually IPv4 but it can also be IPv6 [TODO]. | |
492 | ||
493 | (2) A local address can optionally be bound: | |
494 | ||
495 | struct sockaddr_rxrpc srx = { | |
496 | .srx_family = AF_RXRPC, | |
497 | .srx_service = 0, /* we're a client */ | |
498 | .transport_type = SOCK_DGRAM, /* type of transport socket */ | |
499 | .transport.sin_family = AF_INET, | |
500 | .transport.sin_port = htons(7000), /* AFS callback */ | |
501 | .transport.sin_address = 0, /* all local interfaces */ | |
502 | }; | |
503 | bind(client, &srx, sizeof(srx)); | |
504 | ||
505 | This specifies the local UDP port to be used. If not given, a random | |
506 | non-privileged port will be used. A UDP port may be shared between | |
507 | several unrelated RxRPC sockets. Security is handled on a basis of | |
508 | per-RxRPC virtual connection. | |
509 | ||
510 | (3) The security is set: | |
511 | ||
512 | const char *key = "AFS:cambridge.redhat.com"; | |
513 | setsockopt(client, SOL_RXRPC, RXRPC_SECURITY_KEY, key, strlen(key)); | |
514 | ||
515 | This issues a request_key() to get the key representing the security | |
516 | context. The minimum security level can be set: | |
517 | ||
518 | unsigned int sec = RXRPC_SECURITY_ENCRYPTED; | |
519 | setsockopt(client, SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL, | |
520 | &sec, sizeof(sec)); | |
521 | ||
522 | (4) The server to be contacted can then be specified (alternatively this can | |
523 | be done through sendmsg): | |
524 | ||
525 | struct sockaddr_rxrpc srx = { | |
526 | .srx_family = AF_RXRPC, | |
527 | .srx_service = VL_SERVICE_ID, | |
528 | .transport_type = SOCK_DGRAM, /* type of transport socket */ | |
529 | .transport.sin_family = AF_INET, | |
530 | .transport.sin_port = htons(7005), /* AFS volume manager */ | |
531 | .transport.sin_address = ..., | |
532 | }; | |
533 | connect(client, &srx, sizeof(srx)); | |
534 | ||
535 | (5) The request data should then be posted to the client socket using a series | |
536 | of sendmsg() calls, each with the following control message attached: | |
537 | ||
538 | RXRPC_USER_CALL_ID - specifies the user ID for this call | |
539 | ||
540 | MSG_MORE should be set in msghdr::msg_flags on all but the last part of | |
541 | the request. Multiple requests may be made simultaneously. | |
542 | ||
025dfdaf | 543 | If a call is intended to go to a destination other than the default |
17926a79 DH |
544 | specified through connect(), then msghdr::msg_name should be set on the |
545 | first request message of that call. | |
546 | ||
547 | (6) The reply data will then be posted to the client socket for recvmsg() to | |
548 | pick up. MSG_MORE will be flagged by recvmsg() if there's more reply data | |
549 | for a particular call to be read. MSG_EOR will be set on the terminal | |
550 | read for a call. | |
551 | ||
552 | All data will be delivered with the following control message attached: | |
553 | ||
554 | RXRPC_USER_CALL_ID - specifies the user ID for this call | |
555 | ||
556 | If an abort or error occurred, this will be returned in the control data | |
557 | buffer instead, and MSG_EOR will be flagged to indicate the end of that | |
558 | call. | |
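
Attaching the RXRPC_USER_CALL_ID control message to a request, as described in step (5), can be sketched as follows. The struct call_msg and the function prepare_call_msg are hypothetical, and the EX_RXRPC_USER_CALL_ID value and the SOL_RXRPC fallback are assumptions intended to mirror the Linux headers.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#ifndef SOL_RXRPC
#define SOL_RXRPC 272               /* assumed value, used only if libc lacks it */
#endif
#define EX_RXRPC_USER_CALL_ID 1     /* assumed to match linux/rxrpc.h */

/* A message header bundled with its own iovec and control buffer.
 * The header points into this struct, so it must not be copied. */
struct call_msg {
    struct msghdr msg;
    struct iovec iov;
    union {
        char buf[CMSG_SPACE(sizeof(unsigned long))];
        struct cmsghdr align;       /* forces correct alignment of buf */
    } control;
};

/* Fill in a message carrying 'len' bytes of request data for 'call_id'. */
void prepare_call_msg(struct call_msg *cm, unsigned long call_id,
                      void *data, size_t len)
{
    struct cmsghdr *cmsg;

    memset(cm, 0, sizeof(*cm));
    cm->iov.iov_base = data;
    cm->iov.iov_len = len;
    cm->msg.msg_iov = &cm->iov;
    cm->msg.msg_iovlen = 1;
    cm->msg.msg_control = cm->control.buf;
    cm->msg.msg_controllen = sizeof(cm->control.buf);

    cmsg = CMSG_FIRSTHDR(&cm->msg);
    cmsg->cmsg_level = SOL_RXRPC;
    cmsg->cmsg_type = EX_RXRPC_USER_CALL_ID;
    cmsg->cmsg_len = CMSG_LEN(sizeof(call_id));
    memcpy(CMSG_DATA(cmsg), &call_id, sizeof(call_id));
}
```

The prepared header would then be passed to sendmsg() on the client socket, with MSG_MORE given on every part of the request except the last. To direct a call somewhere other than the connect() default, msg_name and msg_namelen would additionally be set on the first message.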
559 | ||
560 | ||
561 | ==================== | |
562 | EXAMPLE SERVER USAGE | |
563 | ==================== | |
564 | ||
565 | A server would be set up to accept operations in the following manner: | |
566 | ||
567 | (1) An RxRPC socket is created by: | |
568 | ||
569 | server = socket(AF_RXRPC, SOCK_DGRAM, PF_INET); | |
570 | ||
571 | Where the third parameter indicates the address type of the transport | |
572 | socket used - usually IPv4. | |
573 | ||
574 | (2) Security is set up if desired by giving the socket a keyring with server | |
575 | secret keys in it: | |
576 | ||
577 | keyring = add_key("keyring", "AFSkeys", NULL, 0, | |
578 | KEY_SPEC_PROCESS_KEYRING); | |
579 | ||
580 | const unsigned char secret_key[8] = { | |
581 | 0xa7, 0x83, 0x8a, 0xcb, 0xc7, 0x83, 0xec, 0x94 }; | |
582 | add_key("rxrpc_s", "52:2", secret_key, 8, keyring); | |
583 | ||
584 | setsockopt(server, SOL_RXRPC, RXRPC_SECURITY_KEYRING, "AFSkeys", 7); | |
585 | ||
586 | The keyring can be manipulated after it has been given to the socket. This | |
587 | permits the server to add more keys, replace keys, etc. whilst it is live. | |
588 | ||
589 | (3) A local address must then be bound: | |
590 | ||
591 | struct sockaddr_rxrpc srx = { | |
592 | .srx_family = AF_RXRPC, | |
593 | .srx_service = VL_SERVICE_ID, /* RxRPC service ID */ | |
594 | .transport_type = SOCK_DGRAM, /* type of transport socket */ | |
595 | .transport.sin_family = AF_INET, | |
596 | .transport.sin_port = htons(7000), /* AFS callback */ | |
597 | .transport.sin_address = 0, /* all local interfaces */ | |
598 | }; | |
599 | bind(server, &srx, sizeof(srx)); | |
600 | ||
601 | (4) The server is then set to listen out for incoming calls: | |
602 | ||
603 | listen(server, 100); | |
604 | ||
605 | (5) The kernel notifies the server of pending incoming connections by sending | |
606 | it a message for each. This is received with recvmsg() on the server | |
607 | socket. It has no data, and has a single dataless control message | |
608 | attached: | |
609 | ||
610 | RXRPC_NEW_CALL | |
611 | ||
612 | The address that can be passed back by recvmsg() at this point should be | |
613 | ignored since the call for which the message was posted may have gone by | |
614 | the time it is accepted - in which case the first call still on the queue | |
615 | will be accepted. | |
616 | ||
617 | (6) The server then accepts the new call by issuing a sendmsg() with two | |
618 | pieces of control data and no actual data: | |
619 | ||
620 | RXRPC_ACCEPT - indicate connection acceptance | |
621 | RXRPC_USER_CALL_ID - specify user ID for this call | |
622 | ||
623 | (7) The first request data packet will then be posted to the server socket for | |
624 | recvmsg() to pick up. At that point, the RxRPC address for the call can | |
625 | be read from the address fields in the msghdr struct. | |
626 | ||
627 | Subsequent request data will be posted to the server socket for recvmsg() | |
628 | to collect as it arrives. All but the last piece of the request data will | |
629 | be delivered with MSG_MORE flagged. | |
630 | ||
631 | All data will be delivered with the following control message attached: | |
632 | ||
633 | RXRPC_USER_CALL_ID - specifies the user ID for this call | |
634 | ||
635 | (8) The reply data should then be posted to the server socket using a series | |
636 | of sendmsg() calls, each with the following control messages attached: | |
637 | ||
638 | RXRPC_USER_CALL_ID - specifies the user ID for this call | |
639 | ||
640 | MSG_MORE should be set in msghdr::msg_flags on all but the last message | |
641 | for a particular call. | |
642 | ||
643 | (9) The final ACK from the client will be posted for retrieval by recvmsg() | |
644 | when it is received. It will take the form of a dataless message with two | |
645 | control messages attached: | |
646 | ||
647 | RXRPC_USER_CALL_ID - specifies the user ID for this call | |
648 | RXRPC_ACK - indicates final ACK (no data) | |
649 | ||
650 | MSG_EOR will be flagged to indicate that this is the final message for | |
651 | this call. | |
652 | ||
653 | (10) Up to the point the final packet of reply data is sent, the call can be | |
654 | aborted by calling sendmsg() with a dataless message with the following | |
655 | control messages attached: | |
656 | ||
657 | RXRPC_USER_CALL_ID - specifies the user ID for this call | |
658 | RXRPC_ABORT - indicates abort code (4 byte data) | |
659 | ||
660 | Any packets waiting in the socket's receive queue will be discarded if | |
661 | this is issued. | |
662 | ||
663 | Note that all the communications for a particular service take place through | |
664 | the one server socket, using control messages on sendmsg() and recvmsg() to | |
665 | determine the call affected. | |
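
The two dataless server messages described above, call acceptance and abort, differ only in the control messages they carry. The sketch below builds both. The struct ctl_msg and the helper functions are hypothetical, and the EX_RXRPC_* values and the SOL_RXRPC fallback are assumptions intended to mirror the Linux headers.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_RXRPC
#define SOL_RXRPC 272   /* assumed value, used only if the libc lacks it */
#endif

/* Control message types; values assumed to match linux/rxrpc.h. */
enum {
    EX_RXRPC_USER_CALL_ID = 1,
    EX_RXRPC_ABORT        = 2,
    EX_RXRPC_ACCEPT       = 9,
};

/* A dataless message header with room for two control messages.
 * The header points into this struct, so it must not be copied. */
struct ctl_msg {
    struct msghdr msg;
    union {
        char buf[CMSG_SPACE(sizeof(unsigned long)) +
                 CMSG_SPACE(sizeof(uint32_t))];
        struct cmsghdr align;   /* forces correct alignment of buf */
    } control;
};

static void ctl_init(struct ctl_msg *cm)
{
    memset(cm, 0, sizeof(*cm));
    cm->msg.msg_control = cm->control.buf;
    cm->msg.msg_controllen = 0;
}

/* Append one SOL_RXRPC control message at the current end of the buffer. */
static void ctl_append(struct ctl_msg *cm, int type,
                       const void *data, size_t len)
{
    struct cmsghdr *cmsg =
        (struct cmsghdr *)(cm->control.buf + cm->msg.msg_controllen);

    cmsg->cmsg_level = SOL_RXRPC;
    cmsg->cmsg_type = type;
    cmsg->cmsg_len = CMSG_LEN(len);
    if (len)
        memcpy(CMSG_DATA(cmsg), data, len);
    cm->msg.msg_controllen += CMSG_SPACE(len);
}

/* RXRPC_ACCEPT (no data) plus the user ID to assign to the new call. */
void prepare_accept_msg(struct ctl_msg *cm, unsigned long call_id)
{
    ctl_init(cm);
    ctl_append(cm, EX_RXRPC_ACCEPT, NULL, 0);
    ctl_append(cm, EX_RXRPC_USER_CALL_ID, &call_id, sizeof(call_id));
}

/* The user ID of the call plus a 4-byte abort code. */
void prepare_abort_msg(struct ctl_msg *cm, unsigned long call_id,
                       uint32_t abort_code)
{
    ctl_init(cm);
    ctl_append(cm, EX_RXRPC_USER_CALL_ID, &call_id, sizeof(call_id));
    ctl_append(cm, EX_RXRPC_ABORT, &abort_code, sizeof(abort_code));
}
```

Either message would be handed to sendmsg() on the server socket with no data attached.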
651350d1 DH |
666 | |
667 | ||
668 | ========================= | |
669 | AF_RXRPC KERNEL INTERFACE | |
670 | ========================= | |
671 | ||
672 | The AF_RXRPC module also provides an interface for use by in-kernel utilities | |
673 | such as the AFS filesystem. This permits such a utility to: | |
674 | ||
675 | (1) Use different keys directly on individual client calls on one socket | |
676 | rather than having to open a whole slew of sockets, one for each key it | |
677 | might want to use. | |
678 | ||
679 | (2) Avoid having RxRPC call request_key() at the point of issue of a call or | |
680 | opening of a socket. Instead the utility is responsible for requesting a | |
681 | key at the appropriate point. AFS, for instance, would do this during VFS | |
682 | operations such as open() or unlink(). The key is then handed through | |
683 | when the call is initiated. | |
684 | ||
685 | (3) Request the use of something other than GFP_KERNEL to allocate memory. | |
686 | ||
687 | (4) Avoid the overhead of using the recvmsg() call. RxRPC messages can be | |
688 | intercepted before they get put into the socket Rx queue and the socket | |
689 | buffers manipulated directly. | |
690 | ||
691 | To use the RxRPC facility, a kernel utility must still open an AF_RXRPC socket, | |
01dd2fbf | 692 | bind an address as appropriate and listen if it's to be a server socket, but |
651350d1 DH |
693 | then it passes this to the kernel interface functions. |
694 | ||
695 | The kernel interface functions are as follows: | |
696 | ||
697 | (*) Begin a new client call. | |
698 | ||
699 | struct rxrpc_call * | |
700 | rxrpc_kernel_begin_call(struct socket *sock, | |
701 | struct sockaddr_rxrpc *srx, | |
702 | struct key *key, | |
703 | unsigned long user_call_ID, | |
704 | gfp_t gfp); | |
705 | ||
706 | This allocates the infrastructure to make a new RxRPC call and assigns | |
707 | call and connection numbers. The call will be made on the UDP port that | |
708 | the socket is bound to. The call will go to the destination address of a | |
709 | connected client socket unless an alternative is supplied (srx is | |
710 | non-NULL). | |
711 | ||
712 | If a key is supplied then this will be used to secure the call instead of | |
713 | the key bound to the socket with the RXRPC_SECURITY_KEY sockopt. Calls | |
714 | secured in this way will still share connections if at all possible. | |
715 | ||
716 | The user_call_ID is equivalent to that supplied to sendmsg() in the | |
717 | control data buffer. It is entirely feasible to use this to point to a | |
718 | kernel data structure. | |
719 | ||
720 | If this function is successful, an opaque reference to the RxRPC call is | |
721 | returned. The caller now holds a reference on this and it must be | |
722 | properly ended. | |
723 | ||
724 | (*) End a client call. | |
725 | ||
726 | void rxrpc_kernel_end_call(struct rxrpc_call *call); | |
727 | ||
728 | This is used to end a previously begun call. The user_call_ID is expunged | |
729 | from AF_RXRPC's knowledge and will not be seen again in association with | |
730 | the specified call. | |
731 | ||
732 | (*) Send data through a call. | |
733 | ||
734 | int rxrpc_kernel_send_data(struct rxrpc_call *call, struct msghdr *msg, | |
735 | size_t len); | |
736 | ||
737 | This is used to supply either the request part of a client call or the | |
738 | reply part of a server call. msg.msg_iovlen and msg.msg_iov specify the | |
739 | data buffers to be used. msg_iov must not be NULL and must point | |
740 | exclusively to in-kernel virtual addresses. msg.msg_flags may be given | |
741 | MSG_MORE if there will be subsequent data sends for this call. | |
742 | ||
743 | The msg must not specify a destination address, control data or any flags | |
744 | other than MSG_MORE. len is the total amount of data to transmit. | |
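As a rough illustration of the MSG_MORE rule, the following is a minimal sketch that sends a request in two pieces: a header flagged with MSG_MORE and then a final body with no flags. It is written as a userspace mock so that it can be compiled on its own. The struct rxrpc_call contents and the body of rxrpc_kernel_send_data() are stand-ins that merely record what they are given, struct msghdr and struct iovec are the userspace definitions from <sys/socket.h> rather than the kernel ones, and example_send_request() is a hypothetical helper. The sketch also assumes that a negative return value indicates failure, which the text above does not spell out.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Stand-in for the opaque kernel call; it only records what was sent. */
struct rxrpc_call {
	size_t bytes_sent;
	int sends;
	int more_expected;
};

/* Mock with the documented signature; NOT the kernel implementation. */
static int rxrpc_kernel_send_data(struct rxrpc_call *call, struct msghdr *msg,
				  size_t len)
{
	call->bytes_sent += len;
	call->sends++;
	call->more_expected = (msg->msg_flags & MSG_MORE) != 0;
	return 0;
}

/* Hypothetical helper: send a header and then a body on one call. */
static int example_send_request(struct rxrpc_call *call,
				void *hdr, size_t hdr_len,
				void *body, size_t body_len)
{
	struct iovec iov;
	struct msghdr msg;
	int ret;

	/* No destination address and no control data may be given. */
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	/* First piece: more data follows, so flag it with MSG_MORE. */
	iov.iov_base = hdr;
	iov.iov_len = hdr_len;
	msg.msg_flags = MSG_MORE;
	ret = rxrpc_kernel_send_data(call, &msg, hdr_len);
	if (ret < 0)
		return ret;

	/* Last piece: no flags, which marks the end of this phase. */
	iov.iov_base = body;
	iov.iov_len = body_len;
	msg.msg_flags = 0;
	ret = rxrpc_kernel_send_data(call, &msg, body_len);
	return ret < 0 ? ret : 0;
}
```

In real kernel code the buffers would have to be in-kernel virtual addresses, as noted above, and the call would come from rxrpc_kernel_begin_call() or rxrpc_kernel_accept_call().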
745 | ||
746 | (*) Abort a call. | |
747 | ||
748 | void rxrpc_kernel_abort_call(struct rxrpc_call *call, u32 abort_code); | |
749 | ||
750 | This is used to abort a call if it's still in an abortable state. The | |
751 | abort code specified will be placed in the ABORT message sent. | |
752 | ||
753 | (*) Intercept received RxRPC messages. | |
754 | ||
755 | typedef void (*rxrpc_interceptor_t)(struct sock *sk, | |
756 | unsigned long user_call_ID, | |
757 | struct sk_buff *skb); | |
758 | ||
759 | void | |
760 | rxrpc_kernel_intercept_rx_messages(struct socket *sock, | |
761 | rxrpc_interceptor_t interceptor); | |
762 | ||
763 | This installs an interceptor function on the specified AF_RXRPC socket. | |
764 | All messages that would otherwise wind up in the socket's Rx queue are | |
765 | then diverted to this function. Note that care must be taken to process | |
766 | the messages in the right order to maintain DATA message sequentiality. | |
767 | ||
768 | The interceptor function itself is provided with the address of the socket | |
769 | handling the incoming message, the ID assigned by the kernel utility to | |
770 | the call and the socket buffer containing the message. | |
771 | ||
772 | The skb->mark field indicates the type of message: | |
773 | ||
774 | MARK MEANING | |
775 | =============================== ======================================= | |
776 | RXRPC_SKB_MARK_DATA Data message | |
777 | RXRPC_SKB_MARK_FINAL_ACK Final ACK received for an incoming call | |
778 | RXRPC_SKB_MARK_BUSY Client call rejected as server busy | |
779 | RXRPC_SKB_MARK_REMOTE_ABORT Call aborted by peer | |
780 | RXRPC_SKB_MARK_NET_ERROR Network error detected | |
781 | RXRPC_SKB_MARK_LOCAL_ERROR Local error encountered | |
782 | RXRPC_SKB_MARK_NEW_CALL New incoming call awaiting acceptance | |
783 | ||
784 | The remote abort message can be probed with rxrpc_kernel_get_abort_code(). | |
785 | The two error messages can be probed with rxrpc_kernel_get_error_number(). | |
786 | A new call can be accepted with rxrpc_kernel_accept_call(). | |
787 | ||
788 | Data messages can have their contents extracted with the usual bunch of | |
789 | socket buffer manipulation functions. A data message can be determined to | |
790 | be the last one in a sequence with rxrpc_kernel_is_data_last(). When a | |
791 | data message has been used up, rxrpc_kernel_data_delivered() should be | |
792 | called on it. | |
793 | ||
794 | Non-data messages should be handed to rxrpc_kernel_free_skb() for | |
795 | disposal. It is possible to get extra refs on all types of message for later | |
796 | freeing, but this may pin the state of a call until the message is finally | |
797 | freed. | |
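The dispatch on skb->mark described above might be sketched as follows. This is a userspace mock, not kernel code. The struct sk_buff and struct sock definitions, the enum values and the bodies of the rxrpc_kernel_*() helpers are stand-ins invented so that the example compiles on its own; only their names and signatures come from this document. struct example_call and example_interceptor() are hypothetical. The sketch assumes that user_call_ID carries a pointer to the utility's own per-call record, and that it must not be dereferenced for a new incoming call, since no ID has been assigned until the call is accepted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Stand-ins only: NOT the kernel definitions, values are placeholders. */
enum rxrpc_skb_mark {
	RXRPC_SKB_MARK_DATA,
	RXRPC_SKB_MARK_FINAL_ACK,
	RXRPC_SKB_MARK_BUSY,
	RXRPC_SKB_MARK_REMOTE_ABORT,
	RXRPC_SKB_MARK_NET_ERROR,
	RXRPC_SKB_MARK_LOCAL_ERROR,
	RXRPC_SKB_MARK_NEW_CALL,
};

struct sock { int unused; };

struct sk_buff {
	unsigned int mark;
	bool is_last;		/* what the mock is_data_last() reports */
	u32 abort_code;
	int error;
	bool delivered;		/* set by the mock data_delivered() */
	bool freed;		/* set by the mock free_skb() */
};

static bool rxrpc_kernel_is_data_last(struct sk_buff *skb) { return skb->is_last; }
static void rxrpc_kernel_data_delivered(struct sk_buff *skb) { skb->delivered = true; }
static void rxrpc_kernel_free_skb(struct sk_buff *skb) { skb->freed = true; }
static u32 rxrpc_kernel_get_abort_code(struct sk_buff *skb) { return skb->abort_code; }
static int rxrpc_kernel_get_error_number(struct sk_buff *skb) { return skb->error; }

/* Hypothetical per-call record that user_call_ID points to. */
struct example_call {
	bool reply_complete;
	bool busy;
	u32 abort_code;
	int error;
};

static void example_interceptor(struct sock *sk, unsigned long user_call_ID,
				struct sk_buff *skb)
{
	struct example_call *call = (struct example_call *)user_call_ID;

	(void)sk;

	switch (skb->mark) {
	case RXRPC_SKB_MARK_DATA:
		/* Real code would extract the payload before this point. */
		if (rxrpc_kernel_is_data_last(skb))
			call->reply_complete = true;
		/* Data messages are disposed of by recording delivery. */
		rxrpc_kernel_data_delivered(skb);
		return;
	case RXRPC_SKB_MARK_BUSY:
		call->busy = true;
		break;
	case RXRPC_SKB_MARK_REMOTE_ABORT:
		call->abort_code = rxrpc_kernel_get_abort_code(skb);
		break;
	case RXRPC_SKB_MARK_NET_ERROR:
	case RXRPC_SKB_MARK_LOCAL_ERROR:
		call->error = rxrpc_kernel_get_error_number(skb);
		break;
	case RXRPC_SKB_MARK_NEW_CALL:
		/* No ID assigned yet; real code would arrange an accept. */
		break;
	default:
		break;
	}

	/* All non-data messages are freed explicitly. */
	rxrpc_kernel_free_skb(skb);
}
```

Note that rxrpc_kernel_is_data_last() is called before rxrpc_kernel_data_delivered(), since the latter frees the socket buffer.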
798 | ||
799 | (*) Accept an incoming call. | |
800 | ||
801 | struct rxrpc_call * | |
802 | rxrpc_kernel_accept_call(struct socket *sock, | |
803 | unsigned long user_call_ID); | |
804 | ||
805 | This is used to accept an incoming call and to assign it a call ID. This | |
806 | function is similar to rxrpc_kernel_begin_call() and calls accepted must | |
807 | be ended in the same way. | |
808 | ||
809 | If this function is successful, an opaque reference to the RxRPC call is | |
810 | returned. The caller now holds a reference on this and it must be | |
811 | properly ended. | |
812 | ||
813 | (*) Reject an incoming call. | |
814 | ||
815 | int rxrpc_kernel_reject_call(struct socket *sock); | |
816 | ||
817 | This is used to reject the first incoming call on the socket's queue with | |
818 | a BUSY message. -ENODATA is returned if there were no incoming calls. | |
819 | Other errors may be returned if the call had been aborted (-ECONNABORTED) | |
820 | or had timed out (-ETIME). | |
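The return values above suggest a loop along the following lines for a server that wants to turn away whatever is currently queued. This is only a sketch against a mock. The struct socket here is a stand-in holding a scripted list of results, the body of rxrpc_kernel_reject_call() is invented so that the example compiles in userspace, and example_reject_pending() is a hypothetical helper. It assumes that 0 is returned on a successful rejection, which the text does not state outright. The loop is bounded by max so that it does not rely on any assumption about how the queue behaves after an error.

```c
#include <errno.h>

/* Stand-in socket: replays a scripted sequence of return values. */
struct socket {
	const int *results;
	int next;
};

/* Mock with the documented signature; NOT the kernel implementation. */
static int rxrpc_kernel_reject_call(struct socket *sock)
{
	return sock->results[sock->next++];
}

/*
 * Hypothetical helper: make at most max rejection attempts.  Returns the
 * number of BUSY rejections actually issued, or a negative error code if
 * something unexpected happened.
 */
static int example_reject_pending(struct socket *sock, int max)
{
	int rejected = 0;
	int i;

	for (i = 0; i < max; i++) {
		int ret = rxrpc_kernel_reject_call(sock);

		switch (ret) {
		case 0:
			rejected++;
			break;
		case -ENODATA:
			/* Nothing left on the queue. */
			return rejected;
		case -ECONNABORTED:
		case -ETIME:
			/* The call had already gone away; nothing to reject. */
			break;
		default:
			return ret;
		}
	}
	return rejected;
}
```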
821 | ||
822 | (*) Record the delivery of a data message and free it. | |
823 | ||
824 | void rxrpc_kernel_data_delivered(struct sk_buff *skb); | |
825 | ||
826 | This is used to record a data message as having been delivered and to | |
827 | update the ACK state for the call. The socket buffer will be freed. | |
828 | ||
829 | (*) Free a message. | |
830 | ||
831 | void rxrpc_kernel_free_skb(struct sk_buff *skb); | |
832 | ||
833 | This is used to free a non-DATA socket buffer intercepted from an AF_RXRPC | |
834 | socket. | |
835 | ||
836 | (*) Determine if a data message is the last one on a call. | |
837 | ||
838 | bool rxrpc_kernel_is_data_last(struct sk_buff *skb); | |
839 | ||
840 | This is used to determine if a socket buffer holds the last data message | |
841 | to be received for a call (true will be returned if it does, false | |
842 | if not). | |
843 | ||
844 | The data message will be part of the reply on a client call and the | |
845 | request on an incoming call. In the latter case there will be more | |
846 | messages, but in the former case there will not. | |
847 | ||
848 | (*) Get the abort code from an abort message. | |
849 | ||
850 | u32 rxrpc_kernel_get_abort_code(struct sk_buff *skb); | |
851 | ||
852 | This is used to extract the abort code from a remote abort message. | |
853 | ||
854 | (*) Get the error number from a local or network error message. | |
855 | ||
856 | int rxrpc_kernel_get_error_number(struct sk_buff *skb); | |
857 | ||
858 | This is used to extract the error number from a message indicating that | |
859 | either a local error or a network error occurred. | |
76181c13 DH |
860 | |
861 | (*) Allocate a null key for doing anonymous security. | |
862 | ||
863 | struct key *rxrpc_get_null_key(const char *keyname); | |
864 | ||
865 | This is used to allocate a null RxRPC key that can be used to indicate | |
866 | anonymous security for a particular domain. |