Commit | Line | Data |
---|---|---|
7576b2b9 | 1 | ======================================= |
4047f8b1 | 2 | The padata parallel execution mechanism |
7576b2b9 MCC |
3 | ======================================= |
4 | ||
5 | :Last updated: for 2.6.36 | |
4047f8b1 JC |
6 | |
7 | Padata is a mechanism by which the kernel can farm work out to be done in | |
8 | parallel on multiple CPUs while retaining the ordering of tasks. It was | |
9 | developed for use with the IPsec code, which needs to be able to perform | |
10 | encryption and decryption on large numbers of packets without reordering | |
11 | those packets. The crypto developers made a point of writing padata in a | |
12 | sufficiently general fashion that it could be put to other uses as well. | |
13 | ||
14 | The first step in using padata is to set up a padata_instance structure for | |
7576b2b9 | 15 | overall control of how tasks are to be run:: |
4047f8b1 JC |
16 | |
17 | #include <linux/padata.h> | |
18 | ||
b128a304 | 19 | struct padata_instance *padata_alloc(const char *name, |
313910d3 SK |
20 | const struct cpumask *pcpumask, |
21 | const struct cpumask *cbcpumask); | |
4047f8b1 | 22 | |
b128a304 DJ |
23 | 'name' simply identifies the instance. |
24 | ||
313910d3 SK |
25 | The pcpumask describes which processors will be used to execute work |
26 | submitted to this instance in parallel. The cbcpumask defines which | |
2b24706a | 27 | processors are allowed to be used as the serialization callback processor. |
313910d3 SK |
28 | The workqueue wq is where the work will actually be done; it should be |
29 | a multithreaded queue, naturally. | |
30 | ||
31 | To allocate a padata instance with the cpu_possible_mask for both | |
7576b2b9 | 32 | cpumasks, this helper function can be used:: |
313910d3 SK |
33 | |
34 | struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq); | |
35 | ||
36 | Note: Padata maintains two kinds of cpumasks internally: the user-supplied | |
37 | cpumasks, submitted by padata_alloc/padata_alloc_possible, and the 'usable' | |
2b24706a RD |
38 | cpumasks. The usable cpumasks are always a subset of active CPUs in the |
39 | user supplied cpumasks; these are the cpumasks padata actually uses. So | |
40 | it is legal to supply a cpumask to padata that contains offline CPUs. | |
41 | Once an offline CPU in the user supplied cpumask comes online, padata | |
313910d3 | 42 | is going to use it. |
4047f8b1 | 43 | |
7576b2b9 | 44 | There are functions for enabling and disabling the instance:: |
4047f8b1 | 45 | |
2197f9a1 | 46 | int padata_start(struct padata_instance *pinst); |
4047f8b1 JC |
47 | void padata_stop(struct padata_instance *pinst); |
48 | ||
2197f9a1 SK |
49 | These functions set or clear the "PADATA_INIT" flag; |
50 | if that flag is not set, other functions will refuse to work. | |
51 | padata_start returns zero on success (flag set) or -EINVAL if the | |
2b24706a | 52 | padata cpumask contains no active CPU (flag not set). |
2197f9a1 SK |
53 | padata_stop clears the flag and blocks until the padata instance |
54 | is unused. | |
4047f8b1 | 55 | |
7576b2b9 | 56 | The list of CPUs to be used can be adjusted with these functions:: |
4047f8b1 | 57 | |
313910d3 SK |
58 | int padata_set_cpumasks(struct padata_instance *pinst, |
59 | cpumask_var_t pcpumask, | |
60 | cpumask_var_t cbcpumask); | |
61 | int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type, | |
4047f8b1 | 62 | cpumask_var_t cpumask); |
313910d3 SK |
63 | int padata_add_cpu(struct padata_instance *pinst, int cpu, int mask); |
64 | int padata_remove_cpu(struct padata_instance *pinst, int cpu, int mask); | |
65 | ||
66 | Changing the CPU masks is an expensive operation, though, so it should not be | |
67 | done with great frequency. | |
68 | ||
69 | It's possible to change both cpumasks of a padata instance with | |
70 | padata_set_cpumasks by specifying the cpumasks for parallel execution (pcpumask) | |
2b24706a | 71 | and for the serial callback function (cbcpumask). padata_set_cpumask is used to |
313910d3 SK |
72 | change just one of the cpumasks. Here cpumask_type is either PADATA_CPU_SERIAL |
73 | or PADATA_CPU_PARALLEL, and cpumask specifies the new cpumask to use. | |
2b24706a RD |
74 | To simply add or remove one CPU from a certain cpumask the functions |
75 | padata_add_cpu/padata_remove_cpu are used. cpu specifies the CPU to add or | |
313910d3 SK |
76 | remove, and mask is either PADATA_CPU_SERIAL or PADATA_CPU_PARALLEL. |
77 | ||
4047f8b1 | 78 | Actually submitting work to the padata instance requires the creation of a |
7576b2b9 | 79 | padata_priv structure:: |
4047f8b1 JC |
80 | |
81 | struct padata_priv { | |
82 | /* Other stuff here... */ | |
83 | void (*parallel)(struct padata_priv *padata); | |
84 | void (*serial)(struct padata_priv *padata); | |
85 | }; | |
86 | ||
87 | This structure will almost certainly be embedded within some larger | |
2b24706a | 88 | structure specific to the work to be done. Most of its fields are private to |
313910d3 | 89 | padata, but the structure should be zeroed at initialisation time, and the |
4047f8b1 JC |
90 | parallel() and serial() functions should be provided. Those functions will |
91 | be called in the process of getting the work done as we will see | |
92 | momentarily. | |
93 | ||
7576b2b9 | 94 | The submission of work is done with:: |
4047f8b1 JC |
95 | |
96 | int padata_do_parallel(struct padata_instance *pinst, | |
97 | struct padata_priv *padata, int cb_cpu); | |
98 | ||
99 | The pinst and padata structures must be set up as described above; cb_cpu | |
100 | specifies which CPU will be used for the final callback when the work is | |
101 | done; it must be in the current instance's CPU mask. The return value from | |
2197f9a1 SK |
102 | padata_do_parallel() is zero on success, indicating that the work is in |
103 | progress. -EBUSY means that somebody, somewhere else is messing with the | |
104 | instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being | |
105 | in that CPU mask or about the instance not running. | |
4047f8b1 JC |
106 | |
107 | Each task submitted to padata_do_parallel() will, in turn, be passed to | |
108 | exactly one call to the above-mentioned parallel() function, on one CPU, so | |
b128a304 | 109 | true parallelism is achieved by submitting multiple tasks. parallel() runs with |
4047f8b1 JC |
110 | software interrupts disabled and thus cannot sleep. The parallel() |
111 | function gets the padata_priv structure pointer as its lone parameter; | |
112 | information about the actual work to be done is probably obtained by using | |
113 | container_of() to find the enclosing structure. | |
114 | ||
115 | Note that parallel() has no return value; the padata subsystem assumes that | |
116 | parallel() will take responsibility for the task from this point. The work | |
117 | need not be completed during this call, but, if parallel() leaves work | |
118 | outstanding, it should be prepared to be called again with a new job before | |
119 | the previous one completes. When a task does complete, parallel() (or | |
120 | whatever function actually finishes the job) should inform padata of the | |
7576b2b9 | 121 | fact with a call to:: |
4047f8b1 JC |
122 | |
123 | void padata_do_serial(struct padata_priv *padata); | |
124 | ||
125 | At some point in the future, padata_do_serial() will trigger a call to the | |
126 | serial() function in the padata_priv structure. That call will happen on | |
127 | the CPU requested in the initial call to padata_do_parallel(); it, too, is | |
b128a304 | 128 | run with local software interrupts disabled. |
4047f8b1 JC |
129 | Note that this call may be deferred for a while since the padata code takes |
130 | pains to ensure that tasks are completed in the order in which they were | |
131 | submitted. | |
132 | ||
133 | The one remaining function in the padata API should be called to clean up | |
7576b2b9 | 134 | when a padata instance is no longer needed:: |
4047f8b1 JC |
135 | |
136 | void padata_free(struct padata_instance *pinst); | |
137 | ||
138 | This function will busy-wait while any remaining tasks are completed, so it | |
b128a304 | 139 | might be best not to call it while there is work outstanding. |
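
The embedding pattern described above (a padata_priv placed inside a larger, work-specific structure and recovered with container_of()) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not kernel code: `mock_padata_priv`, `mock_container_of`, `sum_job` and the `sum_job_*` helpers are hypothetical names invented for this sketch, and the two callbacks are simply run inline on the calling thread rather than being scheduled across CPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the two callback fields of struct padata_priv. */
struct mock_padata_priv {
	void (*parallel)(struct mock_padata_priv *padata);
	void (*serial)(struct mock_padata_priv *padata);
};

/* Same idea as the kernel's container_of(), written with offsetof(). */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical job: sum a small buffer, then publish the result. */
struct sum_job {
	const int *data;
	size_t len;
	long sum;                       /* filled in by the parallel step */
	int published;                  /* set by the serial step */
	struct mock_padata_priv padata; /* embedded, as the text suggests */
};

static void sum_job_parallel(struct mock_padata_priv *padata)
{
	/* Recover the enclosing job from the embedded member. */
	struct sum_job *job = mock_container_of(padata, struct sum_job, padata);
	size_t i;

	job->sum = 0;
	for (i = 0; i < job->len; i++)
		job->sum += job->data[i];
	/* In the kernel, padata_do_serial() would be called at this point. */
}

static void sum_job_serial(struct mock_padata_priv *padata)
{
	struct sum_job *job = mock_container_of(padata, struct sum_job, padata);

	job->published = 1;
}

static void sum_job_init(struct sum_job *job, const int *data, size_t len)
{
	assert(job != NULL && data != NULL);
	memset(job, 0, sizeof(*job));   /* zero the structure before use */
	job->data = data;
	job->len = len;
	job->padata.parallel = sum_job_parallel;
	job->padata.serial = sum_job_serial;
}

/* Stand-in for submission: runs both callbacks inline, in order. */
static long sum_job_run_inline(struct sum_job *job)
{
	job->padata.parallel(&job->padata);
	job->padata.serial(&job->padata);
	return job->sum;
}
```

In real use the inline runner would be replaced by a call to padata_do_parallel(), and the serial step would only run after padata_do_serial() has been called for the job and for every job submitted before it.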