The Linux Journalling API
=========================

Overview
--------

Details
~~~~~~~

The journalling layer is easy to use. You first need to create a
journal_t data structure. There are two calls to do this, depending on
how you decide to allocate the physical media on which the journal
resides. The :c:func:`jbd2_journal_init_inode` call is for journals stored in
filesystem inodes, while the :c:func:`jbd2_journal_init_dev` call is for
journals stored on a raw device (in a contiguous range of blocks). A
journal_t is a typedef for a struct, and both calls return a pointer to
one, so when you are finally finished make sure you call
:c:func:`jbd2_journal_destroy` on it to free up any used kernel memory.

Once you have got your journal_t object you need to 'mount' or load the
journal file. The journalling layer expects that the space for the journal
has already been allocated and initialized properly by the userspace tools.
When loading the journal you must call :c:func:`jbd2_journal_load` to process
the journal contents. If the client file system detects that the journal
contents do not need to be processed (or need not even be valid), it
may call :c:func:`jbd2_journal_wipe` to clear the journal contents before
calling :c:func:`jbd2_journal_load`.

Note that jbd2_journal_wipe(..,0) calls
:c:func:`jbd2_journal_skip_recovery` for you if it detects any outstanding
transactions in the journal, and similarly :c:func:`jbd2_journal_load` will
call :c:func:`jbd2_journal_recover` if necessary. I would advise reading
:c:func:`ext4_load_journal` in fs/ext4/super.c for examples of this stage.

Now you can go ahead and start modifying the underlying filesystem.
Almost.

You still need to actually journal your filesystem changes; this is done
by wrapping them into transactions. You also need to wrap the
modification of each buffer with calls to the journal layer, so that it
knows which modifications you are actually making. To do
this use :c:func:`jbd2_journal_start`, which returns a transaction handle.

:c:func:`jbd2_journal_start` and its counterpart :c:func:`jbd2_journal_stop`,
which indicates the end of a transaction, are nestable calls, so you can
reenter a transaction if necessary, but remember you must call
:c:func:`jbd2_journal_stop` the same number of times as
:c:func:`jbd2_journal_start` before the transaction is completed (or more
accurately leaves the update phase). Ext4/VFS makes use of this feature to
simplify handling of inode dirtying, quota support, etc.

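The nesting rule can be pictured with a small userspace model. This is only
a sketch under stated assumptions: ``toy_task``, ``toy_handle``,
``toy_start`` and ``toy_stop`` are hypothetical names invented for
illustration and are not part of JBD2. The model keeps a per-handle nesting
count, and only the outermost stop ends the update.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT the kernel's handle_t. */
struct toy_handle {
    int ref;                            /* current nesting depth */
};

struct toy_task {
    struct toy_handle *current_handle;  /* at most one open handle per task */
    struct toy_handle storage;          /* backing store for that handle */
    int updates_finished;               /* number of outermost stops seen */
};

/* Start a transaction, or re-enter the one this task already has open. */
static struct toy_handle *toy_start(struct toy_task *t)
{
    if (t->current_handle) {
        t->current_handle->ref++;       /* nested start: just count it */
        return t->current_handle;
    }
    t->storage.ref = 1;
    t->current_handle = &t->storage;
    return t->current_handle;
}

/* Returns 1 if this stop was the outermost one and ended the update. */
static int toy_stop(struct toy_task *t, struct toy_handle *h)
{
    assert(h == t->current_handle && h->ref > 0);
    if (--h->ref > 0)
        return 0;                       /* still inside an outer start */
    t->current_handle = NULL;
    t->updates_finished++;
    return 1;
}
```

In this model a second ``toy_start`` on the same task returns the same
handle, and ``updates_finished`` only increases once the number of stops
matches the number of starts.
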
Inside each transaction you need to wrap the modifications to the
individual buffers (blocks). Before you start to modify a buffer you
need to call :c:func:`jbd2_journal_get_create_access()` /
:c:func:`jbd2_journal_get_write_access()` /
:c:func:`jbd2_journal_get_undo_access()` as appropriate; this allows the
journalling layer to copy the unmodified
data if it needs to. After all, the buffer may be part of a previously
uncommitted transaction. At this point you are at last ready to modify a
buffer, and once you have done so you need to call
:c:func:`jbd2_journal_dirty_metadata`. Or if you've asked for access to a
buffer you now know is no longer required to be pushed back on the
device you can call :c:func:`jbd2_journal_forget` in much the same way as you
might have used :c:func:`bforget` in the past.

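Why access must be requested before the buffer is touched can be shown with
a toy userspace model. It is a simplified sketch, not the JBD2
implementation: ``toy_buffer``, ``toy_get_write_access`` and
``toy_dirty_metadata`` are hypothetical names, and the single
``in_committing_txn`` flag stands in for the layer's real bookkeeping.

```c
#include <assert.h>
#include <string.h>

#define TOY_BLOCK_SIZE 16

/* Toy buffer for illustration only; NOT a struct buffer_head. */
struct toy_buffer {
    char data[TOY_BLOCK_SIZE];    /* live contents the caller modifies */
    char frozen[TOY_BLOCK_SIZE];  /* copy preserved for the older transaction */
    int in_committing_txn;        /* buffer belongs to a not yet committed txn */
    int has_frozen;               /* frozen[] holds a valid copy */
    int dirty;                    /* caller has declared its modification */
};

/* Must run BEFORE the caller modifies data[], so the old bytes survive. */
static void toy_get_write_access(struct toy_buffer *b)
{
    if (b->in_committing_txn && !b->has_frozen) {
        memcpy(b->frozen, b->data, TOY_BLOCK_SIZE);
        b->has_frozen = 1;
    }
}

/* Runs AFTER the modification to tell the journal the buffer changed. */
static void toy_dirty_metadata(struct toy_buffer *b)
{
    b->dirty = 1;
}
```

If the caller wrote to ``data`` first and asked for access afterwards, the
copy in ``frozen`` would already contain the new bytes, which is the
ordering mistake the access calls exist to prevent.
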
A :c:func:`jbd2_journal_flush` may be called at any time to commit and
checkpoint all your transactions.

Then at umount time, in your :c:func:`put_super`, you can call
:c:func:`jbd2_journal_destroy` to clean up your in-core journal object.

Unfortunately there are a couple of ways the journal layer can cause a
deadlock. The first thing to note is that each task can only have a
single outstanding transaction at any one time; remember that nothing commits
until the outermost :c:func:`jbd2_journal_stop`. This means you must complete
the transaction at the end of each file/inode/address etc. operation you
perform, so that the journalling system isn't re-entered on another
journal. This matters because transactions can't be nested/batched across
differing journals, and a filesystem other than yours (say ext4) may be
modified in a later syscall.

The second case to bear in mind is that :c:func:`jbd2_journal_start` can block
if there isn't enough space in the journal for your transaction (based
on the passed nblocks param). When it blocks it merely(!) needs to wait
for transactions to complete and be committed from other tasks, so
essentially we are waiting for :c:func:`jbd2_journal_stop`. So to avoid
deadlocks you must treat :c:func:`jbd2_journal_start` /
:c:func:`jbd2_journal_stop` as if they were semaphores and include them in
your semaphore ordering rules. Note that :c:func:`jbd2_journal_extend` has
similar blocking behaviour to :c:func:`jbd2_journal_start`, so you can
deadlock here just as easily as on :c:func:`jbd2_journal_start`.

Try to reserve the right number of blocks the first time. ;-) This will
be the maximum number of blocks you are going to touch in this
transaction. I advise having a look at ext4_jbd.h, at least, to see the
basis on which ext4 makes these decisions.

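The effect of the block reservation can be sketched with a toy credit
counter. All names here (``toy_credit_journal``, ``toy_credit_start`` and so
on) are hypothetical, and the model makes two simplifying assumptions: a
start that would have to sleep simply returns 0, and unused credits are
handed back when the handle is stopped.

```c
#include <assert.h>

/* Toy journal for illustration only: tracks unreserved journal blocks. */
struct toy_credit_journal {
    int free_blocks;
};

struct toy_credit_handle {
    int credits;    /* blocks this handle may still touch */
};

/* Returns 1 on success, 0 where a real start would have to wait. */
static int toy_credit_start(struct toy_credit_journal *j,
                            struct toy_credit_handle *h, int nblocks)
{
    if (nblocks > j->free_blocks)
        return 0;                   /* not enough space: caller would block */
    j->free_blocks -= nblocks;
    h->credits = nblocks;
    return 1;
}

/* Consume one credit for a touched block; 0 means the estimate was too low. */
static int toy_credit_use(struct toy_credit_handle *h)
{
    if (h->credits == 0)
        return 0;
    h->credits--;
    return 1;
}

/* Give credits that were reserved but never used back to the journal. */
static void toy_credit_stop(struct toy_credit_journal *j,
                            struct toy_credit_handle *h)
{
    j->free_blocks += h->credits;
    h->credits = 0;
}
```

In this model an over-generous reservation makes other tasks wait for space
that is never used, while an under-estimate leaves the handle without
credits in the middle of an operation, which is why the estimate is worth
getting right the first time.
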
Another wriggle to watch out for is your on-disk block allocation
strategy. Why? Because, if you do a delete, you need to ensure you
haven't reused any of the freed blocks until the transaction freeing
these blocks commits. If you reused these blocks and a crash happens,
there is no way to restore the contents of the reallocated blocks as of the
end of the last fully committed transaction. One simple way of doing
this is to mark blocks as free in internal in-memory block allocation
structures only after the transaction freeing them commits. Ext4 uses
a journal commit callback for this purpose.

With journal commit callbacks you can ask the journalling layer to call
a callback function when the transaction is finally committed to disk,
so that you can do some of your own management. You ask the journalling
layer to call the callback by simply setting the
``journal->j_commit_callback`` function pointer, and that function is
called after each transaction commit. You can also use
``transaction->t_private_list`` for attaching entries to a transaction
that need processing when the transaction commits.

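Deferring block reuse until commit can be sketched in userspace as follows.
This is a toy model rather than ext4's allocator: ``toy_fs``, ``toy_alloc``,
``toy_free_in_txn`` and ``toy_commit_callback`` are hypothetical names, and
the ``pending`` array plays the role that a list of entries attached to the
transaction would play in a real filesystem.

```c
#include <assert.h>

#define TOY_NBLOCKS 8

/* Toy in-memory allocation state for illustration only. */
struct toy_fs {
    unsigned char in_use[TOY_NBLOCKS];  /* in-memory allocation map */
    int pending[TOY_NBLOCKS];           /* freed by the running transaction */
    int npending;
};

/* Allocate the lowest block that is free in memory, or -1 if none. */
static int toy_alloc(struct toy_fs *fs)
{
    for (int i = 0; i < TOY_NBLOCKS; i++) {
        if (!fs->in_use[i]) {
            fs->in_use[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Record the free, but keep the block marked in use until commit. */
static void toy_free_in_txn(struct toy_fs *fs, int blk)
{
    assert(blk >= 0 && blk < TOY_NBLOCKS && fs->in_use[blk]);
    assert(fs->npending < TOY_NBLOCKS);
    fs->pending[fs->npending++] = blk;
}

/* Called once the freeing transaction has committed: now reuse is safe. */
static void toy_commit_callback(struct toy_fs *fs)
{
    for (int i = 0; i < fs->npending; i++)
        fs->in_use[fs->pending[i]] = 0;
    fs->npending = 0;
}
```

Between ``toy_free_in_txn`` and ``toy_commit_callback`` the freed block
cannot be handed out again, so a crash in that window never finds it
overwritten by a later allocation.
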
JBD2 also provides a way to block all transaction updates via
:c:func:`jbd2_journal_lock_updates()` /
:c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a
window with a clean and stable fs for a moment. E.g.

::


        jbd2_journal_lock_updates() // stop new stuff happening..
        jbd2_journal_flush()        // checkpoint everything.
        ..do stuff on stable fs
        jbd2_journal_unlock_updates() // carry on with filesystem use.

The opportunities for abuse and DoS attacks with this should be obvious
if you allow unprivileged userspace to trigger codepaths containing
these calls.

Summary
~~~~~~~

Using the journal is a matter of wrapping the different context changes
(each mount, each modification (transaction) and each changed
buffer) to tell the journalling layer about them.

Data Types
----------

The journalling layer uses typedefs to 'hide' the concrete definitions
of the structures used. As a client of the JBD2 layer you can just rely
on using the pointer as a magic cookie of some sort. Obviously the
hiding is not enforced as this is 'C'.

Structures
~~~~~~~~~~

.. kernel-doc:: include/linux/jbd2.h
   :internal:

Functions
---------

The functions here are split into two groups: those that affect a journal
as a whole, and those which are used to manage transactions.

Journal Level
~~~~~~~~~~~~~

.. kernel-doc:: fs/jbd2/journal.c
   :export:

.. kernel-doc:: fs/jbd2/recovery.c
   :internal:

Transaction Level
~~~~~~~~~~~~~~~~~

.. kernel-doc:: fs/jbd2/transaction.c

See also
--------

`Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen
Tweedie <http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz>`__

`Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen
Tweedie <http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html>`__