```c
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or https://opensource.org/licenses/CDDL-1.0.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright (c) 2018, 2019 by Delphix. All rights reserved.
 */

#include <sys/dmu_objset.h>
#include <sys/metaslab.h>
#include <sys/metaslab_impl.h>
#include <sys/spa.h>
#include <sys/spa_impl.h>
#include <sys/spa_log_spacemap.h>
#include <sys/vdev_impl.h>
#include <sys/zap.h>

/*
 * Log Space Maps
 *
 * Log space maps are an optimization in ZFS metadata allocations for pools
 * whose workloads are primarily random-writes. Random-write workloads are also
 * typically random-free, meaning that they are freeing from locations scattered
 * throughout the pool. This means that each TXG we will have to append some
 * FREE records to almost every metaslab. With log space maps, we hold their
 * changes in memory and log them all together in one pool-wide space map
 * on-disk for persistence. As more blocks are accumulated in the log space
 * maps and more unflushed changes are accounted in memory, we flush a selected
 * group of metaslabs every TXG to relieve memory pressure and potential
 * overheads when loading the pool. Flushing a metaslab to disk relieves memory
 * as we flush any unflushed changes from memory to disk (i.e. the metaslab's
 * space map) and saves import time by making old log space maps obsolete and
 * eventually destroying them. [A log space map is said to be obsolete when all
 * its entries have made it to their corresponding metaslab space maps].
 *
 * == On disk data structures used ==
 *
 * - The pool has a new feature flag and a new entry in the MOS. The feature
 *   is activated when we create the first log space map and remains active
 *   for the lifetime of the pool. The new entry in the MOS Directory [refer
 *   to DMU_POOL_LOG_SPACEMAP_ZAP] is populated with a ZAP whose key-value
 *   pairs are of the form <key: txg, value: log space map object for that txg>.
 *   This entry is our on-disk reference of the log space maps that exist in
 *   the pool for each TXG and it is used during import to load all the
 *   metaslab unflushed changes in memory. To see how this structure is first
 *   created and later populated refer to spa_generate_syncing_log_sm(). To see
 *   how it is used during import time refer to spa_ld_log_sm_metadata().
 *
 * - Each vdev has a new entry in its vdev_top_zap (see field
 *   VDEV_TOP_ZAP_MS_UNFLUSHED_PHYS_TXGS) which holds the msp_unflushed_txg of
 *   each metaslab in this vdev. This field is the on-disk counterpart of the
 *   in-memory field ms_unflushed_txg which tells us from which TXG onwards
 *   the metaslab hasn't had its changes flushed. During import, we use this
 *   to ignore any entries in the space map log that are for this metaslab but
 *   from a TXG before msp_unflushed_txg. At that point, we also populate its
 *   in-memory counterpart and from there both fields are updated every time
 *   we flush that metaslab.
 *
 * - A space map is created every TXG and, during that TXG, it is used to log
 *   all incoming changes (the log space map). When created, the log space map
 *   is referenced in memory by spa_syncing_log_sm and its object ID is inserted
 *   to the space map ZAP mentioned above. The log space map is closed at the
 *   end of the TXG and will be destroyed when it becomes fully obsolete. We
 *   know when a log space map has become obsolete by looking at the oldest
 *   (and smallest) ms_unflushed_txg in the pool. If the value of that is bigger
 *   than the log space map's TXG, then it means that there is no metaslab that
 *   doesn't have the changes from that log and we can therefore destroy it.
 *   [see spa_cleanup_old_sm_logs()].
 *
 * == Important in-memory structures ==
 *
 * - The per-spa field spa_metaslabs_by_flushed sorts all the metaslabs in
 *   the pool by their ms_unflushed_txg field. It is primarily used for three
 *   reasons. First of all, it is used during flushing where we try to flush
 *   metaslabs in-order from the oldest-flushed to the most recently flushed
 *   every TXG. Secondly, it helps us to look up the ms_unflushed_txg of the
 *   oldest flushed metaslab to distinguish which log space maps have become
 *   obsolete and which ones are still relevant. Finally it tells us which
 *   metaslabs have unflushed changes in a pool where this feature was just
 *   enabled, as we don't immediately add all of the pool's metaslabs but we
 *   add them over time as they go through metaslab_sync(). The reason that
 *   we do that is to ease these pools into the behavior of the flushing
 *   algorithm (described later on).
 *
 * - The per-spa field spa_sm_logs_by_txg can be thought of as the in-memory
 *   counterpart of the space map ZAP mentioned above. It's an AVL tree whose
 *   nodes represent the log space maps in the pool. This in-memory
 *   representation of log space maps in the pool sorts the log space maps by
 *   the TXG in which they were created (which is also the TXG of their
 *   unflushed changes). It also contains the following extra information for
 *   each space map:
 *   [1] The number of metaslabs that were last flushed on that TXG. This is
 *       important because if that counter is zero and this is the oldest
 *       log then it means that it is also obsolete.
 *   [2] The number of blocks of that space map. This field is used by the
 *       block heuristic of our flushing algorithm (described later on).
 *       It represents how many blocks of metadata changes ZFS had to write
 *       to disk for that TXG.
 *
 * - The per-spa field spa_log_summary is a list of entries that summarizes
 *   the metaslab and block counts of all the nodes of the spa_sm_logs_by_txg
 *   AVL tree mentioned above. The reason this exists is that our flushing
 *   algorithm (described later) tries to estimate how many metaslabs to flush
 *   in each TXG by iterating over all the log space maps and looking at their
 *   block counts. Summarizing that information means that we don't have to
 *   iterate through each space map, minimizing the runtime overhead of the
 *   flushing algorithm which would be induced in syncing context. In terms of
 *   implementation the log summary is used as a queue:
 *   * we modify or pop entries from its head when we flush metaslabs
 *   * we modify or append entries to its tail when we sync changes.
 *
 * - Each metaslab has two new range trees that hold its unflushed changes,
 *   ms_unflushed_allocs and ms_unflushed_frees. These are always disjoint.
 *
 * == Flushing algorithm ==
 *
 * The decision of how many metaslabs to flush on a given TXG is guided by
 * two heuristics:
 *
 * [1] The memory heuristic -
 * We keep track of the memory used by the unflushed trees from all the
 * metaslabs [see sus_memused of spa_unflushed_stats] and we ensure that it
 * stays below a certain threshold which is determined by an arbitrary hard
 * limit and an arbitrary percentage of the system's memory [see
 * spa_log_exceeds_memlimit()]. When we see that the memory usage of the
 * unflushed changes is passing that threshold, we flush metaslabs, which
 * empties their unflushed range trees, reducing the memory used.
 *
 * [2] The block heuristic -
 * We try to keep the total number of blocks in the log space maps in check
 * so the log doesn't grow indefinitely and we don't induce a lot of overhead
 * when loading the pool. At the same time we don't want to flush a lot of
 * metaslabs too often as this would defeat the purpose of the log space map.
 * As a result we set a limit on the number of blocks that we think is
 * acceptable for the log space maps to have and try not to cross it.
 * [see sus_blocklimit from spa_unflushed_stats].
 *
 * In order to stay below the block limit every TXG we have to estimate how
 * many metaslabs we need to flush based on the current rate of incoming blocks
 * and our history of log space map blocks. The main idea here is to answer
 * the question of how many metaslabs we need to flush in order to get rid
 * of at least X log space map blocks. We can answer this question
 * by iterating from the oldest log space map to the newest one
 * and looking at their metaslab and block counts. At this point the log summary
 * mentioned above comes in handy as it reduces the number of things that we
 * have to iterate (even though it may reduce the precision of our estimates
 * due to its aggregation of data). So with that in mind, we project the
 * incoming rate of the current TXG into the future and attempt to approximate
 * how many metaslabs we would need to flush from now on in order to avoid
 * exceeding our block limit at different points in the future (granted that we
 * would keep flushing the same number of metaslabs for every TXG). Then we take
 * the maximum number from all these estimates to be on the safe side. For the
 * exact implementation details of the algorithm refer to
 * spa_estimate_metaslabs_to_flush().
 */
```
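The obsolescence rule described above (a log space map is obsolete once the oldest `ms_unflushed_txg` in the pool is greater than the log's TXG) can be illustrated with a minimal standalone sketch. It is not ZFS code: the hypothetical helpers `min_unflushed_txg()` and `count_obsolete_logs()` replace the AVL trees `spa_sm_logs_by_txg` and `spa_metaslabs_by_flushed` with plain arrays and only count how many of the oldest logs could be destroyed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest ms_unflushed_txg across all metaslabs (the "oldest flushed"). */
static uint64_t
min_unflushed_txg(const uint64_t *ms_unflushed_txg, size_t nms)
{
	assert(nms > 0);
	uint64_t min = ms_unflushed_txg[0];
	for (size_t i = 1; i < nms; i++) {
		if (ms_unflushed_txg[i] < min)
			min = ms_unflushed_txg[i];
	}
	return (min);
}

/*
 * Given log TXGs sorted oldest first, return how many leading logs are
 * obsolete, i.e. have a TXG strictly smaller than the oldest
 * ms_unflushed_txg in the pool (hypothetical helper, not a ZFS API).
 */
static size_t
count_obsolete_logs(const uint64_t *log_txgs, size_t nlogs,
    const uint64_t *ms_unflushed_txg, size_t nms)
{
	uint64_t oldest = min_unflushed_txg(ms_unflushed_txg, nms);
	size_t n = 0;
	while (n < nlogs && log_txgs[n] < oldest)
		n++;
	return (n);
}
```

For example, take logs for TXGs 10 to 13 and metaslabs whose `ms_unflushed_txg` values are 12, 13 and 15. Every metaslab already has the changes from TXGs 10 and 11, so only those two logs are obsolete.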
```c
/*
 * This is used as the block size for the space maps used for the
 * log space map feature. These space maps benefit from a bigger
 * block size as we expect to be writing a lot of data to them at
 * once.
 */
static const unsigned long zfs_log_sm_blksz = 1ULL << 17;

/*
 * Fraction of the overall system's memory that ZFS allows to be
 * used for unflushed changes (i.e. the sum of the sizes of all the nodes
 * in the unflushed trees).
 *
 * Note that this value is calculated over 1000000 for finer granularity
 * (thus the _ppm suffix; reads as "parts per million"). As an example,
 * the default of 1000 allows 0.1% of memory to be used.
 */
static uint64_t zfs_unflushed_max_mem_ppm = 1000;

/*
 * Specific hard-limit in memory that ZFS allows to be used for
 * unflushed changes.
 */
static uint64_t zfs_unflushed_max_mem_amt = 1ULL << 30;
```
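With the defaults above, the memory budget on a hypothetical machine with 16 GiB of RAM works out as follows. 1000 ppm of 17,179,869,184 bytes is 17,179,869 bytes (about 16.4 MiB), which is far below the 1 GiB hard limit in `zfs_unflushed_max_mem_amt`. The comment at the top of the file only says that the threshold is determined by both limits (see `spa_log_exceeds_memlimit()`). The sketch below therefore assumes that crossing either limit counts as exceeding the threshold, so the smaller limit wins. The names `ppm_limit_bytes()` and `exceeds_memlimit()` are hypothetical, and physical memory is passed in as a parameter rather than read from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The ppm tunable is expressed over one million ("parts per million"). */
#define PPM_DENOMINATOR 1000000ULL

/* Bytes of memory that the ppm tunable alone allows for unflushed changes. */
static uint64_t
ppm_limit_bytes(uint64_t physmem_bytes, uint64_t max_mem_ppm)
{
	assert(max_mem_ppm <= PPM_DENOMINATOR);
	return (physmem_bytes * max_mem_ppm / PPM_DENOMINATOR);
}

/*
 * Assumption: usage is too high once it crosses either the hard byte limit
 * or the ppm-derived limit, i.e. the effective limit is the smaller one.
 */
static bool
exceeds_memlimit(uint64_t memused, uint64_t physmem_bytes,
    uint64_t max_mem_ppm, uint64_t max_mem_amt)
{
	return (memused > max_mem_amt ||
	    memused > ppm_limit_bytes(physmem_bytes, max_mem_ppm));
}
```

On very large machines the ppm-derived limit passes the hard limit. For 2 TiB of RAM, 1000 ppm is about 2.2 GB, so under this assumption the 1 GiB `zfs_unflushed_max_mem_amt` becomes the binding constraint.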
```c
/*
 * The following tunable determines the number of blocks that can be used for
 * the log space maps. It is expressed as a percentage of the total number of
 * metaslabs in the pool (i.e. the default of 400 means that the number of log
 * blocks is capped at 4 times the number of metaslabs).
 *
 * This value exists to tune our flushing algorithm, with higher values
 * flushing metaslabs less often (doing fewer I/Os) per TXG versus lower values
 * flushing metaslabs more aggressively with the upside of saving overheads
 * when loading the pool. Another factor in this tradeoff is that flushing
 * less often can potentially lead to better utilization of the metaslab space
 * map's block size as we accumulate more changes per flush.
 *
 * Given that this tunable indirectly controls the flush rate (metaslabs
 * flushed per txg), making it a percentage in terms of the number of
 * metaslabs in the pool makes sense here.
 *
 * As a rule of thumb we default this tunable to 400% based on the following:
 *
 * 1] Assuming a constant flush rate and a constant incoming rate of log blocks
 *    it is reasonable to expect that the number of obsolete entries changes
 *    linearly from txg to txg (e.g. the oldest log should have the most
 *    obsolete entries, and the most recent one the least). With this we could
 *    say that, at any given time, about half of the entries in the whole space
 *    map log are obsolete. Thus for every two entries for a metaslab in the
 *    log space map, only one of them is valid and actually makes it to the
 *    metaslab's space map.
 *    [factor of 2]
 * 2] Each entry in the log space map is guaranteed to be two words while
 *    entries in metaslab space maps are generally single-word.
 *    [an extra factor of 2 - 400% overall]
 * 3] Even if [1] and [2] are slightly less than 2 each, we haven't taken into
 *    account any consolidation of segments from the log space map to the
 *    unflushed range trees nor their history (e.g. a segment being allocated,
 *    then freed, then allocated again means 3 log space map entries but 0
 *    metaslab space map entries). Depending on the workload, we've seen ~1.8
 *    non-obsolete log space map entries per metaslab entry, for a total of
 *    ~600%. Since most of these estimates though are workload dependent, we
 *    default to 400% to be conservative.
 *
 *    Thus we could say that even in the worst
 *    case of [1] and [2], the factor should end up being 4.
 *
 * That said, regardless of the number of metaslabs in the pool we need to
 * provide upper and lower bounds for the log block limit.
 * [see zfs_unflushed_log_block_{min,max}]
 */
static uint_t zfs_unflushed_log_block_pct = 400;

/*
 * If the number of metaslabs is small and our incoming rate is high, we could
 * get into a situation where we are flushing all our metaslabs every TXG. Thus
 * we always allow at least this many log blocks.
 */
static uint64_t zfs_unflushed_log_block_min = 1000;

/*
 * If the log becomes too big, the import time of the pool can take a hit in
 * terms of performance. Thus we have a hard limit on the size of the log in
 * terms of blocks.
 */
static uint64_t zfs_unflushed_log_block_max = (1ULL << 17);

/*
 * Also we have a hard limit on the size of the log in terms of dirty TXGs.
 */
static uint64_t zfs_unflushed_log_txg_max = 1000;

/*
 * Max # of rows allowed for the log_summary. The tradeoff here is accuracy and
 * stability of the flushing algorithm (longer summary) vs its runtime overhead
 * (smaller summary is faster to traverse).
 */
static uint64_t zfs_max_logsm_summary_length = 10;

/*
 * Tunable that sets the lower bound on the metaslabs to flush every TXG.
 *
 * Setting this to 0 has no effect since if the pool is idle we won't even be
 * creating log space maps and therefore we won't be flushing. On the other
 * hand if the pool has any incoming workload our block heuristic will start
 * flushing metaslabs anyway.
 *
 * The point of this tunable is to be used in extreme cases where we really
 * want to flush more metaslabs than our adaptable heuristic plans to flush.
 */
static uint64_t zfs_min_metaslabs_to_flush = 1;

/*
 * Tunable that specifies how far in the past we want to look when trying to
 * estimate the incoming log blocks for the current TXG.
 *
 * Setting this too high may not only increase runtime but also minimize the
 * effect of the incoming rates from the most recent TXGs as we average the
 * block counts of all the logs that we walk
 * [see spa_estimate_incoming_log_blocks].
 */
static uint64_t zfs_max_log_walking = 5;

/*
 * This tunable exists solely for testing purposes. It ensures that the log
 * spacemaps are not flushed and destroyed during export in order for the
 * relevant log spacemap import code paths to be tested (effectively simulating
 * a crash).
 */
int zfs_keep_log_spacemaps_at_export = 0;

static uint64_t
spa_estimate_incoming_log_blocks(spa_t *spa)
{
	ASSERT3U(spa_sync_pass(spa), ==, 1);
	uint64_t steps = 0, sum = 0;
	for (spa_log_sm_t *sls = avl_last(&spa->spa_sm_logs_by_txg);
	    sls != NULL && steps < zfs_max_log_walking;
	    sls = AVL_PREV(&spa->spa_sm_logs_by_txg, sls)) {
		if (sls->sls_txg == spa_syncing_txg(spa)) {
			/*
			 * Skip the log created in this TXG as this would
			 * make our estimations inaccurate.
			 */
			continue;
		}
		sum += sls->sls_nblocks;
		steps++;
	}
	return ((steps > 0) ? DIV_ROUND_UP(sum, steps) : 0);
}
```
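`spa_estimate_incoming_log_blocks()` averages the block counts of the most recent logs, rounding up. It walks from the newest log backwards, skips the log of the syncing TXG and stops after `zfs_max_log_walking` logs. A hypothetical array-based version of the same walk makes the arithmetic easy to check. `log_entry_t` and `estimate_incoming()` are illustrative names, not ZFS APIs, and the array stands in for the `spa_sm_logs_by_txg` AVL tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of one node of spa_sm_logs_by_txg. */
typedef struct {
	uint64_t txg;
	uint64_t nblocks;
} log_entry_t;

/*
 * Rounded-up average of the block counts of the newest logs, skipping the
 * log of the syncing TXG and walking at most max_walk logs.
 * logs[] must be sorted by TXG, oldest first.
 */
static uint64_t
estimate_incoming(const log_entry_t *logs, size_t nlogs,
    uint64_t syncing_txg, uint64_t max_walk)
{
	assert(logs != NULL || nlogs == 0);
	uint64_t steps = 0, sum = 0;
	for (size_t i = nlogs; i > 0 && steps < max_walk; i--) {
		const log_entry_t *l = &logs[i - 1];
		if (l->txg == syncing_txg)
			continue;	/* in-progress log would skew the estimate */
		sum += l->nblocks;
		steps++;
	}
	/* Same as DIV_ROUND_UP(sum, steps). */
	return ((steps > 0) ? (sum + steps - 1) / steps : 0);
}
```

Take logs for TXGs 1 to 5 holding 4, 7, 2, 9 and 100 blocks, with TXG 5 syncing and a walk length of 3. The walk visits TXGs 4, 3 and 2, so the estimate is DIV_ROUND_UP(18, 3) = 6. The 100 blocks of the in-progress log are ignored.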
```c
uint64_t
spa_log_sm_blocklimit(spa_t *spa)
{
	return (spa->spa_unflushed_stats.sus_blocklimit);
}

void
spa_log_sm_set_blocklimit(spa_t *spa)
{
	if (!spa_feature_is_active(spa, SPA_FEATURE_LOG_SPACEMAP)) {
		ASSERT0(spa_log_sm_blocklimit(spa));
		return;
	}

	uint64_t msdcount = 0;
	for (log_summary_entry_t *e = list_head(&spa->spa_log_summary);
	    e; e = list_next(&spa->spa_log_summary, e))
		msdcount += e->lse_msdcount;

	uint64_t limit = msdcount * zfs_unflushed_log_block_pct / 100;
	spa->spa_unflushed_stats.sus_blocklimit = MIN(MAX(limit,
	    zfs_unflushed_log_block_min), zfs_unflushed_log_block_max);
}
```
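`spa_log_sm_set_blocklimit()` scales the metaslab count summed from the log summary by `zfs_unflushed_log_block_pct` and then clamps the result between `zfs_unflushed_log_block_min` and `zfs_unflushed_log_block_max`. With the defaults (400%, 1000 and 1 << 17 = 131072), the clamp behaves as follows:

- A count of 100 gives 400 blocks, which is raised to the 1000-block floor.
- A count of 5000 gives 20000 blocks, inside the bounds.
- A count of 50000 would give 200000 blocks, which is capped at 131072.

A minimal standalone version of the clamp follows. `log_block_limit()` and the `u64_min`/`u64_max` helpers are hypothetical names used in place of the kernel's `MIN`/`MAX` macros.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t
u64_min(uint64_t a, uint64_t b)
{
	return (a < b ? a : b);
}

static uint64_t
u64_max(uint64_t a, uint64_t b)
{
	return (a > b ? a : b);
}

/* Mirrors MIN(MAX(limit, min), max) from spa_log_sm_set_blocklimit(). */
static uint64_t
log_block_limit(uint64_t mscount, uint64_t pct, uint64_t min_blocks,
    uint64_t max_blocks)
{
	assert(min_blocks <= max_blocks);
	uint64_t limit = mscount * pct / 100;
	return (u64_min(u64_max(limit, min_blocks), max_blocks));
}
```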
```c
uint64_t
spa_log_sm_nblocks(spa_t *spa)
{
	return (spa->spa_unflushed_stats.sus_nblocks);
}

/*
 * Ensure that the in-memory log space map structures and the summary
 * have the same block and metaslab counts.
 */
static void
spa_log_summary_verify_counts(spa_t *spa)
{
	ASSERT(spa_feature_is_active(spa, SPA_FEATURE_LOG_SPACEMAP));

	if ((zfs_flags & ZFS_DEBUG_LOG_SPACEMAP) == 0)
		return;

	uint64_t ms_in_avl = avl_numnodes(&spa->spa_metaslabs_by_flushed);

	uint64_t ms_in_summary = 0, blk_in_summary = 0;
	for (log_summary_entry_t *e = list_head(&spa->spa_log_summary);
	    e; e = list_next(&spa->spa_log_summary, e)) {
		ms_in_summary += e->lse_mscount;
		blk_in_summary += e->lse_blkcount;
	}

	uint64_t ms_in_logs = 0, blk_in_logs = 0;
	for (spa_log_sm_t *sls = avl_first(&spa->spa_sm_logs_by_txg);
	    sls; sls = AVL_NEXT(&spa->spa_sm_logs_by_txg, sls)) {
		ms_in_logs += sls->sls_mscount;
		blk_in_logs += sls->sls_nblocks;
	}

	VERIFY3U(ms_in_logs, ==, ms_in_summary);
	VERIFY3U(ms_in_logs, ==, ms_in_avl);
	VERIFY3U(blk_in_logs, ==, blk_in_summary);
	VERIFY3U(blk_in_logs, ==, spa_log_sm_nblocks(spa));
}

static boolean_t
summary_entry_is_full(spa_t *spa, log_summary_entry_t *e, uint64_t txg)
{
	if (e->lse_end == txg)
		return (B_FALSE);
	if (e->lse_txgcount >= DIV_ROUND_UP(zfs_unflushed_log_txg_max,
	    zfs_max_logsm_summary_length))
		return (B_TRUE);
	uint64_t blocks_per_row = MAX(1,
	    DIV_ROUND_UP(spa_log_sm_blocklimit(spa),
	    zfs_max_logsm_summary_length));
	return (blocks_per_row <= e->lse_blkcount);
}
```
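`summary_entry_is_full()` decides when the tail row of the log summary should be closed. It never closes the row that already covers the TXG being synced. Otherwise, a row is full once it spans `zfs_unflushed_log_txg_max / zfs_max_logsm_summary_length` TXGs (rounded up) or holds the block limit divided by the summary length in blocks. With the defaults and a block limit of 1000, both thresholds are 100. The standalone sketch below passes the tunables and the block limit as plain parameters. `row_t` and `row_is_full()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical subset of log_summary_entry_t. */
typedef struct {
	uint64_t start, end;	/* first and last TXG covered */
	uint64_t txgcount;	/* number of TXGs in the row */
	uint64_t blkcount;	/* log blocks accounted by the row */
} row_t;

static bool
row_is_full(const row_t *e, uint64_t txg, uint64_t blocklimit,
    uint64_t txg_max, uint64_t max_rows)
{
	assert(max_rows > 0);
	if (e->end == txg)
		return (false);	/* keep appending to the syncing TXG's row */
	if (e->txgcount >= DIV_ROUND_UP(txg_max, max_rows))
		return (true);
	uint64_t blocks_per_row = DIV_ROUND_UP(blocklimit, max_rows);
	if (blocks_per_row == 0)
		blocks_per_row = 1;	/* same as MAX(1, ...) */
	return (blocks_per_row <= e->blkcount);
}
```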
```c
/*
 * Update the log summary information to reflect the fact that a metaslab
 * was flushed or destroyed (e.g. due to device removal or pool export/destroy).
 *
 * We typically flush the oldest flushed metaslab so the first (and oldest)
 * entry of the summary is updated. However if that metaslab is getting loaded
 * we may flush the second oldest one which may be part of an entry later in
 * the summary. Moreover, if we call into this function from metaslab_fini()
 * the metaslabs probably won't be ordered by ms_unflushed_txg. Thus we ask
 * for a txg as an argument so we can locate the appropriate summary entry for
 * the metaslab.
 */
void
spa_log_summary_decrement_mscount(spa_t *spa, uint64_t txg, boolean_t dirty)
{
	/*
	 * We don't track summary data for read-only pools and this function
	 * can be called from metaslab_fini(). In that case return immediately.
	 */
	if (!spa_writeable(spa))
		return;

	log_summary_entry_t *target = NULL;
	for (log_summary_entry_t *e = list_head(&spa->spa_log_summary);
	    e != NULL; e = list_next(&spa->spa_log_summary, e)) {
		if (e->lse_start > txg)
			break;
		target = e;
	}

	if (target == NULL || target->lse_mscount == 0) {
		/*
		 * We didn't find a summary entry for this metaslab. We must be
		 * at the teardown of a spa_load() attempt that got an error
		 * while reading the log space maps.
		 */
		VERIFY3S(spa_load_state(spa), ==, SPA_LOAD_ERROR);
		return;
	}

	target->lse_mscount--;
	if (dirty)
		target->lse_msdcount--;
}
```
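Two functions locate the summary entry for a TXG in the same way: `spa_log_summary_decrement_mscount()` above and `spa_log_summary_dirty_flushed_metaslab()` further down. Both walk the list from the head and keep the last entry whose `lse_start` is not past that TXG. Because entries are ordered by TXG, this is the entry whose range covers it. The hypothetical `find_summary_row()` below does the same over an array of start TXGs, with -1 standing in for the `NULL` result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the last row whose start TXG is <= txg, or -1 if
 * every row starts after txg. row_start[] is sorted ascending.
 */
static ptrdiff_t
find_summary_row(const uint64_t *row_start, size_t nrows, uint64_t txg)
{
	assert(row_start != NULL || nrows == 0);
	ptrdiff_t target = -1;
	for (size_t i = 0; i < nrows; i++) {
		if (row_start[i] > txg)
			break;
		target = (ptrdiff_t)i;
	}
	return (target);
}
```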
```c
/*
 * Update the log summary information to reflect the fact that we destroyed
 * old log space maps. Since we can only destroy the oldest log space maps,
 * we decrement the block count of the oldest summary entry and potentially
 * destroy it when that count hits 0.
 *
 * This function is called after a metaslab is flushed and typically that
 * metaslab is the oldest flushed, which means that this function will
 * typically decrement the block count of the first entry of the summary and
 * potentially free it if the block count gets to zero (its metaslab count
 * should be zero too at that point).
 *
 * There are certain scenarios though that don't work exactly like that so we
 * need to account for them:
 *
 * Scenario [1]: It is possible that after we flushed the oldest flushed
 * metaslab and we destroyed the oldest log space map, more recent logs had 0
 * metaslabs pointing to them so we got rid of them too. This can happen due
 * to metaslabs being destroyed through device removal, or because the oldest
 * flushed metaslab was loading but we kept flushing more recently flushed
 * metaslabs due to the memory pressure of unflushed changes. Because of that,
 * we always iterate from the beginning of the summary and if blocks_gone is
 * at least the block count of the current entry we free that entry (we
 * expect its metaslab count to be zero), decrement blocks_gone, and move on
 * to the next entry, repeating this procedure until blocks_gone gets
 * decremented to 0. Doing this also works for the typical case mentioned
 * above.
 *
 * Scenario [2]: The oldest flushed metaslab isn't necessarily accounted by
 * the first (and oldest) entry in the summary. If the first few entries of
 * the summary were only accounting metaslabs from a device that was just
 * removed, then the current oldest flushed metaslab could be accounted by an
 * entry somewhere in the middle of the summary. Moreover flushing that
 * metaslab will destroy all the log space maps older than its ms_unflushed_txg
 * because they became obsolete after the removal. Thus, iterating as we did
 * for scenario [1] works out for this case too.
 *
 * Scenario [3]: At times we decide to flush all the metaslabs in the pool
 * in one TXG (either because we are exporting the pool or because our flushing
 * heuristics decided to do so). When that happens all the log space maps get
 * destroyed except the one created for the current TXG which doesn't have
 * any log blocks yet. As log space maps get destroyed with every metaslab that
 * we flush, entries in the summary are also destroyed. This brings a weird
 * corner-case when we flush the last metaslab and the log space map of the
 * current TXG is in the same summary entry with other log space maps that
 * are older. When that happens we are eventually left with this one last
 * summary entry whose blocks are gone (blocks_gone equals the entry's block
 * count) but its metaslab count is non-zero (because it accounts all the
 * metaslabs in the pool as they all got flushed). Under this scenario we can't
 * free this last summary entry as it's referencing all the metaslabs in the
 * pool and its block count will get incremented at the end of this sync (when
 * we close the syncing log space map). Thus we just decrement its current
 * block count and leave it alone. In the case that the pool gets exported,
 * its metaslab count will be decremented over time as we call metaslab_fini()
 * for all the metaslabs in the pool and the entry will be freed at
 * spa_unload_log_sm_metadata().
 */
void
spa_log_summary_decrement_blkcount(spa_t *spa, uint64_t blocks_gone)
{
	log_summary_entry_t *e = list_head(&spa->spa_log_summary);
	ASSERT3P(e, !=, NULL);
	if (e->lse_txgcount > 0)
		e->lse_txgcount--;
	for (; e != NULL; e = list_head(&spa->spa_log_summary)) {
		if (e->lse_blkcount > blocks_gone) {
			e->lse_blkcount -= blocks_gone;
			blocks_gone = 0;
			break;
		} else if (e->lse_mscount == 0) {
			/* remove obsolete entry */
			blocks_gone -= e->lse_blkcount;
			list_remove(&spa->spa_log_summary, e);
			kmem_free(e, sizeof (log_summary_entry_t));
		} else {
			/* Verify that this is scenario [3] mentioned above. */
			VERIFY3U(blocks_gone, ==, e->lse_blkcount);

			/*
			 * Assert that this is scenario [3] further by ensuring
			 * that this is the only entry in the summary.
			 */
			VERIFY3P(e, ==, list_tail(&spa->spa_log_summary));
			ASSERT3P(e, ==, list_head(&spa->spa_log_summary));

			blocks_gone = e->lse_blkcount = 0;
			break;
		}
	}

	/*
	 * Ensure that there is no way we are trying to remove more blocks
	 * than the # of blocks in the summary.
	 */
	ASSERT0(blocks_gone);
}
```
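The draining loop in `spa_log_summary_decrement_blkcount()` treats the summary as a queue, as described in the comment above. The following hypothetical sketch models that queue as a fixed-capacity array with a head index instead of a `list_t`. It leaves out the `lse_txgcount` bookkeeping on the head entry. `srow_t`, `summary_t` and `summary_drain_blocks()` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
	uint64_t mscount;	/* metaslabs last flushed in this row's TXGs */
	uint64_t blkcount;	/* log blocks still accounted by this row */
} srow_t;

typedef struct {
	srow_t rows[16];	/* fixed capacity, good enough for a sketch */
	size_t head;		/* index of the oldest live row */
	size_t tail;		/* one past the newest live row */
} summary_t;

/*
 * Remove blocks_gone blocks from the head of the summary, popping rows
 * whose blocks are all gone and whose metaslab count is zero.
 */
static void
summary_drain_blocks(summary_t *s, uint64_t blocks_gone)
{
	while (s->head < s->tail) {
		srow_t *e = &s->rows[s->head];
		if (e->blkcount > blocks_gone) {
			e->blkcount -= blocks_gone;
			blocks_gone = 0;
			break;
		} else if (e->mscount == 0) {
			blocks_gone -= e->blkcount;
			s->head++;	/* pop the obsolete row */
		} else {
			/* Scenario [3]: the only row left still has metaslabs. */
			assert(blocks_gone == e->blkcount);
			assert(s->head + 1 == s->tail);
			e->blkcount = 0;
			blocks_gone = 0;
			break;
		}
	}
	assert(blocks_gone == 0);
}
```

Suppose six blocks go away while the head rows hold 5 and 3 blocks with no metaslabs. The first row is popped and the second is left with 2 blocks. If the last remaining row still references metaslabs, as in scenario [3], its block count is zeroed but the row stays in place.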
```c
void
spa_log_sm_decrement_mscount(spa_t *spa, uint64_t txg)
{
	spa_log_sm_t target = { .sls_txg = txg };
	spa_log_sm_t *sls = avl_find(&spa->spa_sm_logs_by_txg,
	    &target, NULL);

	if (sls == NULL) {
		/*
		 * We must be at the teardown of a spa_load() attempt that
		 * got an error while reading the log space maps.
		 */
		VERIFY3S(spa_load_state(spa), ==, SPA_LOAD_ERROR);
		return;
	}

	ASSERT(sls->sls_mscount > 0);
	sls->sls_mscount--;
}

void
spa_log_sm_increment_current_mscount(spa_t *spa)
{
	spa_log_sm_t *last_sls = avl_last(&spa->spa_sm_logs_by_txg);
	ASSERT3U(last_sls->sls_txg, ==, spa_syncing_txg(spa));
	last_sls->sls_mscount++;
}

static void
summary_add_data(spa_t *spa, uint64_t txg, uint64_t metaslabs_flushed,
    uint64_t metaslabs_dirty, uint64_t nblocks)
{
	log_summary_entry_t *e = list_tail(&spa->spa_log_summary);

	if (e == NULL || summary_entry_is_full(spa, e, txg)) {
		e = kmem_zalloc(sizeof (log_summary_entry_t), KM_SLEEP);
		e->lse_start = e->lse_end = txg;
		e->lse_txgcount = 1;
		list_insert_tail(&spa->spa_log_summary, e);
	}

	ASSERT3U(e->lse_start, <=, txg);
	if (e->lse_end < txg) {
		e->lse_end = txg;
		e->lse_txgcount++;
	}
	e->lse_mscount += metaslabs_flushed;
	e->lse_msdcount += metaslabs_dirty;
	e->lse_blkcount += nblocks;
}
```
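`summary_add_data()` is the tail side of the summary queue. It adds counts to the newest row unless that row is full, in which case it opens a new row starting at `txg`. When the row's last TXG falls behind the current one, it extends the row. The hypothetical sketch below uses a fixed-capacity array. Fixed `blocks_per_row` and `txgs_per_row` thresholds stand in for the tunable-derived ones that `summary_entry_is_full()` computes. `lrow_t`, `lsummary_t`, `tail_is_full()` and `summary_add()` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ROWS 16

typedef struct {
	uint64_t start, end, txgcount;
	uint64_t mscount, msdcount, blkcount;
} lrow_t;

typedef struct {
	lrow_t rows[MAX_ROWS];
	size_t nrows;
	uint64_t blocks_per_row;	/* hypothetical fixed thresholds */
	uint64_t txgs_per_row;
} lsummary_t;

static bool
tail_is_full(const lsummary_t *s, const lrow_t *e, uint64_t txg)
{
	if (e->end == txg)
		return (false);
	if (e->txgcount >= s->txgs_per_row)
		return (true);
	return (e->blkcount >= s->blocks_per_row);
}

static void
summary_add(lsummary_t *s, uint64_t txg, uint64_t flushed, uint64_t dirty,
    uint64_t nblocks)
{
	lrow_t *e = (s->nrows > 0) ? &s->rows[s->nrows - 1] : NULL;
	if (e == NULL || tail_is_full(s, e, txg)) {
		/* Open a new row that starts (and ends) at this TXG. */
		assert(s->nrows < MAX_ROWS);
		e = &s->rows[s->nrows++];
		*e = (lrow_t){ .start = txg, .end = txg, .txgcount = 1 };
	}
	assert(e->start <= txg);
	if (e->end < txg) {
		e->end = txg;
		e->txgcount++;
	}
	e->mscount += flushed;
	e->msdcount += dirty;
	e->blkcount += nblocks;
}
```

With `blocks_per_row = 10`, the row opened at TXG 1 absorbs the flushed metaslab and the blocks of TXGs 1 and 2, for 11 blocks in total. TXG 3 then starts a new row.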
596 | ||
597 | static void | |
598 | spa_log_summary_add_incoming_blocks(spa_t *spa, uint64_t nblocks) | |
599 | { | |
600a02b8 | 600 | summary_add_data(spa, spa_syncing_txg(spa), 0, 0, nblocks); |
93e28d66 SD |
601 | } |
602 | ||
603 | void | |
600a02b8 | 604 | spa_log_summary_add_flushed_metaslab(spa_t *spa, boolean_t dirty) |
93e28d66 | 605 | { |
600a02b8 AM |
606 | summary_add_data(spa, spa_syncing_txg(spa), 1, dirty ? 1 : 0, 0); |
607 | } | |
608 | ||
609 | void | |
610 | spa_log_summary_dirty_flushed_metaslab(spa_t *spa, uint64_t txg) | |
611 | { | |
612 | log_summary_entry_t *target = NULL; | |
613 | for (log_summary_entry_t *e = list_head(&spa->spa_log_summary); | |
614 | e != NULL; e = list_next(&spa->spa_log_summary, e)) { | |
615 | if (e->lse_start > txg) | |
616 | break; | |
617 | target = e; | |
618 | } | |
619 | ASSERT3P(target, !=, NULL); | |
620 | ASSERT3U(target->lse_mscount, !=, 0); | |
621 | target->lse_msdcount++; | |
93e28d66 SD |
622 | } |
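spa_log_summary_dirty_flushed_metaslab() relies on the summary being ordered by lse_start: the entry that covers a TXG is the last one whose start does not exceed it. A minimal sketch of that search, assuming a sorted array of start TXGs in place of the kernel list (the original asserts that a match exists; this sketch returns -1 instead):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the last row whose start TXG is <= txg,
 * or -1 if txg precedes every row. starts[] must be ascending.
 */
static int
sketch_find_row(const uint64_t *starts, size_t nrows, uint64_t txg)
{
	int target = -1;

	for (size_t i = 0; i < nrows; i++) {
		if (starts[i] > txg)
			break;	/* sorted: no later row can match */
		target = (int)i;
	}
	return (target);
}
```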
623 | ||
624 | /* | |
625 | * This function attempts to estimate how many metaslabs we | |
626 | * should flush to satisfy our block heuristic for the log spacemap | |
627 | * for the upcoming TXGs. | |
628 | * | |
629 | * Specifically, it first tries to estimate the number of incoming | |
630 | * blocks in this TXG. Then by projecting that incoming rate to | |
631 | * future TXGs and using the log summary, it figures out how many | |
632 | * flushes we would need to do for future TXGs individually to | |
633 | * stay below our block limit and returns the maximum number of | |
634 | * flushes from those estimates. | |
635 | */ | |
636 | static uint64_t | |
637 | spa_estimate_metaslabs_to_flush(spa_t *spa) | |
638 | { | |
639 | ASSERT(spa_feature_is_active(spa, SPA_FEATURE_LOG_SPACEMAP)); | |
640 | ASSERT3U(spa_sync_pass(spa), ==, 1); | |
641 | ASSERT(spa_log_sm_blocklimit(spa) != 0); | |
642 | ||
643 | /* | |
644 | * This variable contains the incoming rate that will be projected | |
645 | * and used for our flushing estimates in the future. | |
646 | */ | |
647 | uint64_t incoming = spa_estimate_incoming_log_blocks(spa); | |
648 | ||
649 | /* | |
650 | * At any point in time this variable tells us how many | |
651 | * TXGs in the future we are so we can make our estimations. | |
652 | */ | |
653 | uint64_t txgs_in_future = 1; | |
654 | ||
655 | /* | |
656 | * This variable tells us how much room we have until we hit | |
657 | * our limit. When it goes negative, it means that we've exceeded | |
658 | * our limit and we need to flush. | |
659 | * | |
660 | * Note that since we start at the first TXG in the future (i.e. | |
661 | * txgs_in_future starts from 1) we already decrement this | |
662 | * variable by the incoming rate. | |
663 | */ | |
664 | int64_t available_blocks = | |
665 | spa_log_sm_blocklimit(spa) - spa_log_sm_nblocks(spa) - incoming; | |
666 | ||
600a02b8 AM |
667 | int64_t available_txgs = zfs_unflushed_log_txg_max; |
668 | for (log_summary_entry_t *e = list_head(&spa->spa_log_summary); | |
669 | e; e = list_next(&spa->spa_log_summary, e)) | |
670 | available_txgs -= e->lse_txgcount; | |
671 | ||
93e28d66 SD |
672 | /* |
673 | * This variable tells us the total number of flushes needed to | |
674 | * keep the log size within the limit when we reach txgs_in_future. | |
675 | */ | |
676 | uint64_t total_flushes = 0; | |
677 | ||
678 | /* Holds the current maximum of our estimates so far. */ | |
600a02b8 | 679 | uint64_t max_flushes_pertxg = zfs_min_metaslabs_to_flush; |
93e28d66 SD |
680 | |
681 | /* | |
682 | * For our estimations we only look as far in the future | |
683 | * as the summary allows us. | |
684 | */ | |
685 | for (log_summary_entry_t *e = list_head(&spa->spa_log_summary); | |
686 | e; e = list_next(&spa->spa_log_summary, e)) { | |
687 | ||
688 | /* | |
689 | * If there is still room before we exceed our limit | |
690 | * then keep skipping TXGs accumulating more blocks | |
691 | * based on the incoming rate until we exceed it. | |
692 | */ | |
600a02b8 | 693 | if (available_blocks >= 0 && available_txgs >= 0) { |
f47f6a05 RY |
694 | uint64_t skip_txgs = (incoming == 0) ? |
695 | available_txgs + 1 : MIN(available_txgs + 1, | |
600a02b8 | 696 | (available_blocks / incoming) + 1); |
93e28d66 | 697 | available_blocks -= (skip_txgs * incoming); |
600a02b8 | 698 | available_txgs -= skip_txgs; |
93e28d66 SD |
699 | txgs_in_future += skip_txgs; |
700 | ASSERT3S(available_blocks, >=, -incoming); | |
600a02b8 | 701 | ASSERT3S(available_txgs, >=, -1); |
93e28d66 SD |
702 | } |
703 | ||
704 | /* | |
705 | * At this point we're far enough into the future that | |
706 | * one of the limits was just exceeded, and we flush metaslabs | |
707 | * based on the current entry in the summary, updating | |
708 | * available_blocks and available_txgs. | |
709 | */ | |
600a02b8 | 710 | ASSERT(available_blocks < 0 || available_txgs < 0); |
93e28d66 | 711 | available_blocks += e->lse_blkcount; |
600a02b8 AM |
712 | available_txgs += e->lse_txgcount; |
713 | total_flushes += e->lse_msdcount; | |
93e28d66 SD |
714 | |
715 | /* | |
716 | * Keep the running maximum of the total_flushes that | |
717 | * we've done so far over the number of TXGs in the | |
718 | * future that we are. The idea here is to estimate | |
719 | * the average number of flushes that we should do | |
720 | * every TXG so that when we are that many TXGs in the | |
721 | * future we stay under the limit. | |
722 | */ | |
723 | max_flushes_pertxg = MAX(max_flushes_pertxg, | |
724 | DIV_ROUND_UP(total_flushes, txgs_in_future)); | |
93e28d66 SD |
725 | } |
726 | return (max_flushes_pertxg); | |
727 | } | |
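The estimate can be exercised outside the kernel with a standalone re-statement over an array of summary rows. This is a sketch: the parameters are hypothetical stand-ins for spa_log_sm_blocklimit(), spa_log_sm_nblocks(), spa_estimate_incoming_log_blocks(), zfs_unflushed_log_txg_max and zfs_min_metaslabs_to_flush, and the array replaces spa_log_summary. The control flow follows the loop above: skip ahead while both budgets are non-negative, then "flush" the current row and track the running maximum of flushes per TXG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-row inputs, mirroring the summary fields used above. */
typedef struct est_row {
	uint64_t blkcount;	/* log blocks released by flushing this row */
	uint64_t txgcount;	/* TXGs released by flushing this row */
	uint64_t msdcount;	/* dirty metaslabs that must be flushed */
} est_row_t;

static int64_t
i64_min(int64_t a, int64_t b)
{
	return (a < b ? a : b);
}

static uint64_t
sketch_estimate_flushes(const est_row_t *rows, size_t nrows,
    int64_t blocklimit, int64_t nblocks, int64_t incoming,
    int64_t txg_max, uint64_t min_flush)
{
	uint64_t txgs_in_future = 1;
	int64_t available_blocks = blocklimit - nblocks - incoming;
	int64_t available_txgs = txg_max;
	for (size_t i = 0; i < nrows; i++)
		available_txgs -= (int64_t)rows[i].txgcount;

	uint64_t total_flushes = 0;
	uint64_t max_per_txg = min_flush;

	for (size_t i = 0; i < nrows; i++) {
		/* Still under both limits: jump ahead until one is exceeded. */
		if (available_blocks >= 0 && available_txgs >= 0) {
			int64_t skip = (incoming == 0) ? available_txgs + 1 :
			    i64_min(available_txgs + 1,
			    available_blocks / incoming + 1);
			available_blocks -= skip * incoming;
			available_txgs -= skip;
			txgs_in_future += (uint64_t)skip;
		}
		assert(available_blocks < 0 || available_txgs < 0);

		/* Flushing this row's dirty metaslabs frees its blocks/TXGs. */
		available_blocks += (int64_t)rows[i].blkcount;
		available_txgs += (int64_t)rows[i].txgcount;
		total_flushes += rows[i].msdcount;

		/* Flushes per TXG needed to have caught up by this point. */
		uint64_t per_txg = (total_flushes + txgs_in_future - 1) /
		    txgs_in_future;
		if (per_txg > max_per_txg)
			max_per_txg = per_txg;
	}
	return (max_per_txg);
}
```

With a 30-block limit, 20 blocks already logged and 5 blocks arriving per TXG, two rows needing 6 and 9 dirty flushes are reached 3 and 5 TXGs out, so the sketch asks for ceil(15 / 5) = 3 flushes per TXG.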
728 | ||
729 | uint64_t | |
730 | spa_log_sm_memused(spa_t *spa) | |
731 | { | |
732 | return (spa->spa_unflushed_stats.sus_memused); | |
733 | } | |
734 | ||
735 | static boolean_t | |
736 | spa_log_exceeds_memlimit(spa_t *spa) | |
737 | { | |
738 | if (spa_log_sm_memused(spa) > zfs_unflushed_max_mem_amt) | |
739 | return (B_TRUE); | |
740 | ||
741 | uint64_t system_mem_allowed = ((physmem * PAGESIZE) * | |
742 | zfs_unflushed_max_mem_ppm) / 1000000; | |
743 | if (spa_log_sm_memused(spa) > system_mem_allowed) | |
744 | return (B_TRUE); | |
745 | ||
746 | return (B_FALSE); | |
747 | } | |
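spa_log_exceeds_memlimit() applies two caps: an absolute byte limit and a parts-per-million share of physical memory. The sketch below reproduces that arithmetic with plain integers; physmem_bytes is a hypothetical stand-in for physmem * PAGESIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool
sketch_exceeds_memlimit(uint64_t memused, uint64_t max_mem_amt,
    uint64_t physmem_bytes, uint64_t max_mem_ppm)
{
	/* Absolute cap (zfs_unflushed_max_mem_amt in the original). */
	if (memused > max_mem_amt)
		return (true);

	/* Share of system memory, expressed in parts per million. */
	uint64_t system_mem_allowed = (physmem_bytes * max_mem_ppm) / 1000000;
	return (memused > system_mem_allowed);
}
```

For example, with 16 GiB of memory and a ppm value of 1000 (0.1%), the share-based cap works out to 17179869 bytes, just over 16 MiB. As in the original, multiplying before dividing keeps precision but assumes the product fits in 64 bits.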
748 | ||
749 | boolean_t | |
750 | spa_flush_all_logs_requested(spa_t *spa) | |
751 | { | |
752 | return (spa->spa_log_flushall_txg != 0); | |
753 | } | |
754 | ||
755 | void | |
756 | spa_flush_metaslabs(spa_t *spa, dmu_tx_t *tx) | |
757 | { | |
758 | uint64_t txg = dmu_tx_get_txg(tx); | |
759 | ||
760 | if (spa_sync_pass(spa) != 1) | |
761 | return; | |
762 | ||
763 | if (!spa_feature_is_active(spa, SPA_FEATURE_LOG_SPACEMAP)) | |
764 | return; | |
765 | ||
766 | /* | |
767 | * If we don't have any metaslabs with unflushed changes | |
768 | * return immediately. | |
769 | */ | |
770 | if (avl_numnodes(&spa->spa_metaslabs_by_flushed) == 0) | |
771 | return; | |
772 | ||
773 | /* | |
774 | * During SPA export we leave a few empty TXGs to go by [see | |
775 | * spa_final_dirty_txg() to understand why]. For this specific | |
776 | * case, it is important to not flush any metaslabs as that | |
777 | * would dirty this TXG. | |
778 | * | |
779 | * That said, during one of these dirty TXGs that is less than or | |
780 | * equal to spa_final_dirty_txg(), spa_unload() will request that | |
781 | * we try to flush all the metaslabs for that TXG before | |
782 | * exporting the pool, so we check that no request to flush | |
783 | * everything was made before we return | |
784 | * immediately. | |
785 | */ | |
493fcce9 | 786 | if (BP_GET_LOGICAL_BIRTH(&spa->spa_uberblock.ub_rootbp) < txg && |
93e28d66 SD |
787 | !dmu_objset_is_dirty(spa_meta_objset(spa), txg) && |
788 | !spa_flush_all_logs_requested(spa)) | |
789 | return; | |
790 | ||
791 | /* | |
792 | * We need to generate a log space map before flushing because this | |
793 | * will set up the in-memory data (i.e. node in spa_sm_logs_by_txg) | |
794 | * for this TXG's flushed metaslab count (aka sls_mscount which is | |
795 | * manipulated in many ways down the metaslab_flush() codepath). | |
796 | * | |
797 | * This does not mean that we may generate a log space map when we | |
798 | * don't need one. If we are flushing metaslabs, that means that we | |
799 | * were going to write changes to disk anyway, so even if we were | |
800 | * not flushing, a log space map would have been created anyway in | |
801 | * metaslab_sync(). | |
802 | */ | |
803 | spa_generate_syncing_log_sm(spa, tx); | |
804 | ||
805 | /* | |
806 | * This variable tells us how many metaslabs we want to flush based | |
807 | * on the block-heuristic of our flushing algorithm (see block comment | |
808 | * of log space map feature). We also decrement this as we flush | |
809 | * metaslabs and attempt to destroy old log space maps. | |
810 | */ | |
811 | uint64_t want_to_flush; | |
812 | if (spa_flush_all_logs_requested(spa)) { | |
813 | ASSERT3S(spa_state(spa), ==, POOL_STATE_EXPORTED); | |
600a02b8 | 814 | want_to_flush = UINT64_MAX; |
93e28d66 SD |
815 | } else { |
816 | want_to_flush = spa_estimate_metaslabs_to_flush(spa); | |
817 | } | |
818 | ||
93e28d66 SD |
819 | /* Used purely for verification purposes */ |
820 | uint64_t visited = 0; | |
821 | ||
822 | /* | |
823 | * Ideally we would only iterate through spa_metaslabs_by_flushed | |
824 | * using only one variable (curr). We can't do that because | |
825 | * metaslab_flush() mutates the position of curr in the AVL when | |
826 | * it flushes that metaslab by moving it to the end of the tree. | |
827 | * Thus we always keep track of the original next node of the | |
828 | * current node (curr) in another variable (next). | |
829 | */ | |
830 | metaslab_t *next = NULL; | |
831 | for (metaslab_t *curr = avl_first(&spa->spa_metaslabs_by_flushed); | |
832 | curr != NULL; curr = next) { | |
833 | next = AVL_NEXT(&spa->spa_metaslabs_by_flushed, curr); | |
834 | ||
835 | /* | |
836 | * If this metaslab has been flushed this txg then we've done | |
837 | * a full circle over the metaslabs. | |
838 | */ | |
839 | if (metaslab_unflushed_txg(curr) == txg) | |
840 | break; | |
841 | ||
842 | /* | |
843 | * If we are done flushing for the block heuristic and the | |
844 | * unflushed changes don't exceed the memory limit just stop. | |
845 | */ | |
846 | if (want_to_flush == 0 && !spa_log_exceeds_memlimit(spa)) | |
847 | break; | |
848 | ||
600a02b8 AM |
849 | if (metaslab_unflushed_dirty(curr)) { |
850 | mutex_enter(&curr->ms_sync_lock); | |
851 | mutex_enter(&curr->ms_lock); | |
852 | metaslab_flush(curr, tx); | |
853 | mutex_exit(&curr->ms_lock); | |
854 | mutex_exit(&curr->ms_sync_lock); | |
855 | if (want_to_flush > 0) | |
856 | want_to_flush--; | |
857 | } else | |
858 | metaslab_unflushed_bump(curr, tx, B_FALSE); | |
93e28d66 SD |
859 | |
860 | visited++; | |
861 | } | |
862 | ASSERT3U(avl_numnodes(&spa->spa_metaslabs_by_flushed), >=, visited); | |
600a02b8 AM |
863 | |
864 | spa_log_sm_set_blocklimit(spa); | |
93e28d66 SD |
865 | } |
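The flushing loop saves the successor before touching the current node because flushing moves that node to the end of spa_metaslabs_by_flushed. The sketch below shows the same "save next, then mutate" pattern on a hypothetical doubly linked list ordered by unflushed TXG. It assumes, as the comment above describes for metaslab_flush(), that both flushing and the non-dirty bump restamp the node with the current TXG and move it to the tail; the memory-limit check is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical metaslab stand-in on a list kept in unflushed-TXG order. */
typedef struct ms_node {
	uint64_t unflushed_txg;
	bool dirty;
	struct ms_node *prev, *next;
} ms_node_t;

typedef struct ms_list {
	ms_node_t *head, *tail;
} ms_list_t;

static void
ms_list_append(ms_list_t *l, ms_node_t *n)
{
	n->next = NULL;
	n->prev = l->tail;
	if (l->tail != NULL)
		l->tail->next = n;
	else
		l->head = n;
	l->tail = n;
}

static void
ms_list_unlink(ms_list_t *l, ms_node_t *n)
{
	if (n->prev != NULL)
		n->prev->next = n->next;
	else
		l->head = n->next;
	if (n->next != NULL)
		n->next->prev = n->prev;
	else
		l->tail = n->prev;
}

/* Walk oldest-first; returns the number of dirty nodes "flushed". */
static uint64_t
sketch_flush(ms_list_t *l, uint64_t txg, uint64_t want_to_flush)
{
	uint64_t flushed = 0;
	ms_node_t *next = NULL;

	for (ms_node_t *curr = l->head; curr != NULL; curr = next) {
		next = curr->next;	/* saved before curr moves */
		if (curr->unflushed_txg == txg)
			break;	/* full circle: already handled this TXG */
		if (want_to_flush == 0)
			break;
		if (curr->dirty) {
			curr->dirty = false;
			flushed++;
			want_to_flush--;
		}
		/* Restamp and move to the tail, like a flush or a bump. */
		curr->unflushed_txg = txg;
		ms_list_unlink(l, curr);
		ms_list_append(l, curr);
	}
	return (flushed);
}
```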
866 | ||
867 | /* | |
868 | * Close the log space map for this TXG and update the block counts | |
e1cfd73f | 869 | * for the log's in-memory structure and the summary. |
93e28d66 SD |
870 | */ |
871 | void | |
872 | spa_sync_close_syncing_log_sm(spa_t *spa) | |
873 | { | |
874 | if (spa_syncing_log_sm(spa) == NULL) | |
875 | return; | |
876 | ASSERT(spa_feature_is_active(spa, SPA_FEATURE_LOG_SPACEMAP)); | |
877 | ||
878 | spa_log_sm_t *sls = avl_last(&spa->spa_sm_logs_by_txg); | |
879 | ASSERT3U(sls->sls_txg, ==, spa_syncing_txg(spa)); | |
880 | ||
881 | sls->sls_nblocks = space_map_nblocks(spa_syncing_log_sm(spa)); | |
882 | spa->spa_unflushed_stats.sus_nblocks += sls->sls_nblocks; | |
883 | ||
884 | /* | |
885 | * Note that we can't assert that sls_mscount is not 0, | |
886 | * because there is the case where the first metaslab | |
887 | * in spa_metaslabs_by_flushed is loading and we were | |
888 | * not able to flush any metaslabs in the current TXG. | |
889 | */ | |
890 | ASSERT(sls->sls_nblocks != 0); | |
891 | ||
892 | spa_log_summary_add_incoming_blocks(spa, sls->sls_nblocks); | |
893 | spa_log_summary_verify_counts(spa); | |
894 | ||
895 | space_map_close(spa->spa_syncing_log_sm); | |
896 | spa->spa_syncing_log_sm = NULL; | |
897 | ||
898 | /* | |
899 | * At this point we have tried to flush as many metaslabs as | |
900 | * we can, since the pool is being exported. Reset the "flush all" | |
901 | * so the last few TXGs before closing the pool can be empty | |
902 | * (i.e. not dirty). | |
903 | */ | |
904 | if (spa_flush_all_logs_requested(spa)) { | |
905 | ASSERT3S(spa_state(spa), ==, POOL_STATE_EXPORTED); | |
906 | spa->spa_log_flushall_txg = 0; | |
907 | } | |
908 | } | |
909 | ||
910 | void | |
911 | spa_cleanup_old_sm_logs(spa_t *spa, dmu_tx_t *tx) | |
912 | { | |
913 | objset_t *mos = spa_meta_objset(spa); | |
914 | ||
915 | uint64_t spacemap_zap; | |
916 | int error = zap_lookup(mos, DMU_POOL_DIRECTORY_OBJECT, | |
917 | DMU_POOL_LOG_SPACEMAP_ZAP, sizeof (spacemap_zap), 1, &spacemap_zap); | |
918 | if (error == ENOENT) { | |
919 | ASSERT(avl_is_empty(&spa->spa_sm_logs_by_txg)); | |
920 | return; | |
921 | } | |
922 | VERIFY0(error); | |
923 | ||
924 | metaslab_t *oldest = avl_first(&spa->spa_metaslabs_by_flushed); | |
925 | uint64_t oldest_flushed_txg = metaslab_unflushed_txg(oldest); | |
926 | ||
927 | /* Free all log space maps older than the oldest_flushed_txg. */ | |
928 | for (spa_log_sm_t *sls = avl_first(&spa->spa_sm_logs_by_txg); | |
929 | sls && sls->sls_txg < oldest_flushed_txg; | |
930 | sls = avl_first(&spa->spa_sm_logs_by_txg)) { | |
931 | ASSERT0(sls->sls_mscount); | |
932 | avl_remove(&spa->spa_sm_logs_by_txg, sls); | |
933 | space_map_free_obj(mos, sls->sls_sm_obj, tx); | |
934 | VERIFY0(zap_remove_int(mos, spacemap_zap, sls->sls_txg, tx)); | |
600a02b8 | 935 | spa_log_summary_decrement_blkcount(spa, sls->sls_nblocks); |
93e28d66 SD |
936 | spa->spa_unflushed_stats.sus_nblocks -= sls->sls_nblocks; |
937 | kmem_free(sls, sizeof (spa_log_sm_t)); | |
938 | } | |
939 | } | |
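spa_cleanup_old_sm_logs() frees every log space map whose TXG is older than the oldest unflushed TXG of any metaslab, because no metaslab still depends on its entries. A sketch of the same prefix drop over a sorted array (a hypothetical stand-in for spa_sm_logs_by_txg), keeping the running block total in step:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for spa_log_sm_t. */
typedef struct log_rec {
	uint64_t txg;
	uint64_t mscount;	/* metaslabs whose unflushed TXG is this one */
	uint64_t nblocks;
} log_rec_t;

/* logs[] is sorted by txg; survivors are shifted to the front. */
static size_t
sketch_cleanup(log_rec_t *logs, size_t *nlogs, uint64_t oldest_flushed_txg,
    uint64_t *total_nblocks)
{
	size_t drop = 0;

	while (drop < *nlogs && logs[drop].txg < oldest_flushed_txg) {
		/* No metaslab can still reference a log this old. */
		assert(logs[drop].mscount == 0);
		*total_nblocks -= logs[drop].nblocks;
		drop++;
	}
	for (size_t i = drop; i < *nlogs; i++)
		logs[i - drop] = logs[i];
	*nlogs -= drop;
	return (drop);
}
```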
940 | ||
941 | static spa_log_sm_t * | |
942 | spa_log_sm_alloc(uint64_t sm_obj, uint64_t txg) | |
943 | { | |
944 | spa_log_sm_t *sls = kmem_zalloc(sizeof (*sls), KM_SLEEP); | |
945 | sls->sls_sm_obj = sm_obj; | |
946 | sls->sls_txg = txg; | |
947 | return (sls); | |
948 | } | |
949 | ||
950 | void | |
951 | spa_generate_syncing_log_sm(spa_t *spa, dmu_tx_t *tx) | |
952 | { | |
953 | uint64_t txg = dmu_tx_get_txg(tx); | |
954 | objset_t *mos = spa_meta_objset(spa); | |
955 | ||
956 | if (spa_syncing_log_sm(spa) != NULL) | |
957 | return; | |
958 | ||
959 | if (!spa_feature_is_enabled(spa, SPA_FEATURE_LOG_SPACEMAP)) | |
960 | return; | |
961 | ||
962 | uint64_t spacemap_zap; | |
963 | int error = zap_lookup(mos, DMU_POOL_DIRECTORY_OBJECT, | |
964 | DMU_POOL_LOG_SPACEMAP_ZAP, sizeof (spacemap_zap), 1, &spacemap_zap); | |
965 | if (error == ENOENT) { | |
966 | ASSERT(avl_is_empty(&spa->spa_sm_logs_by_txg)); | |
967 | ||
968 | error = 0; | |
969 | spacemap_zap = zap_create(mos, | |
970 | DMU_OTN_ZAP_METADATA, DMU_OT_NONE, 0, tx); | |
971 | VERIFY0(zap_add(mos, DMU_POOL_DIRECTORY_OBJECT, | |
972 | DMU_POOL_LOG_SPACEMAP_ZAP, sizeof (spacemap_zap), 1, | |
973 | &spacemap_zap, tx)); | |
974 | spa_feature_incr(spa, SPA_FEATURE_LOG_SPACEMAP, tx); | |
975 | } | |
976 | VERIFY0(error); | |
977 | ||
978 | uint64_t sm_obj; | |
979 | ASSERT3U(zap_lookup_int_key(mos, spacemap_zap, txg, &sm_obj), | |
980 | ==, ENOENT); | |
981 | sm_obj = space_map_alloc(mos, zfs_log_sm_blksz, tx); | |
982 | VERIFY0(zap_add_int_key(mos, spacemap_zap, txg, sm_obj, tx)); | |
983 | avl_add(&spa->spa_sm_logs_by_txg, spa_log_sm_alloc(sm_obj, txg)); | |
984 | ||
985 | /* | |
986 | * We pass UINT64_MAX as the space map's representation size | |
987 | * and SPA_MINBLOCKSHIFT as the shift, to make the space map | |
988 | * accept any sort of segment since there's no real advantage | |
989 | * to being more restrictive (given that we're already going | |
990 | * to be using 2-word entries). | |
991 | */ | |
992 | VERIFY0(space_map_open(&spa->spa_syncing_log_sm, mos, sm_obj, | |
993 | 0, UINT64_MAX, SPA_MINBLOCKSHIFT)); | |
994 | ||
600a02b8 | 995 | spa_log_sm_set_blocklimit(spa); |
93e28d66 SD |
996 | } |
997 | ||
998 | /* | |
999 | * Find all the log space maps stored in the space map ZAP and sort | |
1000 | * them by their TXG in spa_sm_logs_by_txg. | |
1001 | */ | |
1002 | static int | |
1003 | spa_ld_log_sm_metadata(spa_t *spa) | |
1004 | { | |
1005 | int error; | |
1006 | uint64_t spacemap_zap; | |
1007 | ||
1008 | ASSERT(avl_is_empty(&spa->spa_sm_logs_by_txg)); | |
1009 | ||
1010 | error = zap_lookup(spa_meta_objset(spa), DMU_POOL_DIRECTORY_OBJECT, | |
1011 | DMU_POOL_LOG_SPACEMAP_ZAP, sizeof (spacemap_zap), 1, &spacemap_zap); | |
1012 | if (error == ENOENT) { | |
1013 | /* the space map ZAP doesn't exist yet */ | |
1014 | return (0); | |
1015 | } else if (error != 0) { | |
1ba4f3e7 | 1016 | spa_load_failed(spa, "spa_ld_log_sm_metadata(): failed at " |
93e28d66 SD |
1017 | "zap_lookup(DMU_POOL_DIRECTORY_OBJECT) [error %d]", |
1018 | error); | |
1019 | return (error); | |
1020 | } | |
1021 | ||
1022 | zap_cursor_t zc; | |
1023 | zap_attribute_t za; | |
1024 | for (zap_cursor_init(&zc, spa_meta_objset(spa), spacemap_zap); | |
1ba4f3e7 SD |
1025 | (error = zap_cursor_retrieve(&zc, &za)) == 0; |
1026 | zap_cursor_advance(&zc)) { | |
93e28d66 SD |
1027 | uint64_t log_txg = zfs_strtonum(za.za_name, NULL); |
1028 | spa_log_sm_t *sls = | |
1029 | spa_log_sm_alloc(za.za_first_integer, log_txg); | |
1030 | avl_add(&spa->spa_sm_logs_by_txg, sls); | |
1031 | } | |
1032 | zap_cursor_fini(&zc); | |
1ba4f3e7 SD |
1033 | if (error != ENOENT) { |
1034 | spa_load_failed(spa, "spa_ld_log_sm_metadata(): failed at " | |
1035 | "zap_cursor_retrieve(spacemap_zap) [error %d]", | |
1036 | error); | |
1037 | return (error); | |
1038 | } | |
93e28d66 SD |
1039 | |
1040 | for (metaslab_t *m = avl_first(&spa->spa_metaslabs_by_flushed); | |
1041 | m; m = AVL_NEXT(&spa->spa_metaslabs_by_flushed, m)) { | |
1042 | spa_log_sm_t target = { .sls_txg = metaslab_unflushed_txg(m) }; | |
1043 | spa_log_sm_t *sls = avl_find(&spa->spa_sm_logs_by_txg, | |
1044 | &target, NULL); | |
1ba4f3e7 SD |
1045 | |
1046 | /* | |
1047 | * At this point if sls is NULL it means that a bug occurred | |
1048 | * in ZFS the last time the pool was open or earlier in the | |
1049 | * import code path. In general, we would have placed a | |
1050 | * VERIFY() here or in this case just let the kernel panic | |
1051 | * with a NULL pointer dereference when incrementing sls_mscount, | |
1052 | * but since this is the import code path we can be a bit more | |
1053 | * lenient. Thus, for DEBUG bits we always cause a panic, while | |
1054 | * in production we log the error and just fail the import. | |
1055 | */ | |
1056 | ASSERT(sls != NULL); | |
1057 | if (sls == NULL) { | |
1058 | spa_load_failed(spa, "spa_ld_log_sm_metadata(): bug " | |
1059 | "encountered: could not find log spacemap for " | |
5dbf6c5a AZ |
1060 | "TXG %llu [error %d]", |
1061 | (u_longlong_t)metaslab_unflushed_txg(m), ENOENT); | |
1ba4f3e7 SD |
1062 | return (ENOENT); |
1063 | } | |
93e28d66 SD |
1064 | sls->sls_mscount++; |
1065 | } | |
1066 | ||
1067 | return (0); | |
1068 | } | |
1069 | ||
1070 | typedef struct spa_ld_log_sm_arg { | |
1071 | spa_t *slls_spa; | |
1072 | uint64_t slls_txg; | |
1073 | } spa_ld_log_sm_arg_t; | |
1074 | ||
1075 | static int | |
1076 | spa_ld_log_sm_cb(space_map_entry_t *sme, void *arg) | |
1077 | { | |
1078 | uint64_t offset = sme->sme_offset; | |
1079 | uint64_t size = sme->sme_run; | |
1080 | uint32_t vdev_id = sme->sme_vdev; | |
1081 | ||
1082 | spa_ld_log_sm_arg_t *slls = arg; | |
1083 | spa_t *spa = slls->slls_spa; | |
1084 | ||
1085 | vdev_t *vd = vdev_lookup_top(spa, vdev_id); | |
1086 | ||
1087 | /* | |
1088 | * If the vdev has been removed (i.e. it is indirect or a hole) | |
1089 | * skip this entry. The contents of this vdev have already moved | |
1090 | * elsewhere. | |
1091 | */ | |
1092 | if (!vdev_is_concrete(vd)) | |
1093 | return (0); | |
1094 | ||
1095 | metaslab_t *ms = vd->vdev_ms[offset >> vd->vdev_ms_shift]; | |
1096 | ASSERT(!ms->ms_loaded); | |
1097 | ||
1098 | /* | |
1099 | * If we have already flushed entries for this TXG to this | |
1100 | * metaslab's space map, then ignore it. Note that we flush | |
1101 | * before processing any allocations/frees for that TXG, so | |
1102 | * the metaslab's space map only has entries from *before* | |
1103 | * the unflushed TXG. | |
1104 | */ | |
1105 | if (slls->slls_txg < metaslab_unflushed_txg(ms)) | |
1106 | return (0); | |
1107 | ||
1108 | switch (sme->sme_type) { | |
1109 | case SM_ALLOC: | |
1110 | range_tree_remove_xor_add_segment(offset, offset + size, | |
1111 | ms->ms_unflushed_frees, ms->ms_unflushed_allocs); | |
1112 | break; | |
1113 | case SM_FREE: | |
1114 | range_tree_remove_xor_add_segment(offset, offset + size, | |
1115 | ms->ms_unflushed_allocs, ms->ms_unflushed_frees); | |
1116 | break; | |
1117 | default: | |
1118 | panic("invalid maptype_t"); | |
1119 | break; | |
1120 | } | |
600a02b8 AM |
1121 | if (!metaslab_unflushed_dirty(ms)) { |
1122 | metaslab_set_unflushed_dirty(ms, B_TRUE); | |
1123 | spa_log_summary_dirty_flushed_metaslab(spa, | |
1124 | metaslab_unflushed_txg(ms)); | |
1125 | } | |
93e28d66 SD |
1126 | return (0); |
1127 | } | |
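spa_ld_log_sm_cb() uses range_tree_remove_xor_add_segment() so that, as we read these call sites, an ALLOC cancels an earlier unflushed FREE of the same range (and vice versa), while the non-overlapping part is recorded as a new unflushed change. The sketch below models that behavior with one boolean per allocation unit instead of a range tree; NUNITS and the bitmap representation are illustrative assumptions, not how range trees work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	NUNITS	64	/* hypothetical: offsets are small unit indices */

/*
 * Units of [start, end) already present in removefrom are cancelled
 * (removed from it); all other units are added to addto.
 */
static void
sketch_remove_xor_add(uint64_t start, uint64_t end, bool *removefrom,
    bool *addto)
{
	assert(start <= end && end <= NUNITS);
	for (uint64_t u = start; u < end; u++) {
		if (removefrom[u])
			removefrom[u] = false;
		else
			addto[u] = true;
	}
}
```

A FREE of units [0, 4) followed by an ALLOC of [2, 6) leaves [0, 2) as unflushed frees and [4, 6) as unflushed allocations; the overlap [2, 4) nets out.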
1128 | ||
1129 | static int | |
1130 | spa_ld_log_sm_data(spa_t *spa) | |
1131 | { | |
600a02b8 | 1132 | spa_log_sm_t *sls, *psls; |
93e28d66 SD |
1133 | int error = 0; |
1134 | ||
1135 | /* | |
1136 | * If we are not going to do any writes there is no need | |
1137 | * to read the log space maps. | |
1138 | */ | |
1139 | if (!spa_writeable(spa)) | |
1140 | return (0); | |
1141 | ||
1142 | ASSERT0(spa->spa_unflushed_stats.sus_nblocks); | |
1143 | ASSERT0(spa->spa_unflushed_stats.sus_memused); | |
1144 | ||
1145 | hrtime_t read_logs_starttime = gethrtime(); | |
600a02b8 AM |
1146 | |
1147 | /* Prefetch log spacemap dnodes. */ | |
1148 | for (sls = avl_first(&spa->spa_sm_logs_by_txg); sls; | |
1149 | sls = AVL_NEXT(&spa->spa_sm_logs_by_txg, sls)) { | |
6c94e649 AM |
1150 | dmu_prefetch_dnode(spa_meta_objset(spa), sls->sls_sm_obj, |
1151 | ZIO_PRIORITY_SYNC_READ); | |
600a02b8 AM |
1152 | } |
1153 | ||
1154 | uint_t pn = 0; | |
1155 | uint64_t ps = 0; | |
687e4d7f | 1156 | uint64_t nsm = 0; |
600a02b8 AM |
1157 | psls = sls = avl_first(&spa->spa_sm_logs_by_txg); |
1158 | while (sls != NULL) { | |
1159 | /* Prefetch 2-16 log spacemaps ahead, within 2 * dmu_prefetch_max bytes. */ | |
1160 | if (psls != NULL && pn < 16 && | |
1161 | (pn < 2 || ps < 2 * dmu_prefetch_max)) { | |
1162 | error = space_map_open(&psls->sls_sm, | |
1163 | spa_meta_objset(spa), psls->sls_sm_obj, 0, | |
1164 | UINT64_MAX, SPA_MINBLOCKSHIFT); | |
1165 | if (error != 0) { | |
1166 | spa_load_failed(spa, "spa_ld_log_sm_data(): " | |
1167 | "failed at space_map_open(obj=%llu) " | |
1168 | "[error %d]", | |
1169 | (u_longlong_t)psls->sls_sm_obj, error); | |
1170 | goto out; | |
1171 | } | |
1172 | dmu_prefetch(spa_meta_objset(spa), psls->sls_sm_obj, | |
1173 | 0, 0, space_map_length(psls->sls_sm), | |
1174 | ZIO_PRIORITY_ASYNC_READ); | |
1175 | pn++; | |
1176 | ps += space_map_length(psls->sls_sm); | |
1177 | psls = AVL_NEXT(&spa->spa_sm_logs_by_txg, psls); | |
1178 | continue; | |
93e28d66 SD |
1179 | } |
1180 | ||
600a02b8 | 1181 | /* Load TXG log spacemap into ms_unflushed_allocs/frees. */ |
0e4c830b | 1182 | kpreempt(KPREEMPT_SYNC); |
600a02b8 AM |
1183 | ASSERT0(sls->sls_nblocks); |
1184 | sls->sls_nblocks = space_map_nblocks(sls->sls_sm); | |
1185 | spa->spa_unflushed_stats.sus_nblocks += sls->sls_nblocks; | |
1186 | summary_add_data(spa, sls->sls_txg, | |
1187 | sls->sls_mscount, 0, sls->sls_nblocks); | |
1188 | ||
687e4d7f DB |
1189 | spa_import_progress_set_notes_nolog(spa, |
1190 | "Read %llu of %lu log space maps", (u_longlong_t)nsm, | |
1191 | avl_numnodes(&spa->spa_sm_logs_by_txg)); | |
1192 | ||
93e28d66 SD |
1193 | struct spa_ld_log_sm_arg vla = { |
1194 | .slls_spa = spa, | |
1195 | .slls_txg = sls->sls_txg | |
1196 | }; | |
600a02b8 AM |
1197 | error = space_map_iterate(sls->sls_sm, |
1198 | space_map_length(sls->sls_sm), spa_ld_log_sm_cb, &vla); | |
93e28d66 | 1199 | if (error != 0) { |
93e28d66 SD |
1200 | spa_load_failed(spa, "spa_ld_log_sm_data(): failed " |
1201 | "at space_map_iterate(obj=%llu) [error %d]", | |
1202 | (u_longlong_t)sls->sls_sm_obj, error); | |
1203 | goto out; | |
1204 | } | |
1205 | ||
600a02b8 AM |
1206 | pn--; |
1207 | ps -= space_map_length(sls->sls_sm); | |
687e4d7f | 1208 | nsm++; |
600a02b8 AM |
1209 | space_map_close(sls->sls_sm); |
1210 | sls->sls_sm = NULL; | |
1211 | sls = AVL_NEXT(&spa->spa_sm_logs_by_txg, sls); | |
93e28d66 | 1212 | |
600a02b8 AM |
1213 | /* Update log block limits considering just loaded. */ |
1214 | spa_log_sm_set_blocklimit(spa); | |
93e28d66 | 1215 | } |
600a02b8 | 1216 | |
93e28d66 SD |
1217 | hrtime_t read_logs_endtime = gethrtime(); |
1218 | spa_load_note(spa, | |
687e4d7f DB |
1219 | "Read %lu log space maps (%llu total blocks - blksz = %llu bytes) " |
1220 | "in %lld ms", avl_numnodes(&spa->spa_sm_logs_by_txg), | |
93e28d66 SD |
1221 | (u_longlong_t)spa_log_sm_nblocks(spa), |
1222 | (u_longlong_t)zfs_log_sm_blksz, | |
687e4d7f | 1223 | (longlong_t)NSEC2MSEC(read_logs_endtime - read_logs_starttime)); |
93e28d66 SD |
1224 | |
1225 | out: | |
600a02b8 AM |
1226 | if (error != 0) { |
1227 | for (spa_log_sm_t *sls = avl_first(&spa->spa_sm_logs_by_txg); | |
1228 | sls; sls = AVL_NEXT(&spa->spa_sm_logs_by_txg, sls)) { | |
1229 | if (sls->sls_sm) { | |
1230 | space_map_close(sls->sls_sm); | |
1231 | sls->sls_sm = NULL; | |
1232 | } | |
1233 | } | |
1234 | } else { | |
1235 | ASSERT0(pn); | |
1236 | ASSERT0(ps); | |
1237 | } | |
93e28d66 SD |
1238 | /* |
1239 | * Now that the metaslabs contain their unflushed changes: | |
1240 | * [1] recalculate their actual allocated space | |
1241 | * [2] recalculate their weights | |
1242 | * [3] sum up the memory usage of their unflushed range trees | |
1243 | * [4] optionally load them, if debug_load is set | |
1244 | * | |
1245 | * Note that even in the case where we get here because of an | |
1246 | * error (i.e. error != 0), we still want to update the fields | |
1247 | * below in order to have a proper teardown in spa_unload(). | |
1248 | */ | |
1249 | for (metaslab_t *m = avl_first(&spa->spa_metaslabs_by_flushed); | |
1250 | m != NULL; m = AVL_NEXT(&spa->spa_metaslabs_by_flushed, m)) { | |
1251 | mutex_enter(&m->ms_lock); | |
1252 | m->ms_allocated_space = space_map_allocated(m->ms_sm) + | |
1253 | range_tree_space(m->ms_unflushed_allocs) - | |
1254 | range_tree_space(m->ms_unflushed_frees); | |
1255 | ||
1256 | vdev_t *vd = m->ms_group->mg_vd; | |
1257 | metaslab_space_update(vd, m->ms_group->mg_class, | |
1258 | range_tree_space(m->ms_unflushed_allocs), 0, 0); | |
1259 | metaslab_space_update(vd, m->ms_group->mg_class, | |
1260 | -range_tree_space(m->ms_unflushed_frees), 0, 0); | |
1261 | ||
1262 | ASSERT0(m->ms_weight & METASLAB_ACTIVE_MASK); | |
1263 | metaslab_recalculate_weight_and_sort(m); | |
1264 | ||
1265 | spa->spa_unflushed_stats.sus_memused += | |
1266 | metaslab_unflushed_changes_memused(m); | |
1267 | ||
1268 | if (metaslab_debug_load && m->ms_sm != NULL) { | |
1269 | VERIFY0(metaslab_load(m)); | |
f09fda50 | 1270 | metaslab_set_selected_txg(m, 0); |
93e28d66 SD |
1271 | } |
1272 | mutex_exit(&m->ms_lock); | |
1273 | } | |
1274 | ||
1275 | return (error); | |
1276 | } | |
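The loading loop in spa_ld_log_sm_data() keeps a bounded read-ahead window: it opens and prefetches log space maps ahead of the one being processed, allowing at most 16 in flight, always at least 2, and otherwise stopping once 2 * dmu_prefetch_max bytes are outstanding. The sketch below isolates that window logic over an array of map lengths and reports the peak number of outstanding prefetches; it performs no I/O, and the array of lengths is a hypothetical stand-in for space_map_length().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static unsigned
sketch_prefetch_window(const uint64_t *len, size_t n, uint64_t prefetch_max)
{
	size_t p = 0, s = 0;	/* next map to prefetch, next to process */
	unsigned pn = 0, peak = 0;
	uint64_t ps = 0;	/* bytes prefetched but not yet processed */

	while (s < n) {
		if (p < n && pn < 16 && (pn < 2 || ps < 2 * prefetch_max)) {
			/* "Open" and prefetch map p, then look again. */
			pn++;
			ps += len[p++];
			if (pn > peak)
				peak = pn;
			continue;
		}
		/* Process map s, retiring its prefetch. */
		assert(pn > 0);
		pn--;
		ps -= len[s++];
	}
	assert(pn == 0 && ps == 0);
	return (peak);
}
```

Small maps let the window fill to its 16-map cap, while maps that each reach the byte budget keep it at the 2-map minimum.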
1277 | ||
1278 | static int | |
1279 | spa_ld_unflushed_txgs(vdev_t *vd) | |
1280 | { | |
1281 | spa_t *spa = vd->vdev_spa; | |
1282 | objset_t *mos = spa_meta_objset(spa); | |
1283 | ||
1284 | if (vd->vdev_top_zap == 0) | |
1285 | return (0); | |
1286 | ||
1287 | uint64_t object = 0; | |
1288 | int error = zap_lookup(mos, vd->vdev_top_zap, | |
1289 | VDEV_TOP_ZAP_MS_UNFLUSHED_PHYS_TXGS, | |
1290 | sizeof (uint64_t), 1, &object); | |
1291 | if (error == ENOENT) | |
1292 | return (0); | |
1293 | else if (error != 0) { | |
1294 | spa_load_failed(spa, "spa_ld_unflushed_txgs(): failed at " | |
1295 | "zap_lookup(vdev_top_zap=%llu) [error %d]", | |
1296 | (u_longlong_t)vd->vdev_top_zap, error); | |
1297 | return (error); | |
1298 | } | |
1299 | ||
1300 | for (uint64_t m = 0; m < vd->vdev_ms_count; m++) { | |
1301 | metaslab_t *ms = vd->vdev_ms[m]; | |
1302 | ASSERT(ms != NULL); | |
1303 | ||
1304 | metaslab_unflushed_phys_t entry; | |
1305 | uint64_t entry_size = sizeof (entry); | |
1306 | uint64_t entry_offset = ms->ms_id * entry_size; | |
1307 | ||
1308 | error = dmu_read(mos, object, | |
1309 | entry_offset, entry_size, &entry, 0); | |
1310 | if (error != 0) { | |
1311 | spa_load_failed(spa, "spa_ld_unflushed_txgs(): " | |
1312 | "failed at dmu_read(obj=%llu) [error %d]", | |
1313 | (u_longlong_t)object, error); | |
1314 | return (error); | |
1315 | } | |
1316 | ||
1317 | ms->ms_unflushed_txg = entry.msp_unflushed_txg; | |
600a02b8 AM |
1318 | ms->ms_unflushed_dirty = B_FALSE; |
1319 | ASSERT(range_tree_is_empty(ms->ms_unflushed_allocs)); | |
1320 | ASSERT(range_tree_is_empty(ms->ms_unflushed_frees)); | |
93e28d66 SD |
1321 | if (ms->ms_unflushed_txg != 0) { |
1322 | mutex_enter(&spa->spa_flushed_ms_lock); | |
1323 | avl_add(&spa->spa_metaslabs_by_flushed, ms); | |
1324 | mutex_exit(&spa->spa_flushed_ms_lock); | |
1325 | } | |
1326 | } | |
1327 | return (0); | |
1328 | } | |
1329 | ||
1330 | /* | |
1331 | * Read all the log space map entries into their respective | |
1332 | * metaslab unflushed trees and keep them sorted by TXG in the | |
1333 | * SPA's metadata. In addition, set up all the metadata for the | |
1334 | * memory and the block heuristics. | |
1335 | */ | |
1336 | int | |
1337 | spa_ld_log_spacemaps(spa_t *spa) | |
1338 | { | |
1339 | int error; | |
1340 | ||
1341 | spa_log_sm_set_blocklimit(spa); | |
1342 | ||
1343 | for (uint64_t c = 0; c < spa->spa_root_vdev->vdev_children; c++) { | |
1344 | vdev_t *vd = spa->spa_root_vdev->vdev_child[c]; | |
1345 | error = spa_ld_unflushed_txgs(vd); | |
1346 | if (error != 0) | |
1347 | return (error); | |
1348 | } | |
1349 | ||
1350 | error = spa_ld_log_sm_metadata(spa); | |
1351 | if (error != 0) | |
1352 | return (error); | |
1353 | ||
1354 | /* | |
1355 | * Note: we don't actually expect anything to change at this point | |
1356 | * but we grab the config lock so we don't fail any assertions | |
1357 | * when using vdev_lookup_top(). | |
1358 | */ | |
1359 | spa_config_enter(spa, SCL_CONFIG, FTAG, RW_READER); | |
1360 | error = spa_ld_log_sm_data(spa); | |
1361 | spa_config_exit(spa, SCL_CONFIG, FTAG); | |
1362 | ||
1363 | return (error); | |
1364 | } | |
1365 | ||
93e28d66 | 1366 | /* BEGIN CSTYLED */ |
ab8d9c17 | 1367 | ZFS_MODULE_PARAM(zfs, zfs_, unflushed_max_mem_amt, U64, ZMOD_RW, |
7ada752a AZ |
1368 | "Hard limit on the amount of memory that ZFS allows to be used for " | |
1369 | "unflushed changes"); | |
93e28d66 | 1370 | |
ab8d9c17 | 1371 | ZFS_MODULE_PARAM(zfs, zfs_, unflushed_max_mem_ppm, U64, ZMOD_RW, |
7ada752a AZ |
1372 | "Percentage of the overall system memory that ZFS allows to be " |
1373 | "used for unflushed changes (value is calculated over 1000000 for " | |
1374 | "finer granularity)"); | |
93e28d66 | 1375 | |
ab8d9c17 | 1376 | ZFS_MODULE_PARAM(zfs, zfs_, unflushed_log_block_max, U64, ZMOD_RW, |
7ada752a AZ |
1377 | "Hard limit (upper bound) on the size of the space map log " | |
1378 | "in terms of blocks."); | |
93e28d66 | 1379 | |
ab8d9c17 | 1380 | ZFS_MODULE_PARAM(zfs, zfs_, unflushed_log_block_min, U64, ZMOD_RW, |
7ada752a AZ |
1381 | "Lower-bound limit for the maximum number of blocks allowed in " | |
1382 | "the log spacemap (see zfs_unflushed_log_block_max)"); | |
93e28d66 | 1383 | |
ab8d9c17 | 1384 | ZFS_MODULE_PARAM(zfs, zfs_, unflushed_log_txg_max, U64, ZMOD_RW, |
600a02b8 AM |
1385 | "Hard limit (upper bound) on the size of the space map log " | |
1386 | "in terms of dirty TXGs."); | |
1387 | ||
ab8d9c17 | 1388 | ZFS_MODULE_PARAM(zfs, zfs_, unflushed_log_block_pct, UINT, ZMOD_RW, |
7ada752a AZ |
1389 | "Tunable used to determine the number of blocks that can be used for " |
1390 | "the spacemap log, expressed as a percentage of the total number of " | |
1391 | "metaslabs in the pool (e.g. 400 means the number of log blocks is " | |
1392 | "capped at 4 times the number of metaslabs)"); | |
93e28d66 | 1393 | |
ab8d9c17 | 1394 | ZFS_MODULE_PARAM(zfs, zfs_, max_log_walking, U64, ZMOD_RW, |
7ada752a AZ |
1395 | "The number of past TXGs that the flushing algorithm of the log " |
1396 | "spacemap feature uses to estimate incoming log blocks"); | |
1397 | ||
1398 | ZFS_MODULE_PARAM(zfs, zfs_, keep_log_spacemaps_at_export, INT, ZMOD_RW, | |
1399 | "Prevent the log spacemaps from being flushed and destroyed " | |
1400 | "during pool export/destroy"); | |
1401 | /* END CSTYLED */ | |
93e28d66 | 1402 | |
ab8d9c17 | 1403 | ZFS_MODULE_PARAM(zfs, zfs_, max_logsm_summary_length, U64, ZMOD_RW, |
7ada752a | 1404 | "Maximum number of rows allowed in the summary of the spacemap log"); |
93e28d66 | 1405 | |
ab8d9c17 | 1406 | ZFS_MODULE_PARAM(zfs, zfs_, min_metaslabs_to_flush, U64, ZMOD_RW, |
7ada752a | 1407 | "Minimum number of metaslabs to flush per dirty TXG"); |