Kent Overstreet [Mon, 15 Mar 2021 21:26:19 +0000 (17:26 -0400)]
bcachefs: Kill reflink option
An option was added to control whether reflink support was on or off
because for a long time, reflink + inline data extent support was
missing - but that's since been fixed, so we can drop the option now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 15 Mar 2021 01:30:08 +0000 (21:30 -0400)]
bcachefs: Fix read retry path for indirect extents
In the read path, for retry of indirect extents to work we need to
differentiate between the location in the btree the read was for, vs.
the location where we found the data. This patch adds that plumbing to
bch_read_bio.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 8 Mar 2021 22:09:13 +0000 (17:09 -0500)]
bcachefs: Fix locking in bch2_btree_iter_traverse_cached()
bch2_btree_iter_traverse() is supposed to ensure we have the correct
type of lock - it was downgrading if necessary, but if we entered with a
read lock it wasn't upgrading to an intent lock, oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 21 Feb 2021 01:51:57 +0000 (20:51 -0500)]
bcachefs: Improve handling of extents in bch2_trans_update()
The transaction update/commit path cares about whether it's inserting
extents or regular keys; extents require extra passes (handling of
overlapping extents) but sometimes we want to skip all that. This
clarifies things by adding a new member to btree_insert_entry specifying
whether the key being inserted is an extent, instead of overloading
BTREE_ITER_IS_EXTENTS.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 20 Feb 2021 04:41:40 +0000 (23:41 -0500)]
bcachefs: KEY_TYPE_discard is no longer used
KEY_TYPE_discard used to be used for extent whiteouts, but when handling
over overlapping extents was lifted above the core btree code it became
unused. This patch updates various code to reflect that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 20 Feb 2021 05:00:23 +0000 (00:00 -0500)]
bcachefs: Kill support for !BTREE_NODE_NEW_EXTENT_OVERWRITE()
bcachefs has been aggressively migrating filesystems and btree nodes to
the new format for quite some time - this shouldn't affect anyone
anymore, and lets us delete a _lot_ of code. Also, it frees up
KEY_TYPE_discard for a new whiteout key type for snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 28 Dec 2021 03:11:54 +0000 (22:11 -0500)]
bcachefs: Fix bch2_btree_cache_scan()
It was counting nodes on the freed list that it skips - because we want
to leave a few so that btree splits don't touch the allocator - as nodes
that it touched, meaning that if it was called with <= 3 nodes to
reclaim, and those nodes were on the freed list, it would never do any
work.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 18 Apr 2021 22:01:49 +0000 (18:01 -0400)]
bcachefs: Fix for copygc getting stuck waiting for reserve to be filled
This fixes a regression from the patch
bcachefs: Fix copygc dying on startup
In general only the allocator thread itself should be updating
ca->allocator_state, the thread waking up the allocator setting it is an
ugly hack only needed to avoid racing with the copygc threads when we're
first starting up.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 13 Apr 2021 13:49:23 +0000 (09:49 -0400)]
bcachefs: Fix copygc threshold
Awhile back the meaning of is_available_bucket() and thus also
bch_dev_usage->buckets_unavailable changed to include buckets that are
owned by the allocator - this was so that the stat could be persisted
like other allocation information, and wouldn't have to be regenerated
by walking each bucket at mount time.
This broke copygc, which needs to consider buckets that are reclaimable
and haven't yet been grabbed by the allocator thread and moved onta
freelist. This patch fixes that by adding dev_buckets_reclaimable() for
copygc and the allocator thread, and cleans up some of the callers a bit.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 16 Apr 2021 22:59:54 +0000 (18:59 -0400)]
bcachefs: Don't drop ptrs to btree nodes
If a ptr gen doesn't match the bucket gen, the bucket likely doesn't
contain the data we want - but it's still possible the data we want
might have been overwritten, and for btree node pointers we can verify
whether or not the node is the one we wanted with the node's sequence
number, so it's better to keep the pointer and try reading from it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 16 Apr 2021 20:54:11 +0000 (16:54 -0400)]
bcachefs: Bring back metadata only gc
This is useful for the filesystem dump debugging tool - when we're
hitting bugs we want to skip as much of the recovery process as
possible, and the dump tool only needs to know where metadata lives.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 3 Apr 2021 03:41:10 +0000 (23:41 -0400)]
bcachefs: Don't fail mounts due to devices that are marked as failed
If a given set of replicas is entirely on failed devices, don't fail the
mount: we will still fail the mount if we have some copies on non failed
devices.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 21 Mar 2021 03:55:36 +0000 (23:55 -0400)]
bcachefs: Fix bkey format generation for 32 bit fields
Having a packed format that can represent a field larger than the
unpacked type breaks bkey_packed_successor() assertions - we need to fix this to start using the snapshot filed.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 22 Mar 2021 22:39:16 +0000 (18:39 -0400)]
bcachefs: Scan for old btree nodes if necessary on mount
We dropped support for !BTREE_NODE_NEW_EXTENT_OVERWRITE but it turned
out there were people who still had filesystems with btree nodes in that
format in the wild. This adds a new compat feature that indicates we've
scanned for and rewritten nodes in the old format, and does that scan at
mount time if the option isn't set.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 5 Mar 2021 23:00:55 +0000 (18:00 -0500)]
bcachefs: Create allocator threads when allocating filesystem
We're seeing failures to mount because of a failure to start the
allocator threads, which currently happens fairly late in the mount
process, after walking all metadata, and kthread_create() fails if
something has tried to kill the mount process, which is probably not
what we want.
This patch avoids this issue by creating, but not starting, the
allocator threads when we preallocate all of our other in memory data
structures.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 24 Feb 2021 02:41:25 +0000 (21:41 -0500)]
bcachefs: Fix for bch2_btree_node_get_noiter() returning -ENOMEM
bch2_btree_node_get_noiter() isn't used from the btree iterator code,
which retries with the btree node cache cannibalize lock held on
-ENOMEM, so we should do it ourself if necessary.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 10 Feb 2021 21:13:57 +0000 (16:13 -0500)]
bcachefs: Extents may now cross btree node boundaries
When snapshots arrive, we won't necessarily be able to arbitrarily split
existis - when we need to split an existing extent, we'll have to check
if the extent was overwritten in child snapshots and if so emit a
whiteout for the split in the child snapshot.
Because extents couldn't span btree nodes previously, journal replay
would sometimes have to split existing extents. That's no good anymore,
but fortunately since extent handling has already been lifted above most
of the btree code there's no real need for that rule anymore.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 12 Feb 2021 02:57:32 +0000 (21:57 -0500)]
bcachefs: iter->real_pos
We need to differentiate between the search position of a btree
iterator, vs. what it actually points at (what we found). This matters
for extents, where iter->pos will typically be the start of the key we
found and iter->real_pos will be the end of the key we found (which soon
won't necessarily be in the same btree node!) and it will also matter
for snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 10 Mar 2021 00:37:40 +0000 (19:37 -0500)]
bcachefs: Ensure btree iterators are traversed in bch2_trans_commit()
The upcoming patch to allow extents to span btree nodes will require
this... and this assertion seems to be popping, and it's not a very good
assertion anyways.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Fix unnecessary read amplificaiton when allocating ec stripes
When allocating an erasure coding stripe, bcachefs will always reuse any
partial stripes before reserving a new stripe. This causes unnecessary
read amplification when preparing a stripe for writing. This patch changes
bcachefs to always reserve new stripes first, only relying on stripe reuse
when copygc needs more time to empty buckets from existing stripes.
Signed-off-by: Robbie Litchfield <blam.kiwi@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The only reason we were keeping this around was for
BTREE_INSERT_NOUNLOCK semantics - if bch2_btree_iter_set_pos() advances
to the next leaf node, it'll drop the lock on the node that we just
inserted to.
But we don't rely on BTREE_INSERT_NOUNLOCK semantics for the extents
btree, just the inodes btree, and if we do need it for the extents btree
in the future we can do it more cleanly by cloning the iterator - this
lets us delete some special cases in the btree iterator code, which is
complicated enough as it is.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 7 Feb 2021 04:17:26 +0000 (23:17 -0500)]
bcachefs: Redo checks for sufficient devices
When the replicas mechanism was added, for tracking data by which drives
it's replicated on, the check for whether we have sufficient devices was
never updated to make use of it. This patch finally does that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 3 Feb 2021 20:31:17 +0000 (15:31 -0500)]
bcachefs: Run fsck if BCH_FEATURE_alloc_v2 isn't set
We're using BCH_FEATURE_alloc_v2 to also gate journalling updates to dev
usage - we don't have the code for reconstructing this from buckets
anymore, so we need to run fsck if it's not set.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 3 Feb 2021 18:10:55 +0000 (13:10 -0500)]
bcachefs: Fixes/improvements for journal entry reservations
This fixes some arithmetic bugs in "bcachefs: Journal updates to dev
usage" - additionally, it cleans things up by switching everything that
goes in every journal entry to the journal_entry_res mechanism.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 21 Jan 2021 20:28:59 +0000 (15:28 -0500)]
bcachefs: Persist 64 bit io clocks
Originally, bcachefs - going back to bcache - stored, for each bucket, a
16 bit counter corresponding to how long it had been since the bucket
was read from. But, this required periodically rescaling counters on
every bucket to avoid wraparound. That wasn't an issue in bcache, where
we'd perodically rewrite the per bucket metadata all at once, but in
bcachefs we're trying to avoid having to walk every single bucket.
This patch switches to persisting 64 bit io clocks, corresponding to the
64 bit bucket timestaps introduced in the previous patch with
KEY_TYPE_alloc_v2.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 22 Jan 2021 23:01:07 +0000 (18:01 -0500)]
bcachefs: KEY_TYPE_alloc_v2
This introduces a new version of KEY_TYPE_alloc, which uses the new
varint encoding introduced for inodes. This means we'll eventually be
able to support much larger bucket sizes (for SMR devices), and the
read/write time fields are expanded to 64 bits - which will be used in
the next patch to get rid of the periodic rescaling of those fields.
Also, for buckets that are members of erasure coded stripes, this adds
persistent fields for the index of the stripe they're members of and the
stripe redundancy. This is part of work to get rid of having to scan and
read into memory the alloc and stripes btrees at mount time.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 2 Feb 2021 20:56:44 +0000 (15:56 -0500)]
bcachefs: Add missing call to bch2_replicas_entry_sort()
This fixes a bug introduced by "bcachefs: Improve diagnostics when
journal entries are missing" - devices in a replicas entry are supposed
to be sorted.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 27 Jan 2021 01:59:00 +0000 (20:59 -0500)]
bcachefs: Add (partial) support for fixing btree topology
When we walk the btrees during recovery, part of that is checking that
btree topology is correct: for every interior btree node, its child
nodes should exactly span the range the parent node covers.
Previously, we had checks for this, but not repair code. Now that we
have the ability to do btree updates during initial GC, this patch adds
that repair code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 27 Jan 2021 01:15:46 +0000 (20:15 -0500)]
bcachefs: Add support for doing btree updates prior to journal replay
Some errors may need to be fixed in order for GC to successfully run -
walk and mark all metadata. But we can't start the allocators and do
normal btree updates until after GC has completed, and allocation
information is known to be consistent, so we need a different method of
doing btree updates.
Fortunately, we already have code for walking the btree while overlaying
keys from the journal to be replayed. This patch adds an update path
that adds keys to the list of keys to be replayed by journal replay, and
also fixes up iterators.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 27 Jan 2021 01:13:54 +0000 (20:13 -0500)]
bcachefs: Add BTREE_PTR_RANGE_UPDATED
This is so that when we discover btree topology issues, we can just
update the pointer to a btree node and signal btree read path that the
min/max keys in the node header should be updated from the node pointer.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 26 Jan 2021 21:04:38 +0000 (16:04 -0500)]
bcachefs: Refactor checking of btree topology
Still a lot of work to be done here: we can't yet repair btree topology
issues, but this patch refactors things so that we have better access to
what we need in the topology checks. Next up will be figuring out a way
to do btree updates during gc, before journal replay is done.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 26 Jan 2021 21:04:12 +0000 (16:04 -0500)]
bcachefs: Improve diagnostics when journal entries are missing
There's an outstanding bug with journal entries being missing in journal
replay. This patch adds code to print out where the journal entries were
physically located that were around the entry(ies) being missing, which
should make debugging easier.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 22 Jan 2021 22:56:34 +0000 (17:56 -0500)]
bcachefs: Mark superblocks transactionally
More work towards getting rid of the in memory struct bucket: this path
adds code for marking superblock and journal buckets via the btree, and
uses it in the device add and journal resize paths.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 22 Jan 2021 23:19:15 +0000 (18:19 -0500)]
bcachefs: Kill bch2_invalidate_bucket()
This patch is working towards eventually getting rid of the in memory
struct bucket, and relying only on the btree representation.
Since bch2_invalidate_bucket() was only used for incrementing gens, not
invalidating cached data, no other counters were being changed as a side
effect - meaning it's safe for the allocator code to increment the
bucket gen directly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 22 Jan 2021 00:14:37 +0000 (19:14 -0500)]
bcachefs: Switch replicas.c allocations to GFP_KERNEL
We're transitioning to memalloc_nofs_save/restore instead of GFP flags
with the rest of the kernel, and GFP_NOIO was excessively strict and
causing unnnecessary allocation failures - these allocations are done
with btree locks dropped.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 21 Jan 2021 19:42:23 +0000 (14:42 -0500)]
bcachefs: Fix loopback in dio mode
We had a deadlock on page_lock, because buffered reads signal completion
by unlocking the page, but the dio read path normally dirties the pages
it's reading to with set_page_dirty_lock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 21 Jan 2021 00:42:09 +0000 (19:42 -0500)]
bcachefs: Clean up bch2_extent_can_insert
It was using an internal btree node iterator interface, when
bch2_btree_iter_peek_slot() sufficed. We were hitting a null ptr deref
that looked like it was from the iterator not being uptodate - this will
also fix that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 19 Jan 2021 01:20:24 +0000 (20:20 -0500)]
bcachefs: Don't allocate stripes at POS_MIN
In the future, stripe index 0 will be a sentinal value. This patch
doesn't disallow stripes at POS_MIN yet, leaving that for when we do the
on disk format changes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 19 Jan 2021 04:26:42 +0000 (23:26 -0500)]
bcachefs: Rework allocating buckets for stripes
Allocating buckets for existing stripes was busted, in part because the
data structures were too contorted. This reworks new stripes so that we
have an array of open buckets that matches blocks in the stripe, and
it's sparse if we're reusing an existing stripe.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 17 Jan 2021 21:16:37 +0000 (16:16 -0500)]
bcachefs: Fix gc updating stripes info
The primary stripes radix tree can be sparse, which was causing an
assertion to pop because the one use for gc isn't. Fix this by changing
the algorithm to copy between the two radix trees.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>