Kent Overstreet [Sat, 2 Nov 2019 01:16:51 +0000 (21:16 -0400)]
bcachefs: Avoid atomics in write fast path
This adds some horrible hacks, but the atomic ops for closures were
getting to be a pretty expensive part of the write path. We don't want
to rip out closures entirely from the write path, because they're used
for e.g. waiting on the allocator, or waiting on the journal flush, and
that stuff would get really ugly without closures.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Justin Husted [Sat, 12 Oct 2019 00:05:11 +0000 (17:05 -0700)]
bcachefs: Further padding fixes in bch2_journal_super_entries_add_common()
The previous patch 128cb1a to fix uninitialized data was incorrect and
did not initialize the padding space correctly. Furthermore, several
other cases in this function do not initialize their padding space
correctly.
Move initialization into some helper functions in a more robust way.
Signed-off-by: Justin Husted <sigstop@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Justin Husted [Sat, 12 Oct 2019 00:20:30 +0000 (17:20 -0700)]
bcachefs: Initialize padding space after alloc bkey
Packed bkeys are padded up to 64 bit alignment, but the alloc bkey type
was not clearing the pad bytes after the last data byte. This left the
key possibly containing some random garbage at the end.
This problem was found using valgrind.
This patch also changes a path with the inode bkey to clear in the same
way.
Signed-off-by: Justin Husted <sigstop@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 25 Oct 2019 22:54:58 +0000 (18:54 -0400)]
bcachefs: Fix an error path race
On IO error, bch2_writepages_io_done() will set the page state to
indicate nothing's already reserved (since the write didn't happen, we
don't know what's already reserved). This can race with the buffered IO
path, in between getting a disk reservation and calling
bch2_set_page_dirty().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2019 04:22:03 +0000 (00:22 -0400)]
bcachefs: Limit bios in writepages path to 256M
This works around a bug where bio_full() doesn't check for
bio->bi_iter.bi_size overflowing - and, we don't really want to build
bios that are that big anyways.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 11 Oct 2019 18:45:22 +0000 (14:45 -0400)]
bcachefs: Fix a subtle race in the btree split path
We have to free the old (in memory) btree node _before_ unlocking the
new nodes - else, some other thread with a read lock on the old node
could see stale data after another thread has already updated the new
node.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 9 Oct 2019 16:11:00 +0000 (12:11 -0400)]
bcachefs: Split out bchfs_extent_update()
The next few patches are going to be more moving the logic around
i_size/i_sectors updates to io.c, and better separating the Linux VFS
specific code from core bcachefs code, to better support the fuse port.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 9 Oct 2019 13:44:36 +0000 (09:44 -0400)]
bcachefs: Check if extending inode differently
In bch2_extent_update(), we have to update the inode if i_size is
changing (the file is being extend) or if i_sectors is changing, but we
want to avoid touching the inode if it's not necessary.
Change sum_sector_overwrites() to also check if there's already data
above where we're writing to - this means we're definitely not extending
the file.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 9 Oct 2019 13:19:06 +0000 (09:19 -0400)]
bcachefs: Add a lock to bch_page_state
We can't use the page lock to protect it, because on writeback IO error
we need to access the page state before calling end_page_writeback() and
the page lock semantics are completely insane so that deadlocks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 7 Oct 2019 19:57:47 +0000 (15:57 -0400)]
bcachefs: Fix erasure coding disk space accounting
Disk space accounting for erasure coding + compression was completely
broken - we need to calculate the parity sectors delta the same way we
calculate disk_sectors, by calculating the old and new usage and
subtracting to get the difference.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 9 Oct 2019 02:56:33 +0000 (22:56 -0400)]
bcachefs: Fix ec_stripes_read()
The bkey_s_c returned by btree_iter_(peek|next) points into the btree
iter type, so advancing the iterator and then using the one previously
returned is a bug...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Justin Husted [Wed, 9 Oct 2019 02:17:06 +0000 (19:17 -0700)]
bcachefs: Initialize journal pad data in bch_replica_entry objects.
Running the filesystem under valgrind exposed some garbage data being
written to disk in bch2_journal_super_entries_add_common(), in the
portion which encodes bch_replica_entry objects.
Signed-off-by: Justin Husted <sigstop@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Justin Husted [Wed, 9 Oct 2019 02:16:28 +0000 (19:16 -0700)]
bcachefs: Fix uninitialized data in bch2_gc_btree()
Running the filesystem under valgrind exposed a path where the max_stale
variable in bch2_gc_btree() might not be initialized before use in a
rare case when there are no btree nodes in a transaction.
Signed-off-by: Justin Husted <sigstop@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 7 Oct 2019 19:09:30 +0000 (15:09 -0400)]
bcachefs: Fix incorrect use of bch2_extent_atomic_end()
bch2_extent_atomic_end counts the number of iterators requried for
marking overwrites - but journal replay never marks overwrites, so that
part was incorrect. And counting iterators for the key being inserted
should be unnecessary because we did that prior to the key being
inserted before it was first journalled.
This should fix an iterator overflow bug - the iterators for walking
overwrites were totally unneeded.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 4 Oct 2019 21:07:20 +0000 (17:07 -0400)]
bcachefs: bch2_extent_atomic_end() now traverses iter
This fixes a bug in io.c bch2_write_index_default() - it was missing the
traverse call, but bch2_extent_atomic_end returns an error now and can
just call it itself.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 2 Oct 2019 22:35:36 +0000 (18:35 -0400)]
bcachefs: Factor out fs-common.c
This refactoring makes the code easier to understand by separating the
bcachefs btree transactional code from the linux VFS code - but more
importantly, it's also to share code with the fuse port.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 4 Oct 2019 19:58:43 +0000 (15:58 -0400)]
bcachefs: Don't use sha256 for siphash str hash key
With the refactoring that's coming to add fuse support, we want
bch2_hash_info_init() to be cheaper so we don't have to rely on anything
cached besides the inode in the btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 27 Sep 2019 02:21:39 +0000 (22:21 -0400)]
bcachefs: Rework btree iterator lifetimes
The btree_trans struct needs to memoize/cache btree iterators, so that
on transaction restart we don't have to completely redo btree lookups,
and so that we can do them all at once in the correct order when the
transaction had to restart to avoid a deadlock.
This switches the btree iterator lookups to work based on iterator
position, instead of trying to match them up based on the stack trace.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 22 Sep 2019 21:48:25 +0000 (17:48 -0400)]
bcachefs: Count iterators for reflink_p overwrites correctly
In order to avoid trying to allocate too many btree iterators,
bch2_extent_atomic_end() needs to count how many iterators are going to
be needed for insertions and overwrites - but we weren't counting the
iterators for deleting a reflink_v when the refcount goes to 0.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 22 Sep 2019 19:02:05 +0000 (15:02 -0400)]
bcachefs: Handle bio_iov_iter_get_pages() returning unaligned bio
If the user buffer isn't aligned to the filesystem block size, on a
large enough IO - where it won't fit into a single bio -
bio_iov_iter_get_pages() won't necessarily return a bio with the proper
alignment.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 20 Sep 2019 18:28:35 +0000 (14:28 -0400)]
bcachefs: Fix validation of replicas entries
When an extent is erasure coded, we need to record a replicas entry to
indicate that data is present on the devices that extent has pointers to
- but nr_required should be 0, because it's erasure coded.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 7 Sep 2019 21:17:21 +0000 (17:17 -0400)]
bcachefs: bch2_btree_iter_peek_prev()
Last of the basic operations for iterating forwards and backwards over
the btree: we now have
- peek(), returns key >= iter->pos
- next(), returns key > iter->pos
- peek_prev(), returns key <= iter->pos
- prev(), returns key < iter->pos
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With multiple iterators, if another iterator points to the key being
modified, we need to call bch2_btree_node_iter_fix() to re-unpack the
key into the iter->k
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 22 Jul 2019 17:37:02 +0000 (13:37 -0400)]
bcachefs: Improved bch2_fcollapse()
Move extents instead of copying them - this way, we can iterate over
only live extents, not the entire keyspace. Also, this means we can
mostly skip running triggers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Being more rigorous about noting when the key the iterator currently
poins to has changed - which should also give us a nice performance
improvement due to not having to check if we have to skip other bsets
backwards as much.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>