| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
ca->io_ref does not protect against the filesystem going way,
c->write_ref does. Much like
0b50b7313ef2 bcachefs: Fix refcounting in discard path
the other async paths need fixing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
| |
bch_dev->io_ref does not protect against the filesystem going away;
bch_fs->writes does.
Thus the filesystem write ref needs to be the last ref we release.
Reported-by: syzbot+9e0404b505e604f67e41@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
| |
we allow new fields to be added to existing key types, and new versions
should treat them as being zeroed; this was not handled in
alloc_v4_validate.
Reported-by: syzbot+3b2968fa4953885dd66a@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
| |
Comparing the wrong bpos - this was missed because normally
bucket_gens_init() runs on brand new filesystems, but this bug caused it
to overwrite bucket_gens keys with 0s when upgrading ancient
filesystems.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
On testing on an old mangled filesystem, we missed a case.
Fixes: bd864bc2d907 ("bcachefs: Fix bch2_trigger_alloc when upgrading from old versions")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
bkey_fsck_err() was added as an interface that looks like fsck_err(),
but previously all it did was ensure that the appropriate error counter
was incremented in the superblock.
This is a cleanup and bugfix patch that converts it to a wrapper around
fsck_err(). This is needed to fix an issue with the upgrade path to
disk_accounting_v3, where the "silent fix" error list now includes
bkey_fsck errors; fsck_err() handles this in a unified way, and since we
need to change printing of bkey fsck errors from the caller to the inner
bkey_fsck_err() calls, this ends up being a pretty big change.
Als,, rename .invalid() methods to .validate(), for clarity, while we're
changing the function signature anyways (to drop the printbuf argument).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
| |
bch2_trigger_alloc was assuming that the new key would always be newly
created and thus always an alloc_v4 key, but - not when called from
btree_gc.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
| |
This fixes an accounting mismatch for cached data.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
gc_lock is now only for synchronization between check_alloc_info and
interior btree updates - nothing else
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
| |
fsck_err() now optionally takes a btree_trans; if the current thread has
one, it is required that it be passed.
The next patch will use this to unlock when waiting for user input.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
Needed for online fsck; we need the trigger to initialize newly
allocated buckets and generation number changes while gc is running.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
| |
Rewrite fsck/gc for the new accounting scheme.
This adds a second set of in-memory accounting counters for gc to use;
like with other parts of gc we run all trigger in TRIGGER_GC mode, then
compare what we calculated to existing in-memory accounting at the end.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Main part of the disk accounting rewrite.
This is a wholesale rewrite of the existing disk space accounting, which
relies on percepu counters that are sharded by journal buffer, and
rolled up and added to each journal write.
With the new scheme, every set of counters is a distinct key in the
accounting btree; this fixes scaling limitations of the old scheme,
where counters took up space in each journal entry and required multiple
percpu counters.
Now, in memory accounting requires a single set of percpu counters - not
multiple for each in flight journal buffer - and in the future we'll
probably also have counters that don't use in memory percpu counters,
they're not strictly required.
An accounting update is now a normal btree update, using the btree write
buffer path. At transaction commit time, we apply accounting updates to
the in memory counters, which are percpu counters indexed in an
eytzinger tree by the accounting key.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
| |
Add a separate counter to bch_alloc_v4 for amount of striped data; this
lets us separately track striped and unstriped data in a bucket, which
lets us see when erasure coding has failed to update extents with stripe
pointers, and also find buckets to continue updating if we crash mid way
through creating a new stripe.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to make sure we're not missing any fragmenation entries in the
LRU BTREE after repairing ALLOC BTREE
Also, use the new bch2_btree_write_buffer_maybe_flush() helper; this was
only working without it before since bucket invalidation (usually)
wasn't happening while fsck was running.
Co-developed-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's no reason for discards to be single threaded across all devices;
this will improve performance on multi device setups.
Additionally, making them per-device simplifies the refcounting on
bch_dev->io_ref; we now hold it for the duration that the discard path
is running, which fixes a race between the discard path and device
removal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
| |
Incorrect bucket state transition in the discard path; when incrementing
a bucket's generation number that had already been discarded, we were
forgetting to check if it should be need_gc_gens, not free.
This was caught by the .invalid checks in the transaction commit path,
causing us to go emergency read only.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
We only have 48 bits for the LRU time field, which is insufficient to
prevent wraparound.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
| |
LRUs only have 48 bits for the time field (i.e. LRU order); thus we need
overflow checks and guards.
Reported-by: syzbot+df3bf3f088dcaa728857@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
We can't discard a bucket while it's still open; this needs the
bucket_is_open_safe() version, which takes the open_buckets lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
Turn more asserts into proper recoverable error paths.
Reported-by: syzbot+246b47da27f8e7e7d6fb@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
We're about to start using bch_validate_flags for superblock section
validation - it's no longer bkey specific.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
| |
New helper for getting refs to devices as we iterate.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
| |
More elimination of bch2_dev_bkey_exists() usage.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
Eliminating bch2_dev_bkey_exists() uses and replacing them with proper
checks; this one was unnecessary since the caller already has it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
This will be used in the next patch for adding some new debug mode
asserts.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When building with clang's -Wincompatible-function-pointer-types-strict
(a warning designed to catch potential kCFI failures at build time),
there are several warnings along the lines of:
fs/bcachefs/bkey_methods.c:118:2: error: incompatible function pointer types initializing 'int (*)(struct btree_trans *, enum btree_id, unsigned int, struct bkey_s_c, struct bkey_s, enum btree_iter_update_trigger_flags)' with an expression of type 'int (struct btree_trans *, enum btree_id, unsigned int, struct bkey_s_c, struct bkey_s, unsigned int)' [-Werror,-Wincompatible-function-pointer-types-strict]
118 | BCH_BKEY_TYPES()
| ^~~~~~~~~~~~~~~~
fs/bcachefs/bcachefs_format.h:394:2: note: expanded from macro 'BCH_BKEY_TYPES'
394 | x(inode, 8) \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
fs/bcachefs/bkey_methods.c:117:41: note: expanded from macro 'x'
117 | #define x(name, nr) [KEY_TYPE_##name] = bch2_bkey_ops_##name,
| ^~~~~~~~~~~~~~~~~~~~
<scratch space>:277:1: note: expanded from here
277 | bch2_bkey_ops_inode
| ^~~~~~~~~~~~~~~~~~~
fs/bcachefs/inode.h:26:13: note: expanded from macro 'bch2_bkey_ops_inode'
26 | .trigger = bch2_trigger_inode, \
| ^~~~~~~~~~~~~~~~~~
There are several functions that did not have their flags parameter
converted to 'enum btree_iter_update_trigger_flags' in the recent
unification, which will cause kCFI failures at runtime because the
types, while ABI compatible (hence no warning from the non-strict
version of this warning), do not match exactly.
Fix up these functions (as well as a few other obvious functions that
should have it, even if there are no warnings currently) to resolve the
warnings and potential kCFI runtime failures.
Fixes: 31e4ef3280c8 ("bcachefs: iter/update/trigger/str_hash flag cleanup")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
This is a nice cleanup - and we've also been having problems with
kthread creation in the mount path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
| |
We're about to add new asserts for btree_trans locking consistency, and
part of that requires that aren't using the btree_trans while it's
unlocked.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some renaming for better consistency
bch2_member_exists -> bch2_member_alive
bch2_dev_exists -> bch2_member_exists
bch2_dev_exsits2 -> bch2_dev_exists
bch_dev_locked -> bch2_dev_locked
bch_dev_bkey_exists -> bch2_dev_bkey_exists
new helper - bch2_dev_safe
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
| |
Combine iter/update/trigger/str_hash flags into a single enum, and
x-macroize them for a to_text() function later.
These flags are all for a specific iter/key/update context, so it makes
sense to group them together - iter/update/trigger flags were already
given distinct bits, this cleans up and unifies that handling.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
| |
Reported-by: syzbot+10827fa6b176e1acf1d0@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
| |
In the discard worker, we were failing to validate the bucket state -
meaning a corrupt needs_discard btree could cause us to discard a bucket
that we shouldn't.
If check_alloc_info hasn't run yet we just want to bail out, otherwise
it's a filesystem inconsistent error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
| |
error messages should always include __func__
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Nested transaction restart handling is typically best avoided; when the
inner context handles a transaction restart it invalidates the outer
transaction context, so we need to make sure to return a
transaction_restart_nested error.
This code wasn't doing that, and hit the assertion in
for_each_btree_key() that checks for that via trans->restart_count.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
| |
Now that we've got the errors_silent mechanism, we don't have to check
if the reconstruct_alloc option is set all over the place.
Also - users no longer have to explicitly select fsck and fix_errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Buckets usually can't be discarded until the transaction that made them
empty has been committed in the journal.
Tracing has indicated that we're queuing the discard worker excessively,
only for it to skip over many buckets that are still waiting on a
journal commit, discarding only one or two buckets per iteration.
We want to switch to only queuing the discard worker after a journal
flush write, but there's an important optimization we need to preserve:
if a bucket becomes empty and it was never committed in the journal
while it was in use, we want to discard it and reuse it right away -
since overwriting it before the previous writes are flushed from the
device cache eans those writes only cost bus bandwidth.
So, this patch implements a fast path for buckets that can be discarded
right away. We need new locking between the two discard workers; the new
list of buckets being discarded provides that locking.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
bch2_trigger_alloc() kicks off certain tasks on bucket state changes;
e.g. triggering the bucket discard worker and the invalidate worker.
We've observed the discard worker running too often - most runs it
doesn't do any work, according to the tracepoint - so clearly, we're
kicking it off too often.
This adds an explicit statechange() macro to make these checks more
precise.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|