path: root/fs/bcachefs/ec.c
Commit message | Author | Age | Files | Lines
...
* bcachefs: bch2_ec_read_extent() now takes btree_trans | Kent Overstreet | 2023-11-05 | 1 | -7/+3
    We're not supposed to have more than one btree_trans at a time in a given thread - that causes recursive locking deadlocks.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_stripe_to_text() now prints ptr gens | Kent Overstreet | 2023-11-05 | 1 | -0/+1
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Enumerate fsck errors | Kent Overstreet | 2023-11-01 | 1 | -16/+13
    This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repaired, and hence what bugs to look for.
    Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add IO error counts to bch_member | Kent Overstreet | 2023-11-01 | 1 | -1/+5
    We now track IO errors per device since filesystem creation. IO error counts can be viewed in sysfs, or with the 'bcachefs show-super' command.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Heap allocate btree_trans | Kent Overstreet | 2023-10-22 | 1 | -20/+14
    We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocation to initialize it anyways, so there's no real downside to heap allocating the entire thing.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Break up io.c | Kent Overstreet | 2023-10-22 | 1 | -1/+2
    More reorganization, this splits up io.c into
     - io_read.c
     - io_misc.c - fallocate, fpunch, truncate
     - io_write.c
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Convert more code to bch_err_msg() | Kent Overstreet | 2023-10-22 | 1 | -2/+1
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Assorted fixes for clang | Kent Overstreet | 2023-10-22 | 1 | -48/+60
    clang had a few more warnings about enum conversion, and also didn't like the opts.c initializer.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Convert more -EROFS to private error codes | Kent Overstreet | 2023-10-22 | 1 | -3/+3
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Change check for invalid key types | Kent Overstreet | 2023-10-22 | 1 | -1/+2
    As part of the forward compatibility patch series, we need to allow for new key types without complaining loudly when running an old version.
    This patch changes the flags parameter of bkey_invalid to an enum, and adds a new flag to indicate we're being called from the transaction commit path.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Assorted sparse fixes | Kent Overstreet | 2023-10-22 | 1 | -8/+10
     - endianness fixes
     - mark some things static
     - fix a few __percpu annotations
     - fix silent enum conversions
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rename enum alloc_reserve -> bch_watermark | Kent Overstreet | 2023-10-22 | 1 | -17/+17
    This is prep work for consolidating with JOURNAL_WATERMARK.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: New error message helpers | Kent Overstreet | 2023-10-22 | 1 | -2/+2
    Add two new helpers for printing error messages with __func__ and bch2_err_str():
     - bch_err_fn
     - bch_err_msg
    Also kill the old error strings in the recovery path, which were causing us to incorrectly report memory allocation failures - they're not needed anymore.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: ec: Fix a lost wakeup | Kent Overstreet | 2023-10-22 | 1 | -0/+1
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: allocate_dropping_locks() | Kent Overstreet | 2023-10-22 | 1 | -9/+2
    Add two new helpers for allocating memory with btree locks held. The idea is to first try the allocation with GFP_NOWAIT|__GFP_NOWARN; if that fails, unlock, retry with GFP_KERNEL, and then call trans_relock().
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
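The fallback described in this message can be sketched in userspace C. Everything below is a hypothetical stand-in (`toy_trans`, `toy_alloc`, `toy_alloc_dropping_locks`), not the bcachefs helpers; in particular the sketch assumes relocking always succeeds, which the real transaction code may not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a btree transaction: only tracks lock state. */
struct toy_trans {
	bool locked;
	unsigned relocks;	/* how many times we had to drop and retake locks */
};

/*
 * Hypothetical allocator. may_block == false models GFP_NOWAIT;
 * nowait_fails lets a caller force the non-blocking attempt to fail.
 */
static void *toy_alloc(size_t size, bool may_block, bool nowait_fails)
{
	if (!may_block && nowait_fails)
		return NULL;
	return malloc(size);
}

static void *toy_alloc_dropping_locks(struct toy_trans *trans, size_t size,
				      bool nowait_fails)
{
	/* Fast path: locks are held, so the allocation must not sleep. */
	void *p = toy_alloc(size, false, nowait_fails);
	if (p)
		return p;

	/* Slow path: drop locks, allow blocking, then retake the locks. */
	trans->locked = false;
	p = toy_alloc(size, true, nowait_fails);
	trans->locked = true;
	trans->relocks++;
	return p;
}
```

A caller only pays for the unlock and relock when the non-blocking attempt fails, so the common case keeps its locks throughout.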
* bcachefs: GFP_NOIO -> GFP_NOFS | Kent Overstreet | 2023-10-22 | 1 | -1/+1
    GFP_NOIO dates from the bcache days, when we operated under the block layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to GFP_NOFS.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_bkey_get_iter() helpers | Kent Overstreet | 2023-10-22 | 1 | -9/+6
    Introduce new helpers for a common pattern:
      bch2_trans_iter_init();
      bch2_btree_iter_peek_slot();
     - bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of the correct type
     - bch2_bkey_get_val_typed() copies the val out of the btree to a (typically stack allocated) variable; it handles the case where the value in the btree is smaller than the current version of the type, zeroing out the remainder.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
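The "copy, then zero the remainder" behaviour described for bch2_bkey_get_val_typed() can be illustrated with a small userspace sketch. The type `toy_val` and the function `toy_get_val` are invented for illustration and are not the bcachefs API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical value type: version 1 had only `a`; `b` was appended later. */
struct toy_val {
	unsigned a;
	unsigned b;
};

/*
 * Copy a stored value of stored_bytes into *dst. If the stored value is
 * shorter than the current struct (written by an older version), the
 * fields it lacks are zeroed; if it is longer, the copy is truncated.
 */
static void toy_get_val(struct toy_val *dst, const void *stored,
			size_t stored_bytes)
{
	size_t n = stored_bytes < sizeof(*dst) ? stored_bytes : sizeof(*dst);

	memcpy(dst, stored, n);
	memset((char *) dst + n, 0, sizeof(*dst) - n);
}
```

With this shape, code reading an old, shorter value sees the newer fields as zero rather than as uninitialized stack contents.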
* bcachefs: bkey_ops.min_val_size | Kent Overstreet | 2023-10-22 | 1 | -6/+0
    This adds a new field to bkey_ops for the minimum size of the value, which standardizes that check and also enforces the new rule (previously done somewhat ad-hoc) that we can extend value types by adding new fields on to the end.
    To make that work we do _not_ initialize min_val_size with sizeof, instead we initialize it to the size of the first version of those values.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
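A minimal sketch of the rule in this message, with invented names (`toy_stripe_val`, `toy_key_ops`, `toy_val_size_ok`): the minimum size is pinned to the first version of the value by taking the offset of the first field that was added later, rather than `sizeof` of the current struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical value type: `added_later` did not exist in version 1. */
struct toy_stripe_val {
	unsigned v1_field;
	unsigned added_later;
};

/* Hypothetical per-key-type ops table with only the field of interest. */
struct toy_key_ops {
	size_t min_val_size;
};

static const struct toy_key_ops toy_stripe_ops = {
	/* Size of the first version, deliberately not sizeof(struct). */
	.min_val_size = offsetof(struct toy_stripe_val, added_later),
};

/* A value is acceptable if it is at least as large as version 1. */
static bool toy_val_size_ok(const struct toy_key_ops *ops, size_t val_bytes)
{
	return val_bytes >= ops->min_val_size;
}
```

Using `sizeof` here would reject values written before `added_later` existed, which is exactly the case the rule is meant to allow.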
* bcachefs: Rip out code for storing backpointers in alloc keys | Kent Overstreet | 2023-10-22 | 1 | -10/+9
    We don't store backpointers in alloc keys anymore, since we gained the btree write buffer.
    This patch drops support for backpointers in alloc keys, and revs the on disk format version so that we know a fsck is required.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Use BTREE_ITER_INTENT in ec_stripe_update_extent() | Kent Overstreet | 2023-10-22 | 1 | -1/+2
    This adds a flags param to bch2_backpointer_get_key() so that we can pass BTREE_ITER_INTENT, since ec_stripe_update_extent() is updating the extent immediately.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: use dedicated workqueue for tasks holding write refs | Brian Foster | 2023-10-22 | 1 | -1/+1
    A workqueue resource deadlock has been observed when running fsck on a filesystem with a full/stuck journal. fsck is not currently able to repair the fs due to fairly rapid emergency shutdown, but rather than exit gracefully the fsck process hangs during the shutdown sequence. Fortunately this is easily recoverable from userspace, but the root cause involves code shared between the kernel and userspace and so should be addressed.
    The deadlock scenario involves the main task in the bch2_fs_stop() -> bch2_fs_read_only() path waiting on write references to drain with the fs state lock held. A bch2_read_only_work() workqueue task is scheduled on the system_long_wq, blocked on the state lock. Finally, various other write ref holding workqueue tasks are scheduled to run on the same workqueue and must complete in order to release references that the initial task is waiting on.
    To avoid this problem, we can split the dependent workqueue tasks across different workqueues. It's a bit of a waste to create a dedicated wq for the read-only worker, but there are several tasks throughout the fs that follow the pattern of acquiring a write reference and then scheduling to the system wq. Use a local wq for such tasks to break the subtle dependency between these and the read-only worker.
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: New erasure coding shutdown path | Kent Overstreet | 2023-10-22 | 1 | -5/+49
    This implements a new shutdown path for erasure coding, which is needed for the upcoming BCH_WRITE_WAIT_FOR_EC write path.
    The process is:
     - Cancel new stripes being built up
     - Close out/cancel open buckets on write points or the partial list that are for stripes
     - Shutdown rebalance/copygc
     - Then wait for in flight new stripes to finish
    With BCH_WRITE_WAIT_FOR_EC, move ops will be waiting on stripes to fill up before they complete; the new ec shutdown path is needed for shutting down copygc/rebalance without deadlocking.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Private error codes: ENOMEM | Kent Overstreet | 2023-10-22 | 1 | -7/+7
    This adds private error codes for most (but not all) of our ENOMEM uses, which makes it easier to track down assorted allocation failures.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix stripe create error path | Kent Overstreet | 2023-10-22 | 1 | -6/+8
    If we errored out on a new stripe before fully allocating it, we shouldn't be zeroing out unwritten data.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve bch2_new_stripes_to_text() | Kent Overstreet | 2023-10-22 | 1 | -5/+9
    Print out the alloc reserve, and format it a bit more nicely.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Simplify stripe_idx_to_delete | Kent Overstreet | 2023-10-22 | 1 | -5/+4
    This is not technically correct - it's subject to a race if we ever end up with a stripe with all empty blocks (that needs to be deleted) being held open. But the "correct" version was much too inefficient, and soon we'll be adding a stripes LRU.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Second layer of refcounting for new stripes | Kent Overstreet | 2023-10-22 | 1 | -11/+21
    This will be used for move writes, which will be waiting until the stripe is created to do the index update. They need to prevent the stripe from being reclaimed until their index update is done, so we need another refcount that just keeps the stripe open.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    # Conflicts:
    #	fs/bcachefs/ec.c
    #	fs/bcachefs/io.c
* bcachefs: ec: fall back to creating new stripes for copygc | Kent Overstreet | 2023-10-22 | 1 | -0/+8
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Extent helper improvements | Kent Overstreet | 2023-10-22 | 1 | -1/+1
     - __bch2_bkey_drop_ptr() -> bch2_bkey_drop_ptr_noerror(), now available outside extents.
     - Split bch2_bkey_has_device() and bch2_bkey_has_device_c(), const and non const versions
     - bch2_extent_has_ptr() now returns the pointer it found
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rework open bucket partial list allocation | Kent Overstreet | 2023-10-22 | 1 | -4/+4
    Now, any open_bucket can go on the partial list: allocating from the partial list has been moved to its own dedicated function, open_bucket_add_buckets() -> bucket_alloc_set_partial().
    In particular, this means that erasure coded buckets can safely go on the partial list; the new location works with the "allocate an ec bucket first, then the rest" logic.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix "btree node in stripe" error | Kent Overstreet | 2023-10-22 | 1 | -0/+3
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill bch2_ec_bucket_written() | Kent Overstreet | 2023-10-22 | 1 | -17/+0
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve bch2_new_stripes_to_text() | Kent Overstreet | 2023-10-22 | 1 | -8/+10
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix stripe reuse path | Kent Overstreet | 2023-10-22 | 1 | -18/+34
    It's possible that we reuse a stripe that doesn't have quite the same configuration as the stripe_head we're allocating from. In that case, we have to make sure that the new stripe uses the settings from the stripe we reuse, not the stripe head, and make sure the buffer is allocated correctly.
    This fixes the ec_mixed_tiers test.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: RESERVE_stripe | Kent Overstreet | 2023-10-22 | 1 | -20/+53
    Rework stripe creation path - new algorithm for deciding when to create new stripes or reuse existing stripes.
    We add a new allocation watermark, RESERVE_stripe, above RESERVE_none. Then we always try to create a new stripe by doing RESERVE_stripe allocations; if this fails, we reuse an existing stripe and allocate buckets for it with the reserve watermark for the given write (RESERVE_none or RESERVE_movinggc).
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
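The decision order in this message (try a new stripe at the stricter watermark first, then fall back to reusing a stripe at the write's own watermark) can be sketched as follows. The enum names, the reserve sizes, and the "free buckets minus reserve" model are all assumptions made for illustration; they are not taken from the bcachefs allocator.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical watermarks, strictest first. */
enum toy_watermark { TOY_WM_stripe, TOY_WM_none, TOY_WM_movinggc };

enum toy_stripe_source { TOY_NEW_STRIPE, TOY_REUSED_STRIPE, TOY_ALLOC_FAILED };

/* Made-up numbers: buckets that must stay free after an allocation. */
static const unsigned toy_reserved[] = {
	[TOY_WM_stripe]   = 64,
	[TOY_WM_none]     = 16,
	[TOY_WM_movinggc] = 0,
};

static bool toy_can_alloc(unsigned free_buckets, unsigned want,
			  enum toy_watermark wm)
{
	return free_buckets >= want + toy_reserved[wm];
}

/*
 * new_blocks:   buckets a brand new stripe would need
 * reuse_blocks: buckets needed to fill the empty blocks of an existing stripe
 * write_wm:     watermark of the write that triggered the allocation
 */
static enum toy_stripe_source
toy_pick_stripe(unsigned free_buckets, unsigned new_blocks,
		unsigned reuse_blocks, bool have_existing,
		enum toy_watermark write_wm)
{
	/* Always try a fresh stripe first, at the strictest watermark. */
	if (toy_can_alloc(free_buckets, new_blocks, TOY_WM_stripe))
		return TOY_NEW_STRIPE;

	/* Otherwise reuse an existing stripe at the write's own watermark. */
	if (have_existing &&
	    toy_can_alloc(free_buckets, reuse_blocks, write_wm))
		return TOY_REUSED_STRIPE;

	return TOY_ALLOC_FAILED;
}
```

Under this model a nearly full filesystem stops creating new stripes well before ordinary writes start failing, and it reuses partially empty stripes instead.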
* bcachefs: More stripe create cleanup/fixes | Kent Overstreet | 2023-10-22 | 1 | -15/+23
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Plumb alloc_reserve through stripe create path | Kent Overstreet | 2023-10-22 | 1 | -23/+17
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: ec: Improve error message for btree node in stripe | Kent Overstreet | 2023-10-22 | 1 | -1/+14
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: ec: Ensure new stripe is closed in error path | Kent Overstreet | 2023-10-22 | 1 | -2/+2
    This fixes a use-after-free bug.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: ec: zero_out_rest_of_ec_bucket() | Kent Overstreet | 2023-10-22 | 1 | -3/+37
    Occasionally, we won't write to an entire bucket. This fixes the EC code to handle this case, zeroing out the rest of the bucket as needed.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
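As a rough illustration of "zeroing out the rest of the bucket", the sketch below computes which sector range of a partially written bucket would still need zeroes. The names `toy_range` and `toy_rest_of_bucket` are hypothetical, and the sketch only does the arithmetic; it does not model how the kernel code actually issues the zeroing.

```c
#include <assert.h>

/* Hypothetical sector range within a bucket. */
struct toy_range {
	unsigned offset;	/* first sector to zero */
	unsigned len;		/* number of sectors to zero, 0 if none */
};

/*
 * Given the bucket size and how many sectors were actually written,
 * return the tail range that still needs to be zeroed.
 */
static struct toy_range toy_rest_of_bucket(unsigned bucket_sectors,
					   unsigned written_sectors)
{
	struct toy_range r = { .offset = written_sectors, .len = 0 };

	if (written_sectors < bucket_sectors)
		r.len = bucket_sectors - written_sectors;
	return r;
}
```

A fully written bucket yields a zero-length range, so callers can skip the zeroing step entirely in the common case.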
* bcachefs: Improve bch2_stripe_to_text() | Kent Overstreet | 2023-10-22 | 1 | -6/+14
    We now print pointers as bucket:offset, the same as how we print extent pointers.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: get_stripe_key_trans() | Kent Overstreet | 2023-10-22 | 1 | -9/+13
    Another nested btree_trans fix
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix erasure coding shutdown path | Kent Overstreet | 2023-10-22 | 1 | -1/+7
    It's possible when shutting down for a stripe head to have a new stripe that doesn't yet have any blocks allocated - we just need to free it.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix buffer overrun in ec_stripe_update_extent() | Kent Overstreet | 2023-10-22 | 1 | -21/+14
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Simplify ec stripes heap | Kent Overstreet | 2023-10-22 | 1 | -44/+13
    Now that we have a separate data structure for tracking open stripes, the stripes heap can track all existing stripes, which is a nice simplification.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Erasure coding: Track open stripes | Kent Overstreet | 2023-10-22 | 1 | -73/+158
    This adds a new hash table for stripes being created or updated, instead of hackily relying on the stripes heap.
    This lets us reserve the slot for the new stripe up front, at the same time as we would pick an existing stripe - if we were updating an existing stripe - making the overall code more consistent.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Stripe deletion now checks what it's deleting | Kent Overstreet | 2023-10-22 | 1 | -16/+56
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve c->writes refcounting for stripe create path | Kent Overstreet | 2023-10-22 | 1 | -21/+33
    This makes our handling of c->writes more consistent with other asynchronous work items.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Switch ec_stripes_heap_lock to a mutex | Kent Overstreet | 2023-10-22 | 1 | -17/+16
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix erasure coding locking | Kent Overstreet | 2023-10-22 | 1 | -12/+7
    This adds a new helper, bch2_trans_mutex_lock(), for locking a mutex - dropping and retaking btree locks as needed.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>