aboutsummaryrefslogtreecommitdiff
path: root/fs/bcachefs/recovery.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
...
* bcachefs: Upgrades now specify errors to fix, like downgradesKent Overstreet2024-01-051-9/+9
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Online fsck can now fix errorsKent Overstreet2024-01-051-2/+5
| | | | | | | | BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing fsck errors. Also; set fix_errors to ask by default when fsck is running. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Upgrading uses bch_sb.recovery_passes_requiredKent Overstreet2024-01-051-8/+6
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix nochanges/read_only interactionKent Overstreet2024-01-051-1/+3
| | | | | | | | | | | | | nochanges means "we cannot issue writes at all"; it's possible to go into a pseudo read-write mode where we pin dirty metadata in memory, which is used for fsck in dry run mode and doing journal replay on a read only mount, but we do not want to allow an actual read-write mount in nochanges mode. But we do always want to allow early read-write, during recovery - this patch clarifies that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: vstruct_for_each() now declares loop iterKent Overstreet2024-01-011-6/+3
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: for_each_member_device() now declares loop iterKent Overstreet2024-01-011-6/+4
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: darray_for_each() now declares loop iterKent Overstreet2024-01-011-1/+0
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch_err_(fn|msg) check if should printKent Overstreet2024-01-011-15/+9
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix snapshot.c assertion for online fsckKent Overstreet2024-01-011-0/+3
| | | | | | | c->curr_recovery_pass can go backwards; this adds a non rewinding version, c->recovery_pass_done. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill for_each_btree_key()Kent Overstreet2024-01-011-1/+2
| | | | | | | | | | | | | for_each_btree_key() handles transaction restarts, like for_each_btree_key2(), but only calls bch2_trans_begin() after a transaction restart - for_each_btree_key2() wraps every loop iteration in a transaction. The for_each_btree_key() behaviour is problematic when it leads to holding the SRCU lock that prevents key cache reclaim for an unbounded amount of time - there's no real need to keep it around. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_run_online_recovery_passes()Kent Overstreet2024-01-011-19/+38
| | | | | | | | Add a new helper for running online recovery passes - i.e. online fsck. This is a subset of our normal recovery passes, and does not - for now - use or follow c->curr_recovery_pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add ability to redirect log outputKent Overstreet2024-01-011-3/+3
| | | | | | | | | | | | | | | | | | | | Upcoming patches are going to add two new ioctls for running fsck in the kernel, but pretending that we're running our normal userspace fsck. This patch adds some plumbing for redirecting our normal log messages away from the dmesg log to a thread_with_file file descriptor - via a struct log_output, which will be consumed by the fsck f_op's read method. The new ioctls will allow for running fsck in the kernel against an offline filesystem (without mounting it), and an online filesystem. For an offline filesystem we need a way to pass in a pointer to the log_output, which is done via a new hidden opts.h option. For online fsck, we can set c->output directly, but only want to redirect log messages from the thread running fsck - hence the new c->output_filter method. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Explicity go RW for fsckKent Overstreet2024-01-011-1/+1
| | | | | | | This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less error prone. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: convert bch_fs_flags to x-macroKent Overstreet2024-01-011-17/+17
| | | | | | | Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill dev_usage->buckets_ecKent Overstreet2024-01-011-2/+0
| | | | | | | This counter is redundant; it's simply the sum of BCH_DATA_stripe and BCH_DATA_parity buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't flush journal after replayKent Overstreet2024-01-011-3/+1
| | | | | | | | | The flush_all_pins() after journal replay was unecessary, and trying to completely flush the journal while RW is not a great idea - it's not guaranteed to terminate if other threads keep adding things to the jorunal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rename BTREE_INSERT flagsKent Overstreet2024-01-011-6/+6
| | | | | | | BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Make journal replay more efficientKent Overstreet2024-01-011-28/+62
| | | | | | | | | | | Journal replay now first attempts to replay keys in sorted order, similar to how the btree write buffer flush path works. Any keys that can not be replayed due to journal deadlock are then left for later and replayed in journal order, unpinning journal entries as we go. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Go rw before journal replayKent Overstreet2024-01-011-4/+12
| | | | | | | | | | | This gets us slightly nicer log messages. Also, this slightly clarifies synchronization of c->journal_keys; after we go RW it's in use by multiple threads (so that the btree iterator code can overlay keys from the journal); so it has to be prepped before that point. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: BTREE_INSERT_JOURNAL_REPLAY now "don't init trans->journal_res"Kent Overstreet2024-01-011-0/+2
| | | | | | | | | | | This slightly changes how trans->journal_res works, in preparation for changing the btree write buffer flush path to use it. Now, BTREE_INSERT_JOURNAL_REPLAY means "don't take a journal reservation; trans->journal_res.seq already refers to the journal sequence number to pin". Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Print old version when scanning for old metadataKent Overstreet2024-01-011-2/+6
| | | | | | Also: we should be using bch2_fs_read_write_early() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Flush fsck errors before running twiceKent Overstreet2024-01-011-0/+2
| | | | | | | | It's confusing if we run fsck a second time (in debug mode, to verify the second run is clean), but errors are still ratelimited from the first run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch_sb_field_downgradeKent Overstreet2024-01-011-1/+23
| | | | | | | | | | | | | | | | | | Add a new superblock section that contains a list of { minor version, recovery passes, errors_to_fix } that is - a list of recovery passes that must be run when downgrading past a given version, and a list of errors to silently fix. The upcoming disk accounting rewrite is not going to be fully compatible: we're going to have to regenerate accounting both when upgrading to the new version, and also from downgrading from the new version, since the new method of doing disk space accounting is a completely different architecture based on deltas, and synchronizing them for every jounal entry write to maintain compatibility is going to be too expensive and impractical. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch_sb.recovery_passes_requiredKent Overstreet2024-01-011-15/+60
| | | | | | | | | | | | | | | | Add two new superblock fields. Since the main section of the superblock is now fully, we have to add a new variable length section for them - bch_sb_field_ext. - recovery_passes_requried: recovery passes that must be run on the next mount - errors_silent: errors that will be silently fixed These are to improve upgrading and dwongrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add persistent identifiers for recovery passesKent Overstreet2024-01-011-2/+32
| | | | | | | | | The next patch will start to refer to recovery passes from the superblock; naturally, we now need identifiers that don't change, since the existing enum is in the order in which they are run and is not fixed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix setting version_upgrade_completeKent Overstreet2024-01-011-2/+2
| | | | | | | If a superblock write hasn't happened (i.e. we never had to go rw), then c->sb.version will be out of date w.r.t. c->disk_sb.sb->version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix uninitialized var in bch2_journal_replay()Kent Overstreet2023-12-101-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Proper refcounting for journal_keysKent Overstreet2023-11-241-4/+7
| | | | | | | | | The btree iterator code overlays keys from the journal until journal replay is finished; since we're now starting copygc/rebalance etc. before replay is finished, this is multithreaded access and thus needs refcounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix recovery when forced to use JSET_NO_FLUSH journal entryKent Overstreet2023-11-041-0/+7
| | | | | | | | | | | When we didn't find anything in the journal that we'd like to use, and we're forced to use whatever we can find - that entry will have been a JSET_NO_FLUSH entry with a garbage last_seq value, since it's not normally used. Initialize it to something sane, for bch2_fs_journal_start(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix build errors with gcc 10Kent Overstreet2023-11-041-1/+1
| | | | | | | | | | | | gcc 10 seems to complain about array bounds in situations where gcc 11 does not - curious. This unfortunately requires adding some casts for now; we may investigate getting rid of our __u64 _data[] VLA in a future patch so that our start[0] members can be VLAs. Reported-by: John Stoffel <john@stoffel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Enumerate fsck errorsKent Overstreet2023-11-011-2/+8
| | | | | | | | | | | | | This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repair, and hence what bugs to look for. Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: rebalance_workKent Overstreet2023-11-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new btree, rebalance_work, to eliminate scanning required for finding extents that need work done on them in the background - i.e. for the background_target and background_compression options. rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an extent in the extents or reflink btree at the same pos. A new extent field is added, bch_extent_rebalance, which indicates that this extent has work that needs to be done in the background - and which options to use. This allows per-inode options to be propagated to indirect extents - at least in some circumstances. In this patch, changing IO options on a file will not propagate the new options to indirect extents pointed to by that file. Updating (setting/clearing) the rebalance_work btree is done by the extent trigger, which looks at the bch_extent_rebalance field. Scanning is still requrired after changing IO path options - either just for a given inode, or for the whole filesystem. We indicate that scanning is required by adding a KEY_TYPE_cookie key to the rebalance_work btree: the cookie counter is so that we can detect that scanning is still required when an option has been flipped mid-way through an existing scan. Future possible work: - Propagate options to indirect extents when being changed - Add other IO path options - nr_replicas, ec, to rebalance_work so they can be applied in the background when they change - Add a counter, for bcachefs fs usage output, showing the pending amount of rebalance work: we'll probably want to do this after the disk space accounting rewrite (moving it to a new btree) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Ensure devices are always correctly initializedKent Overstreet2023-10-311-15/+9
| | | | | | | | | | | We can't mark device superblocks or allocate journal on a device that isn't online. That means we may need to do this on every mount, because we may have formatted a new filesystem and then done the first mount (bch2_fs_initialize()) in degraded mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_id_str()Kent Overstreet2023-10-311-3/+3
| | | | | | | Since we can run with unknown btree IDs, we can't directly index btree IDs into fixed size arrays. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't run bch2_delete_dead_snapshots() unnecessarilyKent Overstreet2023-10-311-1/+1
| | | | | | | | | | | | | | | | | | Be a bit more careful about when bch2_delete_dead_snapshots needs to run: it only needs to run synchronously if we're running fsck, and it only needs to run at all if we have snapshot nodes to delete or if fsck has noticed that it needs to run. Also: Rename BCH_FS_HAVE_DELETED_SNAPSHOTS -> BCH_FS_NEED_DELETE_DEAD_SNAPSHOTS Kill bch2_delete_dead_snapshots_hook(), move functionality to bch2_mark_snapshot() Factor out bch2_check_snapshot_needs_deletion(), to explicitly check if we need to be running snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix drop_alloc_keys()Kent Overstreet2023-10-221-15/+15
| | | | | | | For consistency with the rest of the reconstruct_alloc option, we should be skipping all alloc keys. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix another smatch complaintKent Overstreet2023-10-221-1/+1
| | | | | | This should be harmless, but initialize last_seq anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Make btree root read errors recoverableKent Overstreet2023-10-221-5/+4
| | | | | | | The entire btree will be lost, but that is better than the entire filesystem not being recoverable. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Heap allocate btree_transKent Overstreet2023-10-221-3/+3
| | | | | | | | | | We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix W=12 build errorsKent Overstreet2023-10-221-12/+4
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix a handful of spelling mistakes in various messagesColin Ian King2023-10-221-1/+1
| | | | | | | There are several spelling mistakes in error messages. Fix these. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: BTREE_ID_logged_opsKent Overstreet2023-10-221-0/+1
| | | | | | | | | | | Add a new btree for long running logged operations - i.e. for logging operations that we can't do within a single btree transaction, so that they can be resumed if we crash. Keys in the logged operations btree will represent operations in progress, with the state of the operation stored in the value. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Split out snapshot.cKent Overstreet2023-10-221-0/+1
| | | | | | | subvolume.c has gotten a bit large, this splits out a separate file just for managing snapshot trees - BTREE_ID_snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix check_version_upgrade()Kent Overstreet2023-10-221-1/+1
| | | | | | We were failing to upgrade to the latest compatible version - whoops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree_journal_iter.cKent Overstreet2023-10-221-519/+2
| | | | | | | | | Split out a new file from recovery.c for managing the list of keys we read from the journal: before journal replay finishes the btree iterator code needs to be able to iterate over and return keys from the journal as well, so there's a fair bit of code here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: sb-clean.cKent Overstreet2023-10-221-136/+9
| | | | | | Pull code for bch_sb_field_clean out into its own file. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix assorted checkpatch nitsKent Overstreet2023-10-221-2/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: BCH_COMPAT_bformat_overflow_done no longer requiredKent Overstreet2023-10-221-7/+0
| | | | | | | | | | | | | | | | Awhile back, we changed bkey_format generation to ensure that the packed representation could never represent fields larger than the unpacked representation. This was to ensure that bkey_packed_successor() always gave a sensible result, but in the current code bkey_packed_successor() is only used in a debug assertion - not for anything important. This kills the requirement that we've gotten rid of those weird bkey formats, and instead changes the assertion to check if we're dealing with an old weird bkey format. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Ensure topology repair runsKent Overstreet2023-10-221-0/+2
| | | | | | | | This fixes should_restart_for_topology_repair() - previously it was returning false if the btree io path had already seleceted topology repair to run, even if it hadn't run yet. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Log a message when running an explicit recovery passKent Overstreet2023-10-221-10/+10
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>