path: root/fs/bcachefs/bcachefs.h
Commit message (Author, Date, Files changed, Lines -/+)
* bcachefs: Add a time_stat for blocked on key cache flush  (Kent Overstreet, 2024-08-13, 1 file, -0/+1)
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Make allocator stuck timeout configurable, ratelimit messages  (Kent Overstreet, 2024-08-07, 1 file, -0/+2)
    Limit these messages to once every 2 minutes to avoid spamming logs; with multiple devices the output can be quite significant. Also, up the default timeout to 30 seconds from 10 seconds.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Reduce the scope of gc_lock  (Kent Overstreet, 2024-07-14, 1 file, -2/+2)
    gc_lock is now only for synchronization between check_alloc_info and interior btree updates - nothing else.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Refactor disk accounting data structures  (Kent Overstreet, 2024-07-14, 1 file, -1/+1)
    Break up the percpu counter allocations into individual allocations for each disk accounting counter; this fixes an issue on large systems where we have too many replica entries for the percpu allocator's max practical size.
    Also, use just one eytzinger tree for the normal set of counters and the gc counters; this simplifies accounting_gc_done(), where we need the same set of counters to be present in both tables.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
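    The counters referred to above live in an eytzinger-ordered (BFS-order) array. As a generic illustration of what such a lookup involves - not bcachefs's actual code - here is a minimal user-space sketch of a lower-bound search over an eytzinger layout:

        #include <stdint.h>
        #include <stddef.h>

        /*
         * Eytzinger (BFS) layout, 1-indexed: a[0] is unused, a[1] is the root,
         * and the children of a[k] are a[2k] and a[2k + 1].  Returns the
         * 1-based index of the first element >= x, or 0 if every element is
         * smaller.
         */
        static size_t eytzinger_lower_bound(const uint64_t *a, size_t n, uint64_t x)
        {
        	size_t k = 1;

        	while (k <= n)
        		k = 2 * k + (a[k] < x);

        	/* the result is the last node where we went left: drop the trailing 1-bits plus one */
        	return k >> (__builtin_ctzll(~k) + 1);
        }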
* bcachefs: Plumb more logging through stdio redirect  (Kent Overstreet, 2024-07-14, 1 file, -0/+2)
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Convert gc to new accounting  (Kent Overstreet, 2024-07-14, 1 file, -3/+1)
    Rewrite fsck/gc for the new accounting scheme. This adds a second set of in-memory accounting counters for gc to use; as with other parts of gc we run all triggers in TRIGGER_GC mode, then compare what we calculated to the existing in-memory accounting at the end.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill replicas_journal_res  (Kent Overstreet, 2024-07-14, 1 file, -2/+0)
    More dead code deletion.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Delete journal-buf-sharded old style accounting  (Kent Overstreet, 2024-07-14, 1 file, -2/+1)
    More deletion of dead code.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: kill bch2_fs_usage_read()  (Kent Overstreet, 2024-07-14, 1 file, -4/+0)
    With bch2_ioctl_fs_usage(), this is now dead code.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: dev_usage updated by new accounting  (Kent Overstreet, 2024-07-14, 1 file, -2/+1)
    Reading disk accounting now requires an eytzinger lookup (see: bch2_accounting_mem_read()), but the per-device counters are used frequently enough that we'd like to still be able to read them with just a percpu sum, as in the old code.
    This patch special cases the device counters; when we update in-memory accounting we also update the old style percpu counters if it's a device counter update.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Disk space accounting rewrite  (Kent Overstreet, 2024-07-14, 1 file, -3/+3)
    Main part of the disk accounting rewrite.
    This is a wholesale rewrite of the existing disk space accounting, which relies on percpu counters that are sharded by journal buffer, and rolled up and added to each journal write.
    With the new scheme, every set of counters is a distinct key in the accounting btree; this fixes scaling limitations of the old scheme, where counters took up space in each journal entry and required multiple percpu counters.
    Now, in-memory accounting requires a single set of percpu counters - not multiple for each in-flight journal buffer - and in the future we'll probably also have counters that don't use in-memory percpu counters; they're not strictly required.
    An accounting update is now a normal btree update, using the btree write buffer path. At transaction commit time, we apply accounting updates to the in-memory counters, which are percpu counters indexed in an eytzinger tree by the accounting key.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
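    As a rough illustration of the commit-time step described above - with made-up types, a plain array, and a linear scan standing in for the real percpu counters and eytzinger lookup - applying an accounting update is just adding deltas to the counters stored under that key:

        #include <stdint.h>

        /* Illustrative shapes only; the real key/counter layouts live in bcachefs. */
        struct acct_entry { uint64_t key; uint64_t v[3]; unsigned nr; };

        /* Apply one accounting update (key + deltas) to the in-memory counters. */
        static void acct_apply(struct acct_entry *tbl, unsigned nr_entries,
        		       uint64_t key, const uint64_t *delta, unsigned nr)
        {
        	for (unsigned i = 0; i < nr_entries; i++)
        		if (tbl[i].key == key) {
        			for (unsigned j = 0; j < nr && j < tbl[i].nr; j++)
        				tbl[i].v[j] += delta[j];
        			return;
        		}
        }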
* bcachefs: btree write buffer knows how to accumulate bch_accounting keys  (Kent Overstreet, 2024-07-14, 1 file, -0/+1)
    Teach the btree write buffer how to accumulate accounting keys - instead of having the newer key overwrite the older key as we do with other updates, we need to add them together.
    Also, add a flag so that write buffer flush knows when journal replay is finished flushing accounting, and teach it to hold accounting keys until that flag is set.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
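    A minimal sketch of the accumulate-instead-of-overwrite rule, using invented types rather than the real bch_accounting layout:

        #include <stdint.h>

        struct wb_acct { uint64_t v[3]; unsigned nr; };

        /*
         * For ordinary keys a newer write-buffer entry simply replaces the older
         * one; accounting keys carry deltas, so merging two entries for the same
         * key means summing their counters into the entry that will be kept.
         */
        static void wb_accumulate(struct wb_acct *dst, const struct wb_acct *src)
        {
        	for (unsigned i = 0; i < dst->nr && i < src->nr; i++)
        		dst->v[i] += src->v[i];
        }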
* bcachefs: metadata version bucket_stripe_sectors  (Kent Overstreet, 2024-07-14, 1 file, -0/+1)
    New on disk format version for bch_alloc->stripe_sectors and BCH_DATA_unstriped - accounting for unstriped data in stripe buckets.
    Upgrade/downgrade requires regenerating alloc info - but only if erasure coding is in use.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: add capacity, reserved to fs_alloc_debug_to_text()  (Kent Overstreet, 2024-07-14, 1 file, -0/+1)
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Discard, invalidate workers are now per device  (Kent Overstreet, 2024-06-25, 1 file, -5/+11)
    There's no reason for discards to be single threaded across all devices; this will improve performance on multi device setups.
    Additionally, making them per-device simplifies the refcounting on bch_dev->io_ref; we now hold it for the duration that the discard path is running, which fixes a race between the discard path and device removal.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Guard against overflowing LRU_TIME_BITS  (Kent Overstreet, 2024-06-19, 1 file, -0/+5)
    LRUs only have 48 bits for the time field (i.e. LRU order); thus we need overflow checks and guards.
    Reported-by: syzbot+df3bf3f088dcaa728857@syzkaller.appspotmail.com
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
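    A sketch of the kind of guard this implies, assuming only what the message states (a 48-bit time field); the macro and function names are illustrative:

        #include <stdint.h>

        #define LRU_TIME_BITS	48
        #define LRU_TIME_MAX	((1ULL << LRU_TIME_BITS) - 1)

        /* Clamp an LRU "time" (ordering key) so it can't overflow the 48-bit field. */
        static inline uint64_t lru_time_clamp(uint64_t time)
        {
        	return time < LRU_TIME_MAX ? time : LRU_TIME_MAX;
        }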
* bcachefs: Split out btree_write_submit_wq  (Kent Overstreet, 2024-06-10, 1 file, -1/+2)
    Split the workqueues for btree read completions and btree write submissions; we don't want concurrency control on btree read completions, but we do want concurrency control on write submissions, else blocking in submit_bio() will cause a ton of kworkers to be allocated.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
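    A kernel-style sketch of the split, with illustrative queue names and flags (the exact flags bcachefs uses aren't stated here); the point is the max_active difference between the two queues:

        #include <linux/workqueue.h>
        #include <linux/errno.h>

        static struct workqueue_struct *btree_read_complete_wq;
        static struct workqueue_struct *btree_write_submit_wq;

        static int alloc_btree_io_workqueues(void)
        {
        	/* read completions: default max_active, no tight concurrency cap */
        	btree_read_complete_wq = alloc_workqueue("btree_read_complete",
        						 WQ_HIGHPRI | WQ_MEM_RECLAIM, 0);
        	/* write submissions: max_active = 1, so blocking in submit_bio()
        	 * doesn't spawn a pile of kworkers */
        	btree_write_submit_wq = alloc_workqueue("btree_write_submit",
        						WQ_HIGHPRI | WQ_MEM_RECLAIM, 1);

        	if (!btree_read_complete_wq || !btree_write_submit_wq) {
        		if (btree_read_complete_wq)
        			destroy_workqueue(btree_read_complete_wq);
        		if (btree_write_submit_wq)
        			destroy_workqueue(btree_write_submit_wq);
        		return -ENOMEM;
        	}
        	return 0;
        }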
* bcachefs: btree_gc can now handle unknown btrees  (Kent Overstreet, 2024-05-28, 1 file, -43/+1)
    Compatibility fix - we no longer have a separate table for which order gc walks btrees in, and special case the stripes btree directly.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: add no_invalid_checks flag  (Thomas Bertschinger, 2024-05-09, 1 file, -1/+2)
    Setting this flag on a filesystem results in validity checks being skipped when writing bkeys. This flag will be used by tooling that deliberately injects corruption into a filesystem in order to exercise fsck. It shouldn't be set outside of testing/debugging code.
    Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Allocator prefers not to expand mi.btree_allocated bitmap  (Kent Overstreet, 2024-05-08, 1 file, -1/+1)
    We now have a small bitmap in the member info section of the superblock for "regions that have btree nodes", so that if we ever have to scan for btree nodes in repair we don't have to scan the whole device(s).
    This tweaks the allocator to prefer allocating from regions that are already marked in this bitmap.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
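    An illustrative sketch of the preference, assuming a simple 64-region bitmap and invented helper names; the real bitmap lives in the superblock member info:

        #include <stdint.h>
        #include <stdbool.h>
        #include <stddef.h>

        /* Illustrative only: 64 regions per device, tracked in one 64-bit bitmap. */
        static bool region_has_btree_nodes(uint64_t bitmap, uint64_t bucket,
        				   uint64_t buckets_per_region)
        {
        	uint64_t region = bucket / buckets_per_region;

        	return region < 64 && (bitmap & (1ULL << region));
        }

        /* Prefer a free bucket in an already-marked region; fall back to any free bucket. */
        static long pick_bucket(const uint64_t *free_buckets, size_t nr,
        			uint64_t bitmap, uint64_t buckets_per_region)
        {
        	for (size_t i = 0; i < nr; i++)
        		if (region_has_btree_nodes(bitmap, free_buckets[i], buckets_per_region))
        			return (long) i;

        	return nr ? 0 : -1;
        }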
* bcachefs: ptr_stale() -> dev_ptr_stale()  (Kent Overstreet, 2024-05-08, 1 file, -1/+1)
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: move replica_set from bch_dev to bch_fs  (Kent Overstreet, 2024-05-08, 1 file, -3/+1)
    This is needed for the next patch - the write submit path has to be able to allocate a replica bio even when we weren't able to get a ref on the device.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Debug asserts for ca->ref  (Kent Overstreet, 2024-05-08, 1 file, -0/+6)
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill gc_init_recurse()  (Kent Overstreet, 2024-05-08, 1 file, -1/+1)
    This unifies the online and offline btree gc passes; we're not yet running it online.
    We now iterate over one level of the btree at a time - the same as check_extents_to_backpointers(); this ordering preserves order of keys regardless of btree splits and merges, which will be important when we re-enable online gc.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: kill gc looping for bucket gens  (Kent Overstreet, 2024-05-08, 1 file, -1/+0)
    Looping when we change a bucket gen is not ideal - it means we risk failure by going into an infinite loop, and it's better to make forward progress even if fsck doesn't fix everything.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: journal seq blacklist gc no longer has to walk btree  (Kent Overstreet, 2024-05-08, 1 file, -1/+0)
    Since btree_ptr_v2, we no longer require the journal seq blacklist table for skipping blacklisted bsets (btree node entries); the pointer to a given node indicates how much data is present.
    Therefore there's no longer any need for journal seq blacklist gc to walk the btree - we can prune entries older than journal last_seq.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
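    A minimal sketch of pruning by last_seq, with an invented table layout:

        #include <stdint.h>
        #include <stddef.h>

        struct blacklist_entry { uint64_t start, end; };

        /*
         * Keep only entries that could still matter, i.e. whose range isn't
         * entirely older than the journal's last_seq; no btree walk needed.
         * Returns the new table size.
         */
        static size_t blacklist_prune(struct blacklist_entry *t, size_t nr, uint64_t last_seq)
        {
        	size_t dst = 0;

        	for (size_t i = 0; i < nr; i++)
        		if (t[i].end >= last_seq)
        			t[dst++] = t[i];
        	return dst;
        }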
* bcachefs: Move gc of bucket.oldest_gen to workqueue  (Kent Overstreet, 2024-05-08, 1 file, -3/+2)
    This is a nice cleanup - and we've also been having problems with kthread creation in the mount path.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: member helper cleanups  (Kent Overstreet, 2024-05-08, 1 file, -5/+0)
    Some renaming for better consistency:
      bch2_member_exists   -> bch2_member_alive
      bch2_dev_exists      -> bch2_member_exists
      bch2_dev_exists2     -> bch2_dev_exists
      bch_dev_locked       -> bch2_dev_locked
      bch_dev_bkey_exists  -> bch2_dev_bkey_exists
    New helper: bch2_dev_safe
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bucket_valid()  (Kent Overstreet, 2024-05-08, 1 file, -0/+1)
    Cut out a branch from doing it the obvious way.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: add btree_node_merging_disabled debug param  (Kent Overstreet, 2024-05-08, 1 file, -0/+2)
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix missing write refs in fs fio paths  (Kent Overstreet, 2024-04-13, 1 file, -0/+2)
    bch2_journal_flush_seq requires us to have a write ref.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Reconstruct missing snapshot nodes  (Kent Overstreet, 2024-04-03, 1 file, -0/+1)
    When the snapshots btree is gone, we'll have to delete huge amounts of data - unless we can reconstruct it by looking at the keys that refer to it.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Flag btrees with missing data  (Kent Overstreet, 2024-04-03, 1 file, -0/+1)
    We need this to know when we should attempt to reconstruct the snapshots btree.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Repair pass for scanning for btree nodes  (Kent Overstreet, 2024-04-03, 1 file, -0/+3)
    If a btree root or interior btree node goes bad, we're going to lose a lot of data, unless we can recover the nodes that it pointed to by scanning.
    Fortunately btree node headers are fully self describing, and additionally the magic number is xored with the filesystem UUID, so we can do so safely.
    This implements the scanning - the next patch will rework topology repair to make use of the found nodes.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
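    A sketch of the magic check the scan can rely on; the struct, constant, and field names are placeholders, not the on-disk format:

        #include <stdint.h>
        #include <stdbool.h>

        /* Placeholder header: real btree nodes carry much more self-describing state. */
        struct btree_node_header { uint64_t magic; };

        #define BTREE_NODE_MAGIC 0xb123c0ffeec0ffeeULL	/* placeholder constant */

        /*
         * Because the on-disk magic is xored with (part of) the filesystem UUID,
         * a scan can cheaply reject both random data and nodes from other filesystems.
         */
        static bool looks_like_our_btree_node(const struct btree_node_header *h, uint64_t uuid_lo)
        {
        	return h->magic == (BTREE_NODE_MAGIC ^ uuid_lo);
        }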
* bcachefs: Split out recovery_passes.c  (Kent Overstreet, 2024-03-31, 1 file, -1/+1)
    We've grown a fair amount of code for managing recovery passes: tracking which ones we're running, which ones need to be run, and flagging in the superblock which ones need to be run on the next recovery.
    So it's worth splitting out into its own file; this code is pretty different from the code in recovery.c.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Move snapshot table size to struct snapshot_table  (Kent Overstreet, 2024-03-31, 1 file, -1/+0)
    We need to add bounds checking for snapshot table accesses - it turns out there are cases where we do need to use the snapshots table before fsck checks have completed (and indeed, fsck may not have been run).
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Split out btree_node_rewrite_worker  (Kent Overstreet, 2024-03-17, 1 file, -0/+2)
    This fixes a deadlock due to using btree_interior_update_worker for non-interior updates - async btree node rewrites were blocking, and then blocking other interior updates.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: time_stats: split stats-with-quantiles into a separate structure  (Darrick J. Wong, 2024-03-13, 1 file, -1/+1)
    Currently, struct time_stats has the optional ability to quantize the information that it collects. This is /probably/ useful for callers who want to see quantized information, but it more than doubles the size of the structure from 224 bytes to 464.
    For users who don't care about that (e.g. upcoming xfs patches) and want to avoid wasting 240 bytes per counter, split the two into separate pieces.
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
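    A sketch of the resulting shape, with illustrative field names and sizes: the quantile state wraps the base struct, and quantile-aware callers recover the outer struct from a base pointer:

        #include <stdint.h>
        #include <stddef.h>

        /* Shapes are illustrative; the real structs live in fs/bcachefs/time_stats.h. */
        struct time_stats {
        	uint64_t count, total_ns, max_ns;	/* the always-present counters */
        };

        struct time_stats_quantiles {
        	struct time_stats	stats;		/* embedded base stats */
        	uint64_t		weights[31];	/* the optional, larger quantile state */
        };

        /* Quantile-aware callers recover the outer struct from a base pointer. */
        #define stats_to_quantiles(s) \
        	((struct time_stats_quantiles *) \
        	 ((char *) (s) - offsetof(struct time_stats_quantiles, stats)))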
* bcachefs: pull out time_stats.[ch]  (Kent Overstreet, 2024-03-13, 1 file, -2/+1)
    Prep work for lifting out of fs/bcachefs/.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rename journal_keys.d -> journal_keys.data  (Kent Overstreet, 2024-03-13, 1 file, -3/+3)
    This will let us use some darray helpers in the next patch.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Split out discard fastpath  (Kent Overstreet, 2024-03-13, 1 file, -1/+5)
    Buckets usually can't be discarded until the transaction that made them empty has been committed in the journal.
    Tracing has indicated that we're queuing the discard worker excessively, only for it to skip over many buckets that are still waiting on a journal commit, discarding only one or two buckets per iteration. We want to switch to only queuing the discard worker after a journal flush write, but there's an important optimization we need to preserve: if a bucket becomes empty and it was never committed in the journal while it was in use, we want to discard it and reuse it right away - since overwriting it before the previous writes are flushed from the device cache means those writes only cost bus bandwidth.
    So, this patch implements a fast path for buckets that can be discarded right away. We need new locking between the two discard workers; the new list of buckets being discarded provides that locking.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_print_opts()  (Kent Overstreet, 2024-03-13, 1 file, -0/+3)
    Make sure early error messages get redirected, for kernel-fsck-from-userland.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: BTREE_ID_subvolume_children  (Kent Overstreet, 2024-03-13, 1 file, -0/+1)
    Add a btree to record parent -> child subvolume relationships, according to the filesystem hierarchy.
    The subvolume_children btree is a bitset btree: if a bit is set at pos p, that means p.offset is a child of subvolume p.inode.
    This will be used for efficiently listing subvolumes, as well as recursive deletion.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
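    A sketch of the key convention described above, with a simplified bpos (the real one also carries a snapshot field):

        #include <stdint.h>

        /* Simplified: the real struct bpos also has a snapshot field. */
        struct bpos { uint64_t inode, offset; };

        /*
         * Bitset btree convention from the commit message: a key present at pos p
         * means subvolume p.offset is a child of subvolume p.inode.
         */
        static struct bpos subvolume_children_pos(uint64_t parent, uint64_t child)
        {
        	return (struct bpos) { .inode = parent, .offset = child };
        }

    Listing the children of a subvolume is then a range scan over keys starting at { .inode = parent, .offset = 0 }.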
* bcachefs: Clamp replicas_required to replicas  (Kent Overstreet, 2024-02-13, 1 file, -0/+12)
    This prevents going emergency read only when the user has specified replicas_required > replicas.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
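    The clamp itself is just a min(); a sketch of the effective value, with invented names:

        /* Sketch of the effective value used on the write side when the options disagree. */
        static inline unsigned effective_replicas_required(unsigned replicas_required,
        						   unsigned replicas)
        {
        	return replicas_required < replicas ? replicas_required : replicas;
        }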
* bcachefs: Prep work for variable size btree node buffers  (Kent Overstreet, 2024-01-21, 1 file, -5/+0)
    bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots.
    And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analogous to XFS's AGIs.
    Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
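    A sketch of what a per-node buffer size helper can look like; the field name here is illustrative:

        #include <stddef.h>

        /* Illustrative: the buffer size is tracked per node as a power of two. */
        struct btree { unsigned char byte_order; /* log2(buffer size in bytes) */ };

        static inline size_t btree_buf_bytes(const struct btree *b)
        {
        	return (size_t) 1 << b->byte_order;
        }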
* bcachefs: add time_stats for btree_node_read_done()  (Kent Overstreet, 2024-01-05, 1 file, -0/+1)
    Seeing weird latency issues in the btree node read path - add one for bch2_btree_node_read_done().
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Online fsck can now fix errors  (Kent Overstreet, 2024-01-05, 1 file, -4/+1)
    BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing fsck errors.
    Also: set fix_errors to ask by default when fsck is running.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: factor out thread_with_file, thread_with_stdio  (Kent Overstreet, 2024-01-05, 1 file, -8/+12)
    thread_with_stdio now knows how to handle input - fsck can now prompt to fix errors.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: track transaction durations  (Kent Overstreet, 2024-01-05, 1 file, -0/+1)
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Make sure allocation failure errors are logged  (Kent Overstreet, 2024-01-01, 1 file, -0/+6)
    The previous patch fixed a bug in allocation path error handling, and it would've been noticed sooner had it been logged properly.
    Generally speaking, errors that shouldn't happen in normal operation and are being returned up the stack should be logged: the write path was already logging IO errors, but non-IO errors were missed.
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>