aboutsummaryrefslogtreecommitdiff
path: root/fs/bcachefs/buckets.h (follow)
Commit message (Collapse)AuthorAgeFilesLines
...
* bcachefs: Improve dev_alloc_debug_to_text()Kent Overstreet2023-10-221-0/+2
| | | | | | Now we also print the number of buckets reserved for each watermark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_mark_key() now takes btree_id & levelKent Overstreet2023-10-221-6/+12
| | | | | | | btree & level are passed to trans_mark - for backpointers - bch2_mark_key() should take them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: New on disk format: BackpointersKent Overstreet2023-10-221-0/+19
| | | | | | | | | | | | | | | | | | This patch adds backpointers: we now have a reverse index from device and offset on that device (specifically, offset within a bucket) back to btree nodes and (non cached) data extents. The first 40 backpointers within a bucket are stored in the alloc key; after that backpointers spill over to the next backpointers btree. This is to help avoid performance regressions from additional btree updates on large streaming workloads. This patch adds all the code for creating, checking and repairing backpointers. The next patch in the series is going to use backpointers for copygc - finally getting rid of the need to scan all extents to do copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Btree write bufferKent Overstreet2023-10-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new method of doing btree updates - a straight write buffer, implemented as a flat fixed size array. This is only useful when we don't need to read from the btree in order to do the update, and when reading is infrequent - perfect for the LRU btree. This will make LRU btree updates fast enough that we'll be able to use it for persistently indexing buckets by fragmentation, which will be a massive boost to copygc performance. Changes: - A new btree_insert_type enum, for btree_insert_entries. Specifies btree, btree key cache, or btree write buffer. - bch2_trans_update_buffered(): updates via the btree write buffer don't need a btree path, so we need a new update path. - Transaction commit path changes: The update to the btree write buffer both mutates global, and can fail if there isn't currently room. Therefore we do all write buffer updates in the transaction all at once, and also if it fails we have to revert filesystem usage counter changes. If there isn't room we flush the write buffer in the transaction commit error path and retry. - A new persistent option, for specifying the number of entries in the write buffer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fixes for building in userspaceKent Overstreet2023-10-221-0/+4
| | | | | | | | | | | | | | - Marking a non-static function as inline doesn't actually work and is now causing problems - drop that - Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages with bcachefs (filesystem name) - Userspace doesn't have real percpu variables (maybe we can get this fixed someday), put an #ifdef around bch2_disk_reservation_add() fastpath Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Move bkey bkey_unpack_key() to bkey.hKent Overstreet2023-10-221-2/+0
| | | | | | | | | | Long ago, bkey_unpack_key() was added to bset.h instead of bkey.h because bkey.h didn't include btree_types.h, which it needs for the compiled unpack function. This patch finally moves it to the proper location. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Optimize bch2_dev_usage_read()Kent Overstreet2023-10-221-1/+9
| | | | | | | | | - add bch2_dev_usage_read_fast(), which doesn't return by value - bch_dev_usage is big enough that we don't want the silent memcpy - tweak the allocation path to only call bch2_dev_usage_read() once per bucket allocated Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix __dev_available().Daniel Hill2023-10-221-6/+6
| | | | | | | | __dev_available() now calculates available buckets correctly. Previously it would almost always return 0 when we have cached data. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Split out dev_buckets_free()Kent Overstreet2023-10-221-0/+13
| | | | | | | | | | | | | Previously, dev_buckets_available() only counted buckets that are eligible to be allocated right now - i.e. buckets that don't have cached data, or need discard, or need gc gens, etc. But most users of this function want to know how many buckets are eligible to be allocated from without moving data around - copygc, allocator striping, which means we should be including cached data buckets etc. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Plumb btree_id & level to trans_markKent Overstreet2023-10-221-32/+5
| | | | | | | For backpointers, we'll need the full key location - that means btree_id and btree level. This patch plumbs it through. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Fold bucket_state in to BCH_DATA_TYPES()Kent Overstreet2023-10-221-13/+10
| | | | | | | | | | | | | | | | | Previously, we were missing accounting for buckets in need_gc_gens and need_discard states. This matters because buckets in those states need other btree operations done before they can be used, so they can't be conuted when checking current number of free buckets against the allocation watermark. Also, we weren't directly counting free buckets at all. Now, data type 0 == BCH_DATA_free, and free buckets are counted; this means we can get rid of the separate (poorly defined) count of unavailable buckets. This is a new on disk format version, with upgrade and fsck required for the accounting changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Convert .key_invalid methods to printbufsKent Overstreet2023-10-221-2/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: gc mark fn fixes, cleanupsKent Overstreet2023-10-221-3/+3
| | | | | | | | | mark_stripe_bucket() was busted; it was using @new unitialized. Also, clean up all the gc mark functions, and convert them to the same style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Kill struct bucket_markKent Overstreet2023-10-221-14/+10
| | | | | | | | This switches struct bucket to using a lock, instead of cmpxchg. And now that the protected members no longer need to fit into a u64, we can expand the sector counts to 32 bits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill main in-memory bucket arrayKent Overstreet2023-10-221-16/+4
| | | | | | | All code using the in-memory bucket array, excluding GC, has now been converted to use the alloc btree directly - so we can finally delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_dev_usage_update() no longer depends on bucket_markKent Overstreet2023-10-221-7/+0
| | | | | | | | | | | | This is one of the last steps in getting rid of the main in-memory bucket array. This changes bch2_dev_usage_update() to take bkey_alloc_unpacked instead of bucket_mark, and for the places where we are in fact working with bucket_mark and don't have bkey_alloc_unpacked, we add a wrapper that takes bucket_mark and converts to bkey_alloc_unpacked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill allocator threads & freelistsKent Overstreet2023-10-221-34/+28
| | | | | | | | | | | | | | | Now that we have new persistent data structures for the allocator, this patch converts the allocator to use them. Now, foreground bucket allocation uses the freespace btree to find buckets to allocate, instead of popping buckets off the freelist. The background allocator threads are no longer needed and are deleted, as well as the allocator freelists. Now we only need background tasks for invalidating buckets containing cached data (when we are low on empty buckets), and for issuing discards. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Freespace, need_discard btreesKent Overstreet2023-10-221-10/+0
| | | | | | | | | | | This adds two new btrees for the upcoming allocator rewrite: an extents btree of free buckets, and a btree for buckets awaiting discards. We also add a new trigger for alloc keys to keep the new btrees up to date, and a compatibility path to initialize them on existing filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: KEY_TYPE_alloc_v4Kent Overstreet2023-10-221-0/+8
| | | | | | | | | | | | This introduces a new alloc key which doesn't use varints. Soon we'll be adding backpointers and storing them in alloc keys, which means our pack/unpack workflow for alloc keys won't really work - we'll need to be mutating alloc keys in place. Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that converts older types of alloc keys to v4 if needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Move trigger fns to bkey_opsKent Overstreet2023-10-221-0/+13
| | | | | | | | This replaces the switch statements in bch2_mark_key(), bch2_trans_mark_key() with new bkey methods - prep work for the next patch, which fixes BTREE_TRIGGER_WANTS_OLD_AND_NEW. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Consolidate trigger code a bitKent Overstreet2023-10-221-3/+0
| | | | | | | Upcoming patches are doing more work on the triggers code, this patch just moves code around. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: bch2_trans_mark_key() now takes a bkey_i *Kent Overstreet2023-10-221-1/+26
| | | | | | | We're now coming up with triggers that modify the update being done. A bkey_s_c is const - bkey_i is the correct type to be using here. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: bch2_gc_gens() no longer uses bucket arrayKent Overstreet2023-10-221-6/+0
| | | | | | | | Like the previous patches, this converts bch2_gc_gens() to use the alloc btree directly, and private arrays of generation numbers for its own recalculation of oldest_gen. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Ignore cached data when calculating fragmentationKent Overstreet2023-10-221-5/+0
| | | | | | | | | | | | | | | | Previously, bucket fragmentation was considered to be bucket size - total amount of live data, both dirty and cached. This meant that if a bucket was full but only a small amount of data in it was dirty - the rest cached, we'd get stuck: copygc wouldn't move the dirty data out of the bucket and the allocator wouldn't be able to invalidate and drop the cached data. This changes fragmentation to exclude cached data, so that copygc will evacuate these buckets and copygc/the allocator will always be able to make forward progress. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: New data structure for buckets waiting on journal commitKent Overstreet2023-10-221-8/+0
| | | | | | | | | | | | | Implement a hash table, using cuckoo hashing, for empty buckets that are waiting on a journal commit before they can be reused. This replaces the journal_seq field of bucket_mark, and is part of eventually getting rid of the in memory bucket array. We may need to make bch2_bucket_needs_journal_commit() lockless, pending profiling and testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: New in-memory array for bucket gensKent Overstreet2023-10-221-1/+19
| | | | | | | | The main in-memory bucket array is going away, but we'll still need to keep bucket generations in memory, at least for now - ptr_stale() needs to be an efficient operation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Separate out gc_bucket()Kent Overstreet2023-10-221-4/+14
| | | | | | | | Since the main in memory bucket array is going away, we don't want to be calling bucket() or __bucket() when what we want is the GC in-memory bucket. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Kill ptr_bucket_mark()Kent Overstreet2023-10-221-13/+7
| | | | | | Only used in one place, we can just delete it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Push c->mark_lock usage down to where it is neededKent Overstreet2023-10-221-1/+1
| | | | | | | | | | This changes the bch2_mark_key() and related paths to take mark lock where it is needed, instead of taking it in the upper transaction commit path - by pushing down locking we'll be able to handle fsck errors locally instead of requiring a separate check in the btree_gc code for replicas being marked. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Kill bch2_replicas_delta_list_marked()Kent Overstreet2023-10-221-1/+1
| | | | | | | | | This changes bch2_trans_fs_usage_apply() to handle failure (replicas entry missing) by reverting the changes it made - meaning we can make the main transaction commit path a bit slimmer, and perhaps also simplify some locking in upcoming patches. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Run insert triggers before overwrite triggersKent Overstreet2023-10-221-2/+0
| | | | | | | | | | | | | | | Currently, btree triggers are run in natural key order, which presents a problem for fallocate in INSERT_RANGE mode: since we're moving existing extents to higher offsets, the trigger for deleting the old extent runs before the trigger that adds the new extent, potentially leading to indirect extents being deleted that shouldn't be when the delete causes the refcount to hit 0. This changes the order we run triggers so that for a givin btree, we run all insert triggers before overwrite triggers, nicely sidestepping this issue. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Convert bch2_mark_key() to take a btree_trans *Kent Overstreet2023-10-221-1/+1
| | | | | | | | This helps to unify the interface between bch2_mark_key() and bch2_trans_mark_key() - and it also gives access to the journal reservation and journal seq in the mark_key path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: btree_pathKent Overstreet2023-10-221-3/+3
| | | | | | | | | | | | | | | This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Extensive triggers cleanupsKent Overstreet2023-10-221-19/+4
| | | | | | | | | | | | | | - We no longer mark subsets of extents, they're marked like regular keys now - which means we can drop the offset & sectors arguments to trigger functions - Drop other arguments that are no longer needed anymore in various places - fs_usage - Drop the logic for handling extents in bch2_mark_update() that isn't needed anymore, to match bch2_trans_mark_update() - Better logic for hanlding the BTREE_ITER_CACHED_NOFILL case, where we don't have an old key to mark Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Inline fastpath of bch2_disk_reservation_add()Kent Overstreet2023-10-221-5/+25
| | | | | | | The fastpath now doesn't even disable preemption - instead we use a (non locked) cmpxchg. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: statfs resports incorrect avail blocksDan Robertson2023-10-221-0/+7
| | | | | | | | | The current implementation of bch_statfs does not scale the number of available blocks provided in f_bavail by the reserve factor. This causes an allocation of a file of this size to fail. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: gc shouldn't care about owned_by_allocatorKent Overstreet2023-10-221-2/+1
| | | | | | | | | | The owned_by_allocator field is a purely in memory thing, even if/when we bring back GC at runtime there's no need for it to be recalculating this field. This is prep work for pulling it out of struct bucket, and eventually getting rid of the bucket array. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix bch2_trans_mark_dev_sb()Kent Overstreet2023-10-221-5/+3
| | | | | Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill bch2_fs_usage_scratch_get()Kent Overstreet2023-10-221-9/+1
| | | | | | | | This is an important cleanup, eliminating an unnecessary copy in the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix livelock calling bch2_mark_bkey_replicas()Kent Overstreet2023-10-221-0/+2
| | | | | | | | | | The bug was that we were trying to find a replicas entry that wasn't sorted - but, we can also simplify the code by not using bch2_mark_bkey_replicas and instead ensuring the list of replicas entries exists directly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* buckets.c fixups XXX squashKent Overstreet2023-10-221-2/+0
| | | | | Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix copygc thresholdKent Overstreet2023-10-221-10/+16
| | | | | | | | | | | | | | | | Awhile back the meaning of is_available_bucket() and thus also bch_dev_usage->buckets_unavailable changed to include buckets that are owned by the allocator - this was so that the stat could be persisted like other allocation information, and wouldn't have to be regenerated by walking each bucket at mount time. This broke copygc, which needs to consider buckets that are reclaimable and haven't yet been grabbed by the allocator thread and moved onta freelist. This patch fixes that by adding dev_buckets_reclaimable() for copygc and the allocator thread, and cleans up some of the callers a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Journal updates to dev usageKent Overstreet2023-10-221-2/+5
| | | | | | | | This eliminates the need to scan every bucket to regenerate dev_usage at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Persist 64 bit io clocksKent Overstreet2023-10-221-8/+1
| | | | | | | | | | | | | | | | Originally, bcachefs - going back to bcache - stored, for each bucket, a 16 bit counter corresponding to how long it had been since the bucket was read from. But, this required periodically rescaling counters on every bucket to avoid wraparound. That wasn't an issue in bcache, where we'd perodically rewrite the per bucket metadata all at once, but in bcachefs we're trying to avoid having to walk every single bucket. This patch switches to persisting 64 bit io clocks, corresponding to the 64 bit bucket timestaps introduced in the previous patch with KEY_TYPE_alloc_v2. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Mark superblocks transactionallyKent Overstreet2023-10-221-0/+6
| | | | | | | | | More work towards getting rid of the in memory struct bucket: this path adds code for marking superblock and journal buckets via the btree, and uses it in the device add and journal resize paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill bch2_invalidate_bucket()Kent Overstreet2023-10-221-2/+0
| | | | | | | | | | | | | This patch is working towards eventually getting rid of the in memory struct bucket, and relying only on the btree representation. Since bch2_invalidate_bucket() was only used for incrementing gens, not invalidating cached data, no other counters were being changed as a side effect - meaning it's safe for the allocator code to increment the bucket gen directly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Refactor dev usageKent Overstreet2023-10-221-10/+1
| | | | | | | This is to make it more amenable for serialization. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix integer overflow in bch2_disk_reservation_get()Kent Overstreet2023-10-221-4/+3
| | | | | | | | | | The sectors argument shouldn't have been a u32 - it can be up to U32_MAX (i.e. fallocate creating persistent reservations), and if replication is enabled we'll overflow when we calculate the real number of sectors to reserve. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't write bucket IO time lazilyKent Overstreet2023-10-221-6/+0
| | | | | | | | | | | | With the btree key cache code, we don't need to update the alloc btree lazily - and this will mean we can remove the bch2_alloc_write() call in the shutdown path. Future work: we really need to expend the bucket IO clocks from 16 to 64 bits, so that we don't have to rescale them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Update transactional triggers interface to pass old & new keysKent Overstreet2023-10-221-1/+1
| | | | | | | | | | | This is needed to fix a bug where we're overflowing iterators within a btree transaction, because we're updating the stripes btree (to update block counts) and the stripes btree trigger is unnecessarily updating the alloc btree - it doesn't need to update the alloc btree when the pointers within a stripe aren't changing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>