aboutsummaryrefslogtreecommitdiff
path: root/fs/bcachefs/btree_iter.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* bcachefs: Convert for_each_btree_node() to lockrestart_do()Kent Overstreet2024-08-131-0/+1
| | | | | | | | | | for_each_btree_node() now works similarly to for_each_btree_key(), where the loop body is passed as an argument to be passed to lockrestart_do(). This now calls trans_begin() on every loop iteration - which fixes an SRCU warning in backpointers fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add missing path_traverse() to btree_iter_next_node()Kent Overstreet2024-08-071-0/+5
| | | | | | | This fixes a bug exposed by the next path - we pop an assert in path_set_should_be_locked(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix fsck warning about btree_trans not passed to fsck errorKent Overstreet2024-07-181-1/+2
| | | | | | | | | | | | If a btree_trans is in use it's supposed to be passed to fsck_err so that it can be unlocked if we're waiting on userspace input; but the btree IO paths do call fsck errors where a btree_trans exists on the stack but it's not passed through. But it's ok, because it's unlocked while doing IO. Fixes: a850bde6498b ("bcachefs: fsck_err() may now take a btree_trans") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add lockdep support for btree node locksKent Overstreet2024-07-141-3/+21
| | | | | | | | | | | | | | | | | | | | This adds lockdep tracking for held btree locks with a single dep_map in btree_trans, i.e. tracking all held btree locks as one object. This is more practical and more useful than having lockdep track held btree locks individually, because - we can take more locks than lockdep can track (unbounded, now that we have dynamically resizable btree paths) - there's no lock ordering between btree locks for lockdep to track (we do cycle detection) - and this makes it easy to teach lockdep that btree locks are not safe to hold while invoking memory reclaim. The last rule is one that lockdep would never learn, because we only do trylock() from within shrinkers - but we very much do not want to be invoking memory reclaim while holding btree node locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Simplify btree key cache fill pathKent Overstreet2024-07-141-5/+4
| | | | | | | Don't allocate the new bkey_cached until after we've done the btree lookup; this means we can kill bkey_cached.valid. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: kill key cache arg to bch2_assert_pos_locked()Kent Overstreet2024-07-141-14/+13
| | | | | | | this is an internal implementation detail - and we're improving key cache coherency Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Plumb more logging through stdio redirectKent Overstreet2024-07-141-2/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fsck_err() may now take a btree_transKent Overstreet2024-07-141-0/+14
| | | | | | | | | fsck_err() now optionally takes a btree_trans; if the current thread has one, it is required that it be passed. The next patch will use this to unlock when waiting for user input. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Disk space accounting rewriteKent Overstreet2024-07-141-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | Main part of the disk accounting rewrite. This is a wholesale rewrite of the existing disk space accounting, which relies on percepu counters that are sharded by journal buffer, and rolled up and added to each journal write. With the new scheme, every set of counters is a distinct key in the accounting btree; this fixes scaling limitations of the old scheme, where counters took up space in each journal entry and required multiple percpu counters. Now, in memory accounting requires a single set of percpu counters - not multiple for each in flight journal buffer - and in the future we'll probably also have counters that don't use in memory percpu counters, they're not strictly required. An accounting update is now a normal btree update, using the btree write buffer path. At transaction commit time, we apply accounting updates to the in memory counters, which are percpu counters indexed in an eytzinger tree by the accounting key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Set PF_MEMALLOC_NOFS when trans->lockedKent Overstreet2024-07-111-3/+4
| | | | | | proper lock ordering is: fs_reclaim -> btree node locks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix btree_trans list orderingKent Overstreet2024-06-231-7/+2
| | | | | | | | | | | | | | The debug code relies on btree_trans_list being ordered so that it can resume on subsequent calls or lock restarts. However, it was using trans->locknig_wait.task.pid, which is incorrect since btree_trans objects are cached and reused - typically by different tasks. Fix this by switching to pointer order, and also sort them lazily when required - speeding up the btree_trans_get() fastpath. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix race between trans_put() and btree_transactions_read()Kent Overstreet2024-06-231-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | debug.c was using closure_get() on a different thread's closure where the we don't know if the object being refcounted is alive. We keep btree_trans objects on a list so they can be printed by debug code, and because it is cost prohibitive to touch the btree_trans list every time we allocate and free btree_trans objects, cached objects are also on this list. However, we do not want the debug code to see cached but not in use btree_trans objects - critically because the btree_paths array will have been freed (if it was reallocated). closure_get() is also incorrect to use when that get may race with it hitting zero, i.e. we must already have a ref on the object or know the ref can't currently hit 0 for other reasons (as used in the cycle detector). to fix this, use the previously introduced closure_get_not_zero(), closure_return_sync(), and closure_init_stack_release(); the debug code now can only take a ref on a trans object if it's alive and in use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix bch2_trans_put()Kent Overstreet2024-06-191-3/+8
| | | | | | | | | | | | | reference: https://github.com/koverstreet/bcachefs/issues/692 trans->ref is the reference used by the cycle detector, which walks btree_trans objects of other threads to walk the graph of held locks and issue wakeups when an abort is required. We have to wait for the ref to go to 1 before freeing trans->paths or clearing trans->locking_wait.task. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add missing synchronize_srcu_expedited() call when shutting downKent Overstreet2024-06-101-1/+3
| | | | | | | | | | | We use the polling interface to srcu for tracking pending frees; when shutting down we don't need to wait for an srcu barrier to free them, but SRCU still gets confused if we shutdown with an outstanding grace period. Reported-by: syzbot+6a038377f0a594d7d44e@syzkaller.appspotmail.com Reported-by: syzbot+0ece6edfd05ed20e32d9@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Delete incorrect BTREE_ID_NR assertionKent Overstreet2024-06-101-6/+1
| | | | | | | for forwards compat we now explicitly allow mounting and using filesystems with unknown btrees, and we have to walk them for fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve bch2_assert_pos_locked()Kent Overstreet2024-05-201-0/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: x-macroize journal flags enumsKent Overstreet2024-05-081-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: uninline set_btree_iter_dontneed()Kent Overstreet2024-05-081-0/+13
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix btree_path_clone() ip_allocatedKent Overstreet2024-05-081-3/+7
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_trans_verify_not_unlocked()Kent Overstreet2024-05-081-0/+16
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_path_can_relock()Kent Overstreet2024-05-081-4/+29
| | | | | | | | With the new assertions, we shouldn't be holding locks when trans->locked is false, thus, we shouldn't use relock when we just want to check if we can relock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: trans->lockedKent Overstreet2024-05-081-2/+7
| | | | | | | Add a field for tracking whether a transaction object holds btree locks, and assertions to verify state. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: maintain lock invariants in btree_iter_next_node()Kent Overstreet2024-05-081-0/+3
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: get_unlocked_mut_path -> bch2_path_get_unlocked_mutKent Overstreet2024-05-081-0/+16
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: iter/update/trigger/str_hash flag cleanupKent Overstreet2024-05-081-83/+83
| | | | | | | | | | | Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: __BTREE_ITER_ALL_SNAPSHOTS -> BTREE_ITER_SNAPSHOT_FIELDKent Overstreet2024-05-081-2/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: When traversing to interior nodes, propagate result to paths to ↵Kent Overstreet2024-05-081-0/+11
| | | | | | same leaf node Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_path_to_text()Kent Overstreet2024-05-081-5/+47
| | | | | | | Long form version of bch2_btree_path_to_text() - useful in error messages and tracepoints. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: prt_printf() now respects \r\n\tKent Overstreet2024-05-081-12/+6
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improved topology repair checksKent Overstreet2024-03-311-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | Consolidate bch2_gc_check_topology() and btree_node_interior_verify(), and replace them with an improved version, bch2_btree_node_check_topology(). This checks that children of an interior node correctly span the full range of the parent node with no overlaps. Also, ensure that topology repairs at runtime are always a fatal error; in particular, this adds a check in btree_iter_down() - if we don't find a key while walking down the btree that's indicative of a topology error and should be flagged as such, not a null ptr deref. Some checks in btree_update_interior.c remaining BUG_ONS(), because we already checked the node for topology errors when starting the update, and the assertions indicate that we _just_ corrupted the btree node - i.e. the problem can't be that existing on disk corruption, they indicate an actual algorithmic bug. In the future, we'll be annotating the fsck errors list with which recovery pass corrects them; the open coded "run explicit recovery pass or fatal error" in bch2_btree_node_check_topology() will in the future be done for every fsck_err() call. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix trans->mem realloc in __bch2_trans_kmallocHongbo Li2024-03-311-3/+30
| | | | | | | | | | | | | | | | The old code doesn't consider the mem alloced from mempool when call krealloc on trans->mem. Also in bch2_trans_put, using mempool_free to free trans->mem by condition "trans->mem_bytes == BTREE_TRANS_MEM_MAX" is inaccurate when trans->mem was allocated by krealloc function. Instead, we use used_mempool stuff to record the situation, and realloc or free the trans->mem in elegant way. Also, after krealloc failed in __bch2_trans_kmalloc, the old data should be copied to the new buffer when alloc from mempool_alloc. Fixes: 31403dca5bb1 ("bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Use kvzalloc() when dynamically allocating btree pathsKent Overstreet2024-03-131-2/+2
| | | | | | | | THis silences a mm/page_alloc.c warning about allocating more than a page with GFP_NOFAIL - and there's no reason for this to not have a vmalloc fallback anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Track iter->ip_allocated at bch2_trans_copy_iter()Kent Overstreet2024-03-131-0/+3
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Save key_cache_path in peek_slot()Kent Overstreet2024-03-131-0/+1
| | | | | | | | | When bch2_btree_iter_peek_slot() clones the iterator to search for the next key, and then discovers that the key from the cloned iterator is the key we want to return - we also want to save the iter->key_cache_path as well, for the update path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill more -EIO error codesKent Overstreet2024-03-131-1/+1
| | | | | | | | This converts -EIOs related to btree node errors to private error codes, which will help with some ongoing debugging by giving us better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree_and_journal_iter.transKent Overstreet2024-03-101-1/+1
| | | | | | | we now always have a btree_trans when using a btree_and_journal_iter; prep work for adding prefetching to btree_and_journal_iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Set path->uptodate when no node at levelKent Overstreet2024-03-101-2/+2
| | | | | | | | We were failing to set path->uptodate when reaching the end of a btree node iterator, causing the new prefetch code for backpointers gc to go into an infinite loop. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix journal replay with unreadable btree rootsKent Overstreet2024-03-101-1/+3
| | | | | | | | | | | | When a btree root is unreadable, we still might be able to get some data back by replaying what's in the journal. Previously though, we got confused when journal replay would attempt to replay a key for a level that didn't exist. This adds bch2_btree_increase_depth(), so that journal replay can handle this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btreeKent Overstreet2024-02-241-1/+3
| | | | | | | | | | | | | | | If we're in FILTER_SNAPSHOTS mode and we start scanning a range of the keyspace where no keys are visible in the current snapshot, we have a problem - we'll scan for a very long time before scanning terminates. Awhile back, this was fixed for most cases with peek_upto() (and assertions that enforce that it's being used). But the fix missed the fact that the inodes btree is different - every key offset is in a different snapshot tree, not just the inode field. Fixes: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve trace_trans_restart_relockKent Overstreet2024-01-211-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add an option to control btree node prefetchingKent Overstreet2024-01-051-2/+4
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: track transaction durationsKent Overstreet2024-01-051-1/+8
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree_trans always has statsKent Overstreet2024-01-051-10/+4
| | | | | | reserve slot 0 for unknown (when we overflow), to avoid some branches Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_trans_peek_slot_updatesKent Overstreet2024-01-011-31/+15
| | | | | | | refactoring the BTREE_ITER_WITH_UPDATES code, prep for removing the flag and making it always-on Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_trans_peek_prev_updatesKent Overstreet2024-01-011-1/+20
| | | | | | bch2_btree_iter_peek_prev() now supports BTREE_ITER_WITH_UPDATES Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_trans_peek_updatesKent Overstreet2024-01-011-9/+20
| | | | | | | refactoring the BTREE_ITER_WITH_UPDATES code, prep for removing the flag and making it always-on Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: growable btree_pathsKent Overstreet2024-01-011-14/+63
| | | | | | | | | | | XXX: we're allocating memory with btree locks held - bad We need to plumb through an error path so we can do allocate_dropping_locks() - but we're merging this now because it fixes a transaction path overflow caused by indirect extent fragmentation, and the resize path is rare. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: trans->nr_pathsKent Overstreet2024-01-011-4/+5
| | | | | | | Start to plumb through dynamically growable btree_paths; this patch replaces most BTREE_ITER_MAX references with trans->nr_paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: trans->updates will also be resizableKent Overstreet2024-01-011-1/+2
| | | | | | | the reflink triggers are also bumping up against the maximum number of paths in a transaction - and generating proportional numbers of updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONSKent Overstreet2024-01-011-84/+82
| | | | | | | | | | | | | - Some tweaks to greatly reduce locking overhead for the list of btree transactions, so that it can always be enabled: leave btree_trans objects on the list when they're on the percpu single item freelist, and only check for duplicates in the same process when CONFIG_BCACHEFS_DEBUG is enabled - don't zero out the full btree_trans() unless we allocated it from the mempool Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>