for_each_btree_node() now works similarly to for_each_btree_key(): the
loop body is passed as an argument to lockrestart_do().
This means trans_begin() is now called on every loop iteration - which
fixes an SRCU warning in backpointers fsck.
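A sketch of the new calling convention (the loop-body arguments here are
illustrative, following the for_each_btree_key() style):

  ret = for_each_btree_node(trans, iter, BTREE_ID_extents, POS_MIN, 0, b, ({
          /* body runs under lockrestart_do(): bch2_trans_begin() on
           * every iteration, retried on transaction restart */
          bch2_btree_node_to_text(&buf, c, b);
          0;
  }));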
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Fix an accidental infinite loop; also fix btree_deadlock_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The debug code relies on btree_trans_list being ordered so that it can
resume on subsequent calls or lock restarts.
However, it was using trans->locking_wait.task.pid, which is incorrect
since btree_trans objects are cached and reused - typically by different
tasks.
Fix this by switching to pointer order, and also sort them lazily when
required - speeding up the btree_trans_get() fastpath.
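Roughly the idea, sketched (print_trans() and last_printed are
hypothetical names for illustration):

  struct btree_trans *trans;

  /*
   * With the list kept sorted by object address, "everything at or
   * below last_printed is already done" survives dropping the lock:
   */
  list_for_each_entry(trans, &c->btree_trans_list, list) {
          if (trans <= last_printed)
                  continue;
          print_trans(trans);
          last_printed = trans;
  }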
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
debug.c was using closure_get() on a different thread's closure, where
we don't know if the object being refcounted is alive.
We keep btree_trans objects on a list so they can be printed by debug
code, and because it is cost prohibitive to touch the btree_trans list
every time we allocate and free btree_trans objects, cached objects are
also on this list.
However, we do not want the debug code to see cached but not in use
btree_trans objects - critically because the btree_paths array will have
been freed (if it was reallocated).
closure_get() is also incorrect to use when that get may race with it
hitting zero, i.e. we must already have a ref on the object or know the
ref can't currently hit 0 for other reasons (as used in the cycle
detector).
To fix this, use the previously introduced closure_get_not_zero(),
closure_return_sync(), and closure_init_stack_release(); the debug code
can now only take a ref on a trans object if it's alive and in use.
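Conceptually, closure_get_not_zero() is an inc-unless-zero in the style
of kref_get_unless_zero() - a simplified sketch:

  static inline bool closure_get_not_zero(struct closure *cl)
  {
          unsigned old = atomic_read(&cl->remaining);
          do {
                  if (!(old & CLOSURE_REMAINING_MASK))
                          return false;   /* ref already hit zero */
          } while (!atomic_try_cmpxchg_acquire(&cl->remaining, &old, old + 1));

          return true;
  }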
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
btree_deadlock_to_text() searches the list of btree transactions to find
a deadlock - when it finds one it's done; it's not like the other
*_read() functions, which print each object.
Factor out btree_deadlock_to_text() to make this clearer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We were grabbing the sequence number before unlock incremented it - fix
this by moving the increment to seqmutex_lock() (so the seqmutex_relock()
failure path skips the mutex_trylock()), and returning the sequence
number from unlock(), to make the API simpler and safer.
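The resulting API, roughly (simplified sketch of seqmutex.h):

  struct seqmutex {
          struct mutex    lock;
          u32             seq;
  };

  #define seqmutex_lock(_lock)                    \
  do {                                            \
          mutex_lock(&(_lock)->lock);             \
          (_lock)->seq++;                         \
  } while (0)

  static inline u32 seqmutex_unlock(struct seqmutex *lock)
  {
          u32 seq = lock->seq;
          mutex_unlock(&lock->lock);
          return seq;
  }

  static inline bool seqmutex_relock(struct seqmutex *lock, u32 seq)
  {
          /* seq mismatch means we missed an unlock: skip the trylock */
          if (lock->seq != seq || !mutex_trylock(&lock->lock))
                  return false;

          if (lock->seq != seq) {
                  mutex_unlock(&lock->lock);
                  return false;
          }

          return true;
  }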
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Some renaming for better consistency
bch2_member_exists -> bch2_member_alive
bch2_dev_exists -> bch2_member_exists
bch2_dev_exists2 -> bch2_dev_exists
bch_dev_locked -> bch2_dev_locked
bch_dev_bkey_exists -> bch2_dev_bkey_exists
new helper - bch2_dev_safe
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Combine iter/update/trigger/str_hash flags into a single enum, and
x-macroize them for a to_text() function later.
These flags are all for a specific iter/key/update context, so it makes
sense to group them together - iter/update/trigger flags were already
given distinct bits, this cleans up and unifies that handling.
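The x-macro shape, sketched with a made-up subset (the real flag list
and names differ):

  #define BTREE_ITER_FLAGS()      \
          x(slots)                \
          x(intent)               \
          x(prefetch)

  enum {
  #define x(n)    __BTREE_ITER_##n,
          BTREE_ITER_FLAGS()
  #undef x
  };

  enum btree_iter_update_trigger_flags {
  #define x(n)    BTREE_ITER_##n = 1U << __BTREE_ITER_##n,
          BTREE_ITER_FLAGS()
  #undef x
  };

  /* and the same list can later drive a to_text(): */
  static const char * const btree_iter_flag_strs[] = {
  #define x(n)    #n,
          BTREE_ITER_FLAGS()
  #undef x
          NULL
  };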
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
sysfs is limited to PAGE_SIZE, and when we're debugging strange
deadlocks/priority inversions we need to see the full list.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This creates a subdirectory for each individual btree under the btrees/
debugfs directory.
Directory structure, before:
/sys/kernel/debug/bcachefs/$FS_ID/btrees/
├── alloc
├── alloc-bfloat-failed
├── alloc-formats
├── backpointers
├── backpointers-bfloat-failed
├── backpointers-formats
...
Directory structure, after:
/sys/kernel/debug/bcachefs/$FS_ID/btrees/
├── alloc
│ ├── bfloat-failed
│ ├── formats
│ └── keys
├── backpointers
│ ├── bfloat-failed
│ ├── formats
│ └── keys
...
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
error messages should always include __func__
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Fixes: e6a2566f7a00 ("bcachefs: Better journal tracepoints")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: smatch
bcachefs btree nodes are big - typically 256k - and btree roots are
pinned in memory. As we're now up to 18 btrees, we now have significant
memory overhead in mostly empty btree roots.
And in the future we're going to start enforcing that certain btree node
boundaries exist, to solve lock contention issues - analogous to XFS's
AGIs.
Thus, we need to start allocating smaller btree node buffers when we
can. This patch changes code that refers to the filesystem constant
c->opts.btree_node_size to refer to the btree node buffer size -
btree_buf_bytes() - where appropriate.
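The helper, roughly - a sketch assuming the buffer order is stashed
per-node:

  static inline size_t btree_buf_bytes(const struct btree *b)
  {
          return 1UL << b->byte_order;
  }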
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We now include backtraces for every thread involved in the cycle.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- Some tweaks to greatly reduce locking overhead for the list of btree
transactions, so that it can always be enabled: leave btree_trans
objects on the list when they're on the percpu single item freelist,
and only check for duplicates in the same process when
CONFIG_BCACHEFS_DEBUG is enabled
- don't zero out the full btree_trans unless we allocated it from
the mempool
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the debugfs code, we had an incorrect use of drop_locks_do(); on
transaction restart we don't want to restart the current loop iteration,
since we've already emitted the current key to the buffer for userspace.
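The corrected shape, sketched (dump_iter fields as used elsewhere in
debug.c; details abbreviated): emit the key, record the resume point,
then drop locks before copying:

  bch2_bkey_val_to_text(&i->buf, i->c, k);
  prt_newline(&i->buf);

  i->from = bpos_successor(iter.pos);     /* resume from the next key */
  bch2_trans_unlock(trans);               /* safe to take a fault now */
  ret = flush_buf(i);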
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
copy_to_user() returns the number of bytes it failed to copy - not an
errcode.
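The correct pattern, roughly: compute how much actually made it to
userspace, and turn a short copy into -EFAULT instead of returning
copy_to_user()'s result directly:

  unsigned long not_copied = copy_to_user(i->ubuf, i->buf.buf, bytes);
  size_t copied = bytes - not_copied;

  if (copied != bytes)
          return -EFAULT;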
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.
But we have to do a heap allocation to initialize it anyway, so
there's no real downside to heap allocating the entire thing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
More reorganization, this splits up io.c into
- io_read.c
- io_misc.c - fallocate, fpunch, truncate
- io_write.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Similar to previous fixes, we can't incur page faults while holding
btree locks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We can't be holding btree_trans_lock while copying to user space, which
might incur a page fault. To fix this, convert it to a seqmutex so we
can unlock/relock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
GFP_NOIO dates from the bcache days, when we operated under the block
layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to
GFP_NOFS.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Pulling out a helper from cmd_list.c, as the rest is being rewritten in
Rust but we're not ready to rewrite lower-level btree code in Rust.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Rust bindgen doesn't cope well with anonymous structs and unions. This
patch drops the fancy anonymous structs & unions in bkey_i that let us
use the same helpers for bkey_i and bkey_packed; since bkey_packed is an
internal type that's never exposed to outside code, it's only a minor
inconvenience.
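The simplified layout, roughly:

  struct bkey_i {
          struct bkey     k;
          struct bch_val  v;
  };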
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a few error paths in debug code that lead to locks failing to
be dropped.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The btree node read path has the ability to kick off an asynchronous
btree node rewrite if we saw and corrected an error. Previously this was
only used for errors that caused one of the replicas to be unusable -
this patch plumbs it through to all error paths, so that normal fsck
errors can be corrected.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()
and equivalent replacements for bkey_cmp().
Looking at the generated assembly, these could probably be improved
further, but we already see a significant code size improvement with
this patch.
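A sketch of the helpers (struct bpos abbreviated to its inode, offset,
snapshot fields, compared in that order):

  static inline bool bpos_eq(struct bpos l, struct bpos r)
  {
          return  l.inode    == r.inode &&
                  l.offset   == r.offset &&
                  l.snapshot == r.snapshot;
  }

  static inline bool bpos_lt(struct bpos l, struct bpos r)
  {
          return  l.inode    != r.inode    ? l.inode    < r.inode
                : l.offset   != r.offset   ? l.offset   < r.offset
                : l.snapshot != r.snapshot ? l.snapshot < r.snapshot
                : false;
  }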
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:
ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
    (hopefully with docbook documentation), not .h file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other
    macros where we can't do this
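As a .checkpatch.conf (checkpatch's standard per-directory config
mechanism; placing one under fs/bcachefs/ is the suggestion here):

  --ignore ASSIGN_IN_IF
  --ignore UNSPECIFIED_INT
  --ignore NEW_TYPEDEFS
  --ignore FUNCTION_ARGUMENTS
  --ignore MULTISTATEMENT_MACRO_USE_DO_WHILE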
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now we store the transaction's fn idx in a local variable, instead of
redoing the lookup every time we call bch2_trans_init().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Some lock operations can't fail; a cycle of nofail locks is impossible
to recover from. So we want to get rid of these nofail locking
operations, but as this is tricky it'll be done incrementally.
If such a cycle happens, this patch prints out which codepaths are
involved so we know what to work on next.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This changes bch2_check_for_deadlock() to print the longest chains it
finds - when we have a deadlock because the cycle detector isn't finding
something, this will let us see what it's missing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the event that we're not finished debugging the cycle detector, this
adds a new file to debugfs that shows what the cycle detector finds, if
anything. By comparing this with btree_transactions, which shows held
locks for every btree_transaction, we'll be able to determine if it's
the cycle detector that's buggy or something else.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We've outgrown our own deadlock avoidance strategy.
The btree iterator API provides an interface where the user doesn't need
to concern themselves with lock ordering - different btree iterators can
be traversed in any order. Without special care, this will lead to
deadlocks.
Our previous strategy was to define a lock ordering internally, and
whenever we attempt to take a lock and trylock() fails, we'd check if
the current btree transaction is holding any locks that cause a lock
ordering violation. If so, we'd issue a transaction restart, and then
bch2_trans_begin() would re-traverse all previously used iterators, but
in the correct order.
That approach had some issues, though.
- Sometimes we'd issue transaction restarts unnecessarily, when no
deadlock would have actually occurred. Lock ordering restarts have
become our primary cause of transaction restarts, on some workloads
totalling 20% of actual transaction commits.
- To avoid deadlock or livelock, we'd often have to take intent locks
when we only wanted a read lock: with the lock ordering approach, it
is actually illegal to hold _any_ read lock while blocking on an intent
lock, and this has been causing us unnecessary lock contention.
- It was getting fragile - the various lock ordering rules are not
trivial, and we'd been seeing occasional livelock issues related to
this machinery.
So, since bcachefs is already a relational database masquerading as a
filesystem, we're stealing the next traditional database technique and
switching to a cycle detector for avoiding deadlocks.
When we block taking a btree lock, after adding ourselves to the waitlist
but before sleeping, we do a DFS of btree transactions waiting on other
btree transactions, starting with the current transaction and walking
our held locks, and transactions blocking on our held locks.
If we find a cycle, we emit a transaction restart. Occasionally (e.g.
the btree split path) we cannot allow the lock() operation to fail, so
if necessary we'll tell another transaction that it has to fail.
Result: trans_restart_would_deadlock events are reduced by a factor of
10 to 100, and we'll be able to delete a whole bunch of grotty, fragile
code.
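Heavily simplified, the search looks like this sketch (the real
bch2_check_for_deadlock() walks multiple held locks per transaction and
multiple holders per lock; one edge per node here just shows the DFS
idea):

  struct trans_node {
          struct trans_node *waiting_on;  /* simplification: one edge each */
          bool               in_path;
  };

  static bool would_deadlock(struct trans_node *t)
  {
          /* one-shot walk; marks aren't cleared, fine for a single query */
          for (; t; t = t->waiting_on) {
                  if (t->in_path)
                          return true;    /* found a cycle */
                  t->in_path = true;
          }
          return false;
  }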
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch
- tracks maximum bch2_trans_kmalloc() memory used in btree_transaction_stats
- makes it available in debugfs
- switches bch2_trans_init() to using that for the amount of memory to
preallocate, instead of the parameter passed in
This drastically reduces transaction restarts, and means we no longer
need to track this in the source code.
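Sketch of the bookkeeping (field names abbreviated): keep a per-fn
high-water mark on transaction exit, then size the preallocation from
it on init:

  /* on trans exit: */
  s->max_mem = max(s->max_mem, trans->mem_max);

  /* on trans init: */
  unsigned expected = READ_ONCE(s->max_mem);
  if (expected)
          trans->mem = kmalloc(expected, GFP_KERNEL);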
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This improves flush_buf() so that it always returns nonzero when we're
done reading and ready to return to userspace, and so that it returns
the value we want to return to userspace (number of bytes read, if there
wasn't an error).
In the future we'll abstract this mechanism further, pull it out of
bcachefs, and use it to replace seq_file.
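From the caller's side, the convention sketched (print_next_object() is
a hypothetical stand-in for filling i->buf):

  while (1) {
          ssize_t ret = flush_buf(i);
          if (ret)
                  return ret;     /* error, or byte count for userspace */

          if (!print_next_object(i))
                  break;
  }
  return i->ret;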
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We need to turn the flush_buf() thing into a proper API, to replace
seq_file.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We need a way to check if the machinery for handling btree_paths within
a transaction is behaving reasonably, as it often has not been - we've
had bugs with transaction path overflows caused by duplicate paths and
plenty of other things.
This patch tracks, per transaction fn, the most btree paths ever
allocated by that transaction and makes it available in debugfs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Going to be adding more things to this in the next patch.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>