aboutsummaryrefslogtreecommitdiff
path: root/fs/bcachefs/fs.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* bcachefs: Don't delete open files in online fsckKent Overstreet2024-09-091-0/+8
| | | | | | | | | If a file is unlinked but still open, we don't want online fsck to delete it - or fun inconsistencies will happen. https://github.com/koverstreet/bcachefs/issues/727 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix incorrect i_state usageKent Overstreet2024-08-161-1/+1
| | | | | Reported-by: syzbot+95e40eae71609e40d851@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Switch to .get_inode_acl()Kent Overstreet2024-08-081-4/+4
| | | | | | | | | | | | | .set_acl() requires a dentry, and if one isn't passed it marks the VFS inode as not having an ACL. This has been causing inodes with ACLs to have them "disappear" on bcachefs filesystem, depending on which path those inodes get pulled into the cache from. Switching to .get_inode_acl(), like other local filesystems, fixes this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* Merge tag 'bcachefs-2024-07-18.2' of https://evilpiepirate.org/git/bcachefsLinus Torvalds2024-07-181-71/+138
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull bcachefs updates from Kent Overstreet: - Metadata version 1.8: Stripe sectors accounting, BCH_DATA_unstriped This splits out the accounting of dirty sectors and stripe sectors in alloc keys; this lets us see stripe buckets that still have unstriped data in them. This is needed for ensuring that erasure coding is working correctly, as well as completing stripe creation after a crash. - Metadata version 1.9: Disk accounting rewrite The previous disk accounting scheme relied heavily on percpu counters that were also sharded by outstanding journal buffer; it was fast but not extensible or scalable, and meant that all accounting counters were recorded in every journal entry. The new disk accounting scheme stores accounting as normal btree keys; updates are deltas until they are flushed by the btree write buffer. This means we have no practical limit on the number of counters, and a new tagged union format that's easy to extend. We now have counters for compression type/ratio, per-snapshot-id usage, per-btree-id usage, and pending rebalance work. - Self healing on read IO/checksum error Data is now automatically rewritten if we get a read error and then a successful retry - Mount API conversion (thanks to Thomas Bertschinger) - Better lockdep coverage Previously, btree node locks were tracked individually by lockdep, like any other lock. But we may take _many_ btree node locks simultaneously, we easily blow through the limit of 48 locks that lockdep can track, leading to lockdep turning itself off. Tracking each btree node lock individually isn't really necessary since we have our own cycle detector for deadlock avoidance and centralized tracking of btree node locks, so we now have a single lockdep_map in btree_trans for "any btree nodes are locked". - Some more small incremental work towards online check_allocations - Lots more debugging improvements - Fixes, including: - undefined behaviour fixes, originally noted as breaking userspace LTO builds - fix a spurious warning in fsck_err, reported by Marcin - fix an integer overflow on trans->nr_updates, also reported by Marcin; this broke during deletion of highly fragmented indirect extents * tag 'bcachefs-2024-07-18.2' of https://evilpiepirate.org/git/bcachefs: (120 commits) lockdep: Add comments for lockdep_set_no{validate,track}_class() bcachefs: Fix integer overflow on trans->nr_updates bcachefs: silence silly kdoc warning bcachefs: Fix fsck warning about btree_trans not passed to fsck error bcachefs: Add an error message for insufficient rw journal devs bcachefs: varint: Avoid left-shift of a negative value bcachefs: darray: Don't pass NULL to memcpy() bcachefs: Kill bch2_assert_btree_nodes_not_locked() bcachefs: Rename BCH_WRITE_DONE -> BCH_WRITE_SUBMITTED bcachefs: __bch2_read(): call trans_begin() on every loop iter bcachefs: show none if label is not set bcachefs: drop packed, aligned from bkey_inode_buf bcachefs: btree node scan: fall back to comparing by journal seq bcachefs: Add lockdep support for btree node locks lockdep: lockdep_set_notrack_class() bcachefs: Improve copygc_wait_to_text() bcachefs: Convert clock code to u64s bcachefs: Improve startup message bcachefs: Self healing on read IO error bcachefs: Make read_only a mount option again, but hidden ...
| * bcachefs: Make read_only a mount option again, but hiddenKent Overstreet2024-07-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | fsck passes read_only as a mount option, and it's required for nochanges, which it also uses. Usually read_only is handled by the VFS, but we need to be able to handle it too; we just don't want to print it out twice, so mark it as a hidden option. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: support STATX_DIOALIGN for statx fileHongbo Li2024-07-141-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for STATX_DIOALIGN to bcachefs, so that direct I/O alignment restrictions are exposed to userspace in a generic way. [Before] ``` ./statx_test /mnt/bcachefs/test statx(/mnt/bcachefs/test) = 0 dio mem align:0 dio offset align:0 ``` [After] ``` ./statx_test /mnt/bcachefs/test statx(/mnt/bcachefs/test) = 0 dio mem align:1 dio offset align:512 ``` Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: kill key cache arg to bch2_assert_pos_locked()Kent Overstreet2024-07-141-3/+1
| | | | | | | | | | | | | | this is an internal implementation detail - and we're improving key cache coherency Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Add tracepoints for bch2_sync_fs() and bch2_fsync()Youling Tang2024-07-141-0/+3
| | | | | | | | | | | | | | | | | | | | | | Add trace_bch2_sync_fs() and trace_bch2_fsync() implementations. The output in trace is as follows: sync-29779 [000] ..... 193.700935: bch2_sync_fs: dev 254,16 wait 1 <...>-40027 [002] ..... 342.535227: bch2_fsync: dev 254,32 ino 4099 parent 4096 datasync 1 Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: bch2_fs_get_tree() cleanupKent Overstreet2024-07-141-30/+29
| | | | | | | | | | | | | | - improve error paths - call bch2_fs_start() separately, after applying late-parsed options Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Kill bch2_mount()Kent Overstreet2024-07-141-31/+15
| | | | | | | | | | | | Fold into bch2_fs_get_tree() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: use new mount APIThomas Bertschinger2024-07-141-22/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This updates bcachefs to use the new mount API: - Update the file_system_type to use the new init_fs_context() function. - Define the new fs_context_operations functions. - No longer register bch2_mount() and bch2_remount(); these are now called via the new fs_context functions. - Define a new helper type, bch2_opts_parse that includes a struct bch_opts and additionally a printbuf used to save options that can't be parsed until after the FS is opened. This enables us to parse as many options as possible prior to opening the filesystem while saving those options that need the open FS for later parsing. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: add printbuf arg to bch2_parse_mount_opts()Thomas Bertschinger2024-07-141-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mount options that take the name of a device that may be part of a filesystem, for example "metadata_target", cannot be validated until after the filesystem has been opened. However, an attempt to parse those options may be made prior to the filesystem being opened. This change adds a printbuf parameter to bch2_parse_mount_opts() which will be used to save those mount options, when they are supplied prior to the FS being opened, so that they can be parsed later. This functionality is not currently needed, but will be used after bcachefs starts using the new mount API to parse mount options. This is because using the new mount API, we will process mount options prior to opening the FS, but the new API doesn't provide a convenient way to "replay" mount option parsing. So we save these options ourselves to accomplish this. This change also splits out the code to parse a single option into bch2_parse_one_mount_opt(), which will be useful when using the new mount API which deals with a single mount option at a time. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: fix ei_update_lock lock orderingKent Overstreet2024-07-141-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | ei_update_lock is largely vestigal and will probably be removed, but we're not ready for that just yet. this fixes some lockdep splats with the new lockdep support for btree node locks; they're harmless, since we were taking ei_update_lock before actually locking any btree nodes, but "any btree nodes locked" are now tracked at the btree_trans level. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* | Merge tag 'vfs-6.11.inode' of ↵Linus Torvalds2024-07-151-1/+0
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode / dentry updates from Christian Brauner: "This contains smaller performance improvements to inodes and dentries: inode: - Add rcu based inode lookup variants. They avoid one inode hash lock acquire in the common case thereby significantly reducing contention. We already support RCU-based operations but didn't take advantage of them during inode insertion. Callers of iget_locked() get the improvement without any code changes. Callers that need a custom callback can switch to iget5_locked_rcu() as e.g., did btrfs. With 20 threads each walking a dedicated 1000 dirs * 1000 files directory tree to stat(2) on a 32 core + 24GB ram vm: before: 3.54s user 892.30s system 1966% cpu 45.549 total after: 3.28s user 738.66s system 1955% cpu 37.932 total (-16.7%) Long-term we should pick up the effort to introduce more fine-grained locking and possibly improve on the currently used hash implementation. - Start zeroing i_state in inode_init_always() instead of doing it in individual filesystems. This allows us to remove an unneeded lock acquire in new_inode() and not burden individual filesystems with this. dcache: - Move d_lockref out of the area used by RCU lookup to avoid cacheline ping poing because the embedded name is sharing a cacheline with d_lockref. - Fix dentry size on 32bit with CONFIG_SMP=y so it does actually end up with 128 bytes in total" * tag 'vfs-6.11.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: fix dentry size vfs: move d_lockref out of the area used by RCU lookup bcachefs: remove now spurious i_state initialization xfs: remove now spurious i_state initialization in xfs_inode_alloc vfs: partially sanitize i_state zeroing on inode creation xfs: preserve i_state around inode_init_always in xfs_reinit_inode btrfs: use iget5_locked_rcu vfs: add rcu-based find_inode variants for iget ops
| * bcachefs: remove now spurious i_state initializationMateusz Guzik2024-06-131-1/+0
| | | | | | | | | | | | | | | | | | inode_init_always started setting the field to 0. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240611120626.513952-5-mjguzik@gmail.com Acked-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Christian Brauner <brauner@kernel.org>
* | Revert "bcachefs: Mark bch_inode_info as SLAB_ACCOUNT"Kent Overstreet2024-07-111-2/+1
| | | | | | | | | | | | | | | | | | This reverts commit 86d81ec5f5f05846c7c6e48ffb964b24cba2e669. This wasn't tested with memcg enabled, it immediately hits a null ptr deref in list_lru_add(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* | bcachefs: Log mount failure error codeKent Overstreet2024-07-101-0/+2
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* | bcachefs: Mark bch_inode_info as SLAB_ACCOUNTYouling Tang2024-07-101-1/+2
| | | | | | | | | | | | | | | | | | After commit 230e9fc28604 ("slab: add SLAB_ACCOUNT flag"), we need to mark the inode cache as SLAB_ACCOUNT, similar to commit 5d097056c9a0 ("kmemcg: account for certain kmem allocations to memcg") Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* | bcachefs: Fix bch2_inode_insert() race path for tmpfilesKent Overstreet2024-07-101-0/+6
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* | bcachefs: Move the ei_flags setting to after initializationYouling Tang2024-06-211-5/+3
| | | | | | | | | | | | | | | | | | `inode->ei_flags` setting and cleaning should be done after initialization, otherwise the operation is invalid. Fixes: 9ca4853b98af ("bcachefs: Fix quota support for snapshots") Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* | bcachefs: Fix I_NEW warning in race path in bch2_inode_insert()Kent Overstreet2024-06-211-2/+10
| | | | | | | | | | | | | | | | | | | | discard_new_inode() is the correct interface for tearing down an indoe that was fully created but not made visible to other threads, but it expects I_NEW to be set, which we don't use. Reported-by: https://github.com/koverstreet/bcachefs/issues/690 Fixes: bcachefs: Fix race path in bch2_inode_insert() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* | bcachefs: fix alignment of VMA for memory mapped files on THPYouling Tang2024-06-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With CONFIG_READ_ONLY_THP_FOR_FS, the Linux kernel supports using THPs for read-only mmapped files, such as shared libraries. However, the kernel makes no attempt to actually align those mappings on 2MB boundaries, which makes it impossible to use those THPs most of the time. This issue applies to general file mapping THP as well as existing setups using CONFIG_READ_ONLY_THP_FOR_FS. This is easily fixed by using thp_get_unmapped_area for the unmapped_area function in bcachefs, which is what ext2, ext4, fuse, xfs and btrfs all use. Similar to commit b0c582233a85 ("btrfs: fix alignment of VMA for memory mapped files on THP"). Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* | bcachefs: Add missing bch_inode_info.ei_flags initKent Overstreet2024-06-101-0/+2
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* | bcachefs: set sb->s_shrinker->seeks = 0Kent Overstreet2024-06-101-0/+1
| | | | | | | | | | | | | | inodes and dentries are still present in the btree node cache, in much more compact form Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* | bcachefs: Don't return -EROFS from mount on inconsistency errorKent Overstreet2024-05-281-2/+10
|/ | | | | | | | We were accidentally returning -EROFS during recovery on filesystem inconsistency - since this is what the journal returns on emergency shutdown. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix race path in bch2_inode_insert()Kent Overstreet2024-05-221-2/+1
| | | | | | | | __destroy_new_inode() is appropriate when we have _just_allocated the inode, but not when it's been fully initialized and on i_sb_list. Reported-by: syzbot+a0ddc9873c280a4cb18f@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO methodYouling Tang2024-05-201-1/+2
| | | | | | | | | | | | | Since commit a2ad63daa88b ("VFS: add FMODE_CAN_ODIRECT file flag") file systems can just set the FMODE_CAN_ODIRECT flag at open time instead of wiring up a dummy direct_IO method to indicate support for direct I/O. Do that for bcachefs so that noop_direct_IO can eventually be removed. Similar to commit b29434999371 ("xfs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method"). Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* Merge tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefsLinus Torvalds2024-05-191-49/+60
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull bcachefs updates from Kent Overstreet: - More safety fixes, primarily found by syzbot - Run the upgrade/downgrade paths in nochnages mode. Nochanges mode is primarily for testing fsck/recovery in dry run mode, so it shouldn't change anything besides disabling writes and holding dirty metadata in memory. The idea here was to reduce the amount of activity if we can't write anything out, so that bringing up a filesystem in "super ro" mode would be more lilkely to work for data recovery - but norecovery is the correct option for this. - btree_trans->locked; we now track whether a btree_trans has any btree nodes locked, and this is used for improved assertions related to trans_unlock() and trans_relock(). We'll also be using it for improving how we work with lockdep in the future: we don't want lockdep to be tracking individual btree node locks because we take too many for lockdep to track, and it's not necessary since we have a cycle detector. - Trigger improvements that are prep work for online fsck - BTREE_TRIGGER_check_repair; this regularizes how we do some repair work for extents that goes with running triggers in fsck, and fixes some subtle issues with transaction restarts there. - bch2_snapshot_equiv() has now been ripped out of fsck.c; snapshot equivalence classes are for when snapshot deletion leaves behind redundant snapshot nodes, but snapshot deletion now cleans this up right away, so the abstraction doesn't need to leak. - Improvements to how we resume writing to the journal in recovery. The code for picking the new place to write when reading the journal is greatly simplified and we also store the position in the superblock for when we don't read the journal; this means that we preserve more of the journal for list_journal debugging. - Improvements to sysfs btree_cache and btree_node_cache, for debugging memory reclaim. - We now detect when we've blocked for 10 seconds on the allocator in the write path and dump some useful info. - Safety fixes for devices references: this is a big series that changes almost all device lookups to properly check if the device exists and take a reference to it. Previously we assumed that if a bkey exists that references a device then the device must exist, and this was enforced in .invalid methods, but this was incorrect because it meant device removal relied on accounting being correct to not leave keys pointing to invalid devices, and that's not something we can assume. Getting the "pointer to invalid device" checks out of our .invalid() methods fixes some long standing device removal bugs; the only outstanding bug with device removal now is a race between the discard path and deleting alloc info, which should be easily fixed. - The allocator now prefers not to expand the new member_info.btree_allocated bitmap, meaning if repair ever requires scanning for btree nodes (because of a corrupt interior nodes) we won't have to scan the whole device(s). - New coding style document, which among other things talks about the correct usage of assertions * tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs: (155 commits) bcachefs: add no_invalid_checks flag bcachefs: add counters for failed shrinker reclaim bcachefs: Fix sb_field_downgrade validation bcachefs: Plumb bch_validate_flags to sb_field_ops.validate() bcachefs: s/bkey_invalid_flags/bch_validate_flags bcachefs: fsync() should not return -EROFS bcachefs: Invalid devices are now checked for by fsck, not .invalid methods bcachefs: kill bch2_dev_bkey_exists() in bch2_check_fix_ptrs() bcachefs: kill bch2_dev_bkey_exists() in bch2_read_endio() bcachefs: bch2_dev_get_ioref() checks for device not present bcachefs: bch2_dev_get_ioref2(); io_read.c bcachefs: bch2_dev_get_ioref2(); debug.c bcachefs: bch2_dev_get_ioref2(); journal_io.c bcachefs: bch2_dev_get_ioref2(); io_write.c bcachefs: bch2_dev_get_ioref2(); btree_io.c bcachefs: bch2_dev_get_ioref2(); backpointers.c bcachefs: bch2_dev_get_ioref2(); alloc_background.c bcachefs: for_each_bset() declares loop iter bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.h bcachefs: Improve sysfs internal/btree_cache ...
| * bcachefs: Change destroy_inode to free_inodeYouling Tang2024-05-081-11/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The vfs[1] documentation describes free_inode as follows: ``` free_inode this method is called from RCU callback. If you use call_rcu() in ->destroy_inode to free ‘struct inode’ memory, then it’s better to release memory in this method. ``` free_inode will be called by the RCU callback, so it might be better to move the inode free operation to destroy_inode. Similar to commit ae6b47b5653e ("fs/ntfs3: Change destroy_inode to free_inode"). Link: [1]: https://www.kernel.org/doc/html/latest/filesystems/vfs.html Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: check inode backpointer in bch2_lookup()Kent Overstreet2024-05-081-6/+18
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Clean up inode allocKent Overstreet2024-05-081-22/+29
| | | | | | | | | | | | | | There's no need to be using new_inode(); we can skip all that indirection and make the code easier to follow. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: bch2_trans_unlock() must always be followed by relock() or begin()Kent Overstreet2024-05-081-0/+4
| | | | | | | | | | | | | | | | We're about to add new asserts for btree_trans locking consistency, and part of that requires that aren't using the btree_trans while it's unlocked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: iter/update/trigger/str_hash flag cleanupKent Overstreet2024-05-081-3/+3
| | | | | | | | | | | | | | | | | | | | | | Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: bch2_hash_lookup() now returns bkey_s_cKent Overstreet2024-05-081-7/+3
| | | | | | | | | | | | small cleanup Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* | Merge tag 'vfs-6.10.misc' of ↵Linus Torvalds2024-05-131-0/+3
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Free up FMODE_* bits. I've freed up bits 6, 7, 8, and 24. That means we now have six free FMODE_* bits in total (but bit #6 already got used for FMODE_WRITE_RESTRICTED) - Add FOP_HUGE_PAGES flag (follow-up to FMODE_* cleanup) - Add fd_raw cleanup class so we can make use of automatic cleanup provided by CLASS(fd_raw, f)(fd) for O_PATH fds as well - Optimize seq_puts() - Simplify __seq_puts() - Add new anon_inode_getfile_fmode() api to allow specifying f_mode instead of open-coding it in multiple places - Annotate struct file_handle with __counted_by() and use struct_size() - Warn in get_file() whether f_count resurrection from zero is attempted (epoll/drm discussion) - Folio-sophize aio - Export the subvolume id in statx() for both btrfs and bcachefs - Relax linkat(AT_EMPTY_PATH) requirements - Add F_DUPFD_QUERY fcntl() allowing to compare two file descriptors for dup*() equality replacing kcmp() Cleanups: - Compile out swapfile inode checks when swap isn't enabled - Use (1 << n) notation for FMODE_* bitshifts for clarity - Remove redundant variable assignment in fs/direct-io - Cleanup uses of strncpy in orangefs - Speed up and cleanup writeback - Move fsparam_string_empty() helper into header since it's currently open-coded in multiple places - Add kernel-doc comments to proc_create_net_data_write() - Don't needlessly read dentry->d_flags twice Fixes: - Fix out-of-range warning in nilfs2 - Fix ecryptfs overflow due to wrong encryption packet size calculation - Fix overly long line in xfs file_operations (follow-up to FMODE_* cleanup) - Don't raise FOP_BUFFER_{R,W}ASYNC for directories in xfs (follow-up to FMODE_* cleanup) - Don't call xfs_file_open from xfs_dir_open (follow-up to FMODE_* cleanup) - Fix stable offset api to prevent endless loops - Fix afs file server rotations - Prevent xattr node from overflowing the eraseblock in jffs2 - Move fdinfo PTRACE_MODE_READ procfs check into the .permission() operation instead of .open() operation since this caused userspace regressions" * tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits) afs: Fix fileserver rotation getting stuck selftests: add F_DUPDFD_QUERY selftests fcntl: add F_DUPFD_QUERY fcntl() file: add fd_raw cleanup class fs: WARN when f_count resurrection is attempted seq_file: Simplify __seq_puts() seq_file: Optimize seq_puts() proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation fs: Create anon_inode_getfile_fmode() xfs: don't call xfs_file_open from xfs_dir_open xfs: drop fop_flags for directories xfs: fix overly long line in the file_operations shmem: Fix shmem_rename2() libfs: Add simple_offset_rename() API libfs: Fix simple_offset_rename_exchange() jffs2: prevent xattr node from overflowing the eraseblock vfs, swap: compile out IS_SWAPFILE() on swapless configs vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements fs/direct-io: remove redundant assignment to variable retval fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading ...
| * statx: stx_subvolKent Overstreet2024-03-261-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new statx field for (sub)volume identifiers, as implemented by btrfs and bcachefs. This includes bcachefs support; we'll definitely want btrfs support as well. Link: https://lore.kernel.org/linux-fsdevel/2uvhm6gweyl7iyyp2xpfryvcu2g3padagaeqcbiavjyiis6prl@yjm725bizncq/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20240308022914.196982-1-kent.overstreet@linux.dev Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
* | bcachefs: fix overflow in fiemapReed Riley2024-05-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | filefrag (and potentially other utilities that call fiemap) sometimes pass ULONG_MAX as the length. fiemap_prep clamps excessively large lengths - but the calculation of end can overflow if it occurs before calling fiemap_prep. When this happens, filefrag assumes it has read to the end and exits. Signed-off-by: Reed Riley <reed@riley.engineer> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* | bcachefs: Fix inode early destruction pathKent Overstreet2024-04-201-3/+6
| | | | | | | | | | | | | | | | | | | | discard_new_inode() is the wrong interface to use when we need to free an inode that was never inserted into the inode hash table; we can bypass the whole iput() -> evict() path and replace it with __destroy_inode(); kmem_cache_free() - this fixes a WARN_ON() about I_NEW. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* | bcachefs: fix mount error pathKent Overstreet2024-03-311-0/+1
|/ | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve bch2_fatal_error()Kent Overstreet2024-03-181-1/+2
| | | | | | error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix the error code when mounting with incorrect options.Hongbo Li2024-03-131-1/+3
| | | | | | | | | | | | | | | | | | | | | When mount with incorrect options such as: "mount -t bcachefs -o errors=back /dev/loop1 /mnt/bcachefs/". It rebacks the error "mount: /mnt/bcachefs: permission denied." cause bch2_parse_mount_opts returns -1 and bch2_mount throws it up. This is unreasonable. The real error message should be like this: "mount: /mnt/bcachefs: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error." Adding three private error codes for mounting error. Here are: - BCH_ERR_mount_option as the parent class for option error. - BCH_ERR_option_name represents the invalid option name. - BCH_ERR_option_value represents the invalid option value. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_lookup() gives better error message on inode not foundKent Overstreet2024-03-101-9/+64
| | | | | | | | | | | | When a dirent points to a missing inode, we really should print out the dirent. This requires quite a bit of refactoring, but there's some other benefits: we now do the entire looup (dirent and inode) in a single btree transaction, and copy to the VFS inode with btree locks still held, like the create path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_inode_insert()Kent Overstreet2024-03-101-62/+76
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Initialize super_block->s_uuidKent Overstreet2024-03-101-0/+1
| | | | | | Need to fix this oversight for the new FS_IOC_(GET|SET)UUID ioctls. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Switch to uuid_to_fsid()Kent Overstreet2024-03-101-5/+1
| | | | | | | switch the statfs code from something horrible and open coded to the more standard uuid_to_fsid() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix missing bch2_err_class() callsKent Overstreet2024-02-101-4/+5
| | | | | | | | We aren't supposed to be leaking our private error codes outside of fs/bcachefs/. Fixes: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* Merge tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefsLinus Torvalds2024-01-101-67/+33
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull bcachefs updates from Kent Overstreet: - btree write buffer rewrite: instead of adding keys to the btree write buffer at transaction commit time, we now journal them with a different journal entry type and copy them from the journal to the write buffer just prior to journal write. This reduces the number of atomic operations on shared cachelines in the transaction commit path and is a signicant performance improvement on some workloads: multithreaded 4k random writes went from ~650k iops to ~850k iops. - Bring back optimistic spinning for six locks: the new implementation doesn't use osq locks; instead we add to the lock waitlist as normal, and then spin on the lock_acquired bit in the waitlist entry, _not_ the lock itself. - New ioctls: - BCH_IOCTL_DEV_USAGE_V2, which allows for new data types - BCH_IOCTL_OFFLINE_FSCK, which runs the kernel implementation of fsck but without mounting: useful for transparently using the kernel version of fsck from 'bcachefs fsck' when the kernel version is a better match for the on disk filesystem. - BCH_IOCTL_ONLINE_FSCK: online fsck. Not all passes are supported yet, but the passes that are supported are fully featured - errors may be corrected as normal. The new ioctls use the new 'thread_with_file' abstraction for kicking off a kthread that's tied to a file descriptor returned to userspace via the ioctl. - btree_paths within a btree_trans are now dynamically growable, instead of being limited to 64. This is important for the check_directory_structure phase of fsck, and also fixes some issues we were having with btree path overflow in the reflink btree. - Trigger refactoring; prep work for the upcoming disk space accounting rewrite - Numerous bugfixes :) * tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (226 commits) bcachefs: eytzinger0_find() search should be const bcachefs: move "ptrs not changing" optimization to bch2_trigger_extent() bcachefs: fix simulateously upgrading & downgrading bcachefs: Restart recovery passes more reliably bcachefs: bch2_dump_bset() doesn't choke on u64s == 0 bcachefs: improve checksum error messages bcachefs: improve validate_bset_keys() bcachefs: print sb magic when relevant bcachefs: __bch2_sb_field_to_text() bcachefs: %pg is banished bcachefs: Improve would_deadlock trace event bcachefs: fsck_err()s don't need to manually check c->sb.version anymore bcachefs: Upgrades now specify errors to fix, like downgrades bcachefs: no thread_with_file in userspace bcachefs: Don't autofix errors we can't fix bcachefs: add missing bch2_latency_acct() call bcachefs: increase max_active on io_complete_wq bcachefs: add time_stats for btree_node_read_done() bcachefs: don't clear accessed bit in btree node fill bcachefs: Add an option to control btree node prefetching ...
| * bcachefs: Fix nochanges/read_only interactionKent Overstreet2024-01-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | nochanges means "we cannot issue writes at all"; it's possible to go into a pseudo read-write mode where we pin dirty metadata in memory, which is used for fsck in dry run mode and doing journal replay on a read only mount, but we do not want to allow an actual read-write mount in nochanges mode. But we do always want to allow early read-write, during recovery - this patch clarifies that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Convert split_devs() to darrayKent Overstreet2024-01-011-48/+20
| | | | | | | | | | | | | | Bit of cleanup & modernization: also moving this code to util.c, it'll be used by userspace as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: for_each_member_device() now declares loop iterKent Overstreet2024-01-011-7/+4
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>