aboutsummaryrefslogtreecommitdiff
path: root/fs/bcachefs/inode.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
...
* bcachefs: bch2_bkey_get_iter() helpersKent Overstreet2023-10-221-10/+7
| | | | | | | | | | | | | | | | Introduce new helpers for a common pattern: bch2_trans_iter_init(); bch2_btree_iter_peek_slot(); - bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of the correct type - bch2_bkey_get_val_typed() copies the val out of the btree to a (typically stack allocated) variable; it handles the case where the value in the btree is smaller than the current version of the type, zeroing out the remainder. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bkey_ops.min_val_sizeKent Overstreet2023-10-221-24/+0
| | | | | | | | | | | | | This adds a new field to bkey_ops for the minimum size of the value, which standardizes that check and also enforces the new rule (previously done somewhat ad-hoc) that we can extend value types by adding new fields on to the end. To make that work we do _not_ initialize min_val_size with sizeof, instead we initialize it to the size of the first version of those values. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix bch2_evict_subvolume_inodes()Kent Overstreet2023-10-221-3/+0
| | | | | | | | This fixes a bug in bch2_evict_subvolume_inodes(): d_mark_dontcache() doesn't handle the case where i_count is already 0, we need to grab and put the inode in order for it to be dropped. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Change bkey_invalid() rw param to flagsKent Overstreet2023-10-221-4/+4
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Nocow supportKent Overstreet2023-10-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for nocow mode, where we do writes in-place when possible. Patch components: - New boolean filesystem and inode option, nocow: note that when nocow is enabled, data checksumming and compression are implicitly disabled - To prevent in-place writes from racing with data moves (data_update.c) or bucket reuse (i.e. a bucket being reused and re-allocated while a nocow write is in flight, we have a new locking mechanism. Buckets can be locked for either data update or data move, using a fixed size hash table of two_state_shared locks. We don't have any chaining, meaning updates and moves to different buckets that hash to the same lock will wait unnecessarily - we'll want to watch for this becoming an issue. - The allocator path also needs to check for in-place writes in flight to a given bucket before giving it out: thus we add another counter to bucket_alloc_state so we can track this. - Fsync now may need to issue cache flushes to block devices instead of flushing the journal. We add a device bitmask to bch_inode_info, ei_devs_need_flush, which tracks devices that need to have flushes issued - note that this will lead to unnecessary flushes when other codepaths have already issued flushes, we may want to replace this with a sequence number. - New nocow write path: look up extents, and if they're writable write to them - otherwise fall back to the normal COW write path. XXX: switch to sequence numbers instead of bitmask for devs needing journal flush XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to run in process context - see if we can improve this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: KEY_TYPE_inode_v3, metadata_version_inode_v3Kent Overstreet2023-10-221-23/+140
| | | | | | | | | | | | | | Move bi_size and bi_sectors into the non-varint portion of the inode, so that the write path can update them without going through the relatively expensive unpack/pack operations. Other changes: - Add a field for the offset of the varint section, so we can add new non-varint fields without needing a new inode type, like alloc_v3 - Move bi_mode into the flags field, so that the varint section can be u64 aligned Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_inode_opts_get()Kent Overstreet2023-10-221-0/+8
| | | | | | | | | | | | | This improves io_opts() and makes it a non-inline function - it's big enough that it probably shouldn't be. Also, bch_io_opts no longer needs fields for whether options are defined, so we can slim it down a bit. We'd like to stop passing around the full bch_io_opts, but that'll be tricky because of bch2_rebalance_add_key(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix compat path for old inode formatsKent Overstreet2023-10-221-0/+2
| | | | | | | | Old inode formats don't have all the fields of the current inode format: when unpacking inodes in the current format we can thus skip zeroing out the destination buffer, but that doesn't work on for the old formats. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill bch2_extent_trim_atomic() usageKent Overstreet2023-10-221-11/+3
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: More errcode cleanupKent Overstreet2023-10-221-13/+13
| | | | | | | | We shouldn't be overloading standard error codes now that we have provisions for bcachefs-specific errorcodes: this patch converts super.c and super-io.c to per error site errcodes, with a bit of cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Key cache now works for snapshots btreesKent Overstreet2023-10-221-2/+2
| | | | | | | | | | | | | This switches btree_key_cache_fill() to use a btree iterator, not a btree path, so that it can search for keys in previous snapshots. We also add another iterator flag, BTREE_ITER_KEY_CACHE_FILL, to avoid recursion back into the key cache. This will allow us to re-enable the key cache for inodes in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: New bpos_cmp(), bkey_cmp() replacementsKent Overstreet2023-10-221-1/+1
| | | | | | | | | | | | | | | | | This patch introduces - bpos_eq() - bpos_lt() - bpos_le() - bpos_gt() - bpos_ge() and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve bch2_inode_opts_to_opts()Kent Overstreet2023-10-221-0/+11
| | | | | | | | | | | | It turns out the *_defined entries of bch_io_opts are only used in one place - in the xattr get path - and there we immediately convert to a bch_opts struct, which also has the *_defined entries. This patch changes bch2_inode_opts_to_opts() to go directly from bch_inode_unpacked to bch_opts, which is a minor simplification and will also let us slim down struct bch_io_opts in another patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix a transaction path overflowKent Overstreet2023-10-221-1/+8
| | | | | | | | | | It turns out we need bch2_extent_trim_atomi() even when we're deleting extents one at a time because it's possible for one reflink_p to reference arbitrarily many reflink_v extents. This doesn't normally happen, but the data move path can fragment existing extents in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Assorted checkpatch fixesKent Overstreet2023-10-221-1/+1
| | | | | | | | | | | | | | | | | checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Inline bch2_inode_pack()Kent Overstreet2023-10-221-4/+11
| | | | | | It's mainly used from bch2_inode_write(), so inline it there. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add private error codes for ENOSPCKent Overstreet2023-10-221-1/+1
| | | | | | | | | | Continuing the saga of introducing private dedicated error codes for each error path, this patch converts ENOSPC to error codes that are subtypes of ENOSPC. We've recently had a test failure where we got -ENOSPC where we shouldn't have, and didn't have enough information to tell where it came from, so this patch will solve that problem. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: EINTR -> BCH_ERR_transaction_restartKent Overstreet2023-10-221-2/+2
| | | | | | | | | Now that we have error codes, with subtypes, we can switch to our own error code for transaction restarts - and even better, a distinct error code for each transaction restart reason: clearer code and better debugging. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Don't BUG_ON() inode link count underflowKent Overstreet2023-10-221-0/+33
| | | | | | | This switches that assertion to a bch2_trans_inconsistent() call, as it should be. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Printbuf reworkKent Overstreet2023-10-221-18/+18
| | | | | | | This converts bcachefs to the modern printbuf interface/implementation, synced with the version to be submitted upstream. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add rw to .key_invalid()Kent Overstreet2023-10-221-3/+3
| | | | | | | | | This adds a new parameter to .key_invalid() methods for whether the key is being read or written; the idea being that methods can do more aggressive checks when a key is newly created and being written, when we wouldn't want to delete the key because of those checks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Convert .key_invalid methods to printbufsKent Overstreet2023-10-221-56/+74
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: bch2_btree_iter_peek_upto()Kent Overstreet2023-10-221-2/+2
| | | | | | | | | | | | | In BTREE_ITER_FILTER_SNAPHOTS mode, we skip over keys in unrelated snapshots. When we hit the end of an inode, if the next inode(s) are in a different subvolume, we could potentially have to skip past many keys before finding a key we can return to the caller, so they can terminate the iteration. This adds a peek_upto() variant to solve this problem, to be used when we know the range we're searching within. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Inode create no longer needs to probe key cacheKent Overstreet2023-10-221-24/+4
| | | | | | | Now that we have full key cache coherency, we can simplify bch2_inode_create(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree_id_cached()Kent Overstreet2023-10-221-10/+5
| | | | | | | | Add a new helper that returns true if the given btree ID uses the btree key cache. This enables some new cleanups, since the helper can check the options for whether caching is enabled on a given btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Simplify bch2_inode_delete_keys()Kent Overstreet2023-10-221-35/+22
| | | | | | | | Had a bug report that implies bch2_inode_delete_keys() returned -EINTR before it completed, so this patch simplifies it and makes the flow control a little more conventional. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Fix a null ptr deref in bch2_inode_delete_keys()Kent Overstreet2023-10-221-1/+5
| | | | | | | | Similarly to bch2_btree_delete_range_trans(), bch2_inode_delete_keys() may sometimes split compressed extents, and needs to pass in a disk reservation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Fix debug build in userspaceKent Overstreet2023-10-221-10/+0
| | | | | | | This fixes some compiler warnings that only trigger in userspace - dead code, a maybe uninitialed variable, a maybe null ptr passed to printk. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Fix missing field initializationKent Overstreet2023-10-221-0/+1
| | | | | | | | When unpacking v1 inodes, we were failing to initialize the journal_seq field, leading to a BUG_ON() when fsync tries to flush a garbage journal sequence number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: bch2_assert_pos_locked()Kent Overstreet2023-10-221-3/+3
| | | | | | | | | This adds a new assertion to be used by bch2_inode_update_after_write(), which updates the VFS inode based on the update to the btree inode we just did - we require that the btree inode still be locked when we do that update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Add journal_seq to inode & alloc keysKent Overstreet2023-10-221-108/+103
| | | | | | | | | | | | | | | | Add fields to inode & alloc keys that record the journal sequence number when they were most recently modified. For alloc keys, this is needed to know what journal sequence number we have to flush before the bucket can be reused. Currently this is tracked in memory, but we'll be getting rid of the in memory bucket array. For inodes, this is needed for fsync when the inode has been evicted from the vfs cache. Currently we use a bloom filter per outstanding journal buf - but that mechanism has been broken since we added the ability to not issue a flush/fua for every journal write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Add BCH_SUBVOLUME_UNLINKEDKent Overstreet2023-10-221-5/+1
| | | | | | | | Snapshot deletion needs to become a multi step process, where we unlink, then tear down the page cache, then delete the subvolume - the deleting flag is equivalent to an inode with i_nlink = 0. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Plumb through subvolume idKent Overstreet2023-10-221-19/+90
| | | | | | | | | | | | | | To implement snapshots, we need every filesystem btree operation (every btree operation without a subvolume) to start by looking up the subvolume and getting the current snapshot ID, with bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode. This patch adds those bch2_subvolume_get_snapshot() calls, and also switches to passing around a subvol_inum instead of just an inode number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Subvolumes, snapshotsKent Overstreet2023-10-221-2/+13
| | | | | | | | This patch adds subvolume.c - support for the subvolumes and snapshots btrees and related data types and on disk data structures. The next patches will start hooking up this new code to existing code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: btree_pathKent Overstreet2023-10-221-31/+30
| | | | | | | | | | | | | | | This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add flags field to bch2_inode_to_text()Kent Overstreet2023-10-221-6/+17
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Always check for transaction restartsKent Overstreet2023-10-221-1/+1
| | | | | | | On transaction restart iterators won't be locked anymore - make sure we're always checking for errors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Add an option for whether inodes use the key cacheKent Overstreet2023-10-221-7/+10
| | | | | | | We probably don't ever want to flip this off in production, but it may be useful for certain kinds of testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Add safe versions of varint encode/decodeKent Overstreet2023-10-221-3/+3
| | | | | | | This adds safe versions of bch2_varint_(encode|decode) that don't read or write past the end of the buffer, or varint being encoded. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Kill bch2_btree_iter_peek_cached()Kent Overstreet2023-10-221-10/+7
| | | | | | It's now been rolled into bch2_btree_iter_peek_slot() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Preallocate transaction memKent Overstreet2023-10-221-1/+1
| | | | | | This helps avoid transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Check for errors from bch2_trans_update()Kent Overstreet2023-10-221-5/+3
| | | | | | | Upcoming refactoring is going to change bch2_trans_update() to start returning transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Fix pathalogical behaviour with inode sharding by cpu IDKent Overstreet2023-10-221-3/+1
| | | | | | | | If the transactior restarts on a different CPU, it could end up needing to read in a different btree node, which makes another transaction restart more likely... Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Add an option to control sharding new inode numbersKent Overstreet2023-10-221-7/+14
| | | | | | | We're seeing a bug where inode creates end up spinning in bch2_inode_create - disabling sharding will simplify what we're testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: properly initialize used valuesDan Robertson2023-10-221-1/+1
| | | | | | | | | | | | | | | | | | | - Ensure the second key value in bch_hash_info is initialized to zero if the info type is of type BCH_STR_HASH_SIPHASH. - Initialize the possibly returned value in bch2_inode_create. Assuming bch2_btree_iter_peek returns bkey_s_c_null, the uninitialized value of ret could be returned to the user as an error pointer. - Fix compiler warning in initialization of bkey_s_c_stripe fs/bcachefs/buckets.c:1646:35: warning: suggest braces around initialization of subobject [-Wmissing-braces] struct bkey_s_c_stripe new_s = { NULL }; ^~~~ Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improved check_directory_structure()Kent Overstreet2023-10-221-26/+5
| | | | | | | | | | | | | | Now that we have inode backpointers, we can simplify checking directory structure: instead of doing a DFS from the filesystem root and then checking if we found everything, we can iterate over every inode and see if we can go up until we get to the root. This patch also has a number of fixes and simplifications for the inode backpointer checks. Also, it turns out we don't actually need the BCH_INODE_BACKPTR_UNTRUSTED flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Change inode allocation code for snapshotsKent Overstreet2023-10-221-23/+55
| | | | | | | | | | For snapshots, when we allocate a new inode we want to allocate an inode number that isn't in use in any other subvolume. We won't be able to use ITER_SLOTS for this, inode allocation needs to change to use BTREE_ITER_ALL_SNAPSHOTS. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Inode backpointersKent Overstreet2023-10-221-13/+5
| | | | | | | | | | | | | | | | | This patch adds two new inode fields, bi_dir and bi_dir_offset, that point back to the inode's dirent. Since we're only adding fields for a single backpointer, files that have been hardlinked won't necessarily have valid backpointers: we also add a new inode flag, BCH_INODE_BACKPTR_UNTRUSTED, that's set if an inode has ever had multiple links to it. That's ok, because we only really need this functionality for directories, which can never have multiple hardlinks - when we add subvolumes, we'll need a way to enemurate and print subvolumes, and this will let us reconstruct a path to a subvolume root given a subvolume root inode. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Start using bpos.snapshot fieldKent Overstreet2023-10-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch starts treating the bpos.snapshot field like part of the key in the btree code: * bpos_successor() and bpos_predecessor() now include the snapshot field * Keys in btrees that will be using snapshots (extents, inodes, dirents and xattrs) now always have their snapshot field set to U32_MAX The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that determines whether we're iterating over keys in all snapshots or not - internally, this controlls whether bkey_(successor|predecessor) increment/decrement the snapshot field, or only the higher bits of the key. We add a new member to struct btree_iter, iter->snapshot: when BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always equal iter->snapshot, which will be 0 for btrees that don't use snapshots, and alsways U32_MAX for btrees that will use snapshots (until we enable snapshot creation). This patch also introduces a new metadata version number, and compat code for reading from/writing to older versions - this isn't a forced upgrade (yet). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve inode deletion codeKent Overstreet2023-10-221-31/+14
| | | | | | | It had some silly redundancies. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>