| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Previously, the transaction commit path would have to add keys to the
btree write buffer as a separate operation, requiring additional global
synchronization.
This patch introduces a new journal entry type, which indicates that the
keys need to be copied into the btree write buffer prior to being
written out. We switch the journal entry type back to
JSET_ENTRY_btree_keys prior to write, so this is not an on-disk format
change.
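A minimal userspace sketch of the type switch described in the paragraph
above. Only the name JSET_ENTRY_btree_keys comes from this commit message;
the second entry type, the struct layouts, the fixed-size buffer and the
function name are hypothetical stand-ins, not the bcachefs definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry types: the second name and both values are assumptions. */
enum entry_type {
	ENTRY_btree_keys        = 0,
	ENTRY_write_buffer_keys = 1,	/* in-memory only, never reaches disk */
};

struct entry {
	enum entry_type	type;
	uint64_t	key;
};

#define WB_MAX 16

/* Fixed-size stand-in for the btree write buffer. */
struct write_buffer {
	uint64_t	keys[WB_MAX];
	size_t		nr;
};

/*
 * Run just before a journal buffer is written: copy keys destined for the
 * write buffer, then relabel their entries as ordinary btree keys so the
 * bytes that reach disk use only the existing entry type.
 * Returns the number of keys copied, or -1 if the buffer is full.
 */
int prep_entries_for_write(struct entry *e, size_t nr, struct write_buffer *wb)
{
	int copied = 0;

	for (size_t i = 0; i < nr; i++) {
		if (e[i].type != ENTRY_write_buffer_keys)
			continue;
		if (wb->nr == WB_MAX)
			return -1;
		wb->keys[wb->nr++] = e[i].key;
		e[i].type = ENTRY_btree_keys;
		copied++;
	}
	return copied;
}
```

The sketch fails when its buffer is full; the patch instead reallocates,
which is where the allocation failure discussed below comes from.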
Flushing the btree write buffer may require pulling keys out of journal
entries yet to be written, and quiescing outstanding journal
reservations; we previously added journal->buf_lock for synchronization
with the journal write path.
We also can't put strict bounds on the number of keys in the journal
destined for the write buffer, which means we might overflow the size of
the preallocated buffer and have to reallocate - this introduces a
potentially fatal memory allocation failure. This is something we'll
have to watch for; if it becomes an issue in practice, we can add
further mitigation.
The transaction commit path no longer has to explicitly check if the
write buffer is full and wait on flushing; this is another performance
optimization. Instead, when the btree write buffer is close to full we
change the journal watermark, so that only reservations for journal
reclaim are allowed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
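The watermark change in the last paragraph of the message above can be
sketched as follows. This is an illustration under stated assumptions: the
watermark names, the struct, and the 3/4 threshold for "close to full" are
all hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical watermark levels, lowest priority first. */
enum watermark {
	WATERMARK_normal,
	WATERMARK_reclaim,
};

struct journal_state {
	size_t wb_used;		/* keys currently in the write buffer */
	size_t wb_size;		/* capacity of the write buffer */
};

/* "Close to full" is assumed here to mean at least 3/4 used. */
enum watermark journal_cur_watermark(const struct journal_state *j)
{
	return j->wb_used * 4 >= j->wb_size * 3
		? WATERMARK_reclaim
		: WATERMARK_normal;
}

/* A reservation is granted only at or above the current watermark. */
bool journal_res_allowed(const struct journal_state *j, enum watermark w)
{
	return w >= journal_cur_watermark(j);
}
```

With this shape the commit path needs no explicit "is the write buffer
full" check: an ordinary reservation simply fails until reclaim has
drained the buffer and the watermark drops again.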
| |
Previously, bch2_journal_pin_set() would silently ignore a request to
pin a journal sequence number that was no longer dirty, because it was
used internally by bch2_journal_pin_copy() which could race with the src
pin being flushed.
Split these apart so that we can properly assert that @seq is a
currently dirty journal sequence number - pinning a sequence number that
is no longer dirty is almost always a bug.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
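A rough userspace sketch of the split described above: a strict "set"
that asserts the sequence number is dirty, and a tolerant "copy" that
accepts a source pin that has already been flushed. The types and function
names are simplified stand-ins (hypothetical), not the bcachefs
implementation, and locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct journal {
	uint64_t last_seq;	/* oldest dirty sequence number */
	uint64_t cur_seq;	/* newest sequence number */
};

struct pin {
	uint64_t seq;		/* 0 means "not pinned" */
};

bool seq_is_dirty(const struct journal *j, uint64_t seq)
{
	return seq >= j->last_seq && seq <= j->cur_seq;
}

/* Strict variant: the caller must pass a currently dirty sequence number. */
void pin_set(const struct journal *j, uint64_t seq, struct pin *pin)
{
	assert(seq_is_dirty(j, seq));
	pin->seq = seq;
}

/*
 * Tolerant variant: the source pin may already have been flushed, in which
 * case there is nothing left to pin. Returns whether dst was pinned.
 */
bool pin_copy(const struct journal *j, struct pin *dst, const struct pin *src)
{
	uint64_t seq = src->seq;

	if (!seq || !seq_is_dirty(j, seq))
		return false;
	dst->seq = seq;
	return true;
}
```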
| |
We have a couple of journal pin put helpers, one for callers that already
hold the journal lock and one for callers that do not. Refactor the helpers to lock
and reclaim from the highest level and open code the reclaim from
the one caller of the internal variant. The latter call will be
moved into the journal buf release helper in a later patch.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| |
If the journal reclaim thread makes it to the timeout without ever
initializing j->last_flushed, we could end up sleeping for a very long
time.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| |
In bch2_btree_interior_update_will_free_node, we copy the journal pins
from outstanding writes on the btree node we're about to free. But this
can race with the writes completing and dropping their journal pins.
To guard against this, just use READ_ONCE() in bch2_journal_pin_copy().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
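To illustrate why a single load matters here, the sketch below uses a
userspace stand-in for the kernel's READ_ONCE() macro (it relies on the
GNU __typeof__ extension). The struct and function name are hypothetical
simplifications. The point is that the value that is checked and the value
that is stored come from the same load, so a concurrent writer clearing
the source pin cannot make the two disagree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's READ_ONCE(): force exactly one load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct pin {
	uint64_t seq;		/* 0 means "not pinned" */
};

/* Copy src's pin to dst if src still holds one; returns whether it did. */
bool pin_copy_snapshot(struct pin *dst, struct pin *src)
{
	/* Snapshot once; the check and the store below use the same value. */
	uint64_t seq = READ_ONCE(src->seq);

	if (!seq)
		return false;	/* the write completed and dropped its pin */
	dst->seq = seq;
	return true;
}
```

Without the snapshot, the compiler is free to load src->seq twice, once
for the check and once for the store, and the second load could observe
zero.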
| |
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| |
This patch increases the maximum journal buffers in flight from 2 to 4 -
this will be particularly helpful when in the future we stop requiring
flush+fua for every journal write.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
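One common way to support a power-of-two number of in-flight buffers is to
index a small ring by sequence number. The sketch below shows that
technique with four buffers; the constant and function names are
assumptions for illustration, and the patch itself may be organised
differently.

```c
#include <stdbool.h>
#include <stdint.h>

#define JOURNAL_BUF_BITS 2
#define JOURNAL_BUF_NR   (1U << JOURNAL_BUF_BITS)	/* 4 buffers */
#define JOURNAL_BUF_MASK (JOURNAL_BUF_NR - 1)

/* Which ring slot holds the buffer for a given sequence number. */
unsigned journal_buf_idx(uint64_t seq)
{
	return (unsigned)(seq & JOURNAL_BUF_MASK);
}

/*
 * A new buffer may be opened only while fewer than JOURNAL_BUF_NR
 * buffers are still unwritten.
 */
bool journal_can_open_buf(uint64_t cur_seq, uint64_t seq_ondisk)
{
	return cur_seq - seq_ondisk < JOURNAL_BUF_NR;
}
```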
| |
If the journal is halted, journal reclaim won't necessarily be able to
make any forward progress, and won't accomplish anything anyways - we
should bail out so that we don't get stuck looping in reclaim when the
caches are too dirty and we should be shutting down.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
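The control flow described above can be sketched like this. Everything in
the block is a hypothetical simplification: the struct fields, the helper
names, and the choice of -EIO as the error code are assumptions, not the
bcachefs code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct journal {
	bool	halted;
	size_t	nr_dirty;	/* dirty cache entries pinning the journal */
	size_t	max_dirty;	/* reclaim until we are at or below this */
};

/* Flush one dirty entry; makes no progress once the journal is halted. */
bool flush_one(struct journal *j)
{
	if (j->halted || !j->nr_dirty)
		return false;
	j->nr_dirty--;
	return true;
}

int journal_reclaim(struct journal *j)
{
	while (j->nr_dirty > j->max_dirty) {
		/* Without this check a halted journal would loop forever. */
		if (j->halted)
			return -EIO;
		flush_one(j);
	}
	return 0;
}
```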
| |
This deletes some duplicated code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| |
This is to make tracing easier.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| |
This introduces a new kind of btree iterator, cached iterators, which
point to keys cached in a hash table. The cache also acts as a write
cache - in the update path, we journal the update but defer updating the
btree until the cached entry is flushed by journal reclaim.
Cache coherency is, for now, up to the users to handle, which isn't ideal
but should be good enough.
These new iterators will be used for updating inodes and alloc info (the
alloc and stripes btrees).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
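A toy model of the write-cache behaviour described above: an update
changes only the cached entry and records the journal sequence number that
pins it, and the "btree" is updated later, when journal reclaim flushes
that sequence number. The direct-mapped table, the flat array standing in
for the btree, and all names are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SLOTS 8
#define NR_KEYS  64

struct cached_key {
	bool		valid, dirty;
	uint32_t	pos;
	uint64_t	val;
	uint64_t	journal_seq;	/* seq pinned while the entry is dirty */
};

struct key_cache {
	struct cached_key slots[NR_SLOTS];
};

uint64_t btree[NR_KEYS];	/* stand-in for the btree: pos -> value */

/* Direct-mapped lookup (pos < NR_KEYS); collisions just fail here. */
struct cached_key *cache_lookup(struct key_cache *c, uint32_t pos)
{
	struct cached_key *k = &c->slots[pos % NR_SLOTS];

	if (!k->valid) {	/* miss: fill the slot from the btree */
		k->valid = true;
		k->dirty = false;
		k->pos   = pos;
		k->val   = btree[pos];
	}
	return k->pos == pos ? k : NULL;
}

/* Update path: the update is journalled at @seq, the btree is untouched. */
bool cache_update(struct key_cache *c, uint32_t pos, uint64_t val, uint64_t seq)
{
	struct cached_key *k = cache_lookup(c, pos);

	if (!k)
		return false;
	k->val = val;
	if (!k->dirty) {	/* keep pinning the oldest unflushed seq */
		k->dirty = true;
		k->journal_seq = seq;
	}
	return true;
}

/* Journal reclaim: write back entries pinning sequence numbers <= @seq. */
unsigned cache_flush(struct key_cache *c, uint64_t seq)
{
	unsigned nr = 0;

	for (size_t i = 0; i < NR_SLOTS; i++) {
		struct cached_key *k = &c->slots[i];

		if (k->valid && k->dirty && k->journal_seq <= seq) {
			btree[k->pos] = k->val;
			k->dirty = false;
			nr++;
		}
	}
	return nr;
}
```

Note that a reader going straight to the btree array sees the old value
until the flush happens, which is the coherency problem the message says
is left to users.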
| |
Now that interior btree updates are fully transactional, we don't need
to write out alloc info in a loop. However, interior btree updates do
put more things in the journal, so we still need a loop in the RO
sequence.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| |
We now update the alloc info (bucket sector counts) atomically with
journalling the update to the interior btree nodes, and we also set new
btree roots atomically with the journalled part of the btree update.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| |
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| |
When we're doing btree updates from journal flush, this becomes a
locking inversion.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| |
Will be used in the future for inode updates, which will be very helpful
for multithreaded workloads that have to update the inode with every
extent update (appends, or updates that change i_sectors).
Also will be used eventually for fully persistent alloc info.
However, we still need a mechanism for reserving space in the journal
prior to getting a journal reservation, so it's not technically safe to
make use of this just yet: we could deadlock with the journal full
(although this is unlikely to be an issue in practice).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.
Website: https://bcachefs.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|