Skip to content

Commit 934140a

Browse files
kiran-modukuridhowells
authored andcommitted
cachefiles: Fix refcounting bug in backing-file read monitoring
cachefiles_read_waiter() has the right to access a 'monitor' object by virtue of being called under the waitqueue lock for one of the pages in its purview. However, it has no ref on that monitor object or on the associated operation. What it is allowed to do is to move the monitor object to the operation's to_do list, but once it drops the work_lock, it's actually no longer permitted to access that object. However, it is trying to enqueue the retrieval operation for processing - but it can only do this via a pointer in the monitor object, something it shouldn't be doing. If it doesn't enqueue the operation, the operation may not get processed. If the order is flipped so that the enqueue is first, then it's possible for the work processor to look at the to_do list before the monitor is enqueued upon it. Fix this by getting a ref on the operation so that we can trust that it will still be there once we've added the monitor to the to_do list and dropped the work_lock. The op can then be enqueued after the lock is dropped. The bug can manifest in one of a couple of ways. The first manifestation looks like: FS-Cache: FS-Cache: Assertion failed FS-Cache: 6 == 5 is false ------------[ cut here ]------------ kernel BUG at fs/fscache/operation.c:494! RIP: 0010:fscache_put_operation+0x1e3/0x1f0 ... fscache_op_work_func+0x26/0x50 process_one_work+0x131/0x290 worker_thread+0x45/0x360 kthread+0xf8/0x130 ? create_worker+0x190/0x190 ? kthread_cancel_work_sync+0x10/0x10 ret_from_fork+0x1f/0x30 This is due to the operation being in the DEAD state (6) rather than INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through fscache_put_operation(). The bug can also manifest like the following: kernel BUG at fs/fscache/operation.c:69! ... [exception RIP: fscache_enqueue_operation+246] ... #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6 #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48 #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028 I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not entirely clear which assertion failed. Fixes: 9ae326a ("CacheFiles: A cache that backs onto a mounted filesystem") Reported-by: Lei Xue <[email protected]> Reported-by: Vegard Nossum <[email protected]> Reported-by: Anthony DeRobertis <[email protected]> Reported-by: NeilBrown <[email protected]> Reported-by: Daniel Axtens <[email protected]> Reported-by: Kiran Kumar Modukuri <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Daniel Axtens <[email protected]>
1 parent d0eb06a commit 934140a

File tree

1 file changed

+12
-5
lines changed

1 file changed

+12
-5
lines changed

fs/cachefiles/rdwr.c

Lines changed: 12 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,7 @@ static int cachefiles_read_waiter(wait_queue_entry_t *wait, unsigned mode,
2727
struct cachefiles_one_read *monitor =
2828
container_of(wait, struct cachefiles_one_read, monitor);
2929
struct cachefiles_object *object;
30+
struct fscache_retrieval *op = monitor->op;
3031
struct wait_bit_key *key = _key;
3132
struct page *page = wait->private;
3233

@@ -51,16 +52,22 @@ static int cachefiles_read_waiter(wait_queue_entry_t *wait, unsigned mode,
5152
list_del(&wait->entry);
5253

5354
/* move onto the action list and queue for FS-Cache thread pool */
54-
ASSERT(monitor->op);
55+
ASSERT(op);
5556

56-
object = container_of(monitor->op->op.object,
57-
struct cachefiles_object, fscache);
57+
/* We need to temporarily bump the usage count as we don't own a ref
58+
* here otherwise cachefiles_read_copier() may free the op between the
59+
* monitor being enqueued on the op->to_do list and the op getting
60+
* enqueued on the work queue.
61+
*/
62+
fscache_get_retrieval(op);
5863

64+
object = container_of(op->op.object, struct cachefiles_object, fscache);
5965
spin_lock(&object->work_lock);
60-
list_add_tail(&monitor->op_link, &monitor->op->to_do);
66+
list_add_tail(&monitor->op_link, &op->to_do);
6167
spin_unlock(&object->work_lock);
6268

63-
fscache_enqueue_retrieval(monitor->op);
69+
fscache_enqueue_retrieval(op);
70+
fscache_put_retrieval(op);
6471
return 0;
6572
}
6673

0 commit comments

Comments
 (0)