aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorNick Piggin <npiggin@suse.de>2009-01-06 03:05:50 +0100
committerLinus Torvalds <torvalds@linux-foundation.org>2009-01-05 18:31:12 -0800
commite8c82c2e23e3527e0c9dc195e432c16784d270fa (patch)
tree943b1359ff8d6855b0b5fd37bb0ae61f01a7da0d
parentf1b11e505463fd597ab7963df26dd1f446dcceae (diff)
downloadmrst-s0i3-test-e8c82c2e23e3527e0c9dc195e432c16784d270fa.tar.gz
mrst-s0i3-test-e8c82c2e23e3527e0c9dc195e432c16784d270fa.tar.xz
mrst-s0i3-test-e8c82c2e23e3527e0c9dc195e432c16784d270fa.zip
mm lockless pagecache barrier fix
An XFS workload showed up a bug in the lockless pagecache patch. Basically it would go into an "infinite" loop, although it would sometimes be able to break out of the loop! The reason is a missing compiler barrier in the "increment reference count unless it was zero" case of the lockless pagecache protocol in the gang lookup functions. This would cause the compiler to use a cached value of struct page pointer to retry the operation with, rather than reload it. So the page might have been removed from pagecache and freed (refcount==0) but the lookup would not correctly notice the page is no longer in pagecache, and keep attempting to increment the refcount and failing, until the page gets reallocated for something else. This isn't a data corruption because the condition will be detected if the page has been reallocated. However it can result in a lockup. Linus points out that ACCESS_ONCE is also required in that pointer load, even if it's absence is not causing a bug on our particular build. The most general way to solve this is just to put an rcu_dereference in radix_tree_deref_slot. Assembly of find_get_pages, before: .L220: movq (%rbx), %rax #* ivtmp.1162, tmp82 movq (%rax), %rdi #, prephitmp.1149 .L218: testb $1, %dil #, prephitmp.1149 jne .L217 #, testq %rdi, %rdi # prephitmp.1149 je .L203 #, cmpq $-1, %rdi #, prephitmp.1149 je .L217 #, movl 8(%rdi), %esi # <variable>._count.counter, c testl %esi, %esi # c je .L218 #, after: .L212: movq (%rbx), %rax #* ivtmp.1109, tmp81 movq (%rax), %rdi #, ret testb $1, %dil #, ret jne .L211 #, testq %rdi, %rdi # ret je .L197 #, cmpq $-1, %rdi #, ret je .L211 #, movl 8(%rdi), %esi # <variable>._count.counter, c testl %esi, %esi # c je .L212 #, (notice the obvious infinite loop in the first example, if page->count remains 0) Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-rw-r--r--include/linux/radix-tree.h2
1 files changed, 1 insertions, 1 deletions
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index a916c6660df..355f6e80db0 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -136,7 +136,7 @@ do { \
*/
static inline void *radix_tree_deref_slot(void **pslot)
{
- void *ret = *pslot;
+ void *ret = rcu_dereference(*pslot);
if (unlikely(radix_tree_is_indirect_ptr(ret)))
ret = RADIX_TREE_RETRY;
return ret;