The following code demonstrates how a LoadingCache stops loading forever if invalidateAll() is called during a load operation.
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import com.google.common.base.Preconditions;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.util.concurrent.ThreadFactoryBuilder;
public class CacheBug {
private static final ExecutorService EXECUTOR = Executors.newSingleThreadExecutor(new ThreadFactoryBuilder().setDaemon(true).build());
static final LoadingCache<String, Long> THE_CACHE = CacheBuilder.newBuilder()
.refreshAfterWrite(1, TimeUnit.MILLISECONDS)
.build(new CacheLoader<String, Long>() {
@Override
public Long load(String s) throws InterruptedException {
Thread.sleep(200);
return System.currentTimeMillis();
}
});
public static void main(String[] args) throws ExecutionException, InterruptedException {
THE_CACHE.getUnchecked("");
Thread.sleep(100);
EXECUTOR.submit(() -> THE_CACHE.get(""));
for (int i = 0; i < 10; i++) {
Thread.sleep(100);
THE_CACHE.invalidateAll();
System.err.println("Current time is: " + THE_CACHE.get(""));
}
Preconditions.checkArgument(System.currentTimeMillis() - THE_CACHE.get("") < 500);
}
}
Output:
Current time is: 1508359149314
Current time is: 1508359149217
Current time is: 1508359149217
Current time is: 1508359149217
Current time is: 1508359149217
Current time is: 1508359149217
Current time is: 1508359149217
Current time is: 1508359149217
Current time is: 1508359149217
Current time is: 1508359149217
Exception in thread "main" java.lang.IllegalArgumentException
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
at CacheBug.main(CacheBug.java:34)
The expected behaviour is that the time should increase (by about 300ms) in every iteration.
If adding a removal listener, one minor mistake shown is that a COLLECTED notification is published. The updated cause, due to a race to promote the cause if the entry's fields were also collected, accidentally missed using (valueReference != UNSET) case. Thus, on an UNSET value reference the cause is null. That said, its also odd for why a null value would even occur.
Regarding this issue, note that invalidate(key) does seem to work fine. Only invalidateAll causes weird results.
@cpovirk @lowasser
Does anybody mind if I'll take this issue for investigation and probably for patching?
Please go ahead.
On Thu, Nov 23, 2017, 11:49 AM Sergei Ustimenko notifications@github.com
wrote:
@cpovirk https://github.com/cpovirk @lowasser
https://github.com/lowasser
Does anybody minds if I'll take this issue for investigation and probably
for patching?—
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/google/guava/issues/2971#issuecomment-346664280, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAhPOdfADnpJb63yWRZ1lEhQ483DK2Twks5s5aIPgaJpZM4P-VCf
.
@lowasser
So roughly speaking, that's what happens:
Main + Pool-1-Thread-1
+-----------------------------------+
|
count == 0 |
|
get new value |
|
load new value..|
|
store new value |
|
count <= 1 |
+-----------------------------------+
|
| check for value
|
| value isn't null
|
| refresh value
|
| load new value..
+-----------------------------------+
|
invalidate all |
|
count <= 0 |
|
+-----------------------------------+
|
set up new head |
in the bucket |
|
load new value..|
|
+-----------------------------------+
|
| store ready value
|
| value was set up
| previously by main
| w/ isLoading=>true
|
|
| value is null =>
| publish COLLECTED
|
| oldValue.isActive
| => true => this.count
| remains constant
|
| store new value
|
| count <= 0
|
+------------------------------------+
|
try to store |
ready value |
|
don't store it |
|
publish REPLACED|
|
count remains |
constant = 0 |
+
summarizing that, if I didn't make a mistake, we clearly could see, that while value is loading by pool-1-thread-1 we're doing invalidateAll, which causes us to set up this.count = 0. Then main wakes up and tries to load a value, as underlying entry in a table is empty and shall be initialized (then sleeps for about 200 ms). poll-1-thread-1 wakes up, sets up it's value, but, what it does here is it checks, if there's already a bucket and entry initialized (which is generously done by main), however, a value in an entry is still loading. pool-1-thread-1 substitutes the value with it's own loaded one and doesn't affect this.count which is 0 (it actually does --newCount and stores it) at the moment. main apparently wakes up and tries to set up it's value, however fails to pass the condition:
if (oldValueReference == valueReference
|| (entryValue == null && valueReference != UNSET)) {
and thus doesn't update a value along with this.count which is still equals to zero. Thus main publishes REPLACED and exits.
All the subsequent loads fail miserably as main, and all the invalidateAll attempts fail because of this.count == 0.
In my opinion there's two ways to fix this issue:
1st if kinda safe and quickest one, however not the best one, as I think. It is just to delete a condition:
if (count != 0) { // read-volatile
at LocalCache.Segment#clear:3357 which prevents invalidateAll from clearing kinda broken segment.
or the second one is to update LocalCache.Segment#storeLoadedValue:3284 and add here something like follows, to catch the case, when oldValueReference.isLoading == false && newValueReference.isLoading == true and we still want to update a value and increment this.count.
} else if (!oldValueReference.isActive() && valueReference.isActive()) {
setValue(e, key, newValue, now);
this.count = newCount;
evictEntries(e);
enqueueNotification(key, hash, newValue, oldValueReference.getWeight(), RemovalCause.REPLACED);
return true;
}
I must admit, that I like 2nd option (not exactly that implementation but I like the approach) much more because in my opinion it fixes the root cause (we fail to update a value in a bucket), however it breaks the test LocalCacheTest.testSegmentStoreComputedValue for package-private storeLoadedValue for the clobbered case. However 1st option is quite safe (I don't take in account performance drawback which caused by invalidateAll being clearing an internal table in cause when count should be 0 and table is actually empty).
So, @lowasser what do you think on that? Maybe I should do a deeper dive? Which option do you like the most?
I also must say, that COLLECTED (as @ben-manes noticed) status might be published under a race, which is caused by entryValue <= valueReference#get == null, which actually appears to be a valid case, when one thread sets up a reference, but value is still loading and being checked by another thread.
Something like
RemovalCause cause = (entryValue == null && !valueReference.isLoading()) ? RemovalCause.COLLECTED : RemovalCause.REPLACED;
should be there.
@fdesu hi,i ran into the same situation that never reload again. can you help to explain why invalidate(Object) would not cause this problem as it also just remove the value and set count=0(we only had one cachekey)
Hi @yingqin678, sorry if my explanation above looks hard to understand and not entirely accurate, I was too much excited as I recall.
Regarding the bug, as far as I remember invalidate(Object) follows that pattern: invalidate(Object) -> localCache.remove(key) -> segmentFor(hash).remove(key, hash) where remove unconditionally locks the segment and removes the entry. Brief Segment#remove logic:
V remove(Object key, int hash) {
lock();
try {
long now = map.ticker.read();
preWriteCleanup(now);
int newCount = this.count - 1;
AtomicReferenceArray<ReferenceEntry<K, V>> table = this.table;
int index = hash & (table.length() - 1);
ReferenceEntry<K, V> first = table.get(index);
for (ReferenceEntry<K, V> e = first; e != null; e = e.getNext()) {
...
} finally {
unlock();
postWriteCleanup();
}
while invalidateAll iterates through each segment and calls the Segment#clear. The clear's logic depends on volatile counter. Clear conditionally in that case goes ahead and clears the segment. Problem here is that unfortunately bug appears during the count management. Following is the brief Segment#clear's logic:
if (count != 0) { // read-volatile
lock();
try {
// clean goes here
++modCount;
count = 0; // write-volatile
} finally {
unlock();
postWriteCleanup();
}
}
Thus, my assumption is mainly based on the fact, that if Segment#clear would run unconditionally, then the symptom would be cured, though the root cause wouldn't be fixed.
I guess you might better go ahead and check ben-manes/caffeine cache, as it appears that it supersedes the guava's one. Though I'm not sure on that, so we better ask @ben-manes, I guess.
Caffeine does try to replace Guava’s, but it’s not a guava/google project so I certainly can’t claim so officially. I do think it’s a better option going forward, but that bias should be obvious.
@fdesu hi again,regard as your last reply,i understand that when Segment#clear would run unconditionally ,bug is cured.however what i am not convinced is that after first time exec invidate(Object) or invidateAll() ,the first entry in table both is null, and the count=0.so what`s the different between the result after first time exec invidate(Object) and invidateAll() that cause to the bug.
@yingqin678
If I get the question right, then invalidate(Object) runs invalidation regardless the count. It will evict entry and repair the bucket even if it is broken. If bucket's count on the other hand was corrupted (e.g. set to 0 while there are still some living entries), invalidateAll wouldn't run, thus causes a bug.
The case might be much more sophisticated though, I'm just describing the OP's case.
@fdesu ths for the reply very much.maybe i'm more curious in the origin reason why it happens,just let me think about it more
Most helpful comment
@lowasser
So roughly speaking, that's what happens:
summarizing that, if I didn't make a mistake, we clearly could see, that while value is loading by
pool-1-thread-1we're doinginvalidateAll, which causes us to set upthis.count = 0. Thenmainwakes up and tries to load a value, as underlying entry in a table is empty and shall be initialized (then sleeps for about200 ms).poll-1-thread-1wakes up, sets up it's value, but, what it does here is it checks, if there's already a bucket and entry initialized (which is generously done bymain), however, a value in an entry is still loading.pool-1-thread-1substitutes the value with it's own loaded one and doesn't affectthis.countwhich is0(it actually does--newCountand stores it) at the moment.mainapparently wakes up and tries to set up it's value, however fails to pass the condition:and thus doesn't update a value along with
this.countwhich is still equals to zero. ThusmainpublishesREPLACEDand exits.All the subsequent loads fail miserably as
main, and all theinvalidateAllattempts fail because ofthis.count == 0.In my opinion there's two ways to fix this issue:
1st if kinda safe and quickest one, however not the best one, as I think. It is just to delete a condition:
at
LocalCache.Segment#clear:3357which preventsinvalidateAllfrom clearing kinda broken segment.or the second one is to update
LocalCache.Segment#storeLoadedValue:3284and add here something like follows, to catch the case, whenoldValueReference.isLoading == false && newValueReference.isLoading == trueand we still want to update a value and incrementthis.count.I must admit, that I like 2nd option (not exactly that implementation but I like the approach) much more because in my opinion it fixes the root cause (we fail to update a value in a bucket), however it breaks the test
LocalCacheTest.testSegmentStoreComputedValuefor package-privatestoreLoadedValuefor the clobbered case. However 1st option is quite safe (I don't take in account performance drawback which caused byinvalidateAllbeing clearing an internal table in cause whencountshould be0and table is actually empty).So, @lowasser what do you think on that? Maybe I should do a deeper dive? Which option do you like the most?