Tokio: Deadlock issue with tokio::sync::lock::Lock

Created on 26 Jul 2019  路  5Comments  路  Source: tokio-rs/tokio

For some reason Lock::poll_lock is not returning Poll::Ready even after the LockGuard is dropped. This is leading to a deadlock.

Below is the code & link to the project I wrote to showcase this issue.
https://github.com/JohnDoneth/tokio-sync-bug

#![feature(async_await)]

use tokio;
use tokio::sync::lock::{Lock, LockGuard};

use core::task::{Context, Poll};
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;

struct Items {
    items: Vec<Lock<usize>>,
}

impl Items {
    fn new() -> Self {
        let mut items = vec![];

        for i in 0..5 {
            items.push(Lock::new(i));
        }

        Self { items }
    }
    fn get<'a>(&'a self) -> LockFuture<'a> {
        LockFuture(&self)
    }
}

struct LockFuture<'a>(&'a Items);

impl<'a> Future for LockFuture<'a> {
    type Output = LockGuard<usize>;
    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        for mut item in self.0.items.iter().cloned() {
            if let Poll::Ready(val) = item.poll_lock(cx) {
                return Poll::Ready(val);
            }
        }

        Poll::Pending
    }
}

#[tokio::main]
async fn main() {
    let items = Arc::new(Items::new());

    let mut completed = Lock::new(0usize);

    for i in 0..1000 {
        let items = items.clone();
        let completed = completed.clone();

        tokio::spawn(async move {
            let guard = items.get().await;

            println!("{:?}, {}", i, *guard);

            let mut completed = completed.clone();
            let mut completed = completed.lock().await;
            *completed += 1;
        });
    }

    while *completed.lock().await != 1000 {
        // do nothing
    }
}

Example output:

...
32, 0
33, 0
34, 1
35, 2
36, 3
37, 4
...stuck here...

Most helpful comment

Perfect! Thank you for helping me figure this out, it makes perfect sense in hindsight. I tested it with the original code I was working on and can confirm that it was the issue.

All 5 comments

For some reason Lock::poll_lock is not returning Poll::Ready even after the LockGuard is dropped. This is leading to a deadlock.

Are you sure that is the case?

When threads are spawned all go to this line:

            let guard = items.get().await;

At least 5 of them will make past this line since they will get Poll::Ready but rest of them will be stuck on Poll::Pending. But then they are never woken up (when some LockGuard is dropped there should be mechanism to wake other LockFutures).

I said at least 5 because some of the threads could be so fast that they finished before other threads started up. If that is the case they will drop LockFuture and next thread will see that some item will return Poll::Ready so the poll on LockFuture will succeed.

Are you sure that is the case?

I'm fairly certain. I verified that Poll::Ready was not being returned by putting a print line in the poll method when Poll::Ready is returned. I also verified that they are still being polled by printing on Poll::Pending as well, which does get printed out after the deadlock occurs.

But then they are never woken up (when some LockGuard is dropped there should be mechanism to wake other LockFutures).

Ah, are you saying that there needs to be another layer of waking here because I'm polling multiple Lock's?

Ah, are you saying that there needs to be another layer of waking here because I'm polling multiple Lock's?

Actually, I think that I misread the code. I didn't notice that you are passing the cx to the poll_lock which should trigger waking up.

I don't think there is a deadlock issue in tokio-sync, my hypothesis is that your reproduction code is broken because the poll method creates clones of the locks and calls poll_lock but is dropping the locks when the loop has ended, this may inhibit the notification of the task. To check this hypothesis, I modified your code so that a LockFuture owns a cloned vec of locks.

#[derive(Clone)]
struct Items {
    items: Vec<Lock<usize>>,
}

impl Items {
    fn new() -> Self {
        let mut items = vec![];

        for i in 0..5 {
            items.push(Lock::new(i));
        }

        Self { items }
    }
}

struct LockFuture(Items);

impl Future for LockFuture {
    type Output = LockGuard<usize>;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        for item in self.0.items.iter_mut() {
            if let Poll::Ready(val) = item.poll_lock(cx) {
                return Poll::Ready(val);
            }
        }

        Poll::Pending
    }
}

#[tokio::main]
async fn main() {
    let items = Items::new();

    let mut completed = Lock::new(0usize);

    for i in 0..1000 {
        let items = items.clone();
        let mut completed = completed.clone();

        tokio::spawn(async move {
            let guard = LockFuture(items).await;

            println!("{:?}, {}", i, *guard);

            let mut completed = completed.lock().await;
            *completed += 1;
        });
    }

    while *completed.lock().await != 1000 {
        // do nothing
    }
}

As far as I can tell, the variant above doesn't deadlock.

Perfect! Thank you for helping me figure this out, it makes perfect sense in hindsight. I tested it with the original code I was working on and can confirm that it was the issue.

Was this page helpful?
0 / 5 - 0 ratings