Actix-web: Undefined behavior arrising from use of `UnsafeCell` in CloneableService

Created on 20 Jan 2020  路  3Comments  路  Source: actix/actix-web

Futures created by the HttpServiceHandler (indirectly) contain references to shared UnsafeCell containing the service. With some creative use of thread local storage, it is possible for the poll function of one future to call the poll function of the other. This leads to two simultaneous mutable references into the service, which can cause undefined behavior.

I have written a demonstration of this below.

use std::task::{Context, Poll};
use std::pin::Pin;
use std::future::Future;
use actix_service::{fn_factory, ServiceFactory, Service};
use actix_http::{HttpService, Request, Response, Protocol};
use futures::future::{ready, Ready};
use tokio::io::{AsyncRead, AsyncWrite};
use std::cell::Cell;
use actix_http::error::DispatchError;
use actix_http::http::{StatusCode, Error};
use futures::task::noop_waker;

thread_local! {
    static FUTURE_TO_POLL: Cell<Option<Pin<Box<dyn Future<Output = Result<(), DispatchError>>>>>> = Cell::new(None);
}

struct DangerousService(Option<Box<u8>>);

impl Service for DangerousService {
    type Request = Request;
    type Response = Response;
    type Error = Error;
    type Future = Ready<Result<Response, Error>>;

    fn poll_ready(&mut self, ctx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        unimplemented!();
    }

    fn call(&mut self, req: Self::Request) -> Self::Future {
        if let Some(mut future_to_poll) = FUTURE_TO_POLL.with(|cell| cell.take()) {
            if let Some(reference) = self.0.as_ref() {
                let waker = noop_waker();
                let mut context = Context::from_waker(&waker);
                let _ = future_to_poll.as_mut().poll(&mut context);

                println!("This read will fail {}", *reference);
            }
            ready(Ok(Response::new(StatusCode::OK)))
        } else {
            self.0 = None;
            ready(Ok(Response::new(StatusCode::OK)))
        }
    }
}

struct FakeAsync(bool);

impl AsyncRead for FakeAsync {
    fn poll_read(mut self: Pin<&mut Self>, cx: &mut Context<'_>, buf: &mut [u8]) -> Poll<Result<usize, std::io::Error>> {
        if self.0 {
            self.0 = false;
            let to_write = b"GET /hello HTTP/1.0\n\n";
            buf[0..to_write.len()].copy_from_slice(to_write);
            Poll::Ready(Ok(to_write.len()))
        } else {
            Poll::Pending
        }
    }
}

impl AsyncWrite for FakeAsync {
    fn poll_write(self: Pin<&mut Self>, cx: &mut Context<'_>, buf: &[u8]) -> Poll<Result<usize, std::io::Error>> {
        Poll::Ready(Ok(buf.len()))
    }

    fn poll_flush(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Result<(), std::io::Error>> {
        Poll::Ready(Ok(()))
    }

    fn poll_shutdown(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Result<(), std::io::Error>> {
        Poll::Ready(Ok(()))
    }
}

#[actix_rt::main]
async fn main() {
    let service_factory = fn_factory(|| {
        ready(Ok::<_, Error>(DangerousService(Some(Box::new(5)))))
    });
    let http_service: HttpService<FakeAsync, _, _> = HttpService::new(service_factory);
    let mut handler = http_service.new_service(()).await.unwrap();
    let future1 = handler.call((FakeAsync(true), Protocol::Http1, None));
    let future2 = handler.call((FakeAsync(true), Protocol::Http1, None));

    FUTURE_TO_POLL.with(move |cell| cell.set(Some(Box::pin(future1))));
    future2.await.unwrap();
}
C-bug UB

Most helpful comment

@mitsuhiko after some benchmarking it appears there is no noticeable perf difference between safe and unsafe, all within +-10ns which is probably error due to async env:

async_service_direct    time:   [335.96 ns 348.83 ns 359.29 ns]
                        change: [-5.8798% +3.8551% +14.029%] (p = 0.46 > 0.05)
                        No change in performance detected.
async_service_cloneable_unsafe
                        time:   [316.97 ns 336.05 ns 352.48 ns]
                        change: [-7.1993% +3.2989% +15.189%] (p = 0.57 > 0.05)
                        No change in performance detected.
async_service_cloneable_safe
                        time:   [332.77 ns 350.90 ns 365.58 ns]
                        change: [-9.4788% +0.5402% +13.049%] (p = 0.93 > 0.05)
                        No change in performance detected.

Also somebody has pointed out in the chat that making it through feature flag will make it possible to sneak unsafe through transient dependency.

All 3 comments

It seems reasonably easy fix for this would be to replace UnsafeCell with RefCell. Though it will lead to service thread panic instead of mentioned UB. Service thread will be restarted and service capacity will be reduced for some time. If that is affordable solution I can work on that.

I feel like it would be interesting to make an internal cell wrapper that uses RefCell that could be switched to an UnsafeCell in case someone really wants that extra performance. Also it would leave one place to touch for optimizations later if needed.

The panic itself i think is okay in this case for now.

@mitsuhiko after some benchmarking it appears there is no noticeable perf difference between safe and unsafe, all within +-10ns which is probably error due to async env:

async_service_direct    time:   [335.96 ns 348.83 ns 359.29 ns]
                        change: [-5.8798% +3.8551% +14.029%] (p = 0.46 > 0.05)
                        No change in performance detected.
async_service_cloneable_unsafe
                        time:   [316.97 ns 336.05 ns 352.48 ns]
                        change: [-7.1993% +3.2989% +15.189%] (p = 0.57 > 0.05)
                        No change in performance detected.
async_service_cloneable_safe
                        time:   [332.77 ns 350.90 ns 365.58 ns]
                        change: [-9.4788% +0.5402% +13.049%] (p = 0.93 > 0.05)
                        No change in performance detected.

Also somebody has pointed out in the chat that making it through feature flag will make it possible to sneak unsafe through transient dependency.

Was this page helpful?
0 / 5 - 0 ratings