Diesel: Async I/O

Created on 9 Aug 2016 · 95 comments · Source: diesel-rs/diesel

With futures-rs and tokio the Rust ecosystem recently got a whole new chapter in the async IO story. It would be pretty great for diesel to work with this as well.

From an end-user standpoint, a straightforward API change would be to 'just' use an async PgConnection instead of the regular PgConnection (DB-specific for more efficient implementations?). All traits generic over Connection (e.g. LoadDsl) would get an associated type to determine whether they return e.g. a QueryResult<T> or a Future<T>.
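A rough sketch of that associated-type idea (all names here are invented stand-ins for illustration, not Diesel's actual traits):

```rust
// Hypothetical sketch of the proposal above, not Diesel's real API:
// traits generic over the connection pick their return type through
// an associated type, so a sync backend yields QueryResult<T> while
// an async backend could pick a Future type instead.
type QueryResult<T> = Result<T, String>;

struct PgConnection; // stand-in for the real connection type

trait ExecuteDsl<Conn> {
    type Return;
    fn execute(self, conn: &Conn) -> Self::Return;
}

struct DummyQuery; // stand-in for a built query

impl ExecuteDsl<PgConnection> for DummyQuery {
    // The synchronous backend returns the result directly;
    // an AsyncPgConnection impl would pick a Future type here.
    type Return = QueryResult<usize>;
    fn execute(self, _conn: &PgConnection) -> QueryResult<usize> {
        Ok(1)
    }
}

fn main() {
    assert_eq!(DummyQuery.execute(&PgConnection), Ok(1));
}
```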

discussion desired

All 95 comments

Worth noting that in the async case I'd also like load and get_results to return some sort of stream instead of Future<Vec<T>>, as it frees me up to do cursor-like things (granted, even just a future is fine). This of course will require re-implementing the PostgreSQL adapter to not rely on libpq (which I'm already working on for other reasons; I'm not sure if I've spoken about it outside of the podcast). It also will not be possible with SQLite.

@sgrif maybe something like the "async iterator" in JS? This would return an Iterator<Item=Future<T>>, and the consumer would then call await on each future, but for this to work we would need compiler support for await. We could also make a simple helper function that takes an Iterator of Futures and a closure, iterates over the Iterator, and calls the closure when each Future is ready.

That's a possibility as well, but it's untrue to how I'll actually have access to the data, which is in batches. If we're doing true async IO I also will not have the length ahead of time, so it'd have to be an actual iterator, not a Vec, at which point why not just do a true stream?

Please note that futures-rs also has a Stream trait!

Stream<T> is to Vec<T> as Future<T> is to Result<T, _> (or I guess Option<T> in this case)
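The analogy can be made concrete with a toy synchronous cursor (invented names, plain std Rust): rows arrive in batches and the total count is unknown up front, so an incremental interface like Iterator, or Stream in the async world, is the natural fit:

```rust
// Toy synchronous analogue of a batched database cursor. The driver
// only ever holds the current batch, and the total row count is not
// known until the server stops sending batches -- which is why a
// Stream (the async counterpart of Iterator) fits the data better
// than Future<Vec<T>>.
struct BatchedCursor {
    batches: std::vec::IntoIter<Vec<i32>>, // pretend each inner Vec came off the wire
    current: std::vec::IntoIter<i32>,
}

impl Iterator for BatchedCursor {
    type Item = i32;
    fn next(&mut self) -> Option<i32> {
        loop {
            if let Some(row) = self.current.next() {
                return Some(row);
            }
            // In a real driver this is where the next batch would be
            // fetched over the network (an await point in async code).
            self.current = self.batches.next()?.into_iter();
        }
    }
}

fn main() {
    let cursor = BatchedCursor {
        batches: vec![vec![1, 2], vec![3]].into_iter(),
        current: Vec::new().into_iter(),
    };
    assert_eq!(cursor.collect::<Vec<_>>(), vec![1, 2, 3]);
}
```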

@sgrif off-topic, but curious as to the reasons for moving off of libpq. Could you also post a link to the podcast?

@macobo - I'm guessing it's The Bike Shed.

Sorry that I didn't reply -- I had an unexpected baby. Yes, that is the correct link to the podcast.

Well there is tokio-postgres now which you can use for your async stuff.
https://docs.rs/tokio-postgres/0.2.1/tokio_postgres/

Just want to clarify, will async make the 1.0 milestone?

No.

So... I've fully implemented a PoC for this twice now. I'm going to close this for the time being.

This is not a statement that async I/O will never happen, but it is a statement that it is not currently on the roadmap, and I do not consider this to be actionable at this time. Neither tokio nor futures have stable APIs at the moment (tokio in particular is supposedly going to have a major overhaul at some point).

At absolute minimum, we need to wait for things to settle down there. However, there are problems that need to be solved in those APIs as well. I found that in order to get to an API that I was happy with, I had to resort to some pretty nasty hacks which had a lot of gotchas and I'm not comfortable shipping. It seemed that ownership hits you way harder here as well. It's virtually impossible to have anything that isn't 'static. This might seem obvious or NBD, but consider for a moment that you basically never actually own the connection. Everything takes &connection (or more likely a PooledConnection<'a, T>).

With the state of things as they are today, we would also need to re-implement connection pooling to make this work. This is on top of what would already be an enormous amount of work. This is not just "make it async". When we switch to tokio, we can no longer use the client libraries provided to us by the database creators. We need to learn the wire protocol that database uses, and implement it at the TCP level.
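To make "learn the wire protocol" concrete: even the very first packet a PostgreSQL client sends must be assembled by hand. A minimal sketch of the documented v3.0 startup message (just the byte layout; the rest of the protocol, TLS, and error handling are omitted):

```rust
// PostgreSQL startup message (protocol 3.0): an Int32 length prefix
// (counting itself), an Int32 protocol version, then NUL-terminated
// key/value pairs, closed by a final NUL byte.
fn startup_message(user: &str, database: &str) -> Vec<u8> {
    let mut body = Vec::new();
    body.extend_from_slice(&196_608i32.to_be_bytes()); // version 3.0 = 0x0003_0000
    for (k, v) in [("user", user), ("database", database)] {
        body.extend_from_slice(k.as_bytes());
        body.push(0);
        body.extend_from_slice(v.as_bytes());
        body.push(0);
    }
    body.push(0); // terminator

    let mut msg = Vec::new();
    msg.extend_from_slice(&((body.len() as i32 + 4).to_be_bytes()));
    msg.extend_from_slice(&body);
    msg
}

fn main() {
    let msg = startup_message("postgres", "app");
    // The first four bytes are the total length, big-endian.
    assert_eq!(
        msg.len(),
        u32::from_be_bytes([msg[0], msg[1], msg[2], msg[3]]) as usize
    );
}
```

And this is only the handshake; query execution, result decoding, and authentication each have their own message formats, per backend.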

Finally, I hear a lot of people saying that the lack of async support is the big blocker for them using Diesel, but I'm not buying it. PostgreSQL is the only backend likely to get an async driver any time soon (I do not know the MySQL wire protocol, and it is mostly undocumented). The tokio-postgres crate, which people would likely be using instead if async was really the blocker for them, has had 300 downloads in the past 3 months. I'm not saying that async isn't a thing worth doing, but I think people are perhaps overstating how important it is to them.

Anyway with all that said, I'm going to close this issue. Open issues in this repo should represent something that is actionable, and that you could reasonably open a pull request to fix. I do not consider this to be actionable with the state of async in Rust today. I will keep my eye on that situation, and if that changes, I will open a new issue for this. But you should not expect async Diesel in the near future.

I understand your frustrations with the current API, and understand you don't want to build async support just yet, only to re-do it later. All of that makes perfect sense, and this is your project so absolutely do what you feel is best.

I just want to point out that your rationale for "async isn't a big deal" is full of selection bias (i.e. inaccurate), and you're doing yourself a disservice by believing those stats.

Finally, I hear a lot of people saying that the lack of async support is the big blocker for them using Diesel, but I'm not buying it.

You're using the stats for tokio-postgres to infer the popularity of async needs, and since it is not popular, you infer that async is not a blocker.

However, this is inaccurate, people who view async as a blocker are not using sync-Rust instead. They are simply _not using Rust_.

At least for me, I do not use Rust for any web based applications because, while hyper is pretty good, there is no great ecosystem around async in Rust yet.

I want to re-iterate: I am not trying to say that "async isn't a big deal". I am trying to say it is not currently a priority, and it is not feasible for us to attempt to accomplish it at this point in time.

people who view async as a blocker are not using sync-Rust instead. They are simply not using Rust.

While you're probably right, and it may not have been a great stat to bring up, I'm not sure I agree with "if I can't use Diesel I'm just not going to use Rust" being the norm in this case.

I'm also not trying to use the stats for tokio-postgres to infer the popularity of async, I'm using it to give some number to the only async alternative available for the backend that we are most likely to support. Presumably there are a decent number of people who want all of:

  • Rust
  • PostgreSQL
  • Async I/O

This is what those people would be using.

But again, I'm probably calling too much attention to what was by far the smallest part of my reasoning for closing this (which I should also mention was discussed with the rest of the core team). This is mostly about it not being something we can do right now, which is why I wanted to close this issue. If circumstances change and async-diesel becomes feasible in the future, we'll open a new issue. But I want to avoid keeping issues open which cannot reasonably be closed with a pull request (this would of course be a very large pull request)

Just a heads up on Pleingres, "_An asynchronous, Tokio-based, 100% Rust postgres driver_", as mentioned in the Pijul blog.
It might be an inspiration or a base for the future work. Or not.

I don't know if this exists yet, but perhaps documentation (e.g. an example on the website) about how to integrate Diesel into an async project without blocking the whole event loop would help. If this example does connection pooling, then it seems like a reasonable solution until the async ecosystem in Rust stabilizes some.

I'm using Diesel on a site that I hope will have high-volume database operations (mostly simple CRUD, but relatively high volume, as in potentially hundreds or even thousands of more-or-less concurrent operations). I'll be figuring out how to do pooling properly, but I honestly wouldn't go through this effort if I weren't dead-set on using Rust and another solution had this out of the box (e.g. Go is "good enough").

And honestly, an async interface into a thread pool of synchronous connections is probably better for a high traffic site anyway, which is probably the bulk of the "async or bust" crowd. That's not to say that async in Diesel isn't useful, just that we can probably solve the problem for most people with documentation.

This method should be considered a workaround

Just use
https://github.com/alexcrichton/futures-rs/tree/master/futures-cpupool
to wrap diesel's operations, then you can use diesel with futures/tokio nicely.
Define something like

use futures::prelude::*;
use futures_cpupool::CpuPool;
use lazy_static::lazy_static;

pub type Conn = diesel::pg::PgConnection;

pub fn exec_async<T, E, F, R>(f: F) -> impl Future<Item = T, Error = E>
  where
    T: Send + 'static,
    E: From<::diesel::result::Error> + Send + 'static,
    F: FnOnce(&Conn) -> R + Send + 'static,
    R: IntoFuture<Item = T, Error = E> + Send + 'static,
    <R as IntoFuture>::Future: Send,
  {
    lazy_static! {
      // One shared pool of worker threads for all blocking database calls.
      static ref THREAD_POOL: CpuPool = {
        CpuPool::new_num_cpus()
      };
    }

    let pool = /* clone a ref of diesel's connection pool */;
    THREAD_POOL.spawn_fn(move || {
      pool
        .get()
        .map_err(E::from)
        .map(|conn| f(&conn))
        .into_future()
        .and_then(|f| f)
    })
  }

Then you can

exec_async(|conn| {
  diesel::insert_into(??).values(??).execute(conn)
}).and_then(...)

@vorot93 @friktor @lnicola is the example above wrong?

@ivanovaleksey

  1. it's not real async code, we are talking about handling the whole connection asynchronously, not running blocking code on a threadpool
  2. you can't do more concurrent queries than there are cpu cores (CpuPool::new_num_cpus())

not saying it's "wrong"

@ivanovaleksey I can't say it's correct, as I didn't test it (e.g. I'm not sure about the trait bounds). In theory it should work. However,

  • it won't work with the async database drivers (I know there are a couple of implementations)
  • it's rather verbose
  • CpuPool is fixed-size; in general I'd prefer a more flexible thread pool
  • after the tokio changes that are coming soon, it will look a bit different (but still with a fixed thread pool)

So it works, but I'm not a fan of that approach.

@ivanovaleksey Pretty much what @chpio said. This code simply masks the blocking operations behind a thread pool. This doesn't help in the high-load use case, where a large number of incoming DB queries will cause unacceptable delays for each of them.

See also actix-diesel example https://github.com/actix/examples/blob/f8e3570bd16bcf816070af20f210ce1b4f2fca69/diesel/src/main.rs#L64-L70

Perhaps offering a canonical pool implementation might be useful, that way diesel could have async/futures API without using async drivers (yet).

This sync driver / async API is how slick (scala) works https://stackoverflow.com/a/29597868/1240268 and might suffice to appease the "people who view async as a blocker" (at least some of them).


Aside:

CpuPool is fixed-size; in general I'd prefer a more flexible thread pool

IME it's desirable to have a fixed size (with potentially multiple pools), that way all connections are made on startup... and generally your db server won't allow an unlimited number of connections anyway.

This article by the SQLAlchemy author is a very good read on the topic (while it's about Python many facts listed there still hold in the Rust world):
http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/

@sgrif Maybe one could reconsider this issue. Indeed, things seem to be brightening on this topic thanks to futures 0.3 and the introduction of non-'static futures; cf. the article Wherefore art thou Romio? from @withoutboats for a more detailed explanation 👍:

The benefit of non-'static futures

On the other hand, the other big thing I’ve found is a lot of opportunities that async/await opens up in terms of API design. In particular, one problem that the futures 0.1 ecosystem has suffered from is the fact that all futures need to be 'static, so they can’t contain any references. This is, of course, just a special case of the general problem of ā€œself-referential futuresā€ that pinning was designed to solve, but its a variation on the theme that’s worth highlighting.

cc @mehcode

@gfortaine I currently have some WIP stuff but its waiting on Diesel being a 2018 edition crate. PRs like #1956 need to be merged in (and there are a few other things like, turning on the future_compatibility lint and fixing all the problems that come up, that could be done to help out).

Furthermore, we are also blocked on https://github.com/rust-lang/rust/issues/56409 being stabilized and diesel being bumped to an MSRV of 1.34 (the expected version for this feature), as proc-macro crates (which are used in the crate that provides its runtime) cannot be edition 2018 without that feature.

@mehcode @gfortaine Adding to that, we probably also need something like existential types, because otherwise we are not able to use async functions as part of traits. I'm not sure what the timeline for stabilisation there is.

@weiznich I would rather not block on that feature and just take the small performance hit with Box<Future< .. >> for trait methods.

That said, I assume we're going to likely put the async stuff behind a documented unstable feature flag until _everything_ is stabilized to do it perfectly but there is a large ecosystem clamoring for _some_ level of support here.

@mehcode From that standpoint: Maybe just implement it using the unstable nightly features and put it behind the already existing unstable feature flag?

That's what I'm intending to do.

But it still leaves us blocked on https://github.com/rust-lang/rust/issues/56409 as we must compile as a 2018 edition crate first and we can't put that behind a feature flag.

@mehcode Would it be possible to somehow summarize your current plan in our forum?

Merging those 2018 edition PR's should be possible as soon as we've figured out what's up with our CI (again…) and where to export our macros.

@gfortaine Right now it's more of an issue of time than desire.

Hi!

Is there any change in this? Would be lovely to have async diesel :)

We've played a bit with async and found that it currently seems impossible to implement an interface that is roughly what we imagine on top of that.

For anyone interested in this: we imagine basically an interface like this. Unfortunately it is currently not possible to write an implementation of the transaction function that compiles and is usable without dirty hacks. At least nothing we've tried worked. So basically, as soon as someone finds a solution for that, we can talk about implementing it.

There's no good way to abstract over async closures right now. The lifetime you're using is wrong - it should not take a reference that lives for all of 'a, but there's also no way to tie lifetime of FT to the lifetime of the borrow in the correct way. This is why async closures will not be stabilized in the first pass, but will require additional design work.

What you actually want is T: for<'a> FnOnce(&'a mut Connection) -> impl Future<Output = Result<R, ()>> + 'a, which we don't yet support. My hope is this will also be writable in a more manageable way, hopefully something like T: AsyncFnOnce(&mut Connection) -> Result<R, ()>.
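On stable Rust, that bound can be approximated by boxing the returned future so it can name the borrow's lifetime. A self-contained sketch (toy Connection, trivial executor, all names invented; note that a named function is needed at the call site, which matches the closure-inference limitation discussed elsewhere in this thread):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Toy stand-in; the real Connection and error types look different.
struct Connection { log: Vec<String> }

impl Connection {
    async fn execute(&mut self, sql: &str) -> Result<(), ()> {
        self.log.push(sql.to_string());
        Ok(())
    }
}

// Stable approximation of `for<'a> FnOnce(&'a mut Connection) -> impl Future + 'a`:
// box the future so it can borrow the connection.
type BoxFut<'a, T> = Pin<Box<dyn Future<Output = T> + 'a>>;

async fn transaction<F, R>(conn: &mut Connection, f: F) -> Result<R, ()>
where
    F: for<'a> FnOnce(&'a mut Connection) -> BoxFut<'a, Result<R, ()>>,
{
    conn.execute("BEGIN").await?;
    let result = f(&mut *conn).await; // reborrow ends once the future is done
    match result {
        Ok(r) => { conn.execute("COMMIT").await?; Ok(r) }
        Err(e) => { conn.execute("ROLLBACK").await?; Err(e) }
    }
}

// A named function works where a closure trips over lifetime inference.
fn insert_something(conn: &mut Connection) -> BoxFut<'_, Result<(), ()>> {
    Box::pin(async move { conn.execute("INSERT").await })
}

// Minimal executor: these toy futures never return Pending, so polling
// with a no-op waker is enough.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker { raw() }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) { return out; }
    }
}

fn main() {
    let mut conn = Connection { log: Vec::new() };
    block_on(transaction(&mut conn, insert_something)).unwrap();
    assert_eq!(conn.log, ["BEGIN", "INSERT", "COMMIT"]);
}
```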

@withoutboats Good to know that's on the radar. For us the bigger issue was actually that Send/Sync bounds leak from locals inside an async function. e.g.

async fn foo() {
    let x = NotSync;
    await!(some_fn(&x));
}

async fn some_fn<T>(t: &T) {}

results in foo() not being Send, even though the use of !Sync values is entirely internal, not based on its args or return value. (More full code: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=c82232a0410f367a983d9f41206f1973)

It's not clear to me that there's no way to create a data race if that were allowed.

@withoutboats is there any link to track the feature you speak of? Of public discussions about it?

PS: I'm learning a lot from all this thread and related ones :D

@DavidBM the async/await story is complex in Rust; I'm trying to follow it, too. We are getting there. As a starter, you might be interested in this issue.

@weiznich I haven't tried it and it's quite likely that I'm misunderstanding the problem, but I thought I'd ask.

Would a Future-based transaction API, where you await to start a transaction and it gets rolled back on drop unless you commit, work better than the closure?

@lnicola If I understood your proposal correctly, you imagined something like this, right?

struct TransactionGuard<'a> {
    connection: &'a mut Connection
}

impl<'a> Drop for TransactionGuard<'a> {
    fn drop(&mut self) {
        // Drop is not async, so this future cannot be awaited here.
        self.connection.execute("ROLLBACK");
    }
}

impl<'a> TransactionGuard<'a> {
    async fn commit(&mut self) -> QueryResult<()> {
        self.connection.execute("COMMIT").await
    }
}

impl Connection {
    async fn transaction(&mut self) -> QueryResult<TransactionGuard<'_>> {
        self.execute("BEGIN").await?;
        Ok(TransactionGuard { connection: self })
    }
}

async fn with_transaction(conn: &mut Connection) -> QueryResult<()> {
    let guard = conn.transaction().await?;
    conn.execute("SOME Query").await?;
    guard.commit().await?;
    Ok(())
}

I think that's not an API that is as easily usable as the current transaction API of diesel, but that's probably something that could be accepted.

Additionally I see the following technical issues that prevents this from working:

  • We need to store a reference to the connection inside of the TransactionGuard to execute something on Drop. If we store a shared reference, we basically run into the same Send problem mentioned above; if we store a mutable reference, we won't be able to execute anything inside of the transaction, because now we would need two mutable references, which is impossible in Rust.
  • How do we await a query in Drop? Just use the synchronous variant? What happens if the query returns an error, e.g. because the connection to the database is lost?

Updated Playground

Ah, right, I should have thought more. The drop issue is obvious and I'm not sure how to solve the other one.

@sgrif My understanding of the issue. The below code is wanted:

async fn foo() {
    let x = NotSync;
    let ref_x = &x;
    some_fn(ref_x).await;
}

which, from the POV of the compiler, boils down to:

async fn foo() {
    let x = NotSend + 'a;
    some_fn(x).await;
}

We would like the future returned by foo to be Send, because the x variable and the future returned by some_fn(x), which are not Send, are local to the foo function and cannot escape it.

However, if you take 'a := 'static, this latter statement is not true. Consider:

async fn foo() {
    let x = Rc::new(5);
    some_fn(x).await;
}

thread_local! {
    pub static FOO: RefCell<Rc<i32>> = RefCell::new(Rc::new(0));
}

async fn some_fn(x: Rc<i32>) {
    FOO.with(|f| {
        *f.borrow_mut() = x.clone();
    });
    some_future.await;
    // do something with `x`
}

then you're screwed because the future returned by foo could set the thread local, then move to another thread, and there continue its use of x while the original thread may still use it through the thread local.

Now, with the original x = &RefCell example, we cannot store x in a thread local because it requires x to outlive 'static. But maybe there exist other ways to actually make x escape out of foo which I am not aware of. And even if there are none, it seems difficult to make the compiler understand this.


Hence, I believe that you would need some form of synchronization in the end.

Since you intended to use a RefCell, some lightweight RwLock which panics if it is already in use would work, something along the lines of https://docs.rs/atomic_refcell/0.1.3/atomic_refcell I guess.

If you don't even need concurrent read accesses, something with exclusive access which panics if it is already in use would have even less overhead: https://gist.github.com/rust-play/281af9e0b72dd89727aff960b23e75fe
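For illustration, the idea behind such an exclusive-access cell can be sketched with an atomic flag (my own sketch of the pattern, not the code from the gist; note a panic inside the closure would leave the flag set):

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

// Exclusive-access cell: like a Mutex, but panics instead of blocking
// when the value is already borrowed, so the happy path is a single
// atomic swap with no waiting.
struct Exclusive<T> {
    busy: AtomicBool,
    value: UnsafeCell<T>,
}

// Safety: the `busy` flag serializes all access to `value`.
unsafe impl<T: Send> Sync for Exclusive<T> {}

impl<T> Exclusive<T> {
    fn new(value: T) -> Self {
        Exclusive { busy: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        if self.busy.swap(true, Ordering::Acquire) {
            panic!("Exclusive value already in use");
        }
        // Safety: the flag guarantees we are the only accessor here.
        let r = f(unsafe { &mut *self.value.get() });
        self.busy.store(false, Ordering::Release);
        r
    }
}

fn main() {
    let cell = Exclusive::new(0);
    cell.with(|v| *v += 1);
    assert_eq!(cell.with(|v| *v), 1);
}
```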

@scalexm

If you don't even need concurrent read accesses, something with exclusive access which panics if it is already in use would have even less overhead: https://gist.github.com/rust-play/281af9e0b72dd89727aff960b23e75fe

Even that minimal synchronisation seems not to be necessary if we take &mut Connection instead of shared references. The problem there is that it seems impossible to formulate the lifetime bounds around the transaction API we would like to implement. (Updated playground from above.) There is a solution provided by Nemo157 that shows that this API is possible, but unfortunately it works only with functions, not with closures.
Additionally, I've played a bit with manually implementing a future for a transaction type here, but I'm not sure if this is even sound (or to word it differently, I assume that lines 85-89 break some aliasing rules, even if there is only ever one function that could access the connection reference).

@weiznich

Even that minimal synchronisation seems not to be necessary if we take &mut Connection instead of shared references. The problem there is that it seems not to be possible to formulate the life time bounds around the transaction API we would like to implement. (Updated playground from above) There is a solution provided by Nemo157 that shows that this API is possible, but unfortunately it works only with functions not with closures.

I see. I was just replying to what Sean called the « bigger issue » actually, but I wasn't aware of the whole plan indeed.

Additionally, I've played a bit with manually implementing a future for a transaction type here, but I'm not sure if this is even sound (or to word it differently, I assume that lines 85-89 break some aliasing rules, even if there is only ever one function that could access the connection reference).

This is definitely unsound, as the 'b and 'a lifetimes are arbitrary and unrelated, so you're basically transmuting between two lifetimes. For example, you could have 'b := 'static, and you'd be giving out a &'static mut Connection to the async handler, which could e.g. store it in a thread local, and that thread may re-use it later when the Connection has died already.

Note that adding a where 'a: 'b constraint won't solve the problem, as you could still have 'a = 'b = 'static, so once you've handed out the &'static mut Connection to the async handler, you cannot use it anymore: mutable references are not Copy. The pointer « trick » would still be unsound as you would now be duplicating a &'static mut.

Async/await has been officially stabilised as of 1.39; can we revisit this decision? It is the main blocker to us using Diesel at my company. We are making more and more code async, so having synchronous DB connections is unacceptable.

@banool A few things about that:

  1. Diesel is run as a free time project by members of the core team. If you want to have a certain feature in a timely manner consider funding one of the core team members (for example Sean) for working on this.
  2. As explained above quite a few times: none of the core team members, and none of the async people from rustc who have answered above, were able to write a prototypical implementation of a usable async database interface, due to compiler bugs and shortcomings in expressing certain complex lifetime bounds in combination with async callbacks. Have you solved them? Otherwise there is currently no clear path forward here, even if we wanted to implement it now.
  3. Are you sure that you aren't overrating async support? It will only give you a performance advantage in very specific cases. Otherwise you will be fine with just using a connection pool and running your queries in a threadpool.

Is there actually a need to pass around &mut connection? For instance, what if, instead of passing a connection into execute, we made it so that execute asynchronously secured a connection from a pool behind the scenes, which then had ownership passed and was used for a single query?

@Bajix Yes we need to have some kind of interface that is based on references to connections. If you base the interface on pulling out a connection from the pool, how would you model transactions? They require that you execute a set of operations on the same connection.

@weiznich Ah ok yea there's really no way to get around the need for passing around connections

What you actually want is T: for<'a> FnOnce(&'a mut Connection) -> impl Future<Output = Result<R, ()>> + 'a

@withoutboats is there any link to track the feature you speak of? Of public discussions about it?

@withoutboats do you have any new insights here or even just a link to where we can track this feature?

What is a good way to ship queries into a thread pool? Arc/Mutex seems pretty unergonomic (and also useless, as no other thread is going to use the connection before it returns from the thread pool, but there is no other way to let the compiler know about that). It also poisons signatures throughout the call stack (as you can't just pass connection references around).

Web frameworks commonly need asynchronous I/O while performing substantial work per request. The new runtime enables you to fearlessly use synchronous libraries like diesel or rayon in any async-std application.

https://async.rs/blog/stop-worrying-about-blocking-the-new-async-std-runtime/#a-concrete-example

That blog post is somewhat misleading. If you block in a future, you'll also block the whole task it's part of. E.g. if you use combinators like select! (I am aware that it's not called select! in async-std, don't mind that) to add timeouts to futures, they will never fire.

I agree with @lnicola, this post is likely to confuse many developers. They are claiming that we should not worry about blocking methods and can simply start using Diesel's blocking API like this, but it is a bad recommendation.

What they offer sounds like a way to automatically fall back to Tokio's block_in_place(). Ideally, database operations should run on a dedicated thread pool for IO operations, and that is exactly what Tokio provides with spawn_blocking().

It also seems that Tokio chose a much better wording. That said, it would be nice if some similar mechanism were offered by Tokio to fall back to block_in_place() when a function takes more than a couple of msecs, and/or to display a warning when execution takes too long in the default execution context.
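The mechanism behind spawn_blocking can be illustrated with plain std threads and a channel (a simplified sketch with invented names; the real tokio::task::spawn_blocking returns a JoinHandle future and reuses a bounded pool rather than spawning a fresh thread per call):

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for a blocking Diesel query.
fn blocking_query() -> Vec<i32> {
    vec![1, 2, 3]
}

// Offload blocking work to a dedicated thread and hand back a handle
// to the result, in the spirit of tokio's spawn_blocking (which hands
// back an awaitable JoinHandle instead of a channel receiver).
fn spawn_blocking_like<T: Send + 'static>(
    f: impl FnOnce() -> T + Send + 'static,
) -> mpsc::Receiver<T> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(f());
    });
    rx
}

fn main() {
    let rx = spawn_blocking_like(blocking_query);
    // An async runtime would await the result; here we just block.
    assert_eq!(rx.recv().unwrap(), vec![1, 2, 3]);
}
```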

If you block in a future, you'll also block the whole task it's part of. E.g. if you use combinators like select!

hmm, yeah, this would affect e.g. Stream::buffered as well, as it is pretty much a sub-executor in itself that runs as one task in the root executor.


The new runtime detects blocking automatically. We don’t need spawn_blocking anymore and can simply deprecate it.

Nice... is that reason enough to not use async.rs?

Just for reference, and if I understand correctly, the Rust feature that has the capacity to allow this to happen is async closures: https://github.com/rust-lang/rust/issues/62290

Am I right?

_Please note that the linked issue is no place to ask about how the development is going or news._

@DavidBM Yes, we would need that feature. As we are not interested in providing an implementation based on nightly, we are blocked on that.

I don't think async closures particularly affect our implementation. || { async {} } is effectively equivalent to async || {} from our point of view. Really what we need is a trait to represent them, which as far as I'm aware is not currently being proposed, or available behind any nightly feature gate.

Please note that the linked issue is no place to ask about how the development is going or news.

Please consider why you think you are entitled to a place to ask for updates from unpaid volunteers.


Thanks,
Sean Griffin

Is it possible to review whether a Connection needs to be Send + !Sync? It seems from comments like this that we are assuming connections can't be Sync. However, tokio-postgres does have Send + Sync connections, by sending requests via a channel to a handler in a separate thread, then asynchronously waiting on the other end.

Maybe this has been considered before but rejected for some reason? I couldn't find it written anywhere though. Please forgive me if this has been answered already.

From my perspective, having an async API is a way of making the API more type safe, even if performance stays the same. What I mean by this is that right now I can write:

// Will compile, but is a "broken" Future,
// since it blocks.
async fn foo(conn: PgConnection) -> Bar {
    bars.first::<Bar>(&conn).unwrap()
}

I know I can use tokio::task::block_in_place, but sadly the compiler won't remind me that I have to.

@meteficha We do use libpq internally. Their docs state the following:

One thread restriction is that no two threads attempt to manipulate the same PGconn object at the same time. In particular, you cannot issue concurrent commands from different threads through the same connection object. (If you need to run concurrent commands, use multiple connections.)

That translates quite literally to a !Sync bound in terms of Rust's trait system.

(A short search would have led you to that answer.)

From my perspective, having an async API is a way of making the API more type safe, even if performance stays the same. What I mean by this is that right now I can write:

Bringing up this topic again and again won't solve anything. We do have this issue on our radar, but at least for me, any time anyone asks another question like this that basically boils down to "Why is this not done yet?" or "Why don't you just do xyz?", my motivation to work on this shrinks even more. It's not that we are unaware of what's currently possible in the type system and what's not…

And to finally repeat what Sean said literally above:

Please consider why you think you are entitled to a place to ask for updates from unpaid volunteers.

Sorry if I came across as entitled, that wasn't my intention. How can I write a better comment next time?

I know that diesel uses libpq internally right now. I'll rephrase my question: would it be acceptable from your POV to have an API on top of (a) Sync + Send and (b) tokio-postgres or something with a similar approach?

I know also that you said before that there's no point in thinking about (b) before having an async API. But it seems to me that transitioning both at the same time would solve both issues. I'm not asking you to write the code, just for your thoughts on the idea. Thank you.

I think this attitude of treating interactions on GitHub as "entitled" is really unhealthy for a community as a whole. Most of these comments are in good faith, and this kind of response pushes people away.

It is fair for an open source project to have to deal with other people. People in general have good intentions (which is the case in this particular instance). I understand that repetition can be annoying, but I think there are better ways to handle these situations.

@meteficha

How can I write a better comment next time?

I consider this issue a place to discuss the technical basis of a potential implementation. That means I expect from people commenting here that:

  • they've read all existing comments
  • they assume that we are aware of this issue, have done quite a bit of research into different solutions, and have already talked to members of the related compiler/type-system/async working groups about how to solve the problems we've identified.
  • they are adding something new to the discussion in a constructive way
  • they've done some research about what they're proposing and how it fits into the bigger picture

For questions or short suggestions in the style of "Why are you not doing xyz", this is simply the wrong place. For ideas that may not be fully complete, I would prefer to have such things in our forum with a concrete explanation of why they could help.

I know that diesel uses libpq internally right now. I'll rephrase my question: would it be acceptable from your POV to have an API on top of (a) Sync + Send and (b) tokio-postgresql or something with a similar approach?

The problem here is that we won't drop libpq anytime soon, as far as I can tell. There is already a connection implementation based on the postgres crate in #2257, so it is possible to write something based on top of tokio-postgres. As mentioned there, we won't do anything of that sort before the release of diesel 2.0, because there is only so much time in a day. That said, this does not mean we can just base the interface on Send + Sync, as this is not the common case. It should be possible to implement such an interface for all backends that support some kind of async interface (even those not officially part of diesel itself). Send + !Sync is just the common case there, as far as I'm aware.

@orium

I think this attitude of labeling interactions on github as "entitled" is really unhealthy for the community as a whole. Most of these comments are in good faith, and this kind of response pushes people away.

I totally understand that most of the comments here are in good faith. Nevertheless, we consider our issue tracker to be a tool for the contributor team to organize what should be done and how it should be done. So this is not the right place to ask questions or to ask about the status of a current feature. We have other platforms like the gitter channel and our forum for this, where you don't send a notification to quite a lot of people (as you do when posting to this issue). As stated above, I consider (especially) this issue to be a place for technical discussion. From anyone participating here I would expect that they have done at least a moderate amount of research about why things are as they are. For example, in the last case it wouldn't have been that hard to find out that we use libpq and that libpq has those constraints, or that there is an open PR for using the postgres crate. Additionally, it should be clear that such an API cannot be postgres specific.

It is fair to expect an open source project to have to deal with other people. People in general have good intentions (which is the case in this particular instance). I understand that repetition can be annoying, but I think there are better ways to handle these situations.

Generally I'm quite happy to help with questions/issues/….
This case is a bit different because there have been quite a few people basically requesting that this be implemented now, because they need it for some business case. Making such requests to an open source project run by unpaid volunteers is just rude. That, in combination with people commenting here saying "Why not just do xyz" where it is clear after maybe two minutes of thinking/trying that "xyz" will not solve the problem, puts me in a bad mood as soon as I see this issue popping up somewhere in my notifications.
This may sound harsher than it should: it's just that this is not the right place to ask questions or propose vague plans. We have other platforms for such things.

@weiznich That's a fair answer, and I mostly agree with what you are saying. Maybe it is more a question of tone than of the underlying message: overall what you said makes sense to me, but I think the phrasing could be better. For instance, I can easily empathise with your previous message, and I think most people can, but the "entitled" message before immediately feels like it comes from an adversarial position.

But I do understand it is a continuous problem and it is not easy to solve.

I think that whether or not the collaborators do this, it's the community's right to ask anything, and the devs' right to respond or not... But making your wish the rule for everyone — I can't even think about that.

Questions about this and other features will always exist, and people need to know how to talk to each other. If you receive no money and do this entirely in your free time, congratulations, you rock and make awesome things for everyone — but framing it as "I don't want to listen to you because of this"... I think I won't use this lib because of the devs' politics, and I think other people will do the same.

Please think about this, because it's not good for the community.

Perhaps one way in which anyone working with diesel could get full async today would be to use SQLx and diesel alongside each other. In this scenario, diesel would be used exclusively as a query builder; the actual query would be passed into SQLx as a &str, and the structs would have traits for both, so presumably it should just work and fit nicely side by side. I'm sure there are ways to make such a thing very ergonomic. Perhaps a discussion could be had on the feasibility of accomplishing async/await via query building and library integration? Such an integration would sidestep the issue entirely in the near term; over time both could become fully async, and it would be a use-case-specific choice with both flavors used within a project.

@Bajix In my opinion that is not as "easy" and straightforward as you suggest. Just passing a &str to SQLx only works for very simple queries. As soon as you include some user-provided value in your query, you need to handle bind params, which would require some sort of prepared statement support (which is not exposed by SQLx) or would require converting diesel's internal bind value representation into SQLx's exposed bind value representation, which is neither simple nor straightforward.
Additionally, you would lose much of the functionality diesel provides, especially the following features:

  • Compile time checking that the returned struct match the query
  • Optimized prepared statement caching

(Besides that, I'm not sure why everyone seems to think async will magically improve your performance. In my experience it won't. It will be slower in the general case and lead to code that is harder to understand and reason about. Has anyone already written a meaningful benchmark showing that async brings clear performance wins in this case? Or one that shows that SQLx (or some other async rust database library) is so much faster than diesel? At least in my own benchmarks that is not the case.)

So going down this route is nothing I'm personally willing to spend my time on. If someone wants to try it, sure, feel free to give it a try.

@weiznich speaking of other approaches, do you have an opinion on tokio-diesel? By just looking at the example and the test it looks interesting (but I am missing the context to say more)

@apiraino tokio-diesel does "just" use a thread pool to run the database queries on. So that's no real async, but only the variant of async that you can already use with diesel and any other runtime today, by doing something like `runtime::spawn_blocking(|| /* some diesel ops */)`.
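The thread-pool pattern described above can be sketched with the standard library alone — run the blocking query on another thread and hand the result back through a channel. `spawn_blocking` in tokio or async_std does conceptually the same thing against a managed thread pool (the function name here is illustrative, not a diesel or tokio-diesel API):

```rust
use std::sync::mpsc;
use std::thread;

// A minimal, std-only sketch of the "fake async" pattern: the blocking
// database call runs on a dedicated thread, and the caller receives the
// result through a channel instead of blocking its own thread.
fn run_blocking<F, T>(f: F) -> mpsc::Receiver<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore a send error: it just means the caller dropped the receiver.
        let _ = tx.send(f());
    });
    rx
}
```

In an async runtime the receiver side would be awaited instead of blocked on; the query itself is still synchronous underneath, which is exactly the point being made about tokio-diesel.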

Time flies; it has been four years. I think people should take note that diesel will not support async for the next 3-5 years, so let's switch to other stuff if async really matters.

Even a plan would reassure people — whether it takes 2 years or 3 years, people would be happy to see its future. But there isn't one.

@GopherJ If you're interested in helping us add support, feel free to come work with us on adding support. But your comment reeks of entitlement and is not welcome here.

I think that if there's enough value to do this (and judging by the fact that SQLAlchemy just did, there is) there will be enough companies and people willing to donate money to this effort so we can hire one of the maintainers to do it.

How about we reopen this issue and attach it to bountysource or some other issue hunting service?
At some point we'll raise enough money. I'm willing to donate $5 for that purpose.

@sgrif What do you think?

As this issue currently shows up in quite a few posts as an argument that diesel won't implement async, I want to clarify a few things.

First of all: we won't stop anyone from working on an async diesel API. If someone is interested in that, I at least will try to find some time for mentoring and code review. That said: for me the async interface is not that important, because I will likely not need it for my own projects in the foreseeable future. Therefore my motivation to work on it is quite low as long as there are other features that are more important to me. (Currently that means adding complete support for group by clauses and support for constructing dynamic select clauses.) Additionally, as already stated multiple times before, diesel is a free time project for me, so I'm not able to spend an unlimited amount of time working on it. Furthermore, I see at least a few technical barriers that first need to be solved at the language level (see below for details), which is not something I'm willing to do in my free time. Therefore I will likely not be able to work on async myself for the foreseeable future. Again, that does not mean diesel won't support async. It just means someone needs to step up and start working on it. Or to word it differently: you cannot expect that someone else will step up and do the work for you.

So now to the technical part:
In my opinion the main technical blocker here is the open question of how to model a sound async transaction API. As transactions are one of the core features of a relational database system, it is in my opinion crucial to get that right, or at least sound. Before someone starts saying "Hey, the other async database drivers have already solved that problem": the short answer is no, they haven't solved the underlying problem. They all provide an API that is prone to misuse or has quite large restrictions. Let's have a look at the currently possible options:

Modeling transactions as guard object

This is done by sqlx, postgres and quaint. The API looks something like

struct Transaction {/* … */}

impl Connection {
    async fn transaction(&mut self) -> Result<Transaction>;
}

impl Transaction {
    async fn begin(conn: &mut Connection) -> Result<Self>;
    async fn commit(self) -> Result<()>;
    async fn abort(self) -> Result<()>;
}

async fn use_a_transaction(conn: &mut Connection) -> Result<Something> {
    let transaction = conn.transaction().await?;

    transaction.execute("Some query").await?;
    match transaction.execute("Some other query").await {
        Ok(o) => {
            transaction.commit().await?;
            return Ok(o);
        }
        Err(e) => {
            transaction.abort().await?;
            return Err(e);
        }
    }
}

The important question here is: what happens when a transaction object that represents an open transaction goes out of scope?
There are multiple options here:

  • It's a programmer error, and we now have a dangling open transaction on our connection. It seems like this is the approach chosen by quaint (note that there is nothing like a Drop impl there).
  • We abort/commit the transaction on Drop. At first this sounds like a good solution, but it would block on drop, so in my opinion that's not a good option for a real async API. Also, how would one handle connection errors in the Drop implementation?
  • We mark that the connection has an open transaction and perform the rollback before the next operation performed on that connection. This is the approach chosen by sqlx. My problem with this approach is that we now have a potentially dangling transaction if we do not perform any other operation in a timely manner. This can result in degraded performance on the database side, because the database needs to assume the transaction is ongoing until the abort command is actually received.
  • Use the Drop impl to schedule an abort on a background thread. This approach is chosen by tokio-postgres. It requires that the database library starts and handles threads on its own. That's much more complexity than I would like to have for such a feature. You would be back to basically having a thread per database connection to perform rollbacks, at which point you could just use a thread pool to simulate an async API by performing all "blocking" database operations in that thread pool. Additionally, again: it is not clear how to handle failures here.
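A hedged sketch of why Drop is a poor fit here (hypothetical types, not code from any of the crates mentioned): `Drop::drop` is a synchronous fn, so without language-level async Drop the best a guard can do is record that a rollback is due — the third option above — rather than await the rollback itself:

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical transaction guard. `Drop::drop` cannot `.await` anything, so
// on an early exit the guard can only set a flag; the owning connection
// would then issue the ROLLBACK before its *next* operation. Until that
// happens, the transaction stays open on the server -- exactly the
// dangling-transaction problem described above.
struct Transaction {
    committed: bool,
    needs_rollback_flag: Rc<Cell<bool>>,
}

impl Transaction {
    fn commit(mut self) {
        self.committed = true;
        // (a real implementation would send COMMIT over the wire here)
    }
}

impl Drop for Transaction {
    fn drop(&mut self) {
        if !self.committed {
            // All we can do synchronously is record that a rollback is due.
            self.needs_rollback_flag.set(true);
        }
    }
}
```

Dropping the guard without committing leaves the flag set but the server-side transaction still open, which is why none of the four options above is fully satisfying.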

The underlying problem can maybe be partially solved at the language level by introducing something like async Drop.

Using a closure based API as diesel currently provides in the non-async interface

Such an API would look like this:

impl Connection {
    async fn transaction<F: Fn(&mut Self) -> impl Future<Out = Result<R>>, R>(&mut self, f: F) -> Result<R> 
    where /* lots of complicated bounds here */;
}

async fn use_a_transaction(conn: &mut Connection) -> Result<Something> {
    conn.transaction(|conn| async {
        conn.execute("some query").await?;
        conn.execute("some other query").await
    })
    .await
}

The problem here is that we cannot yet write appropriate trait bounds for the closure. The best we have come up with so far is this playground, which requires you to pass a function instead of a closure as the argument to the transaction function. While this would work in theory, in practice it means it is not possible to pass any custom value to the transaction function, which in turn would severely restrict the usefulness of this approach. Sqlx seems to provide something like this, but I was not able to get it to work.
As for possible fixes, I'm not sure what exactly the problem is here. Something like an async counterpart of the Fn trait family in the std-lib would certainly solve it. It's probably also possible to extend lifetime inference around closures in such a way that the closure in the playground linked above is accepted.

Summary

Given the technical problems outlined above, I cannot see how this issue is actionable in the sense that someone could start working on this in diesel itself. As open issues in our issue tracker represent something that is actually actionable, this issue is closed because it is not actionable for the reasons explained above. Again, this does not mean we are not willing to implement an async interface, only that we are not able to do that without changes to rust itself. If someone has a different opinion here, feel free to submit a prototypical API design as a playground here or in our forum. Otherwise, please do not comment with "But we need async because of …"; that has already been written many times before.

@weiznich For async closures you need something like this:

https://github.com/launchbadge/sqlx/blob/67099d993c5c9eb575ca530c0c94c9a9066c1ebd/sqlx-core/src/pool/options.rs#L145-L152

It's not the most ergonomic thing to use, but it does work. It seems the problem in your linked issue was that we didn't fix the Connection::transaction function (but we have others that take async closures that do work).

@weiznich @mehcode and it's possible to do a blanket impl to accept both async fn and async closures.

https://github.com/algesten/hreq/blob/c35c823654cf7eee7efbc8c987ed28545fb99e78/src/server/handler.rs#L44-L62

The issue is not that we don't know how to write the impl, it's that compiler bugs regarding projection through associated types prevent closures from being passed in.


--

Thanks,
Siân Griffin

@lolgesten That impl looks nice and simple but my guess is it would explode in Diesel and SQLx because when you have something as generic as the database-agnostic trait setup that's present in both, we start getting into areas of Rust that are really broken / unfinished.

@sgrif, What we (SQLx) have works with closures and through an HRTB and associated types:

PgPoolOptions::new()
    .after_connect(|conn| {
        Box::pin(async move {
            conn.execute("DO YOUR THING HERE").await?;

            Ok(())
        })
    })
    // [...]

The primary issue is that this is ugly for the user to write. If it's something written a lot, a macro might help. I'm not sure whether a macro can break up the closure |...| from the block and add the boilerplate.

diesel::transaction!(conn, |conn| {
  // [...]
}).await?;

@lolgesten and @mehcode Thanks for your suggestions. Here are the corresponding playgrounds: Boxed, Handler.

The approach suggested by @mehcode technically works, but let me cite @sgrif here:

I'm not willing to ship such a subpar API, especially to work around a compiler bug that will eventually be fixed

Please keep in mind that diesel has already had a 1.0 release, so changing an existing API at a later point comes with a much higher cost. I agree with their opinion that this API (especially the boxing) is not on par with the other parts of the diesel API, so we will definitely change that as soon as possible.

@lolgesten's approach would let us solve that theoretically. Practically we run into https://github.com/rust-lang/rust/issues/59337 again. Therefore, at least I assume that adding full async support is still blocked on that rustc issue. So everyone arguing that diesel should support async now should start by fixing that issue first.

#2507 adds a large set of benchmarks for diesel and various async rust database crates. For all people claiming that async will make things faster: I cannot see that in those benchmark results; if anything, those crates perform significantly worse in most situations. Feel free to submit optimizations/changes there, so that a clear performance advantage of those crates becomes visible.

For all people claiming that async will make things faster: I cannot see that in those benchmark results; if anything, those crates perform significantly worse in most situations.

Shouldn't io_uring make things faster (readiness vs. completion, syscall batching in the age of Spectre, etc.)? But it relies on async. Yes, io_uring support isn't there right now, and we first need to figure out how to make it work in rust (there are promising experimental crates for that, but nothing "stable"). I think an investment in async will pay off in the future.

Is using async_std task spawn like below safe in a web handler?

pub async fn query<F, T>(pool: &PgPool, f: F) -> Result<T, Error>
where
  F: FnOnce(&PgConnection) -> T + Send + 'static,
  T: Send + 'static,
{
  let conn = pool.get()?;
  let res = async_std::task::spawn(async {
    let conn = conn;
    let res = f(&conn);
    Ok(res) as Result<_, Error>
  });

  Ok(res.await?)
}

@ibraheemdev If I get it right, async_std now handles blocking tasks without specifying spawn_blocking, so this should work all right.
But it doesn't really solve this issue: what happens in your new async task is still blocking.

In general (i.e. regardless if you use tokio or async_std), I would still recommend using spawn_blocking for the time being, just to make sure the task gets dispatched to a thread pool dedicated to blocking tasks.

Leaving a note for people trying to use diesel in an asynchronous context: Check out https://github.com/djc/bb8, it’s an async connection pool with an adapter for diesel, and I found it works very well 😃

@weiznich about the benchmarks, do the results apply for separate network-connected webserver / db?

I often saw benchmark results with webserver and db on the same machine... so I'm wondering if the results would transfer to conditions with network-latency and timeouts / etc. (something like a 'kinder' version of jepsen?)

eg. if there's a lot of webserver-db connections, and timeouts/disconnects happen, blocking-thread-only model will ~ , whereas async model will ~

@devdoomari3 I've run all benchmarks against a local installation of the database system, as it is quite a lot of work to get a reproducible setup for simulating network latency. None of them includes running a webserver, as this is not required for any of those benchmarks and would just add more variance. If you are interested in numbers including network latency and timeouts, feel free to contribute a corresponding setup which is somewhat reproducible by others.

eg. if there's a lot of webserver-db connections, and timeouts/disconnects happen, blocking-thread-only model will ~ , whereas async model will ~

As always with such claims: please provide a concrete benchmark with numbers. Otherwise this is just an unscientific guess. My own benchmarks shared above indicate something different. The basic issue here is that you cannot have tens of thousands of database connections to handle all your webserver requests. In practice most database systems handle a few tens of connections quite well. For everything else you use a connection pool in your server, which will then be the bottleneck in such a scenario.
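The pool-as-bottleneck argument can be sketched with a toy fixed-size pool (illustrative only; real pools like r2d2 or bb8 are far more involved): a pool is essentially a bounded queue of connections, so with N slots at most N queries are ever in flight, no matter how many async tasks are waiting:

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

// `DummyConn` stands in for a real database connection.
struct DummyConn(usize);

// Minimal sketch of a fixed-size connection pool as a bounded queue.
struct Pool {
    tx: SyncSender<DummyConn>,
    rx: Receiver<DummyConn>,
}

impl Pool {
    fn new(size: usize) -> Self {
        let (tx, rx) = sync_channel(size);
        for id in 0..size {
            tx.send(DummyConn(id)).unwrap();
        }
        Pool { tx, rx }
    }

    // Blocks until a connection is free. This wait is the real bottleneck
    // under load, and it exists with or without an async front end.
    fn get(&self) -> DummyConn {
        self.rx.recv().unwrap()
    }

    fn put(&self, conn: DummyConn) {
        self.tx.send(conn).unwrap();
    }
}
```

Whether `get` blocks a thread or suspends an async task, the number of concurrent queries is capped by the pool size — which is the point being made above about where the bottleneck actually sits.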

For all people claiming that async will make things faster: I cannot see that in those benchmark results; if anything, those crates perform significantly worse in most situations.

@weiznich I was asking if your benchmarks apply in situations where clients-of-db (mostly webservers/etc) and DB are on separate machines. (and also wondering if anyone's doing the benchmark with such assumption)

In practice most database systems handle a few tens of connections quite well. For everything else you use a connection pool in your server, which will then be the bottleneck in such a scenario.

OK, so even if there's a high-latency issue, it's only a few MB of RAM wasted in the non-async case?

@devdoomari3

OK, so even if there's a high-latency issue, it's only a few MB of RAM wasted in the non-async case?

If I understand you correctly here, you mean you could work around that by having async connections? Or by having more connections? Could you elaborate on that? As far as I know, database systems tend to handle a large number of connections not so well, and that's not just an issue of adding more RAM but a fundamental constraint (see here for example). Also, adding async here will not remove the bottleneck, because most database systems that I'm aware of assume that only one query runs on a connection at a time (see this documentation for example). So if you make such statements, please provide some sources for your claims; otherwise this is not the right place for such comments.

@weiznich I was asking if your benchmarks apply in situations where clients-of-db (mostly webservers/etc) and DB are on separate machines. (and also wondering if anyone's doing the benchmark with such assumption)

As I've written above: I've run those benchmarks with everything on the same PC. I doubt that the numbers would be much different with the components running on different PCs, but anyone claiming otherwise is free to take the code I've linked above and prove me wrong.
Otherwise, I've seen no solid benchmark yet showing that a database library must be truly async to perform well in such contexts. I would assume that:

  • Most users do not even need async, because they will never get so much traffic that it matters (even crates.io does well without using async, so if you don't have the size of crates.io you likely don't need async).
  • Most remaining cases could likely be solved with an async connection pool (look at bb8; they have a diesel backend) plus a thread pool to run the actual queries.
  • Which leaves only a tiny fraction of use cases that really need an async database library. We are now likely talking about big commercial entities, and there at least I say that I won't develop this on my own for free.

Most users do not even need async, because they will never get so much traffic that it matters (even crates.io does well without using async, so if you don't have the size of crates.io you likely don't need async).

This is not the point. I'd much rather spend my effort writing code against an async API that is still sync under the hood, knowing that I will be able to leverage free performance gains in the future and/or be able to scale up easily. It should be completely uncontroversial that this is the way to go; browsers have been doing it for years, even for mundane things like DOM manipulation, just because the potential for optimization on future hardware is that great.

This is not a criticism of the work of Diesel maintainers, it's completely ok that they have different priorities. Saying that sync is ok in 2021 is just insane in my opinion, though, especially for things that involve network connections.

@jupp0r Instead of repeating the same claims, you could contribute something constructive to this discussion, like:

  • How to fix the rustc bug linked above
  • What an async connection API would look like based on the toy implementation given above
  • Or even a benchmark with some reproducible numbers backing up your claim

Otherwise please don't comment on this issue, as it is meant to be used for technical discussions.

It might help simmer things down to discuss the downsides a bit more. My understanding is that the benefits are moot because async/await is incapable of scaling database connections, and having a thread pool allocated to connections is a non-issue. At best async/await reduces thread context switches and saves a marginal amount of RAM, but what it does not do is increase the connection count or the queries-per-connection throughput. Coming from other languages this would be vastly more costly, but Rust is extremely good at multithreading, and so the calculus is different. Doing blocking database queries actually makes a lot of sense here, because at present that results in better optimizations that keep connections utilized.

We have essentially reached the hard limit on queries per connection, and no amount of async I/O will scale this further; the marginal benefits to be had will reduce resource usage but won't actually increase throughput. Using something like Actix's web::block lets us ensure blocking only happens for queries. Because of the closure bug, all async closures need to be of a static lifetime, and so this would add complexity and require the use of Arcs, Boxes and sub-traits. The library would be harder to use and less efficient to optimize, and it still wouldn't yield more queries over time — even though it would be leaner — because ultimately the connection is the bottleneck. To actually make headway requires async closures with non-static lifetimes and probably async traits too. Until that point the benefits are just not there, and the downsides can't be overlooked. It is a nice-to-have feature, but it's not for right now. Async connection pooling can yield the best gains at present.

Edit follow up:
Here's an interesting side question to ponder: how much would it actually take for diesel to max out Postgres? The default max connections for Postgres is 100, but this can be scaled up to 300. But that's it; 300 is more or less a hard max AFAIK. To scale further than 300 one could employ PgBouncer, but all that's really doing is splitting connection pooling out into a service; the 300-connection ceiling remains. The cost per thread is ~2 MB, so in total, were every connection fully utilized, this is still only 600 MB of RAM spread across however many nodes.

There is still of course a lot of value to be had were this optimized down to a thread-per-core architecture that requires no thread context switches and is entirely non-blocking, but realistically speaking this isn't going to happen until tokio adds io_uring. So a lot really needs to happen as a foundation for any substantial improvement here, but let's not look a gift horse in the mouth and overlook just how good diesel actually is at present. What's true of other languages isn't necessarily true of Rust, so when we look at issues like this we also have to consider them from the perspective of Rust's strengths and not of other languages' weaknesses; that's what it means for a language to break the curve. Rust is the best language for multithreading, and being incapable of effectively using resources is simply not an issue that Rust has. For anyone coming into this from NodeJS it's easy to see the lack of async/await as blasphemous, but we need to get over that, because comparing a single-threaded language to a multithreading language is apples to oranges, and it overlooks that even without every conceivable optimization Rust is still vastly better. At present diesel is actually pretty great, and while it's fun to think about how much better it will be in the future, let's not overlook how good it actually is here and now.
This is not NodeJS; this is Rust, and we multithread — and do so well!


This is not a statement that async I/O will never happen, but it is a statement that it is not currently on the roadmap, and I do not consider this to be actionable at this time. Neither tokio nor futures have stable APIs at the moment (tokio in particular is supposedly going to have a major overhaul at some point).

Tokio 1 is out, is this point still relevant @sgrif?

@ShadowJonathan That does not really matter, as there are quite a few other blockers pointed out in this thread. None of them has been resolved.

@weiznich you said:

In my opinion the main technical blocker here is the open question how to model a sound async transaction API.

I don't understand why handling transactions safely should be an issue solved by a rust mechanism. I mean, it would be cool if something like a guard pattern could help, but if not, then it's up to the developer to do it right. And while it seems that without any macro magic it's not possible to make developers always use transactions correctly, I think it's fine to say that e.g. the transaction guard's drop fn will panic if the transaction hasn't been committed.

@DaAitch As diesel tries to provide as much guarantees at compile time as possible it is not an option for us to just panic on an API misuse, especially if this misuse could be prevented by a better API design.
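For comparison, the panic-on-drop guard proposed above might look like the sketch below (hypothetical; not a proposed diesel API). It also illustrates the distinction being drawn: the by-value `commit` is enforced at compile time, while a forgotten commit only fails at runtime:

```rust
// Sketch of a drop-panic transaction guard. Forgetting to commit compiles
// fine and only fails when the guard is actually dropped at runtime -- the
// kind of API misuse diesel tries to rule out at compile time instead.
struct TransactionGuard {
    committed: bool,
}

impl TransactionGuard {
    fn begin() -> Self {
        TransactionGuard { committed: false }
    }

    // Taking `self` by value means the guard cannot be used after commit;
    // this part *is* enforced at compile time.
    fn commit(mut self) {
        self.committed = true;
        // (a real implementation would send COMMIT over the wire here)
    }
}

impl Drop for TransactionGuard {
    fn drop(&mut self) {
        // The runtime fallback under discussion. Guard against panicking
        // while already unwinding, which would abort the process.
        if !self.committed && !std::thread::panicking() {
            panic!("transaction guard dropped without commit or abort");
        }
    }
}
```

The panic catches the mistake, but only once the faulty path actually executes — which is why an API design that makes the mistake unrepresentable is preferred.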
