Actix-web: App::data vs App::app_data

Created on 16 Apr 2020 · 11Comments · Source: actix/actix-web

I see this come up time and time again on gitter of it not being clear what the differences and advantages are of these two similarly named functions. It's my opinion that this isn't simply a documentation issue.

Opening this issue to track thoughts and proposals. I suspect any solution will require breaking change so want to take some action before v3 release.

C-improvement

Source

robjtede

👍13

Most helpful comment

Here are some points that were raised in an enlightening discussion between @robjtede and @rkfox on gitter about this issue:

At the moment the main reason why we have both .data() and .app_data() is so that a data type that is already wrapped in an Arc can be passed to app_data() to avoid double wrapping in an Arc<Arc<T>>.
The code tries very hard to accommodate non-clone-able data but in order to achieve this in a generic way it has to wrap everything in an Arc<T>.
If we enforce that all data passed to .data() implements both Send + Clone then we no longer need .app_data() - any data that isn't can simply be wrapped in an Arc<T> by the developer.

pfrenssen on 18 May 2020

👍6

All 11 comments

Hmmm, any chance we can go ahead and summarize the pros/cons?

In my head, there are 2 main types that users desire:

data across threads or instances of your application (Arc struct, etc)
data localized to the current thread (maybe you want statistics about that thread?)

In addition to these, I think v3 would do well with the concept of HttpRequests having "per request" data that is similar to how Node.js Express handles the .next() concept? I know we briefly talked about chained handlers like that?

cdbattags on 16 Apr 2020

Can the two be combined?

Dowwie on 16 Apr 2020

I think these could be merged. Following the code paths, I think I've found out why certain choices were made; most of which are around the resolution of DataFactorys.

There are currently 3 methods on App (some exist elsewhere on Scope, ServiceConfig, etc.) for enabling extraction from the eventual ServiceRequest::app_data call:

App::app_data - takes arbitary data and adds directly to an Extensions map
App::data - takes arbitary data, wraps it in Data, and adds to a vec of dyn DataFactory for resolving in AppInitResult::poll. Set up this way because ServiceConfig (used in App::configure) stores a vec of dyn DataFactory and a basic Vec::extend is used to merge them.
App::data_factory - takes data factory functions and adds them to a vec of FnDataFactorys which get called in AppInit and the returned futures resolved in AppInitResult::poll.

My research is not yet done (there may be errors or omissions above) but I strongly suspect that this can be simplified. App::data should be able to take arbitary items and add directly to Extension and App::data_factory should be able to take an impl DataFactory wether its a struct or fn.

I don't think that App::data should be silent wrapping in Data, it's non obvious and creates a disconnect between the set site and the extract site. In my thoughts, an argument for ::data should clearly spec what is being stored, even as far as its "clone" semantics, eg not wrapping in Arc implicitly and letting the use select this either explicitly as Data<Arc<T>>, or some semantic wrapper SharedData<T>. (It does seem that the implicit Arc is enabling Data to not need to be Clone but there may be reasonable ways around this.)

robjtede on 20 Apr 2020

👍1

Here are some points that were raised in an enlightening discussion between @robjtede and @rkfox on gitter about this issue:

At the moment the main reason why we have both .data() and .app_data() is so that a data type that is already wrapped in an Arc can be passed to app_data() to avoid double wrapping in an Arc<Arc<T>>.
The code tries very hard to accommodate non-clone-able data but in order to achieve this in a generic way it has to wrap everything in an Arc<T>.
If we enforce that all data passed to .data() implements both Send + Clone then we no longer need .app_data() - any data that isn't can simply be wrapped in an Arc<T> by the developer.

pfrenssen on 18 May 2020

👍6

I think .data() and .app_data() should be merged.

Configs like PayloadConfig and JsonConfig are extracted from Extensions. They will be respected when using .app_data() only.
There are examples of JsonConfigs, of cause. But the differences of .data() and .app_data() are not stated in documents. And it's really confusing.

zimingWu on 22 May 2020

Sorry I'm also researching this topic and is confused by the example in https://docs.rs/actix-web/3.0.0-alpha.3/actix_web/web/struct.Data.html . The description says the purpose of constructing web::Data::new outside and pass it to App::app_data is to avoid double Arc. But why not just pass the Mutex to App::data? In that case App::data will only wrap it in an Arc once, right? I guess the original intention of this example is to show how you can pass a data that is alreay Arc into app_data without wrapping it in web::Data::new, as @pfrenssen mentioned?

shinglyu on 27 May 2020

Just cross-posted to gitter and got a great explanation from fakeshadow

fakeshadow @fakeshadow 17:23
You can pass a Mutex to Data but it would result in a mtex for every thread of your app
If you don't care about sharing your state between thread. Use RefCell instead of Mutex

Shing Lyu @shinglyu 17:25
Ah OK, so in order to share the Mutex across threads I inevitably have to wrap it as Arc<Mutex<>>, so using data will cause double Arc, am I correct?

fakeshadow @fakeshadow 17:25
sry I mean App::data. Once you pass your Mutex to Data::new then you can clone and share it between thread
Yes. You need something like Arc to clone your pointer for the Mutex
Mutex itself can't be cloned

Shing Lyu @shinglyu 17:27
Got it! And sorry I'm very bad at these smart pointers. Can you elaborate on how RefCell works in this case?

fakeshadow @fakeshadow 17:28
RefCell is like Mutex for single thread. You can borrow it in async context on one thread but it can't be shared by other thread
If you have some mutable state for every thread of you App and you don't need to share them it's more perfomant than Mutex

Shing Lyu @shinglyu 17:30
I see, so each thread got its own RefCell that works independently

fakeshadow @fakeshadow 17:30
Yes exactly

shinglyu on 27 May 2020

👍1

Yes, data() and app_data() are confusing. I created a simple AppState with

    pub struct AppState {
    pub settings: Settings,
    pub db_pool: deadpool_postgres::Pool,

}

where Settings is just some data read from environment variables.
This struct has no Arc or Mutex. I still cannot use data(), it forces me to use app_data() to pass state to App instance.
It would help to clarify this.

peshwar9 on 20 Jun 2020

app_data can be shared in multi HttpServer, data just shared for one HttpServer, I think just rename app_data to global_data, and should we rename wrap function to middleware? It is clearly.

Silentdoer on 13 Jul 2020

👍1

And states is not store in same map by data and app_data, so we should not merge the two function

Silentdoer on 13 Jul 2020

Just my 2 cents.

Data type should be kept and changed to a wrapper of Rc instead of Arc. We don't share it between threads and It's just that the underlying data can point to shared pointers. An addtional layer of Rc would barely have any performance impact and we already use it all over the place internally in actix-web.

Adding the fact that FromRequest trait only accept static life in it's returned future. If we ever want to keep a convenient API for extracting Data in handler fnarg then it's almost must be a cloneable pointer.

So if we get rid of the wrapper type then it's the user's task to implement Clone for his/her data type and/or extract it manually in handler function. (We don't necessarily need the type to be Clone if we extract it manually in handler and only use the reference there).

We can also keep most of the Data related APIs as it is now. The only break change I can think of is that Data::new would not work any more.