In the lecture of July 6th, Sebastián was talking about FastAPI and ML models, and he said that it is better to use sync notation instead of async when serving ML models. I didn't understand the reason, could someone explain it?
Thank you!
He recommended using sync rather than async functions when serving ML models because most ML operations are CPU intensive, and the program benefits from being able to do more computations in parallel. This means that in most cases the time will be spent actually doing this work rather than waiting around, rendering the async notation less useful in speeding up the program.
If I understood correctly, I think he suggested running several processes in parallel for CPU-intensive ML models, which would allow the server to generate several predictions at the same time for different requests.
Makes sense, thank you @rkbeatss
You can close the issue if you don't have other questions.
Thanks for the help here @rkbeatss and @phy25 ! :clap: :bow:
Thanks for reporting back and closing the issue @Kludex :+1:
It's mainly because:
Blocking (CPU-bound) code should not be called directly. For example, if a function performs a CPU-intensive calculation for 1 second, all concurrent asyncio Tasks and IO operations would be delayed by 1 second.
Ref: https://docs.python.org/3/library/asyncio-dev.html#running-blocking-code
By using normal `def` functions, FastAPI runs them in a threadpool with `loop.run_in_executor()`.
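A rough sketch of that mechanism with plain asyncio (the function names here are hypothetical stand-ins, not FastAPI internals):

```python
import asyncio
import time

def blocking_inference():
    # Stand-in for a CPU-heavy (blocking) model call (hypothetical).
    time.sleep(0.2)
    return "prediction"

async def main():
    loop = asyncio.get_running_loop()
    # Roughly what FastAPI does for a plain `def` endpoint: the blocking
    # call is handed to the default thread pool via run_in_executor(),
    # so the event loop stays free to serve other requests meanwhile.
    return await loop.run_in_executor(None, blocking_inference)

result = asyncio.run(main())
print(result)  # prediction
```

Had `blocking_inference()` been called directly inside an `async def` endpoint, it would have stalled the event loop for its whole duration, which is exactly the pitfall the asyncio docs warn about.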