Scrapy: Discussion regarding Asyncio

Created on 3 Feb 2018 · 4 Comments · Source: scrapy/scrapy

I have been going through last summer's GSoC projects, and I was interested in the asyncio prototype. While reading up on asyncio, I ran into one issue: unlike Twisted, asyncio does not have any built-in abstraction for error handling. That means a lot of error handling has to be written manually, which is quite error-prone. So I wanted to discuss how the developers would go about error handling with asyncio.
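For context, here is a minimal sketch of how failures surface in plain asyncio, using only the standard library. It is not taken from the Scrapy prototype; `fetch`, `fetch_or_none`, `crawl`, `run_demo`, and the example URLs are hypothetical names invented for illustration. It shows the two usual options: a `try`/`except` around `await`, which plays roughly the role of a Twisted errback, and `asyncio.gather(..., return_exceptions=True)`, which returns failures as values.

```python
import asyncio


async def fetch(url):
    # Hypothetical stand-in for a download; fails for URLs containing "bad".
    await asyncio.sleep(0)
    if "bad" in url:
        raise ValueError("cannot fetch " + url)
    return "body of " + url


async def fetch_or_none(url):
    # An exception raised inside fetch() propagates through await,
    # so an ordinary try/except handles it.
    try:
        return await fetch(url)
    except ValueError:
        return None


async def crawl(urls):
    # return_exceptions=True puts each failure into the result list
    # instead of raising the first one out of gather().
    return await asyncio.gather(*(fetch(u) for u in urls), return_exceptions=True)


def run_demo():
    handled = asyncio.run(fetch_or_none("http://bad.example"))
    results = asyncio.run(crawl(["http://ok.example", "http://bad.example"]))
    return handled, results
```

Calling `run_demo()` returns `None` for the handled failure, plus a list holding one response body and one `ValueError` instance. Whether this counts as enough of an "abstraction" compared to Deferred errback chains is the open question of this thread.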

gsoc-candidate

All 4 comments

Hi @yashrsharma44 - Great question. I think the spirit of the asyncio project is to bring Scrapy's functionality closer to CPython's core features, and to give us an alternative to a very large and sometimes limiting dependency (Twisted). So the potential solutions don't have to be based on "bare" asyncio; it could mean finding more lightweight, preferably pure-Python alternatives to Twisted that are built on asyncio.

Twisted was late to the Python 3 party, and this was a limiting factor in Scrapy's migration. That's an example of the kind of problem that moving closer to CPython/Asyncio might help solve. Reduced abstraction is another. And, if the solution _were_ pure-python, then perhaps it would be easier for contributors to Scrapy to also contribute to the back-end project.

Are you interested in taking part in GSoC as a student?

Thanks for the reply. I sent an email with my questions about the project, so I am reproducing them here:

_I wanted to discuss the project of using asyncio in Scrapy, which is part of Google Summer of Code 2018.
A few points were bothering me, so I decided to ask you:_

_1. The project description states that Scrapy should use the async and await keywords in order to use the built-in asyncio library. However, asyncio was introduced in Python 3.4, and the async and await keywords were introduced in Python 3.5. So how will Scrapy support earlier versions of Python 3? Can we use decorator-based coroutines to work around this?_

_2. I also found that backwards compatibility needs to be maintained. Since asyncio is not supported on Python 2, what does 'supporting backwards compatibility' mean?_

_3. Does the new version support Twisted framework?_

I am quoting the answer from my email reply below:

Hello Yash,

I hope you are referring to this project http://gsoc2018.scrapinghub.com/ideas/#async-await-spiders

_1. The project description states that Scrapy should use the async and await keywords in order to use the built-in asyncio library._

Although it lists asyncio under required skills, we don't plan to use asyncio at all. Twisted already supports the async/await syntax introduced in Python 3.5.
Please see https://twistedmatrix.com/documents/16.4.1/core/howto/defer-intro.html#coroutines-with-async-await

_2. I also found that backwards compatibility needs to be maintained. Since asyncio is not supported on Python 2, what does 'supporting backwards compatibility' mean?_

Scrapy itself should remain Python 2.7 compatible, but a user who creates a project under Python 3.5+ can use async/await, initially in spider code and later in middlewares and extensions.
Spiders are the priority; once that is covered, we can look at supporting the new syntax in other extensible parts of Scrapy.
A future project, not for this GSoC, is to modernize the Scrapy core itself by dropping support for Python versions before 3.5. That will be the foundation of Scrapy 2.0 but, as I said, it is not the goal of this project.

_3. Does the new version support the Twisted framework?_

Yes. We won't replace Twisted with asyncio; we are only adding async/await syntax support, which works with Twisted 16.4+.

I hope that is clear. Don't hesitate to ask.

thanks
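The point in the reply above, that the async/await syntax does not by itself require asyncio, can be illustrated with a small standard-library sketch. This is not Scrapy or Twisted code: `Ready`, `parse`, and `drive` are hypothetical names, and `drive` is only a toy replacement for what a real framework (Twisted, in the plan described above) would do when it runs a coroutine.

```python
class Ready:
    """Toy awaitable (hypothetical): suspends once, then yields a value."""

    def __init__(self, value):
        self.value = value

    def __await__(self):
        # The bare yield hands control back to whoever drives the coroutine.
        yield
        return self.value


async def parse(response):
    # Hypothetical spider-style callback written with async/await.
    extra = await Ready("extra data")
    return {"page": response, "extra": extra}


def drive(coro):
    """Run a coroutine to completion by hand; asyncio is never imported."""
    while True:
        try:
            coro.send(None)
        except StopIteration as stop:
            # A finished coroutine reports its return value here.
            return stop.value
```

Calling `drive(parse("index.html"))` returns the dictionary built inside `parse`. A coroutine object is just something that can be resumed with `send()`, so any event loop or reactor can schedule it, which is why keeping Twisted and adopting the new syntax are compatible goals.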

Closing in favor of #3148.
