Web3.py: Async versions of web3 apis

Created on 12 Sep 2018 · 15Comments · Source: ethereum/web3.py

What is it?

Implement w3.async.<something> which is an async mirror of APIs available on w3.<something>.

This will be exploratory work to figure out how we introduce an async/await friendly API into web3.

Source

pipermerriam

👍2

Most helpful comment

This may seem out of the blue, but I thought some suggestions and/or tooling might help out a bit in building the async stuff. Sorry I do not have time to write examples or do full technical summary. I will try to give more details as soon as possible if any of this looks promising to you gents.

First, please feel free to cannibalize anything that may be useful from this repo:
w3aio

That project is essentially some backend code we use in house for various projects. It is a limited replication of web3py in async await syntax with some other tools baked in. I am not looking to maintain a repo of the size/scope of web3py. I am just putting it out for you guys to see how we handled some of this in the backend of an async system. Sorry for the lack of docs. I will work on that as soon as possible. Until then I am happy to help decode any of the confusing bits.

If you guys are struggling with linking the middlewares, take a look at Paco and Janus.

Janus

Paco

Janus is a sync/async thread safe Queue and Paco is an async based functional framework that may help stitch the awaitable pipeline together. Specifically check out paco.compose.

How will we coordinate transactions into the correct order according to the nonce?

I don't think we do this now so in theory I think it's not an async specific problem.

On this point, just put a Queue in front of the transport object. Use the queue as a Fifo and ordering is maintained. This also helps with graceful reconnect since messages may be re-pushed on connection failure and handled once connection is stable again. If transaction handling vs call handling is the concern, callbacks may be used as one option. The other option is to carry a request Q for eth_call(s) and a transaction Q for state modifying methods. In this way you get fast read operations with the ability to ensure ordering of transaction nonce(s).

nelsonhp on 17 Oct 2018

🎉2 ❤1

All 15 comments

Note previous conversations in #657 and #856

I wanted to think through alternatives for co-locating the logic for sync and async inputs. I was wondering if we could still do something functional. This isn't a working example, just a rough sketch:

class Async_Eth(Module):                                                                                                       
    getUncleByBlock = async_caller(uncle_block_input_formatter, identity)            
    ...                                  

class Eth(Module):                                                                                                             
    getUncleByBlock = blocking_caller(uncle_block_input_formatter, identity)                                                                                                   
    ...

#################                                                                                        

@curry                                                                                  
async def async_caller(input_formatter, output_formatter, module, *args, **kwargs):                                    
    method, params = input_formatter(module, *args, **kwargs)                                                          
    result = await module.web3.manager.request_async(method, params)                                                   
    return output_formatter(result)                                                                                    


@curry                                                                                                                 
def blocking_caller(input_formatter, output_formatter, module, *args, **kwargs):                                 
    method, params = input_formatter(module, *args, **kwargs)                                                          
    result = module.web3.manager.request_blocking(method, params)                                                      
    return output_formatter(result)        

#################    

def uncle_block_input_formatter(eth, block_identifier, uncle_index):                                                   
    method = select_method_for_block_identifier(                                                                       
        block_identifier,                                                                                              
        if_predefined='eth_getUncleByBlockNumberAndIndex',                                                             
        if_hash='eth_getUncleByBlockHashAndIndex',                                                                     
        if_number='eth_getUncleByBlockNumberAndIndex',                                                                 
    )                                                                                                                  
    return (method, [block_identifier, uncle_index])

So it might be possible to do something like that without introducing a lot of new classes or lines of code.

carver on 2 Oct 2018

👍2

Just realized we still need to figure out how to do middlewares....

pipermerriam on 2 Oct 2018

👍1

@carver In your example, could we co-locate the modules if we allow the caller function to be configured? Is there anything else that needs to vary between the async and blocking modules?

I have been wondering about the manager request function, the middlewares and the provider request functions.

I don't see a dry way to have async middlewares. We could add some call wrapping like with the module methods, allowing colocation, but then we would lose the ability to replace any blocking calls with awaitables in the middleware functions. I think they need have duplicate async versions.
Would now be a good time to remove the provider list, before digging into awaitable provider request functions?
How will we coordinate transactions into the correct order according to the nonce?

dylanjw on 4 Oct 2018

How will we coordinate transactions into the correct order according to the nonce?

I don't think we do this now so in theory I think it's not an async specific problem.

pipermerriam on 4 Oct 2018

Would now be a good time to remove the provider list, before digging into awaitable provider request functions?

Yeah, this should be deprecated and removed. Might be able to accelerate the removal as it's possible nobody is using it but I don't know how you'd verify that.

pipermerriam on 4 Oct 2018

I don't see a dry way to have async middlewares. We could add some call wrapping like with the module methods, allowing colocation, but then we would lose the ability to replace any blocking calls with awaitables in the middleware functions. I think they need have duplicate async versions.

Yeah, this part is likely to suck a bit.

Best idea I have so far is to start by reducing the number of middlewares by moving most of the core middleware functionality into the input_formatter and output_formatter API proposed by @carver

Looking at the default list it appears we'll be able to remove all but gas_price_strategy_middleware as the others look like they can all be squeezed into the individual methods.

For the remaining middlewares we can likely decompose them into sync/async versions striking a balance between readability and re-usable code.

pipermerriam on 4 Oct 2018

👍1

Something we should be careful about is maintaining the ability to support things like geth_poa_middleware which provide compatibility glue for non-standard API responses which are not necessarily wrong, but that do violate our assumptions about return data.

pipermerriam on 4 Oct 2018

I suspect the ENS middleware would need to be duplicated async & sync, too.

carver on 4 Oct 2018

Best idea I have so far is to start by reducing the number of middlewares by moving most of the core middleware functionality into the input_formatter and output_formatter

👍 for enforcing formatters and not making them optional. I've already described the problem with making the formatters optional in #1057.

voith on 5 Oct 2018

First, please feel free to cannibalize anything that may be useful from this repo:
w3aio

If you guys are struggling with linking the middlewares, take a look at Paco and Janus.

Janus

Paco

Janus is a sync/async thread safe Queue and Paco is an async based functional framework that may help stitch the awaitable pipeline together. Specifically check out paco.compose.

How will we coordinate transactions into the correct order according to the nonce?

I don't think we do this now so in theory I think it's not an async specific problem.

nelsonhp on 17 Oct 2018

🎉2 ❤1

@pipermerriam @carver

Hello, Im starting on the async web3 api again. After I worked through several different approaches in https://github.com/ethereum/web3.py/pull/1088 Ive honed in on one. It would be great to get input before I start fleshing it out too far. I will open a new PR and close #1088, which took several right turns while I worked through this stuff.

Here is an outline of the design I would like to implement, i.e. turtles all the way up.

The provider api is async.
middleware functions are all coroutines.
manager.request_async() is now the primary request function.
manager.request_blocking() is just sync(manager.request_async()).
The sync() utility executes in a loop until completion.
internally, web3 only uses its own async api.
The web3 api is written async, with the sync api generated with sync() wrapping.

Ive altered the strategy of how to collocate the async/sync module methods. Im favoring replacing them outright with coroutines as it is cleaner than my implementation of @carver's proposed API (https://github.com/ethereum/web3.py/issues/1055#issuecomment-426358202). Synchronous variants of the coroutines are generated and added to the class.

As far as the web3 module api, Im in favor of collocating the sync and async methods in the same module namespace, with couroutines prefixed with coro_, e.g. web3.eth.coro_sendTransaction(). It made more sense to me to write them out at the top level of the module rather than put them in a separate namespace. Im ambivalent on this point.

This will be a necessarily large pull request. The sync() utility does not support nested calls to sync(). This means internally web3 can only call it's own coroutines when needing to make a client request. A downside to this is it will not be possible to introduce the async changes piece by piece. Converting the manager, middlewares and providers to async also requires async web3 modules.

There is a limitation to taking the sync() utility approach. Users writing async code will not be able to use the synchronous web3 api. For example, the following will raise an exception:

async def main():
    # Run asynchronously
    block = await web3.eth.coro_getBlock(100)

    # Run synchronously
    block = web3.eth.getBlock(100)  # will raise exception

asyncio.run(main())

Asyncio by design does not allow nested event loops (loop reentrency). See: https://bugs.python.org/issue22239. I don't know how big of a concern this is. It might be a breaking change for some people. Those people could install https://github.com/erdewit/nest_asyncio if they dont want to update their code.

After this initial push, there can be refactoring to take advantage of the async internals. This initial design makes a minimum amount of changes to get to an async api.

dylanjw on 5 Dec 2018

Yeah, python asyncio has an annoying habit of being very viral: introduce it to one part of the code, and it spreads everywhere. It's annoying to me because a lot of the things I love about python (eg~ open up a repl to run a few lines of code) don't seem to work well anymore.

I don't have a good solution other than never supporting asyncio in the first place. Although the API I mentioned before was partially designed with that goal in mind: how do we keep async from virally infecting the whole repo, especially all at once. Because this scares me:

This will be a necessarily large pull request.

Maybe we can take a look together at the async-minimizing API to see if it's possible to clean up? Other than that, the approach sounds okay. Just a few other comments:

manager.request_blocking() is just sync(manager.request_async())

Could you maybe run a test to see how much overhead sync() adds? Even just against a dummy function like async def quick(): return None.

Users writing async code will not be able to use the synchronous web3 api.

I don't think this is a blocker. If they are already using async, they shouldn't have potentially slow blocking calls inside, anyway.

carver on 5 Dec 2018

sync() does have overhead.

~ 3 microseconds vs 500 microseconds.

import time

from web3._utils.async_tools import sync


class MyTimer():

    def __init__(self, label=None):
        self.start = time.time()
        self.label = label

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        end = time.time()
        runtime = end - self.start
        msg = 'The test labeled {label} took {time} seconds to complete'
        print(msg.format(label=self.label, time=runtime))


async def co_dummy(*args):
    return None


def dummy(*args):
    return None


if __name__ == '__main__':
    with MyTimer('Synchronous'):
        dummy()
    with MyTimer('sync(Asynchronous)'):
        sync(co_dummy())

$  python benchmark_sync_util.py
The test labeled Synchronous took 3.0994415283203125e-06 seconds to complete
The test labeled sync(Asynchronous) took 0.0005705356597900391 seconds to complete

dylanjw on 6 Dec 2018

So when using this in a non-async context, it would put a hard cap at ~1700 requests per second (or some 1/n of that when we make multiple under-the-hood requests like estimateGas() or ENS lookups). If n=5, that means a maximum of 350 requests/sec in a synchronous context. That's starting to get into a gray area of maybe being too slow, but not obviously totally unacceptable.

carver on 6 Dec 2018

I just had a crazy idea. Instead of sync() using an asyncio event loop, why not run it in a stripped down loop? When running inside sync, the web3 async code doesnt (as of now) need to yield control to the event loop. *EDIT: after running tests, there are a few failures where asyncio.sleep() is used.

I replaced loop.run_until_complete() with this function:

def run_coro(coro):
    future = None
    while True:
        try:
            future = coro.send(future)
        except StopIteration as exc:
            return exc.value

Re-running the same benchmark test gets us down to 270 microseconds. Not as good as I hoped, but better.

dylanjw on 10 Dec 2018

Was this page helpful?

0 / 5 - 0 ratings