Pyright: Improve performance with boto3 stubs

Created on 23 Aug 2020 · 4 comments · Source: microsoft/pyright

When using pyright with boto3-stubs, pyright takes more than 20 secs to type-check the code below. Normally it takes only 1 or 2 secs.

I'm type checking this code:

from typing import Any, Dict, List, Optional, TypeVar, Union

import boto3
from mypy_boto3_ec2 import Client
from mypy_boto3_ec2.type_defs import FilterTypeDef


def describe(config: Dict[str, Any], name: Optional[str] = None) -> List[Dict[str, Any]]:
    """List EC2 instances in the region."""

    ec2_client: Client = boto3.client("ec2", region_name=config["region"])

    filters: List[FilterTypeDef] = [] if name is None else [{"Name": "tag:Name", "Values": [name]}]
    response = ec2_client.describe_instances(Filters=filters)

    instances = [
        {
            "State": i["State"]["Name"],
            "Name": first_or_else([t["Value"] for t in i.get("Tags", []) if t["Key"] == "Name"], None),
            "Type": i["InstanceType"],
            "DnsName": i["PublicDnsName"] if i.get("PublicDnsName", None) != "" else i["PrivateDnsName"],
            "LaunchTime": i["LaunchTime"],
            "ImageId": i["ImageId"],
            "InstanceId": i["InstanceId"],
        }
        for r in response["Reservations"]
        for i in r["Instances"]
    ]

    return sorted(instances, key=lambda i: i["State"] + str(i["Name"]))


E = TypeVar("E")
T = TypeVar("T")

def first_or_else(li: List[E], default: T) -> Union[E, T]:
    return li[0] if len(li) > 0 else default

Full example published at tekumara/pyright-boto3-stubs-example

Observations

Removing type_defs.pyi drops the time to ~2 secs.
Removing the call to first_or_else halves the time to ~10 secs.

Type checking the following only takes ~3 secs:

from typing import Any, Dict, Optional

import boto3
from mypy_boto3_ec2 import Client
from mypy_boto3_ec2.type_defs import DescribeInstancesResultTypeDef


def describe(config: Dict[str, Any], name: Optional[str] = None) -> DescribeInstancesResultTypeDef:
    """List EC2 instances in the region."""

    ec2_client: Client = boto3.client("ec2", region_name=config["region"])

    return ec2_client.describe_instances()

pyright 1.1.64

Labels: addressed in next version, enhancement request

Opened by tekumara


All 4 comments

Not sure if this is a pyright or stub issue, so have also raised https://github.com/vemel/mypy_boto3_builder/issues/43

tekumara on 23 Aug 2020

Thanks for the detailed repro steps.

Your code hit a few O(n^2) algorithms in Pyright that were taking a long time because "n" was atypically large in your code. I added two optimizations:

1. Pyright was recalculating the list of fields (and their types) within a TypedDict each time it needed to do a type comparison. It now caches this information. The boto3 stubs have TypedDict classes with hundreds and even thousands of entries. This optimization reduced the analysis time from about 17,500ms down to about 1,100ms.
2. The code in Pyright for building union types needs to ensure that newly added entries don't already exist in the union. It uses an O(n^2) algorithm to do so. This is reasonable if unions contain fewer than 10 entries, which is typical. Some of the types defined in the boto3 stubs were unions with hundreds (approaching a thousand) of entries, all of them string literal types. I added some special-case code for string literals in unions that makes these algorithms O(n) rather than O(n^2). This reduced the analysis time from about 1,100ms down to about 100ms.

With those two optimizations, that's a ~175x speedup!
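Because the two improvements apply one after the other, their individual speedups multiply. Using the timings quoted for each optimization:

```latex
\underbrace{\frac{17\,500\ \text{ms}}{1\,100\ \text{ms}}}_{\approx\,15.9}
\times
\underbrace{\frac{1\,100\ \text{ms}}{100\ \text{ms}}}_{=\,11}
= \frac{17\,500\ \text{ms}}{100\ \text{ms}} = 175
```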

This will be included in the next release of Pyright and Pylance.

erictraut on 23 Aug 2020


Thanks so much for addressing this. The boto3 stubs are unusually large, but pyright optimisations that improve working with them are much appreciated!

tekumara on 24 Aug 2020

This is now fixed in Pyright 1.1.65, which I just published. It will also be included in the next version of Pylance.

erictraut on 26 Aug 2020
