Boto3: RecursionError using batch writer

Created on 12 Aug 2019  ·  8 Comments  ·  Source: boto/boto3

For some reason, this works fine with a regular put_item call, but not if I pass the same item to the batch writer:

#!/usr/bin/env python3

import decimal
import os
import pprint
import random

import boto3
from faker import Faker
from faker.providers import company

dynamodb_client = boto3.client('dynamodb')
dynamodb = boto3.resource('dynamodb')
fake = Faker()
pp = pprint.PrettyPrinter(indent=2)

table_name = 'my_table'
existing_tables = dynamodb_client.list_tables()['TableNames']

# Create the DynamoDB table if it does not exist

if table_name not in existing_tables:
    response = dynamodb_client.create_table(
        TableName=table_name,
        KeySchema=[
            {
                'AttributeName': 'artist_id',
                'KeyType': 'HASH',
            },
            {
                'AttributeName': 'album_id',
                'KeyType': 'RANGE',
            },
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'artist_id',
                'AttributeType': 'N',
            },
            {
                'AttributeName': 'album_id',
                'AttributeType': 'N',
            },
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 50,
            'WriteCapacityUnits': 500,
        }
    )

    print('Waiting for table ', table_name, 'to create...')
    waiter = dynamodb_client.get_waiter('table_exists')
    waiter.wait(TableName=table_name)

ddb_table = dynamodb.Table(table_name)

album_id = 0

album_formats = ['7" vinyl', '10" Vinyl', '12" Vinyl']

# Create 1000 artists, each with 10 albums

with ddb_table.batch_writer() as batch:

    for artist_id in range(1, 1001):

        item = {}
        item['artist_id'] = artist_id
        item['artist_name'] = fake.name()

        for album in range(10):
            album_id = album_id + 1
            item['album_id'] = album_id
            item['album_title'] = fake.catch_phrase().title()
            item['album_meta'] = {}
            item['album_meta']['year'] = fake.year()
            item['album_meta']['sku'] = f'{artist_id}-{album_id}'
            item['album_meta']['format'] = random.choice(album_formats)
            item['album_meta']['price'] = decimal.Decimal(str(
                round(random.uniform(3.00, 50.00), 2)))

            id_album = list(str(album_id))
            id_album.reverse()
            filepath = '/'.join(id_album)
            filename = os.path.join(filepath, f'{album_id}.jpg')
            s3_uri = f'/albumart/{filepath}/{album_id}.jpg'

            item['track_names'] = []
            item['album_meta']['s3_uri'] = s3_uri
            item['album_meta']['tracks'] = []

            # each album has 12 tracks

            for track_id in range(1, 13):
                item['track_names'].append(f'Track{track_id}')

                track = {}
                track['name'] = f'Track{track_id}'
                track['length'] = round(random.uniform(30000, 300000))
                track['position'] = f'{track_id}'
                item['album_meta']['tracks'].append(track)

            print(album_id)
            pp.pprint(item)
            batch.put_item(Item=item)

Here is the abbreviated stack trace:

  File "/usr/local/lib/python3.7/site-packages/boto3/dynamodb/types.py", line 231, in _serialize_m
    return dict([(k, self.serialize(v)) for k, v in value.items()])
  File "/usr/local/lib/python3.7/site-packages/boto3/dynamodb/types.py", line 231, in <listcomp>
    return dict([(k, self.serialize(v)) for k, v in value.items()])
  File "/usr/local/lib/python3.7/site-packages/boto3/dynamodb/types.py", line 102, in serialize
    dynamodb_type = self._get_dynamodb_type(value)
  File "/usr/local/lib/python3.7/site-packages/boto3/dynamodb/types.py", line 124, in _get_dynamodb_type
    elif self._is_type_set(value, self._is_number):
  File "/usr/local/lib/python3.7/site-packages/boto3/dynamodb/types.py", line 183, in _is_type_set
    if self._is_set(value):
  File "/usr/local/lib/python3.7/site-packages/boto3/dynamodb/types.py", line 178, in _is_set
    if isinstance(value, collections_abc.Set):
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/abc.py", line 139, in __instancecheck__
    return _abc_instancecheck(cls, instance)
RecursionError: maximum recursion depth exceeded in comparison

Using boto3 1.9.205 on macOS 10.14.6 with Python 3.7.4.

closing-soon dynamodb

All 8 comments

@mrichman - Thank you for your post. This error can be eliminated by increasing the recursion limit. You can raise it with sys.setrecursionlimit(limit).

Here is a Stack Overflow thread that explains more about the issue:
https://stackoverflow.com/questions/3323001/what-is-the-maximum-recursion-depth-in-python-and-how-to-increase-it

Hope it helps and please let me know if you have any questions.
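
For reference, the function is spelled sys.setrecursionlimit (all lowercase). The sketch below shows what raising the limit does and does not buy you. The helpers build_nested and nesting_depth are made up for illustration and have nothing to do with boto3. Raising the limit only helps when the required depth is finite and known, because the limit has to be set above it.

```python
import sys


def build_nested(levels):
    """Build a list nested `levels` deep, e.g. [[[]]] for levels=2."""
    nested = []
    for _ in range(levels):
        nested = [nested]
    return nested


def nesting_depth(value):
    """Recursively count nesting levels (one stack frame per level)."""
    if not isinstance(value, list) or not value:
        return 0
    return 1 + nesting_depth(value[0])


original_limit = sys.getrecursionlimit()
levels = original_limit + 500
deep = build_nested(levels)

# With the current limit the recursive walk cannot finish.
try:
    nesting_depth(deep)
    hit_limit = False
except RecursionError:
    hit_limit = True

# Raising the limit above the required depth lets the same call succeed.
sys.setrecursionlimit(levels + 1000)
try:
    measured = nesting_depth(deep)
finally:
    sys.setrecursionlimit(original_limit)  # always restore the old limit

print(hit_limit, measured == levels)
```

If the structure being walked keeps growing, no fixed limit will be enough, so a RecursionError that survives a higher limit usually points at the data rather than the limit.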

@swetashre I had tried this prior to posting this issue, and it had no effect.

I have encountered the same problem; sys.setrecursionlimit(limit) doesn't work.

mtr = meter.ConfusionMeter(k=3)

output = torch.Tensor([[.8, 0.1, 0.1], [10, 11, 10], [0.2, 0.2, .3]])
if hasattr(torch, "arange"):
    target = torch.arange(0, 3)
else:
    target = torch.range(0, 2)
mtr.add(output.data, target.data)
print(mtr.value())

RecursionError: maximum recursion depth exceeded while calling a Python object

Is there any other way to solve this problem?

@mrichman - Sorry for the late reply. Because the data being saved is deeply nested, the recursion limit is reached when all of it is saved at once. It is therefore best to flush the data more often, which you can do by lowering the batch writer's flush amount (the _flush_amount attribute). Here is the code snippet:

with ddb_table.batch_writer() as batch:
    batch._flush_amount = 1

    for artist_id in range(1, 2):

        item = {}
        item['artist_id'] = artist_id
        item['artist_name'] = fake.name()

        for album in range(10):
            album_id = album_id + 1
            item['album_id'] = album_id
            item['album_title'] = fake.catch_phrase().title()
            item['album_meta'] = {}
            item['album_meta']['year'] = fake.year()
            item['album_meta']['sku'] = f'{artist_id}-{album_id}'
            item['album_meta']['format'] = random.choice(album_formats)
            item['album_meta']['price'] = decimal.Decimal(str(
                round(random.uniform(3.00, 50.00), 2)))

            id_album = list(str(album_id))
            id_album.reverse()
            filepath = '/'.join(id_album)
            filename = os.path.join(filepath, f'{album_id}.jpg')
            s3_uri = f'/albumart/{filepath}/{album_id}.jpg'

            item['track_names'] = []
            item['album_meta']['s3_uri'] = s3_uri
            item['album_meta']['tracks'] = []

            # each album has 12 tracks

            for track_id in range(1, 13):
                item['track_names'].append(f'Track{track_id}')

                track = {}
                track['name'] = f'Track{track_id}'
                track['length'] = round(random.uniform(30000, 300000))
                track['position'] = f'{track_id}'
                item['album_meta']['tracks'].append(track)

            print(album_id)
            pp.pprint(item)

            batch.put_item(Item=item)

@swetashre Thanks! Is this documented anywhere? I can't find a reference to it outside the code itself https://github.com/boto/boto3/blob/develop/boto3/dynamodb/table.py#L97

Also, if we set _flush_amount to 1, and the default is 25, doesn't that defeat the purpose of batching altogether?
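
To put rough numbers on that trade-off: BatchWriteItem accepts at most 25 put or delete requests per call, which matches the default flush amount mentioned above. The sketch below assumes every flush sends exactly one request and nothing comes back unprocessed. request_count is a hypothetical helper, not part of boto3.

```python
import math


def request_count(total_items, flush_amount):
    """Number of flushes needed if each flush sends `flush_amount` items."""
    return math.ceil(total_items / flush_amount)


# The script in this issue writes 1000 artists x 10 albums.
TOTAL_ITEMS = 1000 * 10

default_requests = request_count(TOTAL_ITEMS, 25)
single_requests = request_count(TOTAL_ITEMS, 1)

print(default_requests)  # 400
print(single_requests)   # 10000
```

Under those assumptions, a flush amount of 1 means 25 times as many round trips for the same data.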

@mrichman - Yes, it defeats the purpose of batching altogether; that was only a workaround. The real problem in your case is that the same item object is reused every time you call put_item. The batch holds a list of items, but they all reference the same dictionary. When we iterate over that list to transform each item, the one shared object gets transformed again for every element in the list, over and over.
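
The effect is easy to reproduce without DynamoDB. The sketch below is a toy model, not boto3's actual code: tag_value is a made-up stand-in for the type serializer, and transform_in_place for the step that rewrites each buffered item. It assumes, as described above, that the transformation mutates the buffered dictionaries in place.

```python
def tag_value(value):
    """Toy serializer: wrap a value in a one-key dict naming its type."""
    if isinstance(value, dict):
        return {'M': {key: tag_value(inner) for key, inner in value.items()}}
    if isinstance(value, (int, float)):
        return {'N': str(value)}
    return {'S': str(value)}


def transform_in_place(item):
    """Replace every top-level value of `item` with its tagged form."""
    for key in list(item):
        item[key] = tag_value(item[key])


def depth(value):
    """Nesting depth of a dict structure (0 for non-dict values)."""
    if not isinstance(value, dict):
        return 0
    return 1 + max((depth(inner) for inner in value.values()), default=0)


# Buggy pattern: one dict is mutated and appended five times.
shared_item = {}
shared_buffer = []
for album_id in range(1, 6):
    shared_item['album_id'] = album_id
    shared_buffer.append(shared_item)

# Fixed pattern: a new dict for every buffered item.
fresh_buffer = [{'album_id': album_id} for album_id in range(1, 6)]

# "Flush": transform every entry of each buffer exactly once.
for buffer in (shared_buffer, fresh_buffer):
    for entry in buffer:
        transform_in_place(entry)

print('shared:', depth(shared_buffer[0]))
print('fresh: ', depth(fresh_buffer[0]))
```

In this toy model each extra reference to the shared dictionary roughly doubles its depth, which would explain why raising the recursion limit made no difference. Note also that all five shared entries carry the last album_id, so even without the error only one distinct item would have been written.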

The fix for your use case is to just create a new item dictionary each time:

with ddb_table.batch_writer() as batch:

    for artist_id in range(1, 1001):


        for album in range(10):
            item = {}
            item['artist_id'] = artist_id
            item['artist_name'] = 'hi'
            album_id = album_id + 1
            item['album_id'] = album_id
            ...

Can you please try this solution and let me know whether it helps?
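
If restructuring the loop is awkward, snapshotting the dictionary at the call site should have the same effect. This is a sketch of an alternative that was not proposed in the thread. RecordingBatch is a hypothetical stand-in that only mimics the batch writer buffering a reference to whatever it is handed.

```python
import copy


class RecordingBatch:
    """Hypothetical stand-in for the batch writer: it only buffers items."""

    def __init__(self):
        self.buffer = []

    def put_item(self, Item):
        # Stores a reference to the caller's object, not a copy.
        self.buffer.append(Item)


batch = RecordingBatch()
item = {'artist_id': 1}

for album_id in range(1, 4):
    # The same dict is still reused and mutated on every iteration...
    item['album_id'] = album_id
    item['album_meta'] = {'sku': f'1-{album_id}'}
    # ...but the batch receives an independent snapshot each time.
    batch.put_item(Item=copy.deepcopy(item))

buffered_ids = [entry['album_id'] for entry in batch.buffer]
distinct_objects = len({id(entry) for entry in batch.buffer})

print(buffered_ids)
print(distinct_objects)
```

With the real batch writer the call would read batch.put_item(Item=copy.deepcopy(item)). Creating a new dictionary per item, as suggested above, avoids the copy and is the simpler choice when the loop can be changed.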

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.
