Python-docs-samples: BigQuery error in AutoML notebooks example - TypeError: to_pandas() got an unexpected keyword argument 'timestamp_as_object'

Created on 27 Aug 2020 · 9 comments · Source: GoogleCloudPlatform/python-docs-samples

In which file did you encounter the issue?

https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/tables/automl/notebooks/purchase_prediction/purchase_prediction.ipynb

Did you change the file? If so, how?

I tried upgrading PyArrow (to 1.0.1), but the error still occurred.

Describe the issue

I attempted to run this notebook on a cluster on Dataproc using Jupyter.

When I run this command

# Save table.
query = """
SELECT
 date, 
 device, 
 geoNetwork, 
 totals, 
 trafficSource, 
 fullVisitorId 
FROM 
 `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
 _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB('2017-08-01', INTERVAL 366 DAY)) AND
 FORMAT_DATE('%Y%m%d',DATE_SUB('2017-08-01', INTERVAL 1 DAY))
"""
df = bq_client.query(query).to_dataframe()
print(df.iloc[:3])
saveTable(df, NESTED_CSV_NAME, BUCKET_NAME)

The following error occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-35-989b8f700951> in <module>
     14  FORMAT_DATE('%Y%m%d',DATE_SUB('2017-08-01', INTERVAL 1 DAY))
     15 """
---> 16 df = bq_client.query(query).to_dataframe()
     17 print(df.iloc[:3])
     18 saveTable(df, NESTED_CSV_NAME, BUCKET_NAME)

~/.local/lib/python3.7/site-packages/google/cloud/bigquery/job.py in to_dataframe(self, bqstorage_client, dtypes, progress_bar_type, create_bqstorage_client, date_as_object)
   3381             progress_bar_type=progress_bar_type,
   3382             create_bqstorage_client=create_bqstorage_client,
-> 3383             date_as_object=date_as_object,
   3384         )
   3385 

~/.local/lib/python3.7/site-packages/google/cloud/bigquery/table.py in to_dataframe(self, bqstorage_client, dtypes, progress_bar_type, create_bqstorage_client, date_as_object)
   1755                 extra_kwargs = {"timestamp_as_object": timestamp_as_object}
   1756 
-> 1757             df = record_batch.to_pandas(date_as_object=date_as_object, **extra_kwargs)
   1758 
   1759             for column in dtypes:

/opt/conda/anaconda/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()

TypeError: to_pandas() got an unexpected keyword argument 'timestamp_as_object'
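The `table.py` frame above only passes `timestamp_as_object` when it believes the installed pyarrow supports it; that keyword appears to have been added around pyarrow 1.0.0 (the apache/arrow#7169 change linked in a comment below), so an older pyarrow at import time raises exactly this `TypeError`. A minimal sketch of the version-gated pattern, with a hypothetical helper name (not the library's actual check):

```python
def pyarrow_extra_kwargs(pyarrow_version: str) -> dict:
    """Return extra kwargs that are safe to pass to RecordBatch.to_pandas().

    Hypothetical helper for illustration: passing timestamp_as_object to a
    pyarrow older than 1.0.0 raises the TypeError shown above, so only add
    the keyword when the runtime version is new enough.
    """
    major = int(pyarrow_version.split(".")[0])
    if major >= 1:
        return {"timestamp_as_object": True}
    return {}

print(pyarrow_extra_kwargs("1.0.1"))   # -> {'timestamp_as_object': True}
print(pyarrow_extra_kwargs("0.14.1"))  # -> {}
```

The failure in this issue suggests the version the client library checked and the pyarrow actually imported at runtime were not the same copy.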

I ran the following to show which version of each package is installed:

!pip show pyarrow pandas pandas-gbq google-cloud-bigquery-storage google-cloud-bigquery 

Output:

Name: pyarrow
Version: 1.0.1
Summary: Python library for Apache Arrow
Home-page: https://arrow.apache.org/
Author: None
Author-email: None
License: Apache License, Version 2.0
Location: /opt/conda/anaconda/lib/python3.7/site-packages
Requires: numpy
Required-by: 
---
Name: pandas
Version: 1.1.1
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: None
Author-email: None
License: BSD
Location: /root/.local/lib/python3.7/site-packages
Requires: numpy, pytz, python-dateutil
Required-by: pandas-gbq, visions, statsmodels, seaborn, phik, pandas-profiling, altair
---
Name: pandas-gbq
Version: 0.13.2
Summary: Pandas interface to Google BigQuery
Home-page: https://github.com/pydata/pandas-gbq
Author: The PyData Development Team
Author-email: [email protected]
License: BSD License
Location: /root/.local/lib/python3.7/site-packages
Requires: pydata-google-auth, google-cloud-bigquery, google-auth, google-auth-oauthlib, pandas, setuptools
Required-by: 
---
Name: google-cloud-bigquery-storage
Version: 1.0.0
Summary: BigQuery Storage API API client library
Home-page: https://github.com/googleapis/python-bigquery-storage
Author: Google LLC
Author-email: [email protected]
License: Apache 2.0
Location: /opt/conda/anaconda/lib/python3.7/site-packages
Requires: google-api-core
Required-by: 
---
Name: google-cloud-bigquery
Version: 1.27.2
Summary: Google BigQuery API client library
Home-page: https://github.com/googleapis/python-bigquery
Author: Google LLC
Author-email: [email protected]
License: Apache 2.0
Location: /root/.local/lib/python3.7/site-packages
Requires: google-api-core, google-cloud-core, google-resumable-media, six
Required-by: pandas-gbq
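As a cross-check, the same versions can be read from inside the kernel itself. This matters here because the output above shows packages split across two `Location`s (`/root/.local/...` and `/opt/conda/...`), so `pip show` and the running kernel may disagree. A sketch using `importlib.metadata` (stdlib on Python 3.8+; the Python 3.7 environment above would need the `importlib_metadata` backport):

```python
import importlib.metadata  # Python 3.8+; on 3.7, use the importlib_metadata backport

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None

# Print the version each distribution resolves to from this interpreter.
for pkg in ("pyarrow", "pandas", "pandas-gbq",
            "google-cloud-bigquery-storage", "google-cloud-bigquery"):
    print(pkg, installed_version(pkg))
```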
Labels: automl, p2, samples, question

Most helpful comment

Since I needed not only the BigQuery library but also the BigQuery Storage library, I used the following to solve this:

pip install google-cloud-bigquery==1.26.1
pip install google-cloud-bigquery-storage==1.0.0

All 9 comments

Having the same issue.
Since I found a few related threads here and on Stack Overflow where people suggested updating various packages, and none of those suggestions worked for me, I'll share my current workaround until this is resolved.

I just pinned the following packages to the versions listed in the "Migrating from pandas-gbq" guide, namely:
google-cloud-bigquery==1.20.0 google-cloud-bigquery-storage==0.7.0 pandas==0.25.1 pandas-gbq==0.11.0 pyarrow==0.14.1

Same issue here, outside of the AutoML notebook example. Also in a Jupyter notebook on a cluster, and the error shows up whether I use pandas-gbq or google-cloud-bigquery. It seemed to be working a couple of days ago and then suddenly started throwing this error.

Is there something in particular that triggers this? Some way to avoid that, in terms of the data types I'm trying to query?

Are the above packages ones that avoid this error in general or just for this notebook example, @gslokoski?

@dawulf
In general.

Apologies, I didn't realise that saying I have the same issue implied the same environment too.

I specifically had the issue in a self hosted Jupyter notebook but have also tested it within a serverless Cloud Function.

@gslokoski No worries, I misunderstood.
Apologies for the basic question, but to ensure those older versions are actually the ones used by the kernel, would it be enough to start a Jupyter notebook with !pip install pyarrow==0.14.1, etc., or is some sort of forced uninstall needed beforehand?

By the way, in case others have this issue and are desperate for another workaround: I got around it by explicitly uninstalling pyarrow in my notebook and just accepting the ~25x hit to my to_pandas() speed. Hoping the error is dealt with soon; it looks like a very recent change may have caused it: https://github.com/apache/arrow/pull/7169

I think this might be due to conflicting versions of pyarrow in the notebook environment. I believe it is caused by python-bigquery starting to rely on pyarrow 1.0.1 features.

Please see https://github.com/jupyter/docker-stacks/issues/944 which seems to give instructions for replacing a version of Arrow (a Kernel restart is needed).
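One way to confirm which copy the kernel actually imports is to resolve the module's file path and compare it against the `Location` lines in the `pip show` output above; a mismatch means pip and the kernel see different installs. A sketch (the stdlib `json` module stands in for `pyarrow` so the snippet runs anywhere):

```python
import importlib.util

def module_location(name):
    """Return the file a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# In the notebook, check module_location("pyarrow") against the Location
# reported by `pip show pyarrow`; a mismatch means the kernel imports a
# different (possibly older) copy than the one pip just upgraded.
print(module_location("json"))  # stdlib example so this sketch runs anywhere
```

After replacing or upgrading a package, a kernel restart is still required, as the linked docker-stacks issue notes.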

I had the same issue.

This command helped me:
pip install google-cloud-bigquery==1.24.0

instead of version 1.27.2.

Since I needed not only the BigQuery library but also the BigQuery Storage library, I used the following to solve this:

pip install google-cloud-bigquery==1.26.1
pip install google-cloud-bigquery-storage==1.0.0

Greetings, we're closing this. It looks like the issue was resolved by pinning to a specific version. Please let us know if it needs to be reopened.

Thank you. This helped me!!
