Google-cloud-python: [Documentation Issue] UDFResource examples missing

Created on 10 Sep 2016 · 13 Comments · Source: googleapis/google-cloud-python

Page Name: bigquery-usage
Release: 0.18.1

An example of how to use UDFResource in both sync and async queries would be much appreciated.

bigquery backend p2

All 13 comments

@1mentat, sounds like a good idea to me!
I'm not super familiar with BigQuery but I'll give it a go.

I was able to get a UDF working in the console, but I'm struggling to get it working in code.

@tseaver here is my snippet code that I'm experimenting with.
https://gist.github.com/daspecster/e26346af9fea67aa6efc0716c11c5956

@daspecster, the feature was added in #2029 by @dwmclary. Maybe Daniel could supply an example of a working inline UDF?

@dwmclary if you can point me in the right direction that would be much appreciated! I'm not sure if I'm doing it wrong or if this is a bug.

There are some other bugs in the docs. I have working code after reading
the source. Will post when I'm at my desk.

OK James, once you've got the bugs sorted out, post and I'll sort them out.

Sorry for formatting...

sync w/ inline udf:

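from gcloud import bigquery
from gcloud.bigquery.job import UDFResource

client = bigquery.Client()  # uses the default project and credentials

q = """SELECT * FROM [<dataset>]"""  # placeholder query text
udf = """<udf>"""  # placeholder JavaScript UDF source

query = client.run_sync_query(q)
query.udf_resources = [UDFResource("inlineCode", udf)]
query.run()  # the docs may show this call wrongly in some places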

async w/ inline UDF:

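from gcloud import bigquery
from gcloud.bigquery.job import UDFResource

client = bigquery.Client()  # uses the default project and credentials

q = """SELECT * FROM [<dataset>]"""  # placeholder query text
udf = """<udf>"""  # placeholder JavaScript UDF source
dataset = client.dataset("<dataset>")  # placeholder dataset name
table = dataset.table(name="<table_name>")  # placeholder table name
job = client.run_async_query("query", q)  # first argument is the job name
job.udf_resources = [UDFResource("inlineCode", udf)]
job.destination = table
job.begin()  # not right in some doc examples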

async w/ resource uri UDF and destination table:

from gcloud import bigquery
from gcloud.bigquery.job import UDFResource

client = bigquery.Client()  # uses the default project and credentials

q = """SELECT * FROM [<dataset>]"""  # placeholder query text
udf_uri = "gs://some-bucket/js/lib.js"
dataset = client.dataset("<dataset>")  # placeholder dataset name
table = dataset.table(name="<table_name>")  # placeholder table name
job = client.run_async_query("query", q)  # first argument is the job name
job.udf_resources = [UDFResource("resourceUri", udf_uri)]
job.destination = table
job.write_disposition = 'WRITE_TRUNCATE'  # not right in some doc examples
job.allow_large_results = True  # not documented in the examples
job.flatten_results = False  # not documented in the examples
job.begin()  # not right in some doc examples
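
In case it helps anyone reading along: as far as I can tell, the two UDFResource forms above correspond to entries in the `userDefinedFunctionResources` list of the REST query configuration, each with either an `inlineCode` or a `resourceUri` key. Here is a rough, library-free sketch of that shape. `udf_resource_payload` is a made-up helper name, not part of gcloud, and I'm only assuming this is how the library serializes the objects.

```python
def udf_resource_payload(resources):
    """Turn (udf_type, value) pairs into REST-style UDF resource dicts.

    udf_type must be "inlineCode" (JavaScript source text) or
    "resourceUri" (a gs:// path to a JavaScript file).
    """
    allowed = ("inlineCode", "resourceUri")
    payload = []
    for udf_type, value in resources:
        if udf_type not in allowed:
            raise ValueError("unsupported UDF type: %r" % (udf_type,))
        payload.append({udf_type: value})
    return payload


# Hypothetical inputs mirroring the examples in this thread.
resources = [
    ("resourceUri", "gs://some-bucket/js/lib.js"),
    ("inlineCode", "var someCode = 'here';"),
]
query_config = {
    "query": "SELECT * FROM [<dataset>]",  # placeholder query text
    "userDefinedFunctionResources": udf_resource_payload(resources),
}
print(query_config["userDefinedFunctionResources"])
```

Comparing this against the request body the library actually sends might be one way to tell whether a failing UDF is a client-side or a server-side problem.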

@1mentat I'm not sure if you looked at my gist from before but I believe I have nearly the same thing as your "sync w/inline UDF" example.
However, that doesn't seem to work for me.

I'm curious if your <udf> example uses bigquery.defineFunction in it?

FYI the UDF in my gist works in the bigquery console.

Sorry, I must have missed it. Too much practice clicking on archive at work.

Inline UDFs through the API require doc headers for types. It's a known issue and was supposed to get fixed a while ago according to the searches I did. I think you need something like:

/**
 *
 * We define the two parameters, below, and specify the schema of the input row and
 * the output row.
 *
 * @param {{name: string, state: string}} row
 * @param function({{upper_name: string,
 *    state: string}}) emit
 */

immediately above the actual function.

OK thanks! I'll give that a go and let you know how it turns out.

@1mentat hmm that doesn't seem to work for me either.
I also tried copying exactly what I have in the UDF editor on the BigQuery console and passing it to query.udf_resources = [UDFResource("inlineCode", INLINE_UDF_CODE)].

I keep getting

  ERROR: BadRequest(u'Unknown TVF: UPPERCASENAME (POST https://www.googleapis.com/bigquery/v2/projects/my-project/queries)',)

My UDF string

    INLINE_UDF_CODE = """
        /**
         *
         * We define the two parameters, below, and specify the schema of the
         * input row and the output row.
         *
         * @param {{name: string, state: string}} row
         * @param function({{upper_name: string,
         *    state: string}}) emit
         */

        function upperCaseName(r, emit) {
            emit({upper_name: r.name.toUpperCase(),
                  state: r.state});
        }

        bigquery.defineFunction(
          'upperCaseName',
          ['name', 'state'],
          [{'name': 'upper_name', 'type': 'string'},
           {'name': 'state', 'type': 'string'}],
          upperCaseName
        );
    """

My UDF query...

    UDF_QUERY = ('SELECT upper_name FROM upperCaseName('
                 '[bigquery-public-data:usa_names.usa_1910_2013]) '
                 'WHERE state = "TX"')

@daspecster Looking back I'm not sure I got sync working. The same UDF works fine in async (in the sense that it runs until it runs out of resources).

from gcloud import bigquery
from gcloud.bigquery.job import UDFResource

import time
import logging

client = bigquery.Client()
UDF_QUERY = """SELECT
  upper_name
FROM
  upperCaseName([bigquery-public-data:usa_names.usa_1910_2013])
WHERE
  state = "TX"
"""

INLINE_UDF_CODE = """
bigquery.defineFunction(
  'upperCaseName',
  ['name', 'state'],
  [{'name': 'upper_name', 'type': 'string'},
   {'name': 'state', 'type': 'string'}],
  upperCaseName
)

/**
 *
 * We define the two parameters, below, and specify the schema of the
 * input row and the output row.
 *
 * @param {{name: string, state: string}} row
 * @param function({{upper_name: string,
 *    state: string}}) emit
 */
function upperCaseName(r, emit) {
    emit({upper_name: r.name.toUpperCase(),
          state: r.state});
}"""

job = client.run_async_query("uppercase_query_{}".format(int(time.time())), UDF_QUERY)
job.udf_resources = [UDFResource("inlineCode", INLINE_UDF_CODE)]

try:
    job.begin()

    job.reload()
    retry_count = 0

    while retry_count < 12 and job.state != u'DONE':
        time.sleep(1.5**retry_count)      # exponential backoff
        retry_count += 1
        job.reload()

except Exception as e:
    logging.exception(e)
    print("Something went wrong")
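
The polling part of the snippet above could be pulled into a small helper so it can be tried without hitting BigQuery. This is only a sketch: `wait_until_done` and `FakeJob` are hypothetical names, and the only things assumed about a real job object are the `reload()` method and `state` attribute already used above.

```python
import time


def wait_until_done(job, max_retries=12, base=1.5, sleep=time.sleep):
    """Poll job.reload() with exponential backoff until job.state is 'DONE'.

    Returns True if the job finished, False if the retries ran out.
    """
    job.reload()
    retry_count = 0
    while retry_count < max_retries and job.state != "DONE":
        sleep(base ** retry_count)  # delay grows by a factor of `base` each try
        retry_count += 1
        job.reload()
    return job.state == "DONE"


class FakeJob(object):
    """Stand-in for a query job: reports DONE after a fixed number of reloads."""

    def __init__(self, reloads_needed):
        self.state = "RUNNING"
        self._remaining = reloads_needed

    def reload(self):
        self._remaining -= 1
        if self._remaining <= 0:
            self.state = "DONE"


# Record the delays instead of actually sleeping.
delays = []
finished = wait_until_done(FakeJob(3), sleep=delays.append)
print(finished, delays)
```

With a real job you would call `wait_until_done(job)` right after `job.begin()`; passing `sleep` as a parameter is just there to make the backoff schedule easy to inspect.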

@fhoffa Can you shed some light?

Hello,
One of the challenges of maintaining a large open source project is that sometimes, you can bite off more than you can chew. As the lead maintainer of google-cloud-python, I can definitely say that I have let the issues here pile up.

As part of trying to get things under control (as well as to empower us to provide better customer service in the future), I am declaring a "bankruptcy" of sorts on many of the old issues, especially those likely to have been addressed or made obsolete by more recent updates.

My goal is to close stale issues whose relevance or solution is no longer immediately evident, and which appear to be of lower importance. I believe in good faith that this is one of those issues, but I am scanning quickly and may occasionally be wrong. If this is an issue of high importance, please comment here and we will reconsider. If this is an issue whose solution is trivial, please consider providing a pull request.

Thank you!
