The following snippet lives in the Google-Cloud-Ruby Github documentation and errors when tested. The snippet can be found here. The error is shown below.
From a brief inspection the bug has to do with the query named parameter. When I swap the named parameter with the actual value the snippet works correctly.
# Error
franknatividad@franknatividad0:~/Tests/ruby-bigquery$ bundle exec ruby query.rb
/usr/local/google/home/franknatividad/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/google-cloud-bigquery-0.23.0/lib/google/cloud/bigquery/service.rb:662:in `rescue in execute': invalidQuery: No matching signature for operator IN for argument types STRING and {ARRAY<STRING>} at [1:102] (Google::Cloud::InvalidArgumentError)
from /usr/local/google/home/franknatividad/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/google-cloud-bigquery-0.23.0/lib/google/cloud/bigquery/service.rb:660:in `execute'
from /usr/local/google/home/franknatividad/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/google-cloud-bigquery-0.23.0/lib/google/cloud/bigquery/service.rb:222:in `query'
from /usr/local/google/home/franknatividad/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/google-cloud-bigquery-0.23.0/lib/google/cloud/bigquery/project.rb:422:in `query'
from query.rb:10:in `<main>'
# Snippet
require "google/cloud/bigquery"
bigquery = Google::Cloud::Bigquery.new
sql = "SELECT word, SUM(word_count) AS word_count " \
"FROM `bigquery-public-data.samples.shakespeare`" \
"WHERE word IN (@words) GROUP BY word"
data = bigquery.query sql, params: { words: ['me', 'I', 'you'] }
# Gemfile
source 'https://rubygems.org'
gem 'google-cloud-bigquery'
# Gemfile.lock
GEM
remote: https://rubygems.org/
specs:
addressable (2.5.0)
public_suffix (~> 2.0, >= 2.0.2)
faraday (0.10.1)
multipart-post (>= 1.2, < 3)
google-api-client (0.9.20)
addressable (~> 2.3)
googleauth (~> 0.5)
httpclient (~> 2.7)
hurley (~> 0.1)
memoist (~> 0.11)
mime-types (>= 1.6)
representable (~> 2.3.0)
retriable (~> 2.0)
google-cloud-bigquery (0.23.0)
google-api-client (~> 0.9.18)
google-cloud-core (~> 0.21.0)
google-cloud-core (0.21.1)
googleauth (~> 0.5.1)
googleauth (0.5.1)
faraday (~> 0.9)
jwt (~> 1.4)
logging (~> 2.0)
memoist (~> 0.12)
multi_json (~> 1.11)
os (~> 0.9)
signet (~> 0.7)
httpclient (2.8.3)
hurley (0.2)
jwt (1.5.6)
little-plugger (1.1.4)
logging (2.1.0)
little-plugger (~> 1.1)
multi_json (~> 1.10)
memoist (0.15.0)
mime-types (3.1)
mime-types-data (~> 3.2015)
mime-types-data (3.2016.0521)
multi_json (1.12.1)
multipart-post (2.0.0)
os (0.9.6)
public_suffix (2.0.5)
representable (2.3.0)
uber (~> 0.0.7)
retriable (2.1.0)
signet (0.7.3)
addressable (~> 2.3)
faraday (~> 0.9)
jwt (~> 1.5)
multi_json (~> 1.10)
uber (0.0.15)
PLATFORMS
ruby
DEPENDENCIES
google-cloud-bigquery
BUNDLED WITH
1.13.4
Does the snippet use standard SQL syntax? You may have to set useLegacySql to false or prepend the query with a line that is just #standardSQL.
@tswast
Yes the snippet uses standard SQL syntax and here's a related quote from the documentation:
As demonstrated above, passing the params option will automatically set standard_sql to true.
A quick try explicitly using standard_sql results in the same error. Maybe that's where the problem exists?
# Code
require "google/cloud/bigquery"
bigquery = Google::Cloud::Bigquery.new
sql = "SELECT word, SUM(word_count) AS word_count " \
"FROM `bigquery-public-data.samples.shakespeare`" \
"WHERE word IN (@words) GROUP BY word"
data = bigquery.query sql, standard_sql: true, params: { words: ['me', 'I', 'you'] }
Can you debug the request by printing out the JSON it is sending? It's also possible it's not being populated correctly. (It is a bit tricky)
Yes, I will try!
code.google.com issue: https://code.google.com/p/google-bigquery/issues/detail?id=873
I suspect passing an array is simply an incorrect argument. According to Functions & Operators, the syntax is:
x IN (y, z, ...)
Notice that there is no array.
And replacing the array with a single string works:
require "google/cloud/bigquery"
bigquery = Google::Cloud::Bigquery.new
sql = "SELECT word, SUM(word_count) AS word_count " \
"FROM `bigquery-public-data.samples.shakespeare`" \
"WHERE word IN (@words) GROUP BY word"
data = bigquery.query sql, params: { words: 'you' }
#=> [{"word"=>"you", "word_count"=>12527}]
It also works with the addition of UNNEST, per the syntax guide linked above:
The UNNEST form treats an array scan like UNNEST in the FROM clause:
x [NOT] IN UNNEST(<array expression>)
This form is often used with ARRAY parameters. For example:
x IN UNNEST(@array_parameter)
require "google/cloud/bigquery"
bigquery = Google::Cloud::Bigquery.new
sql = "SELECT word, SUM(word_count) AS word_count " \
"FROM `bigquery-public-data.samples.shakespeare`" \
"WHERE word IN UNNEST(@words) GROUP BY word"
data = bigquery.query sql, params: { words: ['me', 'I', 'you'] }
#=> [{"word"=>"I", "word_count"=>21028}, {"word"=>"me", "word_count"=>8030}, {"word"=>"you", "word_count"=>12527}]
I think the best solution is to add UNNEST to the snippet.
Thanks, Chris! I'll send a PR to add UNNEST.
I think that's the only option. I also tried embedding the elements in a string argument, but it didn't work, apparently the string argument is treated as a single element.
require "google/cloud/bigquery"
bigquery = Google::Cloud::Bigquery.new
sql = "SELECT word, SUM(word_count) AS word_count " \
"FROM `bigquery-public-data.samples.shakespeare`" \
"WHERE word IN UNNEST(@words) GROUP BY word"
data = bigquery.query sql, params: { words: "'me', 'I', 'you'" }
#=> []
Thanks for your work on this @quartzmo! I think adding UNNEST is a great solution!
Most helpful comment
I think the best solution is to add
UNNESTto the snippet.