python-docs-samples/vision/cloud-client/detect/detect.py
No.
I tried using detect.py on a PDF that is stored in Google Cloud. Below is the command I ran:
C:\temp1\google_vision>python detect.py ocr-uri gs://my_bucket_name/file_1003.pdf gs://my_bucket_name/output/
When I run my code I get the following error:
C:\temp1\google_vision>python detect.py ocr-uri gs://matr/file_1003.pdf gs://matr/output
Waiting for the operation to finish.
Output files:
output/
output/clsoutput-1-to-2.json
output/output-1-to-2.json
outputoutput-1-to-2.json
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\google\protobuf\json_format.py", line 416, in Parse
    js = json.loads(text, object_pairs_hook=_DuplicateChecker)
  File "C:\Program Files (x86)\Python37-32\lib\json\__init__.py", line 361, in loads
    return cls(**kw).decode(s)
  File "C:\Program Files (x86)\Python37-32\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files (x86)\Python37-32\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "detect.py", line 955, in <module>
    run_uri(args)
  File "detect.py", line 835, in run_uri
    async_detect_document(args.uri, args.destination_uri)
  File "detect.py", line 720, in async_detect_document
    json_string, vision.types.AnnotateFileResponse())
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\google\protobuf\json_format.py", line 418, in Parse
    raise ParseError('Failed to load JSON: {0}.'.format(str(e)))
google.protobuf.json_format.ParseError: Failed to load JSON: Expecting value: line 1 column 1 (char 0).
How can I avoid this error? There is a resulting JSON file in the output folder.
It looks like the error is more about how it parses the JSON output file.
Hi, from your call:
C:\temp1\google_vision>python detect.py ocr-uri gs://matr/file_1003.pdf gs://matr/output
it looks like you might be missing the trailing / on the gcs_destination_uri.
It should be:
C:\temp1\google_vision>python detect.py ocr-uri gs://matr/file_1003.pdf gs://matr/output/
Let me know if that works.
No, still the same error.
Is your target GCS bucket empty?
I created a new folder in my bucket and targeted that folder and still received the error.
Does it still throw an error if you use our example pdf?
gs://python-docs-samples-tests/HodgeConj.pdf
Yes, still same error.
For the example pdf (gs://python-docs-samples-tests/HodgeConj.pdf), can you share a little bit of the contents of the output file?
Here are the first 75 lines:
{
"inputConfig": {
"gcsSource": {
"uri": "gs://python-docs-samples-tests/HodgeConj.pdf"
},
"mimeType": "application/pdf"
},
"responses": [{
"fullTextAnnotation": {
"pages": [{
"property": {
"detectedLanguages": [{
"languageCode": "en",
"confidence": 0.97
}, {
"languageCode": "az",
"confidence": 0.02
}
]
},
"width": 595,
"height": 842,
"blocks": [{
"boundingBox": {
"normalizedVertices": [{
"x": 0.09243698,
"y": 0.059382424
}, {
"x": 0.5243698,
"y": 0.066508316
}, {
"x": 0.5243698,
"y": 0.07482185
}, {
"x": 0.09243698,
"y": 0.06769596
}
]
},
"paragraphs": [{
"boundingBox": {
"normalizedVertices": [{
"x": 0.09243698,
"y": 0.059382424
}, {
"x": 0.5243698,
"y": 0.066508316
}, {
"x": 0.5243698,
"y": 0.07482185
}, {
"x": 0.09243698,
"y": 0.06769596
}
]
},
"words": [{
"property": {
"detectedLanguages": [{
"languageCode": "en"
}
]
},
"boundingBox": {
"normalizedVertices": [{
"x": 0.09243698,
"y": 0.059382424
}, {
"x": 0.13781513,
"y": 0.060570072
}, {
"x": 0.13781513,
"y": 0.06888361
}, {
Here are all three output files:
test2_output-1-to-2.zip
test2_output-3-to-4.zip
test2_output-5-to-5.zip
Alright, cool. It looks like the Vision API call is successful, but when retrieving the results from GCS there seems to be an issue.
Are you on the latest version of the storage API?
What do you get if you run pip freeze | grep google?
pip freeze | findstr google
google-api-core==1.8.2
google-auth==1.6.3
google-cloud-bigquery==1.10.0
google-cloud-core==0.29.1
google-cloud-storage==1.14.0
google-cloud-vision==0.36.0
google-resumable-media==0.3.2
googleapis-common-protos==1.5.9
I updated google-cloud-storage to 1.15.0, but I still get the same error.
I had this issue and determined it was caused by the prefix being iterated as part of the bloblist. I can see that "output/" is listed as a file in your output, and subsequently has parsing attempted on it causing the error.
Try hardcoding a prefix, something like prefix = 'output/out', and that folder won't be included in the list.
The demo code should probably be modified to handle this simple case a little better.
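For anyone who would rather not hardcode the prefix, here is a minimal sketch of an alternative: filter the blob list so that any entry whose name ends in "/" (such as the "output/" entry in the listing above) is skipped before parsing. The helper name skip_folder_placeholders is hypothetical and not part of detect.py, and the SimpleNamespace objects only stand in for real blobs so the snippet runs without credentials. It assumes each blob exposes a name attribute, as google.cloud.storage Blob objects do.

```python
from types import SimpleNamespace


def skip_folder_placeholders(blobs):
    """Keep only blobs that are real objects, not "folder" placeholders.

    Hypothetical helper (not part of detect.py). Assumes each blob exposes
    a ``name`` attribute, as google.cloud.storage Blob objects do.
    """
    return [blob for blob in blobs if not blob.name.endswith("/")]


# Stand-in for the listing shown in the error output. In the sample this
# would be something like list(bucket.list_blobs(prefix=prefix)).
listing = [
    SimpleNamespace(name="output/"),
    SimpleNamespace(name="output/output-1-to-2.json"),
]

result_blobs = skip_folder_placeholders(listing)
print([blob.name for blob in result_blobs])
```

With the filtered list, the first element is an actual JSON result file, so the parse step should no longer be handed an empty placeholder object.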
@benbluhm your suggestion solved my issue, thank you
Yes. It worked for me also.
Thanks, @benbluhm!
Closing the issue.
Hi guys, can someone post the updated sample code? That would be great.
Thanks