Python-docs-samples: Failed to load json using detect.py

Created on 4 Apr 2019 · 18 Comments · Source: GoogleCloudPlatform/python-docs-samples

In which file did you encounter the issue?

python-docs-samples/vision/cloud-client/detect/detect.py

Did you change the file? If so, how?

No.

Describe the issue

I tried using detect.py on a PDF that is stored in Google Cloud Storage. Below is the command I ran:
C:\temp1\google_vision>python detect.py ocr-uri gs://my_bucket_name/file_1003.pdf gs://my_bucket_name/output/

When I run my code I get the following error:

C:\temp1\google_vision>python detect.py ocr-uri gs://matr/file_1003.pdf gs://matr/output
Waiting for the operation to finish.
Output files:
output/
output/clsoutput-1-to-2.json
output/output-1-to-2.json
outputoutput-1-to-2.json
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\google\protobuf\json_format.py", line 416, in Parse
    js = json.loads(text, object_pairs_hook=_DuplicateChecker)
  File "C:\Program Files (x86)\Python37-32\lib\json\__init__.py", line 361, in loads
    return cls(**kw).decode(s)
  File "C:\Program Files (x86)\Python37-32\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files (x86)\Python37-32\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "detect.py", line 955, in <module>
    run_uri(args)
  File "detect.py", line 835, in run_uri
    async_detect_document(args.uri, args.destination_uri)
  File "detect.py", line 720, in async_detect_document
    json_string, vision.types.AnnotateFileResponse())
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\google\protobuf\json_format.py", line 418, in Parse
    raise ParseError('Failed to load JSON: {0}.'.format(str(e)))
google.protobuf.json_format.ParseError: Failed to load JSON: Expecting value: line 1 column 1 (char 0).

How can I avoid this error? There is a resulting JSON file in the output folder.

Source

boesiii

Most helpful comment

I had this issue and determined that it was caused by the prefix itself being iterated as part of the blob list. "output/" is listed as a file in your output, so the script attempts to parse it, which causes the error.

Try hardcoding a prefix, something like prefix = 'output/out', so that the folder is not included in the list.

The demo code should probably be modified to handle this simple case a little better.

benbluhm on 26 Apr 2019
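
As an alternative to hardcoding the prefix, the listing could be filtered before anything is parsed. The following is only a sketch: pick_result_names is a hypothetical helper that is not part of detect.py, and it assumes that "output/" is a folder placeholder object with no JSON content and that real result files always end in ".json".

```python
def pick_result_names(blob_names):
    """Keep only object names that look like JSON result files.

    Hypothetical helper, not part of detect.py. Assumes a name such as
    "output/" is a folder placeholder and that results end in ".json".
    """
    return [name for name in blob_names if name.endswith(".json")]


# Object names copied from the "Output files:" listing in the report above.
listed = [
    "output/",
    "output/clsoutput-1-to-2.json",
    "output/output-1-to-2.json",
    "outputoutput-1-to-2.json",
]

result_names = pick_result_names(listed)
print(result_names)
```

In the sample itself, the same condition could presumably be applied to blob.name for each item returned by bucket.list_blobs(prefix=prefix) before the first blob is downloaded and parsed.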


All 18 comments

It looks like the error is more about how it parses the JSON output file.

boesiii on 9 Apr 2019
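
The message at the end of the traceback, "Expecting value: line 1 column 1 (char 0)", is what Python's json module reports when the text it is given has no content at all. A minimal sketch that reproduces it without any cloud calls follows; describe_parse_failure is a hypothetical helper name, and the empty string stands in for whatever the script downloaded, which the thread does not show directly.

```python
import json


def describe_parse_failure(payload):
    """Return the JSON decode error message for payload, or None if it parses."""
    try:
        json.loads(payload)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


# An empty payload fails at the very first character.
print(describe_parse_failure(""))

# A well-formed payload parses without error.
print(describe_parse_failure('{"responses": []}'))
```

This suggests the object being parsed was empty rather than a malformed result file, which is consistent with the valid JSON found in the output folder.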

Hi, from your call: C:\temp1\google_vision>python detect.py ocr-uri gs://matr/file_1003.pdf gs://matr/output

It looks like you might be missing the trailing / on the gcs_destination_uri.

It should be: C:\temp1\google_vision>python detect.py ocr-uri gs://matr/file_1003.pdf gs://matr/output/

Let me know if that works.

nnegrey on 18 Apr 2019

No, still the same error.

boesiii on 18 Apr 2019

Is your target GCS bucket empty?

nnegrey on 18 Apr 2019

I created a new folder in my bucket and targeted that folder and still received the error.

boesiii on 18 Apr 2019

Does it still throw an error if you use our example PDF?
gs://python-docs-samples-tests/HodgeConj.pdf

nnegrey on 18 Apr 2019

Yes, still same error.

boesiii on 19 Apr 2019

For the example PDF (gs://python-docs-samples-tests/HodgeConj.pdf), can you share a little bit of the contents of the output file?

nnegrey on 22 Apr 2019

Here are the first 75 lines:

{
    "inputConfig": {
        "gcsSource": {
            "uri": "gs://python-docs-samples-tests/HodgeConj.pdf"
        },
        "mimeType": "application/pdf"
    },
    "responses": [{
            "fullTextAnnotation": {
                "pages": [{
                        "property": {
                            "detectedLanguages": [{
                                    "languageCode": "en",
                                    "confidence": 0.97
                                }, {
                                    "languageCode": "az",
                                    "confidence": 0.02
                                }
                            ]
                        },
                        "width": 595,
                        "height": 842,
                        "blocks": [{
                                "boundingBox": {
                                    "normalizedVertices": [{
                                            "x": 0.09243698,
                                            "y": 0.059382424
                                        }, {
                                            "x": 0.5243698,
                                            "y": 0.066508316
                                        }, {
                                            "x": 0.5243698,
                                            "y": 0.07482185
                                        }, {
                                            "x": 0.09243698,
                                            "y": 0.06769596
                                        }
                                    ]
                                },
                                "paragraphs": [{
                                        "boundingBox": {
                                            "normalizedVertices": [{
                                                    "x": 0.09243698,
                                                    "y": 0.059382424
                                                }, {
                                                    "x": 0.5243698,
                                                    "y": 0.066508316
                                                }, {
                                                    "x": 0.5243698,
                                                    "y": 0.07482185
                                                }, {
                                                    "x": 0.09243698,
                                                    "y": 0.06769596
                                                }
                                            ]
                                        },
                                        "words": [{
                                                "property": {
                                                    "detectedLanguages": [{
                                                            "languageCode": "en"
                                                        }
                                                    ]
                                                },
                                                "boundingBox": {
                                                    "normalizedVertices": [{
                                                            "x": 0.09243698,
                                                            "y": 0.059382424
                                                        }, {
                                                            "x": 0.13781513,
                                                            "y": 0.060570072
                                                        }, {
                                                            "x": 0.13781513,
                                                            "y": 0.06888361
                                                        }, {

boesiii on 22 Apr 2019

Here are all three output files:
test2_output-1-to-2.zip
test2_output-3-to-4.zip
test2_output-5-to-5.zip

boesiii on 22 Apr 2019

Alright, cool. It looks like the Vision API call succeeds, but something goes wrong when the results are retrieved from GCS.

Are you on the latest version of the storage library?
What do you get if you run pip freeze | grep google?

nnegrey on 24 Apr 2019
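
On Windows, where grep is not available, the same version check can be sketched in Python instead of a shell pipeline. This assumes Python 3.8 or later, where importlib.metadata is in the standard library; installed_versions is a hypothetical helper name.

```python
from importlib import metadata  # standard library in Python 3.8 and later


def installed_versions(fragment="google"):
    """Map installed distribution names containing fragment to their versions."""
    versions = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        # Skip distributions whose metadata is missing or has no name.
        name = meta.get("Name") if meta is not None else None
        if name and fragment in name.lower():
            versions[name] = dist.version
    return versions


# Print in the same name==version form that pip freeze uses.
for name, version in sorted(installed_versions().items()):
    print(f"{name}=={version}")
```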

pip freeze | findstr google
google-api-core==1.8.2
google-auth==1.6.3
google-cloud-bigquery==1.10.0
google-cloud-core==0.29.1
google-cloud-storage==1.14.0
google-cloud-vision==0.36.0
google-resumable-media==0.3.2
googleapis-common-protos==1.5.9

boesiii on 24 Apr 2019

I updated google-cloud-storage to 1.15.0, but I still get the same error.

boesiii on 24 Apr 2019

@benbluhm, your suggestion solved my issue. Thank you.