Python-docs-samples: Failed to load json using detect.py

Created on 4 Apr 2019  ·  18 Comments  ·  Source: GoogleCloudPlatform/python-docs-samples

In which file did you encounter the issue?

python-docs-samples/vision/cloud-client/detect/detect.py

Did you change the file? If so, how?

No.

Describe the issue

I tried using detect.py on a PDF that is stored in Google Cloud Storage. Below is the command I ran:
C:\temp1\google_vision>python detect.py ocr-uri gs://my_bucket_name/file_1003.pdf gs://my_bucket_name/output/

When I run my code I get the following error:

C:\temp1\google_vision>python detect.py ocr-uri gs://matr/file_1003.pdf gs://matr/output
Waiting for the operation to finish.
Output files:
output/
output/clsoutput-1-to-2.json
output/output-1-to-2.json
outputoutput-1-to-2.json
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\google\protobuf\json_format.py", line 416, in Parse
    js = json.loads(text, object_pairs_hook=_DuplicateChecker)
  File "C:\Program Files (x86)\Python37-32\lib\json\__init__.py", line 361, in loads
    return cls(**kw).decode(s)
  File "C:\Program Files (x86)\Python37-32\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files (x86)\Python37-32\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "detect.py", line 955, in <module>
    run_uri(args)
  File "detect.py", line 835, in run_uri
    async_detect_document(args.uri, args.destination_uri)
  File "detect.py", line 720, in async_detect_document
    json_string, vision.types.AnnotateFileResponse())
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\google\protobuf\json_format.py", line 418, in Parse
    raise ParseError('Failed to load JSON: {0}.'.format(str(e)))
google.protobuf.json_format.ParseError: Failed to load JSON: Expecting value: line 1 column 1 (char 0).

How can I avoid this error? There is a resulting JSON file in the output folder.

All 18 comments

It looks like the error is more about how it parses the JSON output file.
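For context, the inner message in the traceback ("Expecting value: line 1 column 1 (char 0)") is what Python's json module reports when the text has no JSON value at its very first character, for example when the input is empty. The snippet below is only a small illustration of that behaviour; try_parse is a hypothetical helper and not part of detect.py, and whether an empty input is what actually happened here is an assumption.

```python
import json


def try_parse(text):
    """Return the parsed JSON value, or a description of the decode error.

    Hypothetical helper for illustration only; not part of detect.py.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        return f"JSONDecodeError: {err}"


# An empty string has no value at position 0, so decoding fails immediately.
empty_result = try_parse("")

# A well-formed document parses normally.
valid_result = try_parse('{"responses": []}')

print(empty_result)
print(valid_result)
```

If that is the cause, checking the size or name of each object before parsing it would be a reasonable first diagnostic step.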

Hi, from your call C:\temp1\google_vision>python detect.py ocr-uri gs://matr/file_1003.pdf gs://matr/output

It looks like you might be missing the trailing / on the gcs_destination_uri.

Should be: C:\temp1\google_vision>python detect.py ocr-uri gs://matr/file_1003.pdf gs://matr/output/

Let me know if that works.

No, still the same error.

Is your target GCS bucket empty?

I created a new folder in my bucket and targeted that folder and still received the error.

Does it still throw an error if you use our example pdf?
gs://python-docs-samples-tests/HodgeConj.pdf

Yes, still same error.

For the example pdf (gs://python-docs-samples-tests/HodgeConj.pdf), can you share a little bit of the contents of the output file?

Here are the first 75 lines

{
    "inputConfig": {
        "gcsSource": {
            "uri": "gs://python-docs-samples-tests/HodgeConj.pdf"
        },
        "mimeType": "application/pdf"
    },
    "responses": [{
            "fullTextAnnotation": {
                "pages": [{
                        "property": {
                            "detectedLanguages": [{
                                    "languageCode": "en",
                                    "confidence": 0.97
                                }, {
                                    "languageCode": "az",
                                    "confidence": 0.02
                                }
                            ]
                        },
                        "width": 595,
                        "height": 842,
                        "blocks": [{
                                "boundingBox": {
                                    "normalizedVertices": [{
                                            "x": 0.09243698,
                                            "y": 0.059382424
                                        }, {
                                            "x": 0.5243698,
                                            "y": 0.066508316
                                        }, {
                                            "x": 0.5243698,
                                            "y": 0.07482185
                                        }, {
                                            "x": 0.09243698,
                                            "y": 0.06769596
                                        }
                                    ]
                                },
                                "paragraphs": [{
                                        "boundingBox": {
                                            "normalizedVertices": [{
                                                    "x": 0.09243698,
                                                    "y": 0.059382424
                                                }, {
                                                    "x": 0.5243698,
                                                    "y": 0.066508316
                                                }, {
                                                    "x": 0.5243698,
                                                    "y": 0.07482185
                                                }, {
                                                    "x": 0.09243698,
                                                    "y": 0.06769596
                                                }
                                            ]
                                        },
                                        "words": [{
                                                "property": {
                                                    "detectedLanguages": [{
                                                            "languageCode": "en"
                                                        }
                                                    ]
                                                },
                                                "boundingBox": {
                                                    "normalizedVertices": [{
                                                            "x": 0.09243698,
                                                            "y": 0.059382424
                                                        }, {
                                                            "x": 0.13781513,
                                                            "y": 0.060570072
                                                        }, {
                                                            "x": 0.13781513,
                                                            "y": 0.06888361
                                                        }, {

Alright, cool. It looks like the Vision API call is successful, but when retrieving the results from GCS there seems to be an issue.

Are you on the latest version of the storage API?
What do you get if you run pip freeze | grep google?

pip freeze | findstr google
google-api-core==1.8.2
google-auth==1.6.3
google-cloud-bigquery==1.10.0
google-cloud-core==0.29.1
google-cloud-storage==1.14.0
google-cloud-vision==0.36.0
google-resumable-media==0.3.2
googleapis-common-protos==1.5.9

I updated google-cloud-storage to 1.15.0, but I still get the same error.

I had this issue and determined it was caused by the prefix itself being iterated as part of the blob list. I can see that "output/" is listed as a file in your output; the script then tries to parse it as JSON, which causes the error.

Try hardcoding a prefix, something like prefix = 'output/out', and that folder won't be included in the list.

The demo code should probably be modified to handle this simple case a little better.
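A minimal sketch of one way the sample could skip the folder placeholder, instead of relying on a hardcoded prefix. This is not the official fix: split_gcs_uri and result_blobs are hypothetical helper names, and the assumption is that the Vision output files are the only objects under the prefix whose names end in ".json". The stand-in objects below only mimic the .name attribute of a storage blob so the filter can be run without credentials.

```python
import re
from types import SimpleNamespace


def split_gcs_uri(uri):
    """Split 'gs://bucket/prefix' into (bucket_name, prefix). Hypothetical helper."""
    match = re.match(r"gs://([^/]+)/?(.*)", uri)
    if match is None:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return match.group(1), match.group(2)


def result_blobs(blobs):
    """Keep only objects that look like JSON result files.

    A folder placeholder such as 'output/' has a name ending in '/',
    so it is dropped here and never handed to the JSON parser.
    """
    return [blob for blob in blobs if blob.name.endswith(".json")]


# Stand-ins for the objects returned by bucket.list_blobs(prefix=prefix).
listing = [
    SimpleNamespace(name="output/"),
    SimpleNamespace(name="output/output-1-to-2.json"),
]
kept = result_blobs(listing)
print([blob.name for blob in kept])

# With the real client, the same filter would be applied to the listing, e.g.:
#   from google.cloud import storage
#   bucket_name, prefix = split_gcs_uri(gcs_destination_uri)
#   bucket = storage.Client().get_bucket(bucket_name)
#   blob_list = result_blobs(bucket.list_blobs(prefix=prefix))
```

Filtering on the file extension also leaves out any other non-JSON objects that happen to share the prefix, but it would still pick up JSON files left over from earlier runs, so an empty or dedicated output folder remains the safer choice.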

@benbluhm your suggestion solved my issue, thank you

Yes. It worked for me also.

Thanks, @benbluhm!
Closing the issue.

Hi guys, can someone post the updated sample code? That would be great.

Thanks
