Azure-docs: ConnectionError with the text_recognition_url api

Created on 2 Jul 2018 · 6 Comments · Source: MicrosoftDocs/azure-docs

Hello,

I got the API keys for the 7-day trial, but I can't get the script to run. I get the error below:

ConnectionError: HTTPSConnectionPool(host='westcentralus.api.cognitive.microsoft.com', port=443): Max retries exceeded with url: /vision/v2.0/recognizeText?mode=Handwritten (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x000001C64FD91CF8>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',))

Any help please?

Thank you.
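
The "getaddrinfo failed" part of the error above means the hostname could not be resolved to an IP address, so the request never reached the service. A minimal sketch for checking name resolution from the same machine, using only the standard library; the helper name can_resolve is an assumption made for illustration and is not part of the quickstart:

```python
import socket


def can_resolve(host, port=443):
    """Return True if `host` resolves to at least one address, else False."""
    try:
        # getaddrinfo is the lookup named in the traceback above
        return len(socket.getaddrinfo(host, port)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved (no DNS, proxy issue, typo in the host)
        return False
```

If can_resolve('westcentralus.api.cognitive.microsoft.com') returns False, the cause is likely local (DNS settings, a proxy, or a firewall) rather than the subscription key.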

Document details

⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.

ID: e4c8bd02-16d6-d043-4f2c-a337a70954f5
Version Independent ID: 8fa84a6d-5c1e-3562-43aa-019f968edd0f
Content: Computer Vision Python quickstart handwritten text - Microsoft Cognitive Services
Content Source: articles/cognitive-services/Computer-vision/QuickStarts/python-hand-text.md
Service: cognitive-services
GitHub Login: @noellelacharite
Microsoft Alias: nolachar

cognitive-services/svc cxp doc-bug triaged

Source

Haager

All 6 comments

PS: I'm working from my local computer.

Haager on 2 Jul 2018

@Haager Thanks for the feedback! We are currently investigating and will update you shortly.

YutongTie-MSFT on 2 Jul 2018

@Haager Thanks for the feedback! I have assigned the issue to the content author to investigate further and update the document as appropriate.

@noellelacharite Hi, I have reproduced the bug the customer mentioned above by following the document as written. Can you please check whether any change could be causing the issue, and update the document as necessary? Thanks a lot!

YutongTie-MSFT on 2 Jul 2018

HTTPError: 400 Client Error: Bad Request for url: https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/recognizeText?mode=Handwritten

The code is the same as above; I only changed the key to the one for my account.
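
A 400 response means the request reached the service and was rejected, unlike the ConnectionError above. The response body usually says why. A minimal sketch for surfacing it; describe_error is a hypothetical helper, and it only assumes a response object with the status_code, text, and json() members that requests.Response provides:

```python
def describe_error(response):
    """Build a one-line summary of a failed HTTP response, including its body."""
    try:
        # Prefer the parsed JSON body when the service returns one
        detail = response.json()
    except ValueError:
        # Fall back to the raw text for non-JSON bodies
        detail = response.text
    return "HTTP {}: {}".format(response.status_code, detail)
```

For example, print(describe_error(response)) right after the requests.post(...) call shows the rejection reason alongside the status code.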

devenderA17010 on 6 Jul 2018

Hi, I have filed a work item for this bug, and it will be fixed shortly. Thanks for your feedback. We will now proceed to close this thread. If there are further questions regarding this matter, please reopen it and we will gladly continue the discussion.

YutongTie-MSFT on 11 Jul 2018

@YutongTie-MSFT I'm still facing the issue. I'm trying to run the API on 6000+ images. I've added a delay of 60 seconds after every 10 images, which should keep me within the quota of 20 transactions per minute.
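
Pausing after every 10 images sends the requests in bursts. An alternative is to space the calls evenly; at 20 transactions per minute that works out to one call every 3 seconds. A minimal sketch, assuming the quota figure quoted above; the Throttle class and its parameter names are made up for illustration:

```python
import time


class Throttle:
    """Block so that consecutive calls to wait() are at least `min_interval` seconds apart."""

    def __init__(self, calls_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / calls_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the previous call: sleep for the rest of the interval
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In the script below this would mean creating throttle = Throttle(20) once and calling throttle.wait() before each extract_text(image) call.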

Input: Images (.png)
Desired Output:

File with extracted text only
File with extracted text and their corresponding bounding boxes

Error Msg: ConnectionError: HTTPSConnectionPool(host='westcentralus.api.cognitive.microsoft.com', port=443): Max retries exceeded with url: /vision/v2.0/ocr?language=unk&detectOrientation=true (Caused by NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known',))

Code:

import warnings
warnings.filterwarnings("ignore")

import glob
import os
import requests
import pandas as pd
import time

# Replace the value of subscription_key with your subscription key.
subscription_key = "{key}"
assert subscription_key

# Replace the value of vision_base_url (not necessary for trial version)
vision_base_url = "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/"
analyze_url = vision_base_url + "ocr"

# Initializing Source and Output Directories
source_directory = glob.glob('folder/with/6000/images/*.png')
output_directory_textFiles = 'folder/for/saving/6000/textFiles/'
output_directory_JSONFiles = 'folder/for/saving/6000/JSONFiles/'

if not os.path.exists(output_directory_textFiles):
    os.makedirs(output_directory_textFiles)

if not os.path.exists(output_directory_JSONFiles):
    os.makedirs(output_directory_JSONFiles)

# Define Function for Extracting Text

def extract_text(image_path):
    # Read the image into a byte array
    with open(image_path, "rb") as image_file:
        image_data = image_file.read()
    headers = {'Ocp-Apim-Subscription-Key': subscription_key, 'Content-Type': 'application/octet-stream'}
    params = {'language': 'unk', 'detectOrientation': 'true'}
    response = requests.post(analyze_url, headers=headers, params=params, data=image_data)
    # Fail early on HTTP errors instead of hitting a KeyError on "regions" below
    response.raise_for_status()
    analysis = response.json()

    # Extract the word bounding boxes and text.
    word_infos = []
    for region in analysis["regions"]:
        for line in region["lines"]:
            for word_info in line["words"]:
                word_infos.append(word_info)
    return word_infos

# Generating Text and JSON Files

counter = 0
for image in sorted(source_directory):
    counter += 1
    print(r'Processing %d %s' %(counter, image))

    word_infos = extract_text(image)

    # Strip the directory and extension; also works with Windows path separators
    filename = os.path.splitext(os.path.basename(image))[0]

    if len(word_infos) != 0:
        bboxOutput = pd.DataFrame(word_infos)
        bboxOutput[['x', 'y', 'width', 'height']] = bboxOutput['boundingBox'].str.split(',', expand=True)
        bboxOutput = bboxOutput.drop(['boundingBox'], axis=1)

        # Text-only file with the recognized words
        bboxOutput['text'].to_csv(r'{}/{}.txt'.format(output_directory_textFiles, filename), header=False, index=None, sep=',')
        # Words together with their bounding boxes, as JSON records
        jsonFile = bboxOutput.to_json(orient='records')
        with open(r'{}/{}.txt'.format(output_directory_JSONFiles, filename), 'w') as f:
            f.write(jsonFile)

    else:
        # No words recognized: write placeholder output files so every image has a result
        word_infos = pd.DataFrame(word_infos)
        word_infos.to_csv(r'{}/{}.txt'.format(output_directory_textFiles, filename), header=False, index=None, sep=',')
        jsonFile = word_infos.to_json(orient='records')
        with open(r'{}/{}.txt'.format(output_directory_JSONFiles, filename), 'w') as f:
            f.write(jsonFile)

    # Pause after every 10 images to stay under the transactions-per-minute quota
    if (counter % 10) == 0:
        time.sleep(60)
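
With 6000+ images, a single transient network failure stops the whole run. One option is to retry failed calls with an increasing delay. A minimal sketch; call_with_retries and its parameters are hypothetical, and the exception type to retry on is passed in by the caller (for this script it would presumably be requests.exceptions.ConnectionError):

```python
import time


def call_with_retries(func, retry_on, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on `retry_on` exceptions, wait and retry, up to `attempts` calls in total."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the original error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

In the loop above, the call could then look like word_infos = call_with_retries(lambda: extract_text(image), requests.exceptions.ConnectionError). Retrying only helps with intermittent failures; it will not fix a hostname that never resolves.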