Hello,
I got the API keys for the 7-day trial, but I can't find a way to run the script. I get the error below:
ConnectionError: HTTPSConnectionPool(host='westcentralus.api.cognitive.microsoft.com', port=443): Max retries exceeded with url: /vision/v2.0/recognizeText?mode=Handwritten (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x000001C64FD91CF8>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',))
Any help please?
Thank you.
PS: I'm working from my local computer.
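For context, `[Errno 11001] getaddrinfo failed` is the Windows error for a hostname that could not be resolved, so the request failed on the local machine (DNS, proxy, or network configuration) before it reached the service. A minimal sketch for checking name resolution from the same machine, assuming only the Python standard library; `can_resolve` is a hypothetical helper name, not part of the quickstart:

```python
import socket


def can_resolve(hostname, port=443):
    """Return True if the local resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Raised when name resolution fails (the "getaddrinfo failed" case).
        return False


if __name__ == "__main__":
    host = "westcentralus.api.cognitive.microsoft.com"
    print(host, "resolves" if can_resolve(host) else "does not resolve")
```

If the host does not resolve here, the script will fail in the same way regardless of the subscription key.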
@Haager Thanks for the feedback! We are currently investigating and will update you shortly.
@Haager Thanks for the feedback! I have assigned the issue to the content author to investigate further and update the document as appropriate.
@noellelacharite Hi, I have reproduced the bug the customer mentioned above by following the document as written. Can you please check whether any change could be causing the issue and update the document as necessary? Thanks a lot!
HTTPError: 400 Client Error: Bad Request for url: https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/recognizeText?mode=Handwritten
The code is the same as above; I only changed the key to the one for my account.
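A 400 status on its own does not say what the service rejected; the response body usually does. A minimal sketch of a helper that turns a failed response into one readable line, assuming nothing about the error schema; `summarize_error` is a hypothetical name used only for illustration:

```python
import json


def summarize_error(status_code, body_text):
    """Build a one-line summary of a failed HTTP response.

    A JSON body is re-serialized with sorted keys; anything else is passed
    through unchanged. No particular error schema is assumed.
    """
    try:
        detail = json.dumps(json.loads(body_text), sort_keys=True)
    except ValueError:
        # Not JSON (or empty): fall back to the raw text.
        detail = body_text.strip() or "<empty body>"
    return "HTTP {}: {}".format(status_code, detail)
```

With `requests`, this could be called as `print(summarize_error(response.status_code, response.text))` just before `response.raise_for_status()`.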
Hi, I have filed a work item for this bug, and it will be fixed shortly. Thanks for your feedback. We will now close this thread. If there are further questions regarding this matter, please reopen it and we will gladly continue the discussion.
@YutongTie-MSFT I'm still facing the issue. I'm trying to run the API on 6000+ images. I've added a delay of 60 seconds after every 10 images, which should keep me within the quota of 20 transactions per minute.
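An alternative to pausing after each batch is to space the calls evenly. A minimal sketch, assuming the 20-transactions-per-minute figure quoted above; `Throttle` is a hypothetical helper, and whether even pacing changes the connection error is not established here:

```python
import time


class Throttle:
    """Space out calls so that at most max_calls start per period seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period / max_calls
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until min_interval has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

Creating `throttle = Throttle(20, 60)` once and calling `throttle.wait()` before each request gives one request every 3 seconds at most.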
Input: Images (.png)
Desired Output:
Error Msg: ConnectionError: HTTPSConnectionPool(host='westcentralus.api.cognitive.microsoft.com', port=443): Max retries exceeded with url: /vision/v2.0/ocr?language=unk&detectOrientation=true (Caused by NewConnectionError('
Code:
import warnings
warnings.filterwarnings("ignore")
import glob
import os
import requests
import pandas as pd
import time

# Replace the value of subscription_key with your subscription key.
subscription_key = "{key}"
assert subscription_key

# Replace the value of vision_base_url (not necessary for trial version)
vision_base_url = "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/"
analyze_url = vision_base_url + "ocr"

# Initializing Source and Output Directories
source_directory = glob.glob('folder/with/6000/images/*.png')
output_directory_textFiles = 'folder/for/saving/6000/textFiles/'
output_directory_JSONFiles = 'folder/for/saving/6000/JSONFiles/'
if not os.path.exists(output_directory_textFiles):
    os.makedirs(output_directory_textFiles)
if not os.path.exists(output_directory_JSONFiles):
    os.makedirs(output_directory_JSONFiles)

# Define Function for Extracting Text
def extract_text(image_path):
    # Read the image into a byte array
    with open(image_path, "rb") as image_file:
        image_data = image_file.read()
    headers = {'Ocp-Apim-Subscription-Key': subscription_key, 'Content-Type': 'application/octet-stream'}
    params = {'language': 'unk', 'detectOrientation': 'true'}
    response = requests.post(analyze_url, headers=headers, params=params, data=image_data)
    analysis = response.json()
    # Extract the word bounding boxes and text.
    line_infos = [region["lines"] for region in analysis["regions"]]
    word_infos = []
    for line in line_infos:
        for word_metadata in line:
            for word_info in word_metadata["words"]:
                word_infos.append(word_info)
    return word_infos

# Generating Text and JSON Files
counter = 0
for image in sorted(source_directory):
    counter += 1
    print(r'Processing %d %s' % (counter, image))
    word_infos = extract_text(image)
    filename = image.split('/')[-1].replace('.png', '')
    if len(word_infos) != 0:
        bboxOutput = pd.DataFrame(word_infos)
        # Split "x,y,width,height" into separate columns
        bboxOutput[['x', 'y', 'width', 'height']] = bboxOutput['boundingBox'].str.split(',', expand=True)
        bboxOutput = bboxOutput.drop(['boundingBox'], axis=1)
        textFile = bboxOutput['text']
        textFile = textFile.to_csv(r'{}/{}.txt'.format(output_directory_textFiles, filename), header=False, index=None, sep=',')
        jsonFile = bboxOutput.to_json(orient='records')
        # The with block closes the file; no explicit close() is needed
        with open(r'{}/{}.txt'.format(output_directory_JSONFiles, filename), 'w') as f:
            f.write(jsonFile)
    else:
        # No words found: write empty output files for this image
        word_infos = pd.DataFrame(word_infos)
        textFile = word_infos.to_csv(r'{}/{}.txt'.format(output_directory_textFiles, filename), header=False, index=None, sep=',')
        jsonFile = word_infos.to_json(orient='records')
        with open(r'{}/{}.txt'.format(output_directory_JSONFiles, filename), 'w') as f:
            f.write(jsonFile)
    # Pause for 60 seconds after every 10 images
    if (counter % 10) == 0:
        time.sleep(60)
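With 6000+ images, a single transient connection failure stops the whole loop above. One possible mitigation, offered as a sketch rather than a confirmed fix for this issue, is to retry a failed call with increasing delays. `call_with_retries` and its parameters are hypothetical names, and the delay values are arbitrary assumptions:

```python
import time


def call_with_retries(func, retry_on=(ConnectionError,), attempts=4,
                      base_delay=2.0, sleep=time.sleep):
    """Call func(); on a listed exception, wait and try again.

    Delays double each time (base_delay, 2 * base_delay, ...). The last
    failure is re-raised once attempts is exhausted.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

In the loop above this might look like `word_infos = call_with_retries(lambda: extract_text(image), retry_on=(requests.exceptions.ConnectionError,))`. The `retry_on` argument matters here because `requests.exceptions.ConnectionError` does not derive from Python's built-in `ConnectionError`.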