Zettlr: Feature: Vision API

Created on 6 Feb 2019  ·  6 Comments  ·  Source: Zettlr/Zettlr

So this is a real long shot but I'll put it in anyway.

One of the coolest features in Evernote and OneNote is the ability to OCR handwritten notes. That goes for notes written in the software as well as images imported into it. At this point, I'm not thinking beyond Markdown for Zettlr, but it would be awesome if images imported into my notes were somehow OCRed and became searchable (maybe a JSON file could be associated with each image; I haven't experimented with images in Zettlr yet).
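To make the JSON-file idea a bit more concrete, here is a minimal sketch of how a sidecar file next to each image could make OCR text searchable. Everything in it is an assumption for illustration: the `.ocr.json` suffix, the `image`/`text` keys, and the function names are hypothetical and not anything Zettlr actually does.

```python
import json
from pathlib import Path


def sidecar_path(image_path: Path) -> Path:
    """Hypothetical naming convention: notes.png -> notes.png.ocr.json."""
    return image_path.with_name(image_path.name + ".ocr.json")


def write_sidecar(image_path: Path, text: str) -> Path:
    """Store the recognised text in a JSON file next to the image."""
    target = sidecar_path(image_path)
    record = {"image": image_path.name, "text": text}
    target.write_text(json.dumps(record), encoding="utf-8")
    return target


def search_images(folder: Path, query: str) -> list[Path]:
    """Return the images whose sidecar text contains the query (case-insensitive)."""
    hits = []
    for sidecar in sorted(folder.glob("*.ocr.json")):
        record = json.loads(sidecar.read_text(encoding="utf-8"))
        if query.lower() in record["text"].lower():
            hits.append(folder / record["image"])
    return hits
```

The recognised text itself would have to come from whatever OCR engine is used; the sketch only covers storing it and searching it.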

I was just trying out Google's Vision API: you can drag a photo of your notes in there and see what Google can find in your handwriting. It's pretty impressive and would be useful for the most part.

I realise that this is complicated by the need for an API key, the question of how pricing would work, etc.

So this is my initial proposal: keep this as a long-term goal. First attempts at implementation don't come bundled with an API key; users provide their own (so if you want it, you have to sign up with Google or whichever provider's API you want to use).

All 6 comments

Image OCR is common-ish now. Google Keep has a brilliant one. You can just copy and paste from there. As far as I've seen, people coming from Evernote or OneNote ask for OCR that works with PDFs.

But here's to short and long term goals. 🍻

Huiuiui, well. OCR has been an issue for me since I started digitizing my workflows. OCR for PDFs or images is still pretty difficult to implement if you want to abstain from proprietary applications.

While I'm generally in favor of supporting more and more stuff so that more and more people can migrate their workflow to Zettlr, here's my two cents:

  1. OCR support is a good thing in the long run, so please count me in on that idea!
  2. Yet, I will _not_ (and I repeat: not!) support any proprietary API. Everything Zettlr supports must come free of charge and be transparent. Besides, do we really want to give Google all our handwritten data …? For nearly everything there's an Open Source alternative: just take geo information (OpenStreetMap) or SSL certificates (the Electronic Frontier Foundation).
  3. But don't despair: there is an Open Source option available for OCR! Tesseract (funny thing: its development was sponsored by Google for years, after it started out at Hewlett-Packard) runs on all major operating systems, and I could easily integrate a small API that simply takes images and spits out recognised text.
  4. This would mean that Tesseract would enjoy the same status as Pandoc and the LaTeX binaries: You install Zettlr, and afterwards install all the external components you want to use, and you're good to go!
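As a rough illustration of point 3, a wrapper around an externally installed Tesseract could look something like the sketch below. It assumes the `tesseract` binary is on the `PATH` and uses its command-line form `tesseract <image> stdout -l <langs>`; the function names and the default language are made up for this example and say nothing about how Zettlr would actually integrate it.

```python
import shutil
import subprocess
from typing import Sequence


def build_tesseract_command(image_path: str, languages: Sequence[str] = ("eng",)) -> list[str]:
    """Build the CLI call; "stdout" tells Tesseract to print the text instead of writing a file."""
    # Several languages are joined with "+", e.g. "eng+deu".
    return ["tesseract", image_path, "stdout", "-l", "+".join(languages)]


def ocr_image(image_path: str, languages: Sequence[str] = ("eng",)) -> str:
    """Take an image path and return the recognised text."""
    # Tesseract is treated as an optional external component, like Pandoc or LaTeX.
    if shutil.which("tesseract") is None:
        raise RuntimeError("Tesseract is not installed or not on the PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, languages),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract reports a failure
    )
    return result.stdout.strip()
```

With such a wrapper, a call like `ocr_image("scan.png", ["eng", "deu"])` would need the matching Tesseract language data installed; which languages to scan for by default is exactly the configuration question raised below.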

So there are a lot of implementation questions arising from this general idea (which is where the "long run" part of this idea comes from). Just some questions that immediately came to my mind:

  • What functionality should an OCR import support? (e.g.: should there be a context menu entry "OCR this image…" or should it strictly work with external images, like a file import?)
  • How should OCR be configured (e.g. default languages to scan for)? What options would one possibly need?
  • Where should the OCR'd text appear? Instead of the embedded image in Markdown? Below it? In an extra dialog window from where one may copy it?
  • Maybe even a mode in which users may open a canvas, draw something with mouse or touch pen, and afterwards (by clicking save) let the tesseract engine recognise it?

What I would _not_ support is OCR for PDFs, because that is outside the scope of Zettlr. There is talk of writing such a plugin for Zotero, but Zettlr should focus on pure image-to-text functionality. (I imagined Zettlr as a minimalist app for only writing Markdown files, and look what it has become already :D)

Any thoughts and additional comments?

Cheers!

P.S.: In light of what I just wrote, I would suggest we change the title of this discussion to the more general "OCR support", as the Vision API is now off the table. Any objections?

I wasn't disagreeing with you or anything. I was just pointing out the woes of people coming from software like Evernote and OneNote. PDF OCR support is more in demand if you look at the open issues of some of the other writing apps or at comments on Reddit and such.

OCR is good for digitizing workflows or people trying to go paperless.

I wasn't meaning to imply you were, so sorry for the misunderstanding! :)

I fully agree with you that PDF OCR is quite a huge problem, because there are only two apps I know of that do this reliably and to a very good degree (ABBYY FineReader and Adobe Acrobat), and both are proprietary. So I definitely see the need to support OCR.

Still, I would insist that OCR'ing PDF files is not within the scope of Zettlr, but should be handled by literature management software like Zotero or JabRef. That's why I just wanted to clarify that while OCR support is indeed great, OCR'ing PDF files is not what Zettlr should do.

Depending on how difficult it is, maybe a Zotero plugin would be the solution to all our problems. (I'm currently thinking that maybe I'll get to this as well, but again, it's long-term ...)

For a long-term plan it is good. My long-term plan is to ditch my car for a jetpack so 🤞.

I'll close this issue for now. OCR is at least on my mind, but I want to keep the issue list clean, because I'm experiencing the first symptoms of Open Source burnout!
