RetroArch: Proposal: Adding an OCR and translation button to allow playing games in any language

Created on 1 Dec 2017 · 31 comments · Source: libretro/RetroArch

This is a proposal to make it possible to play any game in any language with no ROM hack needed.

How the software will translate the text.

Tesseract is written in C++ and has C bindings, so it could be compiled into RetroArch to extract text from the framebuffer.
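As a rough illustration (not part of the proposal's code), calling Tesseract's C API on a raw framebuffer might look like the sketch below; the "jpn" language pack and the tightly packed RGB layout are assumptions:

```c
/* Minimal sketch: extract text from an RGB framebuffer with
 * Tesseract's C API (tesseract/capi.h). Assumes the "jpn"
 * traineddata file is installed; error handling kept minimal. */
#include <stdio.h>
#include <tesseract/capi.h>

char *ocr_framebuffer(const unsigned char *rgb, int width, int height)
{
   TessBaseAPI *api = TessBaseAPICreate();
   char *text = NULL;

   /* NULL datapath: use the default tessdata location. */
   if (TessBaseAPIInit3(api, NULL, "jpn") != 0)
   {
      TessBaseAPIDelete(api);
      return NULL;
   }

   /* 3 bytes per pixel (RGB), tightly packed rows. */
   TessBaseAPISetImage(api, rgb, width, height, 3, width * 3);
   text = TessBaseAPIGetUTF8Text(api); /* caller frees with TessDeleteText() */

   TessBaseAPIEnd(api);
   TessBaseAPIDelete(api);
   return text;
}
```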

The text could be sent to Google Translate (there are no other free translation tools that work between English, Japanese, Spanish and the other European languages, and none that are open source or offline). The framebuffer and the translated output would be logged; when offline, the text associated with the closest matching logged framebuffer would be returned.
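A minimal sketch of that offline cache idea, assuming the logged framebuffers share the current frame's dimensions; a real implementation would likely use a perceptual hash rather than a raw pixel diff, and every name here is hypothetical:

```c
/* Illustrative cache lookup: compare the current framebuffer against
 * logged ones and return the translation of the closest match. */
#include <stdint.h>
#include <stdlib.h>

typedef struct
{
   const uint8_t *pixels;     /* logged framebuffer, same dimensions */
   const char    *translated; /* logged Google Translate output      */
} cache_entry_t;

const char *cache_lookup(const uint8_t *fb, size_t n_bytes,
                         const cache_entry_t *entries, size_t n_entries)
{
   const char *best   = NULL;
   uint64_t best_diff = UINT64_MAX;
   size_t i, j;

   for (i = 0; i < n_entries; i++)
   {
      uint64_t diff = 0;
      for (j = 0; j < n_bytes; j++)
         diff += (uint64_t)abs((int)fb[j] - (int)entries[i].pixels[j]);
      if (diff < best_diff)
      {
         best_diff = diff;
         best      = entries[i].translated;
      }
   }
   return best; /* caller decides whether best_diff is close enough */
}
```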

An overview of proposed usage from the user's end.

  1. Set your native language.
  2. Set the font for the output text.
  3. Assign a button that, when pressed with non-native text on screen, pauses the game and prints that text translated into the user's language; when pressed a second time, it closes the textbox, resumes the game, and writes the logged framebuffer and translated text to "{SAVE_DIR}/{GAME_NAME}-translation-{LANG_CODE}/{TIMESTAMP}.{txt/png}" (see the path-building sketch after this list).
  4. Set the game's starting language in your playlist, or leave it blank to default to auto-detection based on libretro-db region codes.
  5. Start your game.
  6. (Optional) Submit your translation logs to libretro-db so they can be used by others offline.
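A minimal sketch of the path building from step 3, with a hypothetical helper name (C99 snprintf used for brevity):

```c
/* Build the log path proposed in step 3; calling this once with
 * "txt" and once with "png" yields the two files for one capture. */
#include <stdio.h>
#include <time.h>

void build_log_path(char *out, size_t len,
                    const char *save_dir, const char *game_name,
                    const char *lang_code, const char *ext)
{
   long ts = (long)time(NULL); /* {TIMESTAMP} as a Unix timestamp */
   snprintf(out, len, "%s/%s-translation-%s/%ld.%s",
            save_dir, game_name, lang_code, ts, ext);
}
```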

This will require each branch of dialog to be played through at least once and submitted to the GitHub repository for a fully working offline translation, but if you do find untranslated dialog you can still continue and have it translated the next time you can connect to the internet.

If an open-source offline translation tool ever appears, it would be used in place of Google Translate and the offline caching would no longer be needed.

This is a huge task; it cannot be accomplished by one person, and I currently have a few other projects going, but if a few people would like to work on it with me, I will take this on.

Labels: discussion, feature request


All 31 comments

Is there an API for Google's translate-from-image service? If so, you could just take a screenshot and feed it to the service and then print the text right then and there. That is, no need for offline caching, etc.

The goal would be to eventually have everything work completely offline, either through caching or through another tool that's open source and works offline, if one comes out.

Is there still any interest in this feature? I'd be willing to throw $20 toward a bounty for it. It's not much money, but I feel it's a cool enough feature that other people might put in too.

Yes, I have just been very busy making a Palm OS emulator. The proposed translation process is completely outlined above, so someone else can take over if they know C.

It actually shouldn't be that hard, ~1 month's worth of work at most.

Cool, I went ahead and added $20 and made this a bounty. I will consider the bounty complete as outlined by meepingsnesroms.
Bounty source link: https://www.bountysource.com/issues/52185180-proposal-adding-ocr-and-translation-button-to-allow-playing-games-from-any-language

I had a bit of extra money, so I added $20 more.

"Is there an API for Google's translate-from-image service?" - Probably not a good anyway as Google's APIs have a bad habit of disappearing or suddenly becoming a charged-for service.

> there are no other free translation tools that work between English, Japanese, Spanish

Google Translate's API is paid-for. Bing has a free tier, but would require every user to sign up for Azure, and it is still capped.

While I could see options for, and value in, integrating OCR, and perhaps a dictionary, into RetroArch (albeit I still believe per-platform methods relying on parsing screen data are often superior - the thing is, they're never universal), I don't think we're going to get machine translation.

I am learning Japanese now, so in a few months I may be able to verify the results if anyone gets this working.

That's cool to hear! Good luck in learning Japanese!

The idea sounds exciting; this kind of tool could potentially be used in any game to provide quick translations into a lot of other languages and make games more accessible in other countries. The tool could also be used to insert scripts translated by fans, not just rough Google Translate output, which would be really useful for quick fan translation projects. Video game fans are very dedicated; just look at the number of fan translations of ROMs. With this, they could simply provide a translation or localization of the script and the tool could synchronize it with the in-game text, and because it is all just OCR and text on an overlay, you wouldn't have to worry about game compatibility with the software. You could even create a database of the translated scripts so the community can improve them or add their own. That would certainly be a really useful tool if the RetroArch team can pull it off; it could potentially revolutionize game translations. I am definitely going to contribute to the bounty as soon as I can.

That's another good thing: we can have the OCR dump the text and translate it, and people can go back and re-edit the translation if it's not good enough.

Hopefully translation will be much better in the future; even 10 years ago machine translation was barely usable, so there is a lot of growth happening in the field. This bounty is about getting something started and making it easy to extend and change as the technology advances.

@denim2x Thanks for starting to implement this feature! What is the roadmap/plan for your implementation?

@nihilisticeevee I have something that I recently released that might be similar to what you're looking for, but it's a standalone helper application instead. If you have a way I could DM you, I could send you more information (it's pretty cool stuff).

For this proposal though, from my experience, there are a few things to keep in mind. Tesseract by itself is not going to be general enough to do "scene" text, like text from a screenshot; that is still a difficult open problem. For image-to-text applications (like Google's OCR API), you try a bunch of different pre-processing algorithms to reduce the image down to white text on a solid black background, and only then use a typical OCR library like Tesseract. Most OCR work comes down to writing those pre-processors and tweaking the library settings to make it work for your dataset, and even then it can still be hard enough. For a prototype, it makes more sense to allow users to supply their own Google Cloud keys if they want to use the Google OCR/MT APIs; on-machine scene-level OCR, while possible, would require an unreasonable amount of work. There will be a roughly 4-second lag between hitting the OCR key and getting the translated image (depending on the screen size for the upload and processing, as well as how much text is on the screen), but it would be playable.
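For a concrete idea of the pre-processing step described above, a fixed-threshold binarization is about the simplest possible reduction to white text on a black background; real pre-processors are adaptive and tuned per game or dataset:

```c
/* Sketch: threshold an RGB buffer so bright (text) pixels become
 * white and everything else becomes black, in place. */
#include <stdint.h>
#include <stddef.h>

void binarize_rgb(uint8_t *rgb, size_t n_pixels, uint8_t threshold)
{
   size_t i;
   for (i = 0; i < n_pixels; i++)
   {
      /* integer approximation of Rec. 601 luma */
      uint8_t *p = rgb + i * 3;
      int   luma = (299 * p[0] + 587 * p[1] + 114 * p[2]) / 1000;
      uint8_t  v = (luma > threshold) ? 255 : 0;
      p[0] = p[1] = p[2] = v;
   }
}
```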

Unfortunately (I don't mean to be a downer here, but you should temper your expectations), if the goal is Japanese->English machine translation, then the accuracy of the translations will definitely be bad, though you may still get the gist or a bit better. In the project I talked about above, I was doing German->English, and the MT was pretty good, since the languages are closely related (though it was rarely perfect). Despite that, I think on-machine MT is more feasible than OCR, assuming you can get the text itself. Visual Novel Reader had a plugin to get text without OCR (it worked on a bunch of Japanese visual novels, as well as PSP, Wii and GameCube titles, I think), but obviously it was dependent on the game/novel format (more recent novel formats don't work, for example).

@BarryJRowe I would prefer it not be a separate application, as I would like it to be more multi-platform.

I do understand the problems with OCR, but I hope we can tweak it for each system. While even within a given system the games' fonts and formats vary widely, I hope we can cover most of the general cases.

For the lag, maybe we could split the process onto the other, unused CPU cores, as most emulators only use one core. If that is unfeasible, then the lag will be acceptable until a faster way appears.
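A sketch of pushing the OCR work onto a separate thread so emulation keeps running, using plain pthreads for illustration (RetroArch has its own thread wrappers, and ocr_and_translate() is a hypothetical worker):

```c
#include <pthread.h>

/* Hypothetical worker: OCR + translate a copy of the framebuffer,
 * then post the result back to the UI when done. */
extern void ocr_and_translate(void *framebuffer_copy);

static void *ocr_worker(void *arg)
{
   ocr_and_translate(arg);
   return NULL;
}

/* Fire-and-forget: the emulation thread keeps running while the
 * OCR/translation request is in flight. */
int start_ocr_async(void *framebuffer_copy)
{
   pthread_t thread;
   if (pthread_create(&thread, NULL, ocr_worker, framebuffer_copy) != 0)
      return -1;
   return pthread_detach(thread);
}
```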

As for the translations, I don't expect them to be great; I just want games to be "more playable". Apart from Japanese games, I am also interested in many of the European home computers, so maybe some of those translations will be better. But again, I only want "more playable" games.

@nihilisticeevee I'm planning to integrate a C++ port (work in progress) of Facebook's MUSE lib with RetroArch; what is your view on this approach?

@denim2x Sounds promising, but I believe the main devs prefer the code to be C89.

@meepingsnesroms Any comments?

@denim2x I'm not sure I understand how you're using the MUSE lib. Is there something for OCR inside the particular port you're looking at, or are you using Tesseract for the OCR portion as well? The GitHub page for the MUSE lib says it's for word vector embeddings, which is typically used as a step in NLP rather than in OCR itself. I could see that after the image-to-text phase has produced some probable words that might be in the image, you could use word embeddings to select the best possible word based on the word contexts in the sentence, but that would be the very last step of the process.

C++ should be fine because it's an optional feature (it can just be compiled out if it doesn't work), and we can't be too picky if we want to use a pre-existing library; C89 is obviously preferred though.
I think all our platforms work with C++ anyway; the limiting factor will most likely be WiFi access and device speed.

Alright, that sounds great.

@BarryJRowe The general idea: framebuffer --> (Tesseract) string/langX --> (MUSE) string/langY; MUSE helps with the translation part
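In C, that two-stage pipeline could be expressed as follows; both stage functions are hypothetical stand-ins for the Tesseract and MUSE calls:

```c
#include <stdlib.h>

/* Stage 1: Tesseract OCR on the framebuffer (see the sketch above). */
extern char *ocr_extract(const unsigned char *fb, int w, int h);
/* Stage 2: translate the extracted string (the proposed MUSE port). */
extern char *mt_translate(const char *text, const char *src, const char *dst);

char *translate_frame(const unsigned char *fb, int w, int h,
                      const char *src, const char *dst)
{
   char *text   = ocr_extract(fb, w, h);
   char *result = text ? mt_translate(text, src, dst) : NULL;
   free(text);
   return result;
}
```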

If MUSE can do offline translations, that would be way cool; it would make this viable on many more platforms.

@denim2x Ok, that makes more sense. I'm not familiar enough with MUSE, or even MT in general, to comment further though. However, I think the biggest question is whether or not the idea is feasible. Tesseract comes with a command-line version, so you can test out how well you can extract text with it, and then use the output with the Python MUSE library to get an idea of how good the results would be in theory.

I'm going to have to rectify my proposal, as follows:

  • MUSE doesn't support sentence translation for Japanese<->English, only word translation;
  • the better approach: neural MT plus training data from Kaggle, etc.

Hey everyone, I've been working on this issue myself. It was fun learning how to write C again. Anyway, here's a demo of what I have so far: https://youtu.be/o0TOaxD9zcs

The idea is that when you pause, it sends the image off to a URL you specify in your configuration, and then displays the returned image. That way, you don't have to bloat the repo with a lot of extra code, and you can get support on more platforms. The server code it calls is in another (open-sourced) repo I made; it does the OCR via the Google OCR API and the translation via the Google Translate API. So to run it in practice, you would either start up a local server with some Google API keys, or connect to a server someone else has set up.
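That client flow might look roughly like the sketch below, using libcurl purely for illustration (the actual RetroArch code uses its own HTTP layer, and the response handling here is stubbed out):

```c
#include <curl/curl.h>

/* For brevity this callback discards the response; a real client
 * would buffer the returned translated image and hand it to the
 * video driver for display. */
static size_t discard_cb(void *data, size_t size, size_t nmemb, void *userp)
{
   (void)data;
   (void)userp;
   return size * nmemb;
}

/* POST the paused frame (already encoded as PNG) to the configured
 * translation server URL. */
int send_frame(const char *url, const void *png, long png_len)
{
   CURLcode res;
   CURL *curl = curl_easy_init();

   if (!curl)
      return -1;

   curl_easy_setopt(curl, CURLOPT_URL, url);
   curl_easy_setopt(curl, CURLOPT_POSTFIELDS, png);
   curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, png_len);
   curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, discard_cb);

   res = curl_easy_perform(curl);
   curl_easy_cleanup(curl);
   return (res == CURLE_OK) ? 0 : -1;
}
```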

I would still like to add the option of not calling the Google API for OCR and instead using Tesseract locally. Out of the box, you can probably get good enough results for Latin-based languages, with some caveats. Japanese OCR with Tesseract is a bit trickier, but it might work if we get enough Google OCR API calls and then use their return values to train a better OCR dataset. As well, in my RetroArch code there are some problems with writing the translated image when the core pixel format is XRGB8888 instead of RGB565 (see the conversion sketch below), but hopefully someone with a better idea of the codebase can help with that part.
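The pixel-format issue mentioned above comes down to expanding each 16-bit RGB565 pixel into a 32-bit XRGB8888 one; a straightforward conversion with bit replication might look like this:

```c
/* Sketch: expand RGB565 to XRGB8888. Bit replication fills the low
 * bits so full white stays full white. */
#include <stdint.h>
#include <stddef.h>

void rgb565_to_xrgb8888(const uint16_t *src, uint32_t *dst, size_t n)
{
   size_t i;
   for (i = 0; i < n; i++)
   {
      uint16_t p = src[i];
      uint32_t r = (p >> 11) & 0x1f;
      uint32_t g = (p >> 5)  & 0x3f;
      uint32_t b =  p        & 0x1f;
      r = (r << 3) | (r >> 2);  /* 5 -> 8 bits */
      g = (g << 2) | (g >> 4);  /* 6 -> 8 bits */
      b = (b << 3) | (b >> 2);  /* 5 -> 8 bits */
      dst[i] = (0xffu << 24) | (r << 16) | (g << 8) | b;
   }
}
```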

As an update, I'm still working on the Tesseract text extraction to make it a bit smarter, but it's coming along. Most of the code left in the RetroArch repo that needs to be modified is around stability on different platforms, plus the XRGB8888 conversion stuff I mentioned before. Anyone willing to help debug some issues is welcome to contact me on the Discord channel (nick: Beaker).

@toolboc Interesting implementation. BarryJRowe is also working on a solution; his overlays the text on the game using only one window. He is using the Google Vision API, the Google Translate API, and Tesseract. It would be great if you would join the libretro Discord so we can discuss the different solutions.

Hi there to all bounty backers -

we want to start testing this OCR feature now, and we'd like to request your help in testing it.

Could any of you meet us on our libretro Discord and contact Autechre? Send me a PM, tell me it's about the OCR bounty feature and that you are a backer. After that is checked out, we will bring you to a channel where you can test a special pre-release version of RA with this feature enabled.

NOTE: It has been established already that Beaker, and only Beaker, will be eligible to collect this bounty. Just a heads-up: he has already put in significant work, and the 1.7.8 release will already include an initial version of the OCR work. From there, we want the userbase/audience to help inform us how they want the feature to develop further. After that is done, we can award him the bounty.

Hello, is it possible to add "romaji" and/or "kana" to the target translation languages?
These are readings of the Japanese source in either Roman letters or Japanese kana.
This would allow players who understand Japanese but have difficulty reading kanji (the most difficult aspect of learning Japanese) to read and understand the text, and it should be perfectly accurate, as this is not a translation. Thanks!

I'm going to close this bounty so that @BarryJRowe can finally claim his just reward. Awesome achievement on his part.
