kha-white/manga-ocr-base is a repository forked from Hugging Face. License: apache-2.0
Maciej Budyś 76ff029880 Create README.md 2022-01-20 22:39:39 +00:00
.gitattributes initial commit 2022-01-15 17:39:06 +00:00
README.md Create README.md 2022-01-20 22:39:39 +00:00
config.json manga-ocr-base 2022-01-15 20:18:35 +01:00
preprocessor_config.json manga-ocr-base 2022-01-15 20:18:35 +01:00
pytorch_model.bin manga-ocr-base 2022-01-15 20:18:35 +01:00
special_tokens_map.json manga-ocr-base 2022-01-15 20:18:35 +01:00
tokenizer_config.json manga-ocr-base 2022-01-15 20:18:35 +01:00
vocab.txt manga-ocr-base 2022-01-15 20:18:35 +01:00


language: ja
tags: image-to-text
license: apache-2.0
datasets: manga109s

Manga OCR

Optical character recognition for Japanese text, with the main focus being Japanese manga.

It uses the Vision Encoder Decoder framework.

Manga OCR can be used as a general-purpose OCR for printed Japanese text, but its main goal is to provide high-quality text recognition that is robust against various scenarios specific to manga:

  • both vertical and horizontal text
  • text with furigana
  • text overlaid on images
  • wide variety of fonts and font styles
  • low quality images

Code for inference is available here.
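
For orientation, here is a minimal sketch of how a Vision Encoder Decoder checkpoint such as this one could be loaded and run with the Hugging Face `transformers` library. It is not the author's inference code. The function names `recognize` and `join_tokens`, the use of `AutoImageProcessor`, and the `max_length=300` limit are assumptions made for illustration, and the tokenizer may need extra Japanese tokenization packages installed.

```python
def join_tokens(decoded: str) -> str:
    """Remove the whitespace that token-by-token decoding leaves between characters."""
    return "".join(decoded.split())


def recognize(image_path: str, checkpoint: str = "kha-white/manga-ocr-base") -> str:
    """Run one image through the encoder-decoder model and return the recognized text.

    Hypothetical helper: the loading classes and generation settings are assumptions.
    """
    # Heavy dependencies are imported lazily so join_tokens stays usable without them.
    from PIL import Image
    from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

    # The image encoder expects a 3-channel input.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # The text decoder generates token ids autoregressively (length cap is an assumption).
    token_ids = model.generate(pixel_values, max_length=300)
    decoded = tokenizer.decode(token_ids[0], skip_special_tokens=True)
    return join_tokens(decoded)
```

Calling `recognize("panel.png")`, where `panel.png` is a hypothetical cropped text region, would download the checkpoint on first use and return the recognized string.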

Code for training will be released soon.