kha-white/manga-ocr-base is a repository forked from Hugging Face. License: Apache-2.0


Latest commit: 76ff029880 "Create README.md" by Maciej Budyś, 2022-01-20 22:39:39 +00:00

File	Last commit	Date
.gitattributes	initial commit	2022-01-15 17:39:06 +00:00
README.md	Create README.md	2022-01-20 22:39:39 +00:00
config.json	manga-ocr-base	2022-01-15 20:18:35 +01:00
preprocessor_config.json	manga-ocr-base	2022-01-15 20:18:35 +01:00
pytorch_model.bin	manga-ocr-base	2022-01-15 20:18:35 +01:00
special_tokens_map.json	manga-ocr-base	2022-01-15 20:18:35 +01:00
tokenizer_config.json	manga-ocr-base	2022-01-15 20:18:35 +01:00
vocab.txt	manga-ocr-base	2022-01-15 20:18:35 +01:00

README.md


Manga OCR

Optical character recognition for Japanese text, with the main focus being Japanese manga.

It uses the Vision Encoder Decoder framework.
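In Hugging Face `transformers`, a Vision Encoder Decoder model pairs an image encoder with a text decoder, and a saved `config.json` records which architecture fills each role. The snippet below is a minimal sketch that reads only that configuration file and summarises it. It assumes `transformers` is installed, the Hugging Face Hub is reachable, and this repository's `config.json` is a `VisionEncoderDecoderConfig`. The helper names `load_config` and `describe_architecture` are hypothetical, and the sketch makes no claim about which encoder or decoder this model actually uses.

```python
MODEL_ID = "kha-white/manga-ocr-base"


def describe_architecture(config) -> dict:
    """Summarise a VisionEncoderDecoderConfig: the model types used as
    encoder and decoder, plus the decoder's vocabulary size."""
    return {
        "encoder": config.encoder.model_type,
        "decoder": config.decoder.model_type,
        "decoder_vocab_size": config.decoder.vocab_size,
    }


def load_config(model_id: str = MODEL_ID):
    """Fetch only config.json from the Hub (no model weights are downloaded)."""
    # Imported lazily so describe_architecture() can be used without transformers.
    from transformers import VisionEncoderDecoderConfig

    return VisionEncoderDecoderConfig.from_pretrained(model_id)
```

Calling `describe_architecture(load_config())` prints the encoder/decoder pairing stored in the repository, which is a cheap way to check the architecture before downloading `pytorch_model.bin`.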

Manga OCR can be used as a general-purpose OCR for printed Japanese, but its main goal is to provide high-quality text recognition that is robust to various scenarios specific to manga:

both vertical and horizontal text
text with furigana
text overlaid on images
wide variety of fonts and font styles
low quality images

Code for inference is available here.
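A minimal inference sketch follows. It is not the official inference code. It assumes `transformers`, PyTorch and Pillow are installed and that the repository's files load as a `VisionEncoderDecoderModel`, an image processor and a tokenizer. Depending on `tokenizer_config.json`, `AutoTokenizer` may also need extra Japanese tokenization packages. The helper names (`recognize`, `join_tokens`), the grayscale conversion and `max_length=300` are illustrative choices. Whitespace is stripped from the decoded string because a BERT-style tokenizer's `decode` typically separates tokens with spaces, which Japanese text does not use. This also removes any space that genuinely appeared in the image.

```python
def join_tokens(decoded: str) -> str:
    """Remove the whitespace a BERT-style tokenizer inserts between decoded tokens."""
    return "".join(decoded.split())


def recognize(image_path: str, model_id: str = "kha-white/manga-ocr-base") -> str:
    """OCR a single image containing one block of text (e.g. a cropped speech bubble)."""
    # Heavy dependencies are imported lazily so join_tokens() works without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

    # Reloaded on every call for brevity; cache these in real code.
    model = VisionEncoderDecoderModel.from_pretrained(model_id).eval()
    processor = AutoImageProcessor.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Grayscale, then back to 3 channels for the RGB image processor (illustrative choice).
    image = Image.open(image_path).convert("L").convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    with torch.no_grad():
        # The decoder generates token ids autoregressively from the encoder's image features.
        token_ids = model.generate(pixel_values, max_length=300)[0]

    decoded = tokenizer.decode(token_ids, skip_special_tokens=True)
    return join_tokens(decoded)
```

For example, `recognize("bubble.png")` would return the text found in a hypothetical cropped speech-bubble image. Because the model is reloaded on every call, a real application would load it once and reuse it.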

Code for training will be released soon.