Raspberry Pi: Converting JPG to TIFF for OCR with Tesseract

First, install pip:

sudo apt-get install python-pip

Use pip to download the PIL package:

sudo pip install PIL

If that fails, try:

pip install PIL --allow-unverified PIL --allow-all-external

Next, install the tesseract-ocr package:

sudo apt-get install tesseract-ocr
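
A quick way to confirm the install worked is to look for the tesseract executable on your PATH. The following is only a minimal sketch; the helper name find_tool is made up for this post and is not part of any package used here.

```python
import os


def find_tool(name):
    """Return the full path of an executable called `name`, or None.

    find_tool is a hypothetical helper written for illustration only.
    """
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        # A usable tool is a regular file the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    for tool in ("tesseract",):
        path = find_tool(tool)
        print("%s: %s" % (tool, path if path else "not found"))
```

More tool names can be added to the tuple in the last block if you want to check other commands the same way.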

Install PyTesser:

wget https://pytesser.googlecode.com/files/pytesser_v0.0.1.zip

Unzip the archive:

unzip pytesser_v0.0.1.zip -d pytesser
cd pytesser

Convert the test image to an uncompressed TIFF:

convert fonts_test.png -auto-level -compress none myimage.tif

If the convert command is not available, install ImageMagick:

sudo apt-get install imagemagick

fonts_test.png => the input image to recognize
myimage.tif => the output file
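
If you have several images to convert, the same convert call can be wrapped in a small script. This is a rough sketch, assuming ImageMagick is installed; the function names build_convert_command and convert_to_tiff and the script name to_tiff.py are hypothetical, chosen only for this example.

```python
import subprocess
import sys


def build_convert_command(source, target):
    """Build the same ImageMagick argument list used in this post."""
    return ["convert", source, "-auto-level", "-compress", "none", target]


def convert_to_tiff(source, target):
    """Run ImageMagick on one file.

    Raises subprocess.CalledProcessError on a non-zero exit status,
    and OSError if the convert executable cannot be found.
    """
    subprocess.check_call(build_convert_command(source, target))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        convert_to_tiff(sys.argv[1], sys.argv[2])
    else:
        print("usage: python to_tiff.py INPUT_IMAGE OUTPUT.tif")
```

For example, python to_tiff.py photo.jpg myimage.tif would run the conversion on a JPG instead of the bundled PNG.
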
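Once the command finishes, myimage.tif will have been created. Next, create a file named demo.py to view the recognition result, with the following content: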

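from PIL import Image
from pytesser import *


image_file = 'myimage.tif'

# Option 1: open the image with PIL and pass the image object
im = Image.open(image_file)
text = image_to_string(im)

# Option 2: pass the file name directly
text = image_file_to_string(image_file)

# Option 3: like option 2, but if tesseract cannot read the file directly,
# fall back to loading it through PIL first
text = image_file_to_string(image_file, graceful_errors=True)

# Each option overwrites text; only the last result is printed (Python 2 syntax)
print "=====output=======\n"
print text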

Change the file name to match your own image. Then run demo.py (how long it takes depends on your hardware):

python demo.py
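
If PyTesser gives you trouble, the tesseract command can also be called directly. The sketch below assumes tesseract is on your PATH and relies on the command form tesseract IMAGE OUTPUTBASE, which writes the recognized text to OUTPUTBASE.txt. The names build_tesseract_command, ocr_file, ocr_direct.py and the default ocr_result are made up for this example.

```python
import subprocess
import sys


def build_tesseract_command(image_path, output_base):
    """Argument list for: tesseract IMAGE OUTPUTBASE."""
    # tesseract appends ".txt" to the output base name itself.
    return ["tesseract", image_path, output_base]


def ocr_file(image_path, output_base="ocr_result"):
    """Run tesseract on one image and return the recognized text."""
    subprocess.check_call(build_tesseract_command(image_path, output_base))
    with open(output_base + ".txt") as handle:
        return handle.read()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print(ocr_file(sys.argv[1]))
    else:
        print("usage: python ocr_direct.py IMAGE.tif")
```

This skips PIL entirely, so it only works for image formats that tesseract itself can read, such as the uncompressed TIFF produced earlier.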

References:
https://www.youtube.com/watch?v=LRXS3mC0OKo
http://fosshelp.blogspot.tw/2013/04/how-to-convert-jpg-to-tiff-for-ocr-with.html


Sunday, May 31, 2015
