Ocr-recognition

Undirected Graphical Model for the optical character word recognition task

Author

Rishi Dua http://github.com/rishirdua

Disclaimer

The problem below has been borrowed (with minor changes) from the Probabilistic Graphical Models course offered by Dr. Parag Singla at IIT Delhi (Fall 2013 Semester).

Problem Statement

This project implements and experiments with an undirected graphical model for the optical character word recognition task. We study the computer vision task of recognizing words from images. A word can be recognized by recognizing its individual characters. However, recognizing a character is a difficult task, and when each character is recognized independently of its neighbors, the result is often a word that does not exist in the English language. In this problem we therefore augment a simple OCR model with additional factors that capture some of our intuitions based on character co-occurrences and image similarities.
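The failure mode described above can be made concrete with a toy example. The sketch below uses made-up OCR probabilities and transition values (hypothetical numbers, not taken from any of the project's data files) for a two-image word over a two-letter alphabet. It compares per-character recognition with joint decoding that also multiplies in a factor for each pair of neighboring characters. Brute-force enumeration is used only for clarity: with the 10-character alphabet it grows as 10^len(w), so it is a sketch rather than a practical decoder.

```python
from itertools import product

# Hypothetical toy values (not from the project's data files).
# ocr[k][c]: OCR probability that the k-th image of the word shows character c.
ocr = [
    {"e": 0.6, "a": 0.4},
    {"e": 0.55, "a": 0.45},
]
# trans[(a, b)]: hypothetical potential for character a followed by character b.
trans = {("e", "e"): 0.1, ("e", "a"): 1.0, ("a", "e"): 0.5, ("a", "a"): 0.2}


def independent_guess(ocr):
    """Pick the most probable character for each image on its own."""
    return "".join(max(dist, key=dist.get) for dist in ocr)


def score(word, ocr, trans):
    """Unnormalized score: product of per-image OCR values and neighbor factors."""
    s = 1.0
    for k, ch in enumerate(word):
        s *= ocr[k][ch]
    for a, b in zip(word, word[1:]):
        s *= trans[(a, b)]
    return s


def joint_guess(ocr, trans):
    """Brute-force MAP over all assignments (exponential in word length)."""
    chars = sorted(ocr[0])  # assumes every position scores the same character set
    candidates = ("".join(w) for w in product(chars, repeat=len(ocr)))
    return max(candidates, key=lambda w: score(w, ocr, trans))


print(independent_guess(ocr))  # ee
print(joint_guess(ocr, trans))  # ea
```

Here each image on its own slightly prefers "e", but the low value for "e" followed by "e" makes "ea" (score 0.6 × 0.45 × 1.0 = 0.27) the best joint assignment.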

The undirected graphical model for recognizing a given word consists of two types of variables: observed image variables and unobserved character variables.

[Figure: Undirected Graphical Model]

The model for a word w consists of len(w) observed image ids and the same number of unobserved character variables. For a given assignment to these character variables, the model score (proportional to the probability of the assignment under the model) is specified using the following factors:
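Restricting attention to the two factor types defined under Dataset Format (the OCR factors ψo and the transition factors ψt), the usual undirected-model form of the score can be written as below, where n = len(w) and id_i is the i-th observed image id. This is a sketch under that restriction: any further factors in the model (for example, ones based on image similarities) would contribute additional terms to the product. The normalizer Z sums over all 10^n character assignments.

```latex
\begin{aligned}
\mathrm{score}(c_1,\dots,c_n) &= \prod_{i=1}^{n} \psi_o\bigl(\mathrm{image}(i)=\mathrm{id}_i,\ \mathrm{char}(i)=c_i\bigr)\,\prod_{i=1}^{n-1} \psi_t\bigl(\mathrm{char}(i)=c_i,\ \mathrm{char}(i+1)=c_{i+1}\bigr)\\
P(c_1,\dots,c_n \mid \mathrm{id}_1,\dots,\mathrm{id}_n) &= \frac{\mathrm{score}(c_1,\dots,c_n)}{Z}, \qquad Z = \sum_{c'_1,\dots,c'_n} \mathrm{score}(c'_1,\dots,c'_n)
\end{aligned}
```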

Dataset Format

Potentials directory:
– ocr.dat: Contains the output predictions of a pre-existing OCR system for a set of 1,000 images. Each row contains three tab-separated values "id a prob" and represents the OCR system's probability that image id represents character a, i.e. P(char = a | img = id) = prob. Use these values directly as the value of the factor between the image and character variables at position i: ψo(image(i) = id, char(i) = a) = prob. Since there are 10 characters and 1,000 images, this file has 10,000 rows.
– trans.dat: Stores the potentials for the transition factors. Each row contains three tab-separated values "a b value" giving the value of the factor when the previous character is "a" and the next character is "b", i.e. ψt(char(i) = a, char(i + 1) = b) = value. This file has 100 (10 × 10) rows.
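Both potential files share the same three-column, tab-separated layout, so they can be loaded into dictionaries keyed by the first two columns. The sketch below is one way to do it; the function names are hypothetical, and converting image ids to int assumes they are integers (as the data files describe them). In-memory sample rows stand in for the real files.

```python
from io import StringIO


def load_ocr(lines):
    """Parse 'id<TAB>a<TAB>prob' rows into {(image_id, char): prob} for psi_o."""
    psi_o = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        image_id, char, prob = parts
        psi_o[(int(image_id), char)] = float(prob)
    return psi_o


def load_trans(lines):
    """Parse 'a<TAB>b<TAB>value' rows into {(prev_char, next_char): value} for psi_t."""
    psi_t = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        a, b, value = parts
        psi_t[(a, b)] = float(value)
    return psi_t


# Hypothetical sample rows standing in for ocr.dat and trans.dat.
psi_o = load_ocr(StringIO("0\te\t0.7\n0\tt\t0.3\n"))
psi_t = load_trans(StringIO("e\tt\t1.2\nt\te\t0.8\n"))
print(psi_o[(0, "e")], psi_t[("e", "t")])  # 0.7 1.2
```

With the real files, something like `with open("ocr.dat") as f: psi_o = load_ocr(f)` should yield 10,000 entries for the OCR table and 100 for the transition table, matching the row counts above; the file paths are assumptions.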

Data directory:
– data/data-loops.dat: Contains the observed images of one word on each row, with an empty line between pairs. The observed images for a word are represented by a sequence of tab-separated integer ids ("id1 id2 id3").
– data/truth-loops.dat: Stores the true words for the observed images in the corresponding rows, with an empty line between pairs. True words are represented as plain strings (e.g. "eat").
You will need to iterate through both files together so that each true word is matched with its observed images.
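Iterating through the image-id file and the true-word file together can be done by zipping their lines. The sketch below assumes the empty separator lines fall at the same positions in both files, and groups rows into the blank-line-separated blocks the text calls pairs. The function name is hypothetical; it raises an error if the number of image ids on a row does not match the length of the corresponding word, which would signal misaligned files.

```python
from io import StringIO


def read_examples(data_lines, truth_lines):
    """Walk both files in lockstep; return blocks of (image_ids, true_word)."""
    blocks, current = [], []
    for img_row, word in zip(data_lines, truth_lines):
        img_row, word = img_row.strip(), word.strip()
        if not img_row and not word:
            # An empty line in both files closes the current block.
            if current:
                blocks.append(current)
                current = []
            continue
        ids = [int(tok) for tok in img_row.split()]
        if len(ids) != len(word):
            raise ValueError(f"{len(ids)} image ids but word {word!r}")
        current.append((ids, word))
    if current:
        blocks.append(current)
    return blocks


# Hypothetical sample contents for the image-id file and the true-word file.
data = StringIO("1\t2\t3\n4\t5\n\n6\n7\t8\n")
truth = StringIO("eat\nto\n\na\nit\n")
for block in read_examples(data, truth):
    print(block)
```

For the real data, the two arguments would be the opened image-id and true-word files; keeping the block structure leaves room for any factors that relate the words within a pair.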

Contribute

License

This project is licensed under the terms of the MIT license. See LICENCE.txt for details.