Automation drives progress, and nearly every industry in the world economy is interested in it. The reason is simple: progress brings efficiency, and efficiency leads to revenue growth. That demand makes automation a key market driver and opens up opportunities for businesses. In this blog post from the mobile development series, we're going to talk about automation in document processing. To be more precise, we'll focus on document processing with optical character recognition (OCR) and describe the steps needed to manage the delivery of such solutions. So, let's go!
Initial analysis
Before starting development, we have to analyze what we have and what potential problems we may face. The first thing to focus on is the IDs for which we are going to build a processing solution. We have to make sure we can get as many originals as possible. Without the original IDs, it'll be very difficult to deliver an OCR solution with high accuracy.
Every type of ID has its own fonts, palette, and specific structure. Once you've got the originals, you can see how they look under different lighting and environmental conditions (in the office, on the street, in the morning, in the evening, on different surfaces, etc.). Besides that, you'll be able to analyze the font and understand the impact of special symbols, watermarks, or holograms on your capturing process. The developers will have to create appropriate algorithms to preprocess the images so that the OCR solution keeps working across those lighting conditions and environments.
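To make the idea of lighting preprocessing a bit more concrete, here is a minimal sketch of one common technique, linear contrast stretching. It is only an illustration, not the algorithm any particular solution uses: the function name `stretch_contrast` and the representation of an image as nested lists of grayscale values (0 to 255) are assumptions made for this example.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values so the darkest pixel maps to 0
    and the brightest to 255. `pixels` is a list of rows of ints (0-255)."""
    flat = [value for row in pixels for value in row]
    if not flat:
        return []  # empty image: nothing to do
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Completely flat image: there is no contrast to stretch.
        return [row[:] for row in pixels]
    scale = 255 / (hi - lo)
    return [[round((value - lo) * scale) for value in row] for row in pixels]


# Hypothetical crop of an ID photographed in dim evening light:
# all values are squeezed into the narrow range 40..80.
dim_crop = [[40, 50, 60], [45, 55, 80]]
normalized = stretch_contrast(dim_crop)
```

After the call, the values in `normalized` span the full 0 to 255 range, so a dim photo and a bright photo of the same ID end up looking more alike to the later steps. Real pipelines usually combine several such steps and tune them against the collected originals.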
Training dataset
After that analysis, we can proceed to the training dataset. Nowadays, most recognition solutions are built on neural networks, a technology loosely inspired by how neurons in the human brain work. That is a topic for a separate article, but the important thing to understand here is that, with neural networks, a big training dataset makes the difference. If you don't have one, you won't be able to deliver a good OCR solution.
So, what is the training dataset? Basically, it is a set of ID images taken in different lighting conditions and environments. Lighting affects how the image looks, and the goal of the dataset is to give the neural network enough training information to recognize the ID's characters in those conditions. Using the dataset, the developers will also be able to work out which preprocessing algorithms to apply for better OCR results.
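A simple way to check whether a dataset really covers the conditions you care about is to count the images per condition. The sketch below assumes each sample is a dictionary with hypothetical `path` and `condition` fields, and the function name `coverage_gaps` and the threshold are invented for illustration.

```python
from collections import Counter


def coverage_gaps(samples, required_conditions, min_per_condition):
    """Return {condition: count} for every required condition that has
    fewer than `min_per_condition` images in `samples`."""
    counts = Counter(sample["condition"] for sample in samples)
    return {
        condition: counts[condition]
        for condition in required_conditions
        if counts[condition] < min_per_condition
    }


# Hypothetical dataset index: file names and condition labels are made up.
samples = [
    {"path": "id_001.jpg", "condition": "office"},
    {"path": "id_002.jpg", "condition": "office"},
    {"path": "id_003.jpg", "condition": "street"},
]

gaps = coverage_gaps(samples, ["office", "street", "evening"], min_per_condition=2)
print(gaps)  # {'street': 1, 'evening': 0}
```

Here the check reports that more street and evening photos are needed before training starts. In practice the required conditions and the minimum count come out of the initial analysis.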
Machine learning
The first thing that comes to mind is to take the laptop or personal computer you use for everyday work and start the training process there. That is a good way to learn how everything works and to understand the concept, but it is not a good choice once you're building the actual solution. It is more effective to use cloud-based services, like Amazon ML. With such services, training time is typically measured in hours, compared with days or even weeks on an everyday machine.
Development process
Once we've completed the analysis, understood our potential bottlenecks, and resolved the training dataset problems, we can start the development process. It can be divided into several main steps: capturing, image processing, and OCR. Let's go through them one by one and see what each step covers.
To understand how the implementation goes, we have to introduce two terms: “the capture session” and “the frame”. The capture session is basically the video stream from the phone's camera, and the frame is a single image from that stream that is used for analysis. In simplified form, the development process looks like this:
1. Start of the capture session
2. Capturing the frame
3. Frame pre-processing
4. Edge detection
5. Cropping the ID
6. Text detection
7. Cropping the text lines
8. Passing text lines to a neural network
9. Evaluating the OCR result
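The processing steps in the list above (3 to 8) can be sketched as a chain of functions in which every intermediate result is kept for inspection. This is a toy model under loud assumptions: a "frame" is a list of strings, `.` stands for background and a space for blank ID surface, and every step function is a hypothetical stand-in for the real computer vision or neural network code.

```python
def run_pipeline(frame, steps):
    """Run each step on the previous step's output and keep every
    intermediate result so it can be inspected later."""
    trace = []
    value = frame
    for name, step in steps:
        value = step(value)
        trace.append((name, value))
    return value, trace


def preprocess(rows):
    # Stand-in: a real solution would denoise and normalize lighting here.
    return list(rows)


def crop_id(rows):
    # Stand-in for edge detection + cropping: drop rows and columns that
    # contain only background. Assumes equal-length rows and a visible ID.
    kept = [row for row in rows if set(row) != {"."}]
    cols = [i for i in range(len(kept[0])) if any(row[i] != "." for row in kept)]
    return [row[cols[0]:cols[-1] + 1] for row in kept]


def detect_text(rows):
    # Stand-in for text detection: keep rows that contain any characters.
    return [(index, row) for index, row in enumerate(rows) if row.strip()]


def crop_lines(found):
    # Stand-in for cropping the text lines: trim blank space around each line.
    return [row.strip() for _, row in found]


def recognize(lines):
    # Stand-in: a real solution passes each cropped line to the neural network.
    return list(lines)


steps = [
    ("3. pre-processing", preprocess),
    ("4-5. edge detection and ID crop", crop_id),
    ("6. text detection", detect_text),
    ("7. text line crop", crop_lines),
    ("8. neural network", recognize),
]

frame = [
    "..........",
    "..NAME  ..",
    "..      ..",
    "..A1234 ..",
    "..........",
]

result, trace = run_pipeline(frame, steps)
for name, value in trace:
    print(name, "->", value)
```

Keeping the `trace` is the point of the design: when the final text is wrong, you can look at the output of each step and see where things first went off.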
By analyzing the intermediate results of every step in the process, your developers will be able to understand what has to be tuned to achieve better accuracy.
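To tell whether a tuning change actually helped, it is useful to put a number on accuracy. One widely used measure is the character error rate: the edit (Levenshtein) distance between the recognized text and the expected text, divided by the length of the expected text. The sketch below is one possible way to compute it; the function names and the sample ID number are made up for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete char_a
                curr[j - 1] + 1,                   # insert char_b
                prev[j - 1] + (char_a != char_b),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(recognized, expected):
    """Edit distance divided by the length of the expected text."""
    if not expected:
        raise ValueError("expected text must not be empty")
    return edit_distance(recognized, expected) / len(expected)


# The OCR confused the digit "1" with the letter "I" in a made-up ID number.
print(character_error_rate("AI234", "A1234"))  # 0.2
```

A lower value is better, and 0.0 means the recognized text matches exactly. Tracking this figure per lighting condition shows which conditions still need more training data or better preprocessing.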
Summary
Developing an OCR solution can be very challenging. Besides requiring the development team to have specific skills in machine learning, computer vision, and image processing, it is very difficult to forecast the delivery timeline and the accuracy of the solution. Keep in mind that accuracy will depend on the quality of the dataset, its size, and the innovative algorithms created by talented software engineers. Having all that in place will increase your chances of automation success.
In our next blog post, we will focus on the App Store Review Guidelines and the most common problems related to the review process.