How to develop OCR for a mobile application?

Published

October 9, 2018

Artem Velykyy

Mobile Chapter Lead


Automation drives progress, and every industry has an interest in it: greater efficiency tends to translate into revenue growth, so the demand for automation creates opportunities for businesses. In this post from our mobile development series, we look at automation in document processing. More precisely, we focus on document processing with optical character recognition (OCR) and describe the steps needed to deliver such a solution. So, let's go!

Initial analysis

Before starting development, we have to analyze what we have and what problems we may face. The first thing to focus on is the set of IDs the solution will process. Make sure you can get as many original documents as possible. Without the original IDs, it will be very difficult to deliver an OCR solution with high accuracy.

Every type of ID has its own fonts, palette, and structure. Once you have the originals, you can see how they look under different lighting and environmental conditions (in the office, on the street, in the morning, in the evening, on different surfaces, etc.). You can also analyze the font and understand how special symbols, watermarks, or holograms affect the capture process. The developers will then have to create image preprocessing algorithms that keep the OCR solution working across a wide range of lighting conditions and environments.
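As one illustration of such preprocessing, here is a minimal sketch of contrast stretching, a common way to reduce the effect of dim or washed-out lighting before recognition. It is not the method used in any particular product; the function name `stretch_contrast`, the percentile parameters, and the tiny sample image are all assumptions made for this example, and the image is represented as a plain list of grayscale rows to keep the sketch dependency-free.

```python
def stretch_contrast(gray, low_pct=2, high_pct=98):
    """Linearly rescale intensities so the chosen percentiles map to 0 and 255.

    `gray` is a list of rows, each row a list of 0-255 grayscale values.
    """
    pixels = sorted(p for row in gray for p in row)
    lo = pixels[(len(pixels) - 1) * low_pct // 100]
    hi = pixels[(len(pixels) - 1) * high_pct // 100]
    if hi == lo:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [row[:] for row in gray]
    scale = 255 / (hi - lo)
    # Clamp so that outliers beyond the percentiles stay inside 0-255.
    return [
        [min(255, max(0, round((p - lo) * scale))) for p in row]
        for row in gray
    ]


# Hypothetical 3x3 crop of an ID photographed in poor light:
# all values sit in a narrow, dark band.
dim_photo = [[40, 42, 45], [50, 60, 70], [80, 85, 90]]
normalized = stretch_contrast(dim_photo, low_pct=0, high_pct=100)
```

After stretching, the darkest pixel becomes 0 and the brightest 255, so characters photographed in the office and on the street end up in a more comparable intensity range.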

Training dataset

After that analysis, we can proceed to the training dataset. Today, most recognition solutions are built on neural networks, a technology loosely inspired by how neurons in the human brain work. That is a topic for a separate article; what matters here is that, with neural networks, a large training dataset makes the difference. Without one, you won't be able to deliver a good OCR solution.

So, what is the training dataset? Basically, it is a set of ID images taken in different lighting conditions and environments. Lighting affects how an image looks, and the goal of the dataset is to give the neural network enough training examples to recognize the ID's characters in those conditions. The dataset also helps the developers understand which preprocessing algorithms to apply for better OCR results.
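A simple way to check whether a dataset covers the conditions you care about is to count samples per condition and flag the gaps. The sketch below assumes each image carries a `lighting` label; the label values, file names, and the `coverage_gaps` helper are hypothetical and only illustrate the idea.

```python
from collections import Counter


def coverage_gaps(samples, required_conditions, min_per_condition):
    """Return the conditions that have fewer samples than required, with their counts."""
    counts = Counter(sample["lighting"] for sample in samples)
    return {
        condition: counts.get(condition, 0)
        for condition in required_conditions
        if counts.get(condition, 0) < min_per_condition
    }


# Hypothetical dataset manifest: one entry per captured ID image.
samples = [
    {"file": "id_001.jpg", "lighting": "office"},
    {"file": "id_002.jpg", "lighting": "office"},
    {"file": "id_003.jpg", "lighting": "street"},
]

gaps = coverage_gaps(
    samples,
    required_conditions=["office", "street", "evening"],
    min_per_condition=2,
)
```

Here `gaps` reports that the "street" and "evening" conditions are underrepresented, which tells you what to photograph next before training.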

Machine learning

The first thing that comes to mind is to take the laptop or personal computer you use for everyday work and start the training process there. That is a good way to learn how everything works and to understand the concept, but it is not a good choice if you're building a production solution. It is more effective to use cloud-based services such as Amazon ML. With such services, training time is typically measured in hours, compared with days or even weeks on an everyday machine.

Development process

Once we've completed the analysis, understood the potential bottlenecks, and resolved the training dataset problems, we can start the development process. It can be divided into three main steps: capturing, image processing, and OCR. Let's go through them one by one.

To understand how the implementation works, we have to introduce two terms: “the capture session” and “the frame”. The capture session is basically the video flow from the phone's camera, and a frame is a single image from that flow that is used for analysis. In simplified form, the development process looks like this:
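To make the relationship between the capture session and the frame concrete, here is a platform-neutral sketch. It treats the capture session as any iterable of frames and picks every n-th frame for analysis, since analyzing every frame of a video flow is rarely necessary. The names `frames_for_analysis` and `fake_capture_session` and the sampling interval are assumptions for illustration; a real app would receive frames from the platform's camera API instead.

```python
def frames_for_analysis(capture_session, every_nth=5):
    """Yield (index, frame) for every n-th frame of an iterable video flow."""
    for index, frame in enumerate(capture_session):
        if index % every_nth == 0:
            yield index, frame


def fake_capture_session(frame_count):
    """Stand-in for a camera feed: yields placeholder frames."""
    for i in range(frame_count):
        yield f"frame-{i}"


selected = list(frames_for_analysis(fake_capture_session(12), every_nth=5))
selected_indices = [index for index, _ in selected]
```

With 12 frames and `every_nth=5`, frames 0, 5, and 10 are passed on to the rest of the process.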

1. Start of the capture session

2. Capturing the frame

3. Frame preprocessing

4. Edge detection

5. Cropping the ID

6. Text detection

7. Cropping the text lines

8. Passing text lines to a neural network

9. Evaluating the OCR result
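The steps above can be sketched as a chain of functions in which every intermediate result is recorded. This is only a structural sketch: the step functions are placeholders operating on a string that stands in for image data, and `run_pipeline` is a hypothetical helper, not part of any OCR library.

```python
def run_pipeline(frame, steps):
    """Run each (name, function) step in order and keep every intermediate result."""
    trace = []
    data = frame
    for name, step in steps:
        data = step(data)
        trace.append((name, data))
    return data, trace


# Placeholder steps; real ones would operate on pixel data.
steps = [
    ("preprocess", lambda f: f.strip()),                 # stands in for frame preprocessing
    ("crop_id", lambda f: f.split("|")[1]),              # stands in for edge detection + cropping the ID
    ("crop_text_lines", lambda f: f.split(";")),         # stands in for text detection + line cropping
    ("recognize", lambda lines: [l.upper() for l in lines]),  # stands in for the neural network
]

toy_frame = "  background|name: doe;no: 123|background  "
result, trace = run_pipeline(toy_frame, steps)
step_names = [name for name, _ in trace]
```

Keeping the `trace` list around means the output of each stage can be inspected separately when the final result is wrong.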

By analyzing the intermediate results of every step, your developers will be able to understand what has to be tuned to achieve better accuracy.
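Tuning requires a number to tune against. One commonly used measure for OCR output is the character error rate: the edit distance between the recognized text and the expected text, divided by the length of the expected text. The sketch below is a minimal, dependency-free version; the sample strings are made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(recognized, expected):
    """Edit distance normalized by the length of the expected text."""
    if not expected:
        return 0.0 if not recognized else 1.0
    return edit_distance(recognized, expected) / len(expected)


# Hypothetical document number where the final "0" was read as the letter "O".
cer = character_error_rate("AB12345O", "AB123450")
```

In this example one character out of eight is wrong, so `cer` is 0.125. Tracking this value per lighting condition shows where preprocessing or the dataset needs more work.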

Summary

Developing an OCR solution can be very challenging. Besides requiring the development team to have specific skills such as machine learning, computer vision, and image processing, it is very difficult to forecast the delivery timeline and the accuracy of the solution. Keep in mind that accuracy depends heavily on the quality and size of the dataset and on the algorithms your engineers create. Having all that in place will increase your chances of automation success.

In our next blog post, we will focus on the App Store Review Guidelines and the most common problems related to the review process.

FAQ

What are the key steps involved in developing OCR for a mobile app?

First, gather and prepare a high-quality training dataset of images with text. Next, choose or build a machine learning model suited for recognizing characters. Then, integrate and optimize this model within your app, focusing on accuracy and performance.

How important is the training dataset for OCR accuracy?

The training dataset is crucial because the OCR system learns to recognize characters from this data. A diverse, well-labeled, and large dataset improves accuracy, especially for different fonts, languages, and lighting conditions.

Can OCR work effectively on mobile devices with limited resources?

Yes, but it requires optimization. Developers often use lightweight models or perform some processing in the cloud to balance accuracy and speed without overloading the device.
