PaddleOCR - An Ultra-Lightweight OCR System

This article is divided into eight sections, in which we, four Embedded Systems master's students from Delft University of Technology, analyze PaddleOCR, an open-source project developed by Baidu. The first two sections give a general overview of the project. Next, we analyze its main capabilities and usage. Then we present its stakeholders and key attributes. We also explore the project's roadmap. Finally, we discuss the ethical issues the current work may raise. The structure of this article is shown below:

Introduction and Goal of PaddleOCR

OCR (Optical Character Recognition) techniques convert images of text, such as scanned or photographed documents, into machine-readable digital text, and their applications have grown rapidly in recent years. Depending on the recognition scenario, OCR tools can be divided into general-purpose OCR tools and domain-specific OCR tools.

PaddleOCR is an open-source, general-purpose OCR tool built on Baidu's PaddlePaddle deep learning framework, and it supports lightweight, multilingual character recognition. It can locate text in static and dynamic images regardless of the text's orientation or language. The compact program extracts text from images with response times on the order of milliseconds.

For developers interested in OCR and deep learning, the project also provides a platform for training and developing their own ultra-lightweight models starting from PaddleOCR's pre-trained models, which makes it an excellent tool for understanding deep learning and optical character recognition.

In OCR there is a trade-off between effectiveness and efficiency: effectiveness refers to recognition accuracy, and efficiency refers to response time. PaddleOCR clearly prioritizes short response times and a small model size, so it is less suited to applications that demand very high accuracy.

Domain Concepts

As a general-purpose OCR tool, PaddleOCR can be embedded in a wide range of applications, making it an excellent example of using AI technology to bring a paradigm shift to almost every field of the technology industry.

PaddleOCR supports three deployment options (inference, serving, and mobile deployment via Paddle Lite), giving users and developers flexible choices for different application scenarios. Inference deployment is designed for local offline applications, while serving deployment is designed for the cloud. Mobile deployment is implemented by compiling with Paddle Lite, which provides inference capabilities for mobile phones and IoT devices, supports a wide range of hardware across platforms, and offers a lightweight solution for on-device deployment.
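The choice among the three options can be summarized as a small decision rule. The sketch below is purely illustrative: the `Deployment` enum, the `Scenario` fields, and `choose_deployment` are hypothetical names introduced here, not part of PaddleOCR's API, and a real decision would weigh more factors (latency, cost, hardware).

```python
"""Hypothetical helper mapping an application scenario to one of the three
deployment options described above. All names and rules are illustrative
assumptions, not part of PaddleOCR's API."""
from dataclasses import dataclass
from enum import Enum


class Deployment(Enum):
    INFERENCE = "inference"      # local, offline applications
    SERVING = "serving"          # cloud-hosted service
    PADDLE_LITE = "paddle_lite"  # mobile phones and IoT devices


@dataclass
class Scenario:
    on_device: bool     # must the model run on a phone or IoT device?
    cloud_hosted: bool  # will the OCR run as a cloud service?


def choose_deployment(s: Scenario) -> Deployment:
    """Pick a deployment option following the scenario descriptions above."""
    if s.on_device:
        return Deployment.PADDLE_LITE
    if s.cloud_hosted:
        return Deployment.SERVING
    return Deployment.INFERENCE


if __name__ == "__main__":
    print(choose_deployment(Scenario(on_device=True, cloud_hosted=False)))
```

Checking `on_device` first reflects the point made above: when the hardware is a phone or IoT device, the lightweight Paddle Lite path is the one designed for it.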

Figure: The Deployment of PaddleOCR

Given its small size and short response time, PaddleOCR is mainly designed for scenarios that require high speed rather than high precision. In particular, it provides OCR support for IoT devices with limited computing power and storage space.

Demand for deploying artificial intelligence models on mobile and embedded devices is currently growing, for two main reasons. First, setting up cloud servers to host AI models can be prohibitively expensive for enterprises. Second, users get better privacy protection because their pictures are processed locally on the mobile device rather than on servers in the cloud.

IoT devices are embedded and mobile, which makes it challenging to deploy neural network models efficiently on hardware with limited computing power, a tight power budget, and little memory. Developers therefore optimize trained models during deployment, compressing model size and reducing computation to fit these limits on computation, energy, and storage, which forces a compromise among the model's accuracy, efficiency, and flexibility.
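One common way to compress a trained model is post-training quantization: storing 32-bit floating-point weights as 8-bit integers plus a scale factor. The sketch below shows symmetric per-tensor int8 quantization on a toy weight list; it is a generic illustration of the technique, not the quantization pipeline used by PaddleOCR or Paddle Lite, and the function names are hypothetical.

```python
"""Minimal sketch of post-training weight quantization (symmetric, per-tensor
int8). A generic illustration, not PaddleOCR's or Paddle Lite's pipeline."""
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map float weights to integers in [-127, 127] sharing one scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]


def storage_bytes(num_weights: int, bytes_per_weight: int) -> int:
    """Raw storage needed for the weights (ignores the single scale value)."""
    return num_weights * bytes_per_weight


if __name__ == "__main__":
    w = [0.51, -1.27, 0.003, 0.9, -0.42]
    q, s = quantize_int8(w)
    approx = dequantize(q, s)
    max_err = max(abs(a - b) for a, b in zip(w, approx))
    print("int8 values:", q)
    print("max abs error:", max_err)  # bounded by scale / 2
    # float32 needs 4 bytes per weight, int8 needs 1: roughly 4x smaller
    print(storage_bytes(len(w), 4), "->", storage_bytes(len(w), 1), "bytes")
```

The accuracy cost is the rounding error, at most half the scale per weight, which is exactly the accuracy-versus-efficiency compromise described above.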

Main Capabilities

OCR systems are widely used in many different application scenarios, including factory automation, financial services, and government services. A few examples illustrate this. First, in financial services, OCR can extract information from business documents such as cheques, invoices, personal statements, and receipts. It can also process insurance documents automatically and digitize printed documents or books so that they become searchable, as in Google Books. It also contributes significantly to assistive technology for visually impaired users 1. Second, in factory automation, OCR is used to read serial numbers stamped or engraved on parts, avoiding mistakes on the production line. In food factories, for example, it tracks date codes, lot codes, and expiration dates and supports batch verification to ensure food safety. Third, in government services, it plays an important role at customs, where it is used to scan passports; likewise, airports use it for passport recognition and information extraction. OCR can also be used for traffic sign recognition. 2

Figure: Example of PaddleOCR Capabilities

PaddleOCR is an ultra-lightweight OCR system designed to reduce computational cost. The overall model size is only 3.5 MB for recognizing 6,622 Chinese characters and 2.8 MB for recognizing 63 alphanumeric symbols. As a result, the PP-OCR system fits into much smaller embedded and mobile systems, and its low resource consumption makes it versatile across applications, such as small hand-held devices.

Context

As mentioned above, PaddleOCR can currently be applied in many contexts. At present, OCR generally follows a few basic steps: image pre-processing, character recognition, and post-processing.

Figure: The framework of PaddleOCR

PaddleOCR follows these basic steps. It first pre-processes the image and then detects text. After text detection, it rectifies the detected text boxes into horizontal ones. Finally, it recognizes the text content and outputs the recognition result, a stage also referred to as post-processing.
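The data flow of these steps can be sketched as a chain of functions. In the minimal sketch below, the detector and recognizer are passed in as plain callables, standing in for PaddleOCR's trained networks; `run_pipeline`, `TextLine`, and the `min_score` threshold are hypothetical names. Rectification is reduced to its first step, ordering a box's four corners; the perspective warp into a horizontal crop is omitted.

```python
"""Minimal sketch of the pipeline described above:
pre-process -> detect -> rectify boxes -> recognize -> post-process.
Detector and recognizer are stand-in callables; all names are hypothetical."""
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Quad = List[Point]        # four corners of a detected text region
Image = List[List[int]]   # grayscale image as rows of pixel values


@dataclass
class TextLine:
    box: Quad
    text: str
    score: float


def preprocess(img: Image) -> Image:
    """Clip pixel values to [0, 255] (a stand-in for resizing/normalization)."""
    return [[min(255, max(0, p)) for p in row] for row in img]


def order_corners(quad: Quad) -> Quad:
    """Order corners as top-left, top-right, bottom-right, bottom-left, the
    layout needed before warping a tilted box into a horizontal rectangle."""
    tl = min(quad, key=lambda p: p[0] + p[1])
    br = max(quad, key=lambda p: p[0] + p[1])
    tr = min(quad, key=lambda p: p[1] - p[0])
    bl = max(quad, key=lambda p: p[1] - p[0])
    return [tl, tr, br, bl]


def run_pipeline(
    img: Image,
    detect: Callable[[Image], List[Quad]],
    recognize: Callable[[Image, Quad], Tuple[str, float]],
    min_score: float = 0.5,
) -> List[TextLine]:
    clean = preprocess(img)                  # 1. pre-processing
    lines = []
    for quad in detect(clean):               # 2. text detection
        box = order_corners(quad)            # 3. first step of box rectification
        text, score = recognize(clean, box)  # 4. text recognition
        if score >= min_score:               # 5. post-processing: drop low-confidence text
            lines.append(TextLine(box, text, score))
    # 5. post-processing: reading order, top-to-bottom then left-to-right
    lines.sort(key=lambda t: (t.box[0][1], t.box[0][0]))
    return lines


if __name__ == "__main__":
    def fake_detect(img: Image) -> List[Quad]:
        return [[(50, 40), (10, 40), (10, 30), (50, 30)],
                [(60, 10), (10, 12), (10, 2), (60, 0)]]

    def fake_recognize(img: Image, box: Quad) -> Tuple[str, float]:
        return ("TOTAL", 0.93) if box[0][1] > 20 else ("RECEIPT", 0.88)

    for line in run_pipeline([[0, 300], [12, -5]], fake_detect, fake_recognize):
        print(line.text, line.score, line.box)
```

Passing the models in as functions keeps each stage separately replaceable, which mirrors how a lightweight detector or recognizer can be swapped without changing the rest of the pipeline.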

Looking ahead, recent work has proposed that OCR systems should handle specific types of input more efficiently. Beyond using a domain-specific lexicon, OCR systems can achieve higher performance by taking into account business rules, regular expressions, or the rich information contained in color images. This approach is called “Application-Oriented OCR” or “Customized OCR”, and it has been applied to license plates, invoices, screenshots, ID cards, driver's licenses, and automobile manufacturing. 4
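As an example of such a business rule, recognized text for a field with a known format can be validated with a regular expression, and characters that look like digits can be corrected where the format requires digits. The sketch below assumes a hypothetical expiration-date field printed as YYYY-MM-DD (as in the food-factory example earlier); the look-alike table and function names are illustrative assumptions, not PaddleOCR features.

```python
"""Sketch of rule-based post-processing for application-oriented OCR.
The YYYY-MM-DD field format and the look-alike table are hypothetical."""
import re
from datetime import datetime
from typing import Optional

# Business rule: the field must look like YYYY-MM-DD
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

# Letters an OCR model may confuse with digits (illustrative, not exhaustive)
DIGIT_LOOKALIKES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1",
                                  "S": "5", "B": "8", "Z": "2"})


def is_valid_date(text: str) -> bool:
    """True if text matches the pattern and is a real calendar date."""
    if not DATE_PATTERN.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def correct_date_field(raw: str) -> Optional[str]:
    """Return a valid date, repairing digit look-alikes if needed, or None
    if the rules cannot be satisfied (e.g. flag the field for manual review)."""
    text = raw.strip()
    if is_valid_date(text):
        return text
    # Only the '-' separators may be non-digits, so map look-alikes to digits
    candidate = text.translate(DIGIT_LOOKALIKES)
    return candidate if is_valid_date(candidate) else None


if __name__ == "__main__":
    for raw in ["2024-06-11", "2O24-O6-1l", "2O24-O6-3l", "EXP 2024"]:
        print(repr(raw), "->", correct_date_field(raw))
```

Here "2O24-O6-1l" is repaired to "2024-06-11", while "2O24-O6-3l" is rejected because June 31 is not a real date: the domain rule catches an error that character recognition alone cannot.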

Stakeholders and Demands

The stakeholders of this project include industry, education, and independent developers.

Enterprises that need to develop embedded products with OCR functionality can use this open-source project to obtain economic benefits. The system's light weight lowers hardware costs, and reusing it spares enterprises most of the work of developing OCR functionality themselves, which saves software development costs. Lower software development and manufacturing costs make their products more competitive and profitable.

Universities and research institutions can use the project for educational purposes, since it provides feasible solutions in the field of lightweight OCR. Moreover, PaddleOCR offers a research foundation and new research directions, which can nurture more innovative projects.

Independent developers can also benefit from this project, not only by training their own models but also by building further development on top of PaddleOCR. The project gives them a ready-made lightweight OCR solution, saving a large amount of development time.