Emproto was mandated by an Executive agency of the European Commission to build a platform that extracts essential information, such as ingredients and nutrition facts, from food wrappers. The objective of this exercise was to analyze the health quotient of packaged food products available in the European Union.
Brief given to Emproto:
Field workers from the agency would go to supermarkets and capture images of the food wrappers available there. The images would then be uploaded to the platform, which had to process them and extract the data. The extracted data needed to be organized in a tabular format, then checked and approved by an admin.
The application uses Vision API provided by Firebase for text extraction. The extracted information is presented in tabular form to ensure usability and further data processing.
| Function | Technology |
|---|---|
| API Calls | Retrofit, Android library (https://square.github.io/retrofit/) |
| Text Extraction | Vision API, Firebase |
| Image Capture | Camera2 package (https://developer.android.com/reference/android/hardware/camera2/package-summary) |
Selecting the right tool: Tesseract OCR vs Vision API
We evaluated both Tesseract OCR and the Vision API. Tesseract OCR produces plain text output, but its accuracy fell short of our requirements. We chose the Vision API for the following reasons.
Organizing text into table format:
The Vision API gives accurate results as plain text. We needed to convert this text into tabular form for further manipulation. We used the getBoundingBox() method, which returns a Rect object describing the position of each text block.
rectTemp1 = textBlockTemp.getBoundingBox();
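The bounding boxes make row detection straightforward: blocks whose vertical centers are close together belong to the same table row, and sorting each row by its left edge yields the columns. Here is a minimal sketch of that idea; the `Block` class, the `toRows` helper, and the pixel tolerance are all illustrative stand-ins (not the Firebase API, and not using android.graphics.Rect, which is only available on-device):

```java
import java.util.*;

// Sketch: group OCR text blocks into table rows using their bounding boxes.
// "Block" is a hypothetical stand-in for a FirebaseVisionText.TextBlock
// plus the Rect returned by getBoundingBox().
public class TableFromBlocks {
    static class Block {
        final String text;
        final int top, bottom, left;
        Block(String text, int top, int bottom, int left) {
            this.text = text; this.top = top; this.bottom = bottom; this.left = left;
        }
        int centerY() { return (top + bottom) / 2; }
    }

    // Blocks whose vertical centers lie within `tolerance` pixels of the
    // first block in the current row are treated as the same row; within a
    // row, blocks are ordered by their left edge to form columns.
    static List<List<String>> toRows(List<Block> blocks, int tolerance) {
        List<Block> sorted = new ArrayList<>(blocks);
        sorted.sort(Comparator.comparingInt(Block::centerY));
        List<List<String>> rows = new ArrayList<>();
        List<Block> current = new ArrayList<>();
        for (Block b : sorted) {
            if (!current.isEmpty()
                    && b.centerY() - current.get(0).centerY() > tolerance) {
                rows.add(finishRow(current));
                current = new ArrayList<>();
            }
            current.add(b);
        }
        if (!current.isEmpty()) rows.add(finishRow(current));
        return rows;
    }

    private static List<String> finishRow(List<Block> row) {
        row.sort(Comparator.comparingInt(b -> b.left));
        List<String> cells = new ArrayList<>();
        for (Block b : row) cells.add(b.text);
        return cells;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block("Energy", 10, 30, 0),
                new Block("250 kcal", 12, 32, 200),
                new Block("Fat", 50, 70, 0),
                new Block("3 g", 52, 72, 200));
        for (List<String> row : toRows(blocks, 15)) {
            System.out.println(String.join(" | ", row));
            // prints "Energy | 250 kcal" then "Fat | 3 g"
        }
    }
}
```

A fixed pixel tolerance is a simplification; in practice it would need to scale with the detected text height, since wrapper photos vary in resolution.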
Multiple language support:
This application is intended for use across Europe and therefore needs to support multiple languages.
In Tesseract, supporting multiple languages requires trained data files to be stored on the device for each of those languages. DATA_PATH is the directory where all the trained data files are kept. To enable multiple languages, the engine is initialized with a list of three-letter language codes (e.g. tam for Tamil): tessBaseApi.init(DATA_PATH, "eng+tam");
In the Vision API, the method
recognizer = vision.getCloudTextRecognizer(options);
allows us to recognize both Latin and non-Latin scripts without shipping language data with the app.
Memory management issue:
We used the RecyclerView class to optimize memory, but still ran into an OutOfMemoryError: when the app tried to render more than 50 images at a time, the memory consumed by the decoded bitmaps crashed the app. To handle this, we paginate the list: only 10 images are loaded initially, and 5 more are loaded each time the user scrolls to the end.
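The paging rule above reduces to simple arithmetic. This sketch captures it; the page sizes (10 and 5) come from the text, while the class and method names are illustrative:

```java
// Sketch of the paging rule: load an initial window of 10 images,
// then 5 more on each scroll-to-end event, never exceeding the total.
public class ImagePager {
    static final int INITIAL_PAGE = 10;
    static final int SCROLL_PAGE = 5;

    // How many images should be visible after `scrolls` scroll-to-end
    // events, capped at the total number of images available.
    static int visibleCount(int totalImages, int scrolls) {
        int count = INITIAL_PAGE + SCROLL_PAGE * scrolls;
        return Math.min(count, totalImages);
    }

    public static void main(String[] args) {
        System.out.println(visibleCount(60, 0)); // 10 on first load
        System.out.println(visibleCount(60, 2)); // 20 after two scrolls
        System.out.println(visibleCount(12, 1)); // capped at the 12 available
    }
}
```

In the real app this count would drive the RecyclerView adapter's item count, with the next page requested from a scroll listener when the last visible position nears the end of the list.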
The accuracy of the results opens up a few exciting opportunities going forward. We could fully automate the workflow, eliminating the need for manual approval, and we could use the data to build a crowdsourced food recommendation engine that guides dietary habits.