ML Kit Tutorial: How to recognize and extract text in images


This Machine Learning Kit (ML Kit) tutorial is meant to help you quickly get started with machine learning models in your app. In this article, we will learn how to use ML Kit to recognize text in images. ML Kit has two types of APIs for text recognition: a general-purpose API suitable for recognizing text in images, such as the text on a street sign, and an API optimized for recognizing the text of documents. The general-purpose API has both on-device and cloud-based models. Document text recognition is available only as a cloud-based model.

In this lesson, we will work with the general-purpose ML Kit API. This tutorial does not require any prior knowledge of or experience in machine learning. Before we start, have a look at what we are going to build:

1. First, create a project in Android Studio and name it, say, MLKitText.

2. Add a project in the Firebase console. Give the project a name (in my case, "ML Kit Codelab") and add the package name of your app (in my case, "com.hoineki.mlkittext"). Then download the google-services.json file and add it to the app directory of your project.

3. Add the following lines to your project-level build.gradle file to include the google-services plugin and Google's Maven repository:

buildscript {
    // ...
    dependencies {
        // ...
        classpath 'com.google.gms:google-services:4.0.1' // google-services plugin
    }
}

allprojects {
    // ...
    repositories {
        // ...
        google() // Google's Maven repository
    }
}

In the app-level build.gradle file, add the following dependencies, and add the apply plugin line at the bottom of the file to enable the Gradle plugin:

apply plugin: 'com.android.application'

android {
  // ...
}

dependencies {
  // ...
  implementation 'com.google.firebase:firebase-ml-vision:15.0.0'
  implementation 'com.google.firebase:firebase-core:16.0.0'
}
apply plugin: 'com.google.gms.google-services'

4. Put three images (with signboards or street signs) in the drawable folder. For a quick setup, you can use these images directly.

5. Define your activity_main.xml file as shown here

6. Use a FirebaseVisionTextDetector object to load the on-device text recognition model and detect text in images. You can do this as follows:

FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);
FirebaseVisionTextDetector detector = FirebaseVision.getInstance().getVisionTextDetector();
detector.detectInImage(image)
    .addOnSuccessListener(
        new OnSuccessListener<FirebaseVisionText>() {
            @Override
            public void onSuccess(FirebaseVisionText texts) {
                mButton.setEnabled(true);
                // Process the texts object
            }
        })
    .addOnFailureListener(
        new OnFailureListener() {
            @Override
            public void onFailure(@NonNull Exception e) {
                // Task failed with an exception; re-enable the button so the user can retry
                mButton.setEnabled(true);
                e.printStackTrace();
            }
        });

7. To obtain the plain text as a string from the FirebaseVisionText object, you can use the following method:

private void processTextRecognitionResult(FirebaseVisionText texts) {
    List<FirebaseVisionText.Block> blocks = texts.getBlocks();
    if (blocks.size() == 0) {
        Toast.makeText(getApplicationContext(), "No text found", Toast.LENGTH_LONG).show();
        return;
    }
    // Concatenate every recognized element (word), separated by spaces
    StringBuilder s = new StringBuilder();
    for (int i = 0; i < blocks.size(); i++) {
        List<FirebaseVisionText.Line> lines = blocks.get(i).getLines();
        for (int j = 0; j < lines.size(); j++) {
            List<FirebaseVisionText.Element> elements = lines.get(j).getElements();
            for (int k = 0; k < elements.size(); k++) {
                s.append(elements.get(k).getText()).append(" ");
            }
        }
    }
    Toast.makeText(getApplicationContext(), s.toString(), Toast.LENGTH_LONG).show();
}
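
The method above flattens every word into a single space-separated string, which loses the line structure of the sign. If you want to keep each recognized line on its own row, the traversal can be adapted along the lines of the minimal sketch below. It is plain Java with no Android or Firebase dependency: the blocks, lines and elements are modelled as nested lists of strings, and RecognizedTextFormatter and toPlainText are hypothetical names, not part of ML Kit.

```java
import java.util.List;

// Hypothetical helper (not part of ML Kit). Blocks -> lines -> words are
// modelled as nested lists of strings so the sketch runs without Android.
final class RecognizedTextFormatter {

    private RecognizedTextFormatter() {}

    // Joins the words of each line with single spaces and puts each
    // non-empty line on its own row, separated by '\n'.
    static String toPlainText(List<List<List<String>>> blocks) {
        StringBuilder text = new StringBuilder();
        for (List<List<String>> block : blocks) {
            for (List<String> line : block) {
                if (line.isEmpty()) {
                    continue; // skip lines without any words
                }
                if (text.length() > 0) {
                    text.append('\n');
                }
                for (int k = 0; k < line.size(); k++) {
                    if (k > 0) {
                        text.append(' ');
                    }
                    text.append(line.get(k));
                }
            }
        }
        return text.toString();
    }
}
```

In the app, the same three loops would run over getBlocks(), getLines() and getElements(), appending getText() for each element, and the resulting string could be shown in the Toast.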

8. To use the cloud-based model to detect text in images, use a FirebaseVisionCloudDocumentTextDetector object as follows:

FirebaseVisionCloudDetectorOptions options =
    new FirebaseVisionCloudDetectorOptions.Builder()
        .setModelType(FirebaseVisionCloudDetectorOptions.LATEST_MODEL)
        .setMaxResults(15)
        .build();
FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);
FirebaseVisionCloudDocumentTextDetector detector = FirebaseVision.getInstance()
    .getVisionCloudDocumentTextDetector(options);
detector.detectInImage(image)
    .addOnSuccessListener(
        new OnSuccessListener<FirebaseVisionCloudText>() {
            @Override
            public void onSuccess(FirebaseVisionCloudText texts) {
                mCloudButton.setEnabled(true);
                // Process the FirebaseVisionCloudText object.
                // You can obtain the plain text with texts.getText().
            }
        })
    .addOnFailureListener(
        new OnFailureListener() {
            @Override
            public void onFailure(@NonNull Exception e) {
                // Task failed with an exception
                mCloudButton.setEnabled(true);
                e.printStackTrace();
            }
        });

9. Now the full code of your MainActivity.java file is as shown here

10. Now run your project. Before running it, make sure you have declared the INTERNET permission and the read/write EXTERNAL STORAGE permissions in your AndroidManifest.xml file. You should see the following screen:

[Screenshot: text recognition app, first screen]

Click the first FIND TEXT button to load the on-device text recognition model. Then click the show text button to see the extracted text. You will find that the on-device model loads and works, but the cloud-based model does not load. This is because the project has not been upgraded to the Blaze plan and the Cloud Vision API is not enabled.

11. You can upgrade to the Blaze plan as follows:

  1. Open the Firebase console and select the project.
  2. At the bottom of the left panel, you will see the upgrade option. Click it and you should see the following options:

    [Screenshot: Firebase plans]
  3. Select the Blaze plan and complete the billing setup. Your project is then upgraded, and the upgrade option should disappear from the left panel.

In order to enable the Cloud Vision API, do the following:

  1. Open the Cloud Vision API in the Cloud Console API library.

  2. Select your project from the dropdown menu at the top.

  3. Then enable the API by clicking the Enable option.

12. Now try loading the cloud-based text recognition model from your app by clicking the FIND TEXT(cloud) button. It should work now. You should see that the cloud-based model is more accurate and can detect multiple languages.

13. Now that you have successfully loaded the cloud-based model, let's draw the detected text on the image itself. For this, include TextGraphic.java, CloudTextGraphic.java and GraphicOverlay.java in your project. Then modify MainActivity.java like this and modify activity_main.xml like this.
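
The overlay classes are not reproduced in this article, so the following is only a rough sketch of the idea behind drawing on top of an image: a bounding box reported in image pixels has to be converted to view pixels before it can be drawn. It assumes the image is stretched to fill the whole view. BoxMapper and toViewCoordinates are hypothetical names and are not taken from GraphicOverlay.java.

```java
// Hypothetical helper (not taken from the project's GraphicOverlay.java).
final class BoxMapper {

    private BoxMapper() {}

    // box is {left, top, right, bottom} in image pixels.
    // Returns {left, top, right, bottom} in view pixels, assuming the image
    // is stretched to fill a view of size viewWidth x viewHeight.
    static float[] toViewCoordinates(int[] box, int imageWidth, int imageHeight,
                                     int viewWidth, int viewHeight) {
        if (box.length != 4) {
            throw new IllegalArgumentException("box must have 4 entries");
        }
        if (imageWidth <= 0 || imageHeight <= 0) {
            throw new IllegalArgumentException("image size must be positive");
        }
        float scaleX = (float) viewWidth / imageWidth;
        float scaleY = (float) viewHeight / imageHeight;
        return new float[] {
            box[0] * scaleX, box[1] * scaleY, box[2] * scaleX, box[3] * scaleY
        };
    }
}
```

If the view keeps the aspect ratio of the image instead of stretching it, a single scale factor plus an offset would be needed rather than two independent factors.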

14. Now run the project. You should see that the app is now complete, exactly as shown in the video above. Note that in the MainActivity.java file, we scale down the input images before passing them to the detectors. This reduces the latency of the image processing tasks.
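
The scaling code itself lives in MainActivity.java and is not shown here. As a minimal sketch of one way to choose the target size, assuming you want the image to fit inside a maximum width and height while keeping its aspect ratio, the arithmetic could look like this. ImageScaler, scaledSize and the 1024-pixel limit in the note below are illustrative assumptions, not values from the project.

```java
// Hypothetical helper (not taken from the project's MainActivity.java).
final class ImageScaler {

    private ImageScaler() {}

    // Returns {width, height} that fits inside maxWidth x maxHeight while
    // keeping the aspect ratio. Images that already fit are not enlarged.
    static int[] scaledSize(int width, int height, int maxWidth, int maxHeight) {
        if (width <= 0 || height <= 0 || maxWidth <= 0 || maxHeight <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        double scale = Math.min(1.0,
                Math.min((double) maxWidth / width, (double) maxHeight / height));
        // Clamp to at least 1 pixel so extreme aspect ratios stay valid.
        int scaledWidth = Math.max(1, (int) Math.round(width * scale));
        int scaledHeight = Math.max(1, (int) Math.round(height * scale));
        return new int[] {scaledWidth, scaledHeight};
    }
}
```

For a 4000 x 3000 photo and a 1024 x 1024 limit, scaledSize returns 1024 x 768. On Android, the two values can then be passed to Bitmap.createScaledBitmap(bitmap, width, height, true) before building the FirebaseVisionImage.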

For a quick setup, you can download the project directly from here, or refer to this repo for all the source code.

And that's it! You have just learned how to use ML Kit. This is the first tutorial in the ML Kit tutorial series. If you run into any issues while running the project or setting it up, just leave a comment below.






Author:


Ratul Doley
Entrepreneur and AI researcher. Currently learning and working on Unsupervised learning and Data Clustering. Professional Android app developer and designer. Updated Nov 15, 2018