ML Kit Tutorial: How to determine content of an image(Image Labeling)

One very cool feature of the Lose It! app is Snap It. With this feature, if you want to enter or track a food item, all you have to do is take a snap and you are done. ML Kit's Image Labeling API can help you build such a feature without much effort. With this API, you can determine the content of an image (what is in the image) without providing any additional contextual metadata. You can identify objects, locations, activities, animal species, products, and much more.

When you use the Image Labeling API, each detected entity comes with a score that indicates the confidence the ML model has in its relevance. With this information, you can perform tasks such as automatic metadata generation and content moderation.
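As a rough sketch of how confidence scores drive such tasks, the plain-Java snippet below filters label results by a confidence threshold before using them as metadata tags. The `Label` class here is a stand-in I made up for illustration; the real `FirebaseVisionLabel` objects you will meet later in this tutorial carry the same (label, confidence) information.

```java
import java.util.ArrayList;
import java.util.List;

public class LabelFilter {
    // Minimal stand-in for the (label, confidence) pairs the API returns.
    static class Label {
        final String text;
        final float confidence;
        Label(String text, float confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    // Keep only labels the model is sufficiently sure about,
    // e.g. before attaching them to an image as metadata tags.
    static List<String> confidentTags(List<Label> labels, float threshold) {
        List<String> tags = new ArrayList<>();
        for (Label label : labels) {
            if (label.confidence >= threshold) {
                tags.add(label.text);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        List<Label> labels = new ArrayList<>();
        labels.add(new Label("sky", 0.95f));
        labels.add(new Label("neon", 0.62f));
        labels.add(new Label("person", 0.88f));
        System.out.println(confidentTags(labels, 0.8f)); // prints [sky, person]
    }
}
```

The same filter-by-threshold idea underlies content moderation: instead of keeping confident tags, you would reject an image whose confident labels include disallowed categories.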

ML Kit's Image Labeling API can run both on-device and in the cloud. The on-device API is completely free, has low latency, and requires no network connection because everything runs on the phone; it supports roughly 400+ labels. If you want something more powerful that gives you higher-accuracy results, use the cloud-based API. It is free for the first 1,000 API calls per feature per month and paid after that, and it supports over 10,000 labels.

An example may help here; take a look at the picture below:

Image Labeling Example

If we feed this image to the on-device API, we get labels like fun, infrastructure, neon, person, and sky. If we feed it to the cloud-based API, we get ferris wheel, amusement park, night, outdoor, recreation, and fair. Thus we see that the Cloud API gives more relevant and accurate information.

In this lesson, we are going to learn how to use both the on-device and the cloud API to determine the content of an image and perform image labeling. This tutorial does not require any prior knowledge of or experience in Machine Learning, but you should be familiar with Android Studio and its directory structure. If not, you may refer to the Android Studio Project Overview.

Before we start, have a look at what we are going to build in the end:

1. Follow steps 1 to 7 of ML Kit Tutorial: How to detect faces with ML Kit API and identify key facial features.

2. If you are going to use the cloud-based API, you need to upgrade your project to the Blaze plan and enable the Cloud Vision API. To do this, follow step 11 of ML Kit Tutorial: How to recognize and extract text in images.

3. Now for the action part:

On-device labeling

The on-device image labeler returns at most 10 labels for an image by default. If you want to change this behavior, for example to set a minimum confidence threshold, create a FirebaseVisionLabelDetectorOptions object as follows:

FirebaseVisionLabelDetectorOptions options =
	new FirebaseVisionLabelDetectorOptions.Builder()
	.setConfidenceThreshold(0.8f)
	.build();

In order to find labels in an image, you need to instantiate FirebaseVisionLabelDetector. It is a detector for finding FirebaseVisionLabels in an image.

FirebaseVisionLabelDetector detector = FirebaseVision.getInstance()
    .getVisionLabelDetector();
// Or, to set the minimum confidence required:
FirebaseVisionLabelDetector detector = FirebaseVision.getInstance()
    .getVisionLabelDetector(options);

Now you can pass the image to the detectInImage method as follows:

Task<List<FirebaseVisionLabel>> result =
	detector.detectInImage(image)
	.addOnSuccessListener(
		new OnSuccessListener<List<FirebaseVisionLabel>>() {
			@Override
			public void onSuccess(List<FirebaseVisionLabel> labels) {
				// Task completed successfully
				// ...
			}
		})
	.addOnFailureListener(
		new OnFailureListener() {
			@Override
			public void onFailure(@NonNull Exception e) {
				// Task failed with an exception
				// ...
			}
		});

All these tasks are defined in ImageLabelingProcessor.java. You can include this file directly in the main Java package for a quick setup. For convenience, you may put it in a separate package folder (in my case, "imagelabeling").

If the image labeling operation succeeds, a list of FirebaseVisionLabel objects will be passed to the success listener. Each FirebaseVisionLabel object represents a label in the image. For each label, you can get the label's text description, its Knowledge Graph entity ID (if available), and the confidence score of the match as follows:

for (FirebaseVisionLabel label: labels) {
    String text = label.getLabel();
    String entityId = label.getEntityId();
    float confidence = label.getConfidence();
}

This task, along with methods for rendering a label within an associated graphic overlay view, is defined in LabelGraphic.java. You can include this file in the same package folder as ImageLabelingProcessor.java.
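At its core, LabelGraphic just draws one line of text per label onto the overlay. As an Android-free sketch of the formatting step such a class might use (the exact format string is my assumption, not taken from the sample code):

```java
import java.util.Locale;

public class LabelFormatter {
    // Hypothetical helper: turns a label and its confidence into the
    // one-line string a graphic overlay could draw, e.g. "sky: 95%".
    static String overlayText(String label, float confidence) {
        return String.format(Locale.US, "%s: %.0f%%", label, confidence * 100);
    }

    public static void main(String[] args) {
        System.out.println(overlayText("sky", 0.95f)); // prints sky: 95%
    }
}
```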

Cloud based image labeling

Cloud-based image labeling uses the same code as on-device labeling, with just a change in the names of the base classes to include "Cloud".

The cloud detector uses the STABLE version of the model and returns up to 10 results by default. You can change these settings using a FirebaseVisionCloudDetectorOptions object as follows:

FirebaseVisionCloudDetectorOptions options =
  new FirebaseVisionCloudDetectorOptions.Builder()
	.setModelType(FirebaseVisionCloudDetectorOptions.LATEST_MODEL)
	.setMaxResults(15)
	.build();

The FirebaseVisionLabelDetector is now FirebaseVisionCloudLabelDetector and can be instantiated as follows:

FirebaseVisionCloudLabelDetector detector = FirebaseVision.getInstance()
        .getVisionCloudLabelDetector();
// Or, to change the default settings:
// FirebaseVisionCloudLabelDetector detector = FirebaseVision.getInstance()
//         .getVisionCloudLabelDetector(options);

And finally you can pass the image to the detectInImage method as follows:

Task<List<FirebaseVisionCloudLabel>> result =
	detector.detectInImage(image)
	.addOnSuccessListener(
		new OnSuccessListener<List<FirebaseVisionCloudLabel>>() {
			@Override
			public void onSuccess(List<FirebaseVisionCloudLabel> labels) {
				// Task completed successfully
				// ...
			}
		})
	.addOnFailureListener(
		new OnFailureListener() {
			@Override
			public void onFailure(@NonNull Exception e) {
				// Task failed with an exception
				// ...
			}
		});

All these tasks are defined in CloudImageLabelingProcessor.java. You may put this file in a separate package folder named "cloudimagelabeling".

The label objects returned on a successful labeling operation are now FirebaseVisionCloudLabel objects. You can get the label's text description, its Knowledge Graph entity ID (if available), and the confidence score of the match as follows:

for (FirebaseVisionCloudLabel label: labels) {
    String text = label.getLabel();
    String entityId = label.getEntityId();
    float confidence = label.getConfidence();
}

This task, along with methods for label rendering, is defined in CloudLabelGraphic.java. You may include this file in the same package folder as CloudImageLabelingProcessor.java.

5. Now we can use ImageLabelingProcessor.java and CloudImageLabelingProcessor.java in MainActivity.java like this. This is the full and final code of MainActivity.java. In MainActivity.java, we define a spinner to switch between the on-device and cloud labelers, so we also need a spinner in activity_main.xml.
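The spinner's job reduces to mapping the selected option to one of the two processors. The sketch below shows that decision as plain Java; the option strings and the idea of returning a processor name are my assumptions for illustration (in MainActivity this logic would live in the spinner's onItemSelected callback and construct the actual processor object):

```java
public class ProcessorSelector {
    // The two spinner options; the actual strings in the sample app may differ.
    static final String ON_DEVICE = "On-device Label Detection";
    static final String CLOUD = "Cloud Label Detection";

    // Maps the selected spinner option to the processor that handles it.
    static String processorFor(String option) {
        switch (option) {
            case ON_DEVICE:
                return "ImageLabelingProcessor";
            case CLOUD:
                return "CloudImageLabelingProcessor";
            default:
                throw new IllegalArgumentException("Unknown option: " + option);
        }
    }

    public static void main(String[] args) {
        System.out.println(processorFor(CLOUD)); // prints CloudImageLabelingProcessor
    }
}
```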

6. Now run the project. The app should work exactly as shown in the video above.

For a quick setup, you may download the project directly from here, or refer to this repo for all the source code.

And that's it! You have just learned how to use ML Kit's Image Labeling API to determine the content of an image. This is the third tutorial in the ML Kit Tutorial Series. If you have any issues while running or setting up the project, just leave a comment below.






Author:

Ratul Doley
Entrepreneur and AI researcher. Currently learning and working on unsupervised learning and data clustering. Professional Android app developer and designer. Updated Nov 15, 2018