Back in May we released MediaPipe Solutions, a set of tools for no-code and low-code solutions to common on-device machine learning tasks, for Android, web, and Python. Today we’re happy to announce that the initial version of the iOS SDK is available, along with an update to the Python SDK that adds Raspberry Pi support. These include support for audio classification, face landmark detection, and various natural language processing tasks. Let’s take a look at how you can use these tools on the new platforms.
Once your Raspberry Pi hardware is set up with a camera, you can get started by installing the MediaPipe dependency, along with OpenCV and NumPy if you don’t have them already.
python -m pip install mediapipe
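If you still need OpenCV and NumPy, they can be installed the same way (the opencv-python package provides the cv2 module used below):
python -m pip install opencv-python numpy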
From there you can create a new Python file and add your imports to the top.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import cv2
import numpy as np
import time
You will also want to make sure you have an object detection model stored locally on your Raspberry Pi. For your convenience, we’ve provided a default model, EfficientDet-Lite0, that you can retrieve with the following command.
wget -q -O efficientdet.tflite https://storage.googleapis.com/mediapipe-models/object_detector/efficientdet_lite0/int8/1/efficientdet_lite0.tflite
Once you have your model downloaded, you can start creating your new ObjectDetector, including some customizations, like the max results that you want to receive, or the confidence threshold that must be exceeded before a result can be returned.
# Initialize the object detection model
base_options = python.BaseOptions(model_asset_path=model)
options = vision.ObjectDetectorOptions(
    base_options=base_options,
    running_mode=vision.RunningMode.LIVE_STREAM,
    max_results=max_results,
    score_threshold=score_threshold,
    result_callback=save_result)
detector = vision.ObjectDetector.create_from_options(options)
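The save_result callback referenced above is the listener that receives detections while running in live stream mode, so it needs to be defined before the options are created. The full sample handles this for you, but a minimal sketch (collecting results into a shared list is just an assumption for illustration) could look like this:
# Collects the most recent detections so the camera loop can draw them
detection_result_list = []

def save_result(result: vision.ObjectDetectorResult,
                unused_output_image: mp.Image, timestamp_ms: int):
    detection_result_list.append(result)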
After creating the ObjectDetector, you will need to open the Raspberry Pi camera and read frames from it continuously. There are a few preprocessing steps that are omitted here, but they are available in our sample on GitHub.
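At its core that step is a standard OpenCV capture loop. Here is a minimal sketch, assuming the camera is available at index 0 (the full sample also applies the preprocessing steps mentioned above):
# Open the camera and read frames until the stream ends
cap = cv2.VideoCapture(0)
while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break

    # OpenCV captures frames in BGR order; MediaPipe expects RGB
    rgb_image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)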
Within that loop you can convert the processed camera frame into a new MediaPipe Image, then run detection on it before displaying the results that are received in the associated listener.
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_image)
detector.detect_async(mp_image, time.time_ns() // 1_000_000)  # Timestamp in milliseconds
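How you display the detections is up to you. The full sample uses its own visualization utility, but a minimal sketch that draws each bounding box and its top category directly onto the BGR frame with OpenCV (the label formatting here is just an assumption) could look like this:
if detection_result_list:
    for detection in detection_result_list[0].detections:
        # Draw the bounding box for this detection
        box = detection.bounding_box
        start_point = (box.origin_x, box.origin_y)
        end_point = (box.origin_x + box.width, box.origin_y + box.height)
        cv2.rectangle(frame, start_point, end_point, (0, 165, 255), 3)

        # Label it with the top category name and score
        category = detection.categories[0]
        label = f'{category.category_name} ({category.score:.2f})'
        cv2.putText(frame, label, (box.origin_x, box.origin_y - 10),
                    cv2.FONT_HERSHEY_PLAIN, 1.5, (0, 0, 255), 2)
    detection_result_list.clear()

cv2.imshow('object_detection', frame)
if cv2.waitKey(1) == 27:  # Stop when the Esc key is pressed
    break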
Once you draw out those results and detected bounding boxes, you should be able to see something like this:
You can find the complete Raspberry Pi example shown above on GitHub, or see the official documentation here.
Moving over to iOS, text classification is one of the more direct examples, but the core ideas will still apply to the rest of the available iOS Tasks. Similar to the Raspberry Pi, you’ll start by creating a new MediaPipe Tasks object, which in this case is a TextClassifier.
var textClassifier: TextClassifier?
textClassifier = try? TextClassifier(modelPath: model.modelPath)
Now that you have your TextClassifier, you just need to pass a String to it to get a TextClassifierResult.
func classify(text: String) -> TextClassifierResult? {
  guard let textClassifier = textClassifier else {
    return nil
  }
  return try? textClassifier.classify(text: text)
}
You can call this classify function from elsewhere in your app, such as from a DispatchQueue in a ViewController, before displaying the results.
let result = self?.classify(text: inputText)
let categories = result?.classificationResult.classifications.first?.categories ?? []
You can find the rest of the code for this project on GitHub, as well as see the full documentation on developers.google.com/mediapipe.
To learn more, watch our I/O 2023 sessions: Easy on-device ML with MediaPipe, Supercharge your web app with machine learning and MediaPipe, and What's new in machine learning, and check out the official documentation over on developers.google.com/mediapipe.
We look forward to all the exciting things you make, so be sure to share them with @googledevs and your developer communities!