If you're reading this blog, then you're probably interested in creating a custom machine learning (ML) model. I recently went through the process myself, creating a custom dog detector to go with a Codelab, Create a custom object detection web app with MediaPipe. Like any new coding task, the process took some trial and error to figure out what I was doing along the way. To minimize the error part of your "trial and error" experience, I'm happy to share five takeaways from my model training experience with you.
Preparing your data for training will look different depending on the type of model you're customizing. In general, there is a step for sourcing data and a step for annotating data.
Finding enough data points that best represent your use case can be a challenge. For one, you want to make sure you have the right to use any images or text you include in your data. Check the licensing for your data before training. One way to resolve this is to provide your own data. I just so happen to have hundreds of photos of my dogs, so choosing them for my object detector was a no-brainer. You can also look for existing datasets on Kaggle. There are so many options on Kaggle covering a wide range of use cases. If you're lucky, you'll find an existing dataset that serves your needs and it might even already have annotations!
MediaPipe Model Maker accepts data where each input has a corresponding XML file listing its annotations. For example:
There are several software programs that can help with annotation. This is especially useful when you need to highlight specific areas in images. Some software programs are designed to enable collaboration–an intuitive UI and instructions for annotators mean you can enlist the help of others. A common open source option is Label Studio, which is what I used to annotate my images.
So expect this step to take a long time, but keep in mind that it will take longer than you expect.
If you're anything like me, you have a wonderfully grand idea planned for your first custom model. My dog Ben was the inspiration for my first model. He came from a local golden retriever rescue, but when I did a DNA test, it turned out that he's 0% golden retriever! My first idea was to create a golden retriever detector – a solution that could tell you if a dog was a "golden retriever" or "not golden retriever". I thought it could be fun to see what the model thought of Ben, but I quickly realized that I would have to source a lot more images of dogs than I had so I could run the model on other dogs as well. And, I'd have to make sure that it could accurately identify golden retrievers of all shades. After hours into this endeavor I realized I needed to simplify. That's when I decided to try building a solution for just my three dogs. I had plenty of photos to choose from, so I picked the ones that best showed the dogs in detail. This was a much more successful solution, and a great proof of concept for my golden retriever model because I refuse to abandon that idea.
Here are a few ways to simplify your first custom model:
So when it comes to your first ML training experience, remember to simplify, simplify, simplify.
Simplify.
Simplify.
As much as I'd like to confidently say you'll get the right results from your model the first time you train, it probably won't happen. Taking your time with choosing data samples and annotation will definitely improve your success rate, but there are so many factors that can change how the model behaves. You might find that you need to start with a different model architecture to reach your desired accuracy. Or, you might try a different split of training and validation data. You might need to add more samples to your dataset. Fortunately, transfer learning with MediaPipe Model Maker generally takes several minutes, so you can turn around new iterations fairly quickly.
When you finish training a model, you're probably going to be very excited and eager to add it to your app. However, I encourage you to first try out your model in MediaPipe Studio for a couple of reasons:
With MediaPipe Studio, I was able to quickly try out different score thresholds on various images to determine what threshold I should use in my app. I also eliminated my own web app as a factor in this performance.
After sourcing quality data, simplifying your use case, training, and prototyping, you might find that you need to repeat the cycle to get the right result. When that happens, choose just one part of the process to change, and make a small change. In my case, many photos of my dogs were taken on the same blue couch. If the model started picking up on this couch since it's often inside the bounding box, that could be affecting how it categorized images where the dogs aren't on the couch. Rather than throwing out all the couch photos, I removed just a couple and added about 10 more of each dog where they aren't on the couch. This greatly improved my results. If you try to make a big change right away, you might end up introducing new issues rather than resolving them.
With these tips in mind, it's time for you to customize your own ML solution! You can customize your image classification, gesture recognition, text classification, or object detection model to use in MediaPipe Tasks.
If you’d like to share some learnings from training your first model, post the details on LinkedIn along with a link to this blog post, and then tag me. I can't wait to see what you learn and what you build!