MediaPipe is an open-source perception pipeline framework introduced by Google for building multi-modal machine learning pipelines. By using existing components, a developer can build a prototype without writing machine learning algorithms and models. The framework can be used for various vision and media processing applications (especially in VR) such as object detection, face detection, hand tracking, multi-hand tracking, and hair segmentation. MediaPipe supports various hardware and operating system platforms such as Android, iOS, and Linux by offering APIs in C++, Java, Objective-C, etc. The framework can also utilize GPU resources.
MediaPipe Components
The framework comprises three major components:
- A framework for inference from the pipeline data
- Tools for evaluation
- A collection of reusable inference and processing components
It follows the graph-based approach also found in OpenCV, and all processing happens within the context of a graph. The graph contains a collection of nodes, and each node is implemented as a Calculator. A graph is configured using a GraphConfig protocol buffer and then run using a Graph object. Within the graph, calculators are connected to each other by data streams, and each stream carries a time series of data Packets. Together, these calculators and streams define the dataflow in the graph.
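To make the graph, calculator, stream, and packet vocabulary more concrete, here is a minimal plain-Python sketch of the idea. It does not use the MediaPipe API: the `Packet`, `Calculator`, and `Graph` classes, the stream names, and the two arithmetic "calculators" are all hypothetical and exist only for illustration. It also assumes the calculators are listed in an order where each input stream has already been produced.

```python
from collections import namedtuple

# A packet pairs a payload with the timestamp it belongs to.
Packet = namedtuple("Packet", ["timestamp", "data"])


class Calculator:
    """A node that reads one named input stream and writes one named output stream."""

    def __init__(self, fn, input_stream, output_stream):
        self.fn = fn
        self.input_stream = input_stream
        self.output_stream = output_stream

    def process(self, packet):
        # Keep the timestamp so downstream nodes stay aligned with the source.
        return Packet(packet.timestamp, self.fn(packet.data))


class Graph:
    """Runs calculators in the order given; streams are looked up by name."""

    def __init__(self, calculators):
        # Assumption: calculators are already in dependency order.
        self.calculators = calculators

    def run(self, input_stream, packets, output_stream):
        streams = {input_stream: list(packets)}
        for calc in self.calculators:
            streams[calc.output_stream] = [
                calc.process(p) for p in streams[calc.input_stream]
            ]
        return streams[output_stream]


# Two toy calculators chained by the intermediate stream "doubled".
graph = Graph([
    Calculator(lambda x: x * 2, "in", "doubled"),
    Calculator(lambda x: x + 1, "doubled", "out"),
])
result = graph.run("in", [Packet(0, 1), Packet(1, 2), Packet(2, 3)], "out")
for packet in result:
    print(packet.timestamp, packet.data)
```

Each output packet keeps the timestamp of the input packet it was derived from, which is the property that lets a stream be treated as a time series.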
Use Case: Object Detection
MediaPipe comes with ready-to-use models that developers can use directly or with their own modifications. You can find all the sample models in the source tree. Object detection can be handled without consuming many system resources. ML-based object detection on a live camera feed at 30 fps usually consumes a lot of resources and is often not feasible because of long inference time. MediaPipe makes it feasible by running tracking and detection in parallel, so that neither process blocks the other.
With reference to the above diagram, object detection happens in two separate processes – a slow process for detection and a fast process for tracking. As configured in the pipeline’s graph configuration, the calculators run in parallel threads to execute these processes.
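The slow-detection and fast-tracking split can be illustrated with a small plain-Python sketch using threads. This is not MediaPipe code: the `detector` and `run_pipeline` functions, the sleep durations, and the `box@N` strings are made-up stand-ins. The sketch also simplifies "tracking" to reusing the newest finished detection for every frame.

```python
import queue
import threading
import time


def detector(frames_in, detections_out):
    """Slow path: pretends to run an expensive model on each frame it receives."""
    while True:
        frame = frames_in.get()
        if frame is None:            # sentinel: no more frames
            break
        time.sleep(0.05)             # stand-in for long inference time
        detections_out.put((frame, f"box@{frame}"))


def run_pipeline(num_frames):
    frames_in = queue.Queue(maxsize=1)   # at most one frame waits for the detector
    detections_out = queue.Queue()
    worker = threading.Thread(target=detector, args=(frames_in, detections_out))
    worker.start()

    latest = None
    results = []
    for frame in range(num_frames):
        # Offer the frame to the detector; if one is already waiting,
        # drop this frame rather than block the fast path.
        try:
            frames_in.put_nowait(frame)
        except queue.Full:
            pass
        # Pick up any detection that finished in the meantime.
        while not detections_out.empty():
            latest = detections_out.get()
        # Fast path: every frame gets a result based on the newest detection.
        results.append((frame, latest))
        time.sleep(0.01)             # stand-in for the camera frame interval

    frames_in.put(None)              # tell the detector to stop
    worker.join()
    return results


results = run_pipeline(20)
for frame, detection in results:
    print(frame, detection)
```

The main loop produces one result per frame regardless of how long the detector takes, so early frames may have no detection yet and later frames reuse a detection computed on an earlier frame.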
How to Install MediaPipe
MediaPipe supports various operating systems including Debian, Ubuntu, CentOS, Android, and iOS. It also supports installation using Docker. MediaPipe can run with OpenCV (3.x and above) and TensorFlow. Below are the instructions for installing MediaPipe on Ubuntu.
Get the latest source from the GitHub repo.
$ git clone https://github.com/google/mediapipe.git
$ cd mediapipe
Run setup_opencv.sh from the source tree to automatically build OpenCV along with FFmpeg from source and modify MediaPipe’s OpenCV config.
$ sh setup_opencv.sh
To test the environment, you can run the Hello World applications, as described in MediaPipe Examples.
References
- MediaPipe Source tree Github repo
- Google Developer Blog
- https://sites.google.com/view/perception-cv4arvr/mediapipe
- https://ai.googleblog.com/2019/08/on-device-real-time-hand-tracking-with.html