[0:00] Hello, everybody, and welcome to the AI chat show. So today we have a very special guest. He's a professor in electrical engineering and computer science at MIT and a principal investigator at the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT. And the research interests of his group include machine learning systems for autonomous applications. Today, our guest will talk about the TinyML platform and inference on it. So let's welcome Professor Dr. Joel Emer.
Thank you. Hello, everyone. I hope you can all hear me well. So I'm going to share my screen. So today I'm going to talk about a recent trend, which is TinyML, or how to run machine learning models on very tiny computing devices. So I will start by giving you a brief introduction about why TinyML and what TinyML is, then I will explain where it is used, and then I will try to dive a little bit into the technical details on how to build TinyML applications, and also how to deploy them. And I will conclude by giving you some future directions and also some recent advances and open research questions.
So before starting, I would like to give a brief introduction about myself. So I am a research scientist at MIT CSAIL, so I am part of the group of Professor Emer. And before joining MIT, I was a PhD student at EPFL, where I was working on machine learning for medical applications.
So I'm going to start with a question. So how many of you think that you have an AI device in your pocket? So when I ask this question at an in-person event, usually I can see that 80 to 90% of the people raise their hands. And actually, this is true because we use AI every day when we use our smartphones. And the reason is that our smartphones come with a lot of sensors that collect data from the environment and then process this data to provide us with smart features and an enhanced user experience. But smartphones are not the only devices that we use every day that contain AI.
So for instance, when we drive our cars, we actually drive an AI device. When we wear our smartwatches, also we are wearing an AI device because our smartwatches are continuously collecting data from our body and from the environment, and they're inferring a lot of health metrics and fitness metrics. So we are surrounded by a lot of AI devices. But what about the tiny devices? And actually, when I say tiny devices, I mean devices that are as tiny as a grain of sand, and that can work on a tiny battery for months or even years. And these devices are usually called IoT devices or embedded devices. And in some applications, we might need AI to be working on these tiny devices. For instance, let's take the example of a smart home or a smart office. In a smart office, we have a lot of sensors that are deployed everywhere, so they are deployed to monitor the environment to detect motion. For instance, if there is a person passing by in front of a camera, so they can trigger an event or they can make an action. So if you want these tiny devices to be smart enough and to extract information from the environment and to act autonomously, we need to embed AI into these devices. And this is where TinyML comes into play. So TinyML is basically running machine learning workloads, either training or inference, but mostly inference on very tiny, low-power microcontrollers. So why is TinyML a big deal now? And the reason is because with the rise of machine learning, especially with deep learning, we have seen an incredible accuracy for many tasks. And because of that, there is a lot of interest in embedding machine learning everywhere, even in tiny devices. Also, the cost of devices is decreasing tremendously, so we can afford to have a lot of devices deployed everywhere. 
Also, another reason is that a lot of advances have been made in embedded hardware, so now we have microcontrollers that have hardware accelerators, so they are powerful enough to run some machine learning workloads on them. And finally, because of privacy concerns, so with TinyML, we want to bring the computation closer to the data source. So instead of sending the data to the cloud, we process the data at the edge, and then we extract information, and then we send only insights to the cloud. So how is TinyML different from embedded AI? So TinyML is actually a subset of embedded AI. So embedded AI is about running AI on embedded devices such as GPUs, FPGAs, or some powerful microcontrollers, and these devices usually are Linux-based, and they have at least 100 megabytes of RAM. On the other hand, TinyML is about running AI on very tiny devices. So these devices are usually microcontroller-based, and they have only a few kilobytes of RAM and also a few kilobytes of storage. So you can see that the computational and the memory resources on these tiny devices are very constrained compared to the embedded devices. So why should we care about TinyML? So the first reason is about always-on applications. So if you want to have an always-on application that detects, for instance, a keyword or a wake word, or it detects an anomaly in a machine, we need to have a very tiny device that consumes low power and that can be working for months or even years on a tiny battery. And the main application for this is keyword spotting. So for instance, when you talk to your smart device and you say, "Hey Google" or "Hey Siri", there is a TinyML model that is listening to you, and that detects your keyword. And once it detects the keyword, then it sends the rest of the voice to a bigger device that can do further processing. The second reason is about privacy. So as I mentioned before, we want to bring the computation closer to the data source. 
So instead of sending your private data, for instance, your voice to the cloud, we just process it locally, and then we only send insights to the cloud. For instance, if you are detecting a keyword, you just send the flag that says that the keyword has been detected, but you don't send the voice itself. The third reason is about latency. So in some applications, for instance, in medical applications or in autonomous driving, we need to have decisions made very quickly. So we cannot wait for the data to be sent to the cloud, then processed, and then get the results back. So we need to have all the processing done locally on the device. And the fourth reason is about energy and cost. So these devices, as I mentioned, they are very tiny, so they are very cheap. So you can deploy them everywhere, and they consume low power, so you don't need to recharge them every time. So they can work for months or even years on a tiny battery. So what are the key challenges in TinyML? So the main challenge is the fact that these devices have very limited computing resources. So they have very little memory, very little storage, and they have very low clock frequency. Also, the fact that they consume very low power is another challenge because we need to optimize our models to be very energy efficient. So where is TinyML used? So TinyML is being deployed in many fields and in many applications. For instance, it is deployed in industries for predictive maintenance, anomaly detection. In the health sector, it is used for continuous health monitoring. It is also used in smart homes, for instance, for gesture recognition, or for keyword spotting. It is also used in agriculture for crop monitoring, or for soil analysis. It is used in smart cities for traffic monitoring, or for smart parking. And it is also used in defense, for instance, for surveillance or for object detection. So I will give you some examples of TinyML applications. So one example is the people detection sensor. 
So this sensor is a TinyML device that has a tiny camera, and it detects if there is a person in a room or not. And it can work on a tiny battery for months or even years. Another example is the visual wake word. So this is similar to the audio keyword spotting, but it is for images. So for instance, if you have a camera and you want to detect if there is a person in front of the camera, so the device can detect that, and then it can trigger an event or an action. Another example is the gesture recognition for drones. So instead of using a remote control to control your drone, you can just use your hands to control it. So you can make gestures with your hand, and then the drone can recognize these gestures and then act accordingly. Another example is the embedded health monitoring. So this is a device that is embedded in a shoe, and it collects data from your foot, and it can detect if you are about to fall or not. So it can alert you before you fall. Another example is the smart agriculture. So this is a device that is embedded in the soil, and it collects data about the soil, and it can detect if there is a disease in the crop or not. And it can also detect if the crop needs water or not. So how to build TinyML applications? So the first step is to collect data. And usually, we collect data from the sensors that are available on the device. And after collecting the data, we need to label this data. And this is a very important step because the quality of your model depends on the quality of your data. So after labeling the data, we need to pre-process the data. So for instance, if you have an audio signal, you need to extract features from it, for instance, MFCCs or spectrograms. And then after that, we need to design our model. So we need to choose the architecture of our neural network, and we need to choose the number of layers, the number of neurons, and so on. And then after that, we need to train our model. 
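As a rough illustration of the pre-processing step described above, the sketch below turns an audio signal into a magnitude spectrogram using only the Python standard library. The frame length, hop size, Hann window, sample rate, and test tone are all assumptions chosen for the example, not values from the talk; a real pipeline would typically use an optimized FFT and might go on to compute MFCCs from these spectra.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split the signal into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def hann(n, length):
    """Hann window coefficient for sample n of a frame with `length` samples."""
    return 0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))


def magnitude_spectrum(frame):
    """Naive DFT magnitude of one windowed frame (bins 0 .. N/2)."""
    n = len(frame)
    windowed = [x * hann(i, n) for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


def spectrogram(samples, frame_len=64, hop=32):
    """One magnitude spectrum per frame: a list of rows, time by frequency."""
    return [magnitude_spectrum(f) for f in frame_signal(samples, frame_len, hop)]


# Hypothetical input: a 1 kHz tone sampled at 8 kHz (illustrative values only).
sample_rate = 8000
tone_hz = 1000
samples = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(256)]

spec = spectrogram(samples)
# Each bin spans sample_rate / frame_len = 125 Hz, so the tone falls in bin 8.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "frames,", len(spec[0]), "bins, peak at bin", peak_bin)
```

Each row of `spec` is the kind of feature vector a small keyword-spotting model could take as input in place of raw audio samples.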
And usually, we train our model on a powerful GPU or a powerful server, because these models are usually very big and they require a lot of computation. And then after training, we need to evaluate our model. And if the accuracy is not good enough, we need to go back and iterate. So we need to collect more data, or we need to change the architecture of our model, and so on. And then after that, we need to optimize our model. So this is a very important step in TinyML because we need to make our model as small as possible and as energy efficient as possible. And I will explain in the next slides how to optimize our models. And then finally, we need to deploy our model on the device. So this is the overall workflow for building TinyML applications. So I will try to dive a little bit into the technical details on how to optimize our models. So there are many techniques to optimize our models, and the first one is called quantization. So quantization is about reducing the precision of the weights and the activations of our neural network. So usually, we train our models with 32-bit floating-point numbers. And with quantization, we try to reduce this precision to 8-bit integers, or even 4-bit integers, or even 1-bit integers. And the reason is because with lower precision, we can store more weights in the same memory footprint. And also, the operations with lower precision are much faster and more energy efficient. The second technique is called pruning. So pruning is about removing the redundant connections or the redundant neurons in our neural network. And the reason is because neural networks are usually over-parameterized, so they have a lot of redundant connections that do not contribute much to the accuracy. So we can remove them without affecting the accuracy much. The third technique is called knowledge distillation. 
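The quantization and pruning ideas described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming symmetric per-tensor int8 quantization and magnitude-based pruning; the talk does not specify either scheme, and the weight values and the 40% sparsity target are made up for the example.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]


def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Hypothetical weights for illustration only.
weights = [0.52, -1.27, 0.003, 0.9, -0.41]

q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

pruned = prune_by_magnitude(weights, sparsity=0.4)

# Storage cost: 4 bytes per float32 weight versus 1 byte per int8 weight.
float32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)

print(q_weights, max_error, pruned, float32_bytes, int8_bytes)
```

The rounding error per weight is bounded by half the scale, which is why 8-bit weights often cost little accuracy while cutting storage by a factor of four compared with 32-bit floats.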
So knowledge distillation is about training a small model, which is called the student model, to mimic the behavior of a large model, which is called the teacher model. And the reason is because the large model usually has very high accuracy, but it is too big to be deployed on a tiny device. So we train a small model to mimic its behavior, so we can deploy the small model on the tiny device. And the fourth technique is called neural architecture search. So neural architecture search is about automatically finding the best neural network architecture for a given task and for a given device. And the reason is because designing neural network architectures manually is a very time-consuming and error-prone process. So we can use neural architecture search to automate this process. So these are the main techniques for optimizing our models for TinyML. So how to deploy our models on the device? So there are many frameworks that allow us to deploy our models on the device, and the most popular one is TensorFlow Lite for Microcontrollers. So TensorFlow Lite for Microcontrollers is a framework that is developed by Google, and it allows us to convert our TensorFlow models into a very compact format that can be deployed on microcontrollers. Another framework is PyTorch Mobile, which is developed by Facebook, and it is similar to TensorFlow Lite, but it is for PyTorch models. Another framework is Edge Impulse, which is a platform that allows us to build, train, and deploy TinyML models on various devices. And the last framework is TinyML, which is an open-source framework that is developed by the TinyML Foundation, and it provides a set of tools and libraries for building TinyML applications. So these are the main frameworks for deploying our models on the device. So I will give you a brief overview about the TinyML ecosystem. So the TinyML ecosystem is composed of many components. 
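To make the knowledge distillation idea described above more concrete, here is a minimal sketch of one common way to measure how well a student mimics a teacher: the KL divergence between their temperature-softened output distributions. The temperature value, the logits, and the choice of KL divergence are assumptions for illustration; the talk only says that the student is trained to mimic the teacher.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between the softened output distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))


# Hypothetical logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
student_near = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
student_far = [0.5, 1.0, 4.0]    # disagrees with the teacher

loss_same = distillation_loss(teacher, teacher)
loss_near = distillation_loss(student_near, teacher)
loss_far = distillation_loss(student_far, teacher)
print(loss_same, loss_near, loss_far)
```

During training, a loss like this would be minimized with respect to the student's parameters, usually combined with the ordinary loss on the true labels.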
So we have the hardware, which is the microcontrollers, and we have the software, which is the frameworks that I mentioned. And we also have the data, which is the datasets that we use for training our models. And we also have the community, which is a very active community of researchers and developers who are working on TinyML. So the TinyML ecosystem is growing very fast, and it is becoming more and more mature. So I will conclude with giving you some future directions and also some recent advances and open research questions. So the first future direction is about on-device learning. So currently, we train our models on powerful servers, and then we deploy them on the device. But with on-device learning, we want to train our models directly on the device, so we don't need to send the data to the cloud. And this is a very challenging task because the devices have very limited computing resources. The second future direction is about federated learning. So federated learning is about training a global model on multiple devices without sending the data to a central server. So each device trains its own local model, and then it sends only the model updates to a central server, which then aggregates all the updates and creates a global model. And this is a very promising direction because it addresses the privacy concerns and also the latency concerns. The third future direction is about explainable AI. So currently, neural networks are usually black boxes, so we don't know how they make decisions. But with explainable AI, we want to understand how the models make decisions, so we can trust them more and also debug them more easily. And this is a very important direction, especially in critical applications like medical applications or autonomous driving. The fourth future direction is about hardware-aware neural architecture search. 
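The aggregation step of the federated learning scheme described above can be sketched as a weighted average. This is a simplified illustration, assuming each device sends its full locally trained weight vector and that the server weights each device by its number of training samples; the talk mentions sending model updates and does not specify the aggregation rule, and the client weights and sample counts below are made up.

```python
def federated_average(client_weights, client_sizes):
    """Average client weight vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical result of local training on three devices.
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [10, 30, 60]  # training samples held by each device

global_weights = federated_average(clients, sizes)
print(global_weights)
```

Only the weight vectors and sample counts reach the server in this sketch; the raw training data stays on each device.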
So currently, neural architecture search is usually focused on finding the best architecture for a given task, but it doesn't take into account the hardware constraints. But with hardware-aware neural architecture search, we want to find the best architecture that is optimized for both the task and the hardware. And this is a very important direction for TinyML because we have very limited hardware resources. So these are the main future directions and open research questions in TinyML. So I will summarize my talk with a few key takeaways. So the first key takeaway is that TinyML is a new and exciting field that is enabling AI to be deployed on very tiny, low-power devices. The second key takeaway is that TinyML is being deployed in many applications and in many fields, and it is growing very fast. The third key takeaway is that there are many challenges in TinyML, especially the limited computing resources of the devices. And the fourth key takeaway is that there are many techniques to optimize our models for TinyML, such as quantization, pruning, knowledge distillation, and neural architecture search. And finally, the fifth key takeaway is that there are many frameworks that allow us to deploy our models on the device, such as TensorFlow Lite for Microcontrollers, PyTorch Mobile, Edge Impulse, and TinyML. So I would like to thank you for your attention, and I'm open to any questions.
Thank you, Professor Emer, for this insightful presentation. It was really interesting to learn about TinyML and its applications. So we have a few questions from the audience. The first question is, what are the main differences between TinyML and edge computing?
So TinyML is actually a subset of edge computing. So edge computing is about bringing the computation closer to the data source, so it can be on embedded devices, it can be on powerful microcontrollers, or it can be on tiny microcontrollers. 
But TinyML is specifically about running AI on very tiny, low-power microcontrollers. So the main difference is the scale of the devices. So edge computing can be on bigger devices, but TinyML is only on very tiny devices.
Okay, thank you for clarifying that. The next question is, what are the challenges in deploying TinyML models in real-world applications?
So the main challenges are related to the limited resources of the devices. So we need to make sure that our models are very small and very energy efficient. Also, another challenge is the data collection. So in real-world applications, it is sometimes difficult to collect enough labeled data to train our models. And finally, another challenge is the security and privacy concerns. So we need to make sure that our models are secure and that they don't leak any private information.
Okay, thank you. And the last question is, what are the ethical implications of TinyML?
So this is a very important question. So with TinyML, we are deploying AI everywhere, so we need to make sure that we are not creating any bias in our models. So we need to make sure that our models are fair and that they don't discriminate against any group of people. Also, another ethical implication is about the job displacement. So with AI, we are automating a lot of tasks, so we need to make sure that we are not displacing jobs. And finally, another ethical implication is about the accountability. So if an AI model makes a mistake, who is responsible for that? So we need to make sure that we have clear accountability for our AI models.
Thank you, Professor Emer, for answering these questions. It was a pleasure having you on the show.
Thank you for having me.
Thank you, everyone, for watching. And we will see you in the next episode.

A Year of Change in U.S.-South Korea Relations
Korea Economic Institute of America
27m 47s3,351 words~17 min read
This transcript was generated from the video's audio because no usable YouTube caption track was available.


