Demystifying Data Science | Mr.Asitang Mishra | TEDxOakLawn

TEDx Talks

15m 21s | 2,163 words | ~11 min read
Auto-Generated

[0:03]My name is Asitang Mishra, and I work with NASA Jet Propulsion Lab in Pasadena, California. They are the ones who've been repeatedly saving Matt Damon from his space adventures in The Martian and then in Interstellar. Well, if you've been meaning to ask me, I'm not a rocket scientist yet. In fact, just the other day, my friend's little kid was telling me how an engine really works, and he was pretty perplexed because he thought I should be the one doing most of the explaining. That's because I'm a data scientist, and you must have heard the buzz around the word. It's everywhere. Data is everywhere. It's becoming such a huge deal. There's artificial intelligence, machine learning. What else? Cloud computing, data analytics, and the ever-elusive term, data science. The one term to rule them all. So what is data science? Well, there's no structured crash course or anything, let me tell you. When people think of data science, they might think of computer programs doing a lot of cool, intelligent work. There are even articles about these really cool computer programs that have defeated the masters at their own game, like the famous Chinese game called Go, and the data scientist is the master who holds the power to this dark art of data science. So, who is this data scientist? Well, I have no clue, because it's a really new field and it's changing every second. But since I'm here, I'll try my best to tell you everything I know about it. Data science became popular in 2012, when a Forbes article declared it to be the sexiest job of the 21st century. And if you believe the people who've been trying really, really hard since then to define what a data scientist is, you'll hear things like: a data scientist is a person who's better at statistics than any software engineer and better at software engineering than any statistician.
Or a data scientist is a unicorn that bridges math, algorithms, experimental design, engineering chops, communication, and management. And this is what I told my Uber driver when he asked me, so what do you do at NASA? I was like, uh, well, I'm a data scientist, not a rocket scientist or a geologist kind of scientist, more like a computer scientist, but not a software engineer, one who makes software with a focus on data analysis, like a data analyst, but with a lot of data, but not always.

[3:09]So, did you guys get any of that? My Uber driver was pretty lost. So I would say that a data scientist has multiple skills. According to this famous Venn diagram, they are good with computers, mathematics, and a little bit of subject matter expertise as well. But I didn't include this last part that says they aren't specialists in every aspect. So, whatever I just told you, we don't have to be responsible for any of those things. Right, this is pretty cool.

[3:51]So, there's something else also. There are other things that people think about us. They think we're really smart, and they also think we're magicians who can solve all problems if you just throw enough data at us. We're like data wizards. It's good to know what the general perception of data science and data scientists is, but nothing beats a good example. At JPL, data scientists do a lot of different things. One of the tools that my team has built detects anomalies on space hardware. Now, satellites have a lot of instruments on them. All of them generate massive amounts of data, which, if you visualize it, looks something like a human heart ECG. If anything goes wrong in one of those instruments, you can clearly see it in these graphs. But it's not humanly possible for someone to monitor this 24/7. So we've built algorithms that can automatically detect anomalies in these graphs, and when they do, they tell an operator, come look at it, making their lives much easier. Now, good to know, and this is pretty cool. But what goes on behind these cool products in data science, one would ask? It's called an algorithm. It takes in data and spits out predictions or results. In plain English, it's logic. And a data scientist can do a few things in this process. They can write the logic itself.
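As a rough illustration of the anomaly-detection idea described in this segment, here is a minimal Python sketch. It is not the JPL tool, whose method the talk does not describe. It assumes a simple rolling z-score rule, and the function name `detect_anomalies`, the `window` and `threshold` parameters, and the synthetic `telemetry` signal are all hypothetical.

```python
import math
from statistics import mean, stdev


def detect_anomalies(readings, window=20, threshold=4.0):
    """Return indices of readings that deviate sharply from the recent past.

    Assumed rule (not from the talk): a reading is anomalous when it lies more
    than `threshold` standard deviations from the mean of the previous
    `window` readings.
    """
    anomalies = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        mu = mean(recent)
        sigma = stdev(recent)
        if sigma == 0:
            # A perfectly flat history gives no scale to compare against.
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Synthetic, made-up telemetry: a smooth periodic signal with one injected spike.
telemetry = [10.0 + math.sin(i / 3.0) for i in range(100)]
telemetry[60] += 8.0

flagged = detect_anomalies(telemetry)
print("Operator, come look at samples:", flagged)
```

The point of the sketch is the workflow from the talk: the program watches the signal continuously and only calls for a human when something stands out.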

[5:37]And, yeah, but we are really lazy people. We want minimum effort and maximum results, and who wouldn't want that? So, most of the time, we use a pre-written algorithm, choosing from a list of thousands in a library of algorithms, by writing just one line of code. True story. And once we have all these algorithms, we spend some time finding the right ones for our data and our problem. Now that we have the algorithm that suits us best, we spend some time tuning it to get the best results. This is a very core part of what we do as data scientists, but it's only 20% of what goes into solving problems with data science. In fact, there are meta-algorithms these days that are trying to master the art of finding the right algorithm for you and then tuning it for you. NASA JPL is also working on a project called DARPA D3M that is trying to do just that. So, even this 20% of the iceberg of data science is melting. So what's in the other 80%? We'll see. As data scientists, there's something that we do really well: we convert a human problem into a computer problem. Something like this. And this process in data science still remains, for the most part, human. Because as humans, we do it really well. We take a problem from one field, convert it into a problem in another field, and then solve it using the principles of this new field. Take this space telescope that NASA is building: to fold this huge structure inside a relatively smaller rocket, they're using the principles of origami. And these days, computers can also solve origami. But a data scientist is not going to enjoy being this awesome central part of the equation for very long, because there is yet another thing that we do really well. And that is, we automate.
So, at some point, we are going to automate ourselves and go out of business. What I'm trying to say is that we data scientists solve human problems using, say, principles of mathematics. And that's why it's so important for us to communicate our ideas in a language that is not necessarily mathematics, but one that is more human, like English.
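The "pick an algorithm from a library, then tune it" routine mentioned earlier in this segment can be sketched in a few lines of Python. This is only a toy under stated assumptions: the two forecasting functions, the `LIBRARY` dictionary, the parameter `k`, and the synthetic `series` are hypothetical stand-ins, not the libraries or data used at JPL.

```python
from statistics import mean


def predict_mean(train, k):
    """Predict the next value as the average of the last k values."""
    return mean(train[-k:])


def predict_trend(train, k):
    """Predict the next value by extending the average of the last k steps."""
    recent = train[-(k + 1):]
    steps = [b - a for a, b in zip(recent, recent[1:])]
    return train[-1] + mean(steps)


# A tiny, made-up "library" of pre-written algorithms.
LIBRARY = {"moving_average": predict_mean, "linear_trend": predict_trend}


def validation_error(predict, k, series, n_holdout):
    """Mean absolute error of one-step-ahead forecasts on the last n_holdout points."""
    errors = []
    for t in range(len(series) - n_holdout, len(series)):
        errors.append(abs(predict(series[:t], k) - series[t]))
    return mean(errors)


def select_and_tune(series, ks=(1, 2, 4, 8), n_holdout=10):
    """Try every (algorithm, k) pair and keep the one with the lowest error."""
    best = None
    for name, predict in LIBRARY.items():
        for k in ks:
            err = validation_error(predict, k, series, n_holdout)
            if best is None or err < best[2]:
                best = (name, k, err)
    return best


series = [2.0 * t + 1.0 for t in range(40)]  # steadily rising toy signal
name, k, err = select_and_tune(series)
print(f"best algorithm: {name}, k={k}, error={err}")
```

A meta-algorithm of the kind the talk mentions automates this same search, only over far larger sets of algorithms and settings.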

[8:48]So, one time I was pitching this idea to some clients and I was going on and on about it. I was like, I want to build a crawler that's going to learn from its crawl graph based on the law of delayed returns, and I saw the faces of my clients and they were really blank. So I took a deep breath and I said, well, I want to build a tool that can emulate how humans are so good at browsing the Internet, and how they get better with practice. They bought it. Another time, a client told me, Asitang, I don't like this product because it takes human input. It's not fully automated. I said, well, that's not the idea for this product at least. It is not there to replace humans, but it's there to make their job easier and better, and to help them make better decisions. Now, I'm not saying you need to sugarcoat things, but a data scientist needs to tell people how data science can help them best, even if it's not in the manner they thought it would.

[9:58]So, we saw some human aspects of being a data scientist. There's something else that's called data care, or at least I call it data care. Don't quote me on that. So what is data care? It includes finding the data. A lot of times, all we have is the problem. It would be much easier if all we had was the data. But all we have is the problem; we have no data. So what we do is try to find the right data that can solve the problem, or we try to merge data from different sources. In fact, the very reason why they sent and then saved Matt Damon in both these movies was to collect data. And once you have the data, you clean it. It's a time-consuming process, but it's very important. My computer teacher used to say, Asitang, computers are GIGO: garbage in, garbage out. And similarly, algorithms are also GIGO. If you give them bad data, they'll give you bad predictions. Now comes understanding the data. What is this data about? What are the fields about, what are the columns about? Getting a general feel for the data is very important for us. So, all of this is part of data care. Now that you have done all this hard work, it's time to show people what you've done. I'm talking about storytelling. What if I told you that NASA has sent up a satellite called SMAP that's measuring soil moisture at really high accuracy from space? You will be a little interested. What if I showed you this picture? You'll be a little more interested. Now, what if I told you that it's going to save millions of lives by predicting floods and droughts, and assisting in crop productivity? You'll be like, oh, that's really awesome. That matters to me. So, storytelling is very important in data science, but there's yet another thing that no university is ever going to teach you. And that is copying from other people's answers. Yeah. In data science, it's encouraged. It's also encouraged to show people your answers.
We don't want to solve the same problems; we want to solve new problems. And to share our work with the community in general, we do something called open sourcing our code. Our code becomes free for everyone to use, modify, and share. In fact, one of the jobs of my team at JPL is to help other teams open source their code. Like this open source rover that looks and works almost like the one that is on Mars. You can build it yourself at home, and this one has a cuter face. We also collaborate with a huge open source community called the Apache Foundation, which maintains over 300 open source projects in over 25 different programming languages. How cool is that? So, I would say data science is more about problem solving than anything else. And to prove my point, I'll tell you what happened when I asked my colleagues, so what do you guys do? They said, I work with circuit boards, I manage projects, I research proposals, I do robotics, I work with sensors, I do random techy things, whatever I like. That was actually a guy who said that. And they're all data scientists.

[14:01]So I would say that a data scientist is someone who has a knack for problem solving and knows how to use computers for it. And with so much interest in computers these days, there will be a lot of people like that in the future. In fact, you don't even need a computer science, data science, mathematics, or statistics degree to be a data scientist. A lot of the people I work with don't have one either. They have degrees in physics, mechanical engineering, economics, even psychology. These are all people who were solving problems using data and computers in their own field, and who at some point realized that they could use this expertise to solve problems in general. According to a report by LinkedIn, data science roles grew by 650% from 2012 to 2017. According to another report by IBM, there will be 2 million new jobs for data scientists by 2020. Now, if you think about the fact that there are so many free resources online to learn things related to data science, computer science, coding, programming, and solving problems in general using computers, then in essence, anyone could be a data scientist. Thank you.
