Where business and data science meet: Interview with BCG Data Scientist

GravityAI

41m 10s | 7,659 words | ~39 min read
AI audio transcription

This transcript was generated from the video's audio because no usable YouTube caption track was available. The transcript below is server-rendered so it can be read, searched, cited, and shared without opening the original YouTube player.

[0:00]Hey, this is Daniel Huss, CEO of Gravity AI, and in today's interview, we'll be talking with Medi Salmani, who's a lead data scientist at Boston Consulting Group's Digital Ventures team, as part of our ongoing series that connects business with data science. Take a listen. Medi, uh, thank you so much, uh, for taking the time today to talk with us. My pleasure. Um, as you know, we're exploring the intersection of data science and the business world, and how those two communicate and work together. Um, so I kind of want to jump right in with the first question, um, which is, uh, what is one thing that you wish that the executives, the business side of the world, knew or understood better, um, about data science? Yeah. Uh, if you don't mind, I will break down my answer into probably two, three things. Yeah. Please. Uh, the first thing is I would suggest executives learn the very basics about AI and, uh, machine learning in general. It might take them 5 minutes, 10 minutes to just understand the paradigm, the AI paradigm and machine learning, how it works, you know? And then that will help them to gauge the ideas out there, or the ideas that they have, or that people are coming to them with: is it feasible or not? Uh, the second part would be that they should not expect what people are going to do in data science to be like the science fiction movies or, um, Chinese propaganda videos that they see, where they can track where you are in a few seconds. These kind of things, uh, create huge, uh, unrealistic expectations in business, you know? If they understand, hey, these are the risks, this is the, uh, power of AI and machine learning that can help them and empower their business, that will be very helpful for them. You know, it's funny you say that. 
I think, you know, you've seen so much in the movies that has painted this picture of where this is going, um, that sometimes you have, you know, I sometimes call it like the shiny object syndrome, where, you know, I remember when, uh, you know, back in design when minority report came out, everyone wants the the hand controlled things, you know, now we're starting to see that in the world of data science. Yeah. Um, what do you think, uh, have you seen any, uh, examples in like movies that you can think off the top of your head where, like, this is actually come to fruition as it as, you're like, okay, this is close to reality. Yeah. There are a lot of things going on in Black Mirror, probably you've seen that some of them are fairly realistic. Sure. And the way that they made that movie, a lot of a lot of episodes there, uh, you can think of that might happen in the near future or long. Sure. Yeah. Uh, I think that Black Mirror is probably the best sample for AI in general. And they are, they have a lot of, I don't remember exactly the title of the episode, but I've seen multiple episodes there, which is, uh, it's pretty exciting, you know? Maybe I can do that today too. No, I might be able to do that. Yeah, but it's not real, you know? So, um, uh, I very much agree too with your point about, um, you know, from, uh, an executive standpoint, often the business world does have that kind of movie view. And obviously, there's a lot of nomenclature around machine learning and AI. Um, what, uh, can you think of any good resources? Or where where should an an executive start, do you think, like, to just start grasping some of this, um, uh, and the differences between, you know, what's possible and what's not? Yeah, uh, there are, there, I believe there are courses on Coursera, like very short courses for executives, designed for executives, or LinkedIn Learning, or I'm sure in like bigger organizations, they have data science teams. Yeah. 
Just sit down with one of the people who have some understanding also about, uh, business to explain for them for 10 minutes, or maybe half an hour, they can explain for them what AI machine learning and what it can do for them. Right, right. You know? That's probably the, I would start, and also HBR, uh, Harvard Business Review has multiple articles around in that space. It can be very helpful too. Yeah, those are great. Um, all right, so let's do the flip side of that question now, like, so, uh, uh, if you're an executive, right? So, um, obviously they have certain goals that they're trying to achieve. Um, what's one thing that you imagine that they wish data scientists knew about their side of their business? Uh, I would say probably the most important thing is that business people, they are looking for impact. Yeah. A lot of time, the dollar amount next to that. Yeah, yeah. On the data science side, especially if they are PhDs coming out of quantitative fields, and what they want to do or what we want to do is doing some cool stuff. Yeah. We may not necessarily care about KPIs that we have to meet. Yeah, yeah. We don't care about the impacts at the end of the time, or the execution time, you know? And another thing I found that it's very interesting for usually business leaders, too, they want to see is the risks, uh, which are coming with these type of systems. Sure. Especially, they are very complex models. And if they are going to replace that, what are the risks around it? And data scientists if they can explain that elaborate that, hey, these are the risks, these are the limitations of the models systems that we are building, which, it it can help business leaders to think and decide better. Well, um, could we maybe elaborate just a little bit more on that? Because I think that's a really good point. And, uh, you know, I have an example in my own head of of of an instance of that. 
But could you, um, maybe talk a little bit more about some of the types of risks that you see that are associated, um, uh, with building models and how they relate back to that kind of business function and things that like up and coming data scientists should start to become aware of? Yeah, uh, sometimes that you have, generally your machine learning models, uh, might be limited to the range of the data it has seen. Yeah. And when it's go to an unseen range of the data, not unseen data, like outside of the range of the data it has seen before, you know? Imagine that you are a ride sharing company, and you have your own pricing, the data for your pricing for just LA area. And if you have a request in Santa Barbara, what is your machine system is going to do? You have to be careful about that. Yeah. You know? That's the limitation when you are building the model, you have provided a lot of data for LA area. Yeah. If it goes outside of that, it might mess up, unless that the data scientist or the whole system design team, they understand, hey, these are the risks, and they are mitigating it. Yeah. They have to make sure that they are going to see these kind of, uh, issues. Yeah, yeah. And sometimes it's it needed to be communicated with the business team. I I think that's, um, really interesting. I've had an experience where sometimes machine learning or models are viewed as like almost binary, it works or it doesn't work, and that's not the case, right? Like, there there's a range of possibilities here. Exactly. Um, how how have you, uh, seen effective ways for teams to kind of communicate that out, um, and and maybe what are the right questions that executives should be asking? Yeah. I believe that these two questions, what are the, uh, risks of the this model? Okay. That, hey, you are coming up with this demand forecast for me. Sure. How confident are you there? Yeah. You know? And it's good always to do some fact checking, you know? 
You are going to provide the data to a data science team, a consulting team. Yeah. Sometimes keep a little bit of data for your own sake to check it later against their model's output, you know? And finding out what the confidence level is helps you to make a much more informed decision. Yeah. That's under risks; also, under limitations, it's good to know, hey, this is the limitation. We are not going to abuse the system or misuse the system for other things. And when you say confidence level, Yeah. Um, you know, that's literally translating into a percentage, right?
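
The advice above, to hold back some data and check it later against the model's output, can be sketched in a few lines of Python. This is a minimal illustration only: the demand figures, the 10% tolerance, and the helper names `share_within` and `mean_abs_pct_error` are all hypothetical, and the "share within tolerance" is a crude empirical stand-in for a confidence level, not a formal prediction interval.

```python
# Hypothetical held-out months: (actual units sold, model forecast).
# These numbers are invented for illustration only.
held_out = [
    (10_200, 10_000),
    (9_400, 10_000),
    (11_900, 10_500),
    (10_100, 10_300),
    (14_800, 11_000),
]


def share_within(pairs, tolerance):
    """Fraction of held-out periods where the actual value fell within
    +/- tolerance (as a fraction of the forecast) of the forecast."""
    hits = sum(
        1 for actual, forecast in pairs
        if abs(actual - forecast) <= tolerance * forecast
    )
    return hits / len(pairs)


def mean_abs_pct_error(pairs):
    """Mean absolute percentage error of the forecasts against the actuals."""
    return sum(abs(a - f) / a for a, f in pairs) / len(pairs)


coverage = share_within(held_out, tolerance=0.10)
mape = mean_abs_pct_error(held_out)
print(f"Actual demand within 10% of forecast: {coverage:.0%} of held-out months")
print(f"Mean absolute percentage error: {mape:.1%}")
```

A business owner could run a check like this on data the modeling team never saw, then decide whether the observed hit rate is good enough for the decision at hand.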

[8:04]Like, um, accuracy will be a bit different. Different, yeah. Accuracy is the KPIs and metrics that you are going to look at that, uh, but the confidence is, uh, sometimes that, as I said, in demand forecasting, for example. We come up with some demands for this specific company, for the logistics and, uh, like we are going to expect to sell 10,000 pieces of X next month. Right, right.

[8:29]How confident are you there that that forecast is correct? Yeah. What if I'm going to have 15,000, uh, requests, you know? Yeah. If I don't have the confidence level around that and I'm not prepared, I might lose a customer. Yeah. There are real business implications for having that, you know, so they can make a better informed decision, you know? If you come in and you're saying, I'm 10% confident in that, you know, I'm like, oh, right? Exactly. And sometimes, for some businesses, you might need 95% confidence; sometimes 80% is enough, you know, to make the decision. Yeah, yeah. And make it worth it. Exactly, but it's important for the business leaders to understand that. So on the accuracy side then as well, um, there are important decisions to make there too. So like, not necessarily forecasting, but like there could be user experience implications and other business implications. Does anything come to mind on that side that you've encountered? Yeah. As far as the risk is concerned. On the accuracy side, uh, it's important to understand the different types of errors, the mistakes that your system might make. Imagine that you are building an autonomous vehicle. It sees an object next to the road. Whether that's a human or not can matter a great deal, you know? It can be very crucial to understand: if there is a high chance of this being a human, probably you need to slow down or even stop. Yeah. You know? If you mistake a human for an object, that can probably kill a program or a project, you know? The consequences are pretty severe, right? Exactly. You understand, but if you make the opposite mistake, an object detected as a human, probably it's less important, you know? If you stopped, yeah, there will be some dissatisfaction for the driver, but it's much less important compared to hitting a person, you know? So that's, you know, a direct business decision, right? 
Um, discomfort of, you know, perhaps stopping too much or too early, versus the tradeoff of a substantially larger issue, and those things need to be communicated. Yeah. False positives and false negatives, which you have to identify and balance. And the value assigned to each of them. Yeah. Do you, um, so on that, do you think, and I'm just guessing here, do you think too often, um, the business side is relying on the data scientists to just make those decisions in a vacuum? Or do you think that they have a good understanding of how those might actually come into the business? Yeah. If companies have just started their own data science division or just started working on data science projects, they may not know they have to communicate these things to the data science team. And if the data science team is not very experienced, they may not find that out soon enough, you know? Yeah. These are the things that probably need to be communicated from the beginning. Yeah.
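
The point about balancing false positives and false negatives, and assigning a value to each, can be made concrete with a small sketch. Everything here is assumed for illustration: the detector scores, the two cost figures, and the function names `total_cost` and `best_threshold` are hypothetical, not anything described by the speakers.

```python
# Hypothetical detector output: (score that the object is a human, is it truly a human?)
scored = [
    (0.95, True), (0.80, True), (0.40, True), (0.15, True),
    (0.70, False), (0.30, False), (0.10, False), (0.05, False),
]

COST_FALSE_NEGATIVE = 1_000_000  # assumed cost of treating a human as an object
COST_FALSE_POSITIVE = 50         # assumed cost of an unnecessary stop


def total_cost(examples, threshold):
    """Total cost if every score >= threshold is treated as a human."""
    cost = 0
    for score, is_human in examples:
        predicted_human = score >= threshold
        if is_human and not predicted_human:
            cost += COST_FALSE_NEGATIVE   # missed a human
        elif predicted_human and not is_human:
            cost += COST_FALSE_POSITIVE   # stopped for a non-human object
    return cost


def best_threshold(examples, candidates):
    """Candidate threshold with the lowest total cost (ties go to the lower threshold)."""
    return min(candidates, key=lambda t: (total_cost(examples, t), t))


candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
choice = best_threshold(scored, candidates)
print(f"Chosen threshold: {choice}, total cost: {total_cost(scored, choice)}")
print(f"Cost at a naive 0.5 threshold: {total_cost(scored, 0.5)}")
```

With costs this lopsided, the cheapest threshold is a very cautious one that accepts extra unnecessary stops. The cost figures are exactly the kind of input the business side would need to supply.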

[11:16]Hey, these are the important factors for us. These are the KPIs that we want to see. These are the dollar amounts assigned to each kind of mistake that you might make. Yeah. You know? And based on that, if they define that well from the beginning, it will be much better for the data scientist and also for the business leader. So getting the business leaders to really kind of map out their priorities, right? Like the most important aspects of this and, for them, what they see as the key risks of, you know, the product or project that they're working on. Yeah. And generally, business leaders, they understand the dollar amounts. Yeah. You know, the business people can come up with the numbers for each kind of mistake, whatever the, uh, errors that the system might have. Yeah. So actually on that subject, um, I've seen a couple of numbers that suggest that nearly nine out of 10 internal data science projects fail. Yeah.

[12:07]Um, uh, could could you give us your take, uh, on why you think that is, and, um, there's probably a multitude of of things that impact that.

[12:22]Yeah.

[12:25]Uh, I'm a data person and I would love to see how they came up with that number. Yeah, yeah. I want to see the sample size, where they went. But if I want to guess, probably, uh, uh, I'll put a, uh, credit right here, we'll link to the article. Uh, I would probably say that one of the main issues can, uh, come from miscommunication. Yeah. If you don't communicate well what you want from the data science team, or you have very high or unrealistic expectations, that might fail. Yeah. And sometimes it all goes back to the movies. Yes. Exactly. And there are other issues like lack of data, proper data, and, uh, sometimes also on the data science side, they might underestimate the amount of time they need or what they are going to deliver. Ambiguity. How you are going to define the project, what you are going to deliver, what are the KPIs that you are going to meet. These are the things that probably need to be, uh, well-defined at the beginning, before they even start the project. Yeah. Um, now, in regards to that, uh, you mentioned the time that it's going to take. In your experience, like, how long does it take, uh, to create a model, right? Um, just to get the model up and running, and then there's obviously optimization time after that. What types of ranges have you seen that, on the business side, they might be surprised to hear? Um, uh, when you want to get a model just up and running and tested, uh, you generally need to prepare your data first and have the data ready to do your analysis on that. Uh, how long that might take you depends on whether the data is ready or not. I assume that your data is ready. If the data is ready, uh, building an initial model probably might take you a day or two. Nice. Yeah. And you can get an initial idea of what's going on there. This would be to understand the feasibility of the project, right? 
Like how much additional effort will it take? Yeah, yeah, yeah. Going from a model to gaining insight or turning that into a product is a much, much longer way. Yeah. I remember I built a, uh, I was building a product, a computer vision product for one of the companies. Uh, they were assigning around 6 months for that project. And after the first day, I built an initial model. Yeah. Like, they were saying, oh, if you can do that in a day, what can you do in 6 months? I said, no, no, wait, wait, wait. This is just a model. Turning that into a product might take us the next 3 to 6 months. Yeah. And it took us around 3 months to turn that into a product, you know? One which their users, the people, could use to, like, shop for a product and get what they wanted. Yeah. Um, uh, now, that sounds like that actually went fairly smoothly, right? Yeah. And partially because you did this kind of feasibility test up front, right? Like you knew what you were getting into. Yeah.

[15:20]Um, but if if, you know, it's true that internal data science projects don't make it into production that often, and and some are going to fail. Um, I've literally heard an executive use the words, this can't be another science experiment. Yeah. Uh, what what would you say to that? Uh, the mindset of experimentation I think it's growing, you know? Uh, it it was more on the tech companies' side, uh, but it's growing more and more in business. Yeah. And on the data science side, I think that if we, uh, there are different kind of data science projects. Sometimes that you just need to do some initial modeling and analysis, and report what you have. You don't need to like productionize it necessarily. Sometimes that you want to build a model, every few years you might use it. Again, you may not productionize it. But when you want to productionize it, there are a lot more resources that you need around that. And sometimes people don't see that part. Maintenance or turning a model to a product, it takes a lot of effort. Probably more than what they expect, and some maintenance of the data products are, uh, AI products are usually heavier than software products. Let's actually dive into that a little bit. Um, so let's split that into the maintenance component and then like the the product side of it. Yes. When you think about taking a data science project and turning it into a functioning product, Yeah. What, um, what should folks, uh, on the business side be thinking about in terms of things like resourcing and how the team is structured in in addition to that, um, and the types of, uh, um, additional budget that would need to go into that as well? Yeah. Um, if I want to do that, probably I would prefer to have a very small team at the beginning to run some science, uh, experiments, you know? Yeah.

[17:13]I would not call it science experiments, but feasibility experiments. Yeah. And to see how we can bring in the data, the different kinds of data sources that we need, and also whether the KPIs that they want are feasible or not. You get a sense of that within maybe a month or two, or sometimes within a week you might get a sense of whether it is feasible or not. When you get to that point, uh, you have a much better idea of how you can, uh, build on top of that to turn it into a full-fledged product, you know? Certainly.

[17:44]Uh, it might take two months, it might take 8 months, or it might take a year to turn that into a product or a business around that. So it sounds like this initial feasibility experiment that we're talking about is fairly critical. And it's worth, you know, have you had a situation where you ran it and you're like, okay, this is just not going to be feasible, and like how do you communicate that out? What does that look like, to tell that story? To pivot? It has happened that we tested ideas where the expectation, for example, the expectation in the accuracy, is 90%. Yeah. And if I feel that I can get to 80%, especially if it's in an area in which I have a lot of experience, I probably can say if it's feasible or not, sometimes.

[18:44]It'd be extremely difficult, yeah. And, uh, but, uh, if I get it to 50%, I would say forget about that. Yeah. Usually that we try to get to 70%, 80%, or very close to what we want in terms of the final KPIs to make sure it's it's possible to deliver next 3 to 6 months. Yeah.

[19:04]Because especially in the environment that I'm working right now, uh, it's very fast paced, and we need to deliver that within 3 months, or something within 6 months that yeah, we have to deliver a full-fledged product. Yeah. Um, uh, now, I know the answer to this, so I'm kind of just fishing a little bit. Yeah. But, um, uh, how would you describe to a business person the difference between getting to 80% and from getting from 80% to 90% and 90% to, you know, 95%? Like, what is that look like, and, um, what types of expectations should a business have around that? Yeah. I would say, even before that, getting to 80% or 90%, preparing the data and having the data ready is a critical thing, you know? Some companies, especially tech companies, they have their data ready. Yeah. You go, you get I I just you need to have access to that specific data set, you know? That can be that can happen within a few seconds, you know? Right, right. But in a lot of like traditional businesses, the data may not be ready, the data might be sitting in different areas, uh, in silos. Yeah. And or the data has not been collected over time. You might need to even build a pipeline. You need to build that the mindset of we want to turn this business to a data driven business, you know? Yeah. If that didn't happen yet, forget about 80%. Let's build the infrastructure, you know? Yeah. Gotta start there. Exactly. But when you get the data, and you want to start what kind of data you need, you know? That's also another question. If I am running a business without having any infrastructure to collect data, what can I do there? There I would come up with building a very small or simple data collecting pipeline, not necessarily again, going to turn everything digital or to collect the data. No. I want to see if there is any value or also what we want to get out of that. Right. You know? If I have limited budgets, I will go like that. 
If I have unlimited budgets, probably I will turn everything digital. Let's build that entire infrastructure. And I will collect as much data, as many signals as possible, and later I will build models on top of that. Yeah. But when you have the data ready, uh, usually you can, uh, get, uh, accuracy. What I'm saying is a very, very, uh, rough estimate. It doesn't happen on every project. No one's holding me to this. Uh, you might get it to 70, 80% within a few days or sometimes a few hours. Right, right. But when you want to get that to higher rates like 85, 90%, it might take you a few months, and when you want to get to 99.9, it might take you a few years, or an even longer time, you know? Yeah, the, uh, there's exponentially more work, right? To get just that little bit of extra, you know, accuracy or confidence out of it. Yeah. And sometimes you may not necessarily go through just the traditional route, just one model. You might need to train different kinds of models for different kinds of situations. That adds a lot more to the complexity. Right, right. Um, uh, so I like talking about the kind of data infrastructure thing because that transitions into kind of one of the next questions that I have, and I want to jump back into that a little bit. But, um, uh, I'm seeing a lot of articles, uh, you know, with a title like death of the data scientist. Um, uh, is this happening? And how do you see the role of a data scientist changing, um, over time in response to this? Um, and, uh, you know, are we going to see, you know, um, uh, different technologies that are replacing data scientists? Is that happening? Yeah. Uh, if I want to, like, uh, answer in short, I don't believe in that. Yeah. Data science is probably like software engineering in the early 2000s. Yeah. People were saying, oh, so many people are going into computer science, you know? 
And, uh, I remember when I wanted to go to school, I was saying, how many software engineers, or how many computer engineers do they need? Yeah. Right, right. But over time we see, is the demand for software engineering dropping? Probably not. Yeah. It's growing, you know? Yeah. The same for data science. But what's happening with data science is that five years back, 10 years back, companies didn't know what they wanted out of a data scientist. Right. They would open a division, a data science division, and, what do you need? I don't know, let's put 50 different things on the job requirements and bring people here, while they needed a person who was very proficient with Excel or with Tableau or some other software. Sure. But later they found out, okay, I need some analysts, I need some data scientists who are like expert specialists in pricing, for example, or specialists in demand forecasting, or who know natural language processing. Yeah. These are like, companies are growing more mature. They know better what they want. On the other side, there is a lot of technology coming out there. Big cloud service providers, like Google, Amazon, uh, Microsoft, they are providing a lot of APIs, which help people to do scrappy things very quickly. Yeah. And that might give you the idea, oh, doing object detection in computer vision is that simple. Why do I need a computer vision expert, you know? Yeah. Turning that into a product is a very different story, you know? And sometimes you may not want to, if you have, like, not 10,000 but hundreds of thousands or millions of pictures that you want to process. You don't want to do that with, uh, those cloud APIs. Right, right. It's probably very expensive. You might want to do that on your own with some more sophisticated models that you are going to develop with your custom data, you know? Yeah. That's where you need the data scientist. 
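
The build-versus-buy point above, that a cloud API is fine for quick tests but can get expensive at millions of images, comes down to simple break-even arithmetic. The sketch below uses invented placeholder prices, not real vendor pricing, and the function names are hypothetical. It also ignores the ongoing maintenance cost of an in-house model, which would push the break-even volume higher.

```python
# All prices are invented placeholders, not real vendor pricing.
API_COST_PER_IMAGE = 0.0015      # assumed pay-per-call price of a cloud vision API
CUSTOM_FIXED_COST = 60_000.0     # assumed one-off cost to build and deploy an in-house model
CUSTOM_COST_PER_IMAGE = 0.0001   # assumed compute cost per image once deployed


def api_cost(n_images):
    """Total cost of processing n_images through the pay-per-call API."""
    return n_images * API_COST_PER_IMAGE


def custom_cost(n_images):
    """Total cost of building the in-house model and processing n_images with it."""
    return CUSTOM_FIXED_COST + n_images * CUSTOM_COST_PER_IMAGE


def break_even_images():
    """Image volume above which the in-house model becomes cheaper than the API."""
    return CUSTOM_FIXED_COST / (API_COST_PER_IMAGE - CUSTOM_COST_PER_IMAGE)


for volume in (10_000, 1_000_000, 100_000_000):
    print(f"{volume:>11,} images: API {api_cost(volume):>10,.2f} "
          f"vs in-house {custom_cost(volume):>10,.2f}")
print(f"Break-even at roughly {break_even_images():,.0f} images")
```

Under these assumed prices the API wins comfortably for a ten-thousand-image feasibility test, while the in-house model only pays off at very large volumes.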
And also there are concepts like AutoML, which are coming out, and people are saying that, hey, you can push a button and get results out of that. Sure. There is a lot more to data science, uh, compared to, uh, having just an AutoML system and just pushing the button. You need to understand the data, you need to prepare the data, do feature engineering, a lot of those things. I doubt it's very easy to, uh, do that with AutoML. And the most important thing is the KPIs and business needs and impacts that you want to drive out of the data science problem that you have. And I don't think it will happen that easily in the near future, to get rid of the data scientist, and, oh, I'm pushing a button. You know, uh, critical thinking never goes out of style, right? Yeah. Um, uh, and, uh, no amount of AutoML is going to be able to understand, um, whether or not this is working towards the business's objectives, right? Um, so, uh, I do get a sense though that, you know, new data scientists entering the field are kind of just focused on that aspect of it, though. Yeah. Um, they're focused on the how do I just produce a model. Um, uh, would you agree? Do you see that? What would you tell some of the new data scientists, uh, entering the field now with the prevalence of these types of technologies? Uh, I would say think about what you can bring to the table as a data scientist in addition to AutoML or all of these technologies out there. One thing is that first you have to be a practitioner of machine learning. You can probably utilize these APIs, these ready-to-use tools, to produce that 70%, 80% accuracy much faster. Yeah. You know? That also gives confidence to the business leaders that, hey, if we start bringing in a team which can build a customized model, or a customized system here in house, that can like exceed 90, 95%, probably, you know? Yeah. 
There are a lot of open source tools or AutoML tools or APIs available out there. They can help new data scientists, or probably all data scientists can benefit from them, when they want to start a new project to just test the idea. Yeah. Later you might want to write your own algorithms, or combine multiple algorithms to improve that. So, uh, we just unpacked a lot on the, um, uh, thinking about the build versus buy implications. Earlier in the conversation, there's an aspect that I'd really want to get back into, which was, uh, when we first started talking about the death of the data scientist and infrastructure. I think these two things start to become related because, um, you know, in the comparison that you made with software engineering, um, you start to see much, much more specific types of engineers and data scientists. You mentioned, you know, someone who focuses on NLP. Yeah.

[27:45]And businesses, um, uh, were just kind of blanketing their resumés with 50 things on there. Yeah. Um, what are some of those kind of more specialized verticals that you're seeing, and then, um, in addition to that, could you maybe speak towards the skill sets that are required on the data management side, versus the data science side, and if there is a difference between those? Because it does seem like there's, um, uh, a fair amount of time, obviously, involved in managing and cleaning data, and companies are hiring the same role to do those parts as well. Yeah. Uh, on the skill side, uh, you're definitely seeing people more and more specialized in different areas, like NLP, and within NLP there are different, like, divisions. People are experts in natural language generation, natural language understanding, natural language processing, and they do like PhDs there, you know? Yeah. And they are experts there. There are a lot of products coming out. NLP is probably the hottest area right now, and computer vision probably was 2 years ago or something like that, and in computer vision by itself there are plenty of, uh, subfields that people are working on. Yeah.

[29:00]There are a lot of things are transferable, that you, for example, you learn deep learning, it's it helps you to utilize that, use that for computer vision or NLP, for this is like the foundation for that or certainly. A little bit like LSTM or, uh, any of those time series related deep learning type of model, you can use that for time series, demand forecasts, kind of like problems. Yeah.

[29:22]And that can be like deep learning to me is a foundation. You learn that properly and use that and build on top of that expertise in computer vision, natural language processing, or time series analysis. The other side is that you have to like definitely learn the basics in statistical learnings. There are variety of, uh, very good, powerful statistical learning techniques out there. Yeah. Random forest, these kind of algorithms are still very powerful, very useful. And stats is definitely needed. Yeah. And, uh, your question was around that. Am I right? That's correct. Yeah, yeah, yeah. And then the, uh, uh, it was a long question. Yeah.

[30:06]The second part of that is, you know, um, uh, the the difference between the kind of data management side and the data science side, um, uh, because, you know, I think what I I see sometimes is, you know, you hire a team of data scientists and then again, they're spending, you know, 80 90% of their time just cleaning and managing data. Yeah.

[30:24]Um, is there a shift in in roles happening now? Is is should should the business side when they're writing up, um, you know, job descriptions, be thinking about different skill sets for that? Yeah. Before to get this question, I would like just finish the one thing around the first question. Is like, uh, a lot of time that when you want to like develop a skill set, uh, it's good to find out what kind of industry you're interested to work. You know? What line of industry you love to work on for next 5, 10 years, you know? If you want to do like more of a computer vision related type of things, probably we'll focus on that and learn more. If you want to go, if or you are in an industry right now, which needs, uh, which needs like some specific type of, some specific type of analysis. Go and learn and then see what machine, how machine learning has been used in that other space, you know? Yeah.

[31:12]Back to your question about the difference between data management and data science, and whether there is any shift these days in that space. Yeah. I would say what I like to see in a resume, if we are going to hire people, is a full stack data scientist: a person who can get the data, clean it, prepare it, run some models, turn that model into an API, and productionize the model. Right, right. And run a lot of tests. To me, they have to know all of these steps, and they have to be comfortable doing them. Yeah.

[31:53]In a lot of places you see data scientists who don't want to touch the data before it's ready to use, you know? I prefer to have people who are not like that. The reason behind it, I think, is that if you don't touch the data, if you don't play with the data, if you want someone else to take care of your data, you probably will not have a good understanding of it. Yeah. If you want to run just a single model on a data set, it will probably be okay. But when you are building a product, and in a foreign industry, you usually need to take care of a lot of issues, a lot of limitations, and you might need a very thorough perspective on the whole data which is provided and on what kind of model or models you are going to build, and to understand the limitations, risks, and so on, you know? I think if you play more with data, get intimate with data, it helps you come up with better models and systems. Yeah. So what I'm hearing is that if you're not in your own sandbox, that sandcastle is not going to turn out right, you know? Exactly.

[33:00]Like, you've got to know it in its deepest format. Exactly. I like that, get intimate with the data. Exactly.
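
As an editorial illustration of the "full stack" workflow described above (get the data, clean it, run a model, turn the model into an API, run tests), here is a minimal sketch in standard-library Python. Nothing in it comes from the interview: the field names `x` and `y`, the sample rows, and the helper names `clean`, `fit_line`, and `make_handler` are all hypothetical, and a plain least-squares line stands in for whatever model a real project would use.

```python
import json
from statistics import mean


def clean(rows):
    """Keep only records whose 'x' and 'y' fields are present and numeric."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((float(row["x"]), float(row["y"])))
        except (KeyError, TypeError, ValueError):
            continue  # skip missing or malformed records
    return cleaned


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return slope, y_bar - slope * x_bar


def make_handler(model):
    """Wrap the fitted model in a JSON-in, JSON-out function (an API stand-in)."""
    slope, intercept = model

    def handle(request_body):
        x = float(json.loads(request_body)["x"])
        return json.dumps({"prediction": slope * x + intercept})

    return handle


# Hypothetical raw data with the kinds of defects cleaning has to deal with.
raw_rows = [
    {"x": 1, "y": 3},
    {"x": 2, "y": 5},
    {"x": None, "y": 4},   # missing value
    {"x": 3, "y": 7},
    {"x": "bad", "y": 1},  # non-numeric value
    {"y": 2},              # missing field
]

points = clean(raw_rows)
model = fit_line(points)
handler = make_handler(model)
print(handler('{"x": 4}'))
```

In a real product the handler would sit behind a web framework and the tests would cover far more cases, but even at this scale the person who wrote `clean` knows exactly which records were dropped and why, which is the point being made about touching the data yourself.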

[33:10]Um, uh, my last couple of questions for you. Yeah. Um, uh, and we've touched on a lot of this already, but, um, one of the things that I'm seeing the most of, you know, traveling around social media on LinkedIn is that Venn diagram with, you know, stats, computer science, and the business category. Yeah.

[35:00]And in the middle of that Venn diagram is the data scientist. We've obviously been focused a lot on the business side of things, and if we're picturing that Venn diagram, I want to zoom in on that circle. Yeah. What are the types of skills that you think are most critical for the data scientist within that business circle? Because I understand computer science and stats, right? They're fairly well self-contained. Yeah.

[35:29]But business is just a broad thing to have there. Yeah. So what are some of those skills? We talked about identifying the KPIs and working with them, but what would you say are the most important ones? Yeah. Communication probably has everything in it, but it's still very broad to just say communication. It includes understanding the KPIs, understanding their impact, and understanding the dollar amount assigned to that: the performance that you are going to have, or the mistakes that you might make. These are the things that should be important for data scientists prior to building the model and system, you know? Yeah.

[36:14]And, as I said, also understanding the time limits: if you have one week to deliver or a year to deliver, you might do very different things. Yeah. And again, a lot of people in data science have a PhD background in various quantitative fields. They might come from a research mindset: hey, I'm going to solve this problem in a year or two, or this is 5 years. No. In business, it's important that we time box that. Yeah. We want to see the impact in a limited time. Yeah. Are we going to deliver that or not? Yeah. Driving that impact might happen just through running some simple regression, you know? Yeah. You don't necessarily need to run heavy deep learning models, you know? Yeah. Technology for the sake of technology. Yeah. And there are a lot more aspects. What is the computational budget that we have? What is the speed that we need? These are things around the problem that we have to communicate prior to solving the problem. Yeah. Again, we touched a little bit on this, but how would you recommend data science teams be built out? If you're a business and you have a data science problem that you're trying to solve, that you're trying to actually turn into a product, what are the key roles that the data scientist should be interacting with?

[37:40]Yeah. Like, who are the other people outside of the data science organization that they should be having regular touch points with, since communication is so important? Yeah. On the business side, it's important that the data science manager or lead has a frequent touch base with the business leaders, the CEO, general managers, to understand what the business needs are and why we are really doing this. What's the expectation, and managing the expectation, you know? Yeah. That's probably a very critical role for a data science leader there. Managing expectations in any situation can be very tough.

[38:16]I think that data scientists have an extra layer of difficulty, right? Exactly. And also, in a lot of data science problems, you are in an experimentation mode. Yeah. Something might have worked for other use cases, but when you come to this specific use case, you need to test. It may not work properly, and there might be a variety of reasons. It's important that this expectation be managed, you know? And also time-wise, it's not like software, where a lot of problems have been solved before. Yeah. You say, oh, somebody built that API, built that app, built that website, you know? These are the repeatable things. It should be fairly easy to estimate the time, the budget, the team. But in data science, it may not be as easy to estimate as software projects are. Right, right. Other than that, for the data science people in the team, it's important to communicate constantly with the product managers. Yeah. And to see what KPIs are needed, where they are, you know? Any changes that are needed, or any issues that they have, you know? To make sure that the product will be delivered on time and, at the same time, that they are working on the right thing, you know? Yeah. We tend to work on what we like to work on. Yeah. And it might divert you from the important stuff, you know? There are lots of cool things out there to work on, you know? Yeah. They're not always the things that need the most attention. You know, as a product manager, I feel that. I feel you. My last question for you: what do you think is one of the most important things that you would tell a data scientist coming up in the field today? Again, I think we've touched on this a little bit, but if you had to narrow it down to one thing, what would you want them to know?
It will probably depend on the field or industry they are going into, but it's definitely important to know the foundations of data science. And in whatever business you are going into, it's important to get to know the business very quickly. One of the suggestions I got when I was finishing my PhD, before I started my first job after the PhD: one of my friends told me to go and learn the acronyms of that company, that technology. Yeah. The jargon. I was thinking, how many acronyms do they have? I found there were a thousand acronyms, if not more. People in different groups were using them. Yeah, yeah.

[40:37]It seems that everybody knew that. Yeah, yeah.

[40:41]But it took a while for me to learn that. Absolutely. And that helps you to understand the data and the business better, you know? And the other thing I would say is: question. Whatever you are working on, make sure you understand what the impacts are, what your time frame is, all the aspects of the problem. Question that as much as possible, not in a negative way, more out of curiosity. That curiosity is very important. Medy, thank you. Likewise, it was a great pleasure. Thank you. Cheers. Thanks.
