Data Quality Explained

IBM Technology

3m 52s · 599 words · ~3 min read

AI audio transcription

This transcript was generated from the video's audio because no usable YouTube caption track was available. The transcript below is server-rendered so it can be read, searched, cited, and shared without opening the original YouTube player.


[0:00] Your company generates lots of data, but the business outcomes you gain from that data can be largely affected by data quality. To use an analogy, imagine you're a chef and you have the highest accolades in the industry, a highly experienced team. But when the ingredients come in, those are poor quality ingredients, picture rotten tomatoes, rotten onions. So when you go and make those entrees, the end result is poor quality, and your restaurant's reputation suffers. This is the same impact that poor data quality can have on your business, causing your company's reputation to suffer as a result.

There are a lot of different factors that can impact data quality, such as the number of sources or the size of your company. But today, I want to talk about four main qualities within data itself: accuracy, completeness, consistency, and uniqueness. And I'm going to talk about them through the lens of a lead generation company.

Starting with accuracy. Accuracy is about the current state of your data versus reality. So, for my lead generation company, imagine I'm driving traffic to a website. And all of a sudden, I get a spike in usage from bots that hit the click generation. If I don't account for this spike, when I go and pull that data at the end of the day, it's not going to reflect reality. So, it's not going to be accurate.

Next, I want to talk about completeness, which is about whether all the required fields in your data set have been filled out. So, let's say I'm launching a survey campaign, and I'm collecting names and email addresses. But I don't require these fields, so when I go and pull that data, I notice that some of my participants didn't put their name. Some of my participants didn't put their email, so when I go and pull that picture of the customer, I have an incomplete data set and an incomplete picture.

Next, we talk about consistency, which is about how uniform your data set is across different data sources. So, back to my lead generation example, let's say I'm driving traffic for a drop shipping campaign. And I have my procurement team collecting zip codes and my marketing team collecting zip codes. But my procurement team is collecting them in a five-digit format, while my marketing team is collecting them in a nine-digit format. When I go tap into both of these databases and pull the customer profile, it might be incomplete because those zip codes don't match up across my systems.

And lastly, there's uniqueness, which is largely tied to the number of duplicates I have in a data set. So, in my lead generation context, you can imagine having 50,000 leads at the end of the year. But when I actually go into those leads, I realize that 20% are duplicates from customers who filled out the information previously. So now when I go and pull that report, I actually have 20% less data and a much less positive-looking picture for my company.

So, looking at these aspects, it's easy to think, wow, there's a lot of manual inspection here. How can I go through all of my data and understand these qualities? Well, you can actually leverage machine learning and AI to automatically sense these key features as data enters your system, saving you time and manual inspection. If you're curious about these AI features, check out the links below, and if you're curious about technology, subscribe to the channel. Thank you.
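
The four qualities from the talk can be sketched as simple rule-based checks on a small set of lead records. This is only an illustration, not the method or tooling the speaker refers to: the record layout, the `is_bot` flag (the talk does not say how bot traffic is detected), the choice of email as the duplicate key, and all function names are assumptions made for the example.

```python
import re

# Hypothetical lead records; field names and values are invented for illustration.
leads = [
    {"name": "Ana", "email": "ana@example.com", "zip": "10001", "is_bot": False},
    {"name": "", "email": "ben@example.com", "zip": "10001-1234", "is_bot": False},
    {"name": "Cy", "email": "", "zip": "94105", "is_bot": True},
    {"name": "Ana", "email": "ANA@example.com", "zip": "10001-0001", "is_bot": False},
]

REQUIRED_FIELDS = ("name", "email")


def accuracy_filter(records):
    """Accuracy: drop records flagged as bot traffic (the flag is assumed to exist)."""
    return [r for r in records if not r["is_bot"]]


def completeness(records):
    """Completeness: fraction of records with every required field filled in."""
    if not records:
        return 0.0
    complete = sum(all(r[f].strip() for f in REQUIRED_FIELDS) for r in records)
    return complete / len(records)


def normalize_zip(zip_code):
    """Consistency: reduce five-digit and nine-digit zip codes to five digits."""
    match = re.fullmatch(r"(\d{5})(?:-?\d{4})?", zip_code.strip())
    return match.group(1) if match else None


def deduplicate(records):
    """Uniqueness: keep the first record per email, compared case-insensitively."""
    seen, unique = set(), []
    for r in records:
        key = r["email"].strip().lower()
        if key and key in seen:
            continue  # same email seen before, so treat it as a duplicate
        seen.add(key)
        unique.append(r)
    return unique


real = accuracy_filter(leads)
print("records after bot filter:", len(real))
print("completeness:", round(completeness(real), 2))
print("normalized zips:", [normalize_zip(r["zip"]) for r in real])
print("unique leads:", len(deduplicate(real)))
```

Applied to the numbers in the talk, removing the 20% duplicates from 50,000 leads would leave 40,000 unique leads. Fixed rules like these only cover the cases someone thought to write down, which is the manual effort the speaker suggests automating as data enters the system.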
