[0:03]How's it going, everyone? In today's video, we're going to learn about functional programming in Python. And to do that, we're going to build the same TCG card pipeline twice. First, the way you'd probably already write code, with mutable, in-place changes, then the functional way, which uses pure functions with immutability. You'll see exactly why the second approach shines for pipelines. It's not about forcing a paradigm, it's about having options. When you need testability, reproducibility, and composability, immutable data flow makes your life much easier.

In this first file, we're starting with what's familiar: the imperative approach. We'll use a classic pipeline class that stores mutable state and modifies records in place at each step. This works, but it comes with some tradeoffs, which we'll point out along the way.

Before we continue, I'd just like to thank Zed for sponsoring this video. Zed is a lightning-fast code editor written in Rust, and I've been using it to do all my coding for months now. If you feel like trying out Zed, I've left a link in the description box down below where you can download it for free.

Anyway, let's get started with our first example. Here, we have a data class that holds all the data for a single TCG card as it moves through the pipeline. Each field represents a stage in processing. We add more information as the card progresses from raw JSON to validated and transformed data. If you have ever used something like n8n, this is that style of pipeline. The record passes through the stages, starting with the raw source, which is the original JSON string we received. Then we have the parsed data, which is the dictionary after we call json.loads. Below that, we have the normalized data, which is the transformed data after validation and normalization. Then we have an is_valid flag, which tells us whether the card passed validation. We also have an error message for when something goes wrong.
This tells us exactly what went wrong. And finally, we have a timestamp that tells us when the processing completed.

Below the TCG record, we have a class that represents the mutable state and steps in our TCG card processing pipeline. Each step modifies the record in place. Starting with the initializer block: here we have two lists which hold our pipeline state, all the records that have been processed successfully and all the ones that failed along the way.

Then we have a parse method, which parses the raw JSON string into a TCG record. If json.loads succeeds, we get a dictionary with some parsed data, and is_valid remains false because we haven't validated yet. If it failed, we still return a record, but with an error message explaining what went wrong.

Moving on, we have the validate method. This checks that the required fields exist in the parsed data. And here we modify the card in place, setting is_valid and error_message directly, here, here, and here.

Below that, we have the normalize method, which is used to normalize the card data, convert types, and standardize values. And here we modify the card in place, setting normalized_data directly. As you can see at the bottom of the method, we are directly updating the normalized data. This line normalizes the game name to canonical form, which means lowercase and no dashes or spaces. This lets us accept variations like Yu-Gi-Oh, yugioh, and YU-GI-OH as the same game. Next, we make sure that it's a game that we recognize. So if it's not Pokemon, Yu-Gi-Oh, or Magic, it's an unknown game. Below that, we normalize the rarity to title case. Then we convert the price from a string to a float, and convert the quantity to an integer, which will default to one if not provided. This way a missing quantity doesn't cause an error. Below, we map the canonical names back to display names for pretty printing. And finally, we build the normalized data dictionary with all the transformed values.
And finally, we have the last method, which takes some raw JSON strings of type list of string and returns a tuple of a list of TCG records and another list of TCG records. This is the main entry point. We loop through each JSON string and run it through our pipeline steps. Notice how we're mutating the same record object as it moves through each stage. Here, we parse the card's data. If it failed, we add it to the failure list. Next, we attempt to validate the card. If the card is invalid, we also add that to the failure list. And finally, we attempt to normalize the card. If it fails, we append it to the failure list. At the end, we return the cards which succeeded and the ones which failed.

In each of these functions, we mutate the record in place. Now, in our very limited example, this does seem fine, but in a larger pipeline, we may not have the luxury of records being produced and contained purely within the pipeline. They may be passed to other systems during different phases and be used in other places. The core issue with this is that we can no longer trust the records as sources of truth. Even worse, if a stage of the pipeline fails, we now have to deal with our state being dirtied by incomplete transformations.

But let's move on to the main entry point. Here, we have some sample TCG cards to process from Pokemon, Yu-Gi-Oh, and Magic. And just to show you all of the information, I'm going to zoom out. Here we have valid cards, missing fields, invalid JSON, and unknown games. Here, we're missing a quantity. Here, it's just invalid JSON. And here we have an invalid game. So we have one of each. Below, we instantiate the pipeline. And finally, we process the JSON. Then we print everything that was successfully processed, and everything which failed. So now when we run this, what we should end up with are a bunch of cards which were successfully processed and two which failed.
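To make that source-of-truth problem concrete, here's a tiny, self-contained illustration. All the names here are hypothetical, not from the video's code: the point is just that anything holding a reference sees every later change.

```python
# A toy illustration of why in-place mutation makes records hard to
# trust: anything holding a reference sees every later change,
# including changes from a half-finished, failed stage.
class Record:
    def __init__(self, data: dict) -> None:
        self.data = data
        self.is_valid = False


record = Record({"name": "Pikachu"})
audit_log = [record]  # another system keeps a reference "for later"

record.is_valid = True        # a later pipeline stage mutates it
record.data["name"] = "???"   # a failed transform dirties the state

# The "snapshot" in the audit log has silently changed too.
snapshot = audit_log[0]
```

The audit log thought it stored a snapshot, but it only stored a pointer to shared, mutable state.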
The ones which failed tell us exactly why they failed. But now, let's refactor this to use pure functions. Each step takes input and returns new output, with no mutation in place. This makes everything composable and testable. The key difference is that we're creating new records at each step, rather than modifying existing ones. And the beauty of this approach is that you can replay any step, run it multiple times, or even run it in parallel. The same input always gives the same output.

So what we're going to do is import functools, and also import Callable. And first things first, we make our data class frozen. This ensures that once a TCG record is created, it can never be modified. Any attempt to change a field will raise an error. This is a key part of the functional approach: immutability gives us predictable, reproducible data. Everything else inside here remains the same.

And now, because we're taking a more pure, immutable approach, we don't need as many methods attached to the record. We treat our object as a pure data container that stores data at each stage. The class is just a container; all the logic lives in standalone functions, which means at this stage we can completely remove the pipeline class and its initializer block. We do not need those anymore. But remember to un-indent all of these functions, so I'm just going to highlight all of them and un-indent. And obviously, we're going to have to edit each and every single one of these, because they no longer refer to an instance, so self is not going to mean anything here.

But here's our first pure function, parse. What it's going to do is take a raw JSON string and convert it into a TCG record. So to do that, we're going to remove the leading underscore and remove self. And this will always return a TCG record. The rest will remain the same. If json.loads succeeds, we get a dictionary with the parsed data, and is_valid stays false, because we haven't validated yet.
If it fails, we still return a record, but with an error message explaining what went wrong. Either way, we always return a TCG record. No exceptions escape this function.

Next up is validate. It checks that the parsed data has all the fields we need. Notice the pattern here: we always pass along the data we've accumulated so far. If there's already an error from a previous step, we bubble it up unchanged. This is important. We don't lose context as the record moves through the stages. The required fields are game, name, set code, rarity, and price. If any are missing, we return a new record with is_valid set to false and an error. If all are present, we return a record with is_valid set to true, but still no normalized data just yet. That comes in the next step.

So, let's remove all of this, because we need to rewrite this code. First, we're going to check if there's already an error from parsing. If there is, just pass it through. There's no point in validating something we know is already broken. And before we continue in here, we should remove the leading underscore, remove self, and specify that this returns a TCG record. Then we want to check if the parsed data exists at all. If it doesn't, we return a TCG record noting there's no parsed data. Now we can check that all the required fields are present once again. If fields are missing, we will return an error with a list of what's missing. Otherwise, everything's good, and we will return a record with is_valid set to true.

Next up, we have the normalize function. This is where the actual transformation happens. We convert types from string to float, standardize values such as game names, and calculate derived fields such as the total value, which equals the price times the quantity. If you followed one of my recent videos, this might look a little familiar, and that's because this too uses a very monadic approach to handling the data. What we do is return a new record with normalized data set.
If validation failed or there's an error, we return early with normalized data set to none. So once again, let's remove the leading underscore and self. We are no longer in a class, and this will also return a TCG record. Next, we should skip if there's already an error or validation didn't pass. And below that, we're going to check whether there is parsed data and whether the card is valid. If either of these is false, we return the following TCG record. Then we grab the data from the parsed data, and once again, we normalize the game name to canonical form so that we can accept variations of the same name. Then we check whether the game is one we recognize, and if it isn't, we return that it's an unknown game. Below that, we normalize the rarity to title case, convert the price from a string to a float, and convert the quantity to an integer, which will default to one if it's not provided. Then we map the canonical names back to display names for pretty printing. And with all of that information, we can build the normalized data dictionary with all the transformed values. And finally, we return a new record with normalized data set. This is the final form: a fully processed TCG card.

Below that function, we're going to create a pipeline function. And here's where the magic happens. It takes a list of transformation functions and applies them one after another. The key insight is that we use functools.reduce to chain the functions together. Reduce starts with our card from parsing, then applies each function in turn. Each function takes a TCG record and returns a new one. The output of one becomes the input to the next. This is exactly the same pattern as Unix pipes. For those of you familiar with Unix, each command transforms the stream and passes it along. One benefit of this approach is that we can reorder transformations, add new ones, or remove them without changing any code logic. So here, we first parse the JSON string into a TCG record.
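The reduce-based chaining can be sketched in isolation like this. The helper functions are made up for the example; the point is only the wiring, where the output of one function feeds the next.

```python
import functools
from typing import Callable


# A sketch of chaining transformations with functools.reduce. Each
# function takes a value and returns a new one; the output of one
# becomes the input of the next, like a Unix pipe.
def run_pipeline(value, transformations: list[Callable]):
    return functools.reduce(lambda acc, fn: fn(acc), transformations, value)


def double(x: int) -> int:
    return x * 2


def increment(x: int) -> int:
    return x + 1


def square(x: int) -> int:
    return x * x


# reduce applies the list in order, so this is square(increment(double(3))).
result = run_pipeline(3, [double, increment, square])
```

Reordering the list reorders the pipeline; no function body has to change.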
If parsing failed, we skip the rest of the transformations. There's no point in running validation on broken data. And finally, we use reduce to apply each transformation in order. And just to give you a quick insight into how reduce works, I have this comment right here. So if you put three functions in a list, it's practically the same thing as calling it like this.

This replicates what we had in the imperative example: processing a list of JSON strings and returning two lists, one for successfully processed cards and one for failures. We use a list comprehension to process each card individually, then separate them based on whether they have normalized data. A card with normalized data is a success; one without it is a failure.

So now let's fix the process function, and to do that, I'm just going to create it from scratch. To start off, I'm going to give it a new name and signature. What it's going to take is some raw JSON strings of type list of string, and some transformations. And this will return a tuple, which contains the successful transformations and the unsuccessful ones. First, we need to process each JSON string through the pipeline. Then, we will separate them into processed, which have normalized data, and failed, which have no normalized data.

Notice how each function is self-contained and returns a new record. We can now easily test each step in isolation. And if something goes wrong, we have the original record preserved at each stage. No hidden mutation. This also allows us to roll back in the case of errors and ensures that all records act as a source of truth. And here's another benefit: we can compose these functions in different ways. If you want to skip validation, just compose normalize alone. If you want to add a new step, just insert it into the transformations list. Now going back to main, we do need to make a slight adjustment.
For example, we no longer have TCGPipeline as a class, which means that we can also safely delete the method call associated with it. What we're going to do instead is create some transformations, a list containing validate and normalize. Then we can grab both the processed and the failed cards using process_batch, with the raw documents and the transformations. And now when we run this, we should end up with the exact same result: we have all the cards which were successfully processed and the ones which failed.
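As a final toy sketch of the composability payoff: pipeline steps are just entries in a list, so adding, removing, or reordering one is a data change, not a code change. All the names below are made up for the example.

```python
# Hypothetical, dict-based steps to keep the example tiny.
def strip_name(card: dict) -> dict:
    return {**card, "name": card["name"].strip()}


def title_rarity(card: dict) -> dict:
    return {**card, "rarity": card.get("rarity", "common").title()}


def apply_steps(card: dict, steps: list) -> dict:
    # Same idea as the reduce-based pipeline, written as a loop.
    for step in steps:
        card = step(card)
    return card


full = apply_steps({"name": " Pikachu ", "rarity": "rare"},
                   [strip_name, title_rarity])
trimmed_only = apply_steps({"name": " Pikachu "}, [strip_name])  # skip a step
```

Dropping `title_rarity` from the list skips that stage entirely; neither function had to know about the other.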
[17:10]Now, before I end this video, I just want to quickly mention that I've uploaded all this code to my GitHub repository. So, in case you want to play around with it on your own machine, you can find a link to it in the description box down below. But otherwise, that just about covers everything I wanted to talk about in today's video. So, as always, thanks for watching, and I'll see you in the next video.



