[0:00]Hey, I'm Tim Berglund with Confluent. I want to tell you everything you need to know and only what you need to know about Apache Kafka.
[0:12]Apache Kafka is an event streaming platform used to collect, store, and process real-time data streams at scale. It has numerous use cases, including distributed logging, stream processing, and pub/sub messaging. That sounds like something a committee of MBAs would write if they had their persona document in front of them and were just trying to nail the messaging, and that's not what we're going to do in this series. But all those words are true. That really is a nice one-sentence description of what Kafka is, but there's so much in there that we have to expand on for any of it to make sense. Even the phrase "event streaming platform," which is totally accurate, requires a bit of a journey before the full significance of the words really lands on you. And these videos are that journey.
To begin with, I want to start with just the idea of an event. It's worth thinking about what an event is. Once we do that, we can talk about how Kafka stores events, how events get in and out, how to analyze them, all that stuff. But first, we have to agree on what an event is. Now, an event is just a thing that has happened. That's it. I know that sounds a little abstract, but it really is true. It can be any kind of thing. My go-to example is a smart thermostat phoning home to report the current temperature, humidity, and status of the HVAC system in the house. That's an event. But an event can be other kinds of things. An event can be a change in the status of some business process: say an invoice becomes past due. That's an event. An event can be some kind of user interaction: somebody mousing over a certain link on a screen or clicking on a thing. That's certainly an event. A microservice completes some unit of work and wants to put the record of that unit of work somewhere. That's an event.
All these things are events: they're just things that have happened, combined with a description of what happened. So an event is a combination of notification and state. The notification is the element of when-ness to the thing, which can be used to trigger some other activity. The state is the description of what happened. Now, the state of an event is usually fairly small, say less than a megabyte or so in concrete terms.
[2:46]And it is normally represented in some structured format, like JSON or JSON Schema, or Avro or Protocol Buffers or something like that. The state is serialized in some, usually standard, format. Now, Kafka has a little bit of a data model for an event. An event in Kafka is modeled as a key/value pair. Internally, inside Kafka, when these things are actually stored, keys and values are just sequences of bytes. Kafka internally is loosely typed, but externally, outside of Kafka, you're not. I mean, just look at you: the programming language you're using, whatever it is, is probably not that loosely typed. There's probably some kind of structure to the data, and so you're going back and forth between the way that key/value pair, that event, is represented in your language's type system and the representation inside Kafka. Kafka famously calls that the process of serialization and deserialization. We came up with those words ourselves. And again, that serialized format is usually JSON or JSON Schema, Avro, Protobuf, something like that. The value, that serialized object, is usually the representation of an application domain object or some form of raw message input, like the output of a sensor. So that's why the structure of that thing is important: in your world, as you think about it, it probably has some structure. Now, the key part. I said an event is a key/value pair. Keys in Kafka can be a fairly rich topic, so I'm going to summarize them very simply right now. They can be complex domain objects, serialized with all those same formats, but they are often just primitive types like strings or integers. The key of a Kafka event is probably not a unique identifier for the event. If you're thinking of a primary key in a database table, where the key uniquely identifies the row, the key in a Kafka message is not like that.
It's more likely the identifier of some entity in the system, like a user or an order or a particular connected device, like the ID of that smart thermostat. This may not sound significant right now, but we will see later on that keys are crucial for how Kafka deals with things like parallelism and data locality. So, that's the very basics of Kafka: the one-sentence definition and the notion of events.
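To make the key/value idea a little more concrete, here is a minimal sketch in Python of the thermostat example. It is only an illustration under stated assumptions: it does not use any Kafka client library, a plain list stands in for Kafka, and the names `ThermostatReading`, `serialize_key`, `serialize_value`, `deserialize_value`, and the device IDs are all hypothetical. It shows a typed object on the application side, bytes on the stored side, and a key that identifies the device rather than the individual event.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass


# Hypothetical domain object: the "state" part of a thermostat event.
@dataclass
class ThermostatReading:
    temperature_c: float
    humidity_pct: float
    hvac_status: str


def serialize_key(device_id: str) -> bytes:
    # Keys are often primitive types such as strings; stored form is bytes.
    return device_id.encode("utf-8")


def serialize_value(reading: ThermostatReading) -> bytes:
    # JSON is one of the formats mentioned in the talk; Avro or Protobuf
    # would play the same role here.
    return json.dumps(asdict(reading)).encode("utf-8")


def deserialize_value(raw: bytes) -> ThermostatReading:
    return ThermostatReading(**json.loads(raw.decode("utf-8")))


# A plain list standing in for stored events (an assumption for this sketch):
# each entry is a (key bytes, value bytes) pair.
events = [
    (serialize_key("thermostat-42"),
     serialize_value(ThermostatReading(21.5, 40.0, "heating"))),
    (serialize_key("thermostat-42"),
     serialize_value(ThermostatReading(22.0, 39.0, "idle"))),
    (serialize_key("thermostat-7"),
     serialize_value(ThermostatReading(19.0, 55.0, "cooling"))),
]

# The key is not unique per event: two events share the key "thermostat-42".
readings_by_device = defaultdict(list)
for key_bytes, value_bytes in events:
    device_id = key_bytes.decode("utf-8")
    readings_by_device[device_id].append(deserialize_value(value_bytes))
```

Grouping by key at the end is just a way to see that the key names an entity, the device, and that many events can carry the same key.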



