
Lecture 4: Conditional Probability | Statistics 110

Harvard University


[0:00] At the end of last time we were doing the matching problem of de Montmort, and I was a little rushed, so I want to say a bit more about it. I'll finish talking about that; it's an example of inclusion-exclusion. Then for the rest of today I want to introduce independence. We've already been talking informally about independence; we say things are independent, but what does that really mean? And then the big main topic for this entire week is conditional probability and conditioning, which will be the rest of today. First, let me quickly remind you of the de Montmort matching problem, continued from last time because I had to explain it a little too fast. This is probably the most famous example of inclusion-exclusion; if you read the strategic practice and go to section you'll see lots more examples, but this is the most famous one, and it's a nice one. I won't write the whole thing again, but briefly, here's the notation from last time: we have a deck of n cards labeled 1 through n, and you play a game where you flip the cards over one by one while counting out loud, "one, two, three," and so on. You win if at some point the card you name is the card that appears; we want the probability that you win. I defined A_j to be the event that the j-th card matches, in other words that the j-th card in the deck is labeled j, and we were trying to find the probability of the union A_1 ∪ ... ∪ A_n. I want to explain this in a slightly different way and then make a couple more comments.
You can think about how to compute this directly, but it's pretty hard, and inclusion-exclusion is the easiest way. Recall what inclusion-exclusion says: first you add up the individual probabilities, then you subtract the probabilities of the pairwise intersections, then you add the triple intersections, and you keep alternating like that. So what we need for inclusion-exclusion is basically P(A_1 ∩ A_2 ∩ ... ∩ A_k). This is a handy trick; it's really simple, yet for some reason students don't like doing it. What did I do here? I picked the first k events for concreteness. What makes this problem doable without spending weeks on it is the symmetry of the problem. I could have written A_7 and A_9 and whatever subset of k of them I wanted. If we wanted to be really formal we could introduce double subscripts and write A_{j_1}, A_{j_2}, and so on, and then people start to get confused; but by symmetry it's the same as just writing down the first k, so it really helps to make it concrete. Now, P(A_1 ∩ ... ∩ A_k) = (n − k)!/n!, which is immediate from the naive definition of probability.

[3:57] We're assuming all orders of the deck are equally likely, so the denominator is n!. For the numerator, the event says card 1 is labeled 1, card 2 is labeled 2, and so on up through card k; the remaining n − k cards can be in any order while the first k are constrained, which gives (n − k)! favorable orderings. So the probability is immediate from the naive definition. That's what we did last time; now let me explain it in a slightly different way. How many of these terms are there? Looking back at inclusion-exclusion, we don't just do this for 1 through k; we do it for all subsets of size k. So there are n choose k terms that look like this, and n choose k is n!/((n − k)! k!), with all the terms equal by symmetry.
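Since the claim is easy to check by brute force for small n, here is a quick sketch in Python (the function name `prob_first_k_match` is mine, not from the lecture): it enumerates all n! orderings and confirms the (n − k)!/n! formula.

```python
from itertools import permutations
from math import factorial

def prob_first_k_match(n, k):
    """Brute-force P(A_1 ∩ ... ∩ A_k): the fraction of the n! orderings
    of cards 1..n in which each of the first k cards matches its position."""
    count = sum(
        1 for p in permutations(range(1, n + 1))
        if all(p[j] == j + 1 for j in range(k))
    )
    return count / factorial(n)

# Agrees with the formula from the lecture, (n-k)!/n!:
n, k = 6, 2
assert prob_first_k_match(n, k) == factorial(n - k) / factorial(n)
```

By symmetry, fixing any other k positions instead of the first k would give the same count.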

[5:05] Written this way, it's easier to see than it was last time why everything cancels into something very nice: each term is (n − k)!/n!, and there are n!/((n − k)! k!) of them, so the factorials cancel and the only thing left is 1/k!. So by inclusion-exclusion, it's immediate that the probability of the union equals 1 − 1/2! + 1/3! − ... + (−1)^(n+1)/n!.
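Here is a small Python sketch (both function names are mine) that checks the alternating-sum formula against a direct enumeration of all permutations for small n:

```python
from itertools import permutations
from math import factorial

def p_at_least_one_match(n):
    """The inclusion-exclusion result from the lecture:
    1 - 1/2! + 1/3! - ... + (-1)^(n+1)/n!"""
    return sum((-1) ** (k + 1) / factorial(k) for k in range(1, n + 1))

def p_brute(n):
    """Brute force: fraction of orderings with at least one fixed point."""
    perms = list(permutations(range(1, n + 1)))
    hits = sum(1 for p in perms if any(p[j] == j + 1 for j in range(n)))
    return hits / len(perms)

# The two computations agree (up to floating-point rounding):
assert abs(p_at_least_one_match(7) - p_brute(7)) < 1e-12
```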

[5:43] I think I left off the last term when writing this in a hurry at the end of class last time, so you may want to fix your notes: the last term is (−1)^(n+1)/n!. Let's do a quick sanity check. Does this 1/n! term make sense? It corresponds to the cards being perfectly ordered 1 through n, the case where all of them match, and there is exactly one ordering where that happens, so the term makes sense.

[6:16] So that's what we did. Often the problem is phrased the other way: what's the probability that there are no matches? Of course, that's just 1 minus this.

[6:33] Just for practice, let's write that in set-theory notation. Remember from the math review that the complement of a union is the intersection of the complements. So the complement of the union of the A_j is the intersection of the A_j complements, and its probability is 1 minus the answer above: P(no match) = 1 − 1 + 1/2! − 1/3! + ... + (−1)^n/n!, where the last term is now (−1)^n because all the signs flipped. That's the exact answer for the probability of no match. It looks pretty complicated, so what if we wanted an approximate answer? You have to recognize the pattern here: whenever you see factorials in the denominators, that should make you think of various Taylor series. The two series we need over and over in this class are the geometric series and the Taylor series for e^x, and this sum should immediately remind you of the latter. We could write the first two terms as 1/0! − 1/1! without changing anything, and then this is exactly the Taylor series for e^x evaluated at x = −1. So the probability of no match is approximately 1/e, which is about 0.37. Most people find this result pretty surprising. First, where did e come from? e makes you think of logarithms and continuous calculus, while this seems like a purely discrete problem: you're just arranging cards randomly, so what does a deck of cards have to do with the number e? The other surprising thing... well, I haven't actually surveyed people about this, not formally, but it would be interesting to see what their intuition is. If you let n go to infinity, I'm pretty sure most people who hadn't seen this would guess that the probability of no match goes either to 0 or to 1, depending on whether they're a pessimist or an optimist. Because when n is very large, say a billion cards, each individual card is extremely unlikely to be in its own position, but you have so many of them.

[9:11] Somehow those two competing forces balance out to 1/e. We're going to see the number 1/e a lot in this class, so sometimes I like to say that if you have no idea whatsoever on a problem and had to guess, you should guess 1/e. You don't have a great chance of being right, and you're supposed to justify your answer, but 1/e probably gives you a better chance than anything else. To be precise, the exact answer converges to 1/e as n goes to infinity. But the convergence is extremely fast: even if n is only 10 (and 10! is over three million, a huge number), the exact answer agrees with 1/e to within about two parts in a hundred million, something on the order of 10^(−8). So it converges to 1/e extremely quickly. I like this problem because it's kind of cute, it illustrates 1/e, it illustrates inclusion-exclusion, and it illustrates symmetry, and those are all things we're going to see a lot of. So that's that problem. Now, as promised, let's define independence. It's very easy to define, but not so easy to fully understand the definition.
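As a small check on the convergence claim, this Python sketch (the function name is mine) computes the exact no-match probability and compares it to 1/e; the alternating-series remainder bounds the error by 1/(n + 1)!:

```python
from math import e, factorial

def p_no_match(n):
    """Exact probability of no match with n cards:
    sum over k = 0..n of (-1)^k / k!  (the derangement probability)."""
    return sum((-1) ** k / factorial(k) for k in range(n + 1))

# For n = 10 the error is at most 1/11!, about 2.5e-8, consistent with
# the "two parts in a hundred million" figure from the lecture.
assert abs(p_no_match(10) - 1 / e) < 1 / factorial(11)
```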

[10:40] We already have some intuition about what it means for events to be independent: roughly, one of them gives you no information about the other. But that intuition is too vague to actually verify whether given events are independent. So suppose we have two events A and B. We're assuming we have a sample space and a probability function P satisfying the axioms we've been talking about; everything is within the context of one experiment and that one probability P.

[11:23] The definition is that A and B are independent if P(A ∩ B) = P(A)P(B). That's pretty intuitive: the probability that both A and B occur is the probability that A occurs times the probability that B occurs; they have nothing to do with each other, so we just multiply. That's the definition, and we'll see some equivalent forms of it later. I want to contrast this with disjointness, because confusing the two is a common and disastrous blunder: independence is completely different from disjointness.
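Here is a minimal Python illustration of the definition, using two fair dice as an assumed example (the events and the helper `prob` are mine, not from the lecture):

```python
from itertools import product

# Sample space: two fair dice, all 36 outcomes equally likely.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Naive-definition probability: favorable outcomes over total."""
    return sum(1 for w in omega if event(w)) / len(omega)

A = lambda w: w[0] % 2 == 0   # first die shows an even number
B = lambda w: w[1] > 4        # second die shows 5 or 6
AB = lambda w: A(w) and B(w)

# Independence: P(A ∩ B) = P(A) P(B), here 1/6 = (1/2)(1/3).
assert abs(prob(AB) - prob(A) * prob(B)) < 1e-12
```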

[12:13] If A and B are disjoint, then if A occurred, B cannot possibly occur; in that sense it's the opposite of independence.

[12:26] Independence says that knowing A occurred tells us nothing whatsoever about whether B occurs; disjointness says that if A occurred, then B can't occur. They're completely different concepts, but for some reason they get confused a lot, so it's always worth mentioning. That's for two events; now let's define what happens with more than two. Suppose we have events A, B, and C. Here is the definition of what it means for three events to be independent. First of all, we want any two of them to be independent; it would be weird to say all three are independent while, say, A and B are not. So we require P(A, B) = P(A)P(B). By the way, sometimes it's convenient to write commas instead of intersection signs, and that doesn't cause any problems unless we start bringing in unions and the like. Mathematically P(A, B) means the probability of the intersection, but intuitively it's the probability that both A and B occur, since intersection means "and". So that equation says A and B are independent, and likewise we want P(A, C) = P(A)P(C) and P(B, C) = P(B)P(C).

[13:52]And then the question is, is that enough?

[13:57] If we had only those three equations, we'd say the events are pairwise independent: we have three events, and any two of them are independent. The question is whether that's enough, and the answer is no. We also need one more equation: P(A, B, C) = P(A)P(B)P(C).
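The classic counterexample, which may or may not be the one in the strategic practice, uses two fair coin flips: A is "first flip is heads," B is "second flip is heads," and C is "the two flips agree." A quick Python sketch verifies that every pairwise equation holds while the triple one fails:

```python
from itertools import product

omega = list(product("HT", repeat=2))  # two fair coin flips, 4 outcomes

def prob(event):
    return sum(1 for w in omega if event(w)) / len(omega)

A = lambda w: w[0] == "H"
B = lambda w: w[1] == "H"
C = lambda w: w[0] == w[1]   # the two flips agree (HH or TT)

# Every pair multiplies: each pairwise intersection has probability 1/4.
for X, Y in [(A, B), (A, C), (B, C)]:
    assert prob(lambda w: X(w) and Y(w)) == prob(X) * prob(Y)

# But the triple does not: P(A, B, C) = 1/4, while P(A)P(B)P(C) = 1/8.
assert prob(lambda w: A(w) and B(w) and C(w)) != prob(A) * prob(B) * prob(C)
```

So pairwise independence does not imply full independence.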

[14:26] A slogan people sometimes use is "independence means multiply." That's a little too vague, because it depends what you're doing, but if you have independent events and you want the probability of all of them happening, you just multiply. So clearly we want this equation to be true, and the question is whether the pairwise equations imply it. In general they do not. There are examples of this in the strategic practice, and I'd suggest that it's good practice with the definitions to try to construct your own counterexample; if you get stuck, you can look at the solution in the strategic practice. I might come back to this later, but it's better to think about it yourself first, and I want to get to conditioning. In general we can't drop any of these conditions: the triple equation doesn't imply the pairwise ones either. We need pairwise independence, and we need all three events at once.

[15:39] So that's the definition. Similarly for events A_1 up to A_n: if we have n events, I'm not going to write down all the equations, because it's completely analogous. Once you understand the case of three, you can do n. First, any two of them have to be independent, then any three, then any four, and so on: no matter how many of them you take, the probability of the intersection is just the product of the probabilities.
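The general pattern can be sketched as a checker in Python (entirely my own construction, not from the lecture): it verifies the multiplication rule for every subset of two or more events, which is exactly the condition described above.

```python
from itertools import combinations, product

def fully_independent(omega, events, tol=1e-12):
    """Check that every subset of two or more of the events multiplies:
    P(intersection of the subset) = product of individual probabilities."""
    def prob(ev):
        return sum(1 for w in omega if ev(w)) / len(omega)
    for r in range(2, len(events) + 1):
        for subset in combinations(events, r):
            p_inter = sum(1 for w in omega
                          if all(e(w) for e in subset)) / len(omega)
            p_prod = 1.0
            for e in subset:
                p_prod *= prob(e)
            if abs(p_inter - p_prod) > tol:
                return False
    return True

# Three fair coin flips: the events "flip i is heads" are fully independent.
omega = list(product("HT", repeat=3))
events = [lambda w, i=i: w[i] == "H" for i in range(3)]
assert fully_independent(omega, events)
```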

[16:15]So, hopefully you all see the pattern here.
