CS224d: Deep Learning for NLP Lecture 1 Probability Review (1)


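My report grades in coursework have been very low, mainly because my English writing is poor. To practice, I will write some of my blog posts (the ones with simpler content) in English. When you have some spare time, please read them and point out my mistakes.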

Abstract

This post reviews cumulative distribution functions (CDFs), probability mass functions (PMFs), and probability density functions (PDFs).

Axioms of Probability

$P(A) \geq 0$ for all $A \in \mathcal{F}$, where $\mathcal{F}$ is the set of events
$P(\Omega) = 1$, where $\Omega$ is the sample space
If $A_1, A_2, \ldots$ are pairwise disjoint events, then $P(\cup_i A_i) = \sum_i P(A_i)$
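As a quick sanity check, the three axioms can be verified on a small finite example. The sketch below assumes a single fair six-sided die with the uniform measure; the sample space, the events, and the helper `prob` are illustrative choices, not part of the lecture.

```python
from fractions import Fraction

# Assumed example: the sample space of one fair six-sided die.
omega = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    event = frozenset(event)
    if not event <= omega:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(omega))

evens = {2, 4, 6}
low_odds = {1, 3}  # disjoint from evens

# Axiom 1: probabilities are non-negative.
nonnegative = prob(evens) >= 0
# Axiom 2: the whole sample space has probability 1.
total = prob(omega)
# Axiom 3 (finite case): probabilities of disjoint events add.
additive = prob(evens | low_odds) == prob(evens) + prob(low_odds)

print(nonnegative, total, additive)
```

Using `Fraction` keeps every probability exact, so the additivity check is not affected by floating-point rounding.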

Conditional Probability and Independence

The conditional probability of an event $A$ given an event $B$ with $P(B) > 0$ is defined as

$P(A|B)=\frac{P(A\cap B)}{P(B)}$

$P(A|B)$ is the probability of $A$ once the event $B$ has occurred; in effect, $B$ plays the role of the new sample space. Two events are called independent if and only if $P(A\cap B)=P(A)P(B)$ or, equivalently when $P(B) > 0$, $P(A|B)=P(A)$. Therefore, independence is equivalent to saying that observing $B$ does not have any effect on the probability of $A$.
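The definitions of conditional probability and independence can be tried out on a small example. This is a minimal sketch assuming two fair dice with the uniform measure; the events and the helpers `prob`, `cond_prob`, and `independent` are hypothetical names chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed example: ordered outcomes of two fair six-sided dice.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Uniform probability of an event given as a set of outcomes."""
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)

def independent(a, b):
    """Check the product rule P(A and B) == P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

first_is_six = {w for w in omega if w[0] == 6}
second_is_six = {w for w in omega if w[1] == 6}
sum_at_least_ten = {w for w in omega if w[0] + w[1] >= 10}

# The two dice do not influence each other, so these events are independent.
print(independent(first_is_six, second_is_six))
# Knowing the first die shows six changes the chance of a large sum.
print(prob(sum_at_least_ten), cond_prob(sum_at_least_ten, first_is_six))
```

Here observing that the first die shows six raises the probability of a sum of at least ten, so those two events are not independent, while the two "is six" events are.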

Random Variables

We denote random variables using upper case letters, $X(\omega)$ or simply $X$, and the values a random variable may take on using lower case letters such as $x$. Random variables are either discrete or continuous. When $X$ can take on only a finite (or, more generally, countable) number of values, it is known as a discrete random variable. The probability of the set associated with a random variable $X$ taking on some specific value $k$ is:

$P(X=k)=P(\{\omega : X(\omega)=k\})$

When $X$ can take on an uncountably infinite number of possible values (for example, any real number in an interval), it is called a continuous random variable. The probability that $X$ takes on a value between two real constants $a$ and $b$ is:

$P(a\leq X\leq b)=P(\{\omega : a \leq X(\omega) \leq b\})$
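Both formulas say the same thing: a statement about $X$ is really a statement about the set of outcomes $\omega$ that $X$ maps into the values of interest. The sketch below illustrates this with an assumed discrete example, the sum of two fair dice; it uses the same discrete variable for the interval formula because the preimage idea does not depend on whether $X$ is discrete or continuous. The names `X` and `prob_of` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed example: ordered outcomes of two fair six-sided dice.
omega = list(product(range(1, 7), repeat=2))

def X(w):
    """A random variable: maps an outcome (a pair of faces) to their sum."""
    return w[0] + w[1]

def prob_of(predicate):
    """P({w : predicate(X(w))}) under the uniform measure on omega."""
    favourable = [w for w in omega if predicate(X(w))]
    return Fraction(len(favourable), len(omega))

p_seven = prob_of(lambda x: x == 7)         # P(X = 7)
p_between = prob_of(lambda x: 3 <= x <= 5)  # P(3 <= X <= 5)

print(p_seven, p_between)
```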

Cumulative Distribution Functions

A cumulative distribution function is a function $F_X:\mathbb{R}\rightarrow [0,1]$ which specifies a probability measure as,

$F_X(x)=P(X\leq x)$

Properties:
- $0 \leq F_X(x) \leq 1$
- $\lim_{x \to -\infty}F_X(x)=0$
- $\lim_{x \to \infty}F_X(x)=1$
- $x \leq y \Rightarrow F_X(x) \leq F_X(y)$
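The CDF properties can be checked numerically on a simple case. This is a minimal sketch assuming $X$ is the face shown by one fair six-sided die; the function `cdf` and the evaluation grid are illustrative choices. For this discrete example the two limits are already reached at finite points (below 1 and at or above 6).

```python
from fractions import Fraction

# Assumed example: X is the face of one fair six-sided die.
values = range(1, 7)

def cdf(x):
    """F_X(x) = P(X <= x): the fraction of faces that are at most x."""
    return Fraction(sum(1 for v in values if v <= x), len(values))

# Evaluate the CDF on an increasing grid of points.
grid = [-100, 0, 1, 2.5, 3, 6, 100]
cdf_values = [cdf(x) for x in grid]

# Property 1: every value lies in [0, 1].
bounded = all(0 <= f <= 1 for f in cdf_values)
# Property 4: the CDF never decreases as x grows.
monotone = all(a <= b for a, b in zip(cdf_values, cdf_values[1:]))

print(bounded, monotone, cdf(-100), cdf(100))
```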

Probability Mass Functions

When $X$ is a discrete random variable, a simpler way to represent the probability of a random variable is to directly specify the probability of each value that the random variable can assume. A probability mass function is a function $p_X:\Omega \rightarrow \mathbb{R}$ such that: