1.1 Exact Bayes Classifier
We would like to classify categorical output $(k_1,k_2,...,k_3)$ given some attributes$(x_1, x_2, ..., x_n)$
For example, we would like to predict the output is $k_1$ or $k_2$ given three attributes $A,B,C$
If $P(k_1|A, B, C)$ > $P(k_2|A, B, C)$
we would like to say A, B, C are more likely to belong to $k_1$; vice versa
Notation:
If A exists, A; if A does not exist, -A
If B exists, B; if B does not exist, -B
If C exists, C; if C does not exist, -C
Then, if we apply Bayes" Theorm,
$$P(k_1|A, B, C)$$
=$$frac{P(k_1)P(A,B,C|k_1)}{P(A,B,C)}$$
By applying total probability law,
$ Longrightarrow$
$$frac{P(k_1)P(A,B,C|k_1)}{P(k_1)P(A,B,C|k_1)+P(k_2)P(A,B,C|k_2)}$$
However, to calculate $P(A,B,C|k_1)$ needs $2^i$ spaces, where i = 3 in this case,
to calculate $P(A,B,C|k_2)$ needs another $2^2$ spaces
The frequency table is like below:
Freuency | A, B, C | A, B, -C | A, -B, C | A, -B, -C | -A, B, C | -A, B, -C | -A, -B, C | -A, -B, C | |
---|---|---|---|---|---|---|---|---|---|
k1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | |
k2 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 |
Therefore, we introduce Naive Bayes Algorithm to reduce the storing space and computational speed.
# 1.2 Naive Bayes Classifier
We assume class conditional independence, so that
$P(A,B,C|k_1)$ is equal to $P(A|k_1)P(B|k_1)P(C|k_1)$
$P(A,B,C|k_2)$ is equal to $P(A|k_2)P(B|k_2)P(C|k_2)$
And now, we need only 2in records, where i is the number of attributes, and i being number of categorical output we will predict
Freuency | A | -A | B | -B | C | -C | |
---|---|---|---|---|---|---|---|
k1 | 1 | 2 | 3 | 4 | 5 | 6 | |
k2 | 7 | 6 | 5 | 4 | 3 | 2 |
Therefore, our problem
$$P(k_1|A, B, C)$$
= $$frac{P(k_1)P(A,B,C|k_1)}{P(k_1)P(A,B,C|k_1)+P(k_2)P(A,B,C|k_2)}$$
$Longrightarrow$
$$frac{P(k_1)[P(A|k_1)P(B|k_1)P(C|k_1)]}{P(k_1)[P(A|k_1)P(B|k_1)P(C|k_1)]+P(k_2)[P(A|k_2)P(B|k_2)P(C|k_2)]} (i)$$
$$P(k_2|A, B, C)$$
= $$frac{P(k_2)P(A,B,C|k_2)}{P(k_1)P(A,B,C|k_1)+P(k_2)P(A,B,C|k_2)}$$
$Longrightarrow$
$$frac{P(k_2)[P(A|k_2)P(B|k_2)P(C|k_2)]}{P(k_1)[P(A|k_1)P(B|k_1)P(C|k_1)]+P(k_2)[P(A|k_2)P(B|k_2)P(C|k_2)]} (ii)$$
We notice that (i),(ii) share the same numerator, we can focus only on the denominator
$$P(k_1|A, B, C)$$
= $$frac{P(k_1)P(A,B,C|k_1)}{P(k_1)P(A,B,C|k_1)+P(k_2)P(A,B,C|k_2)}$$
$propto$
$$P(k_1)[P(A|k_1)P(B|k_1)P(C|k_1)]$$
$$P(k_2|A, B, C)$$
= $$frac{P(k_2)P(A,B,C|k_2)}{P(k_1)P(A,B,C|k_1)+P(k_2)P(A,B,C|k_2)}$$
$propto$
$$P(k_2)[P(A|k_2)P(B|k_2)P(C|k_2)]$$
If $P(k_1)[P(A|k_1)P(B|k_1)P(C|k_1)]$ > $P(k_2)[P(A|k_2)P(B|k_2)P(C|k_2)]$, we say that A, B, C are more likely to belong to $k_1$; vice versa
1.3 why not P(C | A, B) = P(C | A) * P(C | B)From 1.2, we know that from Naive Bayes Algorithm, we assume class conditional independence, so that
$P(A,B | C)$ = $P( A | C) * P(B | C)$
buy why not diretly say that
$P(C | A, B)$ = $P(C | A) * P(C | B)$
This is because it happens only when P(C) = 0 or P(1), meaningless
$$P(C | A, B)=P(C | A) * P(C | B)$$
$Longrightarrow$
$$frac{P(C,(A,B))}{P(A,B)} = frac{P(C,A)}{P(A)}*frac{P(C,B)}{P(B)} $$
$Longrightarrow if B and C are class conditional independent$
$$frac{P(C)*P(A)*P(B)}{P(A)*P(B)} = frac{P(C)*P(A)}{P(A)}*frac{P(C)*P(B)}{P(B)} $$
$Longrightarrow$
$$P(C) = P(C)*P(C)$$
where only possible if P(C) = 0, or, 1
Therefore, we use Bayes therom to swap $P(C | A, B)$ to $P(A, B | C) $ before applying naive bayes algorithm
2. ExampleConsider the following 4 SMS messages:
message | Label | |
---|---|---|
I am not coming | ham | |
Good work | ham | |
Do you need viagra | spam | |
win an IMac | spam |
Let $p(spam)$ be the probability of a new SMS message being "spam"
Let $p(ham)$ be the probability of a new SMS message being "ham"
Therefore
$$p(spam)= 0.5$$
$$p(ham)=0.5$$
Each de-capitalised keyword are put into two rows(word row, and -word row):
we mark the number of ham massage that the certain keyword exists on the (word row, ham column);
we mark the number of ham massage that the certain keyword does not exist on the (-word row, ham column);
we mark the number of spam massage that the certain keyword exists on the (word row, spam column);
we mark the number of spam massage that the certain keyword does not exist on the (-word row, spam column);
We can construct a frequency table following:
Frequency | Ham | Spam |
---|---|---|
-am | 1 | 2 |
-an | 2 | 1 |
-coming | 1 | 2 |
-do | 2 | 1 |
-good | 1 | 2 |
-i | 1 | 2 |
-imac | 2 | 1 |
-need | 2 | 1 |
-not | 1 | 2 |
-viagra | 2 | 1 |
-win | 2 | 1 |
-work | 1 | 2 |
-you | 2 | 1 |
am | 1 | 0 |
an | 0 | 1 |
coming | 1 | 0 |
do | 0 | 1 |
good | 1 | 0 |
i | 1 | 0 |
imac | 0 | 1 |
need | 0 | 1 |
not | 1 | 0 |
viagra | 0 | 1 |
win | 0 | 1 |
work | 1 | 0 |
you | 0 | 1 |
Then, to record the likelihoods P(W|ham), P(-W|ham), P(W|spam) and P(-W|spam), we divide each entry in ham column by 2 (the total number of ham messages), and divide each entry in spam column by 2 (the total number of spam messages).
In addition, to prevent the likelihood of 1 and 0, we replace any likelihood smaller than 0.05 (larger than 0.95) with 0.05 (0.95) by using one of the variants of the Laplace estimator.
Therefore, we get the following likelihood table:
Probability of the row name given the column name | Ham | Spam |
---|---|---|
-am | 0.50 | 0.95 |
-an | 0.95 | 0.50 |
-coming | 0.50 | 0.95 |
-do | 0.95 | 0.50 |
-good | 0.50 | 0.95 |
-i | 0.50 | 0.95 |
-imac | 0.95 | 0.50 |
-need | 0.95 | 0.50 |
-not | 0.50 | 0.95 |
-viagra | 0.95 | 0.50 |
-win | 0.95 | 0.50 |
-work | 0.50 | 0.95 |
-you | 0.95 | 0.50 |
am | 0.50 | 0.05 |
an | 0.05 | 0.50 |
coming | 0.50 | 0.05 |
do | 0.05 | 0.50 |
good | 0.50 | 0.05 |
i | 0.50 | 0.05 |
imac | 0.05 | 0.50 |
need | 0.05 | 0.50 |
not | 0.50 | 0.05 |
viagra | 0.05 | 0.50 |
win | 0.05 | 0.50 |
work | 0.50 | 0.05 |
you | 0.05 | 0.50 |
If
$$P(ham | -i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)$$
is greater than
$$P(spam | -i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)$$
we say that the message "coming home" is more likely to be a ham message; vice versa.
According to Bayes’ Theorem,
$$P(ham | -i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)$$
$Bayes" Therom Longrightarrow$
$$frac{P(ham)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac| ham)}{P( -i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)}$$
$Total probability law Longrightarrow$
$$frac
{P(ham)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac| ham)}
{P(ham)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac| ham)
+P(spam)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac| spam)} $$
$naive bayes simplify Longrightarrow$
$$ frac{P(ham)[P(-i|ham)P(-am|ham)P(-not|ham)P(coming|ham)P(-good|ham)P(-work|ham) P(-do|ham)P(-you|ham)P(-need|ham)P(-viagra|ham)P(-win|ham)(-an|ham)P(-imac|ham)]}{P(ham)[P(-i|ham)P(-am|ham)P(-not|ham)P(coming|ham)P(-good|ham)P(-work|ham) P(-do|ham)P(-you|ham)P(-need|ham)P(-viagra|ham)P(-win|ham)(-an|ham)P(-imac|ham)]+P(spam)[P(-i|spam)P(-am|spam)P(-not|spam)P(coming|spam)P(-good|spam)P(-work|spam) P(-do|spam)P(-you|spam)P(-need|spam)P(-viagra|spam)P(-win|spam)(-an|spam)P(-imac|spam)]}$$ = $$frac {0.5*(0.5*0.5*0.5*0.5*0.5*0.5*0.95*0.95*0.95*0.95*0.95*0.95*0.95)}{0.5*(0.5*0.5*0.5*0.5*0.5*0.5*0.95*0.95*0.95*0.95*0.95*0.95*0.95)+0.5*(0.95*0.95*0.95*0.05*0.95*0.95*0.5*0.5*0.5*0.5*0.5*0.5*0.5)}$$ = $$frac{0.00545576012}{0.00545576012+0.00015112908}=0.97304582369$$ Alternatively, we can focus only on the propensities which are proportional to posterior probability. $$P(ham | -i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)$$ $propto$ $$P(ham)[P(-i|ham)P(-am|ham)P(-not|ham)P(coming|ham)P(-good|ham)P(-work|ham) P(-do|ham)P(-you|ham)P(-need|ham)P(-viagra|ham)P(-win|ham)(-an|ham)P(-imac|ham)]$$ = $$ 0.5*(0.5*0.5*0.5*0.5*0.5*0.5*0.95*0.95*0.95*0.95*0.95*0.95*0.95)$$ = $$0.00545576012$$ Similarly, According to Bayes’ Theorem, $$P(spam | -i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)$$ $Bayes" Therom Longrightarrow$ $$frac{P(spam)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac| spam)}{P( -i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)}$$ $Total probability law Longrightarrow$ $$frac {P(spam)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac| spam)} {P(ham)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac| ham) +P(spam)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac| spam)} $$ $naive bayes simplify Longrightarrow$ $$frac{P(spam)[P(-i|spam)P(-am|spam)P(-not|spam)P(coming|spam)P(-good|spam)P(-work|spam) P(-do|spam)P(-you|spam)P(-need|spam)P(-viagra|spam)P(-win|spam)(-an|spam)P(-imac|spam)]}{P(ham)[P(-i|ham)P(-am|ham)P(-not|ham)P(coming|ham)P(-good|ham)P(-work|ham) P(-do|ham)P(-you|ham)P(-need|ham)P(-viagra|ham)P(-win|ham)(-an|ham)P(-imac|ham)]+P(spam)[P(-i|spam)P(-am|spam)P(-not|spam)P(coming|spam)P(-good|spam)P(-work|spam) P(-do|spam)P(-you|spam)P(-need|spam)P(-viagra|spam)P(-win|spam)(-an|spam)P(-imac|spam)]}$$ = $$frac{0.5*(0.95*0.95*0.95*0.05*0.95*0.95*0.5*0.5*0.5*0.5*0.5*0.5*0.5)}{0.5*(0.5*0.5*0.5*0.5*0.5*0.5*0.95*0.95*0.95*0.95*0.95*0.95*0.95)+0.5*(0.95*0.95*0.95*0.05*0.95*0.95*0.5*0.5*0.5*0.5*0.5*0.5*0.5)} $$
=
$$frac{0.00015112908}{0.00545576012+0.00015112908}=0.0269541763$$
Alternatively, we can focus only on the propensities which are proportional to posterior probability.
$$P(spam | -i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)$$
$propto$
$$P(spam)[P(-i|spam)P(-am|spam)P(-not|spam)P(coming|spam)P(-good|spam)P(-work|spam)
P(-do|spam)P(-you|spam)P(-need|spam)P(-viagra|spam)P(-win|spam)(-an|spam)P(-imac|spam)]$$
=
$$ 0.5*(0.95*0.95*0.95*0.05*0.95*0.95*0.5*0.5*0.5*0.5*0.5*0.5*0.5)$$
=
$$0.00015112908$$
Since probability 0.00545576012 > 0.00015112908, or propensity 0.97304582369 > 0.0269541763, we conclude that the message "coming home" is more likely to be a ham message.
2.3.2 For message"Get Viagra now":$$P(ham | -i , -am , -not , -coming , -good , -work , -do , -you , -need , viagra , -win , -an , -imac)$$
$propto$
$$P(ham)[P(-i|ham)P(-am|ham)P(-not|ham)P(-coming|ham)P(-good|ham)P(-work|ham)
P(-do|ham)P(-you|ham)P(-need|ham)P(viagra|ham)P(-win|ham)(-an|ham)P(-imac|ham)]$$
=
$$0.5*(0.5*0.5*0.5*0.5*0.5*0.5*0.95*0.95*0.95*0.05*0.95*0.95*0.95)=0.00028714526$$
$$P(spam | -i , -am , -not , -coming , -good , -work , -do , -you , -need , viagra , -win , -an , -imac) $$
$propto$
$$P(spam)[P(-i|spam)P(-am|spam)P(-not|spam)P(-coming|spam)P(-good|spam)P(-work|spam)
P(-do|spam)P(-you|spam)P(-need|spam)P(viagra|spam)P(-win|spam)(-an|spam)P(-imac|spam)] $$
=
$$0.5*(0.95*0.95*0.95*0.95*0.95*0.95*0.5*0.5*0.5*0.5*0.5*0.5*0.5)=0.00287145269$$
To calculate the probability of we divide 0.00028714526 and 0.00287145269 by (0.00028714526+0.00287145269), respectively, and the probability is 0.09090908831 and 0.90909091168, which again suggested "Get Viagra now" is more likely to be a spam message.
Since the propensity 0.00028714526 < 0.00287145269, or the probability 0.09090908831< 0.90909091168, we believe that the message "Get Viagra now" is more likely to be a spam message.
文章版权归作者所有,未经允许请勿转载,若此文章存在违规行为,您可以联系管理员删除。
转载请注明本文地址:https://www.ucloud.cn/yun/41029.html
摘要:在决策理论中,贝叶斯推断与主观概率密切相关,通常被称为贝叶斯概率。特征分布的假设被称为朴素贝叶斯分类器的事件模型。多项式朴素贝叶斯对于一个多项分布事件模型,样本表示了一个特定事件出现的频率,由多项分布产生,其中,表示事件发生的频率。 Code: https://github.com/tmac1997/u... Naive Bayes Bayes theorem(贝叶斯法则) 在概率论和...
摘要:本篇内容为机器学习实战第章基于概率论的分类方法朴素贝叶斯程序清单。朴素贝叶斯优点在数据较少的情况下仍然有效,可以处理多类别问题。参考链接机器学习实战笔记之四基于概率论的分类方法朴素贝叶斯不足之处,欢迎指正。 本篇内容为《机器学习实战》第 4 章 基于概率论的分类方法:朴素贝叶斯程序清单。所用代码为 python3。 朴素贝叶斯优点:在数据较少的情况下仍然有效,可以处理多类别问题。 ...
阅读 2222·2021-11-22 09:34
阅读 1334·2021-10-11 10:59
阅读 4425·2021-09-22 15:56
阅读 3269·2021-09-22 15:08
阅读 3400·2019-08-30 14:01
阅读 772·2019-08-30 11:16
阅读 1128·2019-08-26 13:51
阅读 2905·2019-08-26 13:43