Boosting
Boosting is an ensemble learning method. In contrast to bagging, which trains its base classifiers in parallel, boosting trains them sequentially (serially), so the base classifiers depend on one another.
It resembles the way we humans study: through tests such as monthly, midterm, and final exams, we confirm which topics we have largely mastered and discover which ones we keep getting wrong, and we then drill the topics where we make mistakes in order to drive the error rate down.
Starting from the original training set, we first train weak learner 1 and record where it goes wrong, i.e. how well this round of training went. Based on that result, we give the misclassified samples higher weights, obtaining the weight distribution over the training data for the second round, which is used to train weak learner 2. Iterating like this produces a sequence of weak learners, and a weighted combination of their predictions yields the final strong learner.
Following this idea, two key questions arise: 1. How should the weights (the probability distribution) over the data be changed in each round? 2. How should the weak classifiers' predictions be weighted to obtain the strong classifier?
1 The AdaBoost Algorithm
1.1 Initialization
Original training set $T=\{(x_1,y_1),(x_2,y_2),(x_3,y_3),\dots,(x_n,y_n)\}$, with $y_i\in Y=\{-1,+1\}$.
Choose a base classifier $G$.
Loop $M$ times (a stopping threshold can also be used), i.e. $m=1,2,3,\dots,M$.
$$D_m=(w_{m,1},w_{m,2},w_{m,3},\dots,w_{m,n})$$
Initialization: in the first round all weights are equal, i.e. $w_{1i}=\frac{1}{n},\ i=1,2,3,\dots,n$:
$$D_1=(w_{11},w_{12},w_{13},\dots,w_{1n})$$
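For instance, with the 10 samples of the worked example in section 2, the initial distribution is simply:

```python
import numpy as np

n = 10                   # number of samples in the example of section 2
D1 = np.full(n, 1 / n)   # uniform initial weights w_{1i} = 1/n
print(D1)                # [0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]
```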
1.2 Iteration
Take the $m$-th round as an example:
Train on the weight distribution updated at the end of round $m-1$ to obtain the base classifier $G_m$:
Pick the threshold that yields the lowest error rate to build the base classifier.
$$G_m(x):\chi\to\{-1,+1\}$$
The classification error rate $e_m$ of $G_m(x)$ on the training set:
i.e. the total weight of the samples misclassified in this round;
$I(G_m(x_i)\neq y_i)$ is the indicator function: $I=1$ if $G_m(x_i)\neq y_i$ holds, and $I=0$ otherwise;
a usable weak classifier should satisfy $e_m<0.5$;
$$e_m=P(G_m(x_i)\neq y_i)=\sum^n_{i=1}w_{mi}I(G_m(x_i)\neq y_i)=\sum_{G_m(x_i)\neq y_i}w_{mi}$$
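A quick numeric check of this definition (a sketch of mine using numpy; `pred` holds the round-1 stump outputs from the log in section 2):

```python
import numpy as np

y    = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])     # labels from section 2
pred = np.array([1, 1, 1, -1, -1, -1, -1, -1, -1, -1])  # round-1 stump (split at 2.5)
D    = np.full(len(y), 0.1)                             # uniform first-round weights

e_m = np.sum(D * (pred != y))  # total weight of samples with G_m(x_i) != y_i
print(e_m)                     # 0.3, matching e_1 in the log in section 2
```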
Loss function:
To derive the two remaining quantities $\alpha_m$ and $w_{m+1,i}$, we need to define AdaBoost's loss function and then minimize it. Different loss functions may lead to different formulas for these two quantities.
Here we choose the exponential loss function.
For a single sample point $(x_i,y_i)$ with prediction $f(x_i)$:
$$L(y_i,f(x_i))=e^{-y_if(x_i)}$$
Over the whole sample set:
$$L(y,f(x))=\sum_{i=1}^N e^{-y_if(x_i)}$$
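As a small sketch of the formula above (the scores in `f` are purely illustrative):

```python
import numpy as np

y = np.array([1, -1, 1])         # true labels
f = np.array([0.8, -0.5, -0.2])  # real-valued scores f(x_i), chosen for illustration
print(np.sum(np.exp(-y * f)))    # e^-0.8 + e^-0.5 + e^0.2 ≈ 2.28; the misclassified
                                 # third point alone contributes more than 1
```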
In expectation form:
$y$ is the sample's true class, $f(x)$ the predicted score, the weight of $x$ follows the distribution $D$, and $E$ denotes expectation:
$$L(y,f(x))=E_{x\sim D}[e^{-yf(x)}]$$
Justifying this choice:
To find the $f(x)$ that minimizes the loss, expand the expectation over the two classes and set its partial derivative with respect to $f(x)$ to 0:
$$L(y,f(x))=e^{-f(x)}P(y=1|x)+e^{f(x)}P(y=-1|x)$$
$$\frac{\partial L(y,f(x))}{\partial f(x)}=-e^{-f(x)}P(y=1|x)+e^{f(x)}P(y=-1|x)=0$$
$$P(y=1|x)=e^{2f(x)}P(y=-1|x)$$
$$f(x)=\frac{1}{2}\ln\frac{P(y=1|x)}{P(y=-1|x)}$$
$$sign(f(x))=sign\left(\frac{1}{2}\ln\frac{P(y=1|x)}{P(y=-1|x)}\right)$$
When $P(y=1|x)>P(y=-1|x)$, $sign(f(x))=1$; otherwise it is $-1$. This is exactly the classification rule we want, and it shows that minimizing the exponential loss also minimizes the classification error rate.
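For instance, if $P(y=1|x)=0.8$ and $P(y=-1|x)=0.2$, then $f(x)=\frac{1}{2}\ln\frac{0.8}{0.2}=\frac{1}{2}\ln 4\approx 0.693>0$, so $sign(f(x))=+1$, agreeing with the majority class.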
Additive model:
$b(x;\gamma_m)$ is a base function, $\gamma_m$ its parameters, and $\beta_m$ its coefficient.
From the idea described earlier it is easy to see that the strong learner we are after is precisely an additive model:
$$f(x)=\sum^M_{m=1}\beta_m b(x;\gamma_m)$$
Minimizing the loss function:
$$\min_{\beta_m,\gamma_m}\sum_{i=1}^N L\left(y_i,\sum_{m=1}^M\beta_m b(x_i;\gamma_m)\right)$$
Forward stagewise algorithm:
Minimizing this loss jointly over all rounds is very hard, and the model AdaBoost learns is an additive model of base classifiers, so this algorithm applies.
The idea is to learn from front to back, one step at a time: each step learns only one base function and its coefficient, and through the additive model we gradually approach the optimization objective, i.e. the final learner we want. A toy sketch follows the formula below.
The loss to minimize at each step:
$$(\beta_m,\gamma_m)=\arg\min_{\beta,\gamma}\sum_{i=1}^N L\left(y_i,f_{m-1}(x_i)+\beta b(x_i;\gamma)\right)$$
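To make the stagewise idea concrete, here is a toy sketch of my own (not AdaBoost itself): the base functions are stumps $b(x;v)=sign(x-v)$, and I use squared error instead of exponential loss so each round has a closed-form coefficient. Note that only the newest $(\beta,v)$ is optimized; $f_{m-1}$ stays fixed:

```python
import numpy as np

x = np.arange(10, dtype=float)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1], dtype=float)  # data from section 2
f = np.zeros_like(y)                        # f_0 = 0
for m in range(3):
    best = None
    for v in np.linspace(0.5, 9.5, 10):     # candidate stump thresholds
        b = np.sign(x - v)                  # base function b(x; v)
        beta = np.sum((y - f) * b) / np.sum(b * b)  # closed-form argmin (squared loss)
        sse = np.sum((y - f - beta * b) ** 2)
        if best is None or sse < best[0]:
            best = (sse, beta, v)
    sse, beta, v = best
    f = f + beta * np.sign(x - v)           # f_m = f_{m-1} + beta_m * b(x; gamma_m)
    print(f"round {m + 1}: v={v}, beta={beta:.3f}, sse={sse:.3f}")
```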
Applying this algorithm in AdaBoost:
Let $\bar w_{mi}=e^{-y_if_{m-1}(x_i)}$; it does not involve the variables being minimized, but it does depend on $f_{m-1}(x)$:
$$(\alpha_m,G_m(x))=\arg\min_{\alpha,G}\sum_{i=1}^N e^{-y_i(f_{m-1}(x_i)+\alpha G(x_i))}=\arg\min_{\alpha,G}\sum_{i=1}^N\bar w_{mi}e^{-y_i\alpha G(x_i)}$$
The coefficient $\alpha_m$ of $G_m(x)$:
This is the weight of $G_m(x)$ in the final classifier; a base classifier with a smaller error rate should play a larger role in the final classifier, i.e. receive a larger weight.
$$\arg\min_{\alpha}\sum_{i=1}^N\bar w_{mi}e^{-y_i\alpha G(x_i)}=\sum_{y_i=G_m(x_i)}\bar w_{mi}e^{-\alpha}+\sum_{y_i\neq G_m(x_i)}\bar w_{mi}e^{\alpha}=(e^{\alpha}-e^{-\alpha})\sum_{i=1}^N\bar w_{mi}I(y_i\neq G(x_i))+e^{-\alpha}\sum_{i=1}^N\bar w_{mi}$$
Taking the partial derivative with respect to $\alpha_m$:
$$\frac{\partial L}{\partial\alpha_m}=(e^{\alpha_m}+e^{-\alpha_m})\sum_{i=1}^N\bar w_{mi}I(y_i\neq G(x_i))-e^{-\alpha_m}\sum_{i=1}^N\bar w_{mi}=0$$
$$e^{\alpha_m}\sum_{i=1}^N\bar w_{mi}I(y_i\neq G(x_i))=e^{-\alpha_m}\sum_{i=1}^N\bar w_{mi}-e^{-\alpha_m}\sum_{i=1}^N\bar w_{mi}I(y_i\neq G(x_i))$$
$$e^{\alpha_m}\cdot e_m=e^{-\alpha_m}\cdot(1-e_m)$$
Taking logarithms on both sides and rearranging:
$$\alpha_m=\frac{1}{2}\ln\frac{1-e_m}{e_m}$$
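Plugging in round 1 of the worked example in section 2: $e_1=0.3$ gives $\alpha_1=\frac{1}{2}\ln\frac{0.7}{0.3}\approx 0.4236$, exactly the value logged below. Since $e_m<0.5$, we always have $\alpha_m>0$, and $\alpha_m$ grows as $e_m$ shrinks.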
Update the sample weights $D_{m+1}=(w_{m+1,1},w_{m+1,2},w_{m+1,3},\dots,w_{m+1,n})$:
If a sample is misclassified this round, i.e. $y_i\neq G_m(x_i)$, then $y_iG_m(x_i)=-1$; otherwise it equals 1.
For $e^x$: when $x>0$, $e^x>1$; when $x<0$, $e^x<1$. So misclassified samples have their weights amplified and correctly classified ones have them shrunk.
$$\bar w_{m+1,i}=e^{-y_if_m(x_i)}=e^{-y_i(f_{m-1}(x_i)+\alpha_m G_m(x_i))}=\bar w_{m,i}e^{-y_i\alpha_m G_m(x_i)}$$
Normalizing the weights:
Normalization factor: $Z_m=\sum^n_{i=1}w_{mi}e^{-y_i\alpha_m G_m(x_i)}$
$$w_{m+1,i}=\frac{w_{m,i}e^{-y_i\alpha_m G_m(x_i)}}{Z_m},\quad i=1,2,3,\dots,n$$
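Tracing this update on round 1 of the worked example in section 2 (a sketch; the numbers should match the log there):

```python
import numpy as np

y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
G = np.array([1, 1, 1, -1, -1, -1, -1, -1, -1, -1])  # round-1 stump outputs
D = np.full(len(y), 0.1)                             # round-1 weights
a = 0.5 * np.log((1 - 0.3) / 0.3)                    # alpha_1 for e_1 = 0.3

w = D * np.exp(-y * a * G)   # unnormalized: misclassified samples (y_i G_i = -1) grow
Z = w.sum()                  # normalization factor Z_1 ≈ 0.9165
D_next = w / Z               # D_2 sums to 1 again
print(np.round(D_next, 4))   # misclassified points 6, 7, 8 now carry ≈ 0.1667 each
```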
1.3 Termination
After the iterations finish, combine the classifiers:
The additive model updated at each step of the forward stagewise algorithm is $f_m(x)=f_{m-1}(x)+\beta_m b(x;\gamma_m)$, so
$$f(x)=f_M(x)=\sum^M_{m=1}\beta_m b(x;\gamma_m)=\sum^M_{m=1}\alpha_m G_m(x)$$
Final classifier:
So that the output is only $-1$ or $+1$ for classification:
$$G(x)=sign(f(x))$$
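As a check, combining the three rounds logged in section 2 (the $\alpha_m$ values and split points are copied from the log there):

```python
import numpy as np

x = np.arange(10)
alphas = [0.4236489301936017, 0.6496414920651304, 0.752038698388137]
stumps = [
    lambda x: np.where(x < 2.5, 1, -1),   # round 1: v_best = 2.5
    lambda x: np.where(x < 8.5, 1, -1),   # round 2: v_best = 8.5
    lambda x: np.where(x > 5.5, 1, -1),   # round 3: v_best = 5.5, reversed direction
]
f = sum(a * g(x) for a, g in zip(alphas, stumps))
print(np.sign(f).astype(int))  # [ 1  1  1 -1 -1 -1  1  1  1 -1], zero training errors
```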
2 Code
```python
# Based on the example on p.158 of Li Hang's《统计学习方法》(2nd ed.)
import numpy as np

x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

def adaboost(M, x, y):
    m = 1
    D = [1 / len(x)] * len(x)   # initialization: uniform weights
    f = [0] * len(x)            # accumulated scores f_m(x_i)
    while m <= M:               # number of boosting rounds
        n = 0        # misclassification count of the combined classifier
        G = []       # this round's base classifier outputs
        Z = 0        # normalization factor
        em_min = 1   # lowest error rate so far
        v_best = 0   # best split point so far
        mark = 0     # 0: predict +1 left of the split; 1: predict -1 left of it
        # find the best split point
        for v in np.linspace(0.5, 9.5, 10):
            em1 = 0  # error rate of the stump "x < v -> +1, x > v -> -1"
            em2 = 0  # error rate of the reversed stump
            for i in range(len(x)):
                # error rate = total weight of the misclassified samples
                if (x[i] < v and y[i] != 1) or (x[i] > v and y[i] != -1):
                    em1 += D[i]
            for j in range(len(x)):
                if (x[j] > v and y[j] != 1) or (x[j] < v and y[j] != -1):
                    em2 += D[j]  # bug fix: this must be D[j], not D[i]
            em = min(em1, em2)
            if em_min > em:      # record the lowest error rate and its split point
                em_min = em
                v_best = v
                mark = 1 if em1 > em2 else 0
        # coefficient a = alpha_m
        a = 0.5 * np.log((1 - em_min) / em_min)
        # base classifier outputs G_m(x_i)
        for i in range(len(x)):
            if mark == 0:
                G.append(1 if x[i] < v_best else -1)
            else:
                G.append(-1 if x[i] < v_best else 1)
        # normalization factor Z_m
        for i in range(len(x)):
            Z += D[i] * np.exp(-y[i] * a * G[i])
        # update the weights
        for i in range(len(x)):
            D[i] = D[i] * np.exp(-y[i] * a * G[i]) / Z
        # accumulate f_m and count the combined classifier's mistakes
        for i in range(len(G)):
            f[i] = f[i] + a * G[i]
            if (f[i] < 0 and y[i] != -1) or (f[i] > 0 and y[i] != 1):
                n += 1
        # log this round
        with open('record.txt', 'a+', encoding='utf-8') as fp:
            fp.write(f'Round {m}:\nem: {em_min}\nv_best: {v_best}\na: {a}\n'
                     f'G: {G}\nZ: {Z}\nD: {D}\nf: {f}\n'
                     f'misclassified points: {n}\n')
        m += 1
    # final classifier: sign(f)
    for i in range(len(f)):
        f[i] = -1 if f[i] < 0 else 1
    return f

adaboost(3, x, y)
# result
# [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
```
The contents of record.txt after the run:

```
Round 1:
em: 0.30000000000000004
v_best: 2.5
a: 0.4236489301936017
G: [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]
Z: 0.9165151389911682
D: [0.07142857142857142, 0.07142857142857142, 0.07142857142857142, 0.07142857142857142, 0.07142857142857142, 0.07142857142857142, 0.16666666666666663, 0.16666666666666663, 0.16666666666666663, 0.07142857142857142]
f: [0.4236489301936017, 0.4236489301936017, 0.4236489301936017, -0.4236489301936017, -0.4236489301936017, -0.4236489301936017, -0.4236489301936017, -0.4236489301936017, -0.4236489301936017, -0.4236489301936017]
misclassified points: 3
Round 2:
em: 0.21428571428571427
v_best: 8.5
a: 0.6496414920651304
G: [1, 1, 1, 1, 1, 1, 1, 1, 1, -1]
Z: 0.8206518066482896
D: [0.04545454545454546, 0.04545454545454546, 0.04545454545454546, 0.16666666666666669, 0.16666666666666669, 0.16666666666666669, 0.10606060606060606, 0.10606060606060606, 0.10606060606060606, 0.04545454545454546]
f: [1.073290422258732, 1.073290422258732, 1.073290422258732, 0.22599256187152872, 0.22599256187152872, 0.22599256187152872, 0.22599256187152872, 0.22599256187152872, 0.22599256187152872, -1.073290422258732]
misclassified points: 3
Round 3:
em: 0.18181818181818185
v_best: 5.5
a: 0.752038698388137
G: [-1, -1, -1, -1, -1, -1, 1, 1, 1, 1]
Z: 0.7713892158398703
D: [0.12499999999999996, 0.12499999999999996, 0.12499999999999996, 0.10185185185185185, 0.10185185185185185, 0.10185185185185185, 0.0648148148148148, 0.0648148148148148, 0.0648148148148148, 0.12499999999999996]
f: [0.3212517238705951, 0.3212517238705951, 0.3212517238705951, -0.5260461365166083, -0.5260461365166083, -0.5260461365166083, 0.9780312602596657, 0.9780312602596657, 0.9780312602596657, -0.3212517238705951]
misclassified points: 0
```
```python
# sklearn
from sklearn.ensemble import AdaBoostClassifier
# see the official documentation for the parameters
```
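A minimal usage sketch on the same toy data (my example, not from the original; the default base estimator is a depth-1 decision tree, i.e. a stump, and the round-by-round numbers may differ from the hand-rolled version depending on the sklearn version and boosting variant it uses):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

X = np.arange(10).reshape(-1, 1)                  # sklearn expects a 2-D feature matrix
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])

clf = AdaBoostClassifier(n_estimators=3).fit(X, y)
print(clf.predict(X))                             # should recover y on this tiny set
```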