Section I: A Brief Introduction to LDA
Linear Discriminant Analysis (LDA) can be used as a feature-extraction technique to increase computational efficiency and reduce the degree of overfitting caused by the curse of dimensionality in non-regularized models. The general concept behind LDA is very similar to PCA. Whereas PCA attempts to find the orthogonal component axes of maximum variance in a dataset, the goal of LDA is to find the feature subspace that optimizes class separability. In contrast to PCA, LDA is a supervised algorithm.
Personal Views
LDA is a supervised learning algorithm: it uses the class labels to compute a between-class scatter matrix and a within-class scatter matrix, and then, much like PCA, computes the eigenvalues and eigenvectors of the matrix formed from the two. From that angle, LDA and PCA are somewhat similar.
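The procedure described above can be sketched directly with NumPy: build the within-class scatter matrix S_W and the between-class scatter matrix S_B from the labeled data, then take the leading eigenvectors of S_W^{-1} S_B, just as PCA takes the leading eigenvectors of the covariance matrix. This is only an illustrative sketch (all variable names here are my own), not the scikit-learn implementation used in the code bundle below.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load and standardize the Wine data (the same dataset used below)
X, y = datasets.load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)
n_features = X.shape[1]
mean_overall = X.mean(axis=0)

# Within-class scatter S_W and between-class scatter S_B
S_W = np.zeros((n_features, n_features))
S_B = np.zeros((n_features, n_features))
for label in np.unique(y):
    X_c = X[y == label]
    mean_c = X_c.mean(axis=0)
    S_W += (X_c - mean_c).T @ (X_c - mean_c)
    diff = (mean_c - mean_overall).reshape(-1, 1)
    S_B += len(X_c) * diff @ diff.T

# Eigendecomposition of S_W^{-1} S_B -- the PCA-like step,
# but on the scatter-matrix ratio instead of the covariance matrix
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:2]].real   # top-2 discriminant directions
X_lda = X @ W                    # projected 2-D representation
```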
From:
Sebastian Raschka, Vahid Mirjalili. Python Machine Learning, 2nd Edition. Nanjing: Southeast University Press, 2018.
Section II: Code Bundle
Code:
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from PCA.visualize import plot_decision_regions
#Section 1: Set global figure parameters
plt.rcParams['figure.dpi']=200
plt.rcParams['savefig.dpi']=200
font = {'family': 'Times New Roman',
'weight': 'light'}
plt.rc("font", **font)
#Section 2: Load data and split it into train/test dataset
wine=datasets.load_wine()
X,y=wine.data,wine.target
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=1,stratify=y)
st=StandardScaler()
X_train_std=st.fit_transform(X_train)
X_test_std=st.transform(X_test)
#Section 3: Use LDA for feature reduction
lda=LDA(n_components=2)
X_train_lda=lda.fit_transform(X_train_std,y_train)
print("Eigenvalue Ratios in Descending Order",lda.explained_variance_ratio_)
lr=LogisticRegression()
lr.fit(X_train_lda,y_train)
plot_decision_regions(X_train_lda,y_train,classifier=lr)
plt.xlabel('LD 1')
plt.ylabel('LD 2')
plt.title("LDA - Training Dataset")
plt.legend(loc='upper left')
plt.savefig('./fig1.png')
plt.show()
Results:

Eigenvalue Ratios in Descending Order [0.68259828 0.31740172]
From the output above, the number of linear discriminant components is at most (number of classes − 1). This can be checked by setting LDA's n_components to the original number of features. Interestingly, although the dataset has 13 features, setting n_components=13 still produces only 2 eigenvalue ratios, because there are 3 wine classes.
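The reason for the (n_classes − 1) cap is that S_B is a weighted sum of one rank-one matrix per class, and the weighted class-mean deviations cancel, so rank(S_B), and with it the number of nonzero eigenvalues of S_W^{-1} S_B, is at most n_classes − 1. A quick, self-contained check on the Wine data (an illustrative sketch; variable names are my own):

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)
mean_overall = X.mean(axis=0)

# S_B sums one rank-1 term per class; because the weighted mean
# deviations sum to zero, rank(S_B) <= n_classes - 1 = 2,
# despite the 13 input features.
S_B = np.zeros((X.shape[1], X.shape[1]))
for label in np.unique(y):
    X_c = X[y == label]
    diff = (X_c.mean(axis=0) - mean_overall).reshape(-1, 1)
    S_B += len(X_c) * diff @ diff.T

rank = np.linalg.matrix_rank(S_B)
print(rank)  # 2: at most two non-trivial discriminant directions
```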
References:
Sebastian Raschka, Vahid Mirjalili. Python Machine Learning, 2nd Edition. Nanjing: Southeast University Press, 2018.