Feature selection filter: chi2 (chi-squared statistic)

Contents

- Function
  - Parameter description

Function

sklearn.feature_selection.chi2(X, y)

Computes the chi-squared statistic between each non-negative feature and the class.

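The resulting scores can be used to select the n_features features of X whose chi-squared test statistic, measured relative to the classes, is highest. X must contain only non-negative features, such as booleans or frequencies (for example, term counts in document classification).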
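The selection step described above can be sketched with SelectKBest, which accepts chi2 as its scoring function. The term-count matrix and labels below are made up for illustration, and k=2 is an arbitrary choice.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Toy term-count matrix (illustrative data): 6 documents, 4 terms.
# Terms 0 and 1 are concentrated in one class each; terms 2 and 3
# are spread evenly over both classes.
X = np.array([
    [3, 0, 1, 2],
    [4, 0, 0, 2],
    [5, 1, 0, 2],
    [0, 3, 1, 2],
    [0, 4, 0, 2],
    [1, 5, 0, 2],
])
y = np.array([0, 0, 0, 1, 1, 1])

# Keep the k features with the highest chi-squared scores.
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

# Column indices of the features that were kept.
selected = selector.get_support(indices=True)

print(selector.scores_)  # one chi-squared score per feature
print(selected)
print(X_new.shape)
```

With this data the two class-specific terms should be kept, while the evenly distributed terms score zero and are dropped.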

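Recall that the chi-squared test measures dependence between random variables. Using this function therefore "weeds out" the features that are most likely to be independent of the class, and hence irrelevant for classification.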
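To make the dependence test concrete, here is a minimal sketch that computes the statistic by hand. It rests on an assumption about how the scores are formed: each feature column is treated as a count, the observed per-class totals are compared with the totals expected if the feature were independent of the class, and the statistic has n_classes - 1 degrees of freedom. The data and the variable names (observed, expected, stat) are illustrative. The last lines compare the result with sklearn's chi2 as a sanity check.

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist
from sklearn.feature_selection import chi2

# Illustrative non-negative data: 6 samples, 3 features, 2 classes.
X = np.array([
    [1, 0, 3],
    [0, 2, 1],
    [4, 1, 0],
    [2, 0, 2],
    [0, 3, 1],
    [3, 1, 0],
], dtype=float)
y = np.array([0, 1, 0, 0, 1, 1])

classes = np.unique(y)
# One-hot encode the labels: shape (n_samples, n_classes).
Y = (y[:, None] == classes[None, :]).astype(float)

# Observed total of each feature within each class: (n_classes, n_features).
observed = Y.T @ X

# Expected totals under independence: class frequency * overall feature total.
feature_count = X.sum(axis=0, keepdims=True)  # (1, n_features)
class_prob = Y.mean(axis=0, keepdims=True)    # (1, n_classes)
expected = class_prob.T @ feature_count       # (n_classes, n_features)

# Pearson chi-squared statistic, summed over classes for each feature.
stat = ((observed - expected) ** 2 / expected).sum(axis=0)

# p-value from the chi-squared distribution with n_classes - 1 degrees of freedom.
pval = chi2_dist.sf(stat, df=len(classes) - 1)

# Compare with the library function.
sk_stat, sk_pval = chi2(X, y)
print(np.allclose(stat, sk_stat), np.allclose(pval, sk_pval))
```

A large statistic (small p-value) means the feature's counts are distributed across the classes very differently from what independence would predict.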

Parameter description

Parameters
----------
X : {array-like, sparse matrix} of shape (n_samples, n_features)
    Sample vectors (the feature matrix).

y : array-like of shape (n_samples,)
    Target vector (class labels).

Returns
-------
chi2 : array, shape = (n_features,)
    Chi-squared statistic for each feature.

pval : array, shape = (n_features,)
    p-value for each feature.
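As a short sketch of the inputs and return values listed above, the example below calls chi2 on a dense array and on the same data as a SciPy sparse matrix, then shows that negative input is rejected. The data is made up, and the names scores and pvals are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.feature_selection import chi2

# Illustrative data: 4 samples, 3 non-negative features, binary labels.
X_dense = np.array([
    [1, 0, 2],
    [0, 1, 2],
    [3, 0, 2],
    [0, 2, 2],
])
y = np.array([1, 0, 1, 0])

# chi2 returns two arrays, each of shape (n_features,).
scores, pvals = chi2(X_dense, y)
print(scores.shape, pvals.shape)

# Sparse input is accepted as well and should give the same result.
scores_sp, pvals_sp = chi2(csr_matrix(X_dense), y)

# Negative feature values violate the non-negativity requirement.
try:
    chi2(np.array([[1, -1], [0, 2]]), np.array([0, 1]))
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The third feature has the same value in every sample, so its counts split across the classes exactly as independence predicts and its score is zero.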

Notes

The complexity of this algorithm is O(n_classes * n_features).

Original link: https://blog.csdn.net/weixin_46072771/article/details/106188581