Python Machine Learning: An Introduction to Python Libraries for Machine Learning

WeChat official account: yale记

Background

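As the name suggests, machine learning is the science of programming computers so that they can learn from different kinds of data. A more general definition, attributed to Arthur Samuel, is: "Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed." It is commonly used to solve many kinds of real-world problems.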

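In the past, people performed machine learning tasks by hand-coding every algorithm and every mathematical and statistical formula, which made the process time-consuming, tedious, and inefficient. Today, thanks to the many Python libraries, frameworks, and modules available, the work is far simpler and more efficient. Python is now one of the most popular programming languages for this task and has displaced many other languages in industry, in part because of its large collection of libraries. The Python libraries used in machine learning include: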

NumPy
SciPy
Scikit-learn
Theano
TensorFlow
Keras
PyTorch
Pandas
Matplotlib

NumPy

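NumPy is a very popular Python library for processing large multidimensional arrays and matrices, backed by a large collection of high-level mathematical functions. It is very useful for basic scientific computing in machine learning, particularly for its linear algebra, Fourier transform, and random number capabilities. Higher-level libraries such as TensorFlow interoperate with NumPy arrays when manipulating tensors.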

# Python program using NumPy for some basic mathematical operations
import numpy as np

# Create two arrays of rank 2
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

# Create two arrays of rank 1
v = np.array([9, 10])
w = np.array([11, 12])

# Inner product of vectors
print(np.dot(v, w))

# Matrix-vector product
print(np.dot(x, v))

# Matrix-matrix product
print(np.dot(x, y))

Output:

219
[29 67]
[[19 22]
 [43 50]]
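
The linear algebra, Fourier transform, and random number capabilities mentioned in this section can be sketched as follows. This is a minimal illustration that is not part of the original tutorial; the matrix, the 5 Hz test signal, and the seed are arbitrary assumptions chosen so the results are easy to check.

```python
import numpy as np

# Linear algebra: solve A @ z = b for z (A and b are arbitrary example values)
A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 6.0])
z = np.linalg.solve(A, b)
print(z)

# Fourier transform: find the dominant frequency of a sampled sine wave
n = 64
t = np.arange(n) / n                 # one second sampled at 64 Hz
signal = np.sin(2 * np.pi * 5 * t)   # 5 Hz sine wave
spectrum = np.abs(np.fft.rfft(signal))
peak_bin = int(np.argmax(spectrum))  # bin index equals frequency in Hz here
print(peak_bin)

# Random numbers: a seeded generator gives reproducible draws
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
print(samples.mean(), samples.std())
```

Because the signal spans exactly one second, each bin of the real FFT corresponds to 1 Hz, so the peak lands in bin 5.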

For more about NumPy, visit the official site: https://numpy.org/

SciPy

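SciPy is a very popular library among machine learning enthusiasts because it contains separate modules for optimization, linear algebra, integration, and statistics. Note that the SciPy library is distinct from the SciPy stack: the library is one of the core packages that make up the stack. SciPy is also very useful for image processing.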

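# Python script using SciPy for image manipulation
# Note: scipy.misc.imread, imsave and imresize have been removed from recent
# SciPy releases, so this version reads and writes files with imageio and
# resizes with scipy.ndimage. It assumes a 3-channel (RGB) image.
import numpy as np
import imageio.v3 as iio
from scipy import ndimage

# Read a JPEG image into a NumPy array
img = iio.imread('D:/Programs/cat.jpg')  # path of the image
print(img.dtype, img.shape)

# Tint the image by scaling the R, G and B channels
img_tint = (img * [1, 0.45, 0.3]).astype(np.uint8)

# Save the tinted image
iio.imwrite('D:/Programs/cat_tinted.jpg', img_tint)

# Resize the tinted image to 300 x 300 pixels
h, w = img_tint.shape[:2]
img_tint_resize = ndimage.zoom(img_tint, (300 / h, 300 / w, 1), order=1)

# Save the resized tinted image
iio.imwrite('D:/Programs/cat_tinted_resized.jpg', img_tint_resize)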

(Figures not shown: original image, tinted image, resized tinted image.)
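
Beyond image handling, the modules this section lists for optimization, linear algebra, integration, and statistics can each be exercised in a line or two. The sketch below is an illustration added here, not part of the original tutorial; the functions being minimized and integrated are arbitrary assumptions picked because their exact answers are known.

```python
import numpy as np
from scipy import integrate, linalg, optimize, stats

# Optimization: minimize f(x) = (x - 2)^2, whose minimum is at x = 2
res = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)
print(res.x)

# Linear algebra: determinant of a 2 x 2 matrix (exact value is -2)
det = linalg.det(np.array([[1.0, 2.0], [3.0, 4.0]]))
print(det)

# Integration: integral of x^2 from 0 to 1 (exact value is 1/3)
area, abs_err = integrate.quad(lambda x: x ** 2, 0.0, 1.0)
print(area)

# Statistics: CDF of the standard normal distribution at 0 (exact value is 0.5)
p = stats.norm.cdf(0.0)
print(p)
```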

For more about SciPy, visit the official site: https://www.scipy.org/

Scikit-learn

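Scikit-learn is one of the most popular ML libraries for classical ML algorithms. It is built on top of two basic Python libraries, NumPy and SciPy. Scikit-learn supports most supervised and unsupervised learning algorithms. It can also be used for data mining and data analysis, which makes it a good tool for getting started with ML.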

# Python script using Scikit-learn for a sample Decision Tree Classifier
from sklearn import datasets
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier

# Load the iris dataset
dataset = datasets.load_iris()

# Fit a CART model to the data
model = DecisionTreeClassifier()
model.fit(dataset.data, dataset.target)
print(model)

# Make predictions
expected = dataset.target
predicted = model.predict(dataset.data)

# Summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

Output:

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       1.00      1.00      1.00        50
           2       1.00      1.00      1.00        50

   micro avg       1.00      1.00      1.00       150
   macro avg       1.00      1.00      1.00       150
weighted avg       1.00      1.00      1.00       150

[[50  0  0]
 [ 0 50  0]
 [ 0  0 50]]
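
The perfect scores above come from predicting on the same 150 samples the tree was fitted to, so they say little about how the model generalizes. A common alternative, sketched here as an addition to the original tutorial, is to hold out part of the data for evaluation. The 30% test size, the stratified split, and `random_state=0` are illustrative assumptions, not values from the source.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

dataset = datasets.load_iris()

# Hold out 30% of the samples; test_size and random_state are illustrative choices
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data,
    dataset.target,
    test_size=0.3,
    random_state=0,
    stratify=dataset.target,
)

# Fit the tree on the training portion only
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Evaluate only on samples the model has not seen
predicted = model.predict(X_test)
accuracy = metrics.accuracy_score(y_test, predicted)
print(accuracy)
print(metrics.confusion_matrix(y_test, predicted))
```

The held-out accuracy is usually a little below 1.00, which is a more honest estimate than the training-set report.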

For more about Scikit-learn, visit the official site: https://scikit-learn.org/

Theano

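As we all know, machine learning is essentially mathematics and statistics. Theano is a popular Python library for efficiently defining, evaluating, and optimizing mathematical expressions that involve multidimensional arrays. It achieves this by optimizing the use of the CPU and GPU. It also includes extensive unit-testing and self-verification to detect and diagnose many types of errors. Theano is a very powerful library that has long been used in large-scale, computationally intensive scientific projects, yet it is simple and approachable enough for individuals to use in their own projects.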

# Python program using Theano for computing a logistic function
import theano
import theano.tensor as T

x = T.dmatrix('x')
s = 1 / (1 + T.exp(-x))
logistic = theano.function([x], s)
logistic([[0, 1], [-1, -2]])

Output:

array([[0.5       , 0.73105858],
       [0.26894142, 0.11920292]])
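
As a cross-check that does not require a Theano installation, the same logistic function can be evaluated directly in NumPy. This is a sketch added for comparison, not the tutorial's method: NumPy computes the values eagerly, so it reproduces the numbers above but not Theano's symbolic graph or its compilation step. The helper name `logistic` simply mirrors the Theano example.

```python
import numpy as np

def logistic(x):
    """Elementwise logistic (sigmoid) function, 1 / (1 + exp(-x))."""
    x = np.asarray(x, dtype=np.float64)
    return 1.0 / (1.0 + np.exp(-x))

# Same input as the Theano example
result = logistic([[0, 1], [-1, -2]])
print(result)
```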

For more about Theano, visit http://deeplearning.net/software/theano/

TensorFlow

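TensorFlow is a very popular open-source library for high-performance numerical computation, developed by the Google Brain team at Google. As the name suggests, TensorFlow is a framework for defining and running computations that involve tensors. It can train and run deep neural networks, which can be used to build a range of AI applications. TensorFlow is widely used in deep learning research and applications.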

# Python program using TensorFlow for multiplying two arrays
import tensorflow as tf

# Initialize two constants
x1 = tf.constant([1, 2, 3, 4])
x2 = tf.constant([5, 6, 7, 8])

# Multiply elementwise
result = tf.multiply(x1, x2)

# Print the result. TensorFlow 2.x executes eagerly, so no Session is needed;
# in TensorFlow 1.x the equivalent was tf.Session().run(result).
print(result.numpy())

Output:

[ 5 12 21 32]

For more about TensorFlow, visit the official site: https://www.tensorflow.org/

Keras

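Keras is a very popular machine learning library for Python. It is a high-level neural network API capable of running on top of TensorFlow, CNTK, or Theano, and it runs seamlessly on both CPU and GPU. Keras makes it easy for ML beginners to build and design neural networks. One of its best features is that it allows easy and fast prototyping.

Official site: https://keras.io/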

PyTorch

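PyTorch is a popular open-source machine learning library for Python. It is based on Torch, an open-source machine learning library implemented in C with a Lua wrapper. PyTorch offers a wide selection of tools and libraries that support computer vision, natural language processing (NLP), and many other ML applications. It lets developers perform computations on tensors with GPU acceleration and also helps in building computational graphs.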

# Python program using PyTorch for defining tensors, fitting a two-layer
# network to random data, and calculating the loss
import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

Output:

0 47168344.0
1 46385584.0
2 43153576.0
...
497 3.987660602433607e-05
498 3.945609932998195e-05
499 3.897604619851336e-05
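
The hand-written backprop formulas in the example above can be sanity-checked against a numerical derivative. The sketch below is an addition to the original tutorial: it re-implements the same two-layer ReLU network in NumPy rather than PyTorch, uses much smaller dimensions, and compares one entry of the analytic `grad_w1` with a central finite difference. The sizes, the seed, the step `eps`, and the helper `loss_fn` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 3, 2  # small sizes chosen only for this check

x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def loss_fn(w1, w2):
    """Sum-of-squares loss of the two-layer ReLU network."""
    h_relu = np.maximum(x @ w1, 0.0)
    return np.sum((h_relu @ w2 - y) ** 2)

# Analytic gradients, mirroring the backprop steps of the PyTorch example
h = x @ w1
h_relu = np.maximum(h, 0.0)
y_pred = h_relu @ w2
grad_y_pred = 2.0 * (y_pred - y)
grad_w2 = h_relu.T @ grad_y_pred
grad_h = grad_y_pred @ w2.T
grad_h[h < 0] = 0.0
grad_w1 = x.T @ grad_h

# Central finite difference for a single entry of w1
eps = 1e-6
i, j = 0, 0
w1_plus, w1_minus = w1.copy(), w1.copy()
w1_plus[i, j] += eps
w1_minus[i, j] -= eps
numeric = (loss_fn(w1_plus, w2) - loss_fn(w1_minus, w2)) / (2 * eps)
print(grad_w1[i, j], numeric)
```

The two printed numbers should agree to several decimal places; a large mismatch would point to a mistake in one of the gradient lines.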

For more about PyTorch, visit https://pytorch.org/

Pandas

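Pandas is a popular Python library for data analysis. It is not directly related to machine learning, but datasets must be prepared before training, and Pandas is very handy here because it was developed specifically for data extraction and preparation. It provides high-level data structures and a wide variety of data analysis tools, with many built-in methods for grouping, combining, and filtering data.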

# Python program using Pandas for arranging a given set of data into a table
import pandas as pd

data = {"country": ["Brazil