NumPy course
Hi there,
Today we will take a look at NumPy, a Python library used for numerical processing. Keep in mind that some basic Python knowledge is required. I recommend that you follow along in a Jupyter notebook so that you can see the output of the code and experiment with different inputs.
In this article you will learn how to:
- Create numpy arrays.
- Generate random numbers, and set a seed.
- Perform operations using arrays.
- Reshape an array.
And, given an N-dimensional array, how to:
- Get a single element.
- Get a row/column.
- Slice.
- Do masking.
Along the way, we will see some tips and tricks you can use to make coding more efficient and easy.
I hope you will enjoy it!
First run pip install numpy, then import it in your Jupyter notebook:
import numpy as np
Quick note: In each block of code, you see both the code and the output. You can also check my GitHub repository if you want to download the Jupyter notebook.
Creating numpy arrays
Creating arrays can be done using a list, or built-in functions. Let’s see how each of them works.
A. Creating numpy arrays using a list
We can create an array using a list.
We first create a list,
my_list = [0,1,2,3]
my_list
[0, 1, 2, 3]
and then convert it to a numpy array.
my_array = np.array(my_list)
my_array
array([0, 1, 2, 3])
B. Creating numpy arrays using built-in functions
Numpy has many built-in functions that provide a fast and efficient way to create arrays.
Trick:
To see everything that is available on an array, type the name of your array followed by a dot and then press tab.
my_array. # press tab
1. Arange built-in function
Imagine you want to create an array of size 5 which has numbers from 0 to 4. We could do it like this:
np.array([0,1,2,3,4])
array([0, 1, 2, 3, 4])
But let’s say that we need to create an array of size 100 which has all the numbers from 0 to 99. It would be very painful to do this using the previous method. This is where arange comes to save the day.
Here we simply need to specify the starting point and the endpoint (note that the endpoint will not be included in the array).
np.arange(0, 100)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
We can also specify a step size. Let’s say we want every second element of this array. We can do this by adding the step size to the arange call:
np.arange(0, 100, 2)
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66,
68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98])
2. Linspace built-in function, to create linearly spaced arrays
Linspace takes as input a starting point, an endpoint, and the number of evenly spaced points to generate, which is also the length of the array. Unlike arange, the endpoint is included by default.
Let’s see an example:
# From 1 to 10, 5 elements, evenly spaced between them.
np.linspace(1, 10, 5)
array([ 1. , 3.25, 5.5 , 7.75, 10. ])
Here the starting point is 1 and the endpoint is 10. Linspace creates an array with 5 evenly spaced numbers running from the start to the endpoint, both included.
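If you want to check where the spacing comes from, here is a minimal sketch. The variable names are my own, and the second half uses the endpoint parameter of linspace, which is not used elsewhere in this article.

```python
import numpy as np

# linspace includes the endpoint by default, so the spacing is
# (stop - start) / (num - 1).
points = np.linspace(1, 10, 5)
step = points[1] - points[0]
print(points)
print(step)  # (10 - 1) / (5 - 1) = 2.25

# With endpoint=False the stop value is left out, much like arange.
open_points = np.linspace(0, 10, 5, endpoint=False)
print(open_points)
print(np.arange(0, 10, 2))  # same values, as integers
```

As a rule of thumb, arange is convenient when you know the step, and linspace when you know how many points you need.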
3. Random built-in function to create random arrays
What if we don’t care about what numbers our array will have? Well, we simply put random numbers there, right? The random.randint function does exactly that.
It takes as inputs a starting point, an endpoint (which is not included in the array), and a size, which is the shape of the output (no worries, we will look into shapes in a bit).
np.random.randint(0, 4, (2,2))
array([[2, 0],
[0, 2]])
Ok, but now I want to generate the same random numbers as you! How can we do it?
Simple, we use a seed! If you type in one cell:
np.random.seed(12)
np.random.randint(0, 10, 20)
we will get the same random numbers!
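A quick way to convince yourself is to seed twice and compare the two draws. This is only a sketch, and the variable names are mine:

```python
import numpy as np

# Seeding before each draw restarts the generator from the same state.
np.random.seed(12)
first_draw = np.random.randint(0, 10, 20)

np.random.seed(12)
second_draw = np.random.randint(0, 10, 20)

print(np.array_equal(first_draw, second_draw))  # True: same seed, same numbers

# Without re-seeding, the next draw continues the sequence,
# so it will generally differ from the first two.
third_draw = np.random.randint(0, 10, 20)
print(third_draw)
```

Newer NumPy versions also offer np.random.default_rng(seed), which returns a separate generator object. It is not used in this article.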
np.random.seed(12)
same_arr = np.random.randint(0, 10, 20)
same_arr
array([6, 1, 2, 3, 3, 0, 6, 1, 4, 5, 9, 2, 6, 0, 5, 8, 2, 9, 3, 4])
4. Sample from a normal (Gaussian) distribution
What if we need a sample of size 100 from the Gaussian distribution, with mean 0 and standard deviation 1? Well, we again use .random, but instead of .randint we use .normal!
np.random.seed(101)
norm_arr = np.random.normal(0, 1, 100)
norm_arr
array([ 2.70684984e+00, 6.28132709e-01, 9.07969446e-01, 5.03825754e-01,
6.51117948e-01, -3.19318045e-01, -8.48076983e-01, 6.05965349e-01,
-2.01816824e+00, 7.40122057e-01, 5.28813494e-01, -5.89000533e-01,
1.88695309e-01, -7.58872056e-01, -9.33237216e-01, 9.55056509e-01,
1.90794322e-01, 1.97875732e+00, 2.60596728e+00, 6.83508886e-01,
3.02665449e-01, 1.69372293e+00, -1.70608593e+00, -1.15911942e+00,
-1.34840721e-01, 3.90527843e-01, 1.66904636e-01, 1.84501859e-01,
8.07705914e-01, 7.29596753e-02, 6.38787013e-01, 3.29646299e-01,
-4.97104023e-01, -7.54069701e-01, -9.43406403e-01, 4.84751647e-01,
-1.16773316e-01, 1.90175480e+00, 2.38126959e-01, 1.99665229e+00,
-9.93263500e-01, 1.96799505e-01, -1.13664459e+00, 3.66479606e-04,
1.02598415e+00, -1.56597904e-01, -3.15791439e-02, 6.49825833e-01,
2.15484644e+00, -6.10258856e-01, -7.55325340e-01, -3.46418504e-01,
1.47026771e-01, -4.79448039e-01, 5.58769406e-01, 1.02481028e+00,
-9.25874259e-01, 1.86286414e+00, -1.13381716e+00, 6.10477908e-01,
3.86030312e-01, 2.08401853e+00, -3.76518675e-01, 2.30336344e-01,
6.81209293e-01, 1.03512507e+00, -3.11604815e-02, 1.93993231e+00,
-1.00518692e+00, -7.41789705e-01, 1.87124522e-01, -7.32845148e-01,
-1.38292010e+00, 1.48249550e+00, 9.61458156e-01, -2.14121229e+00,
9.92573453e-01, 1.19224064e+00, -1.04677954e+00, 1.29276458e+00,
-1.46751402e+00, -4.94095358e-01, -1.62534735e-01, 4.85808737e-01,
3.92488811e-01, 2.21490685e-01, -8.55196041e-01, 1.54199041e+00,
6.66319321e-01, -5.38234626e-01, -5.68581361e-01, 1.40733825e+00,
6.41805511e-01, -9.05099902e-01, -3.91156627e-01, 1.02829316e+00,
-1.97260510e+00, -8.66885035e-01, 7.20787599e-01, -1.22308204e+00])
Trick!
Type ?np.random.normal in a notebook cell and you will get information about how to use this function. This helps a lot if you don’t want to remember what inputs each function needs and what the function returns.
?np.random.normal
Let’s check if the mean of the norm_arr array is close to 0 and the standard deviation close to 1.
norm_arr.mean()
0.166369880423112
norm_arr.std()
1.0338189430873386
If you increase the sample size, these numbers will tend to get closer to 0 and 1 respectively.
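You can try this out yourself. The sketch below draws samples of growing size and prints their mean and standard deviation. The sample sizes are arbitrary choices of mine.

```python
import numpy as np

np.random.seed(101)

# Draw increasingly large samples from a normal distribution
# with mean 0 and standard deviation 1.
for sample_size in (100, 10_000, 1_000_000):
    sample = np.random.normal(0, 1, sample_size)
    print(sample_size, sample.mean(), sample.std())

# One more large sample, kept in a variable for inspection.
large_sample = np.random.normal(0, 1, 1_000_000)
print(large_sample.mean(), large_sample.std())
```

The exact numbers depend on the seed, but the larger samples should sit noticeably closer to 0 and 1 than the sample of 100.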
Let’s find the minimum and maximum values in the norm_arr array, and the index of each!
norm_arr.min() # minimum value
-2.1412122910809264
norm_arr.max() # maximum value
2.706849839399938
norm_arr.argmin() # index of the minimum value
75
norm_arr.argmax() # index of the maximum value
0
The last step is to put everything in order. Let’s sort the norm_arr array!
np.sort(norm_arr)
array([-2.14121229e+00, -2.01816824e+00, -1.97260510e+00, -1.70608593e+00,
-1.46751402e+00, -1.38292010e+00, -1.22308204e+00, -1.15911942e+00,
-1.13664459e+00, -1.13381716e+00, -1.04677954e+00, -1.00518692e+00,
-9.93263500e-01, -9.43406403e-01, -9.33237216e-01, -9.25874259e-01,
-9.05099902e-01, -8.66885035e-01, -8.55196041e-01, -8.48076983e-01,
-7.58872056e-01, -7.55325340e-01, -7.54069701e-01, -7.41789705e-01,
-7.32845148e-01, -6.10258856e-01, -5.89000533e-01, -5.68581361e-01,
-5.38234626e-01, -4.97104023e-01, -4.94095358e-01, -4.79448039e-01,
-3.91156627e-01, -3.76518675e-01, -3.46418504e-01, -3.19318045e-01,
-1.62534735e-01, -1.56597904e-01, -1.34840721e-01, -1.16773316e-01,
-3.15791439e-02, -3.11604815e-02, 3.66479606e-04, 7.29596753e-02,
1.47026771e-01, 1.66904636e-01, 1.84501859e-01, 1.87124522e-01,
1.88695309e-01, 1.90794322e-01, 1.96799505e-01, 2.21490685e-01,
2.30336344e-01, 2.38126959e-01, 3.02665449e-01, 3.29646299e-01,
3.86030312e-01, 3.90527843e-01, 3.92488811e-01, 4.84751647e-01,
4.85808737e-01, 5.03825754e-01, 5.28813494e-01, 5.58769406e-01,
6.05965349e-01, 6.10477908e-01, 6.28132709e-01, 6.38787013e-01,
6.41805511e-01, 6.49825833e-01, 6.51117948e-01, 6.66319321e-01,
6.81209293e-01, 6.83508886e-01, 7.20787599e-01, 7.40122057e-01,
8.07705914e-01, 9.07969446e-01, 9.55056509e-01, 9.61458156e-01,
9.92573453e-01, 1.02481028e+00, 1.02598415e+00, 1.02829316e+00,
1.03512507e+00, 1.19224064e+00, 1.29276458e+00, 1.40733825e+00,
1.48249550e+00, 1.54199041e+00, 1.69372293e+00, 1.86286414e+00,
1.90175480e+00, 1.93993231e+00, 1.97875732e+00, 1.99665229e+00,
2.08401853e+00, 2.15484644e+00, 2.60596728e+00, 2.70684984e+00])
Now we have an array that starts with the minimum value and goes up to the maximum.
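One thing worth knowing is that np.sort gives back a new array and leaves the original alone. Here is a small sketch on a toy array of my own (not norm_arr) that also shows a few related calls:

```python
import numpy as np

values = np.array([3, -1, 2, 0])

sorted_copy = np.sort(values)        # returns a new, sorted array
print(values)                        # the original order is unchanged
print(sorted_copy)

descending = np.sort(values)[::-1]   # reverse the sorted copy for descending order
order = np.argsort(values)           # indices that would sort the array
print(descending)
print(order)

values.sort()                        # the method form sorts in place and returns None
print(values)
```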
Tip:
- To check the number of dimensions of an array, use the name of your array followed by .ndim
- To check the shape of an array, use the name of your array followed by .shape
But what are the dimensions and the shape of an array?
Let’s look at an example.
Imagine that you have a nested list looking like this:
[ [1,2], [3,4], [5,6], [7,8] ]
Here, you have one list, and inside that list, you have 4 more lists, each of which has 2 elements. Great. Now imagine having an even bigger list, containing 3 copies of the previous one! And we transform all of these lists into numpy arrays.
small_array = np.array([1,2])
small_array
array([1, 2])
medium_array = np.array([[1,2],[3,4],[5,6],[7,8]])
medium_array
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
big_array = np.array([ [[1,2],[3,4],[5,6],[7,8]] , [[1,2],[3,4],[5,6],[7,8]], [[1,2],[3,4],[5,6],[7,8]] ])
big_array
array([[[1, 2],
[3, 4],
[5, 6],
[7, 8]],
[[1, 2],
[3, 4],
[5, 6],
[7, 8]],
[[1, 2],
[3, 4],
[5, 6],
[7, 8]]])
We learned that to get the number of dimensions of an array we use .ndim, so let’s try it out!
small_array.ndim
1
medium_array.ndim
2
big_array.ndim
3
To see the shape of an array use .shape
small_array.shape
(2,)
medium_array.shape
(4, 2)
big_array.shape
(3, 4, 2)
What we see here is that the number of dimensions reflects how deeply nested the lists are!
The shape of an array gives the length of the lists at each level of nesting. In big_array, the outer list holds 3 lists, each of them holds 4 lists, and each of these 4 lists is of length 2. So the shape is: (3, 4, 2).
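To tie the two attributes together, here is a short sketch. It rebuilds big_array with list repetition, which is just a shortcut of mine, and also prints .size, an attribute not used elsewhere in this article.

```python
import numpy as np

# Three copies of the same 4x2 nested list, as in big_array above.
big_array = np.array([[[1, 2], [3, 4], [5, 6], [7, 8]]] * 3)

print(big_array.ndim)        # number of axes (nesting depth)
print(big_array.shape)       # length along each axis
print(len(big_array.shape))  # always equal to ndim
print(big_array.size)        # total number of elements: 3 * 4 * 2
```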
5. Arrays of zeros
To create an array which has zero at each index we need to specify one of:
- The length of the array, as an integer. The resulting shape is (length,).
- The shape, as a tuple.
The output is an array of floating-point zeros.
# (3,) shape numpy array with 3 zeros
array1 = np.zeros(3)
array1
array([0., 0., 0.])
# (2,3) shape numpy array with 6 zeros
array2 = np.zeros((2,3))
array2
array([[0., 0., 0.],
[0., 0., 0.]])
6. Arrays of ones
The .ones function works the same way as .zeros. To create an array which has one at each index we need to specify one of:
- The length of the array, as an integer. The resulting shape is (length,).
- The shape, as a tuple.
The output is an array of floating-point ones.
# (3,) shape array with 3 ones
np.ones(3)
array([1., 1., 1.])
# (2,3) shape array with 6 ones
np.ones((2,3))
array([[1., 1., 1.],
[1., 1., 1.]])
7. Full built-in function
0’s and 1’s are probably not the only numbers that we want to use. What if we want to create an array which has the number 4 at each index? We can use the built-in function full.
First, we specify the shape of the array and then the number that we want at each index.
all_4 = np.full((2, 3), 4)
all_4
array([[4, 4, 4],
[4, 4, 4]])
At this point, you know two ways to create numpy arrays.
The first way is using a list, and the second way is using built-in functions. Now it’s time to discover what we can do with these arrays! First, we will take a look at how to perform operations, then how to reshape the array, and finally how to access specific elements from an array.
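Before moving on, here is a short recap sketch of the zeros, ones and full functions. It also uses the dtype parameter, which is not covered above, to show how you could get integers instead of floats. The variable names are mine.

```python
import numpy as np

zeros_float = np.zeros((2, 3))            # default dtype is float64
zeros_int = np.zeros((2, 3), dtype=int)   # ask for integers instead of floats
ones_row = np.ones(3)                     # an int is shorthand for the shape (3,)
fours = np.full((2, 3), 4)                # dtype follows the fill value (integer here)
fours_float = np.full((2, 3), 4.0)        # a float fill value gives a float array

for arr in (zeros_float, zeros_int, ones_row, fours, fours_float):
    print(arr.shape, arr.dtype)
```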
Operations
Let’s say we have two arrays called a and b with the same shape and length.
We can:
- Add the elements at corresponding indexes of the two arrays.
- Add a single value to the element at every index.
The same applies to subtraction, multiplication, and division.
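As a small sketch of those other operations, here are two fixed arrays. The values are my own illustrative choice, picked so that b contains no zero and the division stays clean.

```python
import numpy as np

a = np.array([1, 9, 9])
b = np.array([2, 1, 2])  # illustrative values, chosen to avoid dividing by zero

print(a - b)   # element-wise subtraction
print(a * b)   # element-wise multiplication (not a matrix product)
print(a / b)   # element-wise true division, the result is a float array
print(a * 2)   # a single value is applied to every element
```

If b did contain a zero, a / b would give inf (or nan) at that index and NumPy would emit a warning rather than raise an error.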
# Array with 3 random numbers.
a = np.random.randint(0, 10, 3)
a
array([1, 9, 9])
b = np.random.randint(0, 10, 3)
b
array([2, 0, 2])
a + b # addition of two arrays
array([ 3, 9, 11])
a + 1 # adding 1 to each element of a
array([ 2, 10, 10])
Reshape
Imagine that we have an array of length 9.
first_arr = np.arange(0,9)
print(first_arr)
first_arr.shape
[0 1 2 3 4 5 6 7 8]
(9,)
The goal here is to create a new array, containing the same data but with a new shape! Let’s choose the shape (3,3). We can use the reshape built-in function.
sec_arr = first_arr.reshape(3,3)
print(sec_arr)
sec_arr.shape
[[0 1 2]
[3 4 5]
[6 7 8]]
(3, 3)
So now we have split the first array into 3 rows of 3 elements, creating a (3,3) shaped array! Can you guess the number of dimensions of the second array? How many levels of nesting do you see inside the array? Let’s check:
sec_arr.ndim
2
It’s like the medium_array we saw previously.
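Two details about reshape are handy to know. The sketch below assumes the same first_arr as above; the -1 shorthand and the error case are additions of mine, not part of the original walkthrough.

```python
import numpy as np

first_arr = np.arange(0, 9)

# -1 asks NumPy to infer that dimension from the array's size.
inferred = first_arr.reshape(3, -1)
print(inferred.shape)  # (3, 3)

# The total number of elements must stay the same,
# otherwise reshape raises a ValueError.
try:
    first_arr.reshape(2, 5)
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print("cannot reshape:", error)
```

Note that reshape does not change first_arr itself; it returns a new array object with the requested shape.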
Access specific elements from an array
We created arrays of various dimensions and shapes. But how can we access the elements that we want?
First, think of a 2-dim array, having 5 nested arrays of 5 elements each. You can picture this array as a matrix.
mat = np.arange(0, 25).reshape(5, 5)
mat
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
Get the first array of the matrix.
We have a nested array, so when we type mat[0] we index the first dimension and get the first array (because of the 0).
Note that indexing in Python starts at 0!
mat[0]
array([0, 1, 2, 3, 4])
Get the last array of the matrix.
Instead of counting how long your array is, you can use -1.
mat[-1]
array([20, 21, 22, 23, 24])
Get a single element
Now let’s say we want to get the value 0 from this matrix. With mat[0] we take the first array along the first dimension. With mat[0,0] we add an index for the second dimension and get the first element of that array, which is 0.
mat[0,0]
0
Get the 3rd column
When we want to access all the elements along a dimension we can use the symbol : . So here we say: from the first dimension I want everything, and from the second dimension I only want the 3rd element.
mat[:,2]
array([ 2, 7, 12, 17, 22])
Get the 2nd row
Or else: from the first dimension the second array, and from the second dimension, everything. Combining these two, we are left with the second row.
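It is easy to mix up the two positions, so here is a minimal side-by-side sketch on the same 5x5 matrix. In a 2-dim array the first index picks the row and the second picks the column. The variable names are mine.

```python
import numpy as np

mat = np.arange(0, 25).reshape(5, 5)

second_row = mat[1, :]      # first index fixed at 1, every column
second_column = mat[:, 1]   # every row, second index fixed at 1

print(second_row)     # [5 6 7 8 9]
print(second_column)  # [ 1  6 11 16 21]
print(mat[1])         # a single index is shorthand for mat[1, :]
```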
mat[1,:]
array([5, 6, 7, 8, 9])
Slicing
Here we want to get a slice, or a piece, of the matrix. Let’s say we want a 3x3 matrix with the values from the upper left corner of the original matrix. We can do it like this:
mat[0:3, 0:3]
array([[ 0, 1, 2],
[ 5, 6, 7],
[10, 11, 12]])
Masking
Masking is useful if we want to select values that meet a condition. Let’s say we only want the values from the matrix which are lower than 5.
mat < 5 # matrix with boolean values
array([[ True, True, True, True, True],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False]])
Get the actual values
To get the actual values and not an array with boolean values we can do it like this:
my_filter = mat < 5
mat[my_filter]
array([0, 1, 2, 3, 4])
Or just
mat[mat<5]
array([0, 1, 2, 3, 4])
Hooray!
With that, you have completed the numpy crash course! Be proud of yourself for doing it! I hope this was useful for you. You can play around with it using different inputs, and I’m looking forward to seeing you in the pandas crash course.
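If you want a starting point for playing around, here is a small sketch that pushes slicing and masking one step further. The combined conditions, the assignment through a mask and the step in the slice are my own additions, not something covered above.

```python
import numpy as np

mat = np.arange(0, 25).reshape(5, 5)

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
between = mat[(mat >= 5) & (mat < 10)]
print(between)  # [5 6 7 8 9]

# A mask can also be used to assign: set every even value in a copy to -1.
masked_copy = mat.copy()
masked_copy[masked_copy % 2 == 0] = -1
print(masked_copy)

# Slices accept a step as well: every second row and every second column.
every_second = mat[::2, ::2]
print(every_second)
```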
Thanks for reading, stay safe, and be happy.
Translated from: https://towardsdatascience.com/numpy-crash-course-6e2906feb175