MIT | Matrix Methods in Data Analysis, Signal Processing, and Machine Learning — Notes Series, Lecture 1: The Column Space of A Contains All Vectors Ax

This series is my set of study notes for MIT Professor Gilbert Strang's course "Matrix Methods in Data Analysis, Signal Processing, and Machine Learning".

  • Gilbert Strang & Sarah Hansen | Spring 2018
  • 18.065: Matrix Methods in Data Analysis, Signal Processing, and Machine Learning
  • Video link
  • The Markdown source files are not yet open-sourced; contact 981591477@qq.com if you need them
  • These notes inevitably contain mistakes; corrections via the email above are welcome

Lecture 0: Course Introduction
Lecture 1 The Column Space of A Contains All Vectors Ax
Lecture 2 Multiplying and Factoring Matrices
Lecture 3 Orthonormal Columns in Q Give Q'Q = I
Lecture 4 Eigenvalues and Eigenvectors
Lecture 5 Positive Definite and Semidefinite Matrices
Lecture 6 Singular Value Decomposition (SVD)
Lecture 7 Eckart-Young: The Closest Rank k Matrix to A
Lecture 8 Norms of Vectors and Matrices
Lecture 9 Four Ways to Solve Least Squares Problems
Lecture 10 Survey of Difficulties with Ax = b
Lecture 11 Minimizing ||x|| Subject to Ax = b
Lecture 12 Computing Eigenvalues and Singular Values
Lecture 13 Randomized Matrix Multiplication
Lecture 14 Low Rank Changes in A and Its Inverse
Lecture 15 Matrices A(t) Depending on t, Derivative = dA/dt
Lecture 16 Derivatives of Inverse and Singular Values
Lecture 17 Rapidly Decreasing Singular Values
Lecture 18 Counting Parameters in SVD, LU, QR, Saddle Points
Lecture 19 Saddle Points Continued, Maxmin Principle
Lecture 20 Definitions and Inequalities
Lecture 21 Minimizing a Function Step by Step
Lecture 22 Gradient Descent: Downhill to a Minimum
Lecture 23 Accelerating Gradient Descent (Use Momentum)
Lecture 24 Linear Programming and Two-Person Games
Lecture 25 Stochastic Gradient Descent
Lecture 26 Structure of Neural Nets for Deep Learning
Lecture 27 Backpropagation: Find Partial Derivatives
Lecture 28 Computing in Class [No video available]
Lecture 29 Computing in Class (cont.) [No video available]
Lecture 30 Completing a Rank-One Matrix, Circulants!
Lecture 31 Eigenvectors of Circulant Matrices: Fourier Matrix
Lecture 32 ImageNet is a Convolutional Neural Network (CNN), The Convolution Rule
Lecture 33 Neural Nets and the Learning Function
Lecture 34 Distance Matrices, Procrustes Problem
Lecture 35 Finding Clusters in Graphs
Lecture 36 Alan Edelman and Julia Language



Lecture 1: The Column Space of A Contains All Vectors Ax (The Column Space of a Matrix)

stitch — to sew / patch together

  • stitching images together

bashful — shy, timid

  • We are getting videotaped. So, if anybody is bashful, sit in the far back.

1.1 Ax: A Matrix Multiplies a Vector

1.1.1 x is a specific vector (the case of one given vector x)

  • What is the deal in linear algebra?
    • Matrix $\times$ vector
    • and $\Rightarrow$ Matrix $\times$ Matrix
    • How should we correctly understand matrix–vector multiplication?
  • Example:
      • $A = \begin{bmatrix} 2 & 1 & 3 \\ 3 & 1 & 4 \\ 5 & 7 & 12 \end{bmatrix}$
    • Multiply A by a vector:
      • $Ax = \begin{bmatrix} 2 & 1 & 3 \\ 3 & 1 & 4 \\ 5 & 7 & 12 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = ?$
    • How to look at the answer?
      • Two viewpoints: by rows or by columns
        • think of the rows of A, or think of the columns of A
      • Viewpoint 1: by rows (think of the rows)
        • this is also the standard way to multiply
        • the dot product!
          • row $\cdot$ x
        • But this is the low-level way!
        • The good way is to see it vector-wise $\downarrow$
      • Viewpoint 2: by columns (vector-wise)
        • $Ax = \begin{bmatrix} 2 & 1 & 3 \\ 3 & 1 & 4 \\ 5 & 7 & 12 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_1\begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix} + x_2 \begin{bmatrix} 1 \\ 1 \\ 7 \end{bmatrix} + x_3 \begin{bmatrix} 3 \\ 4 \\ 12 \end{bmatrix}$
        • It is a combination of vectors!
        • and of course it produces a vector!
        • One of the goals of this course:
          • thinking of a matrix as a whole thing
            • not just a bunch of 9 ($m\times n$) numbers
          • e.g., a matrix times a vector gives another vector
          • Whenever Ax appears:
            • Ax is a combination of the columns of A!
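The two viewpoints give the same answer, of course; a minimal numpy sketch with the 3×3 example matrix from the notes (the choice x = (1, 1, 1) is just for illustration):

```python
import numpy as np

A = np.array([[2, 1, 3],
              [3, 1, 4],
              [5, 7, 12]])
x = np.array([1, 1, 1])

# Viewpoint 1: by rows -- each entry of Ax is a dot product (row of A) . x
by_rows = np.array([A[i, :] @ x for i in range(3)])

# Viewpoint 2: by columns -- Ax is a combination of the columns of A
by_cols = x[0] * A[:, 0] + x[1] * A[:, 1] + x[2] * A[:, 2]

print(by_rows)   # [ 6  8 24]
print(by_cols)   # [ 6  8 24]
```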

1.1.2 x is the whole $R^3$ (the case where x ranges over all vectors in $R^3$)

  • All Ax
    • $\Rightarrow$ a big bunch of vectors, which together form a space
    • called the Column Space of A: C(A)
      • the column space of the matrix A, written C(A)
      • Column Space C(A) = Ax, for all x in $R^3$
  • What does this column space C(A) look like?
    • It depends on the matrix!
      • sometimes the column space C(A) would be the whole of $R^3$
      • sometimes C(A) is a smaller set in $R^3$
    • For this matrix
      • $A = \begin{bmatrix} 2 & 1 & 3 \\ 3 & 1 & 4 \\ 5 & 7 & 12 \end{bmatrix}$
      • we just get a plane! (instead of the whole $R^3$)
    • When would the matrix's column space be only a line?
      • e.g.,
        • $A_1 = \begin{bmatrix} 1 & 3 & 8 \\ 1 & 3 & 8 \\ 1 & 3 & 8 \end{bmatrix}$
        • $A_1 x = \begin{bmatrix} 1 & 3 & 8 \\ 1 & 3 & 8 \\ 1 & 3 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_1 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + x_2 \begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix} + x_3 \begin{bmatrix} 8 \\ 8 \\ 8 \end{bmatrix}$
      • therefore, $C(A_1)$ = a line
      • $\Rightarrow$ rank($A_1$) = 1
    • When would C(A) be a plane?
      • e.g.,
        • $A_2 = \begin{bmatrix} 2 & 1 & 3 \\ 3 & 1 & 4 \\ 5 & 7 & 12 \end{bmatrix}$
      • the third column is the sum of the first two columns
      • therefore $C(A_2)$ is a plane
      • $\Rightarrow$ rank($A_2$) = 2
        • the third column is dependent

  • A special way to write those rank = 1 matrices
    • $A_1 = \begin{bmatrix} 1 & 3 & 8 \\ 1 & 3 & 8 \\ 1 & 3 & 8 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 3 & 8 \end{bmatrix} = u v^T$
    • a rank-1 matrix can be written as a column times a row!
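The column-times-row form is easy to check numerically; a quick sketch using numpy's `np.outer` (outer product of u and v):

```python
import numpy as np

u = np.array([1, 1, 1])
v = np.array([1, 3, 8])

# u v^T: every column of A1 is a multiple of u, so rank(A1) = 1
A1 = np.outer(u, v)
print(A1)
# [[1 3 8]
#  [1 3 8]
#  [1 3 8]]
print(np.linalg.matrix_rank(A1))  # 1
```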

  • Back to the rank(A) = 2 case: How many independent columns have I got here?

    • Ans: 2, the third = first + second
    • Those two columns would be a basis for the column space
      • Everything in the column space is a combination of the basis
    • The number (#) of independent columns is the rank
  • How to find a basis for C(A) in the most natural way?

    • Look again at the matrix $A_2$
      • $A_2 = \begin{bmatrix} 2 & 1 & 3 \\ 3 & 1 & 4 \\ 5 & 7 & 12 \end{bmatrix}$
    • What is a basis?
      • 1 A basis consists of independent columns
        • therefore, all three columns of $A_2$ together would not be a basis
      • 2 The combinations of a basis must fill the space C(A)
    • Now find the basis C
      • Initialize: $C = [\ ]$
      • Look at the first column; it is not 0, so it goes in:
        • $C = \begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix}$
      • Look at the second column
        • [1 1 7] points in a different direction from [2 3 5]
          • the two are independent
        • so add the second column to $C$ as well
        • $C = \begin{bmatrix} 2 & 1 \\ 3 & 1 \\ 5 & 7 \end{bmatrix}$
      • Look at the third column
        • although the third column is independent of the first column alone and of the second column alone,
        • the third column is a combination of the first and second columns!!
      • Done; the final $C = \begin{bmatrix} 2 & 1 \\ 3 & 1 \\ 5 & 7 \end{bmatrix}$
    • And we can get the column rank: rank(A) = 2
  • How do we reconstruct A from C (a basis of C(A))?

    • Like $A = C \times R$
    • Known: every column of A is a combination of the columns of C
    • Therefore we can obtain R:
      • $A_2 = \begin{bmatrix} 2 & 1 & 3 \\ 3 & 1 & 4 \\ 5 & 7 & 12 \end{bmatrix} = CR = \begin{bmatrix} 2 & 1 \\ 3 & 1 \\ 5 & 7 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}$
    • In fact this is exactly the full-rank factorization of the matrix:

(Figures: illustration of the rank factorization A = CR; images not included.)
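The factorization A = CR above can be verified directly; a minimal numpy check, with C and R copied from the notes:

```python
import numpy as np

A2 = np.array([[2, 1, 3],
               [3, 1, 4],
               [5, 7, 12]])
C = np.array([[2, 1],
              [3, 1],
              [5, 7]])       # independent columns of A2: a basis for C(A2)
R = np.array([[1, 0, 1],
              [0, 1, 1]])    # coefficients expressing each column of A2 in the basis C

print((C @ R == A2).all())          # True: A = CR holds exactly
print(np.linalg.matrix_rank(A2))    # 2
```

Note that the third column of R, (1, 1), records precisely "third column = first + second".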

1.1.3 Matrix rank, row rank, and column rank

  • The first great theorem in Linear Algebra:

    • Column Rank = Row Rank = Matrix Rank
      • If the columns are dependent, then the rows are dependent too
    • How to prove it?
      • What is the row rank?
        • The row rank is the dimension of the row space
          • row space: all combinations of the rows!
          • the row space of $A$ is the column space of $A^T$: $C(A^T)$
        • For an $n\times n$ matrix $A$, the row space and the column space of $A$ are both subspaces of $R^n$ (possibly equal to $R^n$)
      • Next: it suffices to show that the two rows of $R = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}$ form a basis for the row space
        • then the number of rows of R is the number of independent vectors in that basis, which is the row rank!!
          • Known: the columns of $C$ form a basis for the column space
          • so row rank = number of rows of R = number of columns of C = column rank
        • Proof method: show that the two rows of R are independent, and that their combinations produce all the rows of A
          • Clearly, the two rows of R are independent
          • and their combinations do give all the rows of A:
            • [2 1 3] = 2*[1 0 1] + 1*[0 1 1]
            • [3 1 4] = 3*[1 0 1] + 1*[0 1 1]
            • [5 7 12] = 5*[1 0 1] + 7*[0 1 1]
            • and the coefficients are exactly the entries of C
            • In fact, the existence of the equation A = CR already shows directly that every row of A is a linear combination of the rows of R
  • Another name for the Column Space of a matrix A:

    • Range Space
  • How to sample a Matrix:

    • Sample: for a very large matrix, we want to pick out a few rows/columns that represent the whole matrix
    • How to do it?
    • Ax: x = rand(n,1)
      • x is a random n-dimensional vector
      • Ax is in the column space
    • If you want a random vector in the column space
      • it is not recommended to just pick a random column of A
        • that corresponds to x = [0, 0, …, 0, 1, 0, …, 0]
      • instead, use a random vector x to get a combination of the columns of A
  • Question: Is ABCx in the column space of A?

    • Ans: Yes!
    • BCx is a vector – ABCx is just A times something

1.2 Matrix-Matrix Multiplication

  • First: $A = CR$
    • C: literally columns taken directly from A
    • R: but R is not made of rows of A
      • R is called the row reduced echelon form of the matrix
    • another matrix factorization
      • $A = C U \hat{R}$
      • C: the same C as before, literally columns of A
      • $\hat{R}$: also literally rows of A
      • U: the matrix that has to sit between C and $\hat{R}$ to make the equation hold
    • to be covered later in the course
  • Consider $A \times B$
    • Low-level (easy / beginner's) way:
      • the rows of A dotted with the columns of B; each inner product gives one entry of the result
      • dot products: $r \cdot c$
      • Time complexity ($m\times n$ matrix A times $n\times p$ matrix B)
        • each row-times-column takes n multiplications
        • there are m*p such row-times-column products
        • so the total time complexity is O(mnp)
    • Deeper way:
      • columns of A $\times$ rows of B
        • each such (outer) product is one piece of the answer
      • then add the pieces up
      • $AB = \sum_{k=1}^{n}$ (the kth column of A) $\times$ (the kth row of B)
      • Time complexity ($m\times n$ matrix A times $n\times p$ matrix B):
        • each column-times-row takes m*p multiplications
        • there are n such column-times-row products
        • so the time complexity is the same! only the order of operations differs
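The two orders can be compared directly; a numpy sketch of the "sum of rank-1 pieces" view against the ordinary product (the sizes m, n, p are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 4, 3, 5
A = rng.random((m, n))
B = rng.random((n, p))

# Deeper way: AB as the sum of n rank-1 pieces,
# (kth column of A) times (kth row of B)
AB = sum(np.outer(A[:, k], B[k, :]) for k in range(n))

print(np.allclose(AB, A @ B))   # True: same product, different order of operations
```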

  • Something about the course (not important for us)
    • Compared with the basic linear algebra course, what is more special here is the online homework
      • python
    • Acknowledgement
    • Julia

Copyright notice: This is an original article by qazwsxrx, licensed under CC 4.0 BY-SA. Please include a link to the original source and this notice when reposting.