
Andrew Ng's Machine Learning Course - 02

0. Install Octave

brew tap homebrew/science
brew update && brew upgrade
brew install octave

1. Multivariate Linear Regression

1.1 Multiple Features

  • m: number of training examples
  • x^{(i)}: the i-th training example (a vector)
  • x_j^{(i)}: value of feature j in the i-th training example

h_\theta(x)=\theta_0+\theta_1x_1+...+\theta_nx_n; adding x_0^{(i)}=1 gives

  • h_\theta(x)=[\theta_0, \theta_1, ..., \theta_n][x_0, x_1, ..., x_n]^T=\theta^Tx
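As a sanity check on the vectorized hypothesis, here is a minimal sketch in plain Python (the numbers are made up for illustration):

```python
# Hypothetical parameters and one training example; x[0] = 1 by convention
theta = [1.0, 2.0, 3.0]   # theta_0, theta_1, theta_2
x = [1.0, 4.0, 5.0]       # x_0 = 1, then x_1, x_2

# h_theta(x) = theta^T x, written out as a dot product
h = sum(t * xi for t, xi in zip(theta, x))
print(h)  # 1*1 + 2*4 + 3*5 = 24.0
```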

1.2 Gradient Descent for Multiple Variables

Generalizing gradient descent to multiple variables:

J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2

\theta_j:=\theta_j-\frac{\alpha}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} (updating all \theta_j simultaneously, for j=0,\dots,n)

Here x^{(i)} is a vector, and x_j^{(i)} is its j-th component (a scalar).
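One iteration of this update, sketched in plain Python on a made-up two-example dataset; note that every \theta_j is updated simultaneously from the same predictions:

```python
# Made-up data: two examples, one feature, plus the constant x_0 = 1 column
X = [[1.0, 1.0],
     [1.0, 2.0]]
y = [1.0, 2.0]
theta = [0.0, 0.0]
alpha = 0.1
m = len(X)

# Predictions h_theta(x^(i)) for the current theta
h = [sum(t * xj for t, xj in zip(theta, xi)) for xi in X]

# Simultaneous update: compute every gradient before touching theta
grads = [sum((h[i] - y[i]) * X[i][j] for i in range(m)) / m
         for j in range(len(theta))]
theta = [t - alpha * g for t, g in zip(theta, grads)]
print(theta)  # roughly [0.15, 0.25]
```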

1.3 Gradient Descent in Practice I - Feature Scaling

Some practical tricks for gradient descent.

Gradient descent converges faster when the different features are on similar scales.

Take two features as an example: when their scales differ greatly, the contours of J(\theta) become elongated ellipses and convergence slows down.

So we scale the features to make the contours of J(\theta) closer to circles, e.g. replace x_i with \frac{x_i}{s_i} so the values fall roughly in [-1, 1].

Mean Normalization: replace x_i with \frac{x_i-\mu_i}{s_i}, where s_i is the range (max - min) and \mu_i is the average value.
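A sketch of mean normalization on one hypothetical feature, using the range as s_i:

```python
# Hypothetical feature values; mu is the mean, s the range (max - min)
x = [100.0, 150.0, 200.0]
mu = sum(x) / len(x)   # 150.0
s = max(x) - min(x)    # 100.0
x_scaled = [(xi - mu) / s for xi in x]
print(x_scaled)  # [-0.5, 0.0, 0.5]
```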

1.4 Gradient Descent in Practice II - Learning Rate

Learning Rate: \alpha

If J keeps increasing or fails to converge, use a smaller \alpha.

For linear regression, it can be proved that with a sufficiently small \alpha, J decreases on every iteration.
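That claim can be checked empirically: running the update with a small \alpha on made-up data, J drops on every iteration (a plain-Python sketch):

```python
# Made-up data: two examples, one feature plus the x_0 = 1 column
X = [[1.0, 1.0], [1.0, 2.0]]
y = [1.0, 2.0]
theta = [0.0, 0.0]
alpha = 0.1
m = len(X)

def cost(theta):
    # J(theta) = 1/(2m) * sum of squared errors
    return sum((sum(t * xj for t, xj in zip(theta, xi)) - yi) ** 2
               for xi, yi in zip(X, y)) / (2 * m)

costs = [cost(theta)]
for _ in range(20):
    h = [sum(t * xj for t, xj in zip(theta, xi)) for xi in X]
    grads = [sum((h[i] - y[i]) * X[i][j] for i in range(m)) / m
             for j in range(len(theta))]
    theta = [t - alpha * g for t, g in zip(theta, grads)]
    costs.append(cost(theta))

# With this small alpha, J decreases on every single iteration
print(all(b < a for a, b in zip(costs, costs[1:])))
```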

1.5 Features and Polynomial Regression

Polynomial regression can be fit with exactly the same machinery as linear regression.

For example, h_\theta(x)=\theta_0+\theta_1x^2+\theta_2x^3

Treat x^2 and x^3 as new features x_2 and x_3; feature scaling can then be applied as well (the powers have very different ranges).
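A sketch of that substitution in plain Python: each input x yields a feature row [1, x^2, x^3], after which ordinary linear regression applies (the inputs are made up):

```python
# Hypothetical raw inputs
xs = [1.0, 2.0, 3.0]

# x^2 and x^3 become new features x_2 and x_3; x_0 = 1 as usual
rows = [[1.0, x ** 2, x ** 3] for x in xs]
print(rows[1])  # [1.0, 4.0, 8.0]
```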

A more advanced approach is to search for candidate models automatically instead of choosing features by eye.

2. Computing Parameters Analytically

2.1 Normal Equation

  1. Build an m*(n+1) matrix X containing all the x values; it has n+1 columns because x_0 is included
  2. Build an m-dimensional vector y holding all the y values
  3. \theta=(X^TX)^{-1}X^Ty, where \theta is an (n+1)-dimensional vector
% pinv: pseudo-inverse function
pinv(X'*X)*X'*y

No feature scaling is needed with the normal equation.

When n is large, gradient descent still works, while the normal equation becomes slow: matrix inversion costs O(n^3).

Once n reaches roughly 10000 or more, it may be time to switch to gradient descent.

For some more complex algorithms the normal equation does not apply, so gradient descent is used more widely.
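The formula \theta=(X^TX)^{-1}X^Ty can be worked through on a tiny made-up dataset where X^TX is only 2x2, so the inverse fits in one line (a plain-Python sketch):

```python
# Made-up data: y = x exactly, so theta should come out as [0, 1]
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]

# A = X^T X (2x2), b = X^T y
A = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(2)]
     for r in range(2)]
b = [sum(X[i][r] * y[i] for i in range(len(X))) for r in range(2)]

# Invert the 2x2 matrix explicitly: inv([[a,b],[c,d]]) = 1/det * [[d,-b],[-c,a]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
theta = [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
         (A[0][0] * b[1] - A[1][0] * b[0]) / det]
print(theta)  # [0.0, 1.0]
```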

2.2 Normal Equation Noninvertibility

Non-invertibility.

Some matrices are not invertible; what if X^TX in the normal equation happens to be one of them?

  • pinv: pseudo-inverse; returns a result even when the matrix is not invertible
  • inv: inverse

A square matrix A is invertible iff det(A)\ne0. When X^TX in the normal equation is not invertible, it is usually for one of these reasons:

  1. Redundant features, i.e. two features that are closely related (e.g. linearly dependent)
  2. Too many features (n >> m)
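The first cause is easy to see numerically: with a duplicated (scaled) feature, det(X^TX) is zero, so inv fails while pinv still returns an answer. A plain-Python sketch of the determinant check on made-up columns:

```python
# Made-up design matrix: the second column is exactly twice the first
X = [[1.0, 2.0],
     [2.0, 4.0],
     [3.0, 6.0]]

# A = X^T X (2x2); linearly dependent columns make it singular
A = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(2)]
     for r in range(2)]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(det)  # 0.0 -> X^T X is singular
```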

3. Submitting Programming Assignments

submit()

4. Octave Tutorial

4.1 Basic Operations

% not equals
1 ~= 2
1 && 0
1 || 0
xor(1, 0)
A = [1 2; 3 4; 5 6]
% randn: Gaussian distribution

4.2 Moving Data Around

% clear screen: clc (or Ctrl+L)
% load text file
load('filename')
% list all variables
who
% list detail of variables
whos
% save file
save filename variable
% clear all variables
clear
% x(row, column)
x(1, 2)
x(:, :)
% add a column to A
A = [A [1; 2; 3]]
% A(:) put all elements of A into a single vector
% C = [A B] if A and B are matrices with the same number of rows

4.3 Computing on Data

% matrix-matrix multiplication
A * B
% element-wise multiplication
A .* B
% square each element of A
A .^ 2
% 1 divided by each element of vector V
1 ./ V
% 1 divided by each element of matrix A
1 ./ A
log(V)
exp(V)
abs(V)
% Transpose of matrix A
A'
A < 3
% maximum value of each column of matrix A
max(A)
% returns magic square
magic(3)

4.4 Plotting Data

t = [0:0.01:0.98];
y1 = sin(8*pi*t);
y2 = cos(8*pi*t);
plot(t, y1);
hold on;
plot(t, y2, 'r');
xlabel('time');
ylabel('value');
legend('sin', 'cos');
title('my plot');
print -dpng 'myplot.png'

figure(1); plot(t, y1);
figure(2); plot(t, y2);
subplot(1, 2, 1); % divides the plot into a 1x2 grid, access the first element
plot(t, y1);
subplot(1, 2, 2); plot(t, y2);
axis([0.5 1 -1 1]);
clf; % clear figure
% plot a color image for a matrix
imagesc(A), colorbar, colormap gray;

Commands separated by ',' on one line are executed in sequence, each displaying its result.

4.5 Control Statements: for, while, if

% for statement
for i = 1:10,
disp(i)
end;
% while statement
i = 1;
while i <= 5,
disp(i);
i = i+1;
end;
% if statement
if i == 1,
    disp('The value is one');
elseif i == 2,
    disp('The value is two');
else
    disp('The value is not one or two');
end;
% define a function: create squareThisNumber.m containing
function y = squareThisNumber(x)
y = x^2;
% add a directory to the function search path with addpath(path)
% a function can also return multiple values
function [y1, y2] = squareAndCubeThisNumber(x)
y1 = x^2;
y2 = x^3;

4.6 Vectorization

J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2

J = sum((X * theta - y) .^ 2) / (2*m);

\theta_j:=\theta_j-\frac{\alpha}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}

theta = theta - (alpha / m) * X' * (X * theta - y);