
Andrew Ng's Machine Learning Course - 02

0. Install Octave

brew tap homebrew/science
brew update && brew upgrade
brew install octave

1. Multivariate Linear Regression

1.1 Multiple Features

  • m: number of training examples
  • x^{(i)}: the i-th training example (a vector)
  • x_j^{(i)}: value of feature j in the i-th training example

h_\theta(x)=\theta_0+\theta_1x_1+...+\theta_nx_n; adding x_0^{(i)}=1 gives

  • h_\theta(x)=[\theta_0, \theta_1, ..., \theta_n][x_0, x_1, ..., x_n]^T=\theta^Tx
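As a sanity check on the vectorized hypothesis, here is a minimal sketch in plain Python (the numbers are made up for illustration):

```python
# Hypothetical parameters and one training example; x[0] = 1 by convention
theta = [1.0, 2.0, 3.0]   # theta_0, theta_1, theta_2
x = [1.0, 4.0, 5.0]       # x_0 = 1, then x_1, x_2

# h_theta(x) = theta^T x, written out as a dot product
h = sum(t * xi for t, xi in zip(theta, x))
print(h)  # 1*1 + 2*4 + 3*5 = 24.0
```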

1.2 Gradient Descent for Multiple Variables

Generalizing gradient descent to multiple variables:

J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2

\theta_j:=\theta_j-\frac{\alpha}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} (updating all \theta_j simultaneously, for j=0,\dots,n)

Here x^{(i)} is a vector, and x_j^{(i)} is its j-th component (a scalar).
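One iteration of this update, sketched in plain Python on a made-up two-example dataset; note that every \theta_j is updated simultaneously from the same predictions:

```python
# Made-up data: two examples, one feature, plus the constant x_0 = 1 column
X = [[1.0, 1.0],
     [1.0, 2.0]]
y = [1.0, 2.0]
theta = [0.0, 0.0]
alpha = 0.1
m = len(X)

# Predictions h_theta(x^(i)) for the current theta
h = [sum(t * xj for t, xj in zip(theta, xi)) for xi in X]

# Simultaneous update: compute every gradient before touching theta
grads = [sum((h[i] - y[i]) * X[i][j] for i in range(m)) / m
         for j in range(len(theta))]
theta = [t - alpha * g for t, g in zip(theta, grads)]
print(theta)  # roughly [0.15, 0.25]
```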

1.3 Gradient Descent in Practice I - Feature Scaling

Some practical tricks for gradient descent.

Gradient descent converges faster when the different features are on similar scales.

Take two features as an example: when their scales differ greatly, the contours of J(\theta) become elongated ellipses and convergence slows down.

So we scale the features to make the contours of J(\theta) closer to circles, e.g. replace x_i with \frac{x_i}{s_i} so the values fall roughly in [-1, 1].

Mean Normalization: replace x_i with \frac{x_i-\mu_i}{s_i}, where s_i is the range (max - min) and \mu_i is the average value.
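A sketch of mean normalization on one hypothetical feature, using the range as s_i:

```python
# Hypothetical feature values; mu is the mean, s the range (max - min)
x = [100.0, 150.0, 200.0]
mu = sum(x) / len(x)   # 150.0
s = max(x) - min(x)    # 100.0
x_scaled = [(xi - mu) / s for xi in x]
print(x_scaled)  # [-0.5, 0.0, 0.5]
```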

1.4 Gradient Descent in Practice II - Learning Rate

Learning Rate: \alpha

If J keeps increasing or fails to converge, use a smaller \alpha.

For linear regression, it can be proved that with a sufficiently small \alpha, J decreases on every iteration.
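That claim can be checked empirically: running the update with a small \alpha on made-up data, J drops on every iteration (a plain-Python sketch):

```python
# Made-up data: two examples, one feature plus the x_0 = 1 column
X = [[1.0, 1.0], [1.0, 2.0]]
y = [1.0, 2.0]
theta = [0.0, 0.0]
alpha = 0.1
m = len(X)

def cost(theta):
    # J(theta) = 1/(2m) * sum of squared errors
    return sum((sum(t * xj for t, xj in zip(theta, xi)) - yi) ** 2
               for xi, yi in zip(X, y)) / (2 * m)

costs = [cost(theta)]
for _ in range(20):
    h = [sum(t * xj for t, xj in zip(theta, xi)) for xi in X]
    grads = [sum((h[i] - y[i]) * X[i][j] for i in range(m)) / m
             for j in range(len(theta))]
    theta = [t - alpha * g for t, g in zip(theta, grads)]
    costs.append(cost(theta))

# With this small alpha, J decreases on every single iteration
print(all(b < a for a, b in zip(costs, costs[1:])))
```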

1.5 Features and Polynomial Regression

Polynomial regression can be fit with exactly the same machinery as linear regression.

For example, h_\theta(x)=\theta_0+\theta_1x^2+\theta_2x^3

Treat x^2 and x^3 as new features x_2 and x_3; feature scaling can then be applied as well (the powers have very different ranges).
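A sketch of that substitution in plain Python: each input x yields a feature row [1, x^2, x^3], after which ordinary linear regression applies (the inputs are made up):

```python
# Hypothetical raw inputs
xs = [1.0, 2.0, 3.0]

# x^2 and x^3 become new features x_2 and x_3; x_0 = 1 as usual
rows = [[1.0, x ** 2, x ** 3] for x in xs]
print(rows[1])  # [1.0, 4.0, 8.0]
```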

A more advanced approach is to search for candidate models automatically instead of choosing features by eye.

2. Computing Parameters Analytically

2.1 Normal Equation

  1. Build an m*(n+1) matrix X containing all the x values; it has n+1 columns because x_0 is included
  2. Build an m-dimensional vector y holding all the y values
  3. \theta=(X^TX)^{-1}X^Ty, where \theta is an (n+1)-dimensional vector
% pinv: pseudo-inverse function
pinv(X'*X)*X'*y

No feature scaling is needed with the normal equation.

When n is large, gradient descent still works, while the normal equation becomes slow: matrix inversion costs O(n^3).

Once n reaches roughly 10000 or more, it may be time to switch to gradient descent.

For some more complex algorithms the normal equation does not apply, so gradient descent is used more widely.
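The formula \theta=(X^TX)^{-1}X^Ty can be worked through on a tiny made-up dataset where X^TX is only 2x2, so the inverse fits in one line (a plain-Python sketch):

```python
# Made-up data: y = x exactly, so theta should come out as [0, 1]
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]

# A = X^T X (2x2), b = X^T y
A = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(2)]
     for r in range(2)]
b = [sum(X[i][r] * y[i] for i in range(len(X))) for r in range(2)]

# Invert the 2x2 matrix explicitly: inv([[a,b],[c,d]]) = 1/det * [[d,-b],[-c,a]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
theta = [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
         (A[0][0] * b[1] - A[1][0] * b[0]) / det]
print(theta)  # [0.0, 1.0]
```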

2.2 Normal Equation Noninvertibility

Non-invertibility.

Some matrices are not invertible; what if X^TX in the normal equation happens to be one of them?

  • pinv: pseudo-inverse; returns a result even when the matrix is not invertible
  • inv: inverse

A square matrix A is invertible iff det(A)\ne0. When X^TX in the normal equation is not invertible, it is usually for one of these reasons:

  1. Redundant features, i.e. two features that are closely related (e.g. linearly dependent)
  2. Too many features (n >> m)
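The first cause is easy to see numerically: with a duplicated (scaled) feature, det(X^TX) is zero, so inv fails while pinv still returns an answer. A plain-Python sketch of the determinant check on made-up columns:

```python
# Made-up design matrix: the second column is exactly twice the first
X = [[1.0, 2.0],
     [2.0, 4.0],
     [3.0, 6.0]]

# A = X^T X (2x2); linearly dependent columns make it singular
A = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(2)]
     for r in range(2)]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(det)  # 0.0 -> X^T X is singular
```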

3. Submitting Programming Assignments

submit()

4. Octave Tutorial

4.1 Basic Operations

% not equals
1 ~= 2
1 && 0
1 || 0
xor(1, 0)
A = [1 2; 3 4; 5 6]
% randn: Gaussian distribution

4.2 Moving Data Around

% clear screen: clc (or Ctrl+L)
% load text file
load('filename')
% list all variables
who
% list detail of variables
whos
% save file
save filename variable
% clear all variables
clear
% x(row, column)
x(1, 2)
x(:, :)
% add a column to A
A = [A [1; 2; 3]]
% A(:) put all elements of A into a single vector
% C = [A B] if A and B are matrices with the same number of rows

4.3 Computing on Data

% matrix-matrix multiplication
A * B
% element-wise multiplication
A .* B
% square each element of A
A .^ 2
% 1 divided by each element of vector V
1 ./ V
% 1 divided by each element of matrix A
1 ./ A
log(V)
exp(V)
abs(V)
% Transpose of matrix A
A'
A < 3
% maximum value of each column of matrix A
max(A)
% returns magic square
magic(3)

4.4 Plotting Data

t = [0:0.01:0.98];
y1 = sin(8*pi*t);
y2 = cos(8*pi*t);
plot(t, y1);
hold on;
plot(t, y2, 'r');
xlabel('time');
ylabel('value');
legend('sin', 'cos');
title('my plot');
print -dpng 'myplot.png'

figure(1); plot(t, y1);
figure(2); plot(t, y2);
subplot(1, 2, 1); % divides the plot into a 1x2 grid, access the first element
plot(t, y1);
subplot(1, 2, 2); plot(t, y2);
axis([0.5 1 -1 1]);
clf; % clear figure
% plot a color image for a matrix
imagesc(A), colorbar, colormap gray;

Commands separated by ',' on one line are executed in sequence, each displaying its result.

4.5 Control Statements: for, while, if

% for statement
for i = 1:10,
disp(i)
end;
% while statement
i = 1;
while i <= 5,
disp(i);
i = i+1;
end;
% if statement
if i == 1,
    disp('The value is one');
elseif i == 2,
    disp('The value is two');
else
    disp('The value is not one or two');
end;
% define a function: create squareThisNumber.m containing
function y = squareThisNumber(x)
y = x^2;
% add a directory to the function search path with addpath(path)
% a function can also return multiple values
function [y1, y2] = squareAndCubeThisNumber(x)
y1 = x^2;
y2 = x^3;

4.6 Vectorization

J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2

J = sum((X * theta - y) .^ 2) / (2*m);

\theta_j:=\theta_j-\frac{\alpha}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}

theta = theta - (alpha / m) * X' * (X * theta - y);