Andrew Ng Open Course - 02
0. Install Octave
brew tap homebrew/science
brew update && brew upgrade
brew install octave
1. Multivariate Linear Regression
1.1 Multiple Features
- m: number of training examples
- n: number of features
- x^{(i)}: the i-th training example (a vector)
- x_j^{(i)}: value of feature j in the i-th training example
h_\theta(x)=\theta_0+\theta_1x_1+...+\theta_nx_n; with x_0^{(i)}=1 added, this becomes
- h_\theta(x)=[\theta_0, \theta_1, ..., \theta_n][x_0, x_1, ..., x_n]^T=\theta^Tx
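As a quick Octave sketch (the numbers are made up for illustration; X is an m×(n+1) design matrix whose first column is all ones), the hypothesis for every training example at once is just a matrix-vector product:
% hypothetical 2-feature example: each row of X is one training example
X = [1 2104 5; 1 1416 3; 1 1534 3];  % m = 3 examples, first column is x_0 = 1
theta = [80; 0.1; 10];               % (n+1)-dimensional parameter vector
h = X * theta                        % m-dimensional vector of predictions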
1.2 Gradient Descent for Multiple Variables
Gradient descent generalized to multiple variables:
\theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
Here x^{(i)} is a vector and x_j^{(i)} is its j-th component; the update is applied simultaneously for every j = 0, ..., n.
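A minimal Octave sketch of the loop, assuming X (with the x_0 = 1 column), y, and m are already defined; alpha and num_iters are illustrative values, not from the course:
alpha = 0.01; num_iters = 400;
theta = zeros(size(X, 2), 1);   % initialize all theta_j to zero
for iter = 1:num_iters,
  % simultaneous update of every theta_j (vectorized form, see 4.6)
  theta = theta - (alpha / m) * X' * (X * theta - y);
end;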
1.3 Gradient Descent in Practice I - Feature Scaling
Some practical tips for gradient descent.
Gradient descent converges faster when the features are on similar scales.
Take two features as an example: when their scales differ too much, the contours of J(\theta) become elongated ellipses and convergence slows down.
Feature scaling makes the contours of J(\theta) closer to circles, e.g. dividing by the scale, \frac{x_i}{s_i}, so that each feature falls roughly in [-1, 1].
Mean Normalization: replace x_i with \frac{x_i-\mu_i}{s_i}, where s_i is the range (max - min) and \mu_i is the average value.
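A minimal Octave sketch of mean normalization applied column-wise to a design matrix X (variable names are mine, not from the course):
mu = mean(X);             % 1 x n row vector of column means
s  = max(X) - min(X);     % 1 x n row vector of ranges
X_norm = (X - mu) ./ s;   % broadcasting normalizes each column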
1.4 Gradient Descent in Practice II - Learning Rate
Learning Rate: \alpha
If J keeps rising or fails to converge, use a smaller \alpha.
For linear regression it can be proved that, when \alpha is small enough, J decreases on every iteration.
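A common way to check this in practice (a sketch continuing the loop above, recording the cost into a vector J_history of my own naming) is to plot J against the iteration number and verify it decreases monotonically:
J_history = zeros(num_iters, 1);
for iter = 1:num_iters,
  theta = theta - (alpha / m) * X' * (X * theta - y);
  J_history(iter) = sum((X * theta - y) .^ 2) / (2*m);  % cost after this update
end;
plot(1:num_iters, J_history);
xlabel('iteration'); ylabel('J(theta)');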
1.5 Features and Polynomial Regression
Polynomial regression can be fit with the same machinery as linear regression.
For example, h_\theta(x)=\theta_0+\theta_1x^2+\theta_2x^3.
Treat x^2 and x^3 as additional features x_2 and x_3; feature scaling can then be applied as well (and matters here, since x^2 and x^3 have very different ranges).
A more advanced approach is to search for candidate models automatically instead of choosing features by inspection.
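A minimal Octave sketch of building such polynomial features from a single raw feature x (an m×1 vector; the variable name X_poly is mine):
% columns: x_0 = 1, x_2 = x^2, x_3 = x^3
X_poly = [ones(size(x)), x .^ 2, x .^ 3];
% scale the new columns before running gradient descent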
2. Computing Parameters Analytically
2.1 Normal Equation
- Build an m×(n+1) matrix X containing all the x values; the extra column comes from x_0, hence n+1
- An m-dimensional vector y holding all the y values
- \theta=(X^TX)^{-1}X^Ty, where \theta is an (n+1)-dimensional vector
% pinv: pseudo-inverse
pinv(X'*X)*X'*y
Feature scaling is not needed.
When n is very large, gradient descent still works, while the Normal Equation becomes slow: matrix inversion costs O(n^3).
Once n reaches about 10,000 or more, start considering gradient descent.
Some more sophisticated algorithms have no Normal Equation counterpart, so gradient descent is used more widely.
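A self-contained Octave sketch of the closed form on a tiny made-up dataset (the points lie exactly on y = x, so \theta should come out as [0; 1]):
X = [1 1; 1 2; 1 3];            % m = 3 examples, one feature plus x_0 = 1
y = [1; 2; 3];
theta = pinv(X' * X) * X' * y   % recovers theta = [0; 1]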
2.2 Normal Equation Noninvertibility
Noninvertibility: some matrices are not invertible, so for the Normal Equation, what if X^TX is singular?
- pinv: pseudo-inverse; returns a result even when the matrix is singular
- inv: the ordinary inverse
A square matrix A is invertible iff det(A)\ne0. When X^TX is singular in the Normal Equation, it is usually for one of the following reasons (see the sketch after this list):
- Redundant features, i.e. two features are closely related, e.g. linearly dependent
- Too many features (n >> m)
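A quick Octave illustration with toy numbers of my own: duplicating a feature makes X^TX singular, yet pinv still returns a usable \theta:
x1 = [1; 2; 3];
X = [ones(3, 1), x1, 2 * x1];   % third column is redundant (2 * x1)
y = [2; 4; 6];
rank(X' * X)                    % 2 < 3, so X' * X is singular
theta = pinv(X' * X) * X' * y   % pinv still produces a solution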
3. Submitting Programming Assignments
% run from within the exercise directory
submit()
4. Octave Tutorial
4.1 Basic Operations
% not equal
1 ~= 2
1 && 0
1 || 0
xor(1, 0)
A = [1 2; 3 4; 5 6]
% randn: samples from a Gaussian (normal) distribution
w = randn(1, 3)
4.2 Moving Data Around
% clear the screen: clc (Ctrl+L also works at the prompt)
% load text file
load('filename')
% list all variables
who
% list detail of variables
whos
% save file
save filename variable
% clear all variables
clear
% x(row, column)
x(1, 2)
x(:, :)
% add a column to A
A = [A [1; 2; 3]]
% A(:) puts all elements of A into a single column vector
% C = [A B] if A and B have the same number of rows (horizontal concatenation)
4.3 Computing on Data
% matrix-matrix multiplication
A * B
% element-wise multiplication
A .* B
% square each element of A
A .^ 2
% 1 divided by each element of V
1 ./ V
% 1 divided by each element of A
1 ./ A
log(V)
exp(V)
abs(V)
% Transpose of matrix A
A'
% element-wise comparison; returns a matrix of 0s and 1s
A < 3
% maximum value of each column of matrix A
max(A)
% returns a 3x3 magic square
magic(3)
4.4 Plotting Data
t = [0:0.01:0.98];
y1 = sin(8*pi*t);
y2 = cos(8*pi*t);
plot(t, y1);
hold on;
plot(t, y2, 'r');
xlabel('time');
ylabel('value');
legend('sin', 'cos');
title('my plot');
print -dpng 'myplot.png'
figure(1); plot(t, y1);
figure(2); plot(t, y2);
subplot(1, 2, 1); % divides the figure into a 1x2 grid, selects the first cell
plot(t, y1);
subplot(1, 2, 2); plot(t, y2);
axis([0.5 1 -1 1]);
clf; % clear figure
% display a matrix as a color-coded image
imagesc(A), colorbar, colormap gray;
Commands joined with ',' on one line are executed in sequence.
4.5 Control Statements: for, while, if
% for statement
for i = 1:10,
disp(i)
end;
% while statement
i = 1;
while i <= 5,
disp(i);
i = i+1;
end;
% if statement
if i == 1,
disp('The value is one');
elseif i == 2,
disp('The value is two');
else
disp('The value is not one or two');
end;
% define function, create functionName.m
function y = squareThisNumber(x)
y = x^2;
% use addpath('path') to add a directory to Octave's function search path
% a function can also return multiple values
function [y1, y2] = squareAndCubeThisNumber(x)
y1 = x^2;
y2 = x^3;
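Usage, assuming the two files squareThisNumber.m and squareAndCubeThisNumber.m are on the search path:
squareThisNumber(5)                   % ans = 25
[a, b] = squareAndCubeThisNumber(2)   % a = 4, b = 8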
4.6 Vectorization
J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2
J = sum((X * theta - y) .^ 2) / (2*m);
\theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
theta = theta - (alpha / m) * X' * (X * theta - y);
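Putting the two vectorized lines together into a runnable end-to-end sketch (the synthetic data and hyperparameters are my own choices, not course data):
X = [ones(5, 1), (1:5)'];   % design matrix with x_0 = 1
y = [3; 5; 7; 9; 11];       % exactly y = 1 + 2x
m = length(y);
theta = zeros(2, 1);
alpha = 0.1;
for iter = 1:1000,
  J = sum((X * theta - y) .^ 2) / (2*m);              % vectorized cost
  theta = theta - (alpha / m) * X' * (X * theta - y); % vectorized update
end;
theta                       % approaches [1; 2]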