Andrew Ng's Open Course - 03
1. Classification and Representation
1.1 Classification
y -> {0, 1}
Linear Regression is not suitable for Classification problems
Logistic Regression: 0\le h_\theta(x)\le1
Although it is called "Regression", it is actually a Classification Algorithm
1.2 Hypothesis Representation
h_\theta(x)=g(\theta^Tx)
g(z)=\frac{1}{1+e^{-z}} (Sigmoid function/Logistic function)
h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}; note that g(z)-0.5=0.5-g(-z), i.e. the graph of the sigmoid is symmetric about the point (0, 0.5)
Interpretation: h_\theta(x)=P(y=1\vert x;\theta)
As in Linear Regression, once we have this h_\theta(x), what remains is to estimate the value of \theta
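A minimal NumPy sketch of the hypothesis above (function and variable names are my own, not from the course):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}), the Sigmoid / Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """h_theta(x) = g(theta^T x), interpreted as P(y = 1 | x; theta)."""
    return sigmoid(np.dot(theta, x))

# Example: theta = [0, 1], x = [1, 2] (x_0 = 1 is the intercept term)
print(h(np.array([0.0, 1.0]), np.array([1.0, 2.0])))  # ~0.88
```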
1.3 Decision Boundary
\theta^Tx=0: Decision Boundary
That is, the Boundary separating the two classes of points. Taking two features as an example and plotting x_2 against x_1, \theta^Tx=0 is a straight line, and this line is the Decision Boundary
For a non-linear decision boundary, polynomial terms can be introduced as in linear regression, e.g. h_\theta(x)=g(\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_1^2+\theta_4x_2^2)
Training Set -> \theta -> Decision Boundary
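A small sketch of how the decision boundary is used for prediction (my own illustration): predict y = 1 exactly when \theta^Tx \ge 0, equivalently h_\theta(x) \ge 0.5.

```python
import numpy as np

def predict(theta, X):
    """Predict 1 where theta^T x >= 0 (i.e. h_theta(x) >= 0.5), else 0.
    X has one example per row, with x_0 = 1 already included."""
    return (X @ theta >= 0).astype(int)

# theta = [-3, 1, 1] gives the linear boundary x_1 + x_2 = 3
theta = np.array([-3.0, 1.0, 1.0])
X = np.array([[1.0, 1.0, 1.0],   # x_1 + x_2 = 2 < 3 -> predict 0
              [1.0, 2.0, 2.0]])  # x_1 + x_2 = 4 > 3 -> predict 1
print(predict(theta, X))  # [0 1]
```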
2. Logistic Regression Model
2.1 Cost Function
Likewise, given m Training Examples, to determine \theta we need to choose a Cost Function and then minimize it
If the same squared-error cost as in Linear Regression is used, the resulting Cost Function is non-convex, so reaching the global minimum cannot be guaranteed
Instead use the log cost: Cost(h_\theta(x), y)=-\log h_\theta(x) if y=1, and -\log(1-h_\theta(x)) if y=0
When y=1:
- h_\theta(x)\to0, Cost(h(x), y)\to\infty
- h_\theta(x)\to1, Cost(h(x), y)\to0
When y=0, the opposite:
- h_\theta(x)\to0, Cost(h(x), y)\to0
- h_\theta(x)\to1, Cost(h(x), y)\to\infty
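A sketch of this per-example cost in code (my own, following the log cost written above):

```python
import numpy as np

def cost_single(h_x, y):
    """Cost(h_theta(x), y): -log(h) when y = 1, -log(1 - h) when y = 0."""
    return -np.log(h_x) if y == 1 else -np.log(1.0 - h_x)

# The limiting behaviour from the bullet points above:
print(cost_single(0.99, 1))   # close to 0
print(cost_single(0.01, 1))   # large
print(cost_single(0.01, 0))   # close to 0
print(cost_single(0.99, 0))   # large
```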
2.2 Simplified Cost Function and Gradient Descent
J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}[y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))]: a Convex Function
Minimize J(\theta) with gradient descent, i.e. repeat \theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)
Derivation: \frac{\partial}{\partial\theta_j}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
The resulting update rule is identical in form to the gradient descent rule in Linear Regression (only the definition of h_\theta(x) differs)
Feature Scaling can likewise be used to make gradient descent for Logistic Regression run faster
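A minimal gradient-descent sketch under the formulas above (vectorized; the function name, learning rate, and iteration count are my assumptions, not the course's values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Repeat theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i).
    X: (m, n+1) with the column of ones already added; y: (m,) array of 0/1 labels."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)
        grad = (X.T @ (h - y)) / m
        theta -= alpha * grad
    return theta
```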
2.3 Advanced Optimization
Optimization Algorithm: Minimize J(\theta)
Compared with gradient descent, the following algorithms do not require choosing \alpha and often run faster
You only need to provide functions computing J(\theta) and \frac{\partial}{\partial\theta_j}J(\theta), as sketched after the list below
1. Conjugate Gradient
2. BFGS
3. L-BFGS
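One common way to use such algorithms in Python is scipy.optimize.minimize; this is a sketch under that assumption (not the course's Octave fminunc example): you supply J(\theta) and its gradient, and the optimizer chooses the step size itself.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    """Return J(theta) and its gradient for (unregularized) logistic regression."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    grad = (X.T @ (h - y)) / m
    return J, grad

# Toy data: intercept column plus one feature; labels deliberately not separable.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 1.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0, 1.0])

# jac=True tells the optimizer that cost_and_grad returns both J and the gradient.
res = minimize(cost_and_grad, x0=np.zeros(X.shape[1]), args=(X, y),
               jac=True, method='L-BFGS-B')
print(res.x)  # fitted theta; no learning rate alpha was needed
```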
3. Multiclass Classification
3.1 Multiclass Classification: One-vs-all
one-vs-all
Treat a multi-class classification problem as several binary classification problems and apply Logistic Regression to each
To classify a new example, evaluate the k hypothesis functions and pick the class with the largest value
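A sketch of one-vs-all on top of a generic binary trainer (my own code; it assumes a train_binary function such as the gradient_descent sketch from 2.2, and classes encoded as 0..k-1):

```python
import numpy as np

def one_vs_all(X, y, num_labels, train_binary):
    """Train one binary logistic classifier per class c in {0, ..., num_labels-1},
    treating (y == c) as the positive class. Returns a (num_labels, n+1) array."""
    return np.array([train_binary(X, (y == c).astype(float))
                     for c in range(num_labels)])

def predict_one_vs_all(all_theta, X):
    """For each example, pick the class whose h_theta(x) is largest."""
    probs = 1.0 / (1.0 + np.exp(-(X @ all_theta.T)))  # shape (m, num_labels)
    return np.argmax(probs, axis=1)

# Usage (assuming gradient_descent from the 2.2 sketch above):
# all_theta = one_vs_all(X, y, num_labels=3, train_binary=gradient_descent)
# preds = predict_one_vs_all(all_theta, X)
```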
4. Solving the Problem of Overfitting
4.1 The Problem of Overfitting
Underfit: high bias
Overfit: high variance, e.g. a high-degree polynomial that fails to generalize to new examples
Addressing Overfitting
- Reduce number of features
- Regularization: reduce magnitude of \theta_j
4.2 Cost Function
For example, for a 4th-degree polynomial, penalize \theta_3 and \theta_4 to make them small, e.g. by adding 1000\theta_3^2+1000\theta_4^2 to J(\theta)
Regularization: Small values for parameters \theta_j
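Concretely, the regularized cost for Linear Regression takes the form below (consistent with the normal-equation and gradient-descent updates in 4.3; the second sum starts at j=1 so \theta_0 is not penalized):
J(\theta)=\frac{1}{2m}[\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^{n}\theta_j^2]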
4.3 Regularized Linear Regression
1. Gradient Descent
The regularization sum starts at j=1, i.e. the constant term \theta_0 is not regularized, so its update stays \theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}
For \theta_{j>0}: \theta_j:=\theta_j(1-\alpha\frac{\lambda}{m})-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
2. Normal Equation
Originally: \theta=(X^TX)^{-1}X^Ty
With Regularization, add a matrix: \theta=(X^TX+\lambda \left[\begin{matrix} 0\\ &1\\ &&1\\ &&&\ddots\\ &&&&1 \end{matrix}\right] )^{-1}X^Ty, where the added matrix is (n+1)\times(n+1)
Moreover, once this matrix is added, the matrix inside the parentheses is guaranteed to be invertible (for \lambda>0)
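A NumPy sketch of this regularized normal equation (names are mine; L is the (n+1)×(n+1) matrix shown above):

```python
import numpy as np

def normal_equation_reg(X, y, lam):
    """theta = (X^T X + lambda * L)^{-1} X^T y, where L is the identity matrix
    with its (0, 0) entry set to 0 so the intercept theta_0 is not regularized."""
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```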
4.4 Regularized Logistic Regression
1. Gradient Descent
Add the \theta regularization term: J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}[y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
So in gradient descent, for \theta_{j>0}, previously: \theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
Now: \theta_j:=\theta_j-\alpha[\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j]
2. Advanced Optimization Techniques
When supplying the functions for J(\theta) and \frac{\partial}{\partial\theta_j}J(\theta), modify them to include the regularization terms
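A sketch of the modified J(\theta) and gradient with the regularization terms added (my own code, not the course's Octave version; it can be passed to an optimizer such as scipy.optimize.minimize with jac=True, as in the 2.3 sketch):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad_reg(theta, X, y, lam):
    """Regularized logistic cost J(theta) and its gradient; theta_0 is not penalized."""
    m = len(y)
    h = sigmoid(X @ theta)
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m + reg
    grad = (X.T @ (h - y)) / m
    grad[1:] += (lam / m) * theta[1:]
    return J, grad
```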