Machine Learning on Guocheng's Space
https://wei170.github.io/tags/machine-learning/
Recent content in Machine Learning on Guocheng's Space. Generated by Hugo (gohugo.io). Last updated Mon, 22 Jul 2019.

Week 7 - Machine Learning
https://wei170.github.io/blog/coursera/ml/ml-stanford-7/
Mon, 22 Jul 2019

Additional Note for Improving Deep Neural Network
https://wei170.github.io/blog/coursera/ml/improve-dnn/
Sun, 07 Jul 2019

Practical aspects of Deep Learning: Regularization. What we learned in Week 3 is L2 regularization.
L1 regularization penalizes the absolute values $|\theta|$ rather than the squares of the $\theta$ values.
Implementation tip: when you implement gradient descent, one way to debug it is to plot the cost function $J$ as a function of the number of iterations; you want to see $J$ decrease monotonically after every iteration, even with regularization.
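The debugging tip above can be sketched in code. This is a minimal illustration, not the course's implementation: the names (`cost`, `gradient_descent`) and the toy data are made up, and it uses plain linear regression with an L2 penalty so the monotonic-decrease check is easy to state.

```python
import numpy as np

def cost(theta, X, y, lam):
    """Regularized squared-error cost J(theta); the bias term is not penalized."""
    m = len(y)
    residual = X @ theta - y
    reg = lam * np.sum(theta[1:] ** 2)
    return (residual @ residual + reg) / (2 * m)

def gradient_descent(X, y, alpha=0.1, lam=1.0, iters=200):
    """Run gradient descent, recording J after every iteration."""
    m, n = X.shape
    theta = np.zeros(n)
    history = []
    for _ in range(iters):
        grad = (X.T @ (X @ theta - y)) / m
        grad[1:] += (lam / m) * theta[1:]   # regularization gradient
        theta -= alpha * grad
        history.append(cost(theta, X, y, lam))
    return theta, history

# Debugging check: with a suitable learning rate, J decreases monotonically.
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
theta, history = gradient_descent(X, y)
assert all(b <= a for a, b in zip(history, history[1:]))
```

To actually see the plot, pass `history` to `matplotlib.pyplot.plot`; a curve that rises or oscillates usually means the learning rate $\alpha$ is too large.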
Week 6 - Machine Learning
https://wei170.github.io/blog/coursera/ml/ml-stanford-6/
Sat, 06 Jul 2019

Deciding What to Try Next. Errors in your predictions can be addressed by:
- Getting more training examples
- Trying smaller sets of features
- Trying additional features
- Trying polynomial features
- Increasing or decreasing $\lambda$

Model Selection and Train/Validation/Test Sets. Test error:

$$ J_{test}(\Theta) = \dfrac{1}{2m_{test}} \sum_{i=1}^{m_{test}}(h_\Theta(x^{(i)}_{test}) - y^{(i)}_{test})^2 $$
Just because a learning algorithm fits a training set well, that does not mean it is a good hypothesis. The error of the hypothesis measured on the data set used to train the parameters will be lower than the error on any other data set.
Week 5 - Machine Learning
https://wei170.github.io/blog/coursera/ml/ml-stanford-5/
Wed, 03 Jul 2019

Neural Network Cost Function:

$$ \begin{gather} J(\Theta) = - \frac{1}{m} \sum_{i=1}^m \sum_{k=1}^K \left[y^{(i)}_k \log ((h_\Theta (x^{(i)}))_k) + (1 - y^{(i)}_k)\log (1 - (h_\Theta(x^{(i)}))_k)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} ( \Theta_{j,i}^{(l)})^2 \end{gather} $$
Some notation:

- $L$ = total number of layers in the network
- $s_l$ = number of units (not counting the bias unit) in layer $l$
- $K$ = number of output units/classes

Note:
- The double sum adds up the logistic-regression costs calculated for each unit in the output layer.
- The triple sum adds up the squares of all the individual $\Theta$ values in the entire network.
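Both sums can be written compactly with vectorized operations. A minimal sketch, assuming the output activations $h_\Theta(x)$ have already been computed by forward propagation; the function name and the toy values are made up for illustration:

```python
import numpy as np

def nn_cost(h, Y, Thetas, lam):
    """Regularized neural-network cost J(Theta).

    h: (m, K) output-layer activations, Y: (m, K) one-hot labels,
    Thetas: list of weight matrices, with column 0 holding the bias weights.
    """
    m = Y.shape[0]
    # Double sum: logistic cost over every output unit of every example.
    data_cost = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    # Triple sum: squares of all weights, skipping the bias columns (i >= 1).
    reg = sum(np.sum(T[:, 1:] ** 2) for T in Thetas) * lam / (2 * m)
    return data_cost + reg

# Tiny check: one example, two output units, one weight matrix.
h = np.array([[0.5, 0.5]])          # output activations
Y = np.array([[1.0, 0.0]])          # one-hot label
Thetas = [np.array([[0.0, 1.0]])]   # column 0 is the bias weight
print(nn_cost(h, Y, Thetas, lam=2.0))  # → 2*log(2) + 1 ≈ 2.3863
```

Note that, matching the triple sum's indices, the bias weights ($i = 0$) are excluded from the regularization term.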
Week 4 - Machine Learning
https://wei170.github.io/blog/coursera/ml/ml-stanford-4/
Mon, 01 Jul 2019

Non-linear Hypotheses. If we build a hypothesis out of all degree-$r$ polynomial terms of $n$ features, there will be $\frac{(n+r-1)!}{r!(n-1)!}$ of them. For quadratic terms alone the count already grows as $O(n^{2}/2)$, which is not practical to compute.
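The count above is the binomial coefficient $\binom{n+r-1}{r}$ (combinations of features with repetition). A quick check of how fast it grows, with an illustrative helper name:

```python
from math import comb

def num_poly_terms(n, r):
    """Number of degree-r monomials over n features: (n+r-1)! / (r! (n-1)!)."""
    return comb(n + r - 1, r)

print(num_poly_terms(100, 2))  # quadratic terms from 100 features → 5050
print(num_poly_terms(100, 3))  # cubic terms already explode → 171700
```

For $n = 100$ the quadratic count 5050 is close to $n^2/2 = 5000$, matching the $O(n^2/2)$ estimate, and the cubic count is over 30 times larger.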
Neural networks offer an alternative way to perform machine learning when we have complex hypotheses with many features.
Neurons and the Brain. There is evidence that the brain uses only one “learning algorithm” for all its different functions. Scientists have severed (in an animal brain) the connection between the ears and the auditory cortex and rewired the optic nerve to the auditory cortex, finding that the auditory cortex literally learns to see.
Week 3 - Machine Learning
https://wei170.github.io/blog/coursera/ml/ml-stanford-3/
Sat, 29 Jun 2019

Classification. Now we are switching from regression problems to classification problems. Don’t be confused by the name “Logistic Regression”; it is named that way for historical reasons and is actually an approach to classification problems, not regression problems.
Binary Classification Problem: $y$ can take on only two values, 0 and 1.
Hypothesis Representation. We could approach the classification problem ignoring the fact that $y$ is discrete-valued, and use our old linear regression algorithm to try to predict $y$ given $x$.
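The course's fix for discrete-valued $y$ is to pass $\theta^T x$ through the logistic (sigmoid) function $g(z) = \frac{1}{1 + e^{-z}}$, which squashes any real number into $(0, 1)$ so the output can be read as a probability. A minimal sketch; the function names are illustrative:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z}), mapping R into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Logistic-regression hypothesis: the estimated probability that y = 1."""
    return sigmoid(theta @ x)

print(sigmoid(0.0))  # → 0.5, the decision boundary where theta^T x = 0
```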
Week 2 - Machine Learning
https://wei170.github.io/blog/coursera/ml/ml-stanford-2/
Fri, 28 Jun 2019

Multiple Features. Linear regression with multiple variables is also known as multivariate linear regression.
The notation for equations:
$$ x_j^{(i)} = \text{value of feature } j \text{ in the }i^{th}\text{ training example} $$
$$ x^{(i)} = \text{the input (features) of the }i^{th}\text{ training example} $$
$$ m = \text{the number of training examples} $$
$$ n = \text{the number of features} $$
The multivariable form of the hypothesis function:

$$ h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n $$
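With the usual convention $x_0 = 1$, the multivariable hypothesis is just a dot product $\theta^T x$. A minimal sketch with made-up numbers:

```python
import numpy as np

def h(theta, x):
    """Multivariate hypothesis h_theta(x) = theta_0 x_0 + ... + theta_n x_n,
    written as a dot product with the convention x_0 = 1."""
    return theta @ x

theta = np.array([1.0, 2.0, 3.0])  # theta_0, theta_1, theta_2
x = np.array([1.0, 4.0, 5.0])      # x_0 = 1, then two feature values
print(h(theta, x))  # → 24.0  (1 + 2*4 + 3*5)
```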
Week 1 - Machine Learning
https://wei170.github.io/blog/coursera/ml/ml-stanford-1/
Thu, 27 Jun 2019

The Hypothesis Function:

$$\hat{y} = h_\theta(x) = \theta_0 + \theta_1 x$$
Cost Function. The cost function measures the accuracy of the hypothesis function. It takes an average (actually a slightly fancier version of an average) of all the results of the hypothesis on the inputs $x$ compared to the actual outputs $y$:
$$J(\theta_0, \theta_1) = \dfrac{1}{2m} \displaystyle \sum _{i=1}^m \left( \hat{y}_i- y_i \right)^2 = \dfrac{1}{2m} \displaystyle \sum _{i=1}^m \left (h _\theta(x_i) - y_i \right)^2$$
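The cost function above is straightforward to evaluate directly. A minimal sketch with illustrative toy data:

```python
import numpy as np

def J(theta0, theta1, x, y):
    """Squared-error cost for h(x) = theta0 + theta1 * x, averaged over the
    m training examples (including the conventional 1/2 factor)."""
    m = len(y)
    predictions = theta0 + theta1 * x
    return np.sum((predictions - y) ** 2) / (2 * m)

x = np.array([1.0, 2.0])
y = np.array([2.0, 4.0])
print(J(0.0, 1.0, x, y))  # → 1.25: ((1-2)^2 + (2-4)^2) / (2*2)
print(J(0.0, 2.0, x, y))  # → 0.0: a perfect fit
```

The extra factor of $\frac{1}{2}$ is there purely for convenience: it cancels the 2 that appears when the squared term is differentiated for gradient descent.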