[Study Notes] Summary: Gradient Descent vs Ordinary Least Squares

Note: I’m not an expert in machine learning, so this is not a serious tutorial on machine learning techniques. If you want to learn more, I highly recommend watching some other videos on YouTube or Coursera, e.g. Andrew Ng’s tutorial series or 3Blue1Brown’s videos if you want an overview of the area, or The Coding Train if you want to have fun with it :p

Links are provided below.

Following the tutorials from The Coding Train, this time I’m trying to make a simple machine learning recipe (or you can call it an algorithm) that predicts some value based on a given input (at this stage it still can’t be called “learning”, but it is a kind of sprout of machine learning).
The scenario is simple: we want to predict the output based on some noisy data (meaning the points are scattered, so no single relationship fits them exactly).
e.g. a simple relationship between ice cream sales and temperature:

[image: scatter plot of ice cream sales vs temperature]

(1.0)

From common sense, we know that the higher the temperature, the more ice cream will be sold, so we can make predictions by drawing a line or curve that fits those data, as my guess:

[image: the same data with my guessed line fitted through the points]

(1.1)

(in this case I chose a line, to make things easier)

So once I’ve got the correct line function, I can feed in any temperature and get a prediction back. That’s our simplest model, wow~

However, how do we figure out the function of this line? That is the problem. There are many methods to achieve that, but in this post I will only focus on the techniques introduced by The Coding Train, which are:

  • Ordinary Least Squares (OLS)
  • Gradient Descent (GD)

Ordinary Least Squares (OLS) 「最小二乗法」

The first technique is called Ordinary Least Squares. The idea of this method is to compare the distances from all the data points to a candidate line, and find the line that minimizes the sum of those (squared) distances.

A diagram demonstrating this looks like the following:

[image: the data points, a “My guess” line, and the distances from each point to the line]

(1.2)

So how do we find the “My guess” line? There are several steps:

Step 1: Find the sum of the squares of these distances

If we call the distances from each point to the line $d_0, d_1, d_2, \dots, d_n$, the sum of the squares of those distances becomes

$$\sum_{i=0}^{n} d_i^2$$

which, expanded, becomes $(d_0^2 + d_1^2 + d_2^2 + \dots + d_n^2)$.

You might be wondering why we square them: since we don’t really care whether the distances are negative or positive, squaring them gets rid of all the negative values.
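To make this concrete, here is a minimal sketch in plain JavaScript (the data points and the helper name sumOfSquaredDistances are made up for illustration, not from the original tutorial):

// Hypothetical data: temperature vs ice creams sold
const points = [
  { x: 20, y: 15 },
  { x: 25, y: 30 },
  { x: 30, y: 42 },
];

// Sum of squared vertical distances from each point to the line y = m*x + b
function sumOfSquaredDistances(m, b) {
  let total = 0;
  for (const p of points) {
    const d = p.y - (m * p.x + b); // distance from the point to the line
    total += d * d;                // squaring removes the sign
  }
  return total;
}

console.log(sumOfSquaredDistances(2.7, -39)); // smaller = better fit

The smaller this sum is, the better the line fits the data; OLS is about finding the m and b that make it as small as possible.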

Step 2: Represent the model mathematically

NOTE: you can skip the paragraph below if you think it is useless.

========= USELESS PART START =========

Well, here comes the point: the computer can’t understand what the line looks like. You can’t just tell the computer there is a line and wish the magic will happen; instead, what you want to do is represent it with math. O (*≧▽≦)ツ┏━ (I know math sounds a bit scary, but that is really how things work in a computer, and if you spend more time actually playing around with the math, you will find math is very 「tsundere (ツンデレ)」 <- I don’t know how to say it in English, but whatever =w=), and here I’m trying to have fun with it.

========= USELESS PART ENDS =========

So the first thing to do is find the equation that represents a line. For anyone who has completed high school, or even middle school, we know that the general equation of a line is

$$y = mx + b$$

(where $m$ is the gradient, or slope, and $b$ is the y-intercept, i.e. the value of $y$ when $x = 0$).

By using this equation, we can easily represent any line in 2D space with any gradient or y-intercept. This means our machine learning recipe (or you can call it an algorithm) has two parameters that we can control: m and b!
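As a tiny sketch (the function name predict is my own, just for illustration), the whole “model” is really nothing more than these two numbers:

// The entire "model": two adjustable numbers.
let m = 1; // slope (gradient)
let b = 0; // y-intercept

// Given any input x (e.g. a temperature), the model's prediction is:
function predict(x) {
  return m * x + b;
}

Fitting the model just means picking better values for m and b.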

From now on, we can do whatever we want to this line, so the next step is to somehow find a way to adjust these two parameters to make the line fit the data set, like in diagram (1.1).

Step 3: Adjust the parameters according to … what magic? ✧(≖ ◡ ≖✿)

Well, I’m just kidding. The way to adjust these parameters is by using Ordinary Least Squares (最小二乗法) as I mentioned above, so we need to find out how to compute m and b.

The way to find out the slope (gradient) of the line is to use the ordinary least squares formula:

$$m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

(source: https://sci-pursuit.com/math/statistics/least-square-method.html)

This gives the gradient of the line that best fits the data set, so we can evaluate m easily. Problem solved 🙂

Then how can I evaluate b? Well, let’s get back to the previous equation of the line:

$$y = mx + b$$

Then, solving for b, we can rewrite this equation as

$$b = y - mx$$

However, here we are considering a single line that fits the whole data set, so we need to change this equation a little bit, and it becomes:

$$b = \bar{y} - m\bar{x}$$

(where $\bar{x}$ and $\bar{y}$, with a bar on top, mean the averages: sum all the values up and divide by how many there are)
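To see the formulas in action, here is a quick worked example with three made-up points (not from the original post): $(1, 2)$, $(2, 3)$, $(3, 5)$. The averages are $\bar{x} = 2$ and $\bar{y} = \frac{10}{3}$, so

$$m = \frac{(1-2)(2-\frac{10}{3}) + (2-2)(3-\frac{10}{3}) + (3-2)(5-\frac{10}{3})}{(1-2)^2 + (2-2)^2 + (3-2)^2} = \frac{\frac{4}{3} + 0 + \frac{5}{3}}{2} = \frac{3}{2}$$

and $b = \bar{y} - m\bar{x} = \frac{10}{3} - \frac{3}{2} \cdot 2 = \frac{1}{3}$, so the best-fit line is $y = \frac{3}{2}x + \frac{1}{3}$.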

Finally

Since we’ve got what we need (the equation of the line and the formulas for evaluating m and b), all that’s left to do is HAPPY CODING!!! o(〃’▽’〃)o

Solution (using p5.js)

let data = [];        // collected data points, stored in 0..1 space
let circleSize = 10;  // diameter of each plotted point
let m = 1;            // slope of the regression line
let b = 0;            // y-intercept of the regression line

function setup() {
    createCanvas(800, 600);
    background(50);
}

function mousePressed() {
    // Map the click from pixel coordinates into 0..1 space
    // (y is flipped so that "up" means a larger value)
    let x = map(mouseX, 0, width, 0, 1);
    let y = map(mouseY, 0, height, 1, 0);
    let point = createVector(x, y);
    data.push(point);
}

function draw() {
    background(50);
    // Draw every collected point, mapped back to pixel coordinates
    for (let i = 0; i < data.length; i++) {
        let x = map(data[i].x, 0, 1, 0, width);
        let y = map(data[i].y, 0, 1, height, 0);
        fill(255);
        stroke(255);
        ellipse(x, y, circleSize, circleSize);
    }
    // We need at least two points, otherwise the denominator
    // in the OLS formula is zero and m becomes NaN
    if (data.length > 1) {
        linearRegression();
        drawLine();
    }
}

function drawLine() {
    // Evaluate y = m * x + b at both ends of the 0..1 range
    let x1 = 0;
    let x2 = 1;
    let y1 = m * x1 + b;
    let y2 = m * x2 + b;
    // Scale back to pixel coordinates
    x1 = map(x1, 0, 1, 0, width);
    x2 = map(x2, 0, 1, 0, width);
    y1 = map(y1, 0, 1, height, 0);
    y2 = map(y2, 0, 1, height, 0);
    stroke(255);
    line(x1, y1, x2, y2);
}

function linearRegression() {
    // First find the averages (x̄ and ȳ)
    let sumOfX = 0;
    let sumOfY = 0;
    for (let i = 0; i < data.length; i++) {
        sumOfX += data[i].x;
        sumOfY += data[i].y;
    }
    let averageX = sumOfX / data.length;
    let averageY = sumOfY / data.length;

    // OLS formula: m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
    let above = 0;
    let below = 0;
    for (let i = 0; i < data.length; i++) {
        above += (data[i].x - averageX) * (data[i].y - averageY);
        below += (data[i].x - averageX) * (data[i].x - averageX);
    }

    m = above / below;
    b = averageY - m * averageX;
}
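To try it out: run the sketch with p5.js and click anywhere on the canvas to add data points. The best-fit line is recomputed and redrawn every frame, so you can watch it adjust as you add more points.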
