For the past two months I have been researching machine learning and its algorithms. The goal is to find a good unsupervised clustering algorithm to analyze Android applications and predict which application you want to install or use at a particular time. The first step is to learn and understand the theory of machine learning. For this, I began studying Stanford's Machine Learning course. It is a great, practical course with videos and material that help you understand the classes.

The first model I studied is linear regression. This model describes a linear relationship between two or more variables. In my example, the training data consists of house prices and their sizes in square meters. This training data is used to build a linear regression model that predicts the price of a house given its size. As you can see in the following figure, the black dots show the training data (I crawled the web to get real data). The blue line represents the model's trend line, and the red dots show the predictions for two house sizes.
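To make the idea concrete, here is a minimal sketch of fitting a straight line to size/price pairs with ordinary least squares in plain Python. The data below is hypothetical (it is not the crawled data set), and the closed-form fit is a textbook shortcut used for illustration, not necessarily how the model in this post was trained. The names fit_line and predict are made up for this sketch.

```python
# Hypothetical training data, NOT the crawled data from this post:
# house sizes in square meters and prices in euros.
sizes = [50.0, 80.0, 100.0, 120.0, 150.0]
prices = [104000.0, 158000.0, 205000.0, 239000.0, 301000.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(size, slope, intercept):
    """Predicted price for a house of the given size."""
    return slope * size + intercept


slope, intercept = fit_line(sizes, prices)
print("price for 100 square meters:", predict(100.0, slope, intercept))
```

The slope is the price added per extra square meter, and the intercept is the price the line assigns to a size of zero.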

The red dots in the figure mark these predicted values:

For a house with 100 square meters, we predict a price of 202906.39 euros.
For a house with 175 square meters, we predict a price of 354343.54 euros.
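Since both predictions lie on the same trend line, the two points are enough to recover that line. The short sketch below does the arithmetic, assuming the model uses house size as its only input and that the sizes are in square meters. The variable names are chosen just for this illustration.

```python
# The two predictions quoted above: (size in square meters, price in euros).
size_a, price_a = 100.0, 202906.39
size_b, price_b = 175.0, 354343.54

# Slope: price change per additional square meter.
slope = (price_b - price_a) / (size_b - size_a)
# Intercept: value of the line at size 0.
intercept = price_a - slope * size_a

print("euros per square meter:", round(slope, 2))
print("intercept in euros:", round(intercept, 2))
```

Under that assumption, the model adds roughly 2019 euros for each extra square meter, on top of a base of about 990 euros.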

You can use Octave or R to practice and implement machine learning algorithms. Personally, I prefer a great FLOSS library that I found, scikit-learn. This library has several implementations of linear models: LinearRegression, Ridge, Lasso, ElasticNet, and more. For this example I used the SGDRegressor model, but I want to test other, smoother models such as Lasso or Elastic Net.
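For readers curious about what a stochastic gradient descent regressor does at its core, here is a simplified plain-Python sketch. It is not scikit-learn's implementation: it leaves out the regularization and learning-rate schedules that the library's estimator offers, and the function name sgd_fit, the hyperparameters, and the data are all assumptions made for illustration.

```python
import random

# Hypothetical training data, NOT the crawled data from this post.
sizes = [50.0, 80.0, 100.0, 120.0, 150.0]
prices = [104000.0, 158000.0, 205000.0, 239000.0, 301000.0]


def sgd_fit(xs, ys, learning_rate=0.05, epochs=200, seed=0):
    """Fit y = slope * x + intercept by SGD on the squared error."""
    n = len(xs)
    # Standardize both variables so one learning rate suits both parameters.
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    std_x = (sum((x - mean_x) ** 2 for x in xs) / n) ** 0.5
    std_y = (sum((y - mean_y) ** 2 for y in ys) / n) ** 0.5
    zx = [(x - mean_x) / std_x for x in xs]
    zy = [(y - mean_y) / std_y for y in ys]

    w, b = 0.0, 0.0
    rng = random.Random(seed)
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order
        for i in order:
            error = (w * zx[i] + b) - zy[i]
            # Gradient step for a single sample.
            w -= learning_rate * error * zx[i]
            b -= learning_rate * error

    # Convert the parameters back to the original units.
    slope = w * std_y / std_x
    intercept = mean_y + std_y * b - slope * mean_x
    return slope, intercept


slope, intercept = sgd_fit(sizes, prices)
print("price for 100 square meters:", slope * 100.0 + intercept)
```

Standardizing the inputs matters here: with raw sizes and prices in the hundreds of thousands, a single fixed learning rate would easily diverge.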

You can get the source code of this example in my Git repository.