Hi - Jim Hudson. OakTreeStaffing.com. Today, we're going to kind of wander through Azure Machine Learning. Azure Machine Learning is an amazing new tool from Microsoft that allows us to feed data to a series of predictive models and basically foretell the future. If you have data and you have questions that you would like answered, if you want to predict future performance based on past history, Azure Machine Learning is an amazing tool.
So, we're going to go to the studio and take a look at creating a predictive model and then also how we can use it. So, I've gone to studio.azureml.net and I've created a free account. All you'll need is an email address to create your free account. And essentially what I've done is I've gone in and I've got some price data here. So, if I just go "Visualize" on my price data, the basic idea is, I have 205 rows of price data about a bunch of new cars and essentially what I want to be able to do is predict the price of a car based on certain attributes of that car. So, we've got all sorts of things -- like the number of doors and the number of wheels and where the engine is located. Essentially what we want to do is to predict price.
Now, in my original data set, I actually have the price -- and this is why this is referred to as "Supervised Learning." We're doing predictive analytics and we're doing a subset of predictive analytics called "Supervised Learning." The idea is that we have some data that we already know the answer to. I know how much this car costs. This car costs $13,495. But the idea is, we're going to use this data to train a model so that when we're finished, if we feed that model the same kind of data, it will be able to predict the price of a given car.
So, what we've done is we've logged into Azure Machine Learning Studio, we've created a new experiment, and we've added our data. And the next thing we want to do is we want to look at our data and decide, "Alright, so what columns do I think might be interesting?" Honestly, when you go to do this outside of a classroom, this is really where you're going to spend a lot of your time because feature selection is as much art as it is science. But, we've gone in and we've taken a look at the data. We've decided out of the, you know, 20 or so columns which ones we're going to use.
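To make the supervised-learning idea concrete, here is a minimal Python sketch of separating the columns we feed in (the features) from the column we already know the answer to (the label). The column names, the chosen features, and every value except the $13,495 price are made up for illustration; this is not how the Studio stores data internally.

```python
# Hypothetical rows: column names and values are illustrative only.
# The only number taken from the walkthrough is the $13,495 price.
rows = [
    {"make": "example-a", "num_doors": 2, "engine_location": "front",
     "horsepower": 111, "price": 13495},
    {"make": "example-b", "num_doors": 4, "engine_location": "front",
     "horsepower": 95, "price": 9800},
]

# Assumed feature selection: which columns we think are interesting.
FEATURE_COLUMNS = ["num_doors", "engine_location", "horsepower"]
# The known answer that makes this "supervised" learning.
LABEL_COLUMN = "price"


def select_columns(row, feature_columns, label_column):
    """Return (features, label) for one row, dropping unselected columns."""
    features = {name: row[name] for name in feature_columns}
    return features, row[label_column]


examples = [select_columns(r, FEATURE_COLUMNS, LABEL_COLUMN) for r in rows]
features, label = examples[0]
print(features, label)
```

Changing `FEATURE_COLUMNS` is the code equivalent of deciding which columns to keep, which is where most of the experimentation tends to happen.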
So, the next thing we've done -- and again, this is because it's Supervised Learning, we've grabbed a split data module. A split data module is going to allow me to send most of the data to train the model, but also keep some data back to be able to score the model.
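As a rough picture of what a split step does, here is a minimal Python sketch, assuming a random shuffle followed by a 70/30 cut as in this walkthrough. The function name `split_rows` and the fixed seed are hypothetical choices for the example, not the Studio's actual implementation.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into (train, score) parts."""
    shuffled = list(rows)                  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-ins for the 205 rows of car data.
rows = list(range(205))
train_rows, score_rows = split_rows(rows)
print(len(train_rows), len(score_rows))
```

Holding rows back matters because a model scored on the same rows it was trained on will look better than it really is.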
So, we've split the data, we've set it to 70% training and 30% scoring. We've connected it to a train model module and then we've also chosen a linear regression algorithm. And there are lots of different algorithms that you can choose with Azure Machine Learning, but linear regression is really good for just predicting a continuous numeric value based on a variety of data inputs. So, we've chosen a linear regression model, we've hooked it to the train model module and then we've taken the other output of our split data module and we've connected it to a score model module. And then we've also connected that to an evaluate model module. So, when we're finished and we run this, what it's going to do is it's going to ingest the data, it's going to train my algorithmic model -- my linear regression model -- and then it's going to score it and evaluate it.
And once we're finished then, we can go in and we can go, "Hey, I want to look at my evaluation results." And click Visualize. And this is just one of the amazing things about this. It allows us to see the error, it allows us to see our coefficient of determination. There's obviously a lot of detail behind this, but at the end of the day, the coefficient of determination is one of the values that we're looking for. This tells us how well the model fits, that is, how much of the variation in price it explains. And with the coefficient of determination, we want this value to be as close to one as possible. .87 is not perfect, but it's getting us close. And one of the things that makes Machine Learning so amazing is the idea that, as I get more data, I can continue to train this model and over time, it will generally become more accurate.
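The train, score, and evaluate steps can be sketched in plain Python. This is a simplified illustration, assuming a single input column and ordinary least squares; the Studio's linear regression module handles many columns and has its own options. The horsepower and price numbers are made up, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Train: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, xs):
    """Score: apply the trained line to new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def r_squared(actual, predicted):
    """Evaluate: coefficient of determination, 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up training rows (horsepower -> price) and held-back scoring rows.
train_x, train_y = [70, 90, 110, 130, 150], [7000, 9500, 12000, 15500, 18000]
score_x, score_y = [100, 140], [11000, 16500]

model = fit_line(train_x, train_y)
scored = predict(model, score_x)
print(r_squared(score_y, scored))
```

A value of 1 means the predictions match the held-back prices exactly, and a value of 0 means the model does no better than always guessing the average price.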
So, at this point, we've got something that, at least, is working and is giving us a pretty good output. So, what we want to do is we want to set up our web service. So, the next thing we would do is we would create our predictive experiment, and once we're finished with that, then we would publish a web service. Now, once we publish the web service, what we've done is we've generated a couple of REST endpoints. Representational State Transfer, of course, is really becoming the common denominator of application programming. It allows us to take virtually any service and just hang it on a web endpoint.
So, once we've published our web service, we have a couple of endpoints. One is the request response endpoint. A request response endpoint would allow us to sit down and use either an application or Excel or whatever and basically post one line of data to this web service and get a response back really quickly. But we also have something called the Batch Execution REST endpoint. The Batch Execution REST endpoint would allow us to submit a batch -- hundreds or thousands or millions of records -- to our web service and then get predictions based on the trained model that we've created.
So, what we've done is we've published this, it's ready to go, and now we want to test it. So, I've gone to the web page and I've downloaded the Excel test environment and I can see that when I open the Excel file that the web service provides for me, Excel 2016 is connecting to the web service. And it has some test data if I'm really just trying to test this and begin to get my head around how the tool works. So, I'm going to click Use Sample Data and it's just going to grab some sample data for me. And then I'm going to say, "Hey, use this sample data."
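For a sense of what posting one line of data to a request response endpoint looks like from code, here is a hedged Python sketch that only builds the HTTP request and does not send it. The URL, the API key, the column names, and the JSON shape (`Inputs`, `input1`, `ColumnNames`, `Values`, `GlobalParameters`) are all assumptions for illustration; the API help page of your own published web service is the place to get the real URL, key, and payload format.

```python
import json
import urllib.request


def build_scoring_request(endpoint_url, api_key, column_names, row):
    """Build (but do not send) a POST request carrying one row to score."""
    # Assumed payload shape; confirm against your service's API help page.
    body = {
        "Inputs": {
            "input1": {"ColumnNames": column_names, "Values": [row]},
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


# Placeholder endpoint, key, and columns; none of these are real.
request = build_scoring_request(
    "https://example.invalid/score",
    "YOUR_API_KEY",
    ["num_doors", "horsepower"],
    ["2", "111"],
)
print(request.get_method(), request.full_url)

# Sending is left commented out because the endpoint above is a placeholder:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

The batch endpoint follows a different pattern, since it works on whole sets of records rather than a single row, so it is not covered by this sketch.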
And then finally I have to decide, "OK, where do I want my test output to go?" So, I'm just going to give it a cell here that basically identifies, "Hey, here's where I'd love for my output to go." And it's telling me, "Hey, it's going to overwrite the values." I click Predict and in the background, what it's done is, from my Excel spreadsheet here, it has round-tripped that REST endpoint on Azure. It's fed it this test data and it's come back with these scored labels -- which are predictions based on this input data of how much a particular car would cost.
Now, this has been just an incredibly simple, very fast example of what we can do with Azure Machine Learning, but we can use this to train models to predict virtually anything. We can do classification. We can do clustering. We can do lots of different algorithm types. We can even write our own algorithms using R -- the R statistical language in order to predict future performance based on past history.
Want More? Check out this brand new class we've just started offering:
Perform Cloud Data Science with Azure Machine Learning