Alright hello everybody and welcome to the third machine learning and second regression tutorial video. Where we left off, I was asking whether or not the adjusted close column would be a feature or a label. And the answer is really a feature and possibly none of the above. Um… It could be a label if we hadn’t already kind of decided that we are using the high minus low percent um… or the percent change. For example so, you could use adjusted close as a label if, say, at the beginning of the day you were trying to predict what the close might be that day. But in this case with the given features that we have chosen, you really wouldn’t even know this value. Um, you wouldn’t know the high minus low and you wouldn’t know the percent change until the close had already occurred. So, if you trained a classifier to predict this value, um, that would be an incredibly biased classifier. So, just kind of you have to start thinking of these things. Is this even possible in the real world? Cause you can kind of find yourself doing things uh that seem like a great idea at the time, but then it’s actually not even possible to do. So, in our case, adjusted close will either be a feature, or not, none of the above in the sense that actually what we will do is take like the last 10 values of the adjusted close and that’s a feature. And, that’s most representative of when we actually go and dig in and write the algorithm ourselves , um, you would take maybe the last 10 values, and try to predict the future value. Anyway, more on that later. So the last, uh the last tutorial we did features, and now in this tutorial we are gonna define a label. So, since I just got done telling you that this is not gonna be a label, what actually would be the label? Well, it would be at some point in the future, the price. Okay. And the only price column we have anymore is adjusted close. Um, but what we wanna do is actually get the adjusted close in the future , maybe the next day, maybe the next 5 days, something like that. So, we need to bring in some new information, basically to get that information up into the future. So, let’s go ahead and close this out and begin working on that. So, first of all, um we want to…, we are gonna take, we are not gonna print the head anymore. And, first of all, the, we are gonna say forecast_column or col, is just gonna be equal to adjusted close. I’ll explain why we are gonna do that in a second but basically it’s just a variable and later on, you could change this variable to be a different forecast column. So, you might not be working with stock prices, there’s other things that you can use when you are regression on, of course machine learning other than stock prices. So, in the future, if you aren’t, you’re gonna just, you’ll be able to use very similar code. You’ll obviously change the code leading up to this point, but you just change forecast column to be whatever you want it in the future. And I’ll show you why when we get to the code, why that’s gonna be useful. Um now, what we’re gonna say is just in case there is not, not, uh missing data, so df dot fill na. So fill na is just fill any, as for not available or in pandas term it’s gonna be actually a na in most cases and that’s not a number. So now we are gonna fill na with a specific value we’re gonna do – 99,999 and we’re gonna say inplace equals true. So with machine learning, you can’t work with na data. So you actually have to replace the na data with something and, or you can get rid of that entire column, but you don’t want to get rid of data in machine learning in the real world you actually will find that you miss a lot of data. You are lacking maybe one column, but you have got the other columns, and you don’t wanna sacrifice data if you don’t have to. So you can do this and it will be treated as an outlier in your dataset. And again this is just one more reason why going through and doing the algorithm by hand. will help you understand so much better what kind of effect that is gonna have on the algorithm. So, you’ll be thankful that we go through it. And then basically you’ll learn through each algorithm why, uh…, what doing something like that will do? So anyways, that’s the choice, that’s the best choice in my opinion rather than getting rid of data. Now, we are gonna forecast out. This is a regression algorithm, generally you use regression to forecast out. You don’t have to but generally that’s what you are doing. So I am gonna define forecast out as the equal to being the int value of math dot ceiling, um, and the ceiling will be point 1 times the length of the df. So, first of all, what are we doing there? And also we need to import math. But, first, what are we doing there? math dot ceil will take anything and get to the ceiling. So let’s say the length of the dataframe was a number that was gonna return a decimal point, that was gonna be like point 2, right? Let’s say that was gonna happen. Math dot ceil will round that up to 1. So, math dot ceil rounds everything up to the nearest whole So, um, and then we are making it an integer value, um, just so, cause I think math dot ceil returns a float and we don’t really want it to be a float either. But anyway, uh, this will be the number of days out, so basically what we are gonna do here is we are gonna try to predict out 10 percent of the dataframe and you’ll see that actually when we go out and do this, it’s not like you’ll just get 1 point 10 percent out, you can get tomorrow’s price and the next days price and so on. Um, you’re just using data that came 10 days ago to predict today. Ok. So, um, feel free to change that, right? Maybe you want point 01, right? Maybe you want to just predict like tomorrow’s price or something. You can play around with that if you want. We are just making stuff up basically as we go. So if you wanna change that, by all means change it. So let’s go ahead and go to the top and import math before we forget. Okay, so now, we need a, the actual, so we’ve got labels, oh I am sorry we have got features, right? these are features, or these are our features and now we need that label, so now that we have forecast out we can create that label. So we’re gonna say df, and then the label column. The label will be the equal of df forecast column, so that’s why we used forecast column. that way if later on you decide to change something you’ll be able to just change this variable rather than all the feature variables. So it’ll be equal to the df forecast column and then we are gonna do a dot shift minus forecast out. That’s why we needed it to be an int cause we are basically shifting the columns. So, what we’ve done is we are shifting the columns negatively. So it’ll go, basically if you have a column here it’ll get shifted up, the spreadsheet almost. This way, each row, the label column for each row will be adjusted close price 10 days into the future. Okay? So that’s our label, so our features are these attributes of what, in our mind, may cause the adjusted close price in 10 days to change or 10 percent. So actually this will be much greater than 10 days cause we didn’t even specify the timeframe So, we can tinker with this number later, it’s really not that important. Um, regression, you aren’t gonna get rich on just this algorithm, I promise you. But it’s actually good, you’ll find this actually not a bad model of stock price. And as you add more useful features, it can get, it can get pretty good. But, anyway, um, so now we have our label column and let’s go ahead and print df dot head again. So this just prints like the first 5 rows of the dataframe. Again if there’s anything we are doing with pandas that you are like “What’s going on?”. Ask and I can point you to tutorial because I’ve got I’ve done tutorials based on everything that I am gonna be doing. Um, ok, so these are, our each of these column features and then we finally have a label column that we’ve kind of, this is time into the future, um, for our data. So, now what we’re gonna go ahead and do is… in fact let me do a df dot, let’s do a df dot tail and also let’s just do a df dot drop na and then in place equals true. Cause those are some awful high numbers, for 10 percent out. Interesting. So I guess prices changed that much by that shift. So let’s try a smaller shift. Um, fascinating, that that would be 10 percent out. That’s a little better. Maybe, maybe we’ll use that point 01. Let’s use that one, cause the other ones were just so huge. So let’s go back to head and see if, if that number. So if, if you are not following, I am just comparing the forecast price to the adjusted close price. So of course when the when the stock price opens, this is actually a significant percent change, right, from 50 to 66, but the stock just came out and of course google does very well in time. So, so yeah, but anyway, yeah I think I’ll go with point 01 for now. Or oo oo we’ll, we’ll mess with bot whenever we go to predict some stuff Anywway. That’s it for this one. So we have done features, we got our label and now we’re actually ready to train, test, predict and actually run this algorithm on some realistic data so Stay tuned for that. If you have questions, comments, concerns, whatever up to this point Feel free to leave them below. Otherwise I was allways thankfull whatch these instructions and it’s till next time.