Hi, this is Harold Dyck again with JMP. I want to talk about regression today using JMP, and I want to use the data set called “Hollywood Movies”. Let me show you how to find it: clicking on the “HELP” button, then “Sample Data”. Since I am looking for regression, I am clicking here on “Regression”, and then “Hollywood Movies”.
Right here, our variables are on the left. We have 14 variables and 136 observations, that is, 136 movies. We have variables like the name of the movie, which studio produced it, the Rotten Tomatoes Score and the Audience Score, Theme, Genre, the number of theaters on the opening weekend, box office income per theater, Domestic Gross in millions, Foreign Gross, World Gross in millions of dollars, and Production Budget in millions. Profitability is the ratio of the World Gross divided by the Production Budget, and finally we have the Opening Weekend Gross.
I’m not gonna use all of these variables, but I’ve selected some to look at. Let’s start by clicking on “Analyze” and then “Distribution” to do a little bit of data auditing. Here is the “Rotten Tomatoes Score”. Holding the CONTROL key down, I’m gonna click on the number of theaters on opening weekend, the Production Budget, and the Opening Weekend Gross. Those are my 4 variables (remember, I have my CONTROL key down), and I click “OK”. I get pictures of all the different distributions. One thing to notice is that we’ve got some outliers. Hovering over a data point, we can see that this Opening Weekend Gross is pretty high: it’s Harry Potter and the Deathly Hallows Part 2. Looking at the other variables, those outliers are other movies. Without doing too much exploratory analysis, I’m going to try to predict the World Gross based on these 4 variables. So, let’s do “Analyze”, “Fit Y by X”.
I’m only gonna be doing simple regression here. Here is the World Gross as my y-variable, and the Rotten Tomatoes Score, the number of theaters on the opening weekend, the Production Budget, and the Opening Weekend Gross will be my potential x-variables. I’ll be doing 4 simple regressions here. You will notice that all those variables are marked with those blue triangles, meaning they are continuous variables. Because both my y-variable and my x-variables are continuous, I’m gonna get scatterplots on which I can fit a regression. I click on “OK”, and there are my 4 plots. The y-variable is the same, and in each of the 4 plots I have a different x-variable.
In each of the 4 plots, I want to do a simple regression, regressing World Gross on each of the 4 independent variables individually. Holding the CONTROL key down as I pick the fit allows me to apply the same operation to all 4 graphs. I get the estimated regression equation in each of the 4 cases, and I can look around and see how it fits, just eyeballing it. The number of theaters on opening weekend doesn’t look very linear, and that might pose a problem. For the Production Budget, it looks like the variation about the regression line is not constant; it’s not uniform. The spread about the line becomes larger as the Production Budget gets larger. Some of that happens here too, but not so much, with the Opening Weekend Gross.
Let’s concentrate on the Opening Weekend Gross as the independent variable trying to predict the World Gross. You’ll notice the R-square is 81.6%, meaning about 82% of the variation in the y-variable has been explained by this regression. The first thing I look at is the sign of the estimated slope coefficient. Here it is positive, meaning the larger the Opening Weekend Gross, the larger the World Gross: a positive relationship.
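To make the slope and R-square that JMP reports a little more concrete, here is a minimal sketch in Python of the same least-squares arithmetic. It is not JMP's code, the function name `fit_line` is just an illustrative choice, and the numbers are made up for the example; they are not the Hollywood Movies data.

```python
def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x.

    Returns (intercept b0, slope b1, R-square).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # R-square = 1 - (unexplained variation / total variation)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - sse / sst


# Made-up values in millions of dollars, NOT the Hollywood Movies data
opening_weekend = [5.0, 12.0, 20.0, 35.0, 50.0, 80.0]
world_gross = [40.0, 95.0, 170.0, 250.0, 420.0, 610.0]

b0, b1, r_square = fit_line(opening_weekend, world_gross)
print(f"World Gross = {b0:.2f} + {b1:.2f} * Opening Weekend Gross")
print(f"R-square = {r_square:.3f}")
```

A positive `b1` corresponds to the positive relationship described above, and `r_square` is the share of the variation in the y-variable that the fitted line accounts for.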
The estimated coefficient here is read per one-unit increase in Opening Weekend Gross, and remember, it’s measured in millions. So, for each additional million dollars in Opening Weekend Gross, the World Gross is estimated to go up by 7.7 times that. If I have an additional one million dollars in Opening Weekend Gross, my estimated World Gross goes up by 7.7 million dollars. Another thing I look at is the overall F and the individual t. They give me the same information for a simple regression. When we look at multiple regression, the individual t and the overall F will have different meanings. In both cases, you see that the variable is highly significant. The t-statistic is very, very large, and I would say that this is a good variable to include in our regression equation.
I like to plot residuals, and I am going to plot residuals for all 4 regressions. I’ll hold the CONTROL key down and, oops, not this red triangle. CONTROL key down, “Plot Residuals”. Let’s look at the first plot across there for each of the 4 graphs. What I notice again is that this one, the theaters on opening weekend, doesn’t look very linear. For this variable, look at the spread about the zero line, the blue dotted line. And finally, in the 4th regression, again you see the variation about the line getting larger as the predicted value gets larger. We also look at the shape of the residuals here. We’d like it to be fairly normally distributed to be able to make good inferences. It’s not too bad, but it’s a little bit of a concern.
Another plot that I look at is this Normal Quantile Plot. There is more discussion of it in your book, but I’m looking for points, especially in the tail areas, that might go beyond the dotted lines. If the points are close to the straight line, that red line in the middle, and they fall within the red lines around that straight line, then I think the distribution of the residuals is fairly normal. So, there you have it, an example of regression using JMP. Hope you enjoyed it. Bye bye!
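As a footnote to the remark that the overall F and the individual t give the same information in a simple regression: with one x-variable, F works out to t squared. Here is a rough sketch in Python that checks this and also produces the residuals that a residual plot would show. The function name `slope_t_and_f` and the data are assumptions made up for illustration; this is not how JMP computes its report, and the numbers are not the Hollywood Movies data.

```python
import math


def slope_t_and_f(x, y):
    """Simple regression of y on x.

    Returns (t-statistic of the slope, overall F-statistic, residuals).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual = observed y minus predicted y
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)      # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total variation
    ssr = sst - sse                           # explained variation
    mse = sse / (n - 2)                       # error mean square
    t_stat = b1 / math.sqrt(mse / sxx)        # slope / its standard error
    f_stat = (ssr / 1) / mse                  # one model degree of freedom
    return t_stat, f_stat, residuals


# Made-up values in millions of dollars, NOT the Hollywood Movies data
opening_weekend = [5.0, 12.0, 20.0, 35.0, 50.0, 80.0]
world_gross = [40.0, 95.0, 170.0, 250.0, 420.0, 610.0]

t_stat, f_stat, residuals = slope_t_and_f(opening_weekend, world_gross)
print(f"t = {t_stat:.3f}, t squared = {t_stat ** 2:.3f}, F = {f_stat:.3f}")
print("residuals:", [round(e, 2) for e in residuals])
```

Plotting `residuals` against the predicted values is the kind of picture “Plot Residuals” gives you: you would look for a roughly even spread about zero rather than a spread that widens as the predicted value grows.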