Here is an attempt to analyze why Leonardo DiCaprio didn’t win the Oscar. I’m not a super fan of him or of the Oscars, but a meme inspired me with an idea for an analysis.
I’m currently taking MITx Analytics Edge from edx.org. It’s a really cool course that focuses more on analytics than on syntax or coding. I never planned to do the problem sets or read all the lectures, but I found myself enjoying it and doing all the problem sets. Hence, I’ll be trying to do an analysis every other week to use what I learn from this course.
I might bore you with the details of my story, so let’s start with the findings first; it’s up to you if you want to keep reading.
According to our model, the odds ratio for winning Best Actor when the same movie is nominated for Art Direction is 0.45, which is less than 1. This means that if a movie is nominated for Art Direction, its nominee is less likely to win Best Actor for that movie. Likewise, if the nominee won a previous Best Actor nomination, the odds ratio is 0.25; again, the nominee is less likely to win the Oscar. However, if the actor won the Golden Globe for male lead actor in a drama, the odds ratio is a high 15.82. The same goes for winning the Screen Actors Guild Award, where the odds ratio is 12.05.
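To make those numbers a bit more concrete, here is a minimal sketch of how a factor like this moves a probability. It assumes the four figures above are exponentiated logistic regression coefficients (each one multiplies the odds of winning), and it uses a made-up starting probability of 0.2 (one winner among five nominees), which is not the model's actual intercept. The variable names are my own labels, not the column names in the data.

```python
import math

# Figures reported in the post, treated here as exp(coefficient).
odds_ratios = {
    "art_direction_nominated": 0.45,
    "won_previous_best_actor": 0.25,
    "won_golden_globe_drama": 15.82,
    "won_sag_award": 12.05,
}

def shift_probability(p, odds_ratio):
    """Multiply the odds p / (1 - p) by odds_ratio, then convert back to a probability."""
    odds = p / (1.0 - p) * odds_ratio
    return odds / (1.0 + odds)

# Hypothetical starting point: 1 in 5 nominees wins, so p = 0.2.
baseline_p = 0.2

for name, ratio in odds_ratios.items():
    coef = math.log(ratio)  # the logistic coefficient behind the ratio
    new_p = shift_probability(baseline_p, ratio)
    print(f"{name}: coef={coef:+.2f}, p goes {baseline_p:.2f} -> {new_p:.2f}")
```

A ratio below 1 pulls the probability under the starting value, and a ratio above 1 pushes it up, which is the reading used in the findings.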
The model accurately predicted that Philip Seymour Hoffman would win the 2005 Academy Award for Best Actor for the film Capote, with a 97.27% probability. Capote was not nominated for Art Direction, and Hoffman had no Best Actor nominations or wins before 2005. He won the Golden Globe for Best Actor in a drama and the Screen Actors Guild Award for male actor.
I think this data was used to predict the 2006 Oscar Awards and was collected prior to the ceremony, so I took the liberty of using year 2006 as a second test set. For the 2006 test set, the model accurately predicted that Forest Whitaker would win Best Actor for the movie The Last King of Scotland.
Now what do we have for Leonardo DiCaprio? In 2004, DiCaprio was nominated; he won the Golden Globe for Best Actor in a drama but not the Screen Actors Guild Award. He also did not have any previous nominations, which is good according to the model, but the movie, The Aviator, was nominated for Art Direction.
In 2006, for Blood Diamond, the movie was not nominated for Art Direction and he had not won a previous nomination, but he won neither the Screen Actors Guild Award nor the Golden Globe. With this model, let’s think about his recent movie and nomination, The Wolf of Wall Street. The Wolf of Wall Street was not nominated for Art Direction, and again Leo had not won a previous nomination. Though he won the Golden Globe for Best Actor, the category was comedy and not drama, which doesn’t match the drama factor in our model. He also did not win the Screen Actors Guild Award.
Anyhow, the internet still loves Leo, yay. As for me, I like him in Titanic, Inception, The Beach and Catch Me If You Can.
[Image from BuzzFeed]
Let’s go back to the analysis.
My idea is to analyze whether nominations in other categories affect the Best Actor award. I would also like to add the movies, their categories, the actors’ ages, how many award-winning movies each actor has been a part of, and other factors. However, data isn’t available for those variables, and gathering it would require more time and coding my own scraper. I am aware of the danger zone.
Let’s just say my goal for this analysis is to learn about the methods and try to tell a story with what I found. I’ll do better next time and try to avoid the danger zone and use more useful data. Surprisingly, I still got an accurate prediction.
I found the data from this site. The data has 1631 observations and 62 variables. The variables are nominations in other categories, along with previous nominations, Golden Globe nominations and wins, and Screen Actors Guild awards.
Choosing the data:
I’m only interested in studying the Best Actor variables, so I’m gonna remove unnecessary variables and focus on the movies where one of the lead actors is nominated. I used observations from years before 2005 as my training set and observations from year 2005 as my test set. I’ve included another test set, year 2006, since this data was used to predict the 79th Oscar Awards. I did try other combinations of training and test sets, but I found this one the best because it uses more data for training, which should give a more accurate prediction on the test set.
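The split described above can be sketched like this. It is only an illustration: the rows are toy stand-ins for the real data, and the field names (`year`, `nominee`, `won`) are my assumptions, not the actual column names.

```python
# Toy rows standing in for the real observations; values are made up.
rows = [
    {"year": 2003, "nominee": "A", "won": 1},
    {"year": 2003, "nominee": "B", "won": 0},
    {"year": 2004, "nominee": "C", "won": 0},
    {"year": 2005, "nominee": "D", "won": 1},
    {"year": 2005, "nominee": "E", "won": 0},
    {"year": 2006, "nominee": "F", "won": 1},
]

def split_by_year(rows, first_test_year, extra_test_year):
    """Train on everything before first_test_year; test on each held-out year."""
    train = [r for r in rows if r["year"] < first_test_year]
    test_first = [r for r in rows if r["year"] == first_test_year]
    test_extra = [r for r in rows if r["year"] == extra_test_year]
    return train, test_first, test_extra

train, test_2005, test_2006 = split_by_year(rows, 2005, 2006)
print(len(train), len(test_2005), len(test_2006))
```

Splitting by year instead of at random keeps the test years strictly after the training years, so the model never sees a ceremony it is asked to predict.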
The best explanation I heard about the baseline model is that it is the best stupid guess for an outcome, from here. So my stupid guess is that if the number of nominations (excluding the Best Actor nomination) is higher than 5, it is more likely that the nominee will win.
I got an accuracy of 35.96%. This means that the rule "more than 5 other nominations means the nominee wins" gives the right answer for only 35.96% of the observations.
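Here is a small sketch of how a baseline like this can be scored. The numbers are invented for illustration, so the accuracy it prints has nothing to do with the 35.96% from the real data.

```python
def baseline_predict(other_nominations, threshold=5):
    """Stupid guess: predict a win (1) if the movie has more than `threshold` other nominations."""
    return 1 if other_nominations > threshold else 0

def accuracy(predictions, actual):
    """Share of predictions that match the actual outcome."""
    correct = sum(1 for p, a in zip(predictions, actual) if p == a)
    return correct / len(actual)

# Made-up example: other nominations per nominee and whether the nominee won.
other_nominations = [8, 2, 6, 1, 7]
won = [1, 0, 0, 0, 0]

predictions = [baseline_predict(n) for n in other_nominations]
print(predictions, accuracy(predictions, won))
```

Any model worth keeping should beat this number on the same observations.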
Building regression models:
After building logistic models and removing multicollinear variables, I found that the significant factors are the following: whether the movie is nominated for Art Direction; whether the lead actor won a previous Best Actor nomination; whether the male lead actor won the Golden Globe for drama; and whether the male nominee won the Screen Actors Guild Award.
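One simple way to spot multicollinear variables is to look for pairs that are highly correlated with each other. The sketch below is only an illustration of that idea, not necessarily how it was done in this analysis: the column names and values are made up, and the 0.7 cutoff is an arbitrary choice.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def highly_correlated_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| at or above threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 2)))
    return pairs

# Hypothetical 0/1 nomination flags for eight movies.
columns = {
    "picture_nom":       [1, 1, 0, 0, 1, 0, 1, 0],
    "director_nom":      [1, 1, 0, 0, 1, 0, 1, 1],
    "art_direction_nom": [0, 1, 0, 1, 0, 1, 0, 1],
}

for a, b, r in highly_correlated_pairs(columns):
    print(f"{a} and {b} move together (r={r}); consider keeping only one")
```

When two variables carry almost the same information, dropping one of them usually makes the remaining coefficients easier to interpret.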
[^1]: Andrew Gelman; Jennifer Hill, 2007, “Replication data for: Data Analysis Using Regression and Multilevel/Hierarchical Models”,
Here is the code for my analysis. Analytics Edge is so coooool.