Project 2: Metacritic User Rating Prediction

July 15, 2023

Why it matters

Oftentimes, you see a bad review of a movie from critics and assume you won't enjoy it. However, when you go to a theater and watch the movie for yourself, you find that you enjoy it more than you expected. This happened to me with Extraction 2, Transformers: Rise of the Beasts, and Guardians of the Galaxy Vol. 3. The opposite has also happened, and sometimes critic and user ratings are similar, as with Fast X and John Wick 4.

So it seems that critics and users rate the same movie differently, which leads to different overall ratings. This makes sense: users generally watch movies for enjoyment, while critics evaluate movies as art. Thus, if users rely solely on what critics say about a movie, they might not get what they want from watching it.

What if we could predict users' ratings from text features of critic reviews? If so, users could make more informed decisions about which movies to watch, ultimately getting a better cinematic experience.

Overview

Web-scraped movie info, critic reviews, and user reviews from the Metacritic website using BeautifulSoup in Python and stored them in an SQLite database.
Extracted sentiment features from review text using nltk.sentiment.SentimentIntensityAnalyzer and keywords using KeyBERT.
Visualized the scraped data with Plotly to explore trends associated with user ratings.
Reduced mean absolute error by 37.69% when predicting user ratings from critic-review features, compared with assuming user ratings equal critic ratings.

Data Collection

Web-scraped using this script.

Original dataset

1. `Movie` table

The following are sample rows from the Movie table, which contains movie information from Metacritic as of June 16, 2023.

movie_id	movie_title	movie_year	movie_summary	movie_critic_rating	movie_user_rating	movie_runtime	movie_genre	movie_subgenre	movie_sub2genre	movie_rating	num_critic_pos	num_critic_mix	num_critic_neg	num_user_pos	num_user_mix	num_user_neg
-9222320398758708885	Generation War	2014	In Berlin in 1941, on the eve of Hitler’s inva…	57	7.4	270	Drama	History	War	Not Rated	4	10	0	34	7	6
-9219956025103179742	Gun Crazy	1950	A well meaning crack shot husband is pressured…	74	7.8	87	Drama	Thriller	Romance	Passed	11	3	0	4	1	0
-9218594262706343994	Iluzija	2006	Tortured by a tumultuous home life and school …	61	8.1	107	Drama	None	None	Not Rated	5	2	1	14	0	3
-9216529779443115283	Fired Up!	2009	Shawn and Nick are top scorers on the Ford Hig…	31	5.7	90	Comedy	None	None	PG-13	2	7	9	19	12	10
-9212162946756563061	Uncharted	2022	Street-smart Nathan Drake (Tom Holland) is rec…	45	6.2	116	Action	Adventure	None	PG-13	7	33	4	451	169	175

2. `Critics` table

The following are sample rows from the Critics table, which contains critic reviews of movies in the Movie table.

critic_review_id	movie_id	critic_name	critic_review	critic_date	critic_rating
145643	-1562356960487678241	Joshua Land	The result may be better suited for classroom …	None	60
163070	-8573132570105916711	Peter Bradshaw	This is a strong, fierce, heartfelt movie.	Nov 30, 2022	80
208027	-433637604116680176	Amy Nicholson	The screenplay, adapted by Peter Craig, Hillar…	Dec 9, 2021	40
191435	-1173456695746148219	Stephanie Zacharek	Disappointingly tame.	None	50
104342	-4403818285690920111	Tim Grierson	Despite its shortcomings, American Made can be…	Sep 28, 2017	72

3. `Users` table

The following are sample rows from the Users table, which contains user reviews of movies in the Movie table.

user_review_id	movie_id	user_review	user_date	user_rating
74238	-3697458530605139559	Begginers is my favorite romantic movie and on…	Oct 26, 2020	10
11168	4282306499439292174	This movie feels like Imagine walking through …	Jan 25, 2020	9
310577	-6871826492128334529	The movie might lose its focus in the second h…	Aug 27, 2010	6
185273	7131247308218877535	The one I really like was tribute to chadwick …	Feb 2, 2023	5
164346	-781771689375929617	Enough scares to keep you on the edge of your …	Dec 21, 2020	8
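The write-up stores the scraped tables in SQLite but does not show the schema. Below is a minimal sketch using Python's built-in sqlite3 module, assuming a subset of the Movie columns and the Critics columns from the samples above. The SQL column types and the helper names build_db and reviews_per_movie are hypothetical, not taken from the original script.

```python
import sqlite3

# Hypothetical schema: a subset of the Movie columns plus the Critics table,
# linked by movie_id. Column types are assumptions, not the original script's.
SCHEMA = """
CREATE TABLE Movie (
    movie_id            INTEGER PRIMARY KEY,
    movie_title         TEXT,
    movie_year          INTEGER,
    movie_critic_rating INTEGER,  -- 0 to 100
    movie_user_rating   REAL      -- 0 to 10
);
CREATE TABLE Critics (
    critic_review_id INTEGER PRIMARY KEY,
    movie_id         INTEGER REFERENCES Movie(movie_id),
    critic_name      TEXT,
    critic_review    TEXT,
    critic_date      TEXT,        -- NULL when missing (None in the samples)
    critic_rating    INTEGER
);
"""


def build_db(path=":memory:"):
    """Open (or create) the database and create both tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def reviews_per_movie(conn):
    """Map each movie title to its number of critic reviews (0 if none)."""
    rows = conn.execute(
        """SELECT m.movie_title, COUNT(c.critic_review_id)
           FROM Movie AS m LEFT JOIN Critics AS c ON c.movie_id = m.movie_id
           GROUP BY m.movie_id"""
    ).fetchall()
    return dict(rows)
```

Rows are inserted with parameterized queries, e.g. `conn.execute("INSERT INTO Movie VALUES (?, ?, ?, ?, ?)", (-9212162946756563061, "Uncharted", 2022, 45, 6.2))`. The large negative movie_id values fit in SQLite's signed 64-bit INTEGER.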

Data Preprocessing

1. Extracting Sentiment Scores

With SentimentIntensityAnalyzer from the nltk package, I extracted negative, neutral, positive, and compound sentiment scores from the combined reviews of each movie.

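import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')  # VADER lexicon required by SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()
polarity = sia.polarity_scores(text)  # text: review string; returns 'neg', 'neu', 'pos', 'compound'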
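The final dataframe has mean_* and std_* columns for each VADER score. This suggests that each review was scored separately and the scores were then aggregated per movie. Below is a sketch of that aggregation with pandas, assuming the Critics table is loaded as a DataFrame. sentiment_features is a hypothetical helper. The scorer is passed in as score_fn, so any callable that returns VADER-style dicts will work.

```python
import pandas as pd


def sentiment_features(reviews: pd.DataFrame, score_fn) -> pd.DataFrame:
    """Score each review, then aggregate per movie into mean_*/std_* columns.

    reviews:  DataFrame with 'movie_id' and 'critic_review' columns.
    score_fn: callable returning a dict with 'neg', 'neu', 'pos', 'compound'
              (e.g. SentimentIntensityAnalyzer().polarity_scores).
    """
    # One row of scores per review, aligned with the input rows
    scores = pd.DataFrame(reviews["critic_review"].map(score_fn).tolist(),
                          index=reviews.index)
    scores["movie_id"] = reviews["movie_id"].values
    agg = (scores.groupby("movie_id")[["neg", "neu", "pos", "compound"]]
                 .agg(["mean", "std"]))
    # Flatten the (score, stat) column MultiIndex into 'mean_neg', 'std_neg', ...
    agg.columns = [f"{stat}_{score}" for score, stat in agg.columns]
    return agg
```

For example, call `sentiment_features(critics_df, sia.polarity_scores)`, where critics_df is a hypothetical DataFrame name. pandas computes the sample standard deviation, so std_* is NaN for movies with only one review.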

2. Extracting Keywords

With KeyBERT from the keybert package, I extracted keywords from the combined reviews for each possible rating from 0.0 to 10.0.

from keybert import KeyBERT

kw_model = KeyBERT(model='all-MiniLM-L6-v2')
# corpus: the combined review text for one rating value
kw = kw_model.extract_keywords(corpus,
                               keyphrase_ngram_range=(1, 1),
                               stop_words='english',
                               use_maxsum=True,
                               top_n=20)

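In the code block above:

keyphrase_ngram_range: Controls the n-gram range of extracted keywords; (1, 1) keeps single words only
use_maxsum: Diversifies the result with Max Sum Distance. From the nr_candidates words most similar to the document (20 by default), it selects the combination of top_n words that are least similar to each other, i.e. the combination with the smallest sum of pairwise cosine similarities
top_n: Controls the number of keywords to extract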
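The following is a small NumPy re-implementation of the Max Sum Distance idea, to make the use_maxsum description concrete. It is an illustrative sketch of the technique, not KeyBERT's own code. In KeyBERT, the embeddings come from the sentence-transformers model; here they are plain arrays passed in, and max_sum_select is a hypothetical name.

```python
import itertools

import numpy as np


def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def max_sum_select(doc_emb, word_embs, words, top_n, nr_candidates):
    """Pick top_n of the nr_candidates words most similar to the document,
    choosing the combination whose pairwise similarities sum to the minimum."""
    if nr_candidates < top_n:
        raise ValueError("nr_candidates must be >= top_n")
    if top_n > len(words):
        return []
    doc_sim = cosine_matrix(doc_emb[None, :], word_embs)[0]
    cand = np.argsort(doc_sim)[-nr_candidates:]  # most document-like words
    pair_sim = cosine_matrix(word_embs[cand], word_embs[cand])
    best, best_score = None, np.inf
    for combo in itertools.combinations(range(len(cand)), top_n):
        score = sum(pair_sim[i, j] for i, j in itertools.combinations(combo, 2))
        if score < best_score:  # keep the least mutually similar set
            best, best_score = combo, score
    return [words[cand[i]] for i in best]
```

When nr_candidates equals top_n, there is only one combination to choose from. In that case, the selection reduces to the top_n words most similar to the document.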

Final Dataframe after Preprocessing

movie_year	movie_critic_rating	movie_user_rating	num_critic_pos	num_critic_mix	num_critic_neg	mean_neg	std_neg	mean_neu	std_neu	mean_pos	std_pos	mean_compound	std_compound	Action	Adventure	Animation	Biography	Comedy	Crime	Documentary	Drama	Family	Fantasy	History	Horror	Music	Musical	Mystery	Romance	Sci-Fi	Sport	Thriller	War	Approved	G	NC-17	NR	Not Rated	PG	PG-13	Passed	R	TV-14	TV-G	TV-MA	TV-PG	Unrated	sophisticated	stylish	skillful	richly	charismatic	write	entertain	easygoing	cinematic	comic	sharp	lyrical	instructive	seductive
2018	80	7.3	22	0	0	0.0898182	0.10236	0.741182	0.175149	0.169	0.119288	0.259555	0.42102	False	False	False	False	False	False	False	False	False	True	False	False	False	False	False	False	True	False	True	False	False	False	False	False	True	False	False	False	False	False	False	False	False	False	0.2915	0	0.3255	0	0.4243	0	0	0	0.3182	0	0	0	0	0
2013	37	4	5	6	10	0.0489524	0.0697262	0.79	0.0803847	0.161095	0.0894974	0.438767	0.415727	True	True	True	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False	False	False	False	False	False	0	0	0	0	0	0.2503	0	0	0	0	0	0	0.2513	0
2016	44	4.8	2	8	4	0.0680714	0.0749127	0.782357	0.150659	0.149429	0.114685	0.222329	0.442291	False	False	False	False	False	False	False	True	False	False	False	False	False	False	True	False	False	False	True	False	False	False	False	False	False	False	False	False	True	False	False	False	False	False	0	0.3621	0	0	0	0	0	0	0	0	0	0	0	0.3782
2006	57	7.7	6	5	1	0.073	0.0795647	0.677917	0.234736	0.249167	0.185589	0.470967	0.343457	False	False	False	False	False	False	True	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False	False	False	False	False	False	False	0	0	0	0.2622	0	0	0.3241	0.2933	0	0	0	0.4796	0	0
2002	48	5.6	10	19	6	0.0549032	0.101778	0.813774	0.136019	0.131323	0.0995089	0.274816	0.465311	False	False	False	False	True	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False	False	False	False	False	0	0	0	0	0	0	0	0	0	0.2505	0.2358	0	0	0

Visualizations

Number of movies

Observations

The data were collected on June 16, 2023, so 2023 has far fewer movies than previous years.
The early and mid-1900s have fewer movies than recent years. This could reflect an actual difference in movie production, or it could mean that not all films from that period are recorded on Metacritic.
There are two noticeable declines: one starting in 2006 and another starting in 2014.

Number of reviews comparison

Observations

The number of reviews increased around the 1990s. Three possible explanations are that 1) the Internet became popular in the mid-1990s, 2) the number of movies on the Metacritic website increases around the mid-1990s, or 3) the website was created in 1999.
From 1999 to 2004, critics left more reviews than users. This trend changes after 2004.
The critic-review line follows the trend in the number of movies per year.
The user-review line fluctuates more.
Both lines decline in 2020 (the year COVID-19 began), and the decrease is larger for user reviews.

Mean rating trend comparison

Caution

User rating ranges from 0 to 10, whereas critic rating ranges from 0 to 100. Thus, to match the range, I multiplied user ratings by 10.
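The rescaling and the per-year means behind the trend plot can be sketched with pandas. This assumes the Movie table is loaded as a DataFrame with the column names shown earlier; mean_rating_by_year and the output column names are hypothetical.

```python
import pandas as pd


def mean_rating_by_year(movies: pd.DataFrame) -> pd.DataFrame:
    """Mean critic and user rating per movie_year on a common 0-100 scale."""
    # User ratings are 0-10, critic ratings 0-100: multiply users by 10
    scaled = movies.assign(user_rating_100=movies["movie_user_rating"] * 10)
    return (scaled.groupby("movie_year")[["movie_critic_rating", "user_rating_100"]]
                  .mean()
                  .rename(columns={"movie_critic_rating": "mean_critic",
                                   "user_rating_100": "mean_user"}))
```

The result has one row per year, so it can be passed straight to a line plot of mean_critic and mean_user.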

Observations

Mean user rating tends to decline over time, whereas mean critic rating declines at first and then increases after 2000.
For movies before 1970, mean critic ratings tend to be higher than mean user ratings. After 1970, the trend reverses, and it reverses again in 2020.
- This could be because critics and users rate genres differently and different genres were popular at different times. It could also reflect a difference in what each group values in a movie.
- Recall that relatively few movies from the early and mid-1900s are on the Metacritic website. It could be that mostly critically acclaimed movies from that era are listed, such as Citizen Kane, Vertigo, or Rear Window. All these films have a rating of 100 out of 100 from critics but below 10 out of 10 from users, which might explain why mean critic ratings are higher than mean user ratings for that period.

Rating distribution comparison

Observations

The critic rating distribution is closer to a normal distribution than the user rating distribution.
The user rating distribution is skewed to the left and has a higher overall mean than the critic rating distribution: 66.27 for users (rescaled to 0 to 100) versus 59.4 for critics.
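The means and the direction of skew can be checked directly with pandas. This sketch assumes the Movie table is loaded as a DataFrame; compare_distributions is a hypothetical name. Series.skew() returns the sample skewness, and a negative value indicates a left-skewed distribution.

```python
import pandas as pd


def compare_distributions(movies: pd.DataFrame) -> pd.DataFrame:
    """Mean and skewness of critic vs. user ratings on a common 0-100 scale."""
    critic = movies["movie_critic_rating"]
    user = movies["movie_user_rating"] * 10  # rescale users to 0-100
    return pd.DataFrame({"mean": [critic.mean(), user.mean()],
                         "skew": [critic.skew(), user.skew()]},
                        index=["critic", "user"])
```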

Association between critic and user

Observations

There is a positive association between critic and user ratings.
The least-squares best-fit line is User rating = 0.0357518 × Critic rating + 4.50315.
- For each 1-point increase in critic rating (0 to 100 scale), the expected user rating (0 to 10 scale) increases by 0.0357518.
The relationship is heteroscedastic: user ratings vary more when the critic rating is low than when it is high.
The maximum user rating stays roughly the same across all critic ratings, whereas the minimum user rating clearly rises as the critic rating increases.
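The sketch below shows the line fit and a rough heteroscedasticity check with NumPy. fit_line and residual_spread are hypothetical helpers. The split at a critic rating of 50 is an arbitrary threshold chosen for illustration, not a value from the analysis.

```python
import numpy as np


def fit_line(critic, user):
    """Least-squares fit of user = slope * critic + intercept."""
    slope, intercept = np.polyfit(critic, user, deg=1)
    return float(slope), float(intercept)


def residual_spread(critic, user, split=50):
    """Std of residuals below vs. at/above a critic-rating threshold.

    A larger first value means more scatter at low critic ratings.
    """
    critic = np.asarray(critic, dtype=float)
    user = np.asarray(user, dtype=float)
    slope, intercept = fit_line(critic, user)
    resid = user - (slope * critic + intercept)
    return resid[critic < split].std(), resid[critic >= split].std()
```

A quick check on the reported line: an OLS line with an intercept passes through the point of means. Here, 0.0357518 × 59.4 + 4.50315 ≈ 6.627, which is consistent with the user mean of 66.27 on the 0 to 100 scale.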

Sentiment analysis for combined reviews for each rating

Observations

Because the plot uses each movie's aggregate user rating, the rating ranges from 0 to 10 in 0.1 increments.
The compound score starts at 0.4 at a rating of 0, decreases until a rating of around 2, and then increases up to a rating of 10.
- One interpretation is that movies critics are positive about can receive either a 0 or a 10 from users.
- Movies that critics are negative about tend to get a user rating between 1 and 2.5.
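A grouping like the one behind this plot can be sketched with pandas. This assumes the preprocessed dataframe has the movie_user_rating and mean_compound columns shown in the final dataframe. Note one difference: the plot scores the combined reviews for each rating, whereas this sketch averages the per-movie scores instead. compound_by_user_rating is a hypothetical name.

```python
import pandas as pd


def compound_by_user_rating(features: pd.DataFrame) -> pd.Series:
    """Average mean_compound over movies that share the same user rating."""
    rating = features["movie_user_rating"].round(1)  # ratings move in 0.1 steps
    return features.groupby(rating)["mean_compound"].mean().sort_index()
```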

Prediction

Performed hyperparameter tuning with Optuna, using lightgbm.LGBMRegressor as the model (notebook).

import lightgbm as lgb
import numpy as np
import optuna
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

# X_train, y_train: training features and target (pandas objects) defined earlier
def objective(trial, data=X_train, target=y_train):
    param_grid = {
        "n_estimators": trial.suggest_categorical("n_estimators", [10000]),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3),
        "num_leaves": trial.suggest_int("num_leaves", 20, 2000, step=20),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
        "max_bin": trial.suggest_int("max_bin", 200, 300),
        "reg_alpha": trial.suggest_int("reg_alpha", 0, 100, step=5),
        "reg_lambda": trial.suggest_int("reg_lambda", 0, 100, step=5),
        "min_split_gain": trial.suggest_float("min_split_gain", 0, 15),
        # note: LightGBM only applies subsample (bagging) when subsample_freq > 0
        "subsample": trial.suggest_float("subsample", 0.2, 1.0, step=0.1),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.2, 1.0, step=0.1),
        "min_child_samples": trial.suggest_int("min_child_samples", 1, 500),
    }

    # 5-fold CV; the trial score is the mean validation MAE
    cv = KFold(n_splits=5, shuffle=True)
    cv_scores = np.empty(5)

    for idx, (train_idx, val_idx) in enumerate(cv.split(data, target)):
        X_t, X_v = data.iloc[train_idx], data.iloc[val_idx]
        y_t, y_v = target.iloc[train_idx], target.iloc[val_idx]

        model = lgb.LGBMRegressor(**param_grid)
        # n_estimators is large; early stopping on the validation fold limits the tree count
        model.fit(X_t, y_t, eval_set=[(X_v, y_v)], eval_metric='mape',
                  callbacks=[lgb.early_stopping(stopping_rounds=100),
                             lgb.log_evaluation(period=0)])
        preds = model.predict(X_v)
        cv_scores[idx] = mean_absolute_error(y_v, preds)

    return np.mean(cv_scores)


study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=300)

Hyperparameter tuning process

Feature importance

Feature	Importance
movie_year	102
movie_critic_rating	83
num_critic_pos	43
num_critic_neg	28
mean_pos	26
mean_neu	23
std_pos	21
mean_compound	18
…	…
Family	1
Unrated	1
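A table like the one above can be built from a fitted LGBMRegressor's feature_name_ and feature_importances_ attributes. The sketch below assumes the default importance_type='split', under which each value counts how many times the feature is used in a split. importance_table is a hypothetical helper; it takes the two arrays so it does not depend on a particular model object.

```python
import pandas as pd


def importance_table(names, importances) -> pd.DataFrame:
    """Feature/importance pairs sorted from most to least important."""
    return (pd.DataFrame({"Feature": list(names), "Importance": list(importances)})
              .sort_values("Importance", ascending=False, ignore_index=True))
```

Usage with the fitted model: `importance_table(model.feature_name_, model.feature_importances_)`.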

Predicted result

True	Predicted
8	6.63514
7.8	7.36846
6.7	6.92814
6.2	6.42496
…	…
4.9	5.79924
6.3	5.84634
6.3	7.45201

Mean absolute error when assuming the user rating equals the critic rating

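from sklearn.metrics import mean_absolute_error

# user: true user ratings (0 to 10); critic: critic ratings divided by 10 to match the user scale
print(mean_absolute_error(user, critic))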

>>> 1.368877016551435
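The MAE of about 1.37 implies that the critic ratings were put on the 0 to 10 user scale before comparison. The following is a dependency-free sketch of that baseline, assuming the rescaling is a division by 10. baseline_mae is a hypothetical name.

```python
def baseline_mae(user_ratings, critic_ratings):
    """MAE of predicting each user rating (0-10) as critic rating (0-100) / 10."""
    if len(user_ratings) != len(critic_ratings):
        raise ValueError("inputs must have the same length")
    errors = [abs(u - c / 10) for u, c in zip(user_ratings, critic_ratings)]
    return sum(errors) / len(errors)
```

On the five sample movies from the Movie table (user 7.4, 7.8, 8.1, 5.7, 6.2; critic 57, 74, 61, 31, 45), this gives 1.68.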

Mean absolute error for the test dataset

best_trial = study.best_params  # dict of the best hyperparameters found by Optuna
model = lgb.LGBMRegressor(**best_trial)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)],
          callbacks=[lgb.early_stopping(stopping_rounds=100),
                     lgb.log_evaluation(period=0)])
y_pred = model.predict(X_test)
print(mean_absolute_error(y_test, y_pred))

>>> 0.8530118478423075

With feature engineering and hyperparameter tuning, I was able to predict a movie's user rating from its critic reviews with a mean absolute error of 0.853. This is about a 37.69% improvement over assuming the user rating equals the critic rating.
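The 37.69% figure is the relative reduction in MAE, computed from the two printed values:

```python
baseline_err = 1.368877016551435  # MAE when user rating = critic rating / 10
model_err = 0.8530118478423075    # tuned LightGBM MAE on the test set
improvement = (baseline_err - model_err) / baseline_err * 100
print(f"{improvement:.2f}%")  # 37.69%
```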
