Movies, Ratings, And The World Economy

Norfolk Project
9 min read · Apr 12, 2022

The IMDb comprehensive dataset on Kaggle contains over 85,000 movies, along with several metrics including film ratings, budgets, locations, production years, and gross revenue. Using this dataset, along with other resources, we aim to represent the movie industry from an economic and historical perspective. Through the lens of the movie industry, certain important social phenomena can also be observed.

The primary trend of the movie industry that we analyzed was the total number of movies produced over time. First, our group graphed how many movies were released globally each year over the past 100 years to create a clear snapshot of the industry’s growth.
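As a rough illustration of this counting step, here is a minimal Python sketch. The sample rows and the field names "year" and "country" are assumptions made for the example; they are not taken from the article's actual code or confirmed against the dataset's schema.

```python
from collections import Counter

# Hypothetical sample rows, invented for illustration. The real dataset
# has one row per movie; the field names used here are assumptions.
rows = [
    {"title": "A", "year": "1994", "country": "USA"},
    {"title": "B", "year": "1994", "country": "Germany"},
    {"title": "C", "year": "1995", "country": "USA, Germany"},
    {"title": "D", "year": "TV Movie 2019", "country": "USA"},  # messy year field
    {"title": "E", "year": "", "country": "USA"},  # missing year
]

def releases_per_year(rows, country=None):
    """Count movies per release year, optionally restricted to one country."""
    counts = Counter()
    for row in rows:
        # Keep only the digits so that a messy value still yields a year.
        digits = "".join(ch for ch in row["year"] if ch.isdigit())
        if len(digits) != 4:
            continue  # skip rows without a usable four-digit year
        if country is not None:
            countries = [c.strip() for c in row["country"].split(",")]
            if country not in countries:
                continue
        counts[int(digits)] += 1
    return dict(sorted(counts.items()))

world = releases_per_year(rows)
usa = releases_per_year(rows, country="USA")
germany = releases_per_year(rows, country="Germany")
```

Plotting the resulting year-to-count mapping for the whole dataset, and again for a single country, would give graphs of the kind discussed in this article. A co-production is counted once for each country listed, which is one possible convention among several.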

From the first figure, there are two major trends that can clearly be observed: the first being linear growth from the early 1920s to the 1990s, and the second being exponential growth from the 1990s to the present. However, distinct outliers remain in 2019 and 2020, with both years representing significant declines. The cause of these outliers is likely a combination of political and economic factors, such as COVID-19 limitations.

Looking at the USA graph, one can observe a different trend from that of the global movie production. Movie production in the USA appears to increase far ahead of the global trend before entering a 20-year plateau. Later, the total number of movie releases declined in the 1940s and remained near that level until the early 1960s. Contrary to the world graph, the US starts on an independent major upward trend in the 1980s, whereas the world production totals begin their ascent 10 years later in the 1990s.

The Germany graph above contains a few important deviations from the global trends. Most apparent is that it is significantly more volatile, which makes major trends slightly harder to identify. Much like the movie industry on a global scale, Germany’s movie production also went through a large expansion that resulted in an explosion in the number of movies being produced. Germany’s movie industry has had 2 long-term expansions, where the industry made consecutive higher highs and higher lows, and 2 long-term contractions, where it made consecutive lower highs and lower lows. This contrast with the consistent global upward trend over the same time period further shows how the economic volatility in Germany affected different industries in the country. The effect on movie production also provides significant further insight into the internals of the German economy, as the movie industry is often affected differently than other industries depending on the economic circumstances. Additionally, it is noteworthy that the German industry’s decline from 1920 to 1950 is the opposite of the USA’s rapid expansion and plateau in the same period. Moreover, the German movie industry’s expansion from 1950 to 1970 can be contrasted with the USA’s stagnation and decline in the same time period.

After observing the various trends in the data, possible explanations of these different trends and events throughout history become evident. Starting in the 1920s, the main and practically the only significant player in the movie industry was the USA. The Roaring 20s were a booming period for the USA’s economy, and the entertainment industry was no exception. The Roaring 20s also brought the invention of sound movies, adding an entirely new sensation and excitement to film. Despite the Roaring 20s inevitably turning into the Great Depression, the number of movie releases stayed high as cinema became the most popular form of escapist media for a demoralized country. During this time period, Americans turned up en masse to watch new films, creating a booming market where other industries faltered. Next, we observed a dip in both the USA and the world graph during the 1940s. An easy assumption is that this dip was caused by WWII, and the data from that time period seems to support this conclusion.

The number of world movie and German movie releases dipped as soon as WWII started (around 1939), whereas the number of USA releases doesn’t start dipping until 1942 upon joining the war. Germany’s total movie releases also dipped during WWII, but this is just part of a downward trend continuing from the 1920s. Germany did experience a lot of economic and political turmoil post-WWI, which was probably the main cause for the movie release decline; WWII just accentuated that fact.

Also notable, but a blip in most graphs, was the creation of CGI in 1973. CGI was first used in a US movie called Westworld. This caused many more movies to begin using CGI, which increased the entertainment value of watching movies: all of a sudden, sci-fi and fiction weren’t theoretical; they were immersive. Eventually, however, the hype wore down, and other countries caught on, creating their own movies with CGI. Germany was likely affected more by this movie innovation since it was also catching up economically at this time. This combination of two positive effects caused Germany’s net positive gain to be larger.

Another major trend that all the graphs show is the spike in total movies around the 1990s. The most obvious cause of this major and all-encompassing boom was the introduction of the internet. This development, alongside the vast economic expansion brought about by many other advancements of the time period, promoted the widespread production of both lower and higher-budget movies. Furthermore, changes in mainstream film genres continued to provide viewers with a wide range of content that could capture a far wider audience. Entering the 2000s, another important trend benefited the film industry. In contrast to the broad consumer price index, the CPI for home television and computers drastically decreased, despite the United States notching a positive inflation rate for all years in the given time period except for one.

MacroTrends historical CPI

Referring back to the world graph, from the 1990s on, the number of movies per year as a share of total movies enters a steady, steep ascent. The main fact to notice here is that dips and spikes have basically disappeared, and it seems as though it is impossible to rain on the movie industry’s parade. Although major world events may still have temporarily interrupted the industry’s upward trend, the internet prevented any large effect on the production of movies.

However, 2020 proved to be a good example of the idea that no economic trend is set in stone. The Covid-19 pandemic was unlike previous adverse economic events after the introduction of the internet, both in its economic devastation and in the physical limitations it placed on movie production. Despite almost all activities being conducted through the internet in 2020, the internet was not the saving grace it had been in previous decades. For similar reasons, 2020 was also an incredibly economically unfavorable year for movie theaters, further contributing to the movie industry’s pain.

Another insightful metric in the data set was movie ratings. When yearly averages are graphed against time, we can observe a distinct linear decline since 1920. As of 2020, the average movie rating is roughly 5.5 out of 10, the lowest it has ever been. In 1920, by contrast, it was 6.3 out of 10, and it reached an all-time high of 6.8 out of 10 in 1924. This steady decrease in rating can be attributed to a multitude of factors; however, the trend does not necessarily indicate a decline in movie quality overall.
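To make the averaging and trend line concrete, here is a minimal Python sketch using ordinary least squares. The (year, rating) pairs are invented for illustration, and the fitting method is an assumption, since the article does not say how its trend line was computed.

```python
from collections import defaultdict

# Invented (year, rating) pairs for illustration only; not real IMDb data.
ratings = [
    (1920, 6.4), (1920, 6.2),
    (1970, 6.0), (1970, 5.8),
    (2020, 5.6), (2020, 5.4),
]

def mean_rating_by_year(pairs):
    """Average the ratings of all movies released in the same year."""
    buckets = defaultdict(list)
    for year, rating in pairs:
        buckets[year].append(rating)
    return {year: sum(vals) / len(vals) for year, vals in sorted(buckets.items())}

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

yearly = mean_rating_by_year(ratings)
slope, intercept = fit_line(list(yearly.keys()), list(yearly.values()))
```

A negative slope corresponds to a falling average rating; multiplying it by 10 gives the fitted change in rating points per decade.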

One of the possible main factors negatively affecting movie ratings is the sheer volume of movie production. This suggests that a mean reversion process is occurring, as the volume of movies increases while the quality returns to average. This also correlates to an increase in the number of votes being cast on the IMDb website, as newer movies tended to have more votes than older ones.

While movie production increased in an exponential fashion, the decreasing linear trend of movie ratings did not change. During the 1990s, the internet was introduced across the globe, allowing movies and other media to be created and distributed much more easily. This greater ease allowed smaller studios to produce movies effectively as well. These smaller studios were creating more movies with lower budgets, possibly further contributing to a lower average rating over time. This conclusion can be supported by comparing movie budgets to movie ratings within separate time periods. For the periods 1980–2000 and 2000–2020, shown below, the ratings form a distinct bell curve.

In both cases, it is evident that the majority of movies are clustered in the 4–8 score range, with lower-budget movies filling both the high and low rating tails, while movies with budgets towards the top of the range formed a similar shape with less significant tails. That is to say, higher-budget movies tended to be neither very poor nor exceptional, while low-budget movies occupied the full range of scores. The low-range tail thickened significantly between the two time periods, contributing to the fall in ratings. Taken together, these observations suggest that the general increase in lower-budget productions over time, which slowly fill out the lower rating range, contributes to the slow decrease in average rating.
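One way to quantify the "thicker tails" observation is to compare, for each budget group, the spread of ratings and the share of ratings falling outside the 4–8 range. The sketch below is a hypothetical illustration: the (budget, rating) pairs are invented, and the 10 million budget threshold is an arbitrary assumption, not a value used in the article.

```python
from statistics import pstdev

# Invented (budget in USD, rating) pairs for illustration only.
movies = [
    (100_000, 2.5), (200_000, 5.0), (500_000, 8.6), (800_000, 6.1),
    (50_000_000, 5.8), (80_000_000, 6.4), (120_000_000, 6.9), (200_000_000, 7.2),
]

def tail_share(ratings, low=4.0, high=8.0):
    """Fraction of ratings that fall outside the [low, high] range."""
    outside = sum(1 for r in ratings if r < low or r > high)
    return outside / len(ratings)

def summarize_by_budget(movies, threshold):
    """Split movies at a budget threshold and summarize each group's ratings."""
    groups = {"low": [], "high": []}
    for budget, rating in movies:
        groups["low" if budget < threshold else "high"].append(rating)
    return {
        name: {
            "count": len(vals),
            "spread": pstdev(vals),  # population standard deviation
            "tail_share": tail_share(vals),
        }
        for name, vals in groups.items()
        if vals
    }

# The threshold is an arbitrary assumption chosen for this example.
summary = summarize_by_budget(movies, threshold=10_000_000)
```

Running the same summary separately on 1980–2000 and 2000–2020 releases would show whether the low-budget tail share grows between the two periods.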

Clearly, ratings are going to stop declining at some point, so when will this be? If we graph a non-linear regression, we can see that the curve starts to level out.

While the fit of the non-linear regression line is slightly worse than that of the linear regression, it is a more realistic model. It can be assumed that the average rating will taper off at some point.
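The article does not specify which non-linear model was used. As a stand-in, the sketch below fits a logarithmic curve, rating = a + b * ln(year - 1919), by least squares and compares its R² with that of a straight line. The ratings are synthetic values invented to flatten over time, so the resulting numbers say nothing about the real dataset.

```python
import math

def least_squares(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination for a fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Synthetic decade averages, invented for illustration only.
years = list(range(1920, 2021, 10))
ratings = [6.30, 6.20, 6.10, 6.00, 5.90, 5.80, 5.70, 5.65, 5.60, 5.57, 5.55]

elapsed = [year - 1919 for year in years]    # 1, 11, ..., 101
log_elapsed = [math.log(t) for t in elapsed]  # transformed predictor

lin_slope, lin_intercept = least_squares(elapsed, ratings)
log_slope, log_intercept = least_squares(log_elapsed, ratings)

lin_r2 = r_squared(elapsed, ratings, lin_slope, lin_intercept)
log_r2 = r_squared(log_elapsed, ratings, log_slope, log_intercept)

def predict_log(year):
    """Rating predicted by the logarithmic model for a given year."""
    return log_intercept + log_slope * math.log(year - 1919)

early_drop = predict_log(1930) - predict_log(1920)
late_drop = predict_log(2030) - predict_log(2020)
```

Under the logarithmic model, the predicted drop per decade shrinks over time, which is the leveling-off behavior described above. A straight line, by contrast, would eventually predict ratings below the bottom of the scale.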

Since their inception, movies have dominated the entertainment industry globally.

The place movies hold in the ever-growing entertainment industry gives them a distinct economic function. As a result, they are affected differently by a broad range of economic factors, including short and long-term trends. From its past data and future predictions, it becomes clear that the movie industry provides a unique perspective and is one metric to keep an eye on.

The Kaggle dataset was created through a non-discerning web scraping process. This means that the dataset included missing values, different languages, un-sorted genres, and different currencies. Moreover, there were many inconsistencies with formatting in the dataset that needed to be accounted for in an automated and consistent manner. These concerns were addressed with a strict sorting and regular expression process without the loss of significant portions of data that would affect any trends or conclusions. Specifically, the loss of data for each relevant country accounted for a very small share of the total data for each metric, resulting in a significantly more accurate analysis.
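As an example of what such a regular expression step might look like, the sketch below parses budget strings into a currency code and an integer amount. The input format (a "$" or three-letter currency code followed by digits) is an assumption made for illustration, and `parse_budget` is a hypothetical helper, not the function used in the original analysis.

```python
import re

# Assumed format: a "$" sign or three-letter currency code, then an amount
# that may contain thousands separators, e.g. "$ 45000" or "DEM 2,000,000".
BUDGET_PATTERN = re.compile(
    r"^\s*(?P<currency>\$|[A-Z]{3})\s*(?P<amount>\d[\d,]*)\s*$"
)

def parse_budget(raw):
    """Return (currency, amount) or None when the value is missing or malformed."""
    if not raw:
        return None
    match = BUDGET_PATTERN.match(raw)
    if match is None:
        return None
    currency = "USD" if match.group("currency") == "$" else match.group("currency")
    amount = int(match.group("amount").replace(",", ""))
    return currency, amount

samples = ["$ 45000", "DEM 2,000,000", "", "unknown"]
parsed = [parse_budget(s) for s in samples]
```

Rows that come back as `None` can be counted per country before being dropped, which makes it easy to confirm that the discarded share stays small. Amounts in other currencies would still need conversion before any cross-country budget comparison.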

Written by: Beatrix Metral, Robert Rogers, Liam Sheldon, and Phillip Forman. Edited by: Diego Luca Gonzalez Gauss.

Note: this article was originally released on February 15, 2021.


Norfolk Project is a student research group using data science & abridged fields to try and solve real-world problems. Contact us at: contact@norfolkproject.com