Movie Success and Rating Prediction Using Data Mining

A large sum of money is invested annually in the production of several films. The primary objective of our study is to predict, using specific credits (both static and social/dynamic), if the film will be a true hit or a complete bust. There are a lot of set aspects that impact a movie's success or failure, such as the kind, budget, entertainers, chief, creator, creation home, delivery date, and so on. Looking at the film through the lens of online entertainment, we would search for dynamic hashtags that are now trending on Twitter. It is critical to find the attribute relationships and use an information mining calculation to get the result. To find out whether any movie is lucky or not, we apply all the information mining tools. This method is very helpful for those who build things. due to the fact that it gives them the opportunity to review films before to their release, which greatly influences their self-presentation and enhances their outcomes.


I. INTRODUCTION
Movies play a significant role in the entertainment industry's bottom line and are critically acclaimed works of art.Studio heads, producers, and distributors must, therefore, accurately predict a film's box office take and critical reception before to its release.The advent of data mining as a predictive analytics tool has made it possible to sift through mountains of film-related data in search of useful insights.
Data mining is the process of investigating large datasets for relevant patterns, correlations, and trends.Several factors impact a film's success and critical acclaim, and data mining tools may help investigate these factors.First, the project will provide professionals in the film industry with a tool to help them make decisions about production, marketing, and distribution.Second, data mining will get a boost as this project shows how advanced techniques can be used to predict how well a movie will do at the box office.More specific information on the data used for this project will be provided in the parts that follow.

II. RELATED WORKS
Acquiring subject-specific knowledge is the goal of artificial intelligence.The accumulation of knowledge include understanding of dark energy (74% of the cosmos), dark matter (22%), and visible matter (4%).Both biological and non-biological sensors, such as robots, televisions, mobile phones, cameras, microscopes, radar, computers, etc., may be used to collect this information [1].Tools that are increasingly complicated and advanced are needed to handle the ever-increasing amounts of data in contemporary research and industry.There is a continual need for innovative methods and tools to help us convert enormous data into valuable information and understanding, even while data mining technology has made massive data collecting much simpler.Significant progress has been achieved in the realm of data mining since the release of the last version [2].One of the most prominent entertainment industries is the film industry.Although the Nigerian film industry cranks out a plethora of films for the general people to enjoy, only a select few manage to become commercial successes.The advent of success prediction for movies has the potential to revolutionize the business by assisting filmmakers and producers in making more informed choices that ultimately benefit the bottom line.The authors of this paper suggest a methodology for predicting the box office performance of forthcoming films using data mining and machine learning algorithms, with the help of certain predetermined factors.In order to forecast a film's financial success or failure, several factors must be considered.These include the dataset required for data mining, which includes information about the film's writers, directors, actors, and actresses as well as the film's marketing and production budget, target demographic, theater location, film's release date, and the performances of competing films that share the same date.Before purchasing a movie ticket, this model also assists moviegoers in determining if an upcoming film is a blockbuster, hit, or has a high success rating.Internet Movie Database MetaData, after undergoing an initial cleansing and integration procedure, was subjected to data mining algorithms [3].Although networking research has long made use of machine learning and AI, much of this work has been on supervised learning.Unsupervised machine learning has become more popular as of late for enhancing network performance and offering services like traffic engineering, anomaly detection, Internet traffic categorization, and quality of service optimization utilizing unstructured raw network data [4].Lorene Wales's The Complete Guide to Film and Digital Production, now in its third edition with extensive updates and revisions, provides a thorough overview of the many jobs and responsibilities involved in making films and digital videos, from ideation and pre-production through distribution and marketing.With a variety of example checklists, timetables, accounting paperwork, and downloadable forms and templates, Lorene Wales provides a practical strategy that is ideal for projects of any budget and scope.She explains every step and crucial role/position in a film's existence and gives a hands-on approach.In addition, we cover the most recent production-related mobile applications, tax incentives, the DIT, set safety, and a more in-depth discussion of copyright, fair use, and other legal issues in a newly extended chapter.
Documents such as schedules, accounting paperwork, releases, and production checklists are available for download on the accompanying website, which also features video tutorials, a personnel hierarchy, a guide to mobile apps that can be helpful during production, and PowerPoints that instructors can use.[5].

III. METHODOLOGY USED
Random Forest Algorithm: This method builds decision trees using randomly selected dataset samples.Once all of the decision trees have had their outcomes voted on, the final forecast is created.
Naïve Bayer's Algorithm: Naive Bayes predicts the likelihood of different classes supporting different qualities using a same procedure.Text categorization is the typical use of this method.Using the comments on YouTube trailers and teasers, we trained this algorithm to forecast a film's projected rating.
Depending on the dataset and task, one approach may be more appropriate than the other; each has advantages and disadvantages.To find the best algorithm for a certain movie prediction job in terms of accuracy and precision, it's necessary to test and compare many algorithms.Predicting the likelihood of success or failure for future films is the overall goal of the created model, which employs data mining methods, suitable algorithm selection, and historical data.Nevertheless, it should be mentioned that there are still a lot of unknowns in the film business that might affect how accurate the projections are.
Classification techniques: Such technqiues to aid the created model in foretelling the future box office performance of films.The model was built using the correct machine learning approach utilizing 80% of the dataset's data.With the remaining data set, the model is trained.The proposed approach relies on classification methods to forecast the success of upcoming movies.

V. CONCLUSION
After much deliberation, we have come to the conclusion that it is not possible to forecast a new film's anticipated rating based on the comments left on trailers and teasers on YouTube prior to its actual release.The YouTube API pulls out more negative than good comments since the ones with the most remarks show up first.More negative than positive results appear in the model's output on a regular basis.This means that our model isn't always accurate.Although our tried-and-true method works, we can't foretell how a new picture will do in theaters before it opens.
We therefore came to the conclusion that it was possible to forecast a new film's box office performance prior to its debut by analyzing internet data and features.We limited the number of characteristics to 5 to make it easier to handle the large quantity of data and make sure that only the relevant information is utilized and understood by the user.We trained the algorithm to predict the ticket sales of upcoming movies.The model achieved an accuracy level of over 70% across all test sets.So, we concluded that our system is doing its job well and can predict a film's box office performance before its debut.

Fig 1 :Fig 2 :
Fig 1: Prediction Window Details such as the film's title, director, producer, actors, actresses, etc. are shown in the prediction box.These things have a role in determining the future movie's success or failure.In addition, it will provide the movie's rating (out of 10).