Spring 2023 - DSC450 Applied Data Science Projects
Project 1: Sentiment Analysis
Presented by Victoria Sukar with major contributions from Jeff Thomas and Jerock Kalala
For Project 1, we used Yelp and Amazon review data from Kaggle to analyze the sentiment of reviews and determine whether each review is positive, neutral, or negative overall. Our business problem was to determine whether restaurant customers and merchandise buyers are more likely to post a review after a positive experience or a negative one. Across the Amazon and Yelp data, we found that most people who post restaurant and merchandise reviews do so because they had a positive experience overall.
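To make the idea of labeling reviews and comparing label counts concrete, here is a minimal sketch. It assumes a simple word-count approach; the word lists, the `label_review` function, and the sample reviews are all hypothetical and are not the project's actual method or data.

```python
from collections import Counter

# Hypothetical mini-lexicons, for illustration only; not the project's word lists.
POSITIVE = {"great", "good", "love", "excellent", "friendly", "delicious"}
NEGATIVE = {"bad", "terrible", "slow", "rude", "broken", "awful"}


def label_review(text: str) -> str:
    """Label a review by counting positive versus negative lexicon words."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up sample reviews standing in for the Yelp and Amazon data.
reviews = [
    "Great food and friendly staff!",
    "The package arrived broken and support was rude.",
    "It was delivered on Tuesday.",
    "I love this blender, excellent value.",
]

labels = [label_review(r) for r in reviews]
distribution = Counter(labels)  # how many reviews fall under each label
print(distribution.most_common())
```

Comparing the counts in `distribution` is the same kind of comparison the business question asks for: whether positive reviews outnumber negative ones.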

Project 2: Predicting the best MLB Offensive Baseball Players with Machine Learning
Presented by Jeff Thomas with major contributions from Victoria Sukar and Jerock Kalala
For Project 2, we chose to predict the best MLB (Major League Baseball) offensive players using data from ESPN.com. We worked with four datasets in the form of CSV files, broken down by team statistics and player statistics for the 2021 and 2022 seasons. We also used MLB.com to add a feature called ‘POST_SEASON’ to our team CSV files.
We first looked at team stats from 2021, which helped us determine that Runs are an important part of a team’s winning percentage. We then used Runs as our target to train a model on the 2021 player dataset. Once the machine learning model reached an acceptable score, we used it to predict on the 2022 datasets. Overall, our model achieved a 90% accuracy score, and we were able to validate that our predictions for how many runs a player would score per game in 2022 appear to be accurate. Therefore, we feel that this model is successful in predicting some of the best offensive players in the game.
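The train-on-2021, predict-on-2022 workflow can be sketched as follows. This is only an illustration under stated assumptions: it uses a one-feature least-squares line, treats the score as R², and picks on-base percentage as the feature, none of which is confirmed as what the project used. All numbers are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical player data: on-base percentage and runs per game.
obp_2021 = [0.300, 0.320, 0.340, 0.360, 0.380]
runs_2021 = [0.40, 0.48, 0.55, 0.63, 0.70]
obp_2022 = [0.310, 0.330, 0.350, 0.370]
runs_2022 = [0.45, 0.51, 0.60, 0.66]

# Train on the 2021 season only.
slope, intercept = fit_line(obp_2021, runs_2021)

# Predict the unseen 2022 season and score the predictions against actual runs.
predicted_2022 = [slope * x + intercept for x in obp_2022]
score = r_squared(runs_2022, predicted_2022)
print(f"R^2 on 2022: {score:.2f}")
```

Scoring on a season the model never saw is what makes the validation meaningful: a high score on 2021 alone would only show that the model fits its own training data.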

Project 3: Predicting Walmart Store Sales using Machine Learning – Python
Presented by Jerock Kalala with major contributions from Jeff Thomas and Victoria Sukar
For Project 3, we chose to forecast Walmart sales using data from Kaggle.com. We worked with four different datasets covering the period from 2010-02-05 to 2013-07-26. We selected important features available in the datasets to predict sales. Our target variable was sales, given by the weekly sales average. After cleaning, transforming, and splitting our data into training and testing datasets, we chose an appropriate model and built our machine learning model. We were satisfied with its performance after training, so we used it on the testing dataset for prediction. Overall, our model achieved an accuracy score of 97% on the testing dataset. With this score, we felt that the model was good and predicted sales successfully.
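The split-then-evaluate step can be sketched as below. The project's actual split method, model, and metric are not stated here, so this assumes a time-ordered split, a naive mean baseline in place of a real model, and mean absolute percentage error (MAPE) as the error measure. The `chronological_split` and `mape` helpers and the sales figures are hypothetical; only the start date comes from the text above.

```python
from datetime import date, timedelta


def chronological_split(rows, test_fraction=0.25):
    """Sort (date, sales) rows by date and hold out the most recent weeks for testing."""
    ordered = sorted(rows, key=lambda r: r[0])
    cut = int(len(ordered) * (1 - test_fraction))
    return ordered[:cut], ordered[cut:]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100 * total / len(actual)


# Made-up weekly sales starting on the first date in the dataset period.
start = date(2010, 2, 5)
weekly_sales = [(start + timedelta(weeks=i), 20000.0 + 500.0 * i) for i in range(8)]

train, test = chronological_split(weekly_sales)

# Naive baseline: predict every test week with the mean of the training weeks.
baseline = sum(sales for _, sales in train) / len(train)
error = mape([sales for _, sales in test], [baseline] * len(test))
print(f"Baseline MAPE on held-out weeks: {error:.1f}%")
```

A baseline like this gives a reference point: a trained model is only useful to the extent that it beats the error of simply predicting the historical average.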

Please reach out anytime with questions, comments and feedback! I’d love to hear your thoughts!