Master of Science (MS), Bowling Green State University, 2017, Applied Statistics (Math)
Sabermetrics is the statistical analysis of baseball. Research of this kind began in the 1950s and has since become increasingly popular. Over the last few years, the availability of data within the sport of baseball has exploded, and, primarily from three sources, we now have access to a vast array of statistics. This research investigates the importance of the count and of the first pitch in baseball. The first pitch determines whether the hitter or the pitcher has the advantage in the at-bat and can set the tone for the rest of the at-bat.
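As a rough illustration of how the first pitch shifts the advantage, the sketch below updates the count after the first pitch and classifies a count as favoring the hitter, the pitcher, or neither. The abstract does not say which counts fall into each group, so the rule used here (more balls than strikes favors the hitter, more strikes than balls favors the pitcher, equal counts are neutral) is an assumption, as are the names `CountType`, `count_type`, and `count_after_first_pitch`.

```python
from enum import Enum


class CountType(Enum):
    HITTER = "hitter"    # count favors the hitter
    PITCHER = "pitcher"  # count favors the pitcher
    NEUTRAL = "neutral"  # neither side has the edge


def count_type(balls: int, strikes: int) -> CountType:
    """Classify a ball-strike count using an ASSUMED rule (not the thesis's stated one)."""
    if not (0 <= balls <= 3 and 0 <= strikes <= 2):
        raise ValueError(f"not a valid count: {balls}-{strikes}")
    if balls > strikes:
        return CountType.HITTER
    if strikes > balls:
        return CountType.PITCHER
    return CountType.NEUTRAL


def count_after_first_pitch(result: str) -> tuple[int, int]:
    """Return the (balls, strikes) count after the first pitch of an at-bat.

    Only outcomes that keep the at-bat alive are handled; a pitch put in
    play ends the at-bat and therefore has no following count.
    """
    if result == "ball":
        return (1, 0)
    # A called strike, a swinging strike, and a foul all add a strike on 0-0.
    if result in {"called_strike", "swinging_strike", "foul"}:
        return (0, 1)
    raise ValueError(f"first-pitch result has no following count: {result!r}")


print(count_type(*count_after_first_pitch("ball")))  # CountType.HITTER
print(count_type(*count_after_first_pitch("foul")))  # CountType.PITCHER
```

Under this assumed rule a full count (3-2) is classified as hitter-favored; a study could just as reasonably treat it as neutral, so the grouping should be matched to whatever definition the thesis itself uses.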
Exploratory methods, including tables, contour plots, scatterplots, and line graphs, are used to investigate and summarize the relationships among various variables. As a pitcher's first-pitch strike percentage increases, the number of innings pitched per game increases, Walks plus Hits per Inning Pitched (WHIP) decreases, walk percentage decreases, and strikeout percentage increases. Four-seam fastballs, two-seam fastballs, and sliders, which are all fast pitches, account for 64% of the first pitches thrown, and over 50% of first pitches are in the strike zone. Singles, doubles, triples, and home runs are more likely to be hit on the first pitch. Pitchers have their best pitching statistics when the hitter swings and misses at the first pitch, compared with when the first pitch is put in play, called a strike, or called a ball. When the first pitch is a ball, hitters have their best hitting statistics.
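The rate statistics named above can be made concrete with a short sketch. It uses the standard definition of WHIP as (walks + hits) / innings pitched and assumes that walk, strikeout, and first-pitch strike percentages are each divided by batters faced; the thesis may use a slightly different denominator. The class `PitcherLine` and its fields are hypothetical, and the numbers in the example are invented, not taken from the thesis data.

```python
from dataclasses import dataclass


@dataclass
class PitcherLine:
    """Totals for one pitcher (field names are illustrative, not from the thesis)."""
    outs_recorded: int
    hits: int
    walks: int
    strikeouts: int
    batters_faced: int
    first_pitch_strikes: int  # batters whose first pitch was a strike

    @property
    def innings_pitched(self) -> float:
        # Three outs make one inning, so "6.1" innings in box-score notation is 19 outs.
        return self.outs_recorded / 3

    @property
    def whip(self) -> float:
        # Walks plus Hits per Inning Pitched.
        if self.outs_recorded == 0:
            raise ValueError("WHIP is undefined with zero innings pitched")
        return (self.walks + self.hits) / self.innings_pitched

    @property
    def walk_pct(self) -> float:
        return self.walks / self.batters_faced

    @property
    def strikeout_pct(self) -> float:
        return self.strikeouts / self.batters_faced

    @property
    def first_pitch_strike_pct(self) -> float:
        return self.first_pitch_strikes / self.batters_faced


# Made-up line for illustration only: 6 innings, 5 hits, 1 walk, 6 strikeouts.
line = PitcherLine(outs_recorded=18, hits=5, walks=1, strikeouts=6,
                   batters_faced=24, first_pitch_strikes=15)
print(line.whip)                    # 1.0
print(line.strikeout_pct)           # 0.25
print(line.first_pitch_strike_pct)  # 0.625
```

Counting outs rather than storing innings as a decimal avoids the common mistake of treating box-score "6.1" innings as 6.1 rather than 6⅓.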
Generalized Additive Models (GAMs) and logistic regression models are used to identify the factors that are significant in predicting the probability that a hitter swings. Logistic models were fit first to all pitches and then to first pitches only, in each case for all players combined. Next, separate logistic models were fit for four individual players. In the majority of the models, the count type (whether the count favored the pitcher, favored the hitter, or was neutral), the distance in feet of the pitch from the center of the strike zone, and whether runners were on base were significant.
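The logistic models described above relate the log-odds of a swing to the significant predictors, roughly logit P(swing) = β0 + β_H·[hitter count] + β_P·[pitcher count] + β_d·distance + β_r·[runners on]. The sketch below shows how one such fitted model would turn a pitch into a predicted swing probability. It does not fit the model; the zone-center coordinates, the dummy coding with neutral counts as the baseline, the function names, and the coefficient values are all assumptions for illustration.

```python
import math

# ASSUMED zone center in plate coordinates measured in feet: x = 0 is the
# middle of home plate and z is height above the ground. The 2.5 ft height
# is a placeholder; the real center depends on each batter's zone.
ZONE_CENTER_X_FT = 0.0
ZONE_CENTER_Z_FT = 2.5


def distance_from_zone_center(px: float, pz: float) -> float:
    """Euclidean distance in feet from the pitch location to the assumed zone center."""
    return math.hypot(px - ZONE_CENTER_X_FT, pz - ZONE_CENTER_Z_FT)


def inverse_logit(eta: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def swing_probability(coef: dict[str, float], count_kind: str,
                      distance_ft: float, runners_on: bool) -> float:
    """Predicted P(swing) from a logistic model; neutral counts are the baseline level."""
    if count_kind not in {"hitter", "pitcher", "neutral"}:
        raise ValueError(f"unknown count type: {count_kind!r}")
    eta = (coef["intercept"]
           + coef["hitter_count"] * (count_kind == "hitter")
           + coef["pitcher_count"] * (count_kind == "pitcher")
           + coef["distance_ft"] * distance_ft
           + coef["runners_on"] * runners_on)
    return inverse_logit(eta)


# PLACEHOLDER coefficients chosen only so the example runs; neither their
# sizes nor their signs are estimates from the thesis.
coef = {"intercept": 1.0, "hitter_count": -0.3, "pitcher_count": 0.4,
        "distance_ft": -1.5, "runners_on": 0.1}
d = distance_from_zone_center(px=0.3, pz=2.9)  # about 0.5 ft from center
print(round(swing_probability(coef, "pitcher", d, runners_on=True), 3))  # 0.679
```

A GAM would replace the linear term `coef["distance_ft"] * distance_ft` with a smooth function of distance estimated from the data; in practice both kinds of model would be fit with a statistics package rather than by hand.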
Committee: James Albert (Advisor); Christopher Rump (Committee Member); John Chen (Committee Member)
Subjects: Statistics