Okay, here’s my blog post about my Corey Seager projections project. Hope it’s useful, or at least mildly entertaining!

Diving into Corey Seager Projections: My Baseball Data Adventure
Alright, so I’ve been messing around with baseball data again. This time, I wanted to take a crack at projecting Corey Seager’s performance for the upcoming season. I’m no professional statistician, just a guy who likes baseball and playing around with numbers, so keep that in mind.
First, I grabbed a bunch of historical data. Like, a ton of it. I downloaded all of Seager’s stats from the past few years – plate appearances, home runs, batting average, you name it. I mostly used Baseball-Reference and Fangraphs for this. They’re usually my go-to spots.
Then came the fun part: cleaning the data. Oh man, this always takes way longer than I expect. There were missing values, weird formatting issues, and just general inconsistencies. I used Python with Pandas to wrangle everything into shape. Lots of fillna()
and astype()
calls, believe me. I pretty much spent a whole evening just getting the data ready to go.
Next up, I decided on what stats I thought were most important. I focused on things like:
- Walk Rate
- Strikeout Rate
- ISO (Isolated Power)
- BABIP (Batting Average on Balls in Play)
These seemed like good indicators of his underlying skill and potential for future performance. I figured those would give me a better idea than just looking at his batting average.

Now, for the actual projection part! I started with a simple weighted average. I gave more weight to the most recent seasons, assuming they were more representative of his current abilities. I played around with different weighting schemes to see what looked the most reasonable. I used a bit of gut feeling too, I won’t lie.
Then, I tried something a bit fancier: a linear regression model. I threw the historical data into Scikit-learn and trained a model to predict his future performance based on the key stats I picked out earlier. I split the data into training and testing sets to get an idea of how well the model was performing. The R-squared wasn’t amazing, but it was good enough for a starting point.
After that, I looked at some publicly available projection systems, like ZiPS and Steamer, to see how my numbers compared. It’s always good to see what the “pros” are thinking. Turns out, my projections were pretty close to theirs, which was encouraging. Maybe I’m not totally clueless! There were some differences, though, and I tried to figure out why. Maybe I was being too optimistic about his BABIP, or maybe the public models were underestimating his power potential.
Finally, I combined everything into a single projection. I averaged my weighted average projection with the linear regression projection, and then adjusted it slightly based on what I learned from comparing it to the public models. I ended up with a range of potential outcomes for Seager’s upcoming season. It’s not perfect, of course, but it’s a decent starting point.

So, what did I learn? Well, I relearned that data cleaning is a pain. I also got a better understanding of how projection systems work, even the simple ones. And most importantly, I had fun! It’s a cool way to combine my love of baseball with my interest in data analysis. I’ll keep tweaking the models as the season goes on, and see how well my projections hold up. Probably not very well, but that’s part of the fun, right?
Anyway, that’s my little baseball data adventure. Hope you found it interesting! Now I’m going to go watch a game.