Okay, so today I decided to dive into something I’ve been curious about for a while: predicting sports outcomes. Specifically, I wanted to see if I could build something that would give me a prediction for Diego Schwartzman’s matches. I’m a big tennis fan, and Diego’s one of my favorite players to watch, so it felt like a fun place to start.

First, I needed data. Lots of it. I spent a good chunk of time just figuring out where to get reliable match stats. I looked at a few different sports websites, trying to find one with a good history of match data and, ideally, some kind of API I could use. I thought about just scraping the data myself, but, man, that seemed like a headache I didn’t want to deal with.
After some digging, I found a dataset covering his past matches. Great! A good start, I felt. This was going to be the foundation of everything. I downloaded all the available data into a CSV file, which is basically just a big spreadsheet saved as plain text. I opened it up in a text editor just to make sure everything looked okay, and yep, it was all there: opponents, scores, tournament names, the works.
Next up, I had to figure out how I was actually going to use this data. My initial thought was, “Okay, maybe I can just look at his win/loss record against specific opponents and use that to predict future matches.” So, I started playing around with the data in a spreadsheet program, trying to sort and filter things to get a sense of his head-to-head records.
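To make that first idea concrete, here's a minimal sketch of the head-to-head approach in Python rather than a spreadsheet. Everything specific in it is an assumption for illustration: the column names (`date`, `opponent`, `surface`, `result`), the placeholder opponents, and the made-up rows are not from the actual file I downloaded.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for the real CSV. The column names
# and values are assumptions, not the layout of the downloaded file.
SAMPLE_CSV = """date,opponent,surface,result
2020-01-10,Opponent A,hard,W
2020-02-14,Opponent B,clay,L
2020-03-02,Opponent A,clay,W
2020-04-20,Opponent A,hard,L
2020-05-05,Opponent B,clay,W
"""


def head_to_head(rows):
    """Return {opponent: (wins, losses)} from an iterable of match dicts."""
    record = defaultdict(lambda: [0, 0])
    for row in rows:
        # Index 0 counts wins, index 1 counts losses.
        idx = 0 if row["result"] == "W" else 1
        record[row["opponent"]][idx] += 1
    return {opp: tuple(wl) for opp, wl in record.items()}


def naive_win_probability(record, opponent):
    """Share of past matches won against this opponent, or None if never played."""
    wins, losses = record.get(opponent, (0, 0))
    total = wins + losses
    return wins / total if total else None


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
h2h = head_to_head(rows)
print(h2h)
print(naive_win_probability(h2h, "Opponent A"))
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be swapped for an opened file handle. Note that the estimate is just a raw win share, so it says nothing at all about an opponent he has never faced and very little about one he has faced only once or twice.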
I quickly realized that just looking at head-to-head records wasn’t going to be enough. There were way too many other factors at play. Was the match on clay, hard court, or grass? Was Diego playing at home, or was it an away game (so to speak)? Was he coming off a big win, or a tough loss? All these things seemed like they could have a big impact.
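Here's a rough sketch of how a couple of those factors could be pulled out of the match list: the surface, and whether the previous match was a win or a loss. Again, the column names and the sample rows are invented for illustration, and `add_previous_result` and `win_rate_by` are just hypothetical helper names, not part of any library.

```python
import csv
import io

# Made-up sample rows; column names and values are assumptions.
SAMPLE_CSV = """date,opponent,surface,result
2020-01-10,Opponent A,hard,W
2020-02-14,Opponent B,clay,L
2020-03-02,Opponent A,clay,W
2020-04-20,Opponent A,hard,L
2020-05-05,Opponent B,clay,W
"""


def add_previous_result(rows):
    """Return date-sorted copies of rows, each with a 'prev_result' field.

    The first match has prev_result None. ISO dates (YYYY-MM-DD) sort
    correctly as plain strings, which this relies on.
    """
    out = []
    prev = None
    for row in sorted(rows, key=lambda r: r["date"]):
        out.append({**row, "prev_result": prev})
        prev = row["result"]
    return out


def win_rate_by(rows, key):
    """Return {value of row[key]: share of matches won} for each group."""
    tally = {}
    for row in rows:
        wins, total = tally.get(row[key], (0, 0))
        tally[row[key]] = (wins + (row["result"] == "W"), total + 1)
    return {group: wins / total for group, (wins, total) in tally.items()}


matches = add_previous_result(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_surface = win_rate_by(matches, "surface")
by_momentum = win_rate_by(matches, "prev_result")
print(by_surface)
print(by_momentum)
```

Each breakdown looks at one factor at a time, which is exactly the limitation: as soon as you want surface and opponent and recent form together, the groups get tiny and the numbers stop meaning much.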
I tried to get fancy and fold all of this in with some simple calculations, but it was a mess. I mean, I could do the basic math, but I needed something that could handle a lot more complexity. I thought, “There’s gotta be a better way!”

I decided to set it aside for now. As a next step, I’ll try to find more factors to take into consideration.