Okay, so let me walk you through my experience with the 3M Open Leaderboard 2024. It was a bit of a rollercoaster, but hey, that’s what makes it fun, right?

Getting Started: First off, I heard about the competition through a friend who’s really into data science. I thought, “Why not give it a shot?” I mean, I’ve been playing around with machine learning for a while, so it seemed like a good opportunity to test my skills. I signed up on the platform and started digging into the problem statement and the data they provided.
Data Exploration: The initial dataset was pretty big, so I started by loading it into pandas (Python, you know) and just tried to get a feel for what was there. I looked at the columns, checked for missing values, and plotted some histograms to see the distributions. Nothing too fancy, just basic data exploration. I found some interesting patterns early on, like certain features having a strong correlation with the target variable.
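Just to make "basic data exploration" concrete, here's roughly the kind of thing I mean. The file name and the target column are placeholders for illustration, not the actual competition files:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the training data ("train.csv" is a placeholder for the competition download)
df = pd.read_csv("train.csv")

# Quick look at the shape, the column types, and where the missing values are
print(df.shape)
print(df.dtypes)
print(df.isnull().sum().sort_values(ascending=False).head(10))

# Histograms of the numeric columns to eyeball the distributions
df.select_dtypes(include="number").hist(bins=30, figsize=(12, 8))
plt.tight_layout()
plt.show()

# Correlation of numeric features with the target ("target" is a placeholder column name)
print(df.corr(numeric_only=True)["target"].sort_values(ascending=False).head(10))
```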
Feature Engineering: This part took a while. I tried a bunch of different things, combining features, creating new ones based on domain knowledge. I even tried some automated feature engineering techniques using libraries like Featuretools, but honestly, the hand-crafted features seemed to work better. I think that’s because I had to really think about what the data meant.
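To give a flavor of what "combining features" meant in practice, here's a toy sketch. The column names (feature_a, feature_b, category) are made up for illustration; the real features came from the competition data and some domain reasoning:

```python
import numpy as np
import pandas as pd


def add_handcrafted_features(df: pd.DataFrame) -> pd.DataFrame:
    """Toy examples of hand-crafted features; column names are placeholders."""
    out = df.copy()
    # Ratio of two related quantities (small epsilon avoids division by zero)
    out["a_over_b"] = out["feature_a"] / (out["feature_b"] + 1e-9)
    # Simple interaction term
    out["a_times_b"] = out["feature_a"] * out["feature_b"]
    # Log transform to tame a skewed distribution
    out["log_a"] = np.log1p(out["feature_a"].clip(lower=0))
    # Group-level feature: how far each row sits from its category mean
    out["a_minus_cat_mean"] = out["feature_a"] - out.groupby("category")["feature_a"].transform("mean")
    return out
```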
Model Selection: I started with some simple models like linear regression and decision trees to get a baseline. Then, I moved on to more complex ones like Random Forests and Gradient Boosting Machines (GBM). XGBoost ended up giving me the best performance, so I spent most of my time tuning it. I also experimented with stacking different models, but the improvement wasn’t significant enough to justify the extra complexity.
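For what it's worth, the baseline-to-XGBoost comparison looked something like the sketch below. I'm using synthetic data here and assuming a classification setup (which is why the linear baseline becomes logistic regression), so treat it as an illustration of the workflow rather than my actual script:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Synthetic data standing in for the real (engineered) feature matrix
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=6, random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "gbm": GradientBoostingClassifier(random_state=42),
    "xgboost": XGBClassifier(n_estimators=300, learning_rate=0.05, eval_metric="logloss", random_state=42),
}

# Same 5-fold CV for every model so the comparison is apples to apples
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name:20s} AUC = {scores.mean():.4f} +/- {scores.std():.4f}")
```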
Hyperparameter Tuning: This is where the real grind began. I used GridSearchCV and RandomizedSearchCV to search for the optimal hyperparameters for XGBoost. It was a time-consuming process, but it definitely paid off in terms of improved accuracy. I also learned a lot about how different hyperparameters affect the model’s performance.
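Here's roughly what the RandomizedSearchCV setup looked like. The parameter ranges are illustrative, not the exact search space I ended up with, and the data is synthetic again:

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=42)

# Illustrative search space; the ranges that matter depend on the data
param_distributions = {
    "n_estimators": randint(200, 1000),
    "max_depth": randint(3, 10),
    "learning_rate": uniform(0.01, 0.2),
    "subsample": uniform(0.6, 0.4),
    "colsample_bytree": uniform(0.6, 0.4),
}

search = RandomizedSearchCV(
    estimator=XGBClassifier(eval_metric="logloss", random_state=42),
    param_distributions=param_distributions,
    n_iter=50,          # number of random parameter combinations to try
    scoring="roc_auc",
    cv=5,
    n_jobs=-1,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_)
print(f"Best CV AUC: {search.best_score_:.4f}")
```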

Validation Strategy: I used k-fold cross-validation to evaluate my models. This helped me get a more reliable estimate of their performance on unseen data. I also made sure to use a stratified split to maintain the class distribution in each fold.
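Concretely, that's just StratifiedKFold from scikit-learn. A minimal sketch, once more with synthetic data standing in for the competition set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=42)

# Stratified splits keep the class ratio roughly the same in every fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for fold, (train_idx, valid_idx) in enumerate(skf.split(X, y)):
    model = XGBClassifier(n_estimators=300, learning_rate=0.05, eval_metric="logloss", random_state=42)
    model.fit(X[train_idx], y[train_idx])
    preds = model.predict_proba(X[valid_idx])[:, 1]
    score = roc_auc_score(y[valid_idx], preds)
    fold_scores.append(score)
    print(f"Fold {fold}: AUC = {score:.4f}")

print(f"Mean AUC: {np.mean(fold_scores):.4f} +/- {np.std(fold_scores):.4f}")
```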
Submitting and Iterating: The leaderboard was updated regularly, so I submitted my predictions and tracked my score. Based on the leaderboard feedback, I kept iterating on my model, trying new features, tuning hyperparameters, and debugging any issues. It was a continuous process of improvement.
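The submission itself was just a CSV of predictions in whatever layout the platform asked for. Something along these lines, with placeholder file and column names rather than the platform's actual format:

```python
import pandas as pd
from xgboost import XGBClassifier

# Placeholder file and column names; the real layout came from the competition rules
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

feature_cols = [c for c in train.columns if c not in ("id", "target")]

model = XGBClassifier(n_estimators=300, learning_rate=0.05, eval_metric="logloss", random_state=42)
model.fit(train[feature_cols], train["target"])

# Write predictions in an id/prediction layout (placeholder column names)
submission = pd.DataFrame({
    "id": test["id"],
    "prediction": model.predict_proba(test[feature_cols])[:, 1],
})
submission.to_csv("submission.csv", index=False)
```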
The Final Result: I didn’t win, obviously, but I did manage to get a pretty decent score. More importantly, I learned a ton about machine learning, data science, and the competition itself. I think the key takeaway is that it’s not just about having the best model; it’s also about understanding the data, engineering good features, and having a solid validation strategy.
Lessons Learned: I’d say the biggest lesson was the importance of careful feature engineering. Also, don’t be afraid to experiment and try new things. And most importantly, have fun! It’s just a competition, after all.