Building a Random Forest by Hand in Python
January 28, 2024 #machine-learning
From drug discovery to species classification, credit scoring to cybersecurity and more, the random forest is a popular and powerful algorithm for modeling our complex world. Its versatility and predictive prowess would seem to require cutting-edge complexity, but if we dig into what a random forest actually is, we see a shockingly simple set of repeating steps.




My first programming language was R. I fell in love with the nuance R granted for visualizing data, and how with a little practice it was straightforward to pull off complex statistical analyses. I coded in R throughout my Ph.D., but I needed to switch to Python for my first non-academic job. Picking up a second language went much faster than the first, but there was a lot to get used to when I transitioned.
Writing this in September 2017 after the new first-years have arrived on campus, I realize it’s now been four years since I started the PhD. Back in 2013, I had just finished a year in Germany on a
Year three. If an American PhD takes 5-6 years, then this is when you pass the halfway point. You’re now deep in the weeds. The big-picture science that originally got you into this PhD gets harder to remember. The questions you set off to answer years ago really need to stand up to the second-guessing that comes once you start dedicating hundreds of hours to answering them. It gets hard not to hear those quiet voices asking if there’s a better way to do things: is academia the path of most happiness for me, what’s consulting or industry like, am I actually doing interesting work? And of course, the eternal question of investing in automation versus doing the mind-numbing manual work! (See
The second year of the PhD felt like being a teenager. You’re no longer new to graduate school, and you’re starting to feel the pressure of having something to show for your time here. Within your second year, you go from a PhD student interested in a topic to a PhD candidate who understands the topic well enough to convince others it’s important. This transition has felt a bit like growing up: a loss of naivety and the addition of responsibilities, but also a legitimization of who you are as a researcher. This blog post describes my experiences during this year and what I’ve learned from them.
The first year of the PhD is over… I guess! It doesn’t really feel like your second year until the new first-years arrive in September, and work hasn’t suddenly stopped with the end of the semester, unlike in college. If anything, this summer is when I’ll actually make progress on the experimental ideas I’ve been developing since I first e-mailed my advisor two years ago. But enough time has passed that I think I can share some reflections on my first year that can hopefully help someone else starting or considering starting a PhD in biology.