Learning to Make Software, Part 2


In the first part of this series, I touched on how important mental models are in learning software development. In part 2, I’ll cover another habit that helps new developers learn the ropes more quickly: they apply curiosity relentlessly.

It’s natural to be uncomfortable with ambiguity and uncertainty. Spend a few minutes thinking about things you don’t know and don’t understand, and you start feeling anxious. As a result, a lot of people learn to compartmentalize and avoid thinking about topics that seem complicated or unknown. They learn to live with not knowing and focus on the part of the code, system, or organization they understand.

Spider the Edges of Your Knowledge Graph

The key to effective curiosity is becoming painfully aware of those unknowns. This is how you ask the right questions, look up the right documents, and attempt the right experiments.

For this, you need to not only apply your curiosity relentlessly, but also apply it in every direction. For any given topic, you can explore it in an infinite number of different ways. Think of a function in the standard library of your favourite programming language. How is it implemented under the hood? Why is it designed the way it is? What other similar functions exist? Could you replicate its behaviour with lower-level primitives?
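To make the last question concrete: here's a naive re-creation of Python's built-in sum using only a loop and the + operator. It's a sketch of the exercise, not a faithful replica (the real built-in has extra behaviour, like rejecting string arguments, that this ignores):

```python
# A naive re-implementation of the built-in sum(),
# built from lower-level primitives: a loop and +.
def my_sum(iterable, start=0):
    total = start
    for item in iterable:
        total = total + item
    return total

print(my_sum([1, 2, 3]))       # 6
print(my_sum([1.5, 2.5], 10))  # 14.0
```

Comparing an attempt like this against the real implementation is exactly the kind of question-driven exploration described above.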

Think of your knowledge as a graph. Each of the questions is an edge, and the things you know or don’t know are nodes. Start from something you know, and ask as many different questions as you can think of. For each answer, keep asking more questions until you reach the boundaries of your knowledge. This is sort of like a mind map, except that instead of mapping what you know, you’re specifically trying to find the places where your knowledge falls short.
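If you wanted to make the graph literal, it could be sketched as a small data structure: concepts as nodes, questions as edges, and None marking the places where knowledge falls short. The names below are purely illustrative:

```python
# A toy knowledge graph: nodes are concepts, edges are questions.
# An answer of None marks a gap -- a question worth chasing later.
knowledge = {
    "random forest": {
        "What is it made of?": "decision trees",
        "How are the trees combined?": "averaging",
    },
    "averaging": {
        "Weighted, or all trees counted equally?": None,
    },
}

def open_questions(graph):
    """Walk every edge and collect the questions with no known answer."""
    return [(node, question)
            for node, edges in graph.items()
            for question, answer in edges.items()
            if answer is None]

print(open_questions(knowledge))
```

The highlighted question marks in the mapping exercise below play the same role as the None entries here.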

As geographers […] crowd into the edges of their maps parts of the world which they do not know about, adding notes in the margin to the effect, that beyond this lies nothing but sandy deserts full of wild beasts, unapproachable bogs, Scythian ice, or a frozen sea

Plutarch, Parallel Lives (1st century)

The quickest learners I’ve met are constantly doing this, intuitively and subconsciously. But if it doesn’t come naturally to you, I recommend doing it on paper. It’s a more deliberate exercise but accomplishes the same thing, and over time you’ll get better at doing it in your head.

Knowledge Graphs in Practice

Let’s work through an example, drawn from my recent experience studying machine learning in Python. One commonly used algorithm for machine learning is random forests. Let’s start there:

What is a random forest? It’s an ensemble of several simpler decision-tree models.

What does ensemble mean? How does that work? The algorithm takes a bunch of randomly chosen decision trees and averages them.
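The bootstrap-and-average idea can be sketched in a few lines of plain Python. The "trees" below are deliberately trivial stand-ins (depth-zero stumps that just predict the mean label of their bootstrap resample), so this only illustrates the ensemble mechanics, not real tree learning:

```python
import random
from statistics import mean

def train_stump(data):
    # Bootstrap: resample the training data with replacement.
    sample = [random.choice(data) for _ in data]
    avg = mean(label for _, label in sample)
    return lambda x: avg  # ignores x; a real tree would split on it

def forest_predict(stumps, x):
    # Unweighted average across the ensemble.
    return mean(stump(x) for stump in stumps)

random.seed(0)
data = [(1, 0.0), (2, 0.0), (3, 1.0), (4, 1.0)]
stumps = [train_stump(data) for _ in range(100)]
print(forest_predict(stumps, 2.5))  # hovers around 0.5
```

Each individual stump is noisy (its bootstrap sample may be skewed), but the average over many of them is much more stable: that's the point of the ensemble.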

As soon as I write this down, a question pops into my head. Is the average weighted or are all the trees counted equally? I don’t know the answer to that question, so I jot it down and highlight it for later.

Now’s not the time to go off and find the answer, though. Instead, once you’ve reached the end of a branch of the graph, work your way back and see if you can come up with any other questions. In this case I can’t think of anything else, so I’ll start back at the centre.

I know that you can control the training of ML algorithms like random forests using hyperparameters. I know a few of the hyperparameters used for random forests, so I draw them out. For a few of them (like max_depth and min_samples_leaf), I understand what they’re doing, but for others my understanding is vague at best, so I highlight those. I’m also pretty sure there are more hyperparameters I don’t remember, so I draw in another circle with question marks.
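For reference, this is how those two hyperparameters look when training with Scikit-Learn. The specific values are arbitrary, chosen only for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A small synthetic dataset, just to have something to fit.
X, y = make_classification(n_samples=200, random_state=0)

model = RandomForestClassifier(
    n_estimators=50,     # number of trees in the forest
    max_depth=5,         # cap on how deep each tree may grow
    min_samples_leaf=3,  # minimum samples required at a leaf node
    random_state=0,
)
model.fit(X, y)
print(model.score(X, y))  # accuracy on the training data
```

Tuning values like these is usually the first practical encounter most people have with what hyperparameters actually do.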

The library I’ve been using for training random forests is Scikit-Learn. Are there others in Python? How about in other languages? I don’t know.

I know that when training random forests, you have to convert all of the features in your dataset to numbers. Why? I hadn’t really thought about it before. Maybe so that the data can be manipulated in numpy (which is implemented in C), making it much faster than if it had to be done in pure Python? This is an assumption, so I’ll highlight it as something I could confirm.
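Scikit-Learn provides encoders for this step (OrdinalEncoder, OneHotEncoder), but the core idea is just a mapping from categories to integers. A dependency-free sketch:

```python
# Ordinal encoding by hand: assign each distinct category an integer.
colors = ["red", "green", "blue", "green", "red"]

mapping = {cat: i for i, cat in enumerate(sorted(set(colors)))}
encoded = [mapping[c] for c in colors]

print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}
print(encoded)  # [2, 1, 0, 1, 2]
```

Once everything is an integer, the whole dataset can live in a single numeric array, which is exactly the representation fast numerical libraries work with.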

So far we’ve gone into the details of how random forests work, how they’re implemented, and how to use them. For the last branch I’ll go for breadth instead of depth. Random forests are just one ML algorithm based on decision trees. I’ve heard of at least one other: gradient boosted trees. I don’t know that much about boosted trees, so there’s another question mark.

The only real limit to how detailed one of these maps can be is how much time you have. For this example I’ve gathered a handful of questions. These range from ambiguities in my mental model and finer technical details, to avenues for learning more about related topics.

Reviewing The Questions

Here are all the questions I came up with:

  • Does the averaging of trees in a random forest use weights?
  • What are the min_samples_split and bootstrap hyperparameters and what do they do?
  • What other hyperparameters are available in Scikit-Learn for training random forests?
  • Are there other Python libraries for random forests?
  • Why do features need to be numeric?
  • What are gradient boosted trees and how do they work?

Now review the questions to decide which ones to follow up on. Here are some useful questions to ask: How much time and effort will it take to find the answer? Is it at a level of detail that’s useful to me right now? Does it contribute to my mental model, or is it just a detail I’d have to memorize? Let’s review the questions in that light:

  • Does the averaging of trees in a random forest use weights?
    • This is easy to answer with a quick Google search.
  • What are the min_samples_split and bootstrap hyperparameters and what do they do?
    • In some ways this is a low-level detail that I would only need if I was tuning those particular parameters. However, min_samples_split bugs me, because the way trees are split is fundamental to the algorithm, so not understanding it feels like a gap in my mental model. I’ll look it up and try to understand how it works.
  • What other hyperparameters are available in Scikit-Learn for training random forests?
    • I could look up the list of arguments, but it doesn’t feel likely that I’d remember them. Instead, next time I’m training a random forest model, I’ll try to pull in one new hyperparameter and understand it. I’ll talk more about this approach in Part 3 of this series.
  • Are there other Python libraries for random forests?
    • This is background knowledge related to the field in question, and could contribute to my general familiarity with machine learning. Maybe there are newer libraries with interesting features that I could experiment with. I’ll probably do a search and file away what I learn.
  • Why do features need to be numeric?
    • This is interesting, but it feels like a low-level implementation detail. I probably wouldn’t need to care unless I was implementing a random forest algorithm from scratch myself. For now, because I’m only aiming for a rough familiarity with machine learning and not in-depth expertise, I’ll leave this question aside.
  • What are gradient boosted trees and how do they work?
    • This is the question that will take the most time and effort to answer, but is a good candidate as a next topic of study for me. I won’t look it up right now, but next time I’ve got a block of time to research and study I might find a book chapter or tutorial and practice training a boosted tree model.

This entire process was quick (it took me about half an hour) and surfaced a lot of great avenues for learning.

In this case I applied the method to a theoretical concept, but it can be even more valuable if you apply it practically to aspects of your work. Try starting with a system in your team’s architecture (for example, the “recommendation service”) that’s adjacent to your work. Think about the domain model, implementation, and the services it interacts with, as well as how it fits into the business strategy and which teams know more about it. When reviewing your questions, think about the best way to answer them. Should you talk to an engineer who worked on a particular part of a system, or just read some source code?

Validating Understanding

A lot of the time, the most valuable questions aren’t the ones that will take you down a new learning path (like the last one in the list above), but the ones that can be answered easily with a Slack message to a colleague and will serve to round out your mental model or fix flawed assumptions.

This form of curiosity doesn’t require you to set aside time to study and learn in depth; it just requires you to make a habit of asking validating questions. Far too often I see developers make assumptions and never question them, leaving more senior team members to guess at what they’re missing.

Start from the premise that, no matter how experienced you are, a sizeable number of the things you think you know are wrong. Maybe not flat-out nonsense levels of wrong, but at least subtly incorrect in some way. Take every opportunity in conversations to validate your knowledge:

  • “Hey, so correct me if I’m wrong, but this is authenticated with OAuth2?”
  • “Ah, if I understand correctly, that doesn’t write directly to the database; there’s a queue, right?”
  • “Hey, I just wanted to clarify something about the network layout of the VPC…”

Believe me when I say you can’t do this too much. Even if the assumption you’re validating is correct 90% of the time, you’re helping to get the whole team on the same page, and potentially clarifying things for other team members who might be afraid to ask a question. Other times you’ll be mostly right but missing a small detail, so it’s a good opportunity to round out your mental model. Sometimes the person you’re asking doesn’t actually know the answer, and you get to find out together. It’s always a win.

By combining these approaches of discovering your blind spots and validating what you know, you’ll build up a pool of knowledge that isn’t limited to the purely technical hands-on skills of writing code. You’ll understand the organization, the architecture, the systems, and the theory.

What’s Next?

So far we’ve gone over building mental models to understand software systems, and both deepening and broadening your knowledge with active curiosity. In the final part I’ll talk about sharpening your tools and getting in the practice you need.
