There’s a funny anecdote, repurposed throughout the years, originally about the difference between Soviet and American military design during the Cold War. The anecdote relates how the Americans and Soviets agreed to end the Cold War once and for all to avoid nuclear war. Each country would have five years to produce the ultimate fighting dog, after which the Cold War would be settled with a dog fight.
The Soviets crossed fierce Rottweilers and Dobermans with Siberian wolves, using genetic engineering and competition for resources. Finally, a single ferocious fighting dog was produced, the likes of which the world had never seen. On the day of the fight, the Americans showed up with a 12-foot-long Dachshund, or wiener dog. Only seconds after the fight began, the Dachshund crushed the Soviet wolf-dog with a single bite and proceeded to consume it. Surprised, the Soviets explained how their best geneticists and behavioral scientists had worked five years to create this ultimate fighting dog, to which the Americans replied that their best plastic surgeons had spent the last five years turning an alligator into a Dachshund.
I find this story to be a good example of how the approach to a problem often matters more than how well that approach is implemented. The alligator did not win the fight because of the plastic surgeons' skill; it won because it was an alligator. Many data scientists become so focused on proving a hypothesis that they don’t consider whether they have the right hypothesis in the first place. They need to rethink their strategy before focusing on tactics. This is surprisingly common among many different types of data scientists.
Choose the Right Hypothesis
Data scientists who were previously software developers may be used to tasks coming with clear requirements and user stories, as opposed to brainstorming what the tasks should be in the first place. The tendency to get stuck on the wrong hypothesis is just as common among data scientists with strong academic backgrounds. In many research fields, hypotheses or problem definitions are set by the research community at large. In fact, a hypothesis often needs general acceptance by the research community before a proof of it is considered worth publishing. As a result, academics tend to spend the majority of their time thinking deeply about a few well-known hypotheses, as opposed to brainstorming new hypotheses outside traditional constraints.
But all is not lost. Below I discuss three important ways data scientists can be creative and think strategically first, then tactically. Some of these may work better for some people than for others, but I think they all show that particular actions can be taken to improve one’s approach to a problem.
1. Balance Tactics with Strategy
I refer to strategy as the high-level goals and motivations behind a solution with tactics referring to the techniques and technologies adopted by the solution. Strategy and tactics should be balanced, as a weak strategy or weak tactics may render their strong counterpart worthless.
Most data scientists tend to have more training in tactics than in strategy. I believe one of the differentiators between data scientists and more traditional roles is the ability to creatively connect data tactics with high-level strategy. If a data scientist views high-level strategy as outside their responsibility, then they are probably filling a more traditional role (e.g., analyst, software developer, statistician). Data scientists should intentionally devote up-front time to brainstorming and return to this activity throughout analysis and implementation. Most of the valuable solutions I’ve developed have required creativity and data expertise to coincide during the brainstorming process.
2. Rest and Reflect
Thinking strategically requires creativity! Joseph Brodsky likens creativity to a grain of sand being swept away with the tide: the grain’s proximity to the ocean increases its chance of being swept away, but it has no direct control over the process. Data scientists should learn from artists, whose careers depend on creativity, and practice disciplines that encourage creativity. A discipline can be defined as a habit or repeated action performed to indirectly achieve some end one is unable to directly affect. For instance, if an athlete wants to be stronger, they must adopt the discipline of exercise. While they cannot directly will their body to become stronger, they can control whether they regularly exercise, which will tend to strengthen their body. In the case of creativity, I have found rest and reflection to be the most effective disciplines.
In the first two years of undergrad I forced myself to sleep no more than four hours a night. My body adapted and I was able to devote long days to finishing assignments. I found this worked well for tactics (or implementation) but not strategy. When I went to grad school I found it difficult to come up with creative ideas unless I was regularly sleeping as much as my body wanted. I couldn’t get as much work done by sleeping in, but I found my brainstorming was infinitely more productive when I was sleeping well. Some of my best ideas may have come in the middle of a late night, but such nights were usually preceded by some sort of rest.
Rest was only one piece of the picture; the other is reflection. I suspect half my published projects would never have begun had I not sold my car and started riding the bus. Without knowing it, I forced myself to periodically stop working, interacting, or surfing the internet, and simply wait and think. Riding the bus is probably still to credit for over half my good ideas, and while I’m not suggesting every data scientist ride the bus, they should make a habit of periodically devoting time to reflection, allowing creativity to do its best work.
3. Talk to People Involved With Your Problem
This may seem obvious, but over-the-wall engineering doesn’t work for data science either. Data scientists should interact tightly with all people related to a problem, or problem space, as opposed to adopting a contract (defining the problem), independently working on it, and lobbing it over the wall. This doesn’t only mean talking to “domain experts.” Data scientists should make a point of talking with support staff, outside clients, CEOs, and janitors. The value I’ve seen from these interactions comes in many forms, but the most common is identifying false assumptions and existing solutions. Not surprisingly, the data scientist is usually new to the problem space and the domain experts tend to carry many unstated assumptions, a combination that leaves a lot of missing pieces. The data scientist should take it upon themselves to “turn over rocks,” poking their nose anywhere and everywhere that might relate to the problem they’re working on.
Put it All Together
Broad interaction, combined with intentional rest and reflection, will encourage a “strategy-first” approach to data science. The data scientist will be forced to consider and evaluate multiple solutions from many different angles. Reflecting on this process will often result in synthesizing completely new approaches to the problem that borrow from both the past experience of others and the fresh perspective of the data scientist. Whether these suggestions result in a brand new approach to the problem or simply in choosing the best of many existing solutions, they will force the data scientist to take a step back and strategize before they begin thinking about tactics and implementation.
For more of my thoughts on data science, in particular, how to help put better science into data science, read my blog post, “Data Scientists, Beware Your Own Arrogance.”