Don’t Cross the Beams: Avoiding Interference Between Horizontal and Vertical Refactorings

As many of my pair programming partners could tell you, I have the annoying habit of saying “Stop thinking” during refactoring. I’ve always known this isn’t exactly what I meant, because I can’t mean it literally, but I’ve never had a better explanation of what I meant until now. So, apologies y’all, here’s what I wish I had said.

One of the challenges of refactoring is succession–how to slice the work of a refactoring into safe steps and how to order those steps. The two factors complicating succession in refactoring are efficiency and uncertainty. When working in safe steps, it’s imperative to take those steps as quickly as possible to achieve overall efficiency. At the same time, refactorings are frequently uncertain–“I think I can move this field over there, but I’m not sure”–and going down a dead end at high speed is not actually efficient.

Inexperienced responsive designers can get in a state where they try to move quickly on refactorings that are unlikely to work out, get burned, then move slowly and cautiously on refactorings that are sure to pay off. Sometimes they will make real progress, but go try a risky refactoring before reaching a stable-but-incomplete state. Thinking of refactorings as horizontal and vertical is a heuristic for turning this situation around–eliminating risk quickly and exploiting proven opportunities efficiently.

The other day I was in the middle of a big refactoring when I recognized the difference between horizontal and vertical refactorings and realized that the code we were working on would make a good example (good examples are by far the hardest part of explaining design). The code in question selected a subset of menu items for inclusion in a user interface. The original code was ten if statements in a row. Some of the conditions were similar, but none were identical. Our first step was to extract 10 Choice objects, each of which had an isValid method and a widget method.


if (...choice 1 valid...) {
if (...choice 2 valid...) {


$choices = array(new Choice1(), new Choice2(), ...);
foreach ($choices as $each)
  if ($each->isValid())

After we had done this, we noticed that the isValid methods had feature envy. Each of them extracted data from an A and a B and used that data to determine whether the choice would be added.

Choice pulls data from A and B

Choice1 isValid() {
  $data1 = $this->a->data1;
  $data2 = $this->a->data2;
  $data3 = $this->a->b->data3;
  $data4 = $this->a->b->data4;
  return ...some expression of data1-4...;
}

We wanted to move the logic to the data.

Choice calls A which calls B

Choice1 isValid() {
  return $this->a->isChoice1Valid();
}
A isChoice1Valid() {
  return ...some expression of data1-2 && $this->b->isChoice1Valid();
}
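
To make the target shape concrete, here is a minimal runnable sketch of the delegation chain, assuming PHP 8.0 or later. The data fields, the validity rule (all values positive), the widget identifier, and the validWidgets helper are all invented for illustration; the real conditions and the body of the original loop are elided in the snippets above.

```php
<?php
// Hypothetical stand-ins for A and B. The fields and the "all positive"
// rule are assumptions made only so the sketch can run.
class B
{
    public function __construct(private int $data3, private int $data4) {}

    // B answers the part of the question that needs only B's data.
    public function isChoice1Valid(): bool
    {
        return $this->data3 > 0 && $this->data4 > 0;
    }
}

class A
{
    public function __construct(
        private int $data1,
        private int $data2,
        private B $b
    ) {}

    // A combines its own data with B's answer instead of reaching into B.
    public function isChoice1Valid(): bool
    {
        return $this->data1 > 0 && $this->data2 > 0 && $this->b->isChoice1Valid();
    }
}

class Choice1
{
    public function __construct(private A $a) {}

    // The Choice no longer pulls data out of A and B; it simply delegates.
    public function isValid(): bool
    {
        return $this->a->isChoice1Valid();
    }

    // Hypothetical widget identifier.
    public function widget(): string
    {
        return 'choice-1';
    }
}

// Assumed loop body: collect the widget of every valid choice.
function validWidgets(array $choices): array
{
    $widgets = [];
    foreach ($choices as $each) {
        if ($each->isValid()) {
            $widgets[] = $each->widget();
        }
    }
    return $widgets;
}

$valid = new Choice1(new A(1, 2, new B(3, 4)));
$invalid = new Choice1(new A(1, 2, new B(0, 4)));
print_r(validWidgets([$valid, $invalid]));
```

Only the first choice survives the filter, because the second one's B holds a non-positive value. The point of the shape is that each class answers only with data it owns.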


Which Choice should we work on first? Should we move logic to A first and then B, or B first and then A? How much do we work on one Choice before moving to the next? What about other refactoring opportunities we see as we go along? These are the kinds of succession questions that make refactoring an art.

Since we only suspected that it would be possible to move the isValid methods to A, it didn’t matter much which Choice we started with. The first question to answer was, “Can we move logic to A?” We picked Choice1. The refactoring worked, so we had code that looked like:

Choice calls A which gets data from B

A isChoice1Valid() {
  $data3 = $this->b->data3;
  $data4 = $this->b->data4;
  return ...some expression of data1-4...;
}

Again we had a succession decision. Do we move part of the logic along to B or do we go on to the next Choice? I pushed for a change of direction, to go on to the next Choice. I had a couple of reasons:

  • The code was already clearly cleaner and I wanted to realize that value if possible by refactoring all of the Choices.
  • One of the other Choices might still be a problem, and the further we went with our current line of refactoring, the more time we would waste if we hit a dead end and had to backtrack.

The first refactoring (move a method to A) is a vertical refactoring. I think of it as moving a method or field up or down the call stack, hence the “vertical” tag. The phase of refactoring where we repeat our success with a bunch of siblings is horizontal, by contrast, because there is no clear ordering between, in our case, the different Choices.

Because we knew that moving the method into A could work, while we were refactoring the other Choices we paid attention to optimization. We tried to come up with creative ways to accomplish the same refactoring safely, but with fewer steps by composing various smaller refactorings in different ways. By putting our heads down and getting through the other nine Choices, we got them done quickly and validated that none of them contained hidden complexities that would invalidate our plan.

Doing the same thing ten times in a row is boring. Halfway through, my partner started getting good ideas about how to move some of the functionality to B. That’s when I told him to stop thinking. I didn’t actually want him to stop thinking, I just wanted him to stay focused on what we were doing. There’s no sense pounding a piton in halfway then stopping because you see where you want to pound the next one in.

As it turned out, by the time we were done moving logic to A, we were tired enough that resting was our most productive activity. However, we had code in a consistent state (all the implementations of isValid simply delegated to A) and we knew exactly what we wanted to do next.


Not all refactorings require horizontal phases. If you have one big ugly method, you create a Method Object for it, and break the method into tidy shiny pieces, you may be working vertically the whole time. However, when you have multiple callers to refactor or multiple implementors to refactor, it’s time to begin paying attention to going back and forth between vertical and horizontal, keeping the two separate, and staying aware of how deep to push the vertical refactorings.

Keeping an index card next to my computer helps me stay focused. When I see the opportunity for a vertical refactoring in the midst of a horizontal phase (or vice versa) I jot the idea down on the card and get back to what I was doing. This allows me to efficiently finish one job before moving on to the next, while at the same time not losing any good ideas. At its best, this process feels like meditation, where you stay aware of your breath and don’t get caught in the spiral of your own thoughts.

My Ideal Job Description

September 2014

To Whom It May Concern,

I am writing this letter of recommendation on behalf of Kent Beck. He has been here for three years in a complicated role and we have been satisfied with his performance, so I will take a moment to describe what he has done and what he has done for us.

The basic constraint we faced three years ago was that exploding business opportunities demanded more engineering capacity than we could easily provide through hiring. We brought Kent on board with the premise that he would help our existing and new engineers be more effective as a team. He has enhanced our ability to grow and prosper while hiring at a sane pace.

Kent began by working on product features. This established credibility with the engineers and gave him a solid understanding of our codebase. He wasn’t able to work independently on our most complicated code, but he found small features that contributed and worked with teams on bigger features. He has continued working on features off and on the whole time he has been here.

Over time he shifted much of his programming to tool building. The tools he started have become an integral part of how we work. We also grew comfortable moving him to “hot spot” teams that had performance, reliability, or teamwork problems. He was generally successful at helping these teams get back on track.

At first we weren’t sure about his work-from-home policy. In the end it clearly kept him from getting as much done as he would have had he been on site every day, but it wasn’t an insurmountable problem. He visited HQ frequently enough to maintain key relationships and meet new engineers.

When he asked that research & publication on software design be part of his official duties, we were frankly skeptical. His research has turned into one of the most valuable of his activities. Our engineers have had early access to revolutionary design ideas and design-savvy recruits have been attracted by our public sponsorship of Kent’s blog, video series, and recently-published book. His research also drove much of the tool building I mentioned earlier.

Kent is not always the easiest employee to manage. His short attention span means that sometimes you will need to remind him to finish tasks. If he suddenly stops communicating, he has almost certainly gone down a rat hole and would benefit from a firm reminder to stay connected with the goals of the company. His compensation didn’t really fit into our existing structure, but he was flexible about making that part of the relationship work.

The biggest impact of Kent’s presence has been his personal relationships with individual engineers. Kent has spent thousands of hours pair programming remotely. Engineers he pairs with regularly show a marked improvement in programming skill, engineering intuition, and sometimes interpersonal skills. I am a good example. I came here full of ideas and energy but frustrated that no one would listen to me. From working with Kent I learned leadership skills, patience, and empathy, culminating in my recent promotion to director of development.

I understand Kent’s desire to move on, and I wish him well. If you are building an engineering culture focused on skill, responsibility and accountability, I recommend that you consider him for a position.



I used the above as an exercise to help me understand the connection between what I would like to do and what others might see as valuable. My needs are:

  • Predictability. After 15 years as a consultant, I am willing to trade some freedom for a more predictable employer and income. I don’t mind (actually I prefer) that the work itself be varied, but the stress of variability has been amplified by having two kids in college at the same time (& for several more years).
  • Belonging. I have really appreciated feeling part of a team for the last eight months & didn’t know how much I missed it as a consultant.
  • Purpose. I’ve been working since I was 18 to improve the work of programmers, but I also crave a larger sense of purpose. I’d like to be able to answer the question, “Improved programming toward what social goal?”

A Few Tips for Using Saros for Remote Pairing

Saros (pronounced “zar-ose”, btw) is a set of extensions to Eclipse to support real-time collaboration. It is a research prototype at the moment, and as such has some rough edges. In 15 or 20 years, most programs will be written through real-time collaboration, so for me it’s worth a bit of pain today to experience the future.

This post describes what I learned from the Saros team about what is required to get started successfully. Unfortunately all the lessons are in the form of arcane magic (that is, I can’t explain to you why they work), but fortunately they do work.

Use Jabber.Org

Initially David Saff and I tried to use another Jabber server. For mysterious reasons, that doesn’t work. Sign up for an account at jabber.org instead.

Preload Projects

When one person shares a project, the other people in the session theoretically don’t need to have the project. However, Saros is slow at transferring whole projects. Instead, make sure that you have both checked out and imported the same version of the project before you start. Then, when you accept the invitation to share, make sure you select “Share existing project” instead of “Create new project”.

HTTP Proxy Magic

If you are using Mac OS X, you need to trick Eclipse into not using HTTP proxies. Select Eclipse>>Preferences>>Network Connections. Change from “Active Provider>>Direct” to one of the other settings. Press “Apply”. Select “Active Provider>>Direct”. Press “OK”.

You only need to do this once per Eclipse installation. Don’t you just love computers?


Thanks to Lutz Prechelt, Karl Beecher, and Björn Kahlert for their help.

TDD is Kanban for Code

The other night Cynthia and I were having drinks in the Tower Bar of the Hotel Hafen in Hamburg (highly recommended for the view if not the service) with Henning Wolf and Arne Roock of it-agile when I casually mentioned that test-driven development was kanban for code. Arne teaches kanban but the connection wasn’t obvious to him, so I sketched my idea (see napkin above). He seemed to understand (he kept nodding, anyway), but I thought it prudent to follow up with a post to make sure I’d thought the whole thing through. Arne, this one’s for you.


The goal of kanban is to increase the value of a production process. Kanban increases the feedback in production by limiting the amount of work in progress at any one time. Without the “safety” of inventory, troublesome process steps and connections between steps have nowhere to hide. With better feedback, the steps can be optimized (or eliminated) and their connections improved. The result is higher throughput and lower latency and variance.

Kanban works by only producing on the basis of demonstrated need. While one finished product is being assembled, upstream steps are producing the parts for the next finished product, timed to arrive just as they are about to be used. When a kanban system is humming, it produces a steady stream of finished products with a short gap between the receipt of an order and its fulfillment. When it isn’t humming, the source of the discord is likely to be clear.

Test-driven Development

Test-driven development (TDD) is an alternative programming workflow where features are divided into a series of automated tests. The tests are chosen so that satisfying the spirit of all the tests implies that the feature as a whole works correctly. The cycle goes like this:

  1. Write the next test (while several tests may be outlined, only one is written at a time)
  2. In pessimistic languages, add enough stubs so the test compiles. It should fail when run.
  3. Refactor, if necessary, to prepare for the implementation.
  4. Change the logic of the system so the test passes.
  5. Refactor to eliminate duplication and other excess complexity

Repeat until no more of the tests implied by the feature would fail. Then the feature is complete.

The Analogy

If TDD is kanban for code:

  • What is the product?
  • What is the kanban card?
  • What are the production steps?
  • How does demand flow backward and products flow forward?
  • What are the feedback loops?

The product of software development is two-fold:

  • The behavior of the system
  • The options for extending and modifying that behavior in the future

As difficult as it can be to precisely specify and validate the behavior of the system, it is even harder to measure a system’s option value. Finding the right balance between these two goals is one of the big challenges of development, especially as different phases of the business cycle require different proportions. I’ll describe in a moment how TDD addresses this balance, both in theory and in practice.

The tests are the kanban cards in this system. Each one is a request for a change of behavior. “Please change the system so if the customer is a smoker the premium is 12% higher”. This will require changes to the logic of the system. It may also require changes to the structure of the system so the change to the logic is easier to make correctly and efficiently.
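
As an illustration of a test acting as the kanban card, here is a minimal sketch built on the smoker example. The function name premiumFor, its signature, and the hand-rolled check helper are assumptions for illustration; a real project would more likely use a test framework. The tests are written first and fail until the logic they request exists.

```php
<?php
// Hypothetical function under test. The 12% surcharge is the behavior
// change requested by the first test below.
function premiumFor(float $basePremium, bool $isSmoker): float
{
    return $isSmoker ? $basePremium * 1.12 : $basePremium;
}

// Tiny assertion helper so the sketch needs no test framework.
// Floats are compared with a tolerance rather than exact equality.
function check(float $expected, float $actual, string $label): void
{
    if (abs($expected - $actual) > 0.0001) {
        throw new RuntimeException("$label: expected $expected, got $actual");
    }
}

// The "kanban card": a request for one change of behavior.
function testSmokerPremiumIsTwelvePercentHigher(): void
{
    check(112.0, premiumFor(100.0, true), 'smoker premium');
}

// A previously passing test guards against collateral damage.
function testNonSmokerPremiumIsUnchanged(): void
{
    check(100.0, premiumFor(100.0, false), 'non-smoker premium');
}

testSmokerPremiumIsTwelvePercentHigher();
testNonSmokerPremiumIsUnchanged();
echo "all tests passed\n";
```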

The post-success refactoring commonly emphasized in descriptions of TDD isn’t explicitly called for by the tests. It is work done for future benefit. If the future happens to be the next test and implementing it is easy because of cleanup done after the last test was satisfied, then the payoff is immediate. From the kanban perspective, though, post-success refactoring is over-production, the worst of the seven wastes of the Toyota Production System. Still, most TDDers, myself included, clean up after a test is passing as a matter of habit.

The behavior and options are the product, the test is the kanban card, so that makes the production steps the changes to the logic and the changes to the structure of the system. In TDD, these changes are not begun until a test is failing. The changes flow forward into the product, as demonstrated by the passing test, and the lack of collateral damage is demonstrated when all previous tests also pass.

Feedback loops fuel kanban. In TDD, the programmer gets feedback in seconds about whether the logic implied by the test is the logic he writes. He also gets feedback about whether the design supports the logic changes based on how hard or easy those changes are to get right.


This isn’t the whole story of development. Post-success refactoring doesn’t fit into this picture but is a common practice. Is this just because of our still-evolving understanding of TDD, or is it more like the preventative maintenance needed to keep kanban systems running smoothly? Should some or all of it be deferred until immediately before it is needed? What about other valuable software development activities like automation, tool building, and exploration?

We could also pull back and look at development at a larger scale and see the features as kanban cards, each one “pulling” a collection of tests into the system, triggering coding. From even further back, a process like Customer Development can be seen as “pulling” demand for features.

In any case, TDD is kanban for code:

  • Product = behavior and options
  • Kanban card = test
  • Production step = coding and refactoring
  • Feedback = effort and test results

Arne, how does that work for you?

Decisions that go into implementing Stack

As part of a recent advanced TDD course, we took a careful look at a simple stack implementation TDD-style. Here are the decisions that went into designing and implementing the stack. First, the specification decisions:

  • Stack is an object
  • Name is “Stack”
  • There is an operation to add an element
  • It is called “push”
  • It takes a parameter
  • The type of the parameter is the same as the type of the stack
  • Stack has a type parameter
  • There is an operation to remove elements
  • Its name is “pop”
  • Its return value is the same as the type of the stack
  • Elements are ordered LIFO

Here are the implementation decisions:

  • Store the elements in a List
  • Type of the list is the same as the type of the stack
  • Implementation type is ArrayList
  • Add/remove elements at the beginning of the list

Here’s the exercise: start with any decision above & TDD, then another, and another. Pay attention to how frequently you can reach a green test. Pay attention to which sequences of decisions actually make sense.

What we found was that of the 15! permutations of decisions, many of them worked just fine and could be used for different purposes.
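
For reference, here is one possible end state of the exercise, sketched in PHP 8.0 or later as an approximation. PHP has no enforced type parameters and no ArrayList, so the type parameter appears only as a documentation annotation and a plain PHP array stands in for the list. Throwing on an empty pop is an extra decision not in the lists above.

```php
<?php
/**
 * @template T  Documentation only; PHP does not enforce type parameters.
 */
final class Stack
{
    /** @var array<int, T> Elements, newest first. */
    private array $elements = [];

    /**
     * Adds an element at the beginning of the list.
     * @param T $element
     */
    public function push(mixed $element): void
    {
        array_unshift($this->elements, $element);
    }

    /**
     * Removes and returns the element at the beginning of the list (LIFO).
     * @return T
     */
    public function pop(): mixed
    {
        // Assumption: popping an empty stack is treated as an error.
        if ($this->elements === []) {
            throw new UnderflowException('pop on an empty stack');
        }
        return array_shift($this->elements);
    }
}

$stack = new Stack();
$stack->push('a');
$stack->push('b');
echo $stack->pop(), "\n"; // the most recently pushed element comes out first
echo $stack->pop(), "\n";
```

Adding and removing at the beginning of the list follows the last implementation decision. Working at the end of the array would be cheaper in PHP, and that trade-off is exactly the kind of decision the exercise asks you to notice.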

Minimum Viable Product revisited

I wrote the following in response to a question about the Lean Startup practice of Minimum Viable Product.

The straightforward interpretation of MVP is a product that is built to gain feedback rather than built to maximize sales. I find it helpful to extend the idea. Here’s my interpretation:

  • “Minimum” is a reminder to invest as little as possible to get the next burning question answered or assumption validated.
  • “Viable” is a reminder to build enough to answer that question.
  • “Product” is a reminder to work from particulars.

MVP to me means “what I need to make in order to learn something valuable”. At first this can be as simple as a phrase: “we’ll cross StackOverflow with Twitter” (Quora). If you say your phrase to five people who ought to be interested if your assumptions are accurate and they all respond enthusiastically, then you’ve learned something valuable. Sharpie sketches on index cards could be a next step. Again, show them to people who you think “ought” to be interested and their reactions will give you valuable feedback. A wireframe might be your next step. Then a working but emaciated product. Then adding and/or deleting features.

The goal at each step is gathering feedback fast and cheap. You’re not trying to invest in these increments, you’re trying to avoid investing until you are more certain that a payoff is likely. You stick with a level of investment as long as it is providing valuable feedback, then move on (either forward or back, depending on the feedback). For example, I’ve seen people stick with wireframing long after it has ceased to provide feedback, which is just as wasteful as skipping wireframing if it can validate a hypothesis more cheaply than real code.

I call these steps “Informative Increments”, of which the MVP=barely-but-informatively-functioning-prototype is a special case. Unfortunately, neither the phrase itself nor the acronym can compete with MVP. The principle remains, though: while decisions are risky, make them & validate them as cheaply as possible to preserve capital for the (nearly) inevitable iterations.

The temptation in StartupLand is to try to make something good enough to survive. The paradox of the MVP is that by making a series of products which aren’t good enough to survive but are good enough to inform, you increase your chance of eventual success.

Why Accelerate Deployment?

The premise of my recent Software G Forces talk is that deployment cycles are shrinking, and that what constitutes effective software development at one cycle (say annual deployments) can be fatal at another (like daily deployment). Each transition–annual->quarterly->monthly->weekly->daily->hourly–requires a different approach to development. Everyone can find their current deployment cycle in the sequence above and everyone is under pressure to shrink the cycle.

Almost everyone. I gave a workshop based on the G Forces model in Israel recently and one workgroup made it clear that their current deployment cycle was just fine. As a followup, someone else asked the fundamental question, “Why should we deploy more frequently?” My inspiration for the talk was my long-standing observation that cycles are shrinking, but I never really thought about why, so I didn’t have a good answer. This post, then, gives me a chance to think about why to shrink deployment cycles. (I’ll be giving the talk in Hamburg on Thursday, November 4, 2010 if you’d like to see it live.)


The obvious reason to deploy more frequently is to get a jump on the competition. If you are in a head-to-head competition where features matter and you can bring them out faster, you should win. If the villain gets ahead, you can rapidly catch up. If they get behind, you can keep them from catching up. Analogies to the OODA loop come to mind.

When I tried to come up with examples of such competition, though, I had a hard time finding any recent ones. The days of word processors competing on feature lists are long gone, resulting as it did in bloat and complexity. One recent example is Bing versus Google. Even there, the struggle is more to learn about user behavior more quickly than the competition, not a strict feature battle. It would be an advantage if one of them could deploy weekly and the other only monthly, but the winner still would be the one who understood users best.


A lesson I learned from my officemate at Oregon, David Meyer (now a director at Cisco), is that as systems grow in complexity, every element is potentially coupled to every other element. This suggests that systems be made as simple as possible to keep that N^2 from blowing up too far, and it suggests that changes be made one at a time. If any change can potentially affect any part of the system, then introducing two changes at once is much more complicated to debug than introducing one change. Was the problem change A? Change B? Some interaction of A and B? Or was it just a coincidence? Introducing one change at a time keeps the cost of maintenance in check.

At the same time, systems need to grow rapidly. You need many changes but you can only introduce one change at a time. One way to reconcile the conflicting needs is to reduce the cycle time. If you can change a system in tiny, safe steps but you can make those steps extremely quickly, then it can look from the outside like you are making big changes.

Timothy Fitz, formerly of IMVU, told a story that brought this lesson home to me. The discipline they came to was that as soon as they said to themselves, “That change couldn’t possibly break anything,” they deployed immediately. If you weren’t at least a little worried, why would you even say that? By making the overhead of deployment vanishingly small, they could create value with every deployment. Either the deployment was fine, in which case the engineer regained confidence, or the deployment broke, in which case the engineer learned something about the sensitivities of the system.


In Toyota Production System, Taiichi Ohno makes an analogy between inventory and water in a river. By lowering the water level in the river (reducing inventory), you can uncover previously hidden rocks (identify bottlenecks). Undeployed software is inventory. By deploying in smaller batches, you can identify bottlenecks in your software process.

Startups have a vital need to eliminate waste. Because many of the assumptions on which a startup is based are likely to prove false, most startups need to iterate to be successful. If the team can learn to learn from each iteration and can make those iterations short and cheap, then the whole enterprise has a greater chance of succeeding overall. Startups have the initial advantage of no code and no customers, so establishing a rapid deployment rhythm is relatively easy, although maintaining that rhythm through growth can be a challenge.


The final reason I thought of for accelerating the deployment cycle is the adventure. Especially if someone claims it is impossible, establishing a rapid rhythm is simply fun and satisfying. Don’t underestimate the role of fun in driving innovation.


Those are my reasons for accelerating deployment: responding to (or staying ahead of) competition, scaling safely, identifying waste, and fun. My next post will take a more abstract look at how accelerating deployment works, through its effects on latency, throughput, and variance.

Commercial plug–the switching costs between tools become more significant as the deployment cycle shrinks. That’s why JUnit Max runs tests automatically and displays test results right in the source code.

Tech Podcasts

The0retico asked for a list of my favorite tech-oriented podcasts. Since I listen every day while I do farm chores, I have ample time. Here is my list (oh, the picture above is my quintessential geek picture, from an outstanding, but non-tech-related, book):

  • Entrepreneurial Thought Leaders — Guest speakers to Steve Blank’s entrepreneurship class at Stanford. Most enduring quote: M.C. Hammer, “It’s all about the analytics.”
  • Hanselminutes — Sometimes spins off into Microsoft minutiae, but often interesting in general. Most memorable episode: building the ultimate developer machine.
  • History of Rome — Nothing tech about it, but just absolutely brilliant stories. Deserves a Pulitzer.
  • IT Conversations — Wide variety of tech topics, many interesting.
  • The Changelog — Geeky conversations about open source.
  • Software Engineering Radio — The Big Daddy, and not just because I finally got interviewed. A little too much modeling for my taste, but everyone is entitled to an obsession.
  • This Developer’s Life — A recent find, but shaping up nicely. No particular point but to provide a setting for storytelling.

And finally, my favorite music for programming:

Please feel free to point me to new casts in the comments.

The Economic Case for JUnit Max

In preparing to relaunch JUnit Max, I’ve been trying to articulate exactly why it is worth the price. I’m still conflicted about charging for Max, although realistically if I had no chance to be paid for it I couldn’t afford to work on it. If I’m going to charge money, though, I’d like to know that Max is worth it.

For me this isn’t an issue. I run tests all the time and I appreciate the time savings and additional focus Max gives me. Of course, I had to give up the things that the time I spent implementing Max could have brought, but that’s a sunk cost for me. I simply like programming more when I have Max, which is a big part of my motivation in the re-launch.

If you’re out there with $100 in your pocket, though, I can imagine that you need convincing. If you’re going to ask someone else for the $100, then they need convincing. So, does Max make economic sense?

Here’s a back-of-the-envelope calculation based entirely on time savings. I’ll take the JUnit unit tests as a baseline. They take about 10 seconds to run. While we’re programming we run them ~50 times per hour. Scaling this up to a full-time job, we would be spending 10 seconds/run * 50 runs/hour * 1000 programming hours/year = 500,000 seconds, or about 139 hours/year, waiting for tests to finish. At $200/hour, your employer is paying nearly $28,000 for you to wait for test results.

With Max, wait times are reduced because of test prioritization. With the JUnit test suite we get results in 2 seconds instead of 10. Even if the savings were only 50%, though, Max would still be worth nearly $14,000/year (and the savings on longer-running test suites will be larger than the 80% we get for the JUnit tests). In other words, Max pays for itself roughly every two working days.
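
The arithmetic can be checked with a short script. This is only a sketch: the inputs are the figures quoted above, the variable names and the savingsPerYear helper are illustrative, and the unrounded results land close to the rounded figures in the text.

```php
<?php
// Inputs taken from the estimate in the text.
$secondsPerRun = 10;
$runsPerHour = 50;
$programmingHoursPerYear = 1000;
$hourlyRate = 200; // dollars

// Time spent waiting for tests, converted from seconds to hours.
$secondsWaiting = $secondsPerRun * $runsPerHour * $programmingHoursPerYear;
$hoursWaiting = $secondsWaiting / 3600;

// What that waiting costs at the assumed hourly rate.
$costOfWaiting = $hoursWaiting * $hourlyRate;

// Yearly value of cutting the wait by a given fraction (0.8 = 80%).
function savingsPerYear(float $costOfWaiting, float $fractionSaved): float
{
    return $costOfWaiting * $fractionSaved;
}

printf("Hours waiting per year: %.1f\n", $hoursWaiting);
printf("Cost of waiting per year: $%.0f\n", $costOfWaiting);
printf("Savings at 80%%: $%.0f\n", savingsPerYear($costOfWaiting, 0.8));
printf("Savings at 50%%: $%.0f\n", savingsPerYear($costOfWaiting, 0.5));
```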

And that doesn’t count all the times I stay focused on programming because it’s only two seconds instead of getting distracted during a 10-second or 30-second pause and not getting anything done for several minutes. I can only conclude that at $100/year Max is seriously underpriced. My conscience is assuaged.

CD Survey: What practices do developers use?

The survey I’ve been writing about (raw results here) was intended to give us speakers at the continuous deployment webinar (Timothy Fitz, Jez Humble, and myself) some background on the attendees. I’ve saved the best (most informative) question for last: what practices do attendees use in software development? Here is the data:

What practices do people use?

Some thoughts:

  • Business-based operations metrics. One of the key insights of continuous deployment is using business-oriented metrics to monitor operations instead of the more natural (for programmers, anyway) technical metrics. If you expect 50 sign-ups per minute and the rate suddenly dips to 20/minute after a deployment, it’s time to roll back. The practice is not in common use.
  • Kanban versus iterations. Iterations still dominate, even though the additional flexibility of kanban is a better match for continuous deployment.
  • Pair programming. For all the complaints I hear about pair programming, I would have expected this number to be lower than 25%.
  • Test-driven development. 50% is higher than I would have expected. Adoption of TDD is excellent preparation for teams wishing to deploy more frequently (see my commercial screencasts for more details).
  • Continuous integration. I expected this number to be higher. CI was the first practice from Extreme Programming to spread widely, but, at least among this audience, it is not pervasive.
  • More than 75% of teams test manually before deployment. This is a sensible practice until the defect rate is brought down and the operations infrastructure made robust in the face of errors, but I expect the number to drop as teams mature in their application of continuous deployment.
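
The business-metric check from the first bullet can be sketched as a simple ratio test. The function name shouldRollBack and the 70% threshold are assumptions for illustration; a real system would choose its threshold from observed variance and would likely smooth the rate over a time window.

```php
<?php
// Returns true when the observed business metric has dropped far enough
// below expectation that a rollback is warranted.
// $minimumRatio is a hypothetical threshold, not a recommended value.
function shouldRollBack(
    float $expectedPerMinute,
    float $observedPerMinute,
    float $minimumRatio = 0.7
): bool {
    if ($expectedPerMinute <= 0) {
        throw new InvalidArgumentException('expected rate must be positive');
    }
    return ($observedPerMinute / $expectedPerMinute) < $minimumRatio;
}

// The example from the text: 50 sign-ups/minute expected, 20 observed.
var_dump(shouldRollBack(50, 20));

// A small dip stays within the assumed tolerance.
var_dump(shouldRollBack(50, 48));
```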

Change generally happens on a time scale of decades. Mass production and then lean production each took upwards of fifty years to become widespread. I don’t mean to be overconfident, but the picture above (skewed as it is by selection bias) paints a picture of software development that is substantially different than common practice twenty or even ten years ago. There’s still a long way to go until software development pours out the stream of value it is capable of, but we’re making progress.

Commercial plugs: Check out my series of screencasts on intermediate-level test-driven development, $25 for four episodes. If you run unit tests for Java in Eclipse, check out JUnit Max, the continuous testing plugin, $50/year until August 1, 2010.