
Where, Oh Where to Test?

Kent Beck, Three Rivers Institute

System versus low-level tests

Abstract: Where should I test? High-level tests give me more coverage but less control. Low-level tests cover less of the system and can be broken by ordinary changes, but tend to be easy to write and quick to run. The answer to, "Where should I test?" is, as always, "It depends." Three factors influence the most profitable site for tests: cost, reliability, and stability.


What makes software development fascinating after thirty-five years are the conundrums and contradictions I still encounter every programming day. Another one of these came up the other day. I was programming with a client when they said they were glad to have system-level tests for the part of the system that collects payments from customers. The design and implementation weren't right yet, but with a comprehensive set of functional tests they were confident that they could replace the implementation without affecting the user-visible behavior. Now, just the day before I had gotten stuck because I had comprehensive functional tests, setting off my Conundrum Alarm. What was going on?

    The tests for payment collection were written in terms of business events: time passes, payments become due, payments arrive, payments are cancelled, contracts change between the time payments are requested and when they arrive, and so on. The business rules related to payment processing are complicated by all the crazy special cases. At the same time, smoothly handling payments, automatically wherever possible, is critical to reducing customer service costs. To write these tests, though, the system required support for recreating specific business situations (hand-written tests tend to be too simple to realistically exercise the system).

    The day before I had encountered a counter-example to the value of functional tests. I was building a system to translate quantitative data into the Google Chart API for some curious data I had collected about how programmers use tests (I'll cover this in a later post). With the Chart API you supply a URL with various parameters containing data and formatting and Google gives you a bitmap in return. I wrote my first test with my input and the URL I expected to generate. I made it pass. I wrote a second, more complicated test. In making that test run, I broke the first test and had to fix it. Same story with the third test--two broken tests, extra work to get them working. Every time I changed anything in the code, all the tests broke. Eventually I gave up.
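A minimal sketch of the kind of brittle whole-URL test described above. The function name and the particular Chart API parameters are assumptions for illustration, not the actual code from that project:

```python
# Hypothetical sketch: a system-level test that pins the entire output URL.
# Any change to a formatting default breaks every such test at once.
def make_chart_url(values):
    data = ",".join(str(v) for v in values)
    return ("https://chart.googleapis.com/chart"
            "?cht=lc&chs=250x100&chd=t:" + data)

# The expected value is the whole URL, so the test couples to every parameter:
assert make_chart_url([10, 20, 30]) == \
    "https://chart.googleapis.com/chart?cht=lc&chs=250x100&chd=t:10,20,30"
```

Adding one more parameter to the URL invalidates the expected string in every test like this, which is exactly the cascade of breakage described above.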

    Overnight I had an idea: what if I had little objects generate the parts of the URL? It should be easy to test those little objects. Concatenating the results and inserting ampersands I was confident I could do correctly. Once I got rolling on this new implementation, the individual objects were so simple I just verified them manually. (If I wasn't in such a hurry to see the graphs I would have written automated tests.) In the end, I got the functionality I wanted quickly with code that only required component-level validation.
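One way to realize the "little objects" idea is to give each URL parameter its own renderer and verify each in isolation; the class names and parameters here are assumptions, not the original implementation:

```python
# Hypothetical component objects, each responsible for one URL parameter.
class ChartType:
    def render(self):
        return "cht=lc"  # line chart

class ChartSize:
    def __init__(self, width, height):
        self.width, self.height = width, height

    def render(self):
        return f"chs={self.width}x{self.height}"

class ChartData:
    def __init__(self, values):
        self.values = values

    def render(self):
        return "chd=t:" + ",".join(str(v) for v in self.values)

def chart_url(parts):
    # Concatenating the rendered parts with ampersands is the trivial step.
    return "https://chart.googleapis.com/chart?" + "&".join(p.render() for p in parts)
```

Each little object is simple enough to check on its own, e.g. `ChartSize(250, 100).render() == "chs=250x100"`, and the concatenation step is too simple to get wrong.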

    To test the system or to test the objects, that was the question on which I now had contradictory evidence. The payment collection system profited from system-level tests and would have found object-level tests a burden. The data charting system was just the opposite, nearly killed by the burden of system-level tests. How could I productively think about this situation?

Cost, Stability, Reliability

On reflection, I realized that level of abstraction was only a coincidental factor in deciding where to test. I identified three factors which influence where to write tests, factors which sometimes line up with level of abstraction, but sometimes not. These factors are cost, stability, and reliability.

    Cost is an important determiner of where to test, because tests are not essential to delivering value. The customer wants the system to run, now and after changes, and if I could achieve that without writing a single test they wouldn't mind a bit. The testing strategy that delivers the needed confidence at the least cost should win.

    Cost is measured both by the developer time required to write the tests and the time to execute them (which also boils down to developer time). Effective software design can reduce the cost of writing the tests by minimizing the surface area of the system. However, setting up a test of a whole system is likely to take longer than setting up a test of a single object, both in terms of developer time and CPU time.

    In the payment collection case, writing the system-level tests was expensive, but made cheaper by the infrastructure in place for recording and replaying such tests. Executing such tests takes seconds instead of milliseconds, though. However, since no smaller scale test provides confidence that the system as a whole runs as expected, these are costs that must be paid to achieve confidence. In the charting example, setting up tests was simple at either level, and running them was lightning fast in either case.

    Stability is an aspect of cost: given a test that passes, how often does it provide incorrect feedback? How often does it fail when the system is working, or pass when the system has begun to fail? Every such case is an additional cost that should be charged to the test, and minimizing these costs is a reasonable goal of testing. The payment collection system didn't have a problem with stability. Business rules changed occasionally, but when they changed the tests needed to change at the same time. The data charting system had a serious stability problem, though. Tests were constantly breaking even though the part of the system they validated was still working.

    I could have spent more effort on my charting tests, carefully validating only parts of the URL. This is a strategy often employed when the expected output of a system is HTML or XML: check to make sure a particular element is present without checking the entire output. Partial validation increases the cost of testing, because you have to implement the mechanism to separate out the parts of the output and check them separately, and it reduces coverage. That is, it is possible for an output whose parts all check out to be incorrect as a whole. However, this risk may be worth it compared to having system-level tests continually fail because of irrelevant changes to the output.
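A sketch of partial validation, assuming the system's output is XML and using Python's standard library parser. The document and element names are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical system output; the test cares about one element only.
output = "<report><title>Q3 Payments</title><rows><row/><row/></rows></report>"
root = ET.fromstring(output)

# Partial validation: check the element we care about, ignore the rest.
# The report title must be right; row formatting changes won't break this test.
assert root.find("title").text == "Q3 Payments"
```

The cost is the parsing-and-extraction machinery, and the risk is the coverage gap: every individually checked part can be correct while the whole output is wrong.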

    Reliability is a third factor in deciding where to test. I should test where I'm likely to make mistakes, not things that always work. Each test is a bet, a probabilistic investment. If it fails unexpectedly, it pays off. Since it's probabilistic, I can't tell with certainty beforehand which tests I should write. However, I can certainly track trends and decide to concentrate where I make mistakes (for me, that tends to be conditional logic, so I'm particularly careful testing around code requiring conditionals).

    The challenging part of the payment collection system is the overall system behavior. Components like sending an email or posting an amount to an account are reliable. It's putting all the pieces together that creates the opportunity for errors. On the basis of reliability, then, the payment system will find system-level tests more valuable. The data charting system reverses this analysis. Getting the commas and vertical bars in just the right places is error prone. Given a sequence of valid parameters, though, putting them together is trivial. Tests of the components are just as likely to pay off as tests of the whole URL, so the advantage goes to the cheaper tests.
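The error-prone comma-and-bar formatting can be tested directly on a small component, without building a whole URL. The helper name is an assumption; the encoding follows the Chart API's text format, where commas separate values and vertical bars separate data series:

```python
def encode_series(series):
    # Hypothetical helper: "," separates values, "|" separates data sets.
    return "|".join(",".join(str(v) for v in s) for s in series)

# Component-level checks land exactly on the error-prone formatting.
assert encode_series([[1, 2, 3]]) == "1,2,3"
assert encode_series([[1, 2], [3, 4]]) == "1,2|3,4"
```

Because a bug here is far more likely than a bug in the trivial assembly step, these cheap component tests are where the bet pays off.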


I usually prefer low-level tests in my own practice, I think because of the cost advantages. However, this preference is not absolute. I often start development with a system-level test, supported by a number of low-level tests as I add functionality. When I've covered all the variations I can think of by low-level tests, I seed further development with another system-level test. The result is a kind of "up-down-down-down-up-down-down" rhythm. The system-level tests ensure that the system is wired together correctly, while the low-level tests provide comprehensive confidence that the various scenarios are covered.

    Our work on JUnit tilts this balance towards system-level tests. Writing and running system-level tests is generally economical, both in programmer and execution time. Simulating test conditions, normal and otherwise, is generally easy. Our tests tend to exercise the system through the published API. A fourth of our tests (116/404 as of today) are written at the system level.

    When I'm working on Eclipse plug-ins, the balance tilts the other way. System-level tests tend to be cumbersome and run slowly, so I work hard to further modularize the system so I can write comprehensive low-level tests. My goal is to have reasonable confidence that my plug-in works without starting Eclipse. Some system-level tests are necessary in the whole suite, if only to validate assumptions my code makes of Eclipse, assumptions that may change between releases.

    Large-scale and small-scale tests both have their place in the programmer's toolbox. Identifying the unique combination of cost, stability, and reliability in a given situation can help to place tests where they can do the most good for the least cost.