In a recent automation work I’ve been asked to take part in, running a single test safely, according to peers, generally follow a three-step process:
- Delete existing feature files:
rm -rf *.feature
- Create updated feature files:
PROJECT_DIRECTORY=../<dir> APP=<app> TEST=automated ruby create_tests.rb, where app is the name of the app you want to test
- Run the desired specific test:
cucumber --tags @<tag>
This means I have to re-run these three steps over and over for every single test I want to review. That can get tiring easily, especially if I have to re-type them repeatedly, or even if I copy-paste. Using the arrow keys in the terminal helps, but sometimes commands can get lost in the history and then it becomes a bit of a hassle to find them.
There should be a way for me to run an automated check I want to review, given just the app name and the test tag. It would be nice if I can run a specific test using a shorter command but still following the same process as provided.
I decided to set aside a little bit of time to write a personal script, and used Makefile because I’ve had experience with that before.
My personal short command for running a test:
make test app=<app> tag=<tag>
And here’s what the script looks like, which runs the known commands step-by-step plus displays some information on what’s currently happening:
Now I won’t have to worry about remembering what commands to type in as well as the sequence of the commands. I can give more focus on actually reviewing the test itself, which is what’s more important.
Tracking exploratory testing work is difficult for test managers. We don’t want to micro-manage testers, we want them to explore to their hearts content when they test, but we won’t know how much progress there is if we don’t track the work. We also won’t know what sort of problems testers encounter during testing, unless they have the nerve to tell us immediately. Jonathan Bach’s “Session-Based Test Management” article has one suggestion: use sessions, uninterrupted blocks of reviewable and chartered test effort.
Here are a few favorite notes from the article:
Unlike traditional scripted testing, exploratory testing is an ad hoc process. Everything we do is optimized to find bugs fast, so we continually adjust our plans to re-focus on the most promising risk areas; we follow hunches; we minimize the time spent on documentation. That leaves us with some problems. For one thing, keeping track of each tester’s progress can be like herding snakes into a burlap bag.
The first thing we realized in our effort to reinvent exploratory test management was that testers do a lot of things during the day that aren’t testing. If we wanted to track testing, we needed a way to distinguish testing from everything else. Thus, “sessions” were born. In our practice of exploratory testing, a session, not a test case or bug report, is the basic testing work unit . What we call a session is an uninterrupted block of reviewable, chartered test effort. By “chartered,” we mean that each session is associated with a mission—what we are testing or what problems we are looking for. By “uninterrupted,” we mean no significant interruptions, no email, meetings, chatting or telephone calls. By “reviewable,” we mean a report, called a session sheet, is produced that can be examined by a third-party, such as the test manager, that provides information about what happened.
From a distance, exploratory testing can look like one big amorphous task. But it’s actually an aggregate of sub-tasks that appear and disappear like bubbles in a Jacuzzi. We’d like to know what tasks happen during a test session, but we don’t want the reporting to be too much of a burden. Collecting data about testing takes energy away from doing testing.
We separate test sessions into three kinds of tasks: test design and execution, bug investigation and reporting, and session setup. We call these the “TBS” metrics. We then ask the testers to estimate the relative proportion of time they spent on each kind of task. Test design and execution means scanning the product and looking for problems. Bug investigation and reporting is what happens once the tester stumbles into behavior that looks like it might be a problem. Session setup is anything else testers do that makes the first two tasks possible, including tasks such as configuring equipment, locating materials, reading manuals, or writing a session report
We also ask testers to report the portion of their time they spend “on charter” versus “on opportunity”. Opportunity testing is any testing that doesn’t fit the charter of the session. Since we’re in doing exploratory testing, we remind and encourage testers that it’s okay to divert from their charter if they stumble into an off-charter problem that looks important.
Although these metrics can provide better visibility and insight about what we’re doing in our test process, it’s important to realize that the session-based testing process and associated metrics could easily be distorted by a confused or biased test manager. A silver-tongued tester could bias the sheets and manipulate the debriefing in such a way as to fool the test manager about the work being done. Even if everyone is completely sober and honest, the numbers may be distorted by confusion over the reporting protocol, or the fact that some testers may be far more productive than other testers. Effective use of the session sheets and metrics requires continual awareness about the potential for these problems.
- One colleague of mine, upon hearing me talk about this approach, expressed the concern that senior testers would balk at all the paperwork associated with the session sheets. All that structure, she felt, would just get in the way of what senior testers already know how to do. Although my first instinct was to argue with her, on second thought, she was giving me an important reality check. This approach does impose a structure that is not strictly necessary in order to achieve the mission of good testing. Segmenting complex and interwoven test tasks into distinct little sessions is not always easy or natural. Session-based test management is simply one way to bring more accountability to exploratory testing, for those situations where accountability is especially important.
The first edition of “Planning Extreme Programming” by Kent Beck and Martin Fowler was published about 18 years (that long) ago, and already it says so much about how planning for software projects can be done well. It talks about programmers and customers, their fears and frustrations, their rights and responsibilities, and putting all those knowledge into a planning style that can work for development teams. It doesn’t have to be labeled XP, but it does need to help people be focused, confident, and hopeful.
Here are some noteworthy lines from the book:
- Planning is not about predicting the future. When you make a plan for developing a piece of software, development is not going to go like that. Not ever. Your customers wouldn’t even be happy if it did, because by the time the software gets there, the customers don’t want what was planned; they want something different.
- We plan to ensure that we are always doing the most important thing left to do, to coordinate effectively with other people, and to respond quickly to unexpected events.
- If you know you have a tight deadline, but you make a plan and the plans says you can make the deadline, then you’ll start on your first task with a sense of urgency but still working as well as possible. After all, you have enough time. This is exactly the behavior that is most likely to cause the plan to come true. Panic leads to fatigue, defects, and communication breakdowns.
- Any software planning technique must try to create visibility, so everyone involved in the project can really see how far along a project is. This means that you need clear milestones, ones that cannot be fudged, and clearly represent progress. Milestones must also be things that everyone involved in the project, including the customer, can understand and learn to trust.
- We need a planning style that
- Preserves the programmer’s confidence that the plan is possible
- Preserves the customer’s confidence that they are getting as much as they can
- Costs as little to execute as possible (because we’ll be planning often, but nobody pays for plans; they pay for results)
- If we are going to develop well, we must create a culture that makes it possible for programmers and customers to acknowledge their fears and accept their rights and responsibilities. Without such guarantees, we cannot be courageous. We huddle in fear behind fortress walls, building them ever stronger, adding ever more weight to the development processes we have adopted. We continually add cannonades and battlements, documents and reviews, procedures and sign-offs, moats with crocodiles, torture chambers, and huge pots of boiling oil. But when our fears are acknowledged and our rights are accepted, then we can be courageous. We can set goals that are hard to reach and collaborate to make those goals. We can tear down the structures that we built out of fear and that impeded us. We will have the courage to do only what is necessary and no more, to spend our time on what’s important rather than on protecting ourselves.
- We use driving as a metaphor for developing software. Driving is not about pointing in one direction and holding to it; driving is about making lots of little course corrections. You don’t drive software development by getting your project pointed in the right direction (The Plan). You drive software development by seeing that you are drifting a little this way and steering a little that way. This way, that way, as long as you develop the software.
- When you don’t have enough time you are out of luck. You can’t make more time. Not having enough time is a position of helplessness. And hopelessness breeds frustration, mistakes, burnout, and failure. Having too much to do, however, is a situation we all know. When you have too much to do you can prioritize and not do some things, reduce the size of some of the things you do, ask someone else to do some things. Having too much to do breeds hope. We may not like being there, but at least we know what to do.
- Focusing on one or two iterations means that the programmers clearly need to know that stories are in the iteration they are currently working on. It’s also useful to know what’s in the next iteration. Beyond that the iteration allocation is not so useful. The real decider for how far in advance you should plan is the cost of keeping the plan up-to-date versus the benefit you get when you know that plans are inherently unstable. You have to honestly asses the value compared to the volatility of the plans.
- Writing the stories is not the point. Communicating is the point. We’ve seen too many requirements documents that are written down but don’t involve communication.
- We want to get a release to the customer as soon as possible. We want this release to be as valuable to the customer as possible. That way the customer will like us and keep feeding us cookies. So we give her the things she wants most. That way we can release quickly and the customer feels the benefit. Should everything go to pot at the end of the schedule, it’s okay, because the stories at risk are less important than the stories we have already completed. Even if we can’t release quickly, the customer will be happier if we do the most valuable things first. It shows we are listening, and really trying to solve her problems. It also may prompt the customer to go for an earlier release once she sees that value of what appears.
- One of the worst things about software bugs is that they come with a strong element of blame (from the customer) and guilt (from the programmer). If only we’d tested more, if only you were competent programmers, there wouldn’t be these bugs. We’ve seen people screaming on news groups and managers banging on tables saying that no bugs are acceptable. All this emotion really screws up the process of dealing with bugs and hurts the key human relationships that are essential if software development is to work well.
- We assume that the programmers are trying to do the most professional job they can. As part of this they will go to great lengths to eliminate bugs. But nobody can eliminate all of them. The customer has to trust that the programmers are working hard to reduce bugs, and can monitor the testing process to see that they are doing as much as they should.
- For most software, however, we don’t actually want zero bugs. (Now there’s a statement that we guarantee will be used against us out of context.) Any defect, once it’s in there, takes time and effort to remove. That time and effort will take away from effort spent putting in features. So you have to decide which to do. Even when you know about a bug, someone has to decide whether you want to eliminate the bug or add another feature. Who decides? In our view it must be the customer. The customer has to make a business decision based on the cost of having the bug versus the value of having another feature – or the value of deploying now instead of waiting to reduce the bug count. (We would argue that this does not hold true for bugs that could be life-threatening. In that case we think the programmers have a duty to public safety that is far greater than their duty to the customer.) There are plenty of cases where the business decision is to have the feature instead.
- All the planning techniques in the world, can’t save you if you forget that software is built by human beings. In the end keep the human beings focused, happy, and motivated and they will deliver.
Looking at the screenshot above, it’s ironic that there’s a mismatch between the commit message and the actual pipeline result from the tests after pushing the said commit. 😛 What happened was that after making the unit tests pass on my local machine, I fetched the latest changes from the remote repository, made a commit of my changes, and then pushed the commit. Standard procedure, right? After receiving the failure email I realized that I forgot to re-run the tests locally to double-check, which I should have done since there were changes from remote. Those pulled changes broke some of the tests. Honest mistake, lesson learned.
I frequently make errors like this on all sorts of things, not just code, and I welcome them. They bewilder me in a good way, and remind me of how fallible I can be. They show me that it’s alright to make mistakes, and tell me that it’s all part of the journey to becoming better.
And yes, testers can also test on the unit level! 🙂
The experience of writing and running automated checks, as well as building some personal apps that run on the terminal, in recent years, has given me a keen sense of appreciation on how effective command-line applications can be as tools. I’ve grown fond of quick programming experiments (scraping data, playing with Excel files, among others), which are relatively easy to write, powerful, dependable, and maintainable. Tons of libraries online help interface well with the myriad of programs in our desktop or out in the web.
Choosing to read “Build Awesome Command-Line Applications in Ruby 2” is choosing to go on an adventure about writing better CLI apps, finding out how options are designed and built, understanding how they are configured and distributed, and learning how to actually test them.
Some notes from the book:
- Graphical user interfaces (GUIs) are great for a lot of things; they are typically much kinder to newcomers than the stark glow of a cold, blinking cursor. This comes at a price: you can get only so proficient at a GUI before you have to learn its esoteric keyboard shortcuts. Even then, you will hit the limits of productivity and efficiency. GUIs are notoriously hard to script and automate, and when you can, your script tends not to be very portable.
- An awesome command-line app has the following characteristics:
- Easy to use. The command-line can be an unforgiving place to be, so the easier an app is to use, the better.
- Helpful. Being easy to use isn’t enough; the user will need clear direction on how to use an app and how to fix things they might’ve done wrong.
- Plays well with others. The more an app can interoperate with other apps and systems, the more useful it will be, and the fewer special customizations that will be needed.
- Has sensible defaults but is configurable. Users appreciate apps that have a clear goal and opinion on how to do something. Apps that try to be all things to all people are confusing and difficult to master. Awesome apps, however, allow advanced users to tinker under the hood and use the app in ways not imagined by the author. Striking this balance is important.
- Installs painlessly. Apps that can be installed with one command, on any environment, are more likely to be used.
- Fails gracefully. Users will misuse apps, trying to make them do things they weren’t designed to do, in environments where they were never designed to run. Awesome apps take this in stride and give useful error messages without being destructive. This is because they’re developed with a comprehensive test suite.
- Gets new features and bug fixes easily. Awesome command-line apps aren’t awesome just to use; they are awesome to hack on. An awesome app’s internal structure is geared around quickly fixing bugs and easily adding new features.
- Delights users. Not all command-line apps have to output monochrome text. Color, formatting, and interactive input all have their place and can greatly contribute to the user experience of an awesome command-line app.
- Three guiding principles for designing command-line applications:
- Make common tasks easy to accomplish
- Make uncommon tasks possible (but not easy)
- Make default behavior nondestructive
- Options come in two-forms: short and long. Short-form options allow frequent users who use the app on the command line to quickly specify things without a lot of typing. Long-form options allow maintainers of systems that use our app to easily understand what the options do without having to go to the documentation. The existence of a short-form option signals to the user that that option is common and encouraged. The absence of a short-form option signals the opposite— that using it is unusual and possibly dangerous. You might think that unusual or dangerous options should simply be omitted, but we want our application to be as flexible as is reasonable. We want to guide our users to do things safely and correctly, but we also want to respect that they know what they’re doing if they want to do something unusual or dangerous.
- Thinking about which behavior of an app is destructive is a great way to differentiate the common things from the uncommon things and thus drive some of your design decisions. Any feature that does something destructive shouldn’t be a feature we make easy to use, but we should make it possible.
- The future of development won’t just be manipulating buttons and toolbars and dragging and dropping icons to create code; the efficiency and productivity inherent to a command-line interface will always have a place in a good developer’s tool chest.
Here’s a screenshot of something we’ve currently been working on:
It’s not available publicly yet, because it’s still in the early stages of development. The app, hosted locally on http://aspen.reservations.com/#/stay-dates, will redirect to a 404 page unless you have the app installed in your machine. As a programmer, the changes I’m making on the app is only accessible on my local computer. Other programmers will only be able to view the changes I’ve made if I push said changes and they download those updates on their machines.
Similarly, I’ll be redirected to an error page if I view the said app on my phone:
What if a tester or a product owner requests an impromptu demo of the app? What if they want to test it as is? Early constructive feedback is always nice. We can set up the app on their machine, but what if they’re working remotely? I’m pretty sure it would be a pain for them to install the development environment on their own. And what if they want to test it on their mobile phone? Having the environment installed on their computer doesn’t mean they can test the app on an actual phone.
Apparently there’s Ngrok. It’s a quick solution to publicly sharing local apps over the internet.
Here’s a screenshot of me trying out the free version:
And now I can test the app on my phone!
No installations needed on the tester / product owner / client side, just a single bash command from the programmer instead. So cool!
Until recently, what I mostly knew about test-driven development was the concept of red-green-refactor, unit testing application code while building them from the bottom-up. Likewise, I also thought that mocking was only all about mocks, fakes, stubs, and spies, tools that assist us in faking integrations so we can focus on what we really need to test. Apparently there is so much more to TDD and mocks that many programmers do not put into use.
Justin Searls calls it discovery testing. It’s a practice that’s attempts to help us build carefully-designed features with focus, confidence, and mindfulness, in a top-down approach. It breaks feature stories down into smaller problems right from the very beginning and forces programmers to slow down a little bit, enough for them to be more thoughtful about the high-level design. It uses test doubles to guide the design, which shapes up a source tree of collaborators and logical leaf nodes as development go. It favors rewrites over refactors, and helps us become aware of nasty redundant coverage with our tests.
And it categorizes TDD as a soft skill, rather than a technical skill.
He talks about it in more depth in a series of videos, building Conway’s Game of Life (in Java) as a coding exercise. He’s also talked about how we are often using mocks the wrong way, and shared insight on how we can write programs better.
Discover testing is fascinating, and has a lot of great points going for it. I feel that it takes the concept of red-green-refactor further and drives long-term maintainable software right from the start. I want to be good at it eventually, and I know I’d initially have to practice it long enough to see how much actual impact it has.