Software testing, Control theory and Cynefin

So I recently attended a brilliant workshop on the Cynefin framework, held by Duncan Nisbet at Let's Test. That workshop got me thinking about the control theory I studied at university, which until this day I have never really used. But when I started thinking about software development as a control theory problem, I could connect the dots between an unstable control system and the Cynefin domain of chaos. Control theory is a collection of general theories about controlling systems. It could be a thermostat controlling the indoor temperature, making a Segway keep its balance, adjusting the flow of packets in a computer network, etc. The main principle is shown in the picture below.

The Reference signal would in the software development world be the requirements leading to some kind of change to the system. The Controller would be the developers making changes to the system based on the requirements. The System is the system under development, and finally, the Sensor is our beloved testers observing the system and sending feedback to development. The feedback received is compared to the reference signal (the difference between actual behavior and desired behavior) and adjustments are made (bugs are fixed). You would probably like your development process to make the system get closer and closer to the desired behavior over time. But sometimes the behavior oscillates (when bugs are fixed, just as many new ones are introduced) or worse (fixes and changes steadily decrease the quality of the system). Getting closer and closer to the reference signal is referred to as stability in the control theory world. There are different kinds of stability, but this is one of them: “A linear system is called bounded-input bounded-output (BIBO) stable if its output will stay bounded for any bounded input.” The inverted pendulum in this video is a good example of an unstable system:

A limited poke at the system makes it flip out of its normal boundaries. In the Cynefin model, the act of “flipping out” would correspond to the domain of chaos. This is an unordered state with a lack of control, a state you don't want to stay in for too long and that might be hard to get out of (the inverted pendulum had to be reset manually). In software development, this would compare to some requirement initiating a change in the system that introduces one or several severe problems that put the organization into chaos. So how can this be avoided? Control theory tells us that systems/feedback loops with these properties are hard to control and tend to be unstable:

  • Long time delays
  • Noise
  • Low observability of the system to be controlled

So if we in our organization can achieve short time delays within our feedback loop (test often, test early), little noise (direct and clear communication) and high observability (good testability), the chance of keeping the process stable, and thus avoiding a drift into chaos, should be significantly bigger. No revolutionary ideas here; this is what is generally preached in software development and testing. But it is rather nice to see that we have support for our claims from the field of control theory.
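To make the analogy a bit more concrete, here is a toy simulation of the feedback loop (my own sketch with made-up numbers, not something from the workshop). A moderate “fix rate” converges toward the desired behavior, an over-aggressive one oscillates with growing amplitude, and even a modest fix rate becomes unstable when the feedback arrives with a long delay:

```python
# A toy simulation of the feedback loop above (my own sketch, made-up numbers).
# "gain" is how aggressively deviations are corrected each iteration, and
# "delay" is how stale the feedback is when the correction is made.

def simulate(gain, delay=0, iterations=12, reference=100.0):
    """Return the system behavior over time for a given gain and feedback delay."""
    history = [0.0]
    for k in range(iterations):
        measured = history[max(0, k - delay)]       # the sensor's (possibly stale) report
        error = reference - measured                # deviation from desired behavior
        history.append(history[-1] + gain * error)  # the controller applies a fix
    return [round(x) for x in history]

print(simulate(gain=0.5))            # converges smoothly toward the reference (stable)
print(simulate(gain=2.1))            # every fix overshoots: oscillation that grows (unstable)
print(simulate(gain=0.8, delay=3))   # a stable gain plus a long feedback delay: also unstable
```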

LET'S TEST 2015 – DAY 3

After two really intense days of Let's Test it was time to get out of bed and muster up energy for the last day. The first session of the day was Baldvin Gislason Bern talking about his struggle to “Find purpose in automation”. Baldvin is a former colleague of mine so I might be a bit biased, but I thought he delivered a great presentation. The theme revolved around how he got assigned to different automation tasks in his organization and his struggle to understand why that automation was important, i.e. what purpose it served. I could not agree more about the importance of finding out why to automate. All too often automation efforts seem to start with “we need to automate” instead of “how can we test better?”. Baldvin also managed to provoke a little by bashing the test automation pyramid and the Agile movement, and also by telling us how he told his automation team to throw away 10% of their checks. These deliberate provocations sparked a great discussion after the presentation and it felt like everybody got something to think about from this session.

Finding purpose in automation

What made me reevaluate my thoughts the most was the critique against the automation pyramid. I have always found it a useful model to apply when trying to “push tests down the stack”. I like the idea of having a few high-level checks to find problems with how the different parts of the system interact, but testing the bigger set of variations on a lower level to reduce execution time, shorten the feedback loop and simplify debugging. But I can agree with Baldvin's points that it cannot be applied in all contexts and that in some contexts comparing unit checks with higher-level checks is apples and pears (my rephrasing). I guess it comes down to what checks you have control over in your context. If you work in a context where the development team is separate from the test team and the checks on different levels are divided between development and test, it is much harder to keep an overall strategy for how to distribute the checks. So I feel that the pyramid is valuable for illustrating an important principle, but that I, and many others, might have been throwing it around too easily without providing the right context for it.

I have often felt that it is really important to understand the roles of the people around you in your organization. Having only worked as a tester, I don't have the luxury of having gained that knowledge through first-hand experience. So that is why Scott Barber's session “Experiencing product owners” appealed to me a lot. Scott, known as the “performance tester guy”, had recently worked as a product owner and wanted to share his experience from that. I was surprised by how much stuff the product owner is involved in besides prioritizing the backlog for the development team and communicating with customers. It seemed like he needed to be everywhere, discussing revenue, packaging, support issues, marketing etc, etc. If the normal product owner role has only half of the responsibilities that Scott's role had, it is still baffling that they have any time for the development team at all.

The yellow part in the upper right corner is supporting development, the rest is other stuff.

The main purpose of the workshop was, however, to understand the perspective of the product owner so that we as testers can learn how to provide useful information to her/him. This was achieved through a couple of simulations where we got to play different roles on a fictional team at different times in a project (just started, close to release etc). We made stuff up about our progress and Scott told us how he would have replied. As an extra dimension, Scott's answers at the beginning of the workshop simulated how he acted when he had just started as a product owner, while at the end of the workshop he acted as he did after gaining more experience. I liked this workshop a lot since it not only brought insight into the role of a product owner, it also provided good tips on how to handle status meetings, distributed teams etc, all in Scott's very entertaining way. Favourite quote of the session:

“Done doesn't mean done and I'm not talking about the whole ‘done’ versus ‘donedone’ thing” – Scott Barber on when things are really done for a product owner

Now the end was near; only the final keynote remained before it was time to split up and go back to reality. The keynote by Tuomo Untinen told us the story of finding and publishing the famous Heartbleed bug. It was interesting to hear how it was exposed and how the release to the general public was handled. An interesting subject, but I feel the keynote would have been better if some technical details had been left out.

A heartbleeding goodbye to Let's Test 2015

As always (I get to say that now since this was my second Let's Test) it was sad that the conference was over. Unfortunately I had to leave immediately, so no time for hugging and goodbyes this time. Thank you, all you awesome peers who enriched my conference experience; hope to see you next year if not sooner!

LET'S TEST 2015 – DAY 2

The second day of Let's Test kicked off with an hour of “Utilising automation tools in a CDT environment” with Guy Mason.


He talked about how automation is much more than regression checks and how it can be used to assist testers in different ways. Some examples were automation of workflows, data creation and performance testing. It is indeed an important aspect of automation that is often forgotten in the pursuit of the ultimate regression test suite. I had been thinking about this topic a lot recently myself, and Guy's presentation helped strengthen my beliefs on it.

After the short morning session it was time for a full-day workshop with Michael Bolton and Laurent Bossavit named “Defense against the dark arts”. Having quite recently read Laurent's book “The Leprechauns of Software Engineering”, I was looking forward to digging deeper into questioning various claims that get thrown around a lot in our business.


The session started with some background info, followed by a short exercise where we were instructed to put numbers on our gut feelings about different claims, such as “Spinach is a good iron source”, “Some developers are 10 times more effective than others” or “Hurricanes with female names are deadlier” (these are not the exact phrasings of the claims since I'm quoting from memory). We also noted what made us react in certain ways to the claims. Maybe a number was suspiciously precise, or the claim sounded too much like a sweeping generalization to be true for all contexts. After a discussion around our notes we started digging into one claim chosen by the group at each table. Our group chose a claim stating something along the lines of: “Three quarters of the DoD's budget in 1995 was wasted on failed waterfall projects” (again, not an exact citation). It turned out to be really hard for us to find the original claim, despite all the collected googling skill in the group. Certain claims, it turns out, are easily twisted into new forms and meanings, and it can be really interesting to follow a claim through its citation history to see how it has been transformed. After the exercise the groups tried to formulate the thought process for investigating and evaluating claims. All groups produced quite similar results, with the magical “context” word appearing everywhere. This was the result from our group:

We continued by getting some pointers and tricks from Laurent and Michael on how to find potential problems with articles. For instance, searching for exact sentences that are probably quite unique might reveal other articles that are suspiciously related to the article under investigation. Also, if finding out that one of the main references is an unpublished master's thesis doesn't set off your smoke detector, I don't know what will. Finally we got to choose a last exercise, and I, together with a peer tester, decided to play around with the data supposedly supporting the claim that “Hurricanes with female names are deadlier”. The basic idea was that people are less scared of hurricanes with female names and would therefore be less cautious. It was fun to investigate a data set (we used Excel) in different ways to spot potential problems. When we colored the cells for hurricanes with female names we found that there was a period from the early 50s to the late 70s where all hurricane names were female (during other time periods it was roughly 50/50). The numbers of people dying from hurricanes during that period were much higher than the total average and also much higher than the average for the “female hurricanes” of later years. This suggested that the death rate could be highly affected by the time era rather than the actual name. Investigating data sets like these is definitely something I would like to be more skilled at. Book pointers anyone?
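For the curious, here is roughly the kind of slicing we did in Excel, expressed in pandas instead. The file name and the columns (year, name_gender, deaths) are hypothetical, just to show the idea of controlling for the era:

```python
# A rough sketch of the slicing we did in Excel, using pandas instead.
# The file name and column names (year, name_gender, deaths) are hypothetical.
import pandas as pd

df = pd.read_csv("hurricanes.csv")

# Compare average deaths by name gender over the whole data set...
print(df.groupby("name_gender")["deaths"].mean())

# ...and then control for the era: roughly 1950-1978 only female names were
# used, and those decades also had deadlier hurricanes overall.
df["era"] = pd.cut(df["year"], bins=[1949, 1978, 2015], labels=["1950-1978", "1979-2015"])
print(df.groupby(["era", "name_gender"])["deaths"].mean())
```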

The main takeaway from the workshop was that the research field of software engineering is in bad shape and that we as professionals working in the field have a responsibility to try to make things better (or at least not make them worse by throwing around folklore). It also helped me in striking a balance between being a critical thinker and still being somewhat open-minded to new input.

For the evening session I joined Duncan Nisbet in “Cynefin sensemaking surgery”. I had recently tried to understand the Cynefin framework but felt that I was lacking examples of how it could be applied. Duncan had done some work on his own to, as he put it, “shoehorn software testing into Cynefin” (very successfully, in my opinion).


After a short introduction to Cynefin we got to write down different problems we had encountered on stickies and then place the stickies in the appropriate domain of the framework. This was followed by group discussions where we told stories around our stickies and the group helped out with whether the stickies were placed in the right domain and how they could eventually be moved clockwise to a more ordered domain. These discussions were exactly what I needed to get a better grasp of Cynefin (which can feel very abstract when you first encounter it). I recommend that you check it out if you haven't already; it is very useful for making sense of your complex and sometimes chaotic surroundings in a software project, or in life in general.

The evening was rounded off in the test lab (a place I missed last year and had on my to-do list for this year). I paired up with another tester and we did some exploration of a new planning tool for complex projects. It was fun to do some hands-on testing and we managed to observe some important problems during our short session.

The Test Lab at night

The product feels far from ready in its current state but did show some promise for the future. It was a great thing, though, to test an actual product for someone who appreciated our services. Bonus points awarded for having the developers of the product available on instant messaging.

LET'S TEST 2015 – DAY 1

So it's that time of the year when Europe is hosting its big song contest and also its great test conferences: Let's Test and Nordic Testing Days. I currently have the pleasure of attending the former for the second year in a row. I also decided to try to blog about it every day like I did last year, partly due to demand and partly because I like to summarize and write stuff down before I forget it. There is so much input during conferences like these that writing helps me remember more of what I've learned and experienced. The extra spelling and grammar errors that come with this way of working late in the evening will have to be considered part of the authentic experience.

The first day kicked off with the traditional AC/DC intro, followed by a brief introduction and then a keynote by Ben Simo.


Ben talked about his experience investigating problems with the controversial healthcare.gov website and the attention it got both on social media and in traditional media. Although the start was a bit slow, the presentation really took off when he started discussing the problems he had encountered and how the feedback was received. The whole keynote was a great reminder that we as testers can make a difference, even if all problems were not fixed in this case. Ben also listed a bunch of heuristics and mnemonics he used during his testing, including the OWASP top security threats as well as ethical considerations. It really displayed the range of dimensions we as testers need to keep in mind while doing what we do.

After the keynote it was time for the first workshop. I chose to attend the “Automation in testing workshop” hosted by Richard Bradshaw. I've been working with automation quite a lot lately and was hoping to get some good pointers on automation in general, as well as a tool for communicating the challenges of automation to non-technical people. I was not disappointed. The idea of the workshop was to assemble Duplo pieces according to given requirements (a picture of the finished assembly). Initially we were split up in pairs where one person was the tester and the other one was the automation. The tester had to provide instructions to the automation, and the automation was obliged to follow these instructions literally, without any sapient reasoning of its own. I got to play the automation first and it turned out to be quite a difficult task to turn the brain off and blindly follow instructions. The first assembly went well though, with a combination of luck (there were different shades of green, but the first piece I picked happened to be the right kind) and semi-sapient automation (I somehow knew at which tables the right pieces were). When we switched roles and the tester tried to follow my written-down instructions, things did not really work out:


I think my testing partner did a better job of simulating a computer, since he was able to follow the instructions correctly and get the result above. It was also interesting to see the failures and successes at the other tables; the approaches and results were quite mixed. As the session went on we got new constructions to make and were able to introduce abstractions in our automation implementation that made it possible to assemble different products by only switching a list of colors instead of changing all the code. In between the different exercises we had brief discussions on what automation is, how we know that it does what it is supposed to do, and how we can make it more reusable. Dividing the automation framework into different components was a key point of the workshop, not only to make the solution more maintainable, but also to be able to reuse different parts to help the testers test. After all, automation is so much more than automated regression checks. This was illustrated by the last constructions we did, where we were allowed to combine our human brains with automation assistance and were able to assemble constructions like these:


All in all a great workshop; if you are attending Nordic Testing Days, you should definitely check it out.
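The “switch a list of colors instead of changing all the code” idea is essentially data-driven automation. Here is a minimal sketch of that separation; the function, product names and data are entirely my own and not from the workshop:

```python
# A minimal sketch of the data-driven idea from the workshop: the "automation"
# only knows how to stack bricks, and each product is just a list of colors.
# All names and data here are made up for illustration.

def assemble(colors):
    """Stack one brick per color, bottom to top, and return the build steps."""
    steps = []
    for level, color in enumerate(colors, start=1):
        steps.append(f"Place a {color} brick at level {level}")
    return steps

# New products only require new data, not new code.
PRODUCTS = {
    "tower": ["green", "green", "yellow", "red"],
    "bridge": ["blue", "blue", "white"],
}

for name, colors in PRODUCTS.items():
    print(name)
    for step in assemble(colors):
        print("  " + step)
```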

After lunch I attended the “Bad idea, bad idea…..good idea!” workshop with Paul Holland. The workshop's theme was brainstorming: how can we make it effective? The whole workshop was one more or less controlled experiment where the parameters group size, constraints and external stimuli were varied between the different sessions and the results evaluated. I had great fun during this workshop; Paul is a funny guy to start with, and the different topics like “Super villain names” and “Things that have holes” made sure to keep the mood high. That was also one of the takeaways from the workshop: humour is needed to keep the energy level up, especially when you have four brainstorming sessions in a row like we had. Some other results from the workshop were that:

  • Bad ideas can trigger good ideas if they are focused on the goal
  • Keep the group size small since it is difficult to manage large groups
  • The scribe was not able to be creative, so the most creative person in the group should probably not take notes all the time


We also discussed the impact of different personalities, and the conclusion was that it probably was quite high, although it is not an easily controlled parameter in a session like this. The idea of getting a mix of time to think on your own and time to share ideas with each other resonated well with me though. Also, a beer or two might be appropriate at some point :-)

After dinner it was time for yet another session; no time is wasted at Let's Test indeed. I attended “A tester's walk in the park” with Illari Henrik Aegerter. I did not know what to expect, but I thought that at least I would get some fresh air. Inspired by the old Greek philosophers' peripatetic school of thinking, the group took a walk in the park discussing different topics collected from the old “Tree of questions” (which turned out to be more of a bush, really).

To explore more or to return and report, that is the question.

The format was really enjoyable, and walking around discussing definitely has its advantages, such as a natural flow between different conversations and groups, more comfortable silences, and it being easier to listen to what the other person is actually saying since you don't have to look them in the eyes all the time (the last one is especially true for introverts, I would think). I really enjoyed this session due to its relaxed format. It felt like a natural part of the Let's Test experience, with some additional setup that sparked interesting conversations. Bonus points for Illari's outfit and the beverage provided at the end for thirsty peripatetics.

Cows doing some beta testing (joke only available in Swedish)

The evening ended with some interesting discussions on schools of testing and exploratory testing. More of that tomorrow please.

Over and out!

10 years is not the same as 10 years

Let's say you are recruiting for a position as a software tester on your team. You have two final candidates who are very similar in competence. They both have 8 years of experience within software testing; the only difference is how it is distributed:

  • Person A has worked 8 years at the same company, a company in the same business domain as yours.
  • Person B has worked for 4 different companies, approximately 2 years at each. None of the companies were in the same business domain as yours.

From a pure experience point of view, who would you choose?

For me, it would come down to what type of tester I needed for my team. If the team was already full of ideas and initiatives on how we could change our ways of working, but lacking in domain knowledge, person A would be my pick. But in the opposite situation, where fresh thoughts and ideas were needed, person B would get the nod.

Sounds simple enough, but it gets more complex than that. Person A has lived through the same organization for a long time and has probably experienced a lot of re-organizations, improvement initiatives etc. This person has probably learned the hard way what worked and what did not. Person B might have been involved in the startup of many new initiatives and improvements, but might not have stayed long enough to actually find out which ideas worked and which did not. On the other hand, just because some initiatives did not work at person A's workplace, they might work just fine in another environment, something that person A might not acknowledge.

So you can argue back and forth like this forever, trying to find the optimal hire for your team. But basically what I'm trying to say with this post is:

  1. Number of years of experience only tells us a small part of the story. My 10 years is not the same as your 10 years.
  2. What the person has experienced and how the person handled it is much more important than any quantitative measure of experience.
  3. Context matters, as usual. What does your team need right now?


State of Testing survey 2015

The State of Testing survey is back. I liked the idea of the first survey when it hit my Twitter feed a year ago. It offered a chance to provide some kind of report on the current state of the testing business. This year, I look forward to the results even more since there is something to compare with (i.e. last year's survey). It is always hard to judge the statistical significance of these kinds of surveys since the samples are not chosen at random. However, the longer this keeps going, the more participation will hopefully increase over time, and the more certain the conclusions we will be able to draw from it. Also, it will be interesting to monitor the trends over time as our business changes. So, why not take part and make the survey a little bit better than it would have been without your participation?

http://qablog.practitest.com/state-of-testing/ 

Balanced automation through Toblerone

Disclaimer: This is not a suggestion of a best practice. It's a heuristic that may be useful in the right context.

Striking the balance between different levels of checks can be tricky when working with automated regression checks. At the lowest level we have the unit checks, which check small portions of the code, and at the higher levels the system checks and system integration checks, which check big pieces of the whole system at once. It might be tempting to always go for the higher-level checks since they “check that the whole system works”. However, these kinds of checks are often brittle and require a lot of maintenance. Unit checks, on the other hand, are small and lightweight and fairly easy to maintain. They usually don't require that much of a complex test environment either. So how can I strike a good balance between different levels of checks? By letting those small and cheap unit checks exercise most of the variations and creating a few high-level checks that exercise the main flows of the system. This way you won't get overwhelmed by maintaining all your precious checks. There is an existing model called the test automation pyramid that suggests exactly this. It comes in different flavours with different names for the layers, but the general idea is the same. Read more about it here.

One variant of the test automation pyramid
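As a sketch of what “pushing tests down the stack” can look like in practice: many cheap, parametrized unit checks cover the input variations, while only a few high-level checks cover the main flows. The function and scenario here (parse_amount, a checkout total) are made up for illustration:

```python
# A sketch of pushing variations down the stack; parse_amount is a made-up
# example function, defined here only so the checks have something to exercise.
import pytest

def parse_amount(raw: str) -> float:
    """Hypothetical unit under test: parse a user-entered amount."""
    return float(raw.strip().replace(",", ""))

# Many cheap, fast unit checks cover the input variations...
@pytest.mark.parametrize("raw, expected", [
    ("10", 10.0),
    ("10.50", 10.5),
    (" 10 ", 10.0),
    ("1,000", 1000.0),
])
def test_parse_amount_variations(raw, expected):
    assert parse_amount(raw) == expected

# ...while only a few high-level checks exercise the main flows of the whole
# system (imagine this driving a UI or an API instead of a local function).
def test_checkout_main_flow():
    total = parse_amount("1,000") + parse_amount("10.50")
    assert total == 1010.5
```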

This model tells us something about the relative amounts of checks at the different levels, but not how often they are run. Since high-level checks often take longer to run and often require a more complex test environment, it is probably not a good idea to try to run them as often as the unit checks. If you try, you will probably either run your unit checks too seldom or your high-level checks too often. How can a check be run too often, then? Well, if the checks require a complex test environment you might have to invest in parallel environments, which could be very expensive and not worth it. If you have some kind of criteria connected to the checks so that nothing can be committed until they pass, you might have created a monster of a slow development process which will frustrate everybody. Also, it takes time to investigate the results of the checks, so you might be left with a big pile of results to investigate and find it hard to keep up.

So, it might be a good idea to run the unit checks more often and the high-level checks less often, for instance during a nightly build or even over the weekend. In short we have a heuristic that tells us that “the higher the level of the check, the more seldom it is run”. If we want to illustrate this heuristic we can imagine stretching out the test automation pyramid along a time axis, creating a prism. Then imagine inserting holes in the prism, each hole illustrating a time period where a check is not run. Given the aforementioned heuristic, we will have the biggest holes at the top and no holes at the bottom. So what do we end up with? Something looking like the (delicious) chocolate bar Toblerone.

The Toblerone shape
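In practice, the Toblerone heuristic might translate into a pipeline that picks check suites based on what triggered the build. A minimal sketch, where the suite and trigger names are my own and not tied to any particular CI tool:

```python
# A minimal sketch of the Toblerone heuristic as pipeline logic: the higher
# the level of a check, the less often it runs. Suite and trigger names are
# made up, not tied to any particular CI tool.

SCHEDULE = {
    "commit":  ["unit"],                           # every commit, no holes
    "nightly": ["unit", "integration"],            # small holes during the day
    "weekend": ["unit", "integration", "system"],  # big holes, filled over the weekend
}

def suites_to_run(trigger: str) -> list[str]:
    """Return which check suites to run for a given build trigger."""
    return SCHEDULE.get(trigger, ["unit"])

if __name__ == "__main__":
    for trigger in ("commit", "nightly", "weekend"):
        print(trigger, "->", suites_to_run(trigger))
```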

I find the Toblerone model useful when thinking about balanced automated regression checking, and I hope someone else does too. The biggest drawback is the craving for chocolate it always brings.
