I’ve been interested in the Cynefin framework for a while and have used it to frame some of my own thoughts, but had not used it in any practical sense before.
(I’m not going to go into what Cynefin is here, as there is already a lot of material that will explain it much better than I can, such as this video from Dave Snowden.)
Following a day spent with Mike Burrows learning about AgendaShift facilitation techniques involving Cynefin, I tried to tweak one of the exercises we did that day for use in a regular sprint retrospective. Mike was also keen to credit Karl Scotland for the ideas, and of course Dave Snowden.
I tried this out for the first time recently; this is an overview of how it worked out and what I learned along the way.
Firstly, I set out a whiteboard as per the Cynefin model, with some large sticky notes explaining the domains, leaving plenty of space in each domain to stick regular sized stickies for data gathered from the team.
The retro then worked in the classic gather data, generate insights, decide what to do stages.
We split the team up into small groups – threes worked well. Working individually first, the team thought of sources of frustration or anything that could be improved, not constrained by the last sprint, just anything they could think of, and wrote them down on sticky notes – one sticky per item.
Then in the groups, we had discussions to prioritise the top three issues in each group.
Once we had completed that, we walked through the board and explained how we were going to use it.
Each of the sections on the board represents a domain of complexity, within which the issues we identified would fit. Each group decided where they wanted to place their stickies.
If the issue had an obvious solution that could be quickly agreed, it was placed in the simple domain. If an expert, such as a senior dev/tester or the product owner, would be best suited to own the action and come up with a solution, it was placed in complicated. If there were lots of possible solutions and no one right answer, it fitted in complex. And lastly, if we could not think of any possible solutions, we placed it in chaos.
Each group discussed and brought their sticky notes up to the board to place in the domain they felt each was best suited to. If we could not decide, we would place it in the middle of the board, in disorder, although we didn’t have any of those.
Once we had placed all the stickies on the board, we walked through them to discuss what they were and why they fitted the domain we placed them in. At this point we were not diving deep into solutions, just explaining the issues and why we had put them where we put them.
We explored each with some probing questions to facilitate understanding and see if the team felt the issue was in the right place; if we'd had anything in disorder, we could have tried to re-place it at this point.
Decide what to do
Once we all had a good idea about what each item on the board was and why we placed it where we did, back in groups of three, the team talked about some candidate actions to take away.
However, where the issue was on the board determined what type of action we thought about taking.
For anything in simple, we already knew what to do.
For anything in complicated, if the appropriate expert was in the team, that person could come up with an action or take it away to analyse. Otherwise we would have to identify an expert somewhere in the business to help us.
For complex issues, things got interesting: we came up with some candidate experiments.
Each action was written on a sticky and placed at the bottom of the board.
Depending on the number of candidate actions and experiments we could have done some dot voting at this point, but we had a manageable number so just walked through them all.
Thoughts for next time
When explaining the complexity domains, thinking up some examples and where they might fit helped understanding, although next time I might think more about this upfront and have some clear and concise examples ready to go.
I really liked this retro for helping a team make decisions on how to proceed. It helped us make best use of our experts for anything in the complicated domain, which can be a big source of frustration and dysfunction in autonomous teams if people are not getting a chance to use their skills due to group dynamics.
It was great to generate lots of candidate experiments and decide what to try. Deciding on candidate experiments is a part of the retro that could be developed quite a lot next time, perhaps by introducing some A3s (as we did on the day with Mike) to frame the experiment as a hypothesis, think about who it affects, what the risks and issues might be, how we track that it is being implemented and how we make it safe to fail.
However, an hour’s retro wasn’t really long enough to get into doing A3s; I think we would need 90 minutes minimum, probably two hours, to do that justice.
Overall I can see the power of running this exercise in a wider context to facilitate a larger organisational change, which is more in line with the AgendaShift exercise.
Lastly, Dave Snowden describes Cynefin as a sense-making framework rather than a categorisation framework. I'm unsure whether we were sense-making or categorising, but I think the exercise was valuable either way.
In a previous post I talked about how work in progress limits act as a safety net, stopping queues forming. In this post I present data from when this happened in a software delivery team I worked with.
Imagine you were measuring journey time between Leeds and Sheffield. If we assumed 50% of space on the motorway was taken up by cars, the motorway might feel busy, but traffic should flow and the journey take around an hour. Now imagine the motorway space is 95% taken up by cars. The traffic would be crawling. The journey could now be 3 hours, an increase of 200%.
The same thing occurs with lead times for software delivery teams. If the team is over utilised, new work can’t enter the system and must queue, causing increases in lead times. The higher the utilisation, the longer the queues, and the longer lead times get, illustrated below.
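As a rough illustration of how sharply lead times grow with utilisation, here is a toy model using the M/M/1 queueing formula, where average time in the system is service time divided by (1 - utilisation). This is my own sketch of the general principle, not something measured from the team.

```python
# Toy illustration of why high utilisation explodes lead times, using
# the M/M/1 queueing result: time in system = service_time / (1 - utilisation).
# The formula's assumptions (random arrivals, single server) are a
# simplification chosen to show the shape of the curve, nothing more.

def time_in_system(utilisation, service_time=1.0):
    """Average time a work item spends in the system (M/M/1).

    utilisation must be in [0, 1); service_time is average hands-on time.
    """
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    return service_time / (1 - utilisation)

for u in (0.50, 0.80, 0.95):
    print(f"utilisation {u:.0%}: lead time = {time_in_system(u):.1f}x hands-on time")
```

At 50% utilisation a work item takes about twice its hands-on time end to end; at 95% it takes around twenty times, which is why the motorway crawls.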
Our team hired new people in February and more developers joined than testers. The new developers were keen to show what they could do and enthusiastically picked up work, exceeding work in progress limits. Seeing these queues visualised on the Kanban board, the rest of the team agreed to not pull more work until levels of WIP returned to normal.
The following chart shows the number of work items queued waiting for the next stage in the workflow, the different colours representing the different queues we have, such as waiting for a code review, testing or deployment. There is a spike in queue sizes around the end of February as work in progress limits were exceeded, which returns to normal as the team took action.
Now compare average lead times during this period, shown by the blue line in the chart below. Around the middle of February, the team had a healthy lead time average of 5 days. This rose to 15 days by the end of February, before returning to 5 days after the team started to manage levels of work in progress again.
The system is behaving just like the motorway in the analogy. The journey time for a single work item increased by 200% due to the queues in the system. This then returns to normal once the queues reduce.
However, the most important thing is the overall effect on throughput, shown in the chart below. To understand this effect, imagine how many cars are actually managing to arrive in Sheffield in both scenarios in the motorway analogy.
Once we got into March, the team were managing work in progress again. We got more done and were benefiting from the extra people in the team. At the end of February, by contrast, more people had meant more in progress and we actually got less overall output!
In a previous post I talked about how effective limiting work in progress had been for one of the software delivery teams I worked with. The team’s discipline to stick to the limits didn’t come overnight; it was developed through experience and learning.
People tend to understand the logic of limiting work in progress on an individual basis. Once we get to a team or organisation, things are more complex. Like most things in life, limiting WIP has a benefit and a cost, and it’s important to understand both in different scenarios.
Here are some scenarios where I have seen teams call work in progress limits into question.
The Urgent Bug
An urgent bug comes in, but picking it up would take the team over the limit. The team stop to debate the limit rather than getting on with fixing the bug.
The cost of temporarily exceeding the limit is low, while the cost of not fixing an urgent bug is high. This is a simple economic decision. Once you have decided urgent bugs can exceed the limit, it’s sensible to set a policy so that next time, the team just get on and fix it.
If you are facing this frequently, there’s a high cost to exceeding the limit with too many bug fixes at once. Defining a class of service to handle urgent work, with its own limit, can allow you to manage this. Creating a separate horizontal lane, or ‘swim lane’, on the board is one way to manage urgent work that needs to be expedited. (We set our expedite limit to 1.)
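As a sketch of how a per-class limit might be modelled, here is a minimal board with a standard lane and an expedite lane limited to 1, as described above. The class and method names are my own illustration, not taken from any particular tool.

```python
# Minimal sketch of per-lane WIP limits. The 'standard' limit of 4 is
# an invented example; the expedite limit of 1 matches the post.

class Board:
    def __init__(self, limits):
        self.limits = limits                      # e.g. {"standard": 4, "expedite": 1}
        self.in_progress = {lane: [] for lane in limits}

    def can_pull(self, lane):
        """True if the lane has capacity under its WIP limit."""
        return len(self.in_progress[lane]) < self.limits[lane]

    def pull(self, lane, item):
        """Pull an item into a lane, refusing if the limit is reached."""
        if not self.can_pull(lane):
            raise RuntimeError(f"WIP limit reached for {lane!r} lane")
        self.in_progress[lane].append(item)

board = Board({"standard": 4, "expedite": 1})
board.pull("expedite", "urgent bug")
print(board.can_pull("expedite"))  # False: the expedite lane is full
```

With the limit encoded as a policy like this, the next urgent bug triggers a simple yes/no check rather than a debate.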
Blocked Work
If enough work becomes stuck, staying within the WIP limit stops the system. The limit appears to be slowing the team down; there is other available work they could be getting on with.
However, the limit should force the team to stop and evaluate what’s going on. Why is so much work being blocked? If you can identify and permanently fix whatever is causing the problem, work will flow more smoothly from then on. You have removed a source of delay, which delivers benefits for every future work item.
Furthermore, what about when blocked items become unblocked? If you ignored the limit, you suddenly have lots of context switching back to old work, and you may struggle to remember what you were doing before it got blocked!
If you can stick to the limits and fix the problems, the benefits of the limit should far outweigh the cost.
Pushing rather than Pulling
The MD needs something urgently. She goes directly to her favourite, most trusted developer. There’s a personal cost to the WIP limit: no one likes to say no, especially to senior people they have developed a relationship with.
Teams could not be blamed for choosing to exceed the limit rather than upset anyone. However, hopefully they recognise they exceeded the limit and have a conversation about how to handle this situation in future.
Classes of service can help in a similar way to the urgent bug. You could measure lead times for that class of service and set a service level agreement with the MD, based on what the data tells you, giving the MD confidence and predictability.
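One common way to turn lead time data into an SLA of this kind is to quote a percentile, e.g. "85% of items in this class are delivered within N days". The sample data and the nearest-rank method below are illustrative assumptions, not figures from the teams described here.

```python
import math

# Derive a percentile-based service level agreement from historical
# lead times using the nearest-rank method. The lead times below are
# hypothetical sample data.

def percentile(values, p):
    """Nearest-rank percentile: smallest value such that at least
    p% of the observations are less than or equal to it."""
    ordered = sorted(values)
    k = math.ceil(p / 100 * len(ordered)) - 1
    return ordered[max(k, 0)]

lead_times = [2, 3, 3, 4, 5, 5, 6, 7, 9, 14]   # days, hypothetical
sla = percentile(lead_times, 85)
print(f"SLA: 85% of items delivered within {sla} days")
```

Quoting a high percentile rather than the average means the occasional slow item doesn't make the promise to the MD look broken.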
A Natural Bottleneck
The team has 6 developers and 1 tester. The developers are getting through work quickly, then they hit the limit, as a queue of testing has formed. The team are stood around debating WIP limits again: surely they should just get on with starting the next piece of work?
Starting more work won’t result in that work being delivered faster; the system is limited by the speed at which work can be tested. Starting more work will just add to the test queue, slowing work down further, just like cars queuing at roadworks on a motorway: the more cars, the longer the queue. The limit is acting as a safety net, stopping that from happening.
The limit should force the team to stop, identify the bottleneck and find ways to alleviate it.
Often when the team point the finger of blame at WIP limits, they are thinking about the local effects on them, rather than the economics of the system as a whole.
It is said that the flutter of a butterfly's wing can ultimately cause a typhoon halfway around the world. In other words, a small change in a complex system can have dramatic effects. This is true of software delivery teams looking to improve flow, as I was reminded last year working with a new team and introducing just a few of the ideas popularised by Kanban and David Anderson.
The team’s natural state was to pull in too much work; context switching and queues started to develop. We visualised this on a board and could see the queues and avatars spread across the work. Being able to see this presented the opportunity to talk about how to improve.
We agreed to experiment with work in progress limits, starting by limiting the number of tickets to the number of developers that we had. We also talked about breaking down work further. The team were using relative sizing, so we agreed a maximum story point size; if a ticket was too big, we had to slice it into smaller chunks. Lastly, we talked about context switching and limited the number of avatars people were allowed to use. We captured all of these agreements next to our board and validated the effects by looking at data before and after the change.
The following chart was built by exporting data from our work tracking system into Excel. We recorded where work was in progress and where it was queuing, and visualised it. The coloured areas represent the average time work was actively taking place at each stage in our workflow; the white space is the average time work was queuing before each stage. These are laid out in the order the stages occur. This allows us to both calculate and visualise flow efficiency and show it in relation to cycle time, which is shown in days.
This is what it looked like the first time we measured it. Our flow efficiency was 29%, meaning that for 71% of the time a ticket was on the board, it was sat waiting in a queue. We could also see where work was spending the most time queuing. On average, it was taking us about 13 days to get a ticket from initial analysis to live and delivering value.
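The flow efficiency calculation itself is simple: active time divided by total elapsed time. The stage and queue durations below are invented to roughly reproduce the numbers above (a 13-day cycle time at 29% efficiency); the real figures came from the team's tracking system.

```python
# Flow efficiency = active (hands-on) time / total elapsed time.
# Durations (in days) are illustrative, chosen to match the shape of
# the measurements described in the post.

active_days = {"analysis": 1.0, "development": 2.0, "review": 0.3, "test": 0.5}
queue_days  = {"analysis": 1.5, "development": 3.5, "review": 2.0, "test": 2.2}

active = sum(active_days.values())           # hands-on time per ticket
total = active + sum(queue_days.values())    # end-to-end cycle time
flow_efficiency = active / total

print(f"cycle time: {total:.1f} days, flow efficiency: {flow_efficiency:.0%}")
```

Laying the durations out per stage, as the chart did, also shows exactly which queues are eating the most calendar time.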
After the change, we saw some dramatic changes in the data. Cycle time to live had reduced from 13 days to just over 6 days. Our tickets were smaller, so you would expect cycle time to reduce, but the key is the increase in flow efficiency from 29% to 43%. The reason for this is the reduction in queuing, which is clearly visualised in the chart.
When we looked at this on a cumulative flow diagram, the effect of that increase was clear. The black area on the CFD is the count of tickets we had got done, and the coloured areas are the amount of work we had in progress at any time, each colour representing a different stage in our workflow.
The changes we made did take a little while to take effect, as the tickets we already had in progress from before the change gradually made their way across the board to done, but eventually the effect became clear. When we measured throughput at the end of August, we could see that it had improved by roughly 150%, far more than the linear change that would have resulted from the reduction in the size of the tickets.
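A back-of-the-envelope check of a throughput change like this is just a before/after average. The weekly done counts below are invented purely for illustration (chosen to produce a 150% improvement); only the calculation is the point.

```python
# Hypothetical weekly throughput (tickets done per week), before and
# after the WIP changes took effect. The percentage change is
# (after - before) / before.

before = [2, 3, 2, 3]   # per week, before the changes
after  = [6, 6, 7, 6]   # per week, once the changes took effect

t_before = sum(before) / len(before)
t_after = sum(after) / len(after)
improvement = (t_after - t_before) / t_before
print(f"throughput up {improvement:.0%}")
```

If smaller tickets alone explained the change, throughput in tickets would rise only in proportion to the size reduction; a jump well beyond that points at the reduced queuing.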
We started publishing these results around our office; pretty soon other teams were asking us what we did, and slowly the same principles began to cross-pollinate around other teams.
As we learned more, we tweaked the policies about WIP and ticket sizes, reducing them further. We introduced more ideas from Kanban, such as classes of service and understanding lead time distribution.
All of this started from a short discussion about the way we work and agreement to try some changes. No one needed to work harder, no one needed to work overtime; just some simple thoughts about how to improve our system, put into practice.
The Agile Apprentice