The Agile Apprentice
In a previous post I talked about how work in progress limits act as a safety net, stopping queues forming. In this post I present data from when this happened in a software delivery team I worked with.
Imagine you are measuring the journey time between Leeds and Sheffield. If cars take up 50% of the space on the motorway, it might feel busy, but traffic should flow and the journey should take around an hour. Now imagine cars take up 95% of the space. The traffic would be crawling. The journey could now take 3 hours, an increase of 200%.
The same thing occurs with lead times for software delivery teams. If the team is over-utilised, new work can’t enter the system and must queue, which increases lead times. The higher the utilisation, the longer the queues and the longer the lead times, as illustrated below.
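To get a feel for the shape of that relationship, here is a minimal sketch in Python. It assumes the simplest textbook queueing model (a single queue with random arrivals and random service times, known as M/M/1), which is my assumption and not something measured from the team. In that model the average time in the system grows in proportion to 1 / (1 - utilisation). The function name is hypothetical, and the exact multipliers will not match a real motorway or a real team; the point is how sharply the curve bends as utilisation approaches 100%.

```python
def lead_time_multiplier(utilisation: float) -> float:
    """Average time in the system relative to the time with no queueing.

    Assumes an M/M/1 queue, where time in system = service time / (1 - utilisation).
    This is an illustrative model, not data from the team.
    """
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be at least 0 and below 1")
    return 1 / (1 - utilisation)


if __name__ == "__main__":
    # Compare a busy-but-flowing system with a nearly saturated one.
    for u in (0.5, 0.8, 0.9, 0.95):
        print(f"{u:.0%} utilised -> lead time x{lead_time_multiplier(u):.1f}")
```

Under this assumption, going from 50% to 95% utilisation multiplies the time in the system by ten, even though utilisation has not even doubled.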
Our team hired new people in February, and more developers joined than testers. The new developers were keen to show what they could do and enthusiastically picked up work, exceeding work in progress limits, so work started to queue between stages. Seeing these queues visualised on the Kanban board, the rest of the team agreed not to pull more work until levels of work in progress returned to normal.
The following chart shows the number of work items queued waiting for the next stage in the workflow. Each colour represents one of our queues, such as waiting for a code review, testing or deployment. There is a spike in queue sizes around the end of February as work in progress limits were exceeded, and the queues returned to normal once the team took action.
Now compare average lead times during this period, shown by the blue line in the chart below. Around the middle of February, the team had a healthy average lead time of 5 days. This rose to 15 days by the end of February, before returning to 5 days once the team started to manage levels of work in progress again.
The system is behaving just like the motorway in the analogy. The journey time for a single work item increased by 200% due to the queues in the system. This then returns to normal once the queues reduce.
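The link between queue size and lead time can be sketched with Little's Law, which says that average lead time equals average work in progress divided by average throughput. The example below is only an illustration: the function name is hypothetical, and the work in progress and throughput figures are made up to reproduce the 5 day and 15 day lead times above, not taken from the team's data.

```python
def average_lead_time(avg_wip: float, throughput_per_day: float) -> float:
    """Little's Law: average lead time = average WIP / average throughput."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / throughput_per_day


if __name__ == "__main__":
    # Hypothetical figures: throughput held at 2 items per day in both cases.
    normal = average_lead_time(avg_wip=10, throughput_per_day=2)
    overloaded = average_lead_time(avg_wip=30, throughput_per_day=2)
    print(f"Normal WIP: {normal:.0f} days, overloaded WIP: {overloaded:.0f} days")
```

If throughput stays the same, tripling the work in progress triples the lead time. If throughput also drops, lead time gets worse still.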
However, the most important thing is the overall effect on throughput, shown in the chart below. To understand this effect, imagine how many cars are actually managing to arrive in Sheffield in both scenarios in the motorway analogy.
Once we got into March, the team were managing work in progress again. We got more done and were benefiting from the extra people in the team. At the end of February, by contrast, more people had meant more work in progress, and we actually got less overall output!