Breaking out is hard to do

It’s easy to get stuck in a way of thinking and be unable to break out of it, especially when you get so drawn into a problem as to be emotional rather than impassive. You can be drawn in too close to an issue and get so focused on a narrow set of possible but flawed solutions that you miss the real solution that’s just in your peripheral vision, and all that it takes to see that real solution is a step back.

I keep a memory of one such occasion as a reminder for when a big problem arises, in the hope that my present self will apply the lesson I learned all those years ago before it’s too late.

I was nearly a year out of university, working in a production facility with automation equipment – heavy-ish machinery for making and packaging things – on a line that was struggling with yield issues. We had a radical idea for improving some of those issues but no office hours in which to make the change. This being a start-up, we did most of the work in the evenings and tested the system in our spare time, when the production line was shut down for the night (or weekend). The change was invasive – it involved splicing a PC in between an OEM vision system and the PLC in charge of making things move. We designed our modification so that we could unhook everything within a couple of hours, which made comparative studies easy and gave us a safety net in case the line needed to be started back up at short notice while we were testing.

Our first version was a bit of a kludge, but it tested well and was stable enough for production runs, so we implemented it one weekend and ran one day’s equivalent production volume to make sure it held up. It did – yield was 100%. I went home happy, if exhausted, as the previous two months of night-time and weekend development had done nothing for my health.

I woke up on Monday morning at 4am having had nightmares about the production line. The process statistics were monitored at board level, and as it was a small firm we’d essentially taken ownership of the project (and therefore responsibility for its outcome). I knew that the new system worked, but the previous runs had been freebies – test runs whose statistics weren’t reflected at board level. Now that the statistics were real, the project took on a different feel, a new pressure that hadn’t existed before.

I went to work at 6am and sat by my phone waiting nervously for a call signalling production disaster. I knew that my fear was irrational – I kept looking back over test logs, statistics, charts and meeting minutes and was reassured that all signs pointed to an anticlimactic morning. By 9am the phone hadn’t rung, and our engineering team meeting started. At 10am I returned to my desk and a new email.

“Can you give production a call urgently please?”

My heart sank – I phoned production for an update.

“The process isn’t working – almost every part we put in is coming out as a reject”.

Worse still, they’d had this problem since 8am and had now been running for two hours with a yield in the low teens, and no parts at all had passed the process in the past hour. This was madness – even if I fixed the issue right now, the yield figures for the day, week and even month were shot all to hell with everyone watching.

I went over and watched the process running and failing, time after time. We couldn’t understand it – I couldn’t think of a single failure mode of the new system that would cause this output. I was sleep-deprived, emotionally drained from weeks of late nights and early mornings, and suddenly very self-conscious with a room full of production staff looking at me for answers. My head ran through ever more unlikely scenarios and eliminated them one by one, each time getting a step closer to having to shut down the line and remove the new system. I was running out of ideas. Then I heard one of the operators say:

“What’s this washer doing over here?”

A washer had been knocked off during the morning cleaning of the equipment, and without it the system was doomed to keep producing reject parts. It all clicked into place. The cause should have been obvious to an impassive, rational observer from the outset, but I was neither impassive nor rational, and, running on misfiring instinct, I was clutching at all the wrong straws. The washer was replaced, and the system turned in a respectably high yield from that point on.

My troubleshooting had become myopic and I hadn’t noticed. Having seen a change to their environment, the production staff had fallen into the same trap. Our minds had taken a mental short-cut: post hoc ergo propter hoc – after it, therefore because of it. We hadn’t considered that the problems on day 2 might be unrelated to the actions on day 1. Everyone had skipped the first page of the story and unconsciously incorporated the premise ‘yesterday’s change caused this’ into our problem model – the set of solutions that didn’t rely on yesterday’s exploits was hidden from us, and none of us had noticed. We were asking the question:

“How could the move from old to new system cause this sort of output?”

We should have been asking the question:

“What might cause this sort of output?”

From that point on, whenever a seemingly large issue comes up I instinctively think back to that episode, where a room full of people were flummoxed by a missing washer, and remember two things: that all problems should be approached impassively, and that sometimes the best route forward starts by taking a step back.
