📘 The Field Guide to Understanding 'Human Error'

This is the raw version of the article. For the rich version, visit my Remanso space.

Sidney Dekker - 2014

There is no root cause to a problem. Just as there is no root cause to a success. It is a set of circumstances and factors that contribute to an event.

Causes are constructed post-mortem. We will even find different causes depending on the profession, context, and time.

What we call root cause is simply the cause at which we stopped the investigation. But this investigation remains essential, and the goal of a retrospective is to dig deeper into what could have caused a human error. Rather than calling it a root cause, we can call it an interesting cause.

Preface

Old view: 'human errors' are the cause of all our problems.

[!NOTE] How often do I see a client request trying to fix the human? This old view has the merit of being the primal instinct that lies dormant in us: "Ugh, what did he do now?". Yet we could just as easily identify everything the system was missing that would have prevented the error; it just always seems too big a project to undertake.

In Theory Z, this old view corresponds to Theory X, which holds that people cannot be trusted because they are naturally lazy, and the only way to be productive is to control them.

New View: 'human errors' are the consequence of a systemic problem.

Make sure you ask all the questions that need asking. And then ask some more.

Changing the environment changes people's behaviors.

[!NOTE] What questions do we have?

The question "Really? Is this really this person's fault?", asked for as long as we haven't found a systemic problem, is the ideal one. The need to review our safety management procedures is always the obvious conclusion.

| Old View | New View |
| --- | --- |
| 'human error' is the cause of trouble | What we call 'human error' is a symptom of deeper trouble |
| 'human error' is a separate category of behavior, to be feared and fought | 'human error' is an attribution, a judgment that we make after the fact |

The quotation marks around 'human error'

It is no more than a label.

It is a judgment, an attribution we make after the fact.

Attributing something to human error is the beginning of the investigation, not the end. This is what can be exciting: making sure the thinking shifts to the structures.

A focus on 'human error' makes you do all the wrong things

Thinking in terms of 'human errors' is self-reinforcing: there is always a new story convincing enough to say that "it was indeed the person's lack of awareness." Worse, focusing on human errors prevents improvement of the rest of the system.

Finding a standard is necessary to distinguish what is a 'human error' from what is not. The problem is that this standard is easily identifiable after the fact, once retrospection is done. If the outcome had been different, the erroneous actions would have been invisible.

[!NOTE] Complexity is always easier to reduce to a 'human error'.

If an organization has investigative and disciplinary actions carried out by the same people, that is a good indicator of an "Old View" company.

Two views of 'Human Error'

Failures are always surprises. When Rémy runs his problem-solving training and shows that you need to be specific at the micro level to help people, the reflex is the same: proposing solutions that have already worked before ("film beforehand just in case", "test first to be sure"). Incidentally, about two thirds of problems are said to be due to 'human errors'.

Bad people in safe systems, or well-intentioned people in imperfect systems?

  • "You need to be more careful" is the phrase that keeps coming up.
  • "Taking that shortcut was obviously a bad decision": shortcuts are taken all the time, because we're pursuing multiple objectives at once (SQDCE). Some work and are thus invisible; others, here, become obvious.

Instead of asking who is responsible, ask what is responsible for the outcome.

People do not come to work to do a bad job

People mainly do what seemed to make sense to them at the moment the problem occurred. Is the new way of seeing things too charitable?

Even when numbers seem to speak for themselves in a Pareto chart of the people causing the most errors, we should challenge that graph by looking at the exposure those people have had. Exposure to risky situations is highly unequal, which makes it difficult to compare people with one another.
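A minimal sketch of that normalization (all names and figures are invented for illustration):

```python
# Hypothetical data: raw error counts versus hours of exposure to risky situations.
people = {
    # name: (errors observed, hours of exposure)
    "A": (12, 400),
    "B": (3, 30),
    "C": (5, 250),
}

# A raw Pareto ranking orders people by error count alone.
by_count = sorted(people, key=lambda p: people[p][0], reverse=True)

# Normalizing by exposure can tell a very different story.
by_rate = sorted(people, key=lambda p: people[p][0] / people[p][1], reverse=True)

print(by_count)  # ['A', 'C', 'B']: A looks worst on raw counts
print(by_rate)   # ['B', 'A', 'C']: B has the highest error rate per hour
```

The same chart thus blames different people depending on whether we count errors or error rates, which is exactly why the raw Pareto deserves to be challenged.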

We will rather look at the incompatibility between tasks and skills.

Individual competence in aviation is considered so critical that it must not be viewed outside the responsibility of the entire system.

The authority-responsibility mismatch

Facing an eternal conflict of objectives: "safety, regulations, speed, fuel, cost."

There is no evidence that a system approach dilutes personal accountability. In fact, second victims show just how much responsibility practitioners take for things that go wrong.

A new form of accountability is possible: letting people tell their stories rather than putting them in a defensive posture.

Investigation title:

!Error in the refactor

Identified cause:

!Bad refactor rather than bad code representation

Chapter 3 - Doing a 'human error' investigation

This chapter teaches us how to conduct an investigation in different ways, that is, a problem-solving process.

Working on the work environment requires the ability to change the system: allowing a person access to certain software, being able to add or remove someone from a meeting.

People's memory is fallible. Being able to put yourself in people's shoes at each decision juncture seems impossible to achieve in software development. We can aim to capture elements as they were, such as the design and user stories, which are already very good investigation inputs.

  1. Which cues were observed? (What did they notice or see, and what did they fail to notice that they had expected to notice?)
  2. What knowledge was used to deal with the situation? Did participants have experience with similar situations that was useful in dealing with this one?
  3. What expectations did participants have about how things were going to develop, and what options did they think they had to influence the course of events?
  4. How did other influences (operational or organizational) help determine how they interpreted the situation and how they would act?

Representing interpretations, errors, and known past experiences. → What he does applies to investigations lasting months or years; it is not replicable for us, but these are interesting lenses for reflection during problem-solving.

The debrief is an essential element in understanding the initial state of a situation.

Recording, facts and analysis

Succeeding in distinguishing facts from analyses, and therefore from interpretations: Genchi Genbutsu.

The accident example, in which documenting the aircraft's radial distance from the control tower turns out to be the wrong lens, shows that analyses are not equally relevant from every angle of view. This is where experience, and especially theory, matters in deciding what is relevant and what is not.

Logs and their timestamps are extremely important when debugging in production, and therefore the monitoring system must be clear.

The notion of resolution is interesting: to what degree are timelines precise? We rarely ask that question.
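A minimal sketch of what resolution changes (the timestamps are invented): two events less than a second apart collapse into one when the timeline only records whole seconds.

```python
from datetime import datetime

# Invented timestamps: P1 speaks, then P2 acts, 400 ms later.
e1 = datetime(2024, 1, 1, 15, 20, 23, 400000)
e2 = datetime(2024, 1, 1, 15, 20, 23, 800000)

# High-resolution timeline: two distinct events, with an order.
high_res = sorted({e1, e2})

# Low-resolution timeline (whole seconds): the two events merge,
# and the question "which came first?" can no longer even be asked.
low_res = sorted({t.replace(microsecond=0) for t in (e1, e2)})

print(len(high_res))  # 2
print(len(low_res))   # 1
```

Asking "what resolution do our logs actually have?" before reconstructing a sequence of events avoids reading an ordering into data that cannot support one.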

In the transcription data from pilots, the investigator also analyzes who speaks the most, who holds a management position over which part.

The low-resolution timeline only shows the timeline when there is communication.

!Low resolution

!High resolution 1

!High resolution 2

Like MIFA, it shows more complexity.

Why does P1 raise the issue again at 15:20:23? This could be surprise (why hasn't it been done yet?). P1 might believe that P2 needs coaching or monitoring in that task. As about 10 seconds go by, and P2 is (presumably) making the setting, P1 could be supervising what P2 is doing (15:20:32) and thus not watching other things in the monitored process.

This suggestion, in turn, might feel superfluous to P2, who was already "planning to" do it. Yet the coaching continues, in finer detail now, until P2 breaks it off, while P1 is still talking, by saying "Okay it's set," leaving the "it" unspecified. Rather than a communication loop not being fully closed, this excerpt could serve as a message from P2 about roles and a dislike of how they are being played out. "Okay it's set" may mean "Leave me alone now."

!fine-grained resolution

How do you identify "events" in your data? - The "events" that were no events

[!NOTE] Crucial question to avoid hindsight bias.

Some events are only events in retrospect: a missed opportunity, for example.

The question to ask is "Why, in their context, did this decision make sense?"

!Facts vs conclusion

This overly abrupt conclusion demands more specificity (5W, 1H): define what "Crew Resource Management" is and define the criteria for when they lost it, then locate the evidence for this in the facts that form the sequence of events.

Here, we will therefore arrive at a more precise definition of what happened:

  1. Misunderstanding of the cause of the problem,
  2. no shared objectives,
  3. corrective actions not coordinated,

due to:

  1. overlapping discussions,
  2. no responses,
  3. unnecessary repair.

There is no "root cause"

Explanatory versus change factors

The difference is that:

  • explanatory factors explain the data from one particular sequence of events;
  • change factors are levers for improvement or prevention.

What explains an event is not necessarily what will help manage or prevent the next one.