The best fault finding tool you'll ever need!

Pictured above is the most powerful fault finding tool you'll ever need - you just need to put it to good use. When you've installed a system (in fact it could be anything) and it doesn't work, here are some general skills you can use to find the problem.

There are two ways to find the cause of a problem: through knowledge and experience, or by considering the changes and distinctions. You would rely on the latter if there is insufficient data or you don't have the former at your disposal.

First and foremost, you need to be able to describe the problem in terms of an object and a deviation. In other words, what has changed (or what doesn't work) in/on what? This makes life easier when communicating with others. For example, "center light bulb on panel A does not light up" makes more sense than "light not working".

When thinking about the changes and distinctions, there are a number of questions you need to ask. These questions and their answers are best listed in a table:


          | ..is          | ..is not
What?     |               |
Where?    |               |
When?     |               |
Extent?   |               |


So, the questions you need to ask are:

What is the symptom?

What other potential symptoms could we expect to see but don't?

Where is the symptom occurring (both geographically and within the components of the system)?

Where else could the symptom occur but it isn't occurring?

When (i.e. on which date and at what time) was the system installed and when (date and time) was the symptom first seen?

Does the symptom occur only at certain times, or was it present and then disappeared (i.e. does it come and go at certain times or in some form of a pattern)?

If one is dealing with widgets or like-elements of a system, think also about the extent: in other words what is the percentage of failure and what is the rate and trend of failures?
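
The extent question comes down to simple arithmetic. Here is a rough sketch in Python, purely as an illustration: the figures (200 installed widgets and the weekly failure counts) are made up, and the function names are my own rather than part of any standard method.

```python
# Hypothetical data: 200 widgets installed, failures logged per week.
installed = 200
weekly_failures = [2, 3, 5, 8]


def failure_percentage(failed, total):
    """Share of units that have failed, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * failed / total


def failure_rate(counts):
    """Average number of failures per period."""
    return sum(counts) / len(counts)


def failure_trend(counts):
    """Change in failures from each period to the next."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


percentage = failure_percentage(sum(weekly_failures), installed)
rate = failure_rate(weekly_failures)
trend = failure_trend(weekly_failures)

print(percentage)  # 9.0 (percent of installed widgets)
print(rate)        # 4.5 (failures per week)
print(trend)       # [1, 2, 3] (failures are accelerating)
```

A trend that keeps growing, as in this made-up data, is itself a fact worth writing into the "Extent?" row of the table.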

Remember, you're not trying to solve the problem here; you're only gathering data about the symptoms.

Once you've considered the above questions, you can come up with a list of possible root causes. Each of these root causes needs to be tested against each of the answers in the table. If a potential root cause does not explain the data you've written down in the table, it can probably be eliminated as a root cause.
If you are familiar with Fishbone diagrams and 5-Whys (as I am), you'll find the above extremely useful and complementary.
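
The elimination step described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the symptoms, the candidate causes and the function name `surviving_causes` are all hypothetical, and it assumes each cause can be reduced to the set of symptoms it would produce. A cause survives if it explains everything in the "..is" column and predicts nothing from the "..is not" column.

```python
# Hypothetical facts from the is / is-not table.
observed = {"panel A center bulb dark"}          # the "..is" column
not_observed = {                                 # the "..is not" column
    "panel A other bulbs dark",
    "panel B bulbs dark",
}

# Hypothetical candidate causes, each mapped to the symptoms it would produce.
candidates = {
    "center bulb blown": {"panel A center bulb dark"},
    "panel A fuse blown": {
        "panel A center bulb dark",
        "panel A other bulbs dark",
    },
    "site power outage": {
        "panel A center bulb dark",
        "panel A other bulbs dark",
        "panel B bulbs dark",
    },
}


def surviving_causes(candidates, observed, not_observed):
    """Keep only the causes that are consistent with both columns."""
    survivors = []
    for name, predicted in candidates.items():
        explains_is = observed <= predicted              # covers every "is" fact
        contradicts_is_not = not predicted.isdisjoint(not_observed)
        if explains_is and not contradicts_is_not:
            survivors.append(name)
    return survivors


survivors = surviving_causes(candidates, observed, not_observed)
print(survivors)  # ['center bulb blown']
```

The fuse and the power outage are ruled out because they would also darken bulbs that are in fact still working, which is exactly what the "..is not" column is for.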

You could end up with two or more potential root causes. If this happens, the only way to confirm which is right is to test both (or all) of them. This may not always be practical, and this is when you may need to call in some expert help.

What I've described above is nothing new. It is taught to IT professionals as part of ITIL and was originally developed by Dr. Charles Kepner and Dr. Benjamin Tregoe - more details here. I would also suggest reading their book The Rational Manager, which is on my must-read list.

Lastly, let me quote my more famous (yet fictional) namesake: "when you have eliminated the impossible, whatever remains, however improbable, must be the truth".


Image Credit: Patrick J. Lynch, medical illustrator [CC BY 2.5 (http://creativecommons.org/licenses/by/2.5)], via Wikimedia Commons.
