The Best Way to Prevent Incidents

October 22nd, 2019

Organizations that put time and effort into problem management get a huge return on their investment. Although fixing incidents when they happen is important, it’s much better to stop them happening in the first place; and if you can’t do that, then at least make sure you know what you can do to minimize the impact of future incidents.

ITIL (the world’s leading best practice for IT service management) says that the purpose of problem management is “to reduce the likelihood and impact of incidents by identifying actual and potential causes of incidents, and managing workarounds and known errors.”

What are the phases of problem management?

According to ITIL 4 (the latest release of ITIL, published in February 2019), problem management has three phases:

Problem identification – which identifies and logs problems
Problem control – which analyzes problems and develops workarounds
Error control – which monitors and improves workarounds, and resolves problems if this looks cost effective

How do most organizations identify problems?

Most organizations that I’ve worked with use two methods to identify problems:

There’s been a major incident, and the organization needs to understand the underlying causes to ensure the same thing doesn’t happen again. The major incident management process focusses on resolving the incident and restoring normal operations, and then problem management kicks in to analyze what happened and what needs to be done next.
There have been lots of similar incidents. Each of them has been investigated and closed, but they may recur, and they are causing significant cumulative impact on customers or on the service provider organization. This cluster of similar incidents is usually identified by trend analysis of incident records, or by good service desk staff recognizing that something similar has happened before. Problem management activity is needed to identify the underlying cause of the incidents and decide how to prevent them in future, or at least reduce their impact.
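
To make the idea of trend analysis concrete, here is a minimal sketch in Python. It assumes you can export incident records with a service and a category; the `Incident` fields, the sample data, and the threshold of three are all hypothetical and not taken from any particular ITSM tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    # Hypothetical record shape; real ITSM tools expose different fields.
    incident_id: str
    service: str
    category: str


def find_problem_candidates(incidents, threshold=3):
    """Return ((service, category), count) pairs seen at least `threshold` times."""
    counts = Counter((i.service, i.category) for i in incidents)
    return [(key, n) for key, n in counts.most_common() if n >= threshold]


# Illustrative sample data only.
incidents = [
    Incident("INC001", "email", "mailbox full"),
    Incident("INC002", "vpn", "login timeout"),
    Incident("INC003", "email", "mailbox full"),
    Incident("INC004", "email", "mailbox full"),
    Incident("INC005", "vpn", "login timeout"),
]

candidates = find_problem_candidates(incidents, threshold=3)
for (service, category), n in candidates:
    print(f"{service} / {category}: {n} incidents")
```

Each pair that reaches the threshold is a candidate for a problem record. In practice you would probably also restrict the count to a time window, and rely on service desk staff to spot similarities that simple category matching misses.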

The trouble with these approaches is that identification comes too late. Problem management activity after incidents have happened is important, as it can help to reduce the impact of future incidents. But it’s much better for everyone if the problem can be identified before it causes any incidents instead of after it’s had a significant impact on the organization.

When’s the best time to identify a problem?

Every incident causes a loss of productivity for one or more users, and requires effort from the service provider organization. If you can identify problems before they cause incidents, then you can provide much better service to your users, and you might even reduce your own costs! This is clearly good for everyone, but it requires some planning and effort.

How to identify problems that haven’t yet caused incidents

So, how can you identify problems without waiting for them to cause incidents first? What activities, processes, or practices can result in problems being logged, analyzed, and resolved before they cause lost productivity and increased costs? Here are some practical steps you can take.

Review vendor websites and announcements

Every organization uses some third-party products as part of their IT solution. This can include:

User devices such as desktop and laptop computers, and phones
Operating system software, running on user devices and on servers
Applications, running on user devices
Commercial software, running as cloud-based services, or on your local servers
Network infrastructure, such as switches, routers, firewalls etc.
And many more…

All of these products are likely to include defects, and you can often find out about these defects before they have any impact on your users if you take the trouble to monitor announcements that the vendor makes, on their website, or via newsletters or other communications. Depending on your relationship with the vendor you may already speak to an account manager regularly. They’ll often be able to notify you of significant problems.
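
If you keep a record of which product versions you run, checking vendor announcements against it can be partly automated. The following is a rough sketch only: the `Advisory` shape, the product names, and the version strings are invented for illustration, and real vendor announcements rarely arrive in such a tidy form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    # Hypothetical shape for a vendor defect announcement.
    product: str
    affected_versions: frozenset
    summary: str


def advisories_affecting(inventory, advisories):
    """Return (advisory, matched_versions) for advisories that match what we run."""
    hits = []
    for adv in advisories:
        installed = inventory.get(adv.product, set())
        matched = sorted(installed & adv.affected_versions)
        if matched:
            hits.append((adv, matched))
    return hits


# Illustrative inventory: product name -> versions in use.
inventory = {
    "ExampleOS": {"10.1", "10.2"},
    "ExampleRouter": {"4.0"},
}

# Illustrative advisories, as if collected from vendor announcements.
advisories = [
    Advisory("ExampleOS", frozenset({"10.2", "10.3"}), "Print spooler crashes"),
    Advisory("ExampleDB", frozenset({"2.0"}), "Index corruption on restart"),
    Advisory("ExampleRouter", frozenset({"3.9"}), "Memory leak under load"),
]

hits = advisories_affecting(inventory, advisories)
for adv, versions in hits:
    print(f"Log a problem: {adv.product} {versions}: {adv.summary}")
```

Only the advisory that matches a product and version you actually run is flagged, which keeps the list of problems to log manageable. Someone still has to read the announcements and decide what each one means for your environment.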

Every time you learn about a defect in a third-party product you use, this is an opportunity to address the problem before it’s caused an incident in your environment. Things you might do include:

Develop a plan for how you’ll respond when unavoidable incidents occur, so that you can reduce the impact on your users, and on your IT organization
Understand the exact circumstances that could trigger incidents, and modify how you configure or use the product to avoid triggering them
Monitor future announcements to ensure you can apply any patches or other solutions as soon as they become available
In extreme situations you may want to consider replacing the faulty product with one that does not have the defect. Bear in mind that this is only likely to make sense if the issue is severe, is unlikely to be resolved quickly, and there is a viable alternative product.

Work closely with internal development teams

Many organizations have software development teams that develop and maintain applications they use. You need to ensure that you have a good working relationship between your operations staff and your development staff, so that you learn about issues and errors as they arise, and you can work together to plan how to manage any incidents they may cause. You should also work together to prioritize resolution of any issues and errors, to ensure that the ones with most impact are addressed in a timely manner.

Monitor user communities and social media

If you have a very large number of users, and especially if some or all of the users are outside your own company, then it’s important to monitor user communities and social media to find out about issues the users are seeing that they’ve not logged as incidents. Sometimes you’ll discover that users have developed perfectly good workarounds for themselves, and you can adopt these to help address the underlying problem – with, of course, suitable recognition of the people who contributed to the solution where that’s practical.

You can also join user communities that support third-party products that you use, and this may enable you to identify problems that are affecting other organizations before they become visible in your own environment.

Use third-party threat assessment and penetration testing services

These types of service can help you prevent security incidents, by identifying how you might be attacked, and where you might be vulnerable.

Threat assessment services are provided by organizations who monitor a wide variety of organizations looking for what kind of threats exist, and the extent to which they’re being exploited. They can provide you with information that may help you to avoid security incidents by proactively taking defensive action, before your own organization comes under attack. Similarly, penetration testing services may identify a vulnerability in your defences that you can address before any incidents occur.

Conclusion

If you only use problem management to analyze incidents that have already happened, then you’ll always be reacting after your users have suffered. Try thinking about what might happen in the future and you can get ahead of problems, and deliver much higher value to your users and your customers, often with a reduction in your own overall costs.

If you’ve other ideas for how to identify problems before they cause incidents then please share them here – and if I ever update this blog I’ll be happy to include them – with suitable recognition for whoever contributed.

About the Author

Stuart Rance

Stuart is an ITSM and security consultant, trainer, and author who has worked with clients in many countries, helping them create business value for themselves and their customers. He was the author of the 2011 edition of ITIL® Service Transition and lead author of RESILIA™ Cyber Resilience best practice published in June 2015. Now that his children have all left home, he has plenty of time on his hands for contributing to our blog – lucky us!
