An integration project involves some standard documentation, such as requirements, test cases, and architecture diagrams, as well as information to help the business manage and support the integration going forward. This support material can take the form of a troubleshooting guide or operations manual. Today’s post is about a methodology we use in our projects to ensure high quality: Failure Modes and Effects Analysis (FMEA). The FMEA is a critical tool we use at Dispatch as part of our DIVE project management methodology to ensure we design robust integrations and provide comprehensive documentation to help our clients manage them in a wide variety of situations.
What Is an FMEA?
FMEA is a process analysis tool that was created by the US military in the 1940s and is now widely adopted in aerospace, automotive, electronics manufacturing, and many other sectors where quality assurance is critical. It is a step-by-step approach to reviewing a system, product, manufacturing process, or service. There are two fundamental components of an FMEA:
- Failure Modes: these are the ways (or modes) in which a system might fail, and the circumstances in which these failures may occur. Failures include any errors or defects that could negatively affect the customer.
- Effects Analysis: this is the study of the impacts or consequences of those failures.
The FMEA process is a powerful way to review and improve a design before implementation. At its essence, an FMEA is about asking the simple questions of “what might go wrong and under what circumstances,” and “if a failure of this type occurs, what bad things might happen?”
What Does This Have to Do with Integrations?
Enterprise integrations and workflow automation solutions operate within complex systems that typically have many variables that can cause things to go wrong. Factors that can cause failures include upstream and downstream application availability, data quality, user error, file corruption, user credential expiration, and dozens more.
It is common for system designers to focus on how the application works when everything is working well. The so-called “happy path” is the obvious way to start an integration project design. However, if the design work stops there, terrible things may happen in production if something deviates from that happy path, and customers can end up not very happy at all.
For integration projects at Dispatch, an FMEA starts with a prioritized list of everything that a diverse group of stakeholders (users, developers, managers, IT experts) could reasonably anticipate might go wrong at every step in an integration process. This team describes the impact of each failure on the business, users, upstream and downstream applications, and other stakeholders. Finally, the team works through whether modifying the integration system design could prevent the failure, and if not, how the business could detect the failure when it happens and resolve it, or at least mitigate any impact.
An Integration Is a System
Many people underestimate the complexity involved in data integration between enterprise systems and view it as a magical event where data is “poofed” from one system and instantly shows up in another. Of course, a developer ideally wants the user to experience the integration as exactly that magical. In reality, integration is a complex system that needs to work under a variety of circumstances.
Integrations are, at their essence, a type of process flow, with a series of steps, actions, and decisions that might be triggered depending on various conditions. In production, each step might fail in one or more ways: because of an issue that was not discovered during testing, a change to the source or target system after go-live, expiry of the login credentials for the user account the integration uses, and so on. An FMEA begins by listing each step, trigger, and action. For each step, you try to anticipate all of the reasons why it could fail. Once you have a list of all reasonable potential failures, you specify the:
- Effect (what impact does the failure have on the business, users, and other stakeholders)
- Detection Method (how the user will know the failure happened)
- Severity (will this cripple the business process, or is it a minor issue with no tangible impact)
- Probability (is this something that could happen often, or is it a failure that would only occur under specific circumstances or use-cases)
- Resolution (how does the business recover from the error that has just happened, and how would they stop it from happening again)
- Resolution owner (whose job it is to fix the problem)
- Prevention options (can this failure be eliminated through a different design or process)
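The fields above map naturally onto a simple record type. The Python below is a minimal illustrative sketch, not Dispatch’s actual tooling: the three-point `Level` scale and the severity-times-probability `priority` score are assumptions made for this example, one simple way to produce the prioritized list of failures described earlier. The ratings on the two sample entries are likewise illustrative.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Ordinal rating used for both severity and probability (assumed 3-point scale)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class FmeaEntry:
    """One row of an FMEA worksheet, mirroring the fields listed above."""
    step: str                      # integration step, trigger, or action being analyzed
    mode: str                      # how the step can fail
    effect: str                    # impact on the business, users, and other stakeholders
    detection_method: str          # how the user will know the failure happened
    severity: Level
    probability: Level
    resolution: str                # how to recover and stop it from happening again
    resolution_owner: str          # whose job it is to fix the problem
    prevention_options: list[str]  # design or process changes that could eliminate it

    @property
    def priority(self) -> int:
        # Hypothetical scoring: frequent, high-impact failures rise to the top.
        return int(self.severity) * int(self.probability)


def prioritize(entries: list[FmeaEntry]) -> list[FmeaEntry]:
    """Return entries ordered from highest to lowest priority."""
    return sorted(entries, key=lambda e: e.priority, reverse=True)


# Illustrative entries; the ratings are example values, not measured data.
entries = [
    FmeaEntry(
        step="Authenticate to System A",
        mode="Integration account credentials expire",
        effect="Every run fails until the credentials are renewed",
        detection_method="Expiry warning sent ahead of the expiry date",
        severity=Level.HIGH,
        probability=Level.LOW,
        resolution="Renew the credentials and rerun the integration",
        resolution_owner="IT administrator",
        prevention_options=["Automate credential renewal"],
    ),
    FmeaEntry(
        step="Send user record to System B",
        mode="User record in System A is missing a field mandatory in System B",
        effect="No record created in System B; downstream notifications fail",
        detection_method="Integration validates records and alerts the System A owner",
        severity=Level.MEDIUM,
        probability=Level.HIGH,
        resolution="Fix the record in System A; the next run picks it up",
        resolution_owner="System A user",
        prevention_options=["Make the field mandatory in System A"],
    ),
]

for entry in prioritize(entries):
    print(entry.priority, entry.mode)
```

Here the frequent missing-field failure (score 6) outranks the rarer but more severe credential expiry (score 3). A team could just as reasonably weight severity more heavily; the point is only that the ranking rule is explicit.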
For example, if a record in System A needs to be transmitted to System B, but the record does not include a value for a field that is mandatory in System B (say, an email address), the FMEA entry for that failure type might be:
|Mode|Effect|Detection Method|Probability|Impact Severity|Resolution|Resolution Owner|Prevention Options|
|---|---|---|---|---|---|---|---|
|User record in System A is missing a field that is mandatory in System B|No record is created in System B. Potential for data loss. Potential business impact: failed payroll notifications, failed tax slip delivery, employee communication failures|The integration must detect mandatory fields that are null or malformed. If any are found, it automatically sends a notification to a System A representative|High (manual data entry is subject to user error)|Medium (data loss of individual records impedes data quality in System B)|Automatically alert personnel to update the record in System A to add the email address. The integration will process the record on the next run.|System A user|Modify the configuration of the upstream system to prevent incomplete record creation. Modify the integration to reject incomplete or malformed records and alert users.|
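The detection method in this table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `REQUIRED_FIELDS`, the deliberately simple email pattern, and the `notify` callback are hypothetical stand-ins for System B’s real schema and whatever alerting channel the business uses. Invalid records are held back rather than sent, which matches the resolution in the table: once the System A user corrects the record, the next run processes it.

```python
import re
from typing import Callable, Iterable

# Hypothetical mandatory fields for System B; the real list comes from System B's schema.
REQUIRED_FIELDS = ("employee_id", "name", "email")

# Deliberately simple format check; real rules should mirror System B's own validation.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

Notifier = Callable[[dict, list[str]], None]


def is_blank(value) -> bool:
    """Treat None and whitespace-only values as missing."""
    return value is None or str(value).strip() == ""


def find_problems(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record can be sent."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if is_blank(record.get(name))]
    email = record.get("email")
    if not is_blank(email) and not EMAIL_PATTERN.fullmatch(str(email).strip()):
        problems.append("malformed email")
    return problems


def partition_records(records: Iterable[dict], notify: Notifier):
    """Split records into ones safe to send to System B and ones held back.

    Held-back records are reported through `notify` and left unchanged in
    System A, so a later run picks them up once they are corrected.
    """
    ready, held = [], []
    for record in records:
        problems = find_problems(record)
        if problems:
            notify(record, problems)
            held.append(record)
        else:
            ready.append(record)
    return ready, held


# Usage: collect alerts in a list instead of sending real notifications.
alerts = []
records = [
    {"employee_id": "E1", "name": "Ana", "email": "ana@example.com"},
    {"employee_id": "E2", "name": "Raj", "email": ""},
    {"employee_id": "E3", "name": "Li", "email": "li-at-example.com"},
]
ready, held = partition_records(
    records, lambda rec, probs: alerts.append((rec["employee_id"], probs))
)
print([r["employee_id"] for r in ready])  # ['E1']
print(alerts)
```

Validating before sending, instead of letting System B reject the record, means the alert names the actual problem and reaches the person who owns the fix.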
The Value of the FMEA for Integration Projects
The FMEA is a vital tool to help design integrations that eliminate as many opportunities for error as possible, in particular those errors with a high probability of occurrence and high or severe business impact.
In many cases, the cost to fully error-proof an integration is too high, especially for errors anticipated to have low impact or to be infrequent. In these cases, an FMEA helps ensure that the integration builds in logic so that critical failure points will not go undetected. Of course, it’s always better to prevent an issue from arising in the first place, but when a problem happens, you want to know that it is happening. By specifying (and testing) whom the integration will notify of a critical error or potential failure, and how, you can avoid “silent failures” that are only noticed once it’s too late. A good analogy is the “low fuel” warning light in a car: it alerts the driver before the tank runs dry, not after.
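The “low fuel” idea applies directly to a failure mode mentioned earlier: expiry of the login credentials for the integration’s user account. The Python below is a minimal sketch that assumes the expiry date is known in advance (not every system exposes it) and uses a hypothetical 14-day warning window; the `notify` callback again stands in for whatever alerting channel the business uses. The key design point is that every non-OK state produces an alert, so the failure cannot be silent.

```python
from datetime import date, timedelta
from typing import Callable

# Hypothetical warning window: long enough for the owner to renew before expiry.
WARNING_WINDOW = timedelta(days=14)


def credential_status(expires_on: date, today: date) -> str:
    """Classify the integration account's credentials, fuel-gauge style."""
    if today >= expires_on:
        return "expired"    # the failure itself: runs will stop authenticating
    if expires_on - today <= WARNING_WINDOW:
        return "expiring"   # the "low fuel" warning: act before it fails
    return "ok"


def check_credentials(expires_on: date, today: date, notify: Callable[[str], None]) -> str:
    """Run the check and alert the resolution owner on any non-OK status."""
    status = credential_status(expires_on, today)
    if status != "ok":
        notify(f"Integration credentials {status}: expiry date {expires_on.isoformat()}")
    return status


# Usage, e.g. at the start of each scheduled run:
check_credentials(date(2024, 3, 10), date(2024, 3, 1), print)
# prints: Integration credentials expiring: expiry date 2024-03-10
```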
In some cases, no notification is possible, such as when a user selects an incorrect value in one system, which results in a valid (but wrong) output in another system. The integration may not be able to prevent or resolve these issues, but it can still be valuable for the business to anticipate that they might occur. The point is that a well-constructed FMEA leaves fewer “unknown unknowns,” to quote Donald Rumsfeld.
Another purpose of the FMEA is to give the customer’s technical team the information they need to understand the significance of various error messages and quickly determine the path to resolution. The FMEA document can help identify the system that is the root cause of the issue (regardless of which system raised the notification) and can reference an operations manual for step-by-step resolution instructions. In this way, an FMEA can become a troubleshooting guide for the technical team supporting integrations in production.
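Used as a troubleshooting guide, the FMEA behaves like a lookup table from an error to its root-cause system, resolution owner, and operations manual section. The Python below is a hypothetical sketch: the error codes, owners, and manual section titles are invented for illustration, and it assumes the integration tags its error notifications with such a code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Playbook:
    root_cause_system: str   # system where the problem originates
    resolution_owner: str    # whose job it is to fix it
    runbook_section: str     # pointer into the operations manual


# Hypothetical error codes drawn from FMEA entries.
FMEA_INDEX = {
    "MISSING_MANDATORY_FIELD": Playbook(
        "System A", "System A user", "Ops manual: Correcting incomplete records"
    ),
    "SYSTEM_B_UNAVAILABLE": Playbook(
        "System B", "System B administrator", "Ops manual: Downstream outages"
    ),
}

# Fallback for failures the FMEA did not anticipate.
UNKNOWN = Playbook("Unknown", "Integration support team", "Ops manual: Escalation")


def triage(error_code: str) -> Playbook:
    """Look up the FMEA entry for an error; unrecognized errors go to escalation."""
    return FMEA_INDEX.get(error_code, UNKNOWN)


print(triage("MISSING_MANDATORY_FIELD").runbook_section)
```

Routing unrecognized codes to an explicit escalation path, rather than dropping them, keeps unanticipated failures visible; each one is a candidate for a new FMEA entry.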
The FMEA serves as a risk reduction, mitigation, and resolution tool to make our integrations as robust as possible. Companies depend on the integrations for business-critical processes. The FMEA approach is an indispensable way to build confidence that businesses can rely on their integrations in production.
About Dispatch Integration:
Dispatch Integration is a software development and professional services firm that develops, delivers, and manages advanced data integration and workflow automation solutions. We exist to help organizations effectively deal with the complex and ever-changing need to integrate data and optimize end-to-end workflows between cloud-based, mission-critical applications.
Ben Lemire is an experienced Business Analyst and Data Integration Specialist. He specializes in data migration, developing test plans, leading and executing UAT, and writing operations manuals and integration architecture documents.