Twelve Questions to Consider When Defining Requirements for a Data Integration Project

Mar 27, 2023

There is a truism that most Business Analysts try to take to heart – to get to the right solution, you’ve got to start by asking the right questions. We know from experience that this wisdom applies to data integration projects. Knowing the right questions to ask requires a broad understanding of the systems, processes, stakeholders, and overall business objectives that differ from project to project and client to client.

While every integration project is unique, we’ve compiled a list of questions that can serve as a helpful starting point. These questions are tailored to integration and automation projects and exclude broader questions like “What is the problem to be solved?” Many of these questions will generate discussion within the project team and spark more specific questions that will clarify the project scope, context, design parameters, and expected outcomes.

1. What environments are available?

This includes the data source(s), destination system(s), and integration platform. These questions help identify project infrastructure complexities, testing approaches, and go-live processes. For example, if development and test environments are not available, additional care must be taken to protect production data and manage test processes to avoid production issues. If development and test environments are available, you must know whether they faithfully replicate production. If they do not, you may be surprised when the solution behaves differently in production than it did in non-production systems. Even when development and test environments are available and are cloned versions of production, you must ensure adequate representative data exists to cover all test scenarios.

2. Is data movement real-time or scheduled?

Several factors can dictate this design decision, and you need a clear idea of when data is moving. Real-time or “triggered” updates move a single record when an event happens in the source system, like saving a new record, while a “scheduled” integration moves all eligible records on a set frequency, like once a day. Factors influencing this decision include API functionality and limitations, “task” usage limitations, and user experience requirements. While quick, real-time updates are usually preferred, this decision can affect how efficiently the systems operate and, in some cases, can result in additional costs. Conversely, if users expect instantaneous updates between systems, a scheduled “batch” update will not meet this requirement.
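To make the distinction concrete, here is a minimal Python sketch of the two patterns. The Record type, the function names, and the send callback are illustrative assumptions, not part of any particular integration platform.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Record:
    # Hypothetical minimal record: an ID and a last-modified timestamp.
    record_id: str
    modified_at: datetime


def on_record_saved(record: Record, send) -> int:
    """Triggered path: one event in the source system moves one record."""
    send([record])
    return 1


def run_scheduled_batch(records: list[Record], last_run: datetime, send) -> int:
    """Scheduled path: move every record changed since the previous run."""
    eligible = [r for r in records if r.modified_at > last_run]
    if eligible:
        send(eligible)
    return len(eligible)
```

In this sketch the triggered path makes one call per save, while the scheduled path makes at most one call per run regardless of how many records changed. That difference is what drives the API, task-usage, and cost trade-offs described above.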

3. What is the system of record?

In a simple, one-direction integration, this is not a concern, but if you are building a two-way integration that deals with the same record (for instance, one that pushes invoices from a sales system to a finance system and pushes payment status back to the sales system for reporting), you need to have a clear understanding of which system is the master for particular fields, potentially at various points in time. This helps you identify any validation and guardrails you need to build. Nailing this definition is a prerequisite to mapping the fields between your systems. Understanding systems of record is even more important when more than two systems are involved and if data is enriched by third-party sources. Otherwise, the integration may propagate data issues across multiple systems.
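One way to capture field-level ownership is a simple lookup that the integration consults before writing. The sketch below uses the invoice example from this section. The field names and the "sales" and "finance" labels are assumptions chosen for illustration.

```python
# Hypothetical field-level ownership for an invoice shared by two systems.
FIELD_OWNER = {
    "amount": "sales",
    "customer_id": "sales",
    "payment_status": "finance",
    "paid_on": "finance",
}


def apply_update(current: dict, incoming: dict, source_system: str) -> dict:
    """Return a copy of current with only the fields source_system owns updated."""
    merged = dict(current)
    for field, value in incoming.items():
        if FIELD_OWNER.get(field) == source_system:
            merged[field] = value
    return merged
```

With this guardrail, an update arriving from the finance system can change payment_status but cannot overwrite amount, even if the payload includes it. Ownership that changes over time would need a richer rule than a static dictionary.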

4. What records are eligible for the integration?

In some integrations, this is as simple as saying that all records of a given type are moved. For example, take every new “hire” record from the applicant tracking software and create them in the HR software. More often, there is nuance here, such as whether the worker is a contractor, a temp, a previous employee, etc. Whether some data should be excluded or trigger different workflows is an important element of many integrations. The devil is in the details.
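Eligibility rules are easier to review when they are written down as explicit conditions. The following Python sketch shows one possible shape for the hiring example. The field names (status, worker_type, previous_employee) and the rules themselves are assumptions, and a real project would take them from the agreed requirements.

```python
def is_eligible(hire: dict) -> bool:
    """Hypothetical rule set: only hired, non-contingent workers are sent."""
    if hire.get("status") != "hired":
        return False
    if hire.get("worker_type") in {"contractor", "temp"}:
        return False
    return True


def choose_workflow(hire: dict) -> str:
    """Route an applicant record to a workflow, or skip it entirely."""
    if not is_eligible(hire):
        return "skip"
    return "rehire" if hire.get("previous_employee") else "new_hire"
```

Keeping the exclusion rules and the workflow routing in separate functions makes each easier to test against the scenarios the business describes.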

5. What is your unique identifier?

A unique identifier, or key, is a value on the record that distinguishes it from all other records, like a person’s social insurance number or the system ID value of the record. When this value is present in both the source and destination system, the integration can use it to know whether to create a new record in the destination system or update an existing record. Sometimes, a unique identifier may not be apparent, or a candidate field (e.g. an email address) may not historically have been mandatory. In these cases, you may need to redefine the data models to make these fields mandatory in both systems.
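The create-or-update decision can be sketched in a few lines of Python. Here the destination system is stood in for by a dictionary keyed on the identifier, and the field name external_id is an assumption. A real integration would call the destination system's API instead.

```python
def upsert(destination: dict, record: dict, key_field: str = "external_id") -> str:
    """Create the record if its key is new, otherwise update the existing one."""
    key = record.get(key_field)
    if not key:
        # Without a key the integration cannot tell create from update.
        raise ValueError(f"record is missing required key field {key_field!r}")
    action = "update" if key in destination else "create"
    destination[key] = {**destination.get(key, {}), **record}
    return action
```

Rejecting records with a missing key, rather than guessing, is a deliberate choice in this sketch: a guessed match is how duplicate or overwritten records tend to appear.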

6. What is the expected transaction volume?

This is an important factor to understand, for both the short and long term, to ensure that the solution is scalable. In some cases, integrations can generate so much transaction volume that running during business hours without impacting system performance is impractical. Transaction volume may also drive costs, especially if you pay for API calls or tasks.
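A rough estimate is often enough to start the conversation. The Python sketch below shows the arithmetic under stated assumptions: every figure used with it is a placeholder, and the functions assume each batch costs the same number of calls as a single record would.

```python
import math


def estimate_daily_api_calls(records_per_day: int, calls_per_record: int,
                             batch_size: int = 1) -> int:
    """Calls per day, assuming each batch costs calls_per_record calls."""
    batches = math.ceil(records_per_day / batch_size)
    return batches * calls_per_record


def project_volume(current_volume: int, annual_growth: float, years: int) -> int:
    """Compound growth projection, rounded to the nearest whole call."""
    return round(current_volume * (1 + annual_growth) ** years)
```

For example, 5,000 records a day at 2 calls each is 10,000 calls unbatched, or 50 calls if the API accepts batches of 200. At an assumed 20% annual growth, 10,000 daily calls becomes 17,280 after three years.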

7. Does foundational data need to be synchronized?

Certain fields, such as dropdown lists of departments or locations, may occasionally be updated with new options. When these fields are included in an integration, you need to determine how any new, removed, or changed field values in the source system will be propagated to the destination system so that records with these new values can be transmitted successfully. A separate integration to synchronize this “foundational data” is one approach.
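At its simplest, synchronizing foundational data means comparing the list of values in each system. The Python sketch below is one illustrative approach. The function name and the "to_add" and "to_retire" labels are assumptions.

```python
def diff_picklist(source_values: set[str], destination_values: set[str]) -> dict:
    """Compare dropdown values and report what the destination is missing or has extra."""
    return {
        "to_add": sorted(source_values - destination_values),
        "to_retire": sorted(destination_values - source_values),
    }
```

Note that a plain comparison like this reports a renamed value as one addition plus one retirement. If renames need to carry existing records with them, the values need a stable code or ID behind the display label.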

8. Are we dealing with the transmission of files?

Special attention must be paid to file formats whenever an integration needs to create, move, or parse a file. File specifications must be reviewed in detail. Sample files should be requested and compared to these specifications. Credentials required to retrieve and/or load the file must also be defined, as must any encryption/decryption requirements and methods.
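Comparing sample files to the specification can be partly automated. The following Python sketch checks a delimited sample against an expected header using the standard csv module. The column names are a hypothetical specification, and real file specs usually add rules for data types, date formats, and encoding that this sketch does not cover.

```python
import csv
import io

# Hypothetical specification: the columns the file is supposed to contain, in order.
EXPECTED_COLUMNS = ["employee_id", "first_name", "last_name", "start_date"]


def check_sample(text: str, expected_columns=EXPECTED_COLUMNS, delimiter=",") -> list[str]:
    """Return a list of human-readable problems found in a sample file."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    problems = []
    if header != expected_columns:
        problems.append(f"header mismatch: expected {expected_columns}, got {header}")
    for row in reader:
        if len(row) != len(expected_columns):
            problems.append(
                f"line {reader.line_num}: expected {len(expected_columns)} fields, got {len(row)}"
            )
    return problems
```

An empty result means the sample matched the header and field count, not that the file is fully valid. It is a first pass that catches the most common mismatches before detailed review.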

9. What are the security requirements?

The sensitivity of the information in your systems will inform your approach to security. Security requirements are often underestimated, and we encourage project teams to treat any movement of data between systems as requiring a thorough security review. Considerations include:

System permissions of the integration and how credentials are managed.
The type of encryption needed, both in transit and at rest.
The kind of transactional logging required.
Who has access to sensitive data during the project and in production.
Whether data is moving from a secure, controlled source (e.g. an HR system) to a less controlled destination (a database or CSV file output).

10. Are there privacy or governance considerations?

Various types of data are subject to special care and protection, from client lists to employee names. Their handling must reflect their sensitivity and consider applicable policies, laws, and regulations. Defining the requirements for integrations dealing with sensitive data may require input from the organization’s privacy officer or a legal consultant. These requirements can also impact whether real data can be used for testing.

11. How will support work?

Integrations can fail, just like any other software. You need a way to detect these failures, prioritize the urgency, and have resources available to diagnose and fix them (including data recovery). This begins with mapping the integration process and identifying the failure points, such as invalid data, the expiry of credentials, one of the systems being offline, etc. Then work on defining who can fix each type of issue and how they will be notified. See our article “What’s an FMEA, and Why We Use It” for more detail on this. Dispatch can provide technical support for your integrations and can provide tools that allow you to monitor your integrations to detect any anomalies. See our Sentinel blog post for more information.
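The outcome of that mapping exercise can be recorded as a simple routing table. The Python sketch below uses the failure points named above. The owners, urgency levels, and retry flags are placeholder assumptions that each organization would define for itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    owner: str      # who is notified and expected to fix the issue
    urgency: str    # how quickly a response is needed
    retry: bool     # whether an automatic retry is worth attempting


# Hypothetical routing for the failure points identified during mapping.
ROUTES = {
    "invalid_data": Route("business data steward", "normal", False),
    "expired_credentials": Route("system administrator", "high", False),
    "system_offline": Route("integration support", "high", True),
}

# Anything unanticipated goes to support at high urgency.
DEFAULT_ROUTE = Route("integration support", "high", False)


def route_failure(failure_type: str) -> Route:
    """Look up who handles a failure, falling back to a safe default."""
    return ROUTES.get(failure_type, DEFAULT_ROUTE)
```

The default route matters as much as the named ones: a failure type nobody anticipated should still reach a person rather than being dropped.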

12. Are there any go-live requirements?

Business Analysts know these as “transition requirements.” They are temporary steps that need to be taken at go-live. Common examples (outside of organizational readiness requirements like training that apply to most projects) include:

Copying the “unique key” (discussed above) from one system and storing it as an “external ID” in the other system, allowing the integration to match existing records reliably. Depending on your data hygiene, this may require a lot of manual matching beforehand.
Doing an initial load of data from one system to the other. If the destination system is new to your organization, you may need to populate it with all of the relevant records from the source system before turning on the integration. Some types of integration (e.g., “full file” where all records are sent upon every instance of a scheduled integration run) would not likely require this step. Note that if you need to do an initial load of a new system to kick off an integration, it can usually be done by tweaking the eligibility rules on a custom run of the integration.
The go-live time and date may be restricted by business considerations such as system availability during working hours. Define this in advance to ensure the key players are available when needed.
Less commonly, a blackout period may be required, where one or both systems cannot be used when the integration is moved into production environments.
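The first of these steps, copying the unique key across as an external ID, can be sketched as follows. This Python example assumes records can be matched on a normalized email address and that source rows carry an "id" field. Both are illustrative assumptions, and anything that cannot be matched unambiguously is returned for manual review.

```python
def backfill_external_ids(source_rows: list[dict], destination_rows: list[dict],
                          match_field: str = "email") -> list[dict]:
    """Copy source IDs onto destination rows; return rows needing manual matching."""

    def norm(value) -> str:
        return (value or "").strip().lower()

    # Index source rows by the normalized match value.
    by_value: dict[str, list[dict]] = {}
    for row in source_rows:
        by_value.setdefault(norm(row.get(match_field)), []).append(row)

    unmatched = []
    for dest in destination_rows:
        value = norm(dest.get(match_field))
        candidates = by_value.get(value, [])
        if value and len(candidates) == 1:
            dest["external_id"] = candidates[0]["id"]
        else:
            # Blank value, no match, or more than one match.
            unmatched.append(dest)
    return unmatched
```

The size of the returned list is a quick indicator of data hygiene and of how much manual matching to plan for before go-live.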

In defining the requirements for an integration, you should expect your team to cover each of the questions above and more in order to develop a robust system architecture and a design that meets your needs for quality, budget, and timeline.

The Dispatch team can take you from integration discovery to post-implementation support, so you can do it right the first time and have stable integrations that require less maintenance over time.