Attending the Informatica analyst conference last week strengthened my point of view that the lines between application integration and data integration are blurring, if not disappearing. Informatica described an architectural hub-and-spoke data integration pattern that one of their customers implemented using PowerCenter, coupled with persistence and other complementary technology. Informatica is now productizing that pattern in their Data Integration Hub (DIH) solution.
The classic Application Integration (AI) functionalities of publish/subscribe, canonical message models, routing, brokering, and orchestration are being implemented in the Data Integration (DI) world, blurring the lines between the two integration domains. Data formatting, transformation, and enrichment are features the two domains have always shared, because at the heart of every application programming interface (API) call is data. More recently, Change Data Capture (CDC) has brought real-time data messaging to the DI world.
The primary functional difference between AI and DI is the interface layer: AI interfaces at the API level, whereas DI interfaces at the database level. The primary non-functional difference is how data volume is handled. An easy illustration is 1,000 records being sent between applications. An AI scenario would represent those records as individual messages, sent 1,000 times. In a DI scenario, those 1,000 records would be sent in a single message. DI scenarios are therefore a poor fit for application integration Enterprise Service Buses (ESBs), simply because ESBs are not engineered to process large data sets in each interaction.
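A minimal sketch of that volume difference, using a hypothetical send() function in place of a real transport (a JMS/AMQP producer on the AI side, a bulk load on the DI side); the names and counts are illustrative only:

```python
import json

# Hypothetical transport: in practice this would be a message producer (AI)
# or a bulk file/SQL load (DI); here it simply counts interactions.
def send(payload: str) -> None:
    send.calls = getattr(send, "calls", 0) + 1

records = [{"id": i, "name": f"customer-{i}"} for i in range(1000)]

# Application integration style: one message per record, 1,000 interactions.
for record in records:
    send(json.dumps(record))
print(f"AI style: {send.calls} messages")   # AI style: 1000 messages

# Data integration style: the whole record set travels as a single payload.
send.calls = 0
send(json.dumps(records))
print(f"DI style: {send.calls} message")    # DI style: 1 message
```

The logic in each branch is trivial; the point is the interaction count an ESB would have to absorb under each approach.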
Q: So if both approaches have their place, why are the lines blurring?
A: The technology they are being implemented in.
If a data integration product can't call REST and SOAP APIs, or an application integration product can't interact directly with a database, neither vendor will get far in today's IT landscape. Some of the larger vendors handle both approaches, but with different product sets, meaning customers need to spend more on software licensing to cover the two scenarios. Other vendors focus on one or the other, with just enough overlap in each that they can claim to be all things to all people.
Wouldn’t it be great if data transformations written in an ESB were just as applicable in an ETL job, or vice versa? Wouldn’t it be great if integration specialists didn’t need to know and support multiple product sets in their environment that do similar, but different, things? Wouldn’t it also be great if we could reduce the number of software licenses that need to be negotiated, purchased, and maintained?
The obvious role of the DIH is to tackle the integration hairball. Even with tooling, data integration has long been point-to-point: Extract, Transform, Load (ETL) implies one source and one target. The DIH lets multiple integration flows re-use canonical data in a publish/subscribe paradigm, removing the point-to-point nature of traditional data integration. A single extract can now serve multiple loads, because different transforms can be applied depending on the target, as the sketch below illustrates.
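A rough publish/subscribe sketch of that pattern; the Hub class, topics, and handlers here are illustrative assumptions, not Informatica DIH APIs, and are meant only to show one canonical extract being re-used by several target-specific transforms:

```python
from typing import Callable

class Hub:
    """Toy hub: routes canonical records to every subscriber of a topic."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Callable[[dict], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, canonical_record: dict) -> None:
        # Each subscriber applies its own transform/load against the same
        # canonical record; the extract from the source happens only once.
        for handler in self.subscribers.get(topic, []):
            handler(canonical_record)

hub = Hub()
hub.subscribe("customer", lambda r: print("warehouse load:", r["id"]))
hub.subscribe("customer", lambda r: print("CRM sync:", r["name"].upper()))

# One extract from the source system, consumed by every subscribed target.
hub.publish("customer", {"id": 42, "name": "Acme Corp"})
```

Adding a new target system becomes a new subscription with its own transform, rather than another point-to-point extract.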
My only question is: how long will it take for the Data Integration Hub to evolve into a Data Services Bus (DSB) to run alongside, in, or below the Enterprise Service Bus? Hub-and-spoke integration went the way of the dinosaurs when it became the single point of failure in a distributed environment.