The line between Data Integration and Application Integration is no longer fuzzy – it has finally vaporized in the Cloud; fitting.

The emergence of the Cloud, including SaaS, IaaS, and growing PaaS adoption, has brought with it standardized interfaces at the application layer, otherwise known as APIs. Traditional integration vendors that have identified their products with Data or Application integration are no longer making that distinction in the Cloud; they are simply talking about integration. The line is disappearing as the Platform-as-a-Service war heats up.

Informatica, a vendor associated with Data Integration in the on-premise world, has just released the next version of Informatica Cloud, Winter 2014, which delivers process, service, and data integration in one Cloud package. Process integration, including human-centric workflows, is part of the Winter 2014 release, as is industry-standard service integration of RESTful and Web Services APIs. Informatica has also announced new ERP adapters, which will unlock back-office ERP data for Cloud consumption.

Informatica Cloud leverages the company’s unique Vibe “map once, run anywhere” virtual data machine, allowing integration mappings to move between the Cloud and on-premise without code changes. Informatica didn’t leave its jewels on the ground: it brought its data quality and profiling capabilities into the Cloud as well. Sounds like a comprehensive data and integration platform-as-a-service.

Pervasive, now owned by Actian, dropped the data integration moniker a while ago and now positions itself simply as an integration solution provider. Actian’s new “Invisible Integration” capability allows for easy construction of integrations between standard APIs in the Cloud.

This week at Dreamforce, Salesforce1 was announced. Salesforce has brought its multiple PaaS offerings under “1” platform, creating a unified platform for building next-generation apps with new APIs and advanced capabilities for integration with apps, data, processes, devices, and social networks. Salesforce, a product and company born in the Cloud, continues to build out and expand its PaaS offerings.

IBM’s acquisition of SoftLayer and its recent recognition of the importance of the public Cloud (guess it finally saw the light?), coupled with the IBM software business’ focus on platform, will put pressure on the PaaS market for new, unique, and innovative integration solutions in the Cloud.

Any discussion about the Cloud wouldn’t be complete without mentioning Amazon Web Services. Once primarily an IaaS vendor, AWS, which has always supported open standard APIs, now offers data warehousing, data integration, service integration, simple workflow, and messaging services. AWS is becoming more of a PaaS provider than ever before.

Why are we seeing these trends? Cloud is big, Cloud is hot, Cloud has been a growth area for many vendors; higher growth than what they are experiencing with on-premise offerings. Growth areas get investment money. Investment money seeds new projects, products, and innovations. Vendors stay competitive.

So I wonder when the line between Application and Data Integration is going to disappear on-premise. Maybe when the apps do.


At last year’s IBM Edge in Orlando, I said that IBM was looking to increase investment in storage while simultaneously aligning and focusing its then complex storage portfolio.

Was I wrong?

Well, I was at least half right. Instead of simplifying, IBM went out and bought another storage vendor, Texas Memory Systems (TMS), adding another product to its already complex portfolio. In addition to the initial purchase, IBM has committed to investing one billion dollars – one sixth of IBM’s annual R&D budget – on flash research and development in systems and software. So it has definitely increased investment in storage.

“We’ve tripled the size of the FlashSystem development team since the acquisition of TMS,” said Jan Janick, Vice President of Flash Systems and Technology at IBM. Moreover, IBM has committed two billion dollars thus far to its PureSystems research and development. The addition of TMS RamSan flash arrays, rebranded by IBM as FlashSystems, is the answer to what many believed was a missing component in IBM’s portfolio.

In my opinion, the biggest differentiator for IBM is its ability to move data off the array. In a recent Info-Tech solution set on how to Evaluate the Role of Solid State in the Next Storage Purchase, I point out the importance of understanding your organization’s requirements for data movement off of the all-flash array. If you’re just trying to provide consistently ultra-high-performance storage for a specific application, an all-flash array may be fine. But if you’re looking for a broader deployment with unpredictable workloads or data that degrades in value over time, eventual movement of data off the array is critical to keeping down total cost of ownership.
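That trade-off can be sketched as a toy tiering policy. This is purely illustrative; the tier names, thresholds, and function are my own assumptions, not IBM’s implementation:

```python
from datetime import datetime, timedelta

# Hypothetical tiering policy: data whose value decays with age is a poor
# fit for an all-flash array unless it can later move to cheaper tiers.
# Thresholds here are invented for illustration.

def choose_tier(last_access: datetime, now: datetime) -> str:
    """Pick a storage tier from the age of the data's last access."""
    age = now - last_access
    if age < timedelta(days=7):
        return "flash"   # hot: keep on the all-flash array
    if age < timedelta(days=90):
        return "disk"    # warm: migrate to spinning disk
    return "tape"        # cold: archive to the cheapest tier

now = datetime(2013, 9, 1)
print(choose_tier(datetime(2013, 8, 30), now))  # flash
print(choose_tier(datetime(2013, 7, 1), now))   # disk
print(choose_tier(datetime(2013, 1, 1), now))   # tape
```

The point is economic: without a policy like this (and a layer like SVC to execute the movement), cold data sits on the most expensive medium.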

IBM accomplishes off-the-array movement by putting FlashSystem behind IBM System SAN Volume Controller (SVC), enabling movement of data from FlashSystem to slower, more cost-effective storage. IBM calls this the FlashSystem Solution. The FlashSystem Solution also enables the use of storage services, such as snapshots and replication, while adding only about 100 microseconds of latency compared with FlashSystem running without SVC. Real-time Compression (RtC) can also be enabled on SVC, maximizing usable FlashSystem capacity (although the additional impact of RtC on latency has not yet been published). This improves the overall value of flash within the context of the larger system; it’s all about the economics of flash.

So, what does this mean for the future of IBM? Like I said, it has added complexity to its portfolio. Compound this with future plans for adding unified capabilities to XIV and the many updates to its other storage products (such as V7000 Unified, SONAS, and N-series), and its storage solutions all start to overlap considerably in features, functionality, and management. In the long run, however, IBM has set itself up to simplify by taking the storage media out of the equation; or rather, it will move much of the value and margins from hardware to software (of course, the 1s and 0s have to get stored somewhere).

IBM argues that, right now, it is ahead of the competition in Software Defined Storage (read: Storage Virtualization) with SVC. While many vendors, including IBM, have developed the capability to abstract and virtualize underlying storage and add storage services, the key to what IBM calls Software Defined Storage 2.0 is industry-led openness. IBM has invested heavily in OpenStack. It is the number-two contributor (250 employees contributing) behind Red Hat, and it has also made significant contributions to OpenDaylight for Software Defined Networking, which led to significant community-sourced innovation in this space.

Nonetheless, most people’s reaction to this push for openness is “Why are you supporting the commoditization of storage? You’re a storage vendor?” The answer is that the value proposition is in the software and data services. By supporting open initiatives, IBM (in a sense) enables the extension of its Storwize platform to OpenStack, so that others can deploy applications directly on IBM’s platform. Thus, through the OpenStack GUI, organizations can now leverage IBM’s storage services capabilities (snapshots, replication, data movement, Real Time Compression).

The next step for IBM, and where it really plans to deliver value, is all about the data. With what it calls Software Defined Storage 3.0, data storage will be simplified. IBM will deliver a single protected pool of data with all the automated management occurring on the backend, observing patterns in the data to dynamically create policy on the fly and match data with the right storage medium (solid state, spinning disk, or tape). Further, because moving large volumes of data across the network is inefficient, it will intelligently move applications to the data. By making it easier for others to use its platform, IBM positions itself to capitalize on its strong support services capabilities and strong portfolio of analytics software, from which it can then derive its margins.

While storage virtualization…ahem, Software Defined Storage, really isn’t new, IBM has pushed the boundaries in terms of where it is headed. By abstracting storage services away from the hardware and opening them up, IBM will simplify customers’ decisions about what storage hardware to buy from IBM. We are still pretty far out from this, however, and it will initially benefit mostly large organizations leveraging OpenStack or partners looking to develop new services through integration. However, it will change the way we purchase and utilize storage in the future. The only question, I suppose, is whether the software portfolio, where the value resides, will be equally complex to navigate.



Why is everyone making a big deal about big data?

Lots of reasons! However, one of the most critical is that there is a general need for resources with big data skills who can work with emerging big data technologies.

Currently, there are the really smart data scientist roles: people who look for solutions to problems inside massive amounts of data. There are also infrastructure roles: people who are responsible for setting up and maintaining information in technologies such as Distributed, Columnar, Document, Graph, and Geospatial databases.

However, one of the biggest skills gaps falls between these two extremes. There is a need for people who can write MapReduce jobs to look for patterns and data that are then fed into the data scientist’s algorithms. These roles may also require data integration, integrity, and quality management across big data repositories, operational data stores, and data warehouses. Depending on the maturity of the big data technologies and market space, combined with the size of a given organization’s big data projects, these roles may be handled by the same person or by multiple resources.
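For readers unfamiliar with what such a role actually writes, here is a minimal in-memory sketch of the MapReduce model: map emits key/value pairs, a shuffle groups them by key, and reduce folds each group. Real jobs run on a framework such as Hadoop; this only illustrates the shape of the work:

```python
from itertools import groupby

def map_phase(records):
    # Map: emit a (key, 1) pair for every word in every input record.
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: sort by key, then group all values for the same key.
    ordered = sorted(pairs, key=lambda kv: kv[0])
    for key, group in groupby(ordered, key=lambda kv: kv[0]):
        yield key, [v for _, v in group]

def reduce_phase(grouped):
    # Reduce: fold each group of values down to a single count.
    return {key: sum(values) for key, values in grouped}

logs = ["error timeout", "ok", "error retry timeout"]
counts = reduce_phase(shuffle(map_phase(logs)))
print(counts["error"])    # 2
print(counts["timeout"])  # 2
```

Writing this once is easy; writing, tuning, and operating it at cluster scale against real data is where the skills gap lives.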

Data technology vendors to the rescue!

Vendors, including the likes of Informatica, IBM, Actian, HP, SAP, and many others are providing technologies that will help reduce the learning curve by allowing developers to use well established data management and integration environments – such as Informatica’s PowerCenter or IBM’s new technology, BIG SQL – to work with complex big data technologies.

This week, Informatica released the Vibe™ virtual data machine. It underlies Informatica’s core data management and integration products and essentially gives you the ability to map once, run anywhere. Similar to the Java model of write once, run anywhere, Vibe™ is an idea that is long overdue: or should I say the implementation is long overdue, because the idea has been around for a long time. As a former enterprise integration architect, I saw huge potential to re-use mapping logic without having to re-write it in each integration tool. For example, the mapping between two data objects in an ETL tool should be re-usable in an ESB that maps the same two objects.

Vibe™ provides this capability, and it further supports my PoV that the lines between application and data integration are disappearing, leaving us with simply integration. Vibe™ takes this even further by allowing for changes to the underlying data technologies. So not only can you map once and run anywhere in any of Informatica’s integration tools, but you can also use the same mapping regardless of whether the data objects are stored in Hadoop, Oracle RDBMS, IBM DB2, HBase, etc. This means that business rules in data profiling tools that are used for quality checking in operational data stores can also be used to run quality checks on Hadoop data sources. This is a significant reuse capability that will help improve data quality, integration, and integrity regardless of the persistence technology.
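The “map once, run anywhere” idea can be sketched in a few lines. This is purely illustrative; the names and structure are my own, not Informatica’s actual API. A mapping declared once as data is interpreted by both a batch ETL-style run and a per-message ESB-style run:

```python
# The mapping is declared once, as data, independent of any engine.
CUSTOMER_MAPPING = {
    "cust_id": "customerId",
    "fname": "firstName",
    "lname": "lastName",
}

def apply_mapping(record: dict, mapping: dict) -> dict:
    """Rename source fields to target fields per the declared mapping."""
    return {target: record[source] for source, target in mapping.items()}

# The same mapping serves a batch, ETL-style run over many records...
batch = [{"cust_id": 1, "fname": "Ada", "lname": "Lovelace"}]
print([apply_mapping(r, CUSTOMER_MAPPING) for r in batch])

# ...and a one-message-at-a-time, ESB-style run, with no rewrite.
message = {"cust_id": 2, "fname": "Alan", "lname": "Turing"}
print(apply_mapping(message, CUSTOMER_MAPPING))
```

The design point is that the mapping is data, not code compiled into one tool, so any compliant engine (ETL, ESB, profiler) can interpret it.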

Vibe™ reduces the skills gap by allowing users with Informatica PowerCenter experience to build the equivalent of MapReduce jobs much faster than even experienced MapReduce technologists can. This is a common trend in the market; IBM has made a similar move with the introduction of BIG SQL, providing an ANSI SQL-like language for querying Big Data databases and eliminating the need for complex MapReduce coding. SAP is also helping technologists familiar with its Business Objects suite shorten the Big Data learning curve by providing connectors to Hadoop and other Big Data database technologies.
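To see why SQL-style access lowers the bar, compare the grouped count that would otherwise need a custom MapReduce job with a single SQL statement. Here sqlite3 stands in for a Big Data SQL engine purely for illustration; the schema and data are invented:

```python
import sqlite3

# An in-memory table standing in for a large event store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (level TEXT)")
conn.executemany("INSERT INTO events VALUES (?)",
                 [("error",), ("ok",), ("error",), ("warn",)])

# One declarative statement replaces a hand-written map/shuffle/reduce job.
rows = conn.execute(
    "SELECT level, COUNT(*) FROM events GROUP BY level ORDER BY level"
).fetchall()
print(rows)  # [('error', 2), ('ok', 1), ('warn', 1)]
```

Anyone who already knows ANSI SQL can express the analysis; the engine, not the developer, worries about how the work is parallelized.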

IBM, Informatica, SAP: all big players with traditionally expensive solutions, prohibitive for small to mid-sized businesses, right? Not so much. IBM is working with the appropriate authorities to make BIG SQL a standard. Informatica will be doing the same with Vibe™: it wants the Vibe™ Virtual Data Machine (VDM) to become as pervasive as the Java Virtual Machine (JVM). The industry as a whole has an opportunity to benefit from these innovations, which are also shortening the Big Data learning curve.

IBM also has community and express offerings for many of its middleware products. SAP has been moving down-market with its entire product stack for some time now, making further progress in the SMB space.

Informatica is announcing an Express version of PowerCenter this week. PowerCenter Express is Informatica’s flagship product in a deployment model that meets the needs of departments and SMBs at a much more reasonable price point than the enterprise version. This will bring the capabilities of Vibe™ and PowerCenter to a new market segment that is likely the one most in need of Big Data utilization accelerators.

So if you are trying to learn how to implement big data in your IT environment but don’t have the skills in your organization, look for a vendor that can join you on your big data journey. Having a partner to share ideas with, and one that will provide support as you go through the wilderness, makes the journey a little less scary, and they will learn as much from you as you will from them.


Informatica, one of the leading data integration vendors, just announced a product aimed directly at its high-speed data processing competitors: Ultra Messaging SMX (Shared Memory Acceleration). Informatica claims to have broken the 100 nanosecond latency barrier on multi-core commodity hardware, regardless of invocation in Java, C, or C#. See the press release here: http://bit.ly/13ZZ1kG

Ultra Messaging SMX has the potential to change the game for how messaging software can be used. Messaging software has traditionally been used for communication between multiple application processes or inter-process communication (IPC) via shared memory. SMX will allow developers to leverage the benefits of messaging on inter-thread communications (ITC) within a single application process rather than having to develop their own shared memory ITC solution.
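A minimal sketch of that programming model follows; this is not SMX itself and is nowhere near its latencies, but it shows inter-thread communication (ITC) through a message queue: producer and consumer threads in one process exchange messages without a hand-rolled shared-memory protocol:

```python
import threading
import queue

# A shared queue is the messaging channel between threads in one process.
inbox = queue.Queue()
results = []

def consumer():
    # Consume messages until a sentinel tells us to stop.
    while True:
        msg = inbox.get()
        if msg == "STOP":
            break
        results.append(msg.upper())

t = threading.Thread(target=consumer)
t.start()

# The producer (main thread) publishes messages, then the sentinel.
for msg in ["trade", "quote"]:
    inbox.put(msg)
inbox.put("STOP")
t.join()
print(results)  # ['TRADE', 'QUOTE']
```

The value of a product like SMX is doing this pattern with lock-free shared memory at extreme speed, so developers get messaging semantics without building that machinery themselves.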

Informatica has targeted the Ultra Messaging product at low-latency, high-volume electronic trading applications in financial markets. However, use cases will emerge in other applications and markets as more things are connected (the Internet of Things), as mobile devices generating real-time data are integrated into systems of interaction, and as the results of data analytics are required in real time.

Informatica is not the only vendor tackling the high-volume, low-latency data processing problem. With data volumes growing exponentially and organizations looking for insights inside that data as soon as possible, several of its competitors are also bringing solutions to market.

  • Actian’s acquisition of Pervasive puts its high-performance VectorWise database together with Pervasive’s DataRush technology to significantly reduce data analysis time and effort.
  • IBM has released a high-volume, low-latency, and extremely scalable messaging appliance in MessageSight, which can handle one million connected devices and 13 million messages per second.

Mobile Devices, cloud computing, the Internet of Things, and all the big data being generated in today’s IT environment are driving the need to break current performance barriers. Organizations are moving from wanting to know what happened in their business to predicting what is going to happen, and new technology is emerging that will help them understand what is happening right now. Real-time business analytics have emerged to help organizations dynamically adapt to changing conditions in economic, social, political, and physical environments as the changes are occurring, allowing organizations to adapt to new opportunities.

Vendors will continue to push the performance limits with new products and solutions – with these three vendors’ announcements happening within the last two months alone. I am confident we will continue to see more performance improvements across the data processing vendor landscapes as Big Data continues to push innovation.


At IBM Impact 2013, I became more aware of IBM PureSystems: what they are and how they work. I was surprised to learn that the implementation of functionality on PureSystems is patterns-based.


For example, there is an established pattern of SAP ERP using DB2 for its database. There are patterns that integrate WebSphere Application Server with WebSphere MQ, virtual DataPower appliances, and databases. The list goes on, and if there isn’t a pattern in the catalog that meets your requirements, make one and put it in your own catalog of available patterns for invocation.

I was surprised and pleased to see the concept of patterns being sold in a commercial IBM product. Early in my former tenure as an Integration Architect at IBM, I was involved in testing the revised Process Integration Patterns for e-Business, circa 2003. I won’t take credit for what is in Pure today, but part of me wonders whether or not I helped pave the way.

A pattern is a use case or commonly observed interaction among multiple components. In my experience as an integration architect, our patterns handled the exchange of messages between common end point types, and used common transformation rules. Each “interface” was an instantiation of a pattern.

Pure patterns are pre-configured instances of, and connections between each component in a solution. In the PureFlex environment, each component is a virtual machine running in a private cloud. In the PureApplication environment, each component is an application or application middleware virtualized inside the private cloud. In the PureData environment, each component is a virtual database or data cluster.

When a pattern is instantiated, it is configured for the specific business solution in order to meet given requirements. IBM has harvested the most common patterns of usage in the business world of infrastructure, applications and data and turned those patterns into products. Hats off to IBM for providing patterns based solutions to common problems of systems integration. IBM even admitted they learned a lot about making their own products work together when they built the PureApplication solution.
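A pattern catalog and its instantiation can be sketched roughly like this. The component names, catalog structure, and function are illustrative assumptions of mine, not IBM’s actual catalog:

```python
# A pattern names its components and the connections between them;
# it carries no environment-specific configuration of its own.
CATALOG = {
    "app-with-db": {
        "components": ["app_server", "database"],
        "connections": [("app_server", "database")],
    }
}

def instantiate(pattern_name: str, config: dict) -> dict:
    """Bind environment-specific config to each component of a pattern."""
    pattern = CATALOG[pattern_name]
    return {
        "components": {c: config[c] for c in pattern["components"]},
        "connections": pattern["connections"],
    }

# Instantiating the pattern for one specific business solution.
instance = instantiate("app-with-db", {
    "app_server": {"image": "websphere", "vcpus": 4},
    "database": {"image": "db2", "storage_gb": 200},
})
print(sorted(instance["components"]))  # ['app_server', 'database']
```

Each “interface” or deployment is then just an instantiation of a catalog entry, which is what makes testing, monitoring, and governance per pattern (rather than per instance) tractable.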

“Applause” from someone (me) who formerly ran a startup business on WebSphere Application Server and Message Broker connected via JMS running on top of DB2. It wasn’t a walk in the park to install, configure, and then maintain.

IBM Pure has helped solve these problems using patterns, and through the elasticity of a cloud environment, the solution can scale up and down to meet demand.

Patterns are not just useful for instantiation. They are very effective during system testing: test one instance of a pattern to make sure the pattern works before moving on to test the remaining instances. Once deployed, system monitoring and maintenance can also be designed within the context of patterns. Last but not least, governance of the solutions and change control is easily handled through the use of patterns.

Patterns are no longer limited to what we see in woven fabrics, art work, architecture, or mechanical design; they now apply to the cloud. Fortunately, IBM has figured out how to make clouds look the same through the use of patterns, and when a new cloud formation appears that we haven’t seen before, it can be captured and saved as a pattern for future reuse.

Reduce, recycle, and reuse. That’s pure.
