Big data is moving beyond hype into solutions that are providing real insight and business value. It is no longer the elephant in the room that nobody wants to talk about: it is a growing ecosystem of data, technology, and resources that everyone wants to understand more about, and do something with.
So why is this ecosystem so complex? Well, big data itself is complex. In fact, there still isn’t consensus on what it actually is; there is only a set of attributes that try to describe it, and even those aren’t consistent across the industry. Our definition of big data is “rapidly increasing amounts of data, generated by multiple sources, in many formats; analyzed for new insights.” This is essentially a paraphrase of the traditional three V’s (Volume, Variety, and Velocity), with the added aspects of Veracity and Value.
In contrast, traditional data comes from known sources, at controlled volumes, with understandable content. It’s no surprise, then, that big data architecture is different from traditional data architecture. Today’s data architects are trying to understand the ecosystem and to deal with the paradigm shift that big data demands of their data architecture knowledge and capabilities.
The big differences in big data architecture include:
- Big data architecture starts with the data itself, taking a bottom-up approach. Decisions about data influence decisions about components that use data.
- Big data introduces new data sources such as social media content and streaming data.
- The enterprise data warehouse (EDW) becomes a source for big data, rather than a destination for transactional data.
- The variety of big and unstructured data requires new types of persistence.
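- Data persistence scales horizontally (by adding more nodes), not vertically (by moving to a bigger server).
- NoSQL stores differ substantially from SQL databases in their data models, schema flexibility, and query approaches.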
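To make the point about horizontal persistence concrete, here is a minimal sketch of hash-based sharding, one common way to spread records across several nodes. It is an illustration only: `ShardedStore` and `shard_for` are hypothetical names, and the "nodes" are in-memory dictionaries standing in for separate servers.

```python
import hashlib


def shard_for(key: str, num_nodes: int) -> int:
    """Map a record key to a node index using a stable hash (illustrative helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap to the node count.
    return int.from_bytes(digest[:8], "big") % num_nodes


class ShardedStore:
    """Toy key-value store that spreads documents across several 'nodes'.

    Each node is a plain dict here; in a real deployment each would be
    a separate server. This is an assumption made for the sketch.
    """

    def __init__(self, num_nodes: int) -> None:
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, key: str, document: dict) -> None:
        # The key alone decides which node holds the document.
        self.nodes[shard_for(key, len(self.nodes))][key] = document

    def get(self, key: str):
        # The same hash routes the read to the node that received the write.
        return self.nodes[shard_for(key, len(self.nodes))].get(key)


if __name__ == "__main__":
    store = ShardedStore(num_nodes=4)
    for i in range(1000):
        store.put(f"user-{i}", {"id": i, "source": "social"})
    # Show how many documents landed on each node.
    print([len(node) for node in store.nodes])
```

Capacity grows by adding nodes rather than by enlarging one machine. Note that with simple modulo placement, changing the node count remaps most keys; production systems often use techniques such as consistent hashing to limit that movement.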
Architecture is much more about making decisions than creating specifications. Big data architecture requires decisions in four primary layers:
- Data: what kind of data is part of the organization’s big data value chain?
- Data Integration: how is the data captured and integrated for analytics?
- Data Persistence: how and where does the data need to be stored for analytics?
- Data Analytics: what types of analytics does the organization need to perform?
Understanding how the organization wants to leverage big data, through the selection of a business pattern, will help with decisions about data. Data sources, types, and volumes will influence decisions about data integration and persistence technology. How the data is organized and persisted will, in turn, influence decisions about what types of analysis technology are required.
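The chain of influence described above can be sketched as a small dependency graph, which yields an order in which the decisions could be tackled. This is only one way to model it: the layer names and the `DEPENDS_ON` mapping are assumptions drawn from the paragraph, not part of any formal method.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each decision area maps to the areas that influence it.
# The edges are an assumed reading of the paragraph above.
DEPENDS_ON = {
    "business_pattern": set(),
    "data": {"business_pattern"},
    "data_integration": {"data"},
    "data_persistence": {"data"},
    "data_analytics": {"data_persistence"},
}


def decision_order(depends_on: dict) -> list:
    """Return the decision areas so that every influence comes before what it influences."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, area in enumerate(decision_order(DEPENDS_ON), start=1):
        print(step, area)
```

Integration and persistence both depend only on the data decisions, so their relative order is not fixed; the sketch only guarantees that upstream decisions appear before the ones they influence.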
Setting principles and guidelines about the use of Open Source Software (OSS) vs. vendor solutions can also influence architecture decisions. Given that most big data solutions originated as OSS, deciding whether to use OSS is a little more difficult here than it is with traditional data technologies.
Without a structured approach to big data architecture, organizations could find themselves in a Big Data Mess: they risk their existing data architecture being unable to handle big data, eventually resulting in a failure that could compromise the entire data environment. Also, they risk solutions being picked in an ad hoc manner, which could cause incompatibility issues down the road.
Given how rapidly big data and its associated technologies change, governance is critical to maintaining structure and organization in the big data environment. Big data architecture will help establish the governance structure and boundaries, and anticipate change. An Architectural Review Board and Change Management processes will help ensure that the big data architecture continues to work smoothly and effectively into the future.
Avoid a big data mess by learning and applying an architectural approach to big data with Info-Tech’s blueprint, Create a Customized Big Data Architecture and Implementation Plan.