The new data warehouse, often called “Data Warehouse 2.0,” is the fast-growing trend of doing away with the old idea of huge, off-site mega-warehouses stuffed with hardware and connected to the world through huge trunk lines and big satellite dishes. The replacement moves away from that highly controlled, centralized, and inefficient ideal towards a more cloud-based, decentralized model of varied hardware and widespread connectivity.
In today’s world of instant, varied access by many different users and consumers, data is no longer nicely tucked away in big warehouses. Instead, it is often stored in multiple locations (often with redundancy) and overlapping small storage spaces that are often nothing more than large closets in an office building. The trend is towards always-on, always-accessible, and very open storage that is fast and friendly for consumers yet complex and deep enough to appease the most intense data junkie.
The top ten trends for data warehousing in today’s changing world were compiled by Oracle in their Data Warehousing Top Trends for 2013 white paper. Below is my own interpretation of those trends, based on my years of working with large quantities of data.
1. Performance Gets Top Billing
As volumes of data grow, so do expectations of easy and fast access. This means performance must be a primary concern. In many businesses, it is THE top concern. As the amount of data grows and the queries into the database holding it gain complexity, this performance need only increases. The enablement factor is huge and is becoming a driving force in business.
Oracle uses the example of Elavon, the third-largest payment processing company in the United States. By restructuring their data systems, they massively boosted performance for routine reporting activities for millions of merchants. “Large queries that used to take 45 minutes now run in seconds.”
Everyone expects this out of their data services now.
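The kind of restructuring Elavon did isn’t spelled out in the white paper, but one common technique behind such speedups is partitioning: organizing records so a query reads only the slice it needs instead of scanning everything. Here is a minimal sketch of that idea in plain Python, with hypothetical transaction data (the record layout and function names are my own illustration, not Elavon’s system):

```python
from collections import defaultdict

# Hypothetical transaction records: (merchant_id, month, amount)
transactions = [
    ("m1", "2013-01", 100.0),
    ("m2", "2013-01", 250.0),
    ("m1", "2013-02", 75.0),
    ("m2", "2013-03", 300.0),
]

def full_scan_total(records, month):
    """Unpartitioned approach: every query scans every record."""
    return sum(amt for _, d, amt in records if d == month)

# Restructured approach: group records by month once, up front,
# so each query touches only the partition it needs.
partitions = defaultdict(list)
for merchant, month, amount in transactions:
    partitions[month].append((merchant, amount))

def partitioned_total(month):
    """Partition-pruned approach: reads only the relevant slice."""
    return sum(amt for _, amt in partitions.get(month, []))

# Both approaches agree on the answer; only the work done differs.
assert full_scan_total(transactions, "2013-01") == partitioned_total("2013-01")
```

With four rows the difference is invisible, but on billions of rows the full scan is the 45-minute query and the pruned read is the one that finishes in seconds.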
2. Real-time Data is In the Now
There’s no arguing that the current trend is toward real-time data acquisition and reporting. This is not going to go away. Instead, more and more things that used to be considered “time delay” data points are now going to be expected in real-time. Even corporate accounting and investor reports are becoming less driven by a tradition of long delays and more by consumer expectations for “in the now.”
All data sets are becoming more and more shaped by just-in-time delivery expectations as management and departments expect deeper insights delivered faster than ever. Much of this is driven by performance, of course, and the gains described above will help, but with those performance increases come increases in data acquisition and storage demands as well.
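The shift from “time delay” to real-time reporting boils down to processing events as they arrive rather than in a nightly batch. A minimal sketch, with a simulated event feed standing in for a real message queue (all names here are illustrative):

```python
import time

def event_stream():
    """Simulated feed of sales events; stands in for a real message queue."""
    for amount in [120.0, 80.0, 45.0]:
        yield {"amount": amount, "ts": time.time()}

def run_realtime_dashboard(stream):
    """Update the running total the moment each event arrives,
    instead of waiting for an end-of-day batch job."""
    total = 0.0
    for event in stream:
        total += event["amount"]
        # In a real system this would push to a live dashboard.
        print(f"running total: {total:.2f}")
    return total

run_realtime_dashboard(event_stream())
```

The consumer-side logic is the same as in a batch job; what changes is that each increment is visible immediately, which is exactly the expectation described above.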
3. Simplifying the Data Center
Traditional systems weren’t designed to handle these types of demands. The old single-source data warehouse is a relic, having too much overhead and complexity to be capable of delivering data quickly. Today, data centers are engineered to be flexible, easy to deploy, and easy to manage. They are often flung around an organization rather than centralized and they are sometimes being outsourced to cloud service providers. Physical access to hardware is not as prevalent for IT management as it once was and so “data centers” can be shoved into closets, located on multiple floors, or even in geographically diverse settings. So while this quasi-cloud may seem disparate on a map, in use it all appears to be one big center.
4. The Rise of the Private Cloud
These simplified systems and requirements mean that many organizations that once may have looked to outsource cloud data services are now going in-house because it’s cheaper and easier than it’s ever been before. Off-the-shelf private cloud options are becoming available and seeing near plug-and-play use by many CIOs. Outsourcing still has many advantages, of course, and allows IT staff to focus on innovation in customer service rather than on internal needs.
5. Business Analytics Infiltrating Non-Management
Traditionally, business analytics for a business are conducted by upper-level management and staff. Today, the trend is to spread the possibilities by opening up those analysis tools and data sets (or at least relevant ones) to department sub-heads, regional managers, and even localized, on-site personnel. This is especially true in retail and telecommunications, where access to information on individual clients or small groups of them can make or break a deal. For sales forces, customer loyalty experts, and more, having the ability to analyze data previously inaccessible without email requests and long delays is a boon to real-time business needs.
6. Big Data No Longer Just the Big Boys’ Problem
Until recently, the problem of Big Data was a concern only of very large enterprises and corporations, usually of the multi-national, multi-billion variety. Today, this is filtering down, and more and more smaller companies are seeing Big Data looming. In addition, Big Data is only one type of data challenge, with real-time, analytic, and other forms of data also taking center stage. Even relatively small enterprises are facing data needs as volumes of information grow near-exponentially.
7. Mixed Workloads
Given the varieties of data, workloads are becoming more mixed as well. Some services or departments may need real-time data while others may want deeper, big data analysis, while still others need to be able to pull reports from multi-structured data sets. Today’s platforms are supporting a wider variety of data with the same services often handling online e-commerce, financials, and customer interactions. High-performance systems made to scale and intelligently alter to fit the needs at hand are very in-demand.
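One way a single platform handles mixed workloads is by routing each request to the engine suited to its type. The sketch below is a deliberately simplified illustration of that dispatch pattern; the handler names and workload categories are hypothetical, not any vendor’s API:

```python
# Hypothetical per-workload engines behind one platform.
def realtime_handler(q):
    return f"streamed answer for {q!r}"

def bigdata_handler(q):
    return f"batch-analytics answer for {q!r}"

def report_handler(q):
    return f"multi-structured report for {q!r}"

HANDLERS = {
    "realtime": realtime_handler,
    "analytics": bigdata_handler,
    "reporting": report_handler,
}

def dispatch(workload, query):
    """Route each request to the engine suited to its workload type,
    so one platform can serve e-commerce, financials, and reporting."""
    try:
        return HANDLERS[workload](query)
    except KeyError:
        raise ValueError(f"unknown workload type: {workload!r}")

print(dispatch("realtime", "cart total"))
```

Real platforms do this routing internally (and far more intelligently), but the principle is the same: one front door, multiple specialized engines behind it.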
8. Simplifying Management With Analytics
Many enterprises are finding that management overhead is cut dramatically when smart use of data analytics is employed. What was once an extremely expensive data outlay for storage, access, security, and maintenance is now becoming a simpler, lower-cost system because the proper use of analysis to watch data use trends means more intelligent purchase and deployment decisions.
9. Flash and DRAM Going Mainstream
More and more servers and services are touting their instant-access Flash and DRAM storage sizes rather than hard drive access times. The increased use of instant-access memory systems means fewer bottlenecks in I/O operations. As these fast-memory options drop in cost, their deployments will continue to increase, perhaps replacing traditional long-term storage methods in many services.
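The I/O benefit of keeping hot data in memory is easy to demonstrate. This sketch simulates disk latency with a short sleep and caches results in memory with Python’s standard `functools.lru_cache`; the functions and the 50 ms latency figure are illustrative assumptions, not measurements of any real system:

```python
from functools import lru_cache
import time

def slow_disk_read(key):
    """Stand-in for a query that hits spinning disks."""
    time.sleep(0.05)  # simulated I/O latency
    return f"record-{key}"

@lru_cache(maxsize=1024)
def cached_read(key):
    """Keep hot records in memory (DRAM), paying disk latency only once."""
    return slow_disk_read(key)

start = time.perf_counter()
cached_read("42")                 # cold: pays the simulated disk latency
cold = time.perf_counter() - start

start = time.perf_counter()
cached_read("42")                 # warm: served straight from memory
warm = time.perf_counter() - start

assert warm < cold
```

Flash and DRAM tiers in modern warehouses apply the same idea at hardware scale: the first access is slow, every subsequent access is near-instant.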
10. Data Warehousing Must Be Highly Available
Data warehousing workloads are becoming heavier and demands for faster access more prevalent. The storage of the increasing volumes of data must be both fast and highly available as data becomes mission-critical. Downtime must be close to zero and solutions must be scalable.
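High availability usually comes down to redundancy plus automatic failover: if one replica is down, the query is transparently served by another. A minimal sketch of that failover logic, with hypothetical replica functions standing in for real database connections:

```python
def query_with_failover(replicas, query):
    """Try each replica in turn; the caller gets an answer as long as
    at least one replica is up."""
    errors = []
    for replica in replicas:
        try:
            return replica(query)
        except ConnectionError as exc:
            errors.append(exc)
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")

# Hypothetical replicas: one down, one healthy.
def primary(query):
    raise ConnectionError("primary is down for maintenance")

def secondary(query):
    return f"result for {query!r}"

print(query_with_failover([primary, secondary], "SELECT 1"))
```

From the caller’s point of view the outage is invisible, which is exactly the near-zero downtime that mission-critical data now demands.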
There is no doubt that Data Warehouse 2.0, with its non-centralized storage, high availability, private cloud, and real-time access, is quickly becoming the de facto standard for today’s data transactions. Accepting these trends sooner rather than later will help you provide an adequate infrastructure for storing, accessing, and analyzing your data in ways that are efficient, cost-effective, and consistent with global industry trends.
About The Author: Michael Dorf is a professional software architect, web developer, and instructor with a dozen years of industry experience. He teaches Java and J2EE classes at LearnComputer.com, a San Francisco-based open source training school. Michael holds an M.S. degree in Software Engineering from San Jose State University and regularly blogs about Hadoop, Java, Android, PHP, and other cutting edge technologies on his blog at http://www.learncomputer.com/blog/.