August 2023

Use AI Via An End-To-End Data Lakehouse To Increase Data Lifecycle Efficiency From Ingestion To Prediction

Enable Employees To Generate And Execute On Valuable Insights With The Right Data Platform

Enterprise organizations are on a transformational journey to improve time to value in their data engineering, analytics, and machine learning processes. An increase in the amount of structured and unstructured data, and in the number of tools used to derive value from that data, has reduced productivity and profitability. Data decision-makers and practitioners struggle to perform their core job functions while operating in environments that do not combine data management, analytics, data science functions, and extract, transform, load (ETL).

To increase productivity and profitability, data scientists need access to a seamless data experience that combines management, analytics, and data science functions to promote interoperability, automation, security, and governance. These innovations will ease deployment so that data scientists can deliver end-to-end results quickly with fewer resources. Data decision-makers will need to be aware of where they are in their data journey and what they can achieve by enabling their increasingly distributed data science teams with a centralized data platform experience.

In May 2023, Cloudera, Intel, and HPE commissioned Forrester Consulting to evaluate how organizations are choosing the technologies that support the storage, management, and analysis of their proprietary data. To explore this topic, Forrester conducted an online survey of 840 data practitioners and decision-makers in the United States, United Kingdom, Australia, and New Zealand. We found that while most organizations have begun to modernize their data environment, they must prioritize the hybridization of their teams and the centralization of data lifecycle steps to reap benefits in productivity and insight generation.

Project Team: Madeline Harrell, Market Impact Consultant; Kate Pesa, Associate Market Impact Consultant

Contributing Research: Forrester’s technology architecture and delivery research group

Key Findings

  • Data science teams, and data, are becoming more distributed across organizations.

    Data science teams are becoming more decentralized, frequently by adopting hybrid models of reporting into the business. At the same time, data decision-makers are adopting point solutions that meet immediate needs instead of making larger, strategic purchases that centralize their data and data management functions.

  • Environments with too many tools are costing employees valuable productivity time in their workday.

    Employees are not frustrated with the performance of individual solutions, but rather the large number of tools, challenges in activating machine learning models, and the lack of effective integration.

  • End-to-end data lakehouses reduce the complexity of the data environment and improve the employee experience (EX).

    Adopting an end-to-end lakehouse that consolidates data tool functionality eases the stress of managing the full data lifecycle. It also provides data ownership clarity, ultimately saving valuable company revenue.

Today’s changing work environments are forcing organizations to evolve how they manage their data, driven by an increase in distributed data across public cloud, private cloud, and hybrid cloud environments, constantly evolving compliance and governance standards, and new threats to data security. The ability to perform the steps of the data lifecycle within a single platform will become increasingly critical to the success of enterprises that want to provide excellent customer experience (CX). The consolidation of data tools into a single platform (or at least into fewer than eight tools) can reduce the time needed to perform core data functions and improve both time to value and time to customer satisfaction. This demands streamlined integrations, reduced complex customization, and increased automation. Unfortunately, most respondents are still in an intermediate state of maturity, using eight or more tools to complete each step of the data lifecycle and losing hours of productive time in each workday. As data science teams become more hybridized across their organization, they will need more access to more centralized functionality from their data tools. We found that:

  • Data science teams are more spread out across the organization.

    Organizations are moving towards a hybrid model for their data science teams; this means the data science team is managed centrally, but members are assigned to work with specific business units rather than report into one data science team or into individual lines of business. Fifty-five percent of respondents said they report into a hybrid model, with another 17% reporting into a strictly decentralized model. With this move towards hybridized team structures, any move to streamline or consolidate their data-related functions will be a boon for productivity.

  • Data science teams have plenty of tools, but no long-term strategy behind them.

    Respondents are fairly satisfied with the functionality and interoperability of their current data tools, but they seem to be settling for overlapping functionality and needless context switching between tools or applications to manage each step in the data lifecycle. Respondents shared that each step in their organization’s data lifecycle takes at least eight tools, with publishing into the business taking at least 10 tools (see Figure 1). Although 97% of respondents said their tools integrate on some level with tools from other vendors, basic levels of integration will not suffice as teams, and data, become more decentralized. A hybridized team needs centralized data to reduce silos and increase productivity.

Figure 1

Number Of Tools Used For Each Step In The Data Lifecycle

Steps shown: Ingesting/streaming; Data preparation/engineering; Analyzing; Predicting for machine learning; Publishing into the business

Base: 840 global practitioners and decision-makers in development and data science
Source: A commissioned study conducted by Forrester Consulting on behalf of Cloudera, May 2023

  • Data decision-makers are purchasing data tools with current needs in mind, rather than future needs.

    Not all data warehouses are created equal; while 68% of respondents currently have a data lakehouse and 83% have a data warehouse, the steps of their data lifecycle are not necessarily consolidated into the same lakehouse. When each step takes more than eight tools to complete, employees are spending valuable time switching between tools. Yet, 59% of respondents indicate that they select tools based on immediate needs. To effectively enable employees, data decision-makers will need to adopt a more holistic approach to their data lifecycle. That could mean zooming out to see the larger strategic picture and purchasing larger solutions that can both integrate their products on a deeper level and also enable employees to do more difficult data tasks, like creating machine learning models that can provide value to other departments and the business overall (see Figure 2).

Data scientists are using eight or more tools per step in the data lifecycle.

POLL

“Does your data platform perform any of the following functions?”

Response options: Data preparation/engineering; Predicting for machine learning; Ingesting/streaming; Analyzing; Publishing into the business

Base: 840 global practitioners and decision-makers in development and data science
Source: A commissioned study conducted by Forrester Consulting on behalf of Cloudera, May 2023

While data decision-makers continue to meet their immediate needs by packing their technology stack with new point solutions, employees are losing valuable time each day to context switching between tools to perform each step in their data lifecycle. Employees are mostly satisfied with the interoperability and functionality of these tools, but they are not maximizing their productivity, especially when it comes to complex tasks like programming machine learning models to enable the rest of the business. On average, prediction for machine learning alone takes at least nine tools. Meanwhile, data decision-makers struggle to report the value of data to the business via their current data environment structure. Their challenges with the availability and quality of their data across different but somewhat integrated solutions are leading to larger problems with machine learning, with driving revenue, and with reporting the overall value of data within the business. Respondents currently face:

  • Technical challenges.

    These chiefly concern machine learning enablement and insight generation. The most common technical challenges were enabling other teams with machine learning models (58%), difficulty drawing useful insights from their data (58%), difficulty integrating multiple products (58%), difficulty putting machine learning models into production (58%), and a lack of confidence in data security (57%). Using data to draw insights and build machine learning models to enable the business are core functions of data science workers, so their technical challenges with their current data environment are direct challenges to their value as employees. The difficulty in integrating multiple products is likely due to the number of data products and the manner in which they were adopted, which also increases technical debt. As long as leadership continues to prioritize purchasing point solutions to meet immediate needs, their stack will keep growing: employees will continue context switching between tools and applications, and they will have to hunt for the data they need to build the machine learning models they’re already struggling with.

  • Organizational challenges.

    More than half of respondents (51%) struggle with activating their data and understanding data ownership across their organizations. This can especially challenge hybrid or decentralized teams. Additionally, 60% of respondents noted the lack of availability of all data to employees across the business. Naturally, this challenge hinders productivity and data literacy across the business. Where data practitioners could potentially enable cross-team collaboration, their data issues keep them boxed in. Unsurprisingly, this translates to issues with communicating the value of data back to decision-makers (51%). Furthermore, their habit of adopting point solutions to meet immediate needs has created a data environment with too many data solution providers or vendors (51%) (see Figure 3). Without the ability to demonstrate and communicate the value of data back to the business and decision-makers, data workers are stuck with their current data environment, unable to push for a larger, more strategic investment.

Figure 3
  • Business consequences.

    When employees cannot easily access the data necessary for them to perform their core job functions or enable teams across the business, this has larger implications for the business. Data availability or ownership issues, together with a large number of tools with varying levels of integration, create difficulty in meeting compliance and governance standards (63%) and increase vulnerability to cyberthreats (61%). These challenges also naturally lead to revenue implications: Respondents noted increased time to value (56%) and increased technical debt (54%) eating into their profits (see Figure 4).

Figure 4
  • Challenges with switching to the new solution.

    Investing in a platform they know can address these challenges is oftentimes easier said than done. An end-to-end lakehouse can address many of these challenges and enable employees to better support the business, but several things stand in their way of adopting a larger tool that can meet these needs. Beyond struggling with integration, 56% of respondents also have difficulty integrating with their cloud or on-prem infrastructure. Fifty-five percent of them are still adopting solutions that meet their immediate needs, and not strategic ones. Even if they found an appropriate platform to address their issues, they are dealing with the sunk cost of their existing vendor relationships (54%). Fearing an inability to integrate with existing infrastructure that they have already paid for with their current vendors, data decision-makers may feel like they are being limited.

75%

of respondents have acknowledged that they can save more than 4 hours each day, if the stages in the data lifecycle are integrated into a single platform.

Adopting more point solutions to meet immediate business needs can only be sustained for so long. As data practitioners are bogged down by switching between platforms to perform individual tasks, they will continue to be less productive over time. Consolidating the data lifecycle into a single end-to-end lakehouse, especially when it comes to machine learning tasks, will increase productivity, reduce the time spent on context switching, and increase employee satisfaction. In fact, 94% of respondents say having an end-to-end lakehouse will positively impact their company’s success, as:

  • Buyers know what to look for in a holistic data platform, especially for machine learning.

    At a certain point, the need to keep data secure and usable should outweigh their vendor lock-in and the need to justify sunk cost on their current technology stack. When they do select a new data platform, data decision-makers expect a shared data experience or data fabric (48%), best-in-class integration with their existing infrastructure (47%), the ability to keep metadata synced (46%), end-to-end machine learning (46%), and improved security (44%). These desired capabilities and functions reflect their top challenges faced with their current data structure, such as issues with machine learning models, integrations with existing technology, and keeping their data secure.

    • Machine learning enablement is a key priority.

      When it comes to selecting a new data storage or management tool for machine learning, data decision-makers have several key capabilities in mind (see Figure 5). They are prioritizing the tool’s usability (63%) before its ability to customize (58%) and its interoperability or ease of integration (57%). Decision-makers want to enable their practitioners to build their machine learning models immediately.

  • Organizational benefits.

    An end-to-end lakehouse solution does more than just provide additional capabilities: It can raise data quality to improve collaboration across teams (57%), streamline the ability to communicate machine learning results to the business (58%), enhance the ability to communicate the value of data back to decision-makers (56%), and clarify data ownership across the organization (54%). This addresses data activation issues and makes machine learning work for the business. Respondents also indicated that if all the stages of their data lifecycle were integrated into a single platform, nearly half (49%) could save at least 5 hours in a single workday, and 9% expected to save 7 or more hours (see Figure 6).

Figure 5

Please indicate if the following statement is true or false about your company: "We are actively updating our inventory of applications and processes to meet evolving business, user, and customer needs."

Capabilities shown: Usability of the solution (UX); Solutions that meet immediate needs; Ability to customize/flexibility; Interoperability/ease of integration; Analytical capabilities/insights; Number of useful features; Security; Cost

Base: 840 global practitioners and decision-makers in development and data science
Source: A commissioned study conducted by Forrester Consulting on behalf of Cloudera, May 2023

POLL

“How much time in a single workday do you think you would save if all the stages of your data lifecycle were integrated into a single platform?”

POLL

How impactful to your productivity is having to switch between applications?
Response options: All stages of our data lifecycle are already integrated into a single platform; More than 8 hours; 8 hours; 7 hours; 6 hours; 5 hours; 4 hours; 3 hours; 2 hours; 0.5 hours to 1 hour; Less than 0.5 hours

Base: 840 global practitioners and decision-makers in development and data science
Source: A commissioned study conducted by Forrester Consulting on behalf of Cloudera, May 2023

  • Positive business impacts.

    The business will experience positive benefits after investing in an end-to-end lakehouse, including scaled data management and insight practices alongside business growth (53%), improved attribution of success to data efforts (50%), enhanced predictive analytics capabilities via machine learning (49%), improved collaboration across the organization (48%), and unified security across applications or data sources (48%) (see Figure 7). An end-to-end lakehouse improves employee productivity, collaboration across the organization, and the ability to attribute credit for success where it is due, all while keeping their data safe — these capabilities are lacking in most traditional lakehouses. While data decision-makers are currently focused on meeting the immediate needs of the business, sometimes a strategic approach can meet both current and future needs.

Figure 7
94%

of respondents say having an end-to-end lakehouse will positively impact their company’s success.

Key Recommendations

Our research has uncovered the benefits that organizations can gain by adopting an end-to-end data lakehouse, including reduced effort in data integration, accelerated time to value for advanced analytics, and improved governance and productivity through enhanced collaboration. However, it is critical for IT decision-makers to consider how they can justify investing in another data product when their technology stack already contains numerous tools.

Appendix A: Methodology

In this study, Forrester conducted an online survey of 840 practitioners and decision-makers at organizations in the United States, United Kingdom, Australia, and New Zealand. Survey participants included decision-makers in development and data science roles. Questions provided to the participants asked about data storage and management solutions at their organization. Respondents were offered a small incentive as a thank you for time spent on the survey. The study began in April 2023 and was completed in May 2023.


Appendix B: Demographics

Country
Australia 34%
United States 28%
United Kingdom 22%
New Zealand 16%

Company Size
500 to 999 employees 22%
1,000 to 4,999 employees 56%
5,000 to 19,999 employees 19%
20,000 or more employees 2%

Top 5 Industries
Technology and/or technology services 14%
Telecommunications services 13%
Financial services 12%
Government 12%
Retail 12%

Respondent Level
C-level executive 24%
Vice president 28%
Director 23%
Manager 25%

Note: Percentages may not total 100 due to rounding.
