This is the second blog in a series that explains how organizations can prevent their Data Lake from becoming a Data Swamp, with insights and strategy from Perficient’s Senior Data Strategist and Solutions Architect, Dr. Chuck Brooks.

 

In the first article in this series, I explained the five capabilities necessary to prevent a Data Lake from becoming a Data Swamp. They are:

Create a Data Catalog
Create a Data Governance organization
Implement data quality analysis and reporting
Implement category-based security in the Data Lake
Have multiple data zones inside the Data Lake

In this article, we will discuss the Data Catalog.

 

The Data Catalog and Metadata Management

A Data Catalog is a collection of metadata, combined with data management and search tools, that helps corporate knowledge workers find the data that they need. The Data Catalog serves as an inventory of available data and provides information to evaluate the usefulness and quality of data to answer business questions and make better business decisions.
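
To make the idea of a searchable inventory concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product works: the CatalogEntry and DataCatalog classes, the field names, and the sample datasets are all hypothetical, and a real catalog would persist its entries and use a proper search index rather than simple substring matching.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata record describing one dataset in the lake."""
    name: str
    description: str
    owner: str
    steward: str
    synonyms: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)

    def searchable_text(self) -> str:
        # Combine the fields a knowledge worker is likely to search on.
        parts = [self.name, self.description, *self.synonyms, *self.tags]
        return " ".join(parts).lower()


class DataCatalog:
    """Hypothetical in-memory inventory of catalog entries."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, query: str) -> list[CatalogEntry]:
        # An entry matches when every query term appears somewhere in its text.
        terms = query.lower().split()
        return [
            entry
            for entry in self._entries.values()
            if all(term in entry.searchable_text() for term in terms)
        ]


# Sample (made-up) datasets for illustration.
catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="sales_orders",
    description="Confirmed customer orders by day",
    owner="Sales Operations",
    steward="J. Doe",
    synonyms=["bookings"],
    tags=["finance", "curated"],
))
catalog.register(CatalogEntry(
    name="web_clicks",
    description="Raw clickstream events",
    owner="Marketing",
    steward="A. Smith",
    tags=["raw"],
))

for hit in catalog.search("bookings"):
    print(hit.name, "| owner:", hit.owner, "| steward:", hit.steward)
```

Searching on the business synonym "bookings" finds the sales_orders dataset even though that word is not in its technical name, and the result tells the knowledge worker whom to contact about it.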

 

Data Catalogs have become the standard for metadata management in the age of big data and self-service business intelligence. The metadata that knowledge workers need to understand and use data is far more expansive today than in the past. A successful Data Lake transformation and adoption depends on the ability of knowledge workers to find, access, and use (and reuse) data in the Data Lake. Ensuring success with enterprise data requires formally integrating multiple lines of business, technologies, and processes through data management and governance to create a comprehensive data catalog.

A data catalog organizes the technical details about data assets, or metadata, into defined, meaningful, and searchable business assets that enable consistent understanding among all knowledge workers. It is essential to knowledge workers because it combines and organizes details about data assets in the Data Lake and presents them in an easy-to-understand format. The data catalog clarifies data definitions, synonyms, and essential business attributes so that all knowledge workers understand the data and can leverage it as an asset. When knowledge workers have important data questions, they can turn to the data catalog, which identifies data owners, stewards, and subject matter experts, enabling easy collaboration between business units.

The data catalog will keep your Data Lake from becoming a Data Swamp by providing:

Improved productivity and reduced time spent by teams searching for relevant information or data
Increased visibility on key datasets that exist in the data lake
Avoidance of duplicate purchases of similar datasets by different teams
Lineage that gives knowledge workers a clear view of the flow and dependencies of data through the organization and its business processes
Improved collaboration between knowledge workers
Faster processes to access and interpret the data
Facilitated compliance with growing international privacy and reporting regulations
Common KPIs and data definitions that make data comparable and understandable
Facilitated data relevancy and usage tracking
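
As one illustration of the lineage benefit in the list above, the sketch below walks a small dependency map to list everything a dataset is derived from. The dataset names and the LINEAGE mapping are hypothetical; in practice a catalog would capture this information from pipelines rather than from a hand-written dictionary.

```python
from collections import deque

# Hypothetical lineage map: each dataset points to the datasets it is derived from.
LINEAGE = {
    "revenue_dashboard": ["sales_curated"],
    "sales_curated": ["sales_raw", "customer_master"],
    "sales_raw": [],
    "customer_master": ["crm_export"],
    "crm_export": [],
}


def upstream_sources(dataset: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every dataset that `dataset` depends on, nearest sources first."""
    seen: set[str] = set()
    ordered: list[str] = []
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        ordered.append(current)
        queue.extend(lineage.get(current, []))
    return ordered


print(upstream_sources("revenue_dashboard", LINEAGE))
```

A knowledge worker questioning a number on the dashboard can use a view like this to see which upstream datasets to inspect.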

 

Google’s Data Catalog (now part of Dataplex) and Perficient’s Frameworks

 

Google’s Data Catalog and Perficient’s Metadata Manager

The Google Data Catalog (now part of Dataplex) helps knowledge workers understand data assets in Google Cloud and beyond. Integrations with BigQuery, Pub/Sub, Cloud Storage, and many connectors provide a unified view and a tagging mechanism for technical and business metadata. Google Data Catalog empowers all knowledge workers in the organization to find or tag data through a powerful UI, built with the same search technology as Gmail, or through API access.

Perficient’s Metadata Manager is a framework that enhances the Google Data Catalog and offers a UI that makes metadata tagging and searching easier for knowledge workers and data stewards. Perficient Metadata Manager also provides data quality analysis and reporting capabilities.

 

 

 

Perficient’s Cloud Data Expertise

The world’s leading brands choose to partner with us because we are large enough to scale major cloud projects, yet nimble enough to provide focused expertise in specific areas of your business. Our cloud, data, and analytics team can assist with your entire data and analytics lifecycle, from data strategy to implementation. We will help you make sense of your data and show you how to use it to solve complex business problems. We’ll assess your current data and analytics issues and develop a strategy to guide you to your long-term goals.

Download the guide, Becoming a Data-Driven Organization With Google Cloud Platform, to learn more about Dr. Chuck’s GCP data strategy.