Oracle Cloud Infrastructure Data Catalog – July 2021 Update

A new version of Oracle Cloud Infrastructure (OCI) Data Catalog is now available. This release speeds up data cataloging by automating the discovery and creation of data assets for technical metadata harvesting. It also simplifies metadata enrichment: bulk upload lets data providers populate their catalogs with rich technical metadata and business context in one pass. Data consumers, in turn, can get value from the prepared catalogs sooner and locate assets across the enterprise.

What is Oracle Cloud Infrastructure Data Catalog?

OCI Data Catalog is a cloud-native big data service used to discover, organize, enrich, and trace an enterprise's technical and business data assets. For data analysts and business analysts alike, the core value of a data catalog lies in finding useful business data quickly, which is precisely what OCI Data Catalog is designed to do.

The latest capabilities in OCI Data Catalog

  1. Automated Data Source Discovery

OCI Data Catalog gives an enterprise comprehensive visibility into its data by bringing technical and business metadata together. Given the volume, variety, and velocity of data and data sources within an enterprise, finding sources and creating data assets by hand can be very time-consuming. There is also always a chance of overlooking something valuable or making an error while creating a data asset. Automation reduces both the effort and the room for human error in data cataloging and metadata harvesting.

With the new release, OCI Data Catalog can automatically locate the data sources accessible in the tenancy. Select the region and compartments, and the system does the rest. It can discover Autonomous Data Warehouse (ADW) databases, Autonomous Transaction Processing (ATP) databases, Oracle databases, Object Storage buckets, and so forth. The system also fetches the configuration details, which makes creating the data assets and their connections easier. Simply supply the remaining information, such as the user credentials, and start harvesting.

  2. Catalyzing Metadata Enrichment

Existing users of OCI Data Catalog are familiar with custom properties, which let them define their own attributes for metadata enrichment requirements that are specific to their organization. Users apply these properties to annotate the system metadata after it has been harvested into OCI Data Catalog.

For instance, users can specify a business description, an update frequency, and a data owner. This gives data scientists a way to add business context to technical metadata beyond simple tagging or linking to glossary terms. Rich information on data sets and fields helps with exploring, classifying, and making sense of the data. Data providers also get a systematic way to convey information in OCI Data Catalog, which heads off questions from data consumers later.

However, populating custom property values one object at a time can be a struggle. With the latest update, users can populate values in bulk using a familiar Microsoft Excel format, which speeds up the enrichment process. It also makes the content easier to review.

The procedure is straightforward. First, harvest the necessary technical metadata and create the custom properties. Next, export the technical objects along with their associated custom properties into an Excel sheet. Then add or update the values for the relevant properties in the sheet, and finally import the sheet back into OCI Data Catalog.

Note: As of now, this feature is available only for data assets built on a relational database, such as Autonomous Database, Oracle Database, MySQL, and Microsoft SQL Server. Users can export and import custom property values at the schema and data entity levels.
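
The middle step of the procedure above (filling in property values in the exported sheet) can be sketched in Python. This is only an illustration under stated assumptions, not the product's export format: the sheet is assumed to have been saved as CSV, and the column names (Entity Name, Business Description, Update Frequency, Data Owner) and the helper fill_custom_properties are hypothetical. The actual export and import are done in OCI Data Catalog itself.

```python
import csv
import io

# Hypothetical key column; the real exported sheet may use a different layout.
KEY_COLUMN = "Entity Name"


def read_sheet(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def fill_custom_properties(rows, updates):
    """Return copies of rows with blank custom property cells filled in.

    rows:    one dict per exported object.
    updates: {entity name: {property name: value}}.
    Cells that already hold a value are left untouched.
    """
    filled = []
    for row in rows:
        new_row = dict(row)
        for prop, value in updates.get(row[KEY_COLUMN], {}).items():
            if prop in new_row and not new_row[prop].strip():
                new_row[prop] = value
        filled.append(new_row)
    return filled


def write_sheet(rows, fieldnames):
    """Serialize rows back to CSV text, ready to be imported again."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Stand-in for a sheet exported from the catalog (made-up sample rows).
EXPORTED = """Entity Name,Business Description,Update Frequency,Data Owner
CUSTOMERS,,,
ORDERS,Sales orders,,
"""

# Values a data provider wants to add in bulk.
UPDATES = {
    "CUSTOMERS": {
        "Business Description": "Customer master data",
        "Update Frequency": "Daily",
        "Data Owner": "crm-team",
    },
    "ORDERS": {
        "Business Description": "Should not overwrite",
        "Update Frequency": "Hourly",
        "Data Owner": "sales-ops",
    },
}

rows = read_sheet(EXPORTED)
filled = fill_custom_properties(rows, UPDATES)
result = write_sheet(filled, list(rows[0].keys()))
print(result)
```

Filling only blank cells is a design choice made for this sketch: it keeps values that were already curated in the catalog from being overwritten by a bulk update.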

  3. Hive Metastore (HMS) in OCI Data Catalog

Oracle Cloud Infrastructure Data Flow is a fully managed Apache Spark service that works with massive data sets. To let it read, write, and manage operations on such data sets, OCI Data Catalog now provides a persistent, Hive-compatible metastore. With the Data Catalog metastore, OCI Data Flow users can securely store and retrieve schema definitions for objects in semi-structured or unstructured data assets, including Object Storage, through the Hive Metastore interface.

  4. OCI Data Catalog and Oracle Analytics Cloud (OAC) Integration

The new release of OCI Data Catalog also features a beta-preview integration with Oracle Analytics. The catalog serves as a central repository for managing Oracle Analytics metadata from multiple business intelligence (BI) systems alongside other harvested data assets.

Harvesting BI semantic model and report catalog metadata into Data Catalog now takes only a few minutes. The integration of OCI Data Catalog with OAC includes features such as:

  • One-click discovery of OAC data for system analysts and data engineers. Analytics authors, for their part, can discover analytical objects to understand data definitions, where the data is used, and the related data objects.
  • Access to metadata definitions across multiple OAC instances, enabling consistent definitions and a single version of the truth.
  • Curation of business definitions and glossaries, and linking of business terms to analytical data, to further improve the analytical self-service experience.
  • Self-nomination for the Oracle Analytics Cloud integration (beta preview) through the beta preview option in the OCI Console.

Oracle Cloud Infrastructure Data Catalog: Miscellaneous Features

Apart from the major additions, such as auto-discovery, bulk enrichment, and the OAC integration, the new release of OCI Data Catalog also includes several smaller enhancements. These include:

  • Support for SSL-enabled data sources
  • Option to bookmark a specific object details page
  • Pre-authenticated request (PAR) based connections to Object Storage buckets, and several others