Skip to content

Releases: open-metadata/OpenMetadata

1.5.1-release

28 Aug 14:05
Compare
Choose a tag to compare

Backward Incompatible Changes

Multi Owners

OpenMetadata allows a single user or a team to be tagged as owners for any data assets. In Release 1.5.0, we allow users to tag multiple individual owners or a single team. This will allow organizations to add ownership to multiple individuals without necessarily needing to create a team around them like previously.

This is a backward incompatible change, if you are using APIs, please make sure the owner field is now changed to “owners”

Import/Export Format

To support the multi-owner format, we have now changed how we export and import the CSV file in glossary, services, database, schema, table, etc. The new format will be
user:userName;team:TeamName

If you are importing an older file, please make sure to make this change.

Pydantic V2

The core of OpenMetadata are the JSON Schemas that define the metadata standard. These schemas are automatically translated into Java, Typescript, and Python code with Pydantic classes.

In this release, we have migrated the codebase from Pydantic V1 to Pydantic V2.

Deployment Related Changes (OSS only)

./bootstrap/bootstrap_storage.sh removed

OpenMetadata community has built rolling upgrades to database schema and the data to make upgrades easier. This tool is now called as ./bootstrap/openmetadata-ops.sh and has been part of our releases since 1.3. The bootstrap_storage.sh doesn’t support new native schemas in OpenMetadata. Hence, we have deleted this tool from this release.

While upgrading, please refer to our Upgrade Notes in the documentation. Always follow the best practices provided there.

Database Connection Pooling

OpenMetadata uses Jdbi to handle database-related operations such as read/write/delete. In this release, we introduced additional configs to help with connection pooling, allowing the efficient use of a database with low resources.

Please update the defaults if your cluster is running at a large scale to scale up the connections efficiently.

For the new configuration, please refer to the doc here

Data Insights

The Data Insights application is meant to give you a quick glance at your data's state and allow you to take action based on the information you receive. To continue pursuing this objective, the application was completely refactored to allow customizability.

Part of this refactor was making Data Insights an internal application, no longer relying on an external pipeline. This means triggering Data Insights from the Python SDK will no longer be possible.

With this change you will need to run a backfill on the Data Insights for the last couple of days since the Data Assets data changed.

UI Changes

New Explore Page

Explore page displays hierarchically organized data assets by grouping them into services > database > schema > tables/stored procedures. This helps users organically find the data asset they are looking for based on a known database or schema they were using. This is a new feature and changes the way the Explore page was built in previous releases.

Connector Schema Changes

In the latest release, several updates and enhancements have been made to the JSON schema across various connectors. These changes aim to improve security, configurability, and expand integration capabilities. Here's a detailed breakdown of the updates:

  • KafkaConnect: Added schemaRegistryTopicSuffixName to enhance topic configuration flexibility for schema registries.
  • GCS Datalake: Introduced bucketNames field, allowing users to specify targeted storage buckets within the Google Cloud Storage environment.
  • OpenLineage: Added saslConfig to enhance security by enabling SASL (Simple Authentication and Security Layer) configuration.
  • Salesforce: Added sslConfig to strengthen the security layer for Salesforce connections by supporting SSL.
  • DeltaLake: Updated schema by moving metastoreConnection to a newly created metastoreConfig.json file. Additionally, introduced configSource to better define source configurations, with new support for metastoreConfig.json and storageConfig.json.
  • Iceberg RestCatalog: Removed clientId and clientSecret as mandatory fields, making the schema more flexible for different authentication methods.
  • DBT Cloud Pipelines: Added as a new connector to support cloud-native data transformation workflows using DBT.
  • Looker: Expanded support to include connections using GitLab integration, offering more flexible and secure version control.
  • Tableau: Enhanced support by adding capabilities for connecting with TableauPublishedDatasource and TableauEmbeddedDatasource, providing more granular control over data visualization and reporting.

Include DDL

During the Database Metadata ingestion, we can optionally pick up the DDL for both tables and views. During the metadata ingestion, we use the view DDLs to generate the View Lineage.

To reduce the processing time for out-of-the-box workflows, we are disabling the include DDL by default, whereas before, it was enabled, which potentially led to long-running workflows.

Secrets Manager

Starting with the release 1.5.0, the JWT Token for the bots will be sent to the Secrets Manager if you configured one. It won't appear anymore in your dag_generated_configs in Airflow.

Python SDK

The metadata insight command has been removed. Since Data Insights application was moved to be an internal system application instead of relying on external pipelines the SDK command to run the pipeline was removed.

What's New

Data Observability with Anomaly Detection (Collate)

OpenMetadata has been driving innovation in Data Quality in Open Source. Many organizations are taking advantage of the following Data Quality features to achieve better-quality data

  1. A Native Profiler to understand the shape of the data, freshness, completeness, volume, and ability to add your own metrics, including column level profiler over time-series and dashboards
  2. No-code data quality tests, deploy, collect results back to see it in a dashboard all within OpenMetadata
  3. Create alerts and get notified of Test results through email, Slack, NSteams, GChat, and Webhook
  4. Incident Manager to collaborate around test failures and visibility to downstream consumers of failures from upstream

In 1.5.0, we are bringing in Anomaly Detection based on AI to predict when an anomaly happens based on our learning historical data and automatically sending notifications to the owners of the table to warn them of the impending incidents

Enhanced Data Quality Dashboard (Collate)

We also have improved the Table Data quality dashboard to showcase the tests categorized and make it easy for everyone to consume. When there are issues, the new dashboard makes it easier to understand the Data Quality coverage of your tables and the possible impact each test failure has by organizing tests into different groups.

Freshness Data Quality Tests (Collate)

Working with old data can lead to making wrong decisions. With the new Freshness test, you can validate that your data arrives at the right time. Freshness tests are a critical part of any data team's toolset. Bringing these tests together with lineage information and the Incident Manager, your team will be able to quickly detect issues related to missing data or stuck pipelines.

Data Diff Data Quality Tests

Data quality checks are important not only within a single table but also between different tables. These data diff checks can ensure key data remains unchanged after transformation, or conversely, ensure that the transformations were actually performed.

We are introducing the table difference data quality test to validate that multiple appearances of the same information remain consistent. Note that the test allows you to specify which column to use as a key and which columns you want to compare, and even add filters in the data to give you more control over multiple use cases.

Domains RBAC & Subdomains

OpenMetadata introduced Domains & Data Products in 1.3.0. Since then, many large organizations have started using Domains & Data Products to achieve better ownership and collaboration around domains that can span multiple teams.

In the 1.5.0 release, we added support for subdomains. This will help teams to organize into multiple subdomains within each domain.

RBAC for Domains

With the 1.5.0 release, we are adding more stricter controls around Domain. Now, teams, data assets, glossaries, and classification can have domain concepts and can get a policy such that only users within a domain can access the data within a domain. Domain owners can use Data Products to publish data products and showcase publicly available data assets from a specific domain.

This will help large companies to use a single OpenMetadata platform to unify all of their data and teams but also provide more stringent controls to segment the data between domains

Improved Explore Page & Data Asset Widget

OpenMetadata, with its simple UI/UX and data collaboration features, is becoming more attractive to non-technical users as well. Data Governance teams are using OpenMetadata to add glossary terms and policies around metadata. Teams using Collate SaaS product are taking advantage of our Automations feature to gain productivity in their governance tasks.

Our new improved navigation on the Explore page will help users navigate hierarchically and find the data they are looking for. Users will see the data assets now grouped by service name -> database -> schema -> tables/stored procedures.

We are also making the discovery of data more accessible for users introducing a data asset widget, which will gro...

Read more

OpenMetadata 1.5.0 Release

26 Aug 05:19
Compare
Choose a tag to compare
Pre-release

Please wait for 1.5.1

We've found some issues that would degrade the user experience of 1.5.0. Please use 1.5.1 instead. Thanks

Backward Incompatible Changes

Multi Owners

OpenMetadata allows a single user or a team to be tagged as owners for any data assets. In Release 1.5.0, we allow users to tag multiple individual owners or a single team. This will allow organizations to add ownership to multiple individuals without necessarily needing to create a team around them like previously.

This is a backward incompatible change, if you are using APIs, please make sure the owner field is now changed to “owners”

Import/Export Format

To support the multi-owner format, we have now changed how we export and import the CSV file in glossary, services, database, schema, table, etc. The new format will be
user:userName;team:TeamName

If you are importing an older file, please make sure to make this change.

Pydantic V2

The core of OpenMetadata are the JSON Schemas that define the metadata standard. These schemas are automatically translated into Java, Typescript, and Python code with Pydantic classes.

In this release, we have migrated the codebase from Pydantic V1 to Pydantic V2.

Deployment Related Changes (OSS only)

./bootstrap/bootstrap_storage.sh removed

OpenMetadata community has built rolling upgrades to database schema and the data to make upgrades easier. This tool is now called as ./bootstrap/openmetadata-ops.sh and has been part of our releases since 1.3. The bootstrap_storage.sh doesn’t support new native schemas in OpenMetadata. Hence, we have deleted this tool from this release.

While upgrading, please refer to our Upgrade Notes in the documentation. Always follow the best practices provided there.

Database Connection Pooling

OpenMetadata uses Jdbi to handle database-related operations such as read/write/delete. In this release, we introduced additional configs to help with connection pooling, allowing the efficient use of a database with low resources.

Please update the defaults if your cluster is running at a large scale to scale up the connections efficiently.

For the new configuration, please refer to the doc here

Data Insights

The Data Insights application is meant to give you a quick glance at your data's state and allow you to take action based on the information you receive. To continue pursuing this objective, the application was completely refactored to allow customizability.

Part of this refactor was making Data Insights an internal application, no longer relying on an external pipeline. This means triggering Data Insights from the Python SDK will no longer be possible.

With this change you will need to run a backfill on the Data Insights for the last couple of days since the Data Assets data changed.

UI Changes

New Explore Page

Explore page displays hierarchically organized data assets by grouping them into services > database > schema > tables/stored procedures. This helps users organically find the data asset they are looking for based on a known database or schema they were using. This is a new feature and changes the way the Explore page was built in previous releases.

Connector Schema Changes

In the latest release, several updates and enhancements have been made to the JSON schema across various connectors. These changes aim to improve security, configurability, and expand integration capabilities. Here's a detailed breakdown of the updates:

  • KafkaConnect: Added schemaRegistryTopicSuffixName to enhance topic configuration flexibility for schema registries.
  • GCS Datalake: Introduced bucketNames field, allowing users to specify targeted storage buckets within the Google Cloud Storage environment.
  • OpenLineage: Added saslConfig to enhance security by enabling SASL (Simple Authentication and Security Layer) configuration.
  • Salesforce: Added sslConfig to strengthen the security layer for Salesforce connections by supporting SSL.
  • DeltaLake: Updated schema by moving metastoreConnection to a newly created metastoreConfig.json file. Additionally, introduced configSource to better define source configurations, with new support for metastoreConfig.json and storageConfig.json.
  • Iceberg RestCatalog: Removed clientId and clientSecret as mandatory fields, making the schema more flexible for different authentication methods.
  • DBT Cloud Pipelines: Added as a new connector to support cloud-native data transformation workflows using DBT.
  • Looker: Expanded support to include connections using GitLab integration, offering more flexible and secure version control.
  • Tableau: Enhanced support by adding capabilities for connecting with TableauPublishedDatasource and TableauEmbeddedDatasource, providing more granular control over data visualization and reporting.

Include DDL

During the Database Metadata ingestion, we can optionally pick up the DDL for both tables and views. During the metadata ingestion, we use the view DDLs to generate the View Lineage.

To reduce the processing time for out-of-the-box workflows, we are disabling the include DDL by default, whereas before, it was enabled, which potentially led to long-running workflows.

Secrets Manager

Starting with the release 1.5.0, the JWT Token for the bots will be sent to the Secrets Manager if you configured one. It won't appear anymore in your dag_generated_configs in Airflow.

Python SDK

The metadata insight command has been removed. Since Data Insights application was moved to be an internal system application instead of relying on external pipelines the SDK command to run the pipeline was removed.

What's New

Data Observability with Anomaly Detection (Collate)

OpenMetadata has been driving innovation in Data Quality in Open Source. Many organizations are taking advantage of the following Data Quality features to achieve better-quality data

  1. A Native Profiler to understand the shape of the data, freshness, completeness, volume, and ability to add your own metrics, including column level profiler over time-series and dashboards
  2. No-code data quality tests, deploy, collect results back to see it in a dashboard all within OpenMetadata
  3. Create alerts and get notified of Test results through email, Slack, NSteams, GChat, and Webhook
  4. Incident Manager to collaborate around test failures and visibility to downstream consumers of failures from upstream

In 1.5.0, we are bringing in Anomaly Detection based on AI to predict when an anomaly happens based on our learning historical data and automatically sending notifications to the owners of the table to warn them of the impending incidents

Enhanced Data Quality Dashboard (Collate)

We also have improved the Table Data quality dashboard to showcase the tests categorized and make it easy for everyone to consume. When there are issues, the new dashboard makes it easier to understand the Data Quality coverage of your tables and the possible impact each test failure has by organizing tests into different groups.

Freshness Data Quality Tests (Collate)

Working with old data can lead to making wrong decisions. With the new Freshness test, you can validate that your data arrives at the right time. Freshness tests are a critical part of any data team's toolset. Bringing these tests together with lineage information and the Incident Manager, your team will be able to quickly detect issues related to missing data or stuck pipelines.

Data Diff Data Quality Tests

Data quality checks are important not only within a single table but also between different tables. These data diff checks can ensure key data remains unchanged after transformation, or conversely, ensure that the transformations were actually performed.

We are introducing the table difference data quality test to validate that multiple appearances of the same information remain consistent. Note that the test allows you to specify which column to use as a key and which columns you want to compare, and even add filters in the data to give you more control over multiple use cases.

Domains RBAC & Subdomains

OpenMetadata introduced Domains & Data Products in 1.3.0. Since then, many large organizations have started using Domains & Data Products to achieve better ownership and collaboration around domains that can span multiple teams.

In the 1.5.0 release, we added support for subdomains. This will help teams to organize into multiple subdomains within each domain.

RBAC for Domains

With the 1.5.0 release, we are adding more stricter controls around Domain. Now, teams, data assets, glossaries, and classification can have domain concepts and can get a policy such that only users within a domain can access the data within a domain. Domain owners can use Data Products to publish data products and showcase publicly available data assets from a specific domain.

This will help large companies to use a single OpenMetadata platform to unify all of their data and teams but also provide more stringent controls to segment the data between domains

Improved Explore Page & Data Asset Widget

OpenMetadata, with its simple UI/UX and data collaboration features, is becoming more attractive to non-technical users as well. Data Governance teams are using OpenMetadata to add glossary terms and policies around metadata. Teams using Collate SaaS product are taking advantage of our Automations feature to gain productivity in their governance tasks.

Our new improved navigation on the Explore page will help users navigate hierarchically and find the data they are looking for. Users will see the data assets now grouped by `service name -> database -> schema -> tables/st...

Read more

1.5.0-rc2-release

22 Aug 13:33
Compare
Choose a tag to compare
1.5.0-rc2-release Pre-release
Pre-release

What's Changed

Read more

OpenMetadata 1.4.8 Release

21 Aug 06:54
Compare
Choose a tag to compare

What's Changed

  • MINOR: Make Include ddl disabled by default 1.4.7 by @ulixius9 in #17453
  • Made DDL configuration consistent with views. Removed Lifecycle from … by @IceS2 in #17474
  • Fix user profile task listing.
  • Fix import/export UI flow ${CollateIconWithLinkMD}.
  • Improve SAML logging backend.
  • Add Unity Catalog Lineage Dialect.
  • Clean idle connections during ingestion.
  • Fix Databricks Describe Table during metadata ingestion.
  • Glossary Upload now shows permissions errors for non-owners.
  • Fix task not showing in the right panel after clicking, in the Activity Feed.
  • Support multiple dbt run_results.json for a single manifest for improved performance.

Full Changelog: 1.4.7-release...1.4.8-release

OpenMetadata 1.4.7 Release

07 Aug 09:09
Compare
Choose a tag to compare

What's Changed

  • Resolved issue with Azure login related to email principal claims.

Full Changelog: 1.4.6-release...1.4.7-release

OpenMetadata 1.5.0-rc1 Release

05 Aug 15:09
9819d0f
Compare
Choose a tag to compare
Pre-release

What's Changed

Read more

OpenMetadata 1.4.6 Release

02 Aug 13:28
Compare
Choose a tag to compare

What's Changed

  • fix lineage PATCH API for ingestion
  • fix Trino Azure config secret masking
  • fix setuptools version due to yanked setuptools release
  • fix MSSQL busy connection error during test connection
  • Fixed test case summary updates
  • Fixed Test Suite indexing
  • Fix repeated alerts being sent after no changes in the Entity
  • Fixed an issue handling users with capital letters
  • Centralize OIDC flow handling
  • Fixed Ingestion Pipeline alert URL

Full Changelog: 1.4.5-release...1.4.6-release

OpenMetadata 1.4.6-rc1 Release

23 Jul 09:22
Compare
Choose a tag to compare
Pre-release

What's Changed

Improvements

  • Fixed test case summary updates
  • Fixed Test Suite indexing
  • Fix repeated alerts being sent after no changes in the Entity
  • Fixed table import
  • Fixed an issue handling users with capital letters
  • Centralize OIDC flow handling
  • Fixed Ingestion Pipeline alert URL

Full Changelog: 1.4.5-release...1.4.6-rc1-release

OpenMetadata 1.4.5 Release

17 Jul 13:22
Compare
Choose a tag to compare

What's Changed

Improvements

  • Improve query filtering with prepared statements.
  • Bug fix in regex to match test case when using sampled data.
  • Bug fix in global profiler config for Snowflake, Redshift, and BigQuery.
  • Bug fix for Arg mismatch for DataModels in QlikSense.

Full Changelog: 1.4.4-release...1.4.5-release

OpenMetadata 1.4.5-RC1 Release

11 Jul 06:07
Compare
Choose a tag to compare
Pre-release

What's Changed

Improvements

  • Improve query filtering with prepared statements.
  • Big fix in regex to match test case when using sampled data.
  • Bug fix in global profiler config for Snowflake, Redshift, and BigQuery.
  • Bug fix for Arg mismatch for DataModels in QlikSense.