What is the architecture of Apache Atlas, the big data metadata management tool?
Apache Atlas is an open-source metadata management and data governance framework, originally built for the Hadoop ecosystem. It is used to classify data and to build and maintain an inventory (catalog) of data assets. Its architecture mainly consists of the following components:
- Data Collectors (hooks and bridges): These gather metadata from data sources such as Hive, HBase, Sqoop, Storm, and Kafka. Hooks run inside the source component and report metadata changes as they happen, rather than periodically scanning the source; bridges import existing metadata in bulk. The collected metadata is sent to the Atlas core for processing.
- Atlas Core: receives and processes the metadata sent by the collectors and provides storage, retrieval, and management of that metadata. It includes the type system (metadata type definitions), relationship modeling, and query functions that let users search and browse metadata.
- Metadata Store: persistently stores the collected metadata. Atlas models metadata as a graph using JanusGraph, whose storage backend is pluggable: HBase is the default, and Cassandra or BerkeleyDB can also be configured. Relational databases such as MySQL are not used as the metadata store.
- Metadata Search Service: lets users search and query metadata by keyword, type, classification (tag), relationship, and other criteria. Atlas maintains a search index, Apache Solr by default (Elasticsearch is also supported), so that these queries stay fast.
- Metadata Update Service: manages changes and updates to metadata. When metadata in a data source changes, the hook publishes a notification to the Kafka topic ATLAS_HOOK, which Atlas consumes to keep its metadata accurate and consistent. Atlas in turn publishes entity change events to the ATLAS_ENTITIES topic for downstream consumers such as Apache Ranger.
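As a hedged illustration of how the core's type system, entities, and search are exposed to users, the sketch below builds (but does not send) requests against the Atlas REST API v2 using only the Python standard library. It assumes a server at the default address http://localhost:21000 with the default admin/admin login; the type name `demo_table`, its `owner_team` attribute, and the sample entity are hypothetical. Check the payloads against the REST documentation for your Atlas version before relying on them.

```python
import base64
import json
import urllib.request
from typing import Optional

# Assumptions: Atlas listens on its default port 21000 and uses the default
# admin/admin credentials. Change both for a real deployment.
ATLAS_URL = "http://localhost:21000"
USER, PASSWORD = "admin", "admin"


def atlas_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request against the Atlas REST API v2."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(ATLAS_URL + path, data=data, method=method)
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Content-Type", "application/json")
    return req


# 1. Type system: register a custom entity type extending the built-in DataSet type.
#    "demo_table" and "owner_team" are hypothetical names.
typedef_req = atlas_request("POST", "/api/atlas/v2/types/typedefs", {
    "entityDefs": [{
        "name": "demo_table",
        "superTypes": ["DataSet"],
        "attributeDefs": [
            {"name": "owner_team", "typeName": "string", "isOptional": True,
             "cardinality": "SINGLE", "isUnique": False, "isIndexable": True},
        ],
    }]
})

# 2. Entity: create an instance of that type; qualifiedName is its unique key.
entity_req = atlas_request("POST", "/api/atlas/v2/entity", {
    "entity": {
        "typeName": "demo_table",
        "attributes": {
            "qualifiedName": "sales.orders@demo_cluster",  # hypothetical
            "name": "orders",
            "owner_team": "sales-analytics",
        },
    }
})

# 3. Search: basic search by type and keyword, answered via the index backend.
search_req = atlas_request("POST", "/api/atlas/v2/search/basic", {
    "typeName": "demo_table",
    "query": "orders",
    "limit": 10,
})

# Sending requires a running server, e.g. urllib.request.urlopen(entity_req).
for r in (typedef_req, entity_req, search_req):
    print(r.get_method(), r.full_url)
```

The requests are only constructed here so the example runs without a live cluster; the order (type first, then entity, then search) mirrors the dependency between the type system, the store, and the index.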
Overall, Atlas follows a distributed, loosely coupled design: the collectors, the core, and the storage and index backends communicate through Kafka messaging and REST APIs, and the backends (HBase, Solr, Kafka) run as separate services that can be scaled on their own. Through the REST API and web UI that Atlas provides, users can manage and use data assets, which improves the efficiency and quality of data governance.
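To make the coordination among components concrete, here is a toy, in-memory model of the flow described above. It is not Atlas code: a hook publishes change notifications to a queue (standing in for the ATLAS_HOOK Kafka topic), the core consumes them and applies them to a store (standing in for JanusGraph on HBase), and a keyword index (standing in for Solr) answers searches. The `Notification` and `ToyAtlas` names and the message fields are hypothetical simplifications.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Notification:
    """Hypothetical, simplified hook message (not Atlas's real message format)."""
    op: str                # "CREATE", "UPDATE", or "DELETE"
    type_name: str
    qualified_name: str
    attributes: dict = field(default_factory=dict)


class ToyAtlas:
    def __init__(self):
        self.topic = deque()            # stand-in for the ATLAS_HOOK Kafka topic
        self.store = {}                 # stand-in for the graph store
        self.index = defaultdict(set)   # stand-in for the search index: token -> keys

    def publish(self, note):
        """Hook side: enqueue and return immediately (asynchronous decoupling)."""
        self.topic.append(note)

    def consume(self):
        """Core side: apply all pending notifications; return how many were applied."""
        applied = 0
        while self.topic:
            note = self.topic.popleft()
            key = note.qualified_name
            old = self.store.pop(key, None)
            if old is not None:
                self._unindex(key, old)
            if note.op != "DELETE":
                # UPDATE merges into existing attributes; CREATE starts fresh.
                attrs = dict(old["attributes"]) if (old and note.op == "UPDATE") else {}
                attrs.update(note.attributes)
                entity = {"typeName": note.type_name, "attributes": attrs}
                self.store[key] = entity
                self._index(key, entity)
            applied += 1
        return applied

    def _tokens(self, entity):
        words = {entity["typeName"].lower()}
        for value in entity["attributes"].values():
            words.update(str(value).lower().split())
        return words

    def _index(self, key, entity):
        for word in self._tokens(entity):
            self.index[word].add(key)

    def _unindex(self, key, entity):
        for word in self._tokens(entity):
            self.index[word].discard(key)

    def search(self, keyword):
        return sorted(self.index.get(keyword.lower(), set()))


atlas = ToyAtlas()
atlas.publish(Notification("CREATE", "hive_table", "sales.orders@cl1", {"name": "orders", "owner": "sales team"}))
atlas.publish(Notification("CREATE", "hive_table", "hr.staff@cl1", {"name": "staff", "owner": "hr team"}))
print(atlas.search("orders"))  # [] : notifications are still queued
atlas.consume()
print(atlas.search("team"))    # ['hr.staff@cl1', 'sales.orders@cl1']
atlas.publish(Notification("UPDATE", "hive_table", "hr.staff@cl1", {"owner": "people ops"}))
atlas.consume()
print(atlas.search("team"))    # ['sales.orders@cl1']
```

Because publishing only enqueues a message, a search made right after `publish` does not see the change until the core has consumed it. This is the trade-off of decoupling collectors from the core through a message queue: sources are never blocked by Atlas, at the cost of the catalog being updated asynchronously.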