Automatic Semantic Model Generation in AtScale

In today’s data-driven world, making data accessible isn’t just about providing raw datasets; it’s about making it understandable and usable for everyone, not just for those who can write SQL or navigate complex databases. To achieve this, organizations need a business-friendly view of their data, which is where the concept of a semantic layer comes into play. A semantic layer acts as a bridge between the raw data and the end users, translating complex database structures into business terms and metrics that users can easily understand and work with.

However, creating and maintaining a semantic layer can be a complex and resource-intensive task. Traditionally, this has been the responsibility of data stewards—individuals skilled in crafting and maintaining semantic models. They ensure that data is modeled in a way that reflects business concepts, facilitating better decision-making. With the rise of AI and automation, tools like AtScale are revolutionizing this process, making it faster, more consistent, and more accessible to organizations of all sizes.

The Challenge of Manual Modeling

To date, the process of modeling data in a semantic layer platform has been quite manual. Data stewards must manually define dimensions, metrics, and relationships between tables and objects. This process involves a deep understanding of the data structure and requires significant effort to ensure consistency across various data sources and models. For organizations with multiple data stewards, this can lead to discrepancies and inefficiencies, as each steward may have a different approach or understanding of the data.

Automatic Model Generation: A Game Changer

AtScale’s new feature, One-Click Modeling, addresses these challenges by leveraging AI to automate the task of creating semantic models. This innovation drastically reduces the manual effort involved and ensures a more consistent approach across the organization. Here’s how it works:

  1. Single-Click Model Creation: Users can now generate a new semantic model with just a single click. Selecting a table from a data source and choosing the “Generate new model” option from AtScale Design Center automatically creates a new model based on that table. This new model uses existing objects and relationships wherever possible, significantly speeding up the modeling process and leveraging existing investment.

    AtScale Semantic Layer Platform -Generate a New Model

    Screenshot

  2. Faster and More Consistent Modeling: AI algorithms analyze the dataset to identify existing dimensions, measures, and relationships. For example, if a dataset contains columns that match those in existing dimensions (based on column names and data types), the system will automatically establish these relationships. This not only accelerates the model creation process but also enhances consistency across different models and data stewards.
    AtScale Semantic Layer Platform - Design Center

Key Use Cases

New Users: For new AtScale users, the ability to quickly generate a semantic model is invaluable. They can start building queries almost immediately, even if they are unfamiliar with the data’s intricacies and how to use the AtScale Design Center. This is especially useful when they are starting from scratch or working within a repository that already contains conformed dimensions and other semantic objects.

Seasoned Users: For current AtScale users, Automatic Model Generation simplifies the process of integrating new data sources. Users can easily create a base model on a new data source that utilizes the existing standardized definitions. They can then focus their time on modeling elements that are completely new or don’t follow common patterns.

Data Exploration: Sometimes users want to explore new data sources within a BI tool in familiar ways, creating visualizations and doing time series analysis quickly, before they’re familiar with the data itself. This feature can get them analyzing new data in seconds before deciding whether to use it in a shared semantic layer.

AtScale’s Approach to Automatic Model Generation

With AtScale, users only need to select a data source table, and the platform automatically generates a comprehensive semantic model. This model includes:

  • Relationships to Existing Dimensions: AtScale identifies existing dimensions (e.g., Product, Customer, Geography) that use join columns consistent with what is found in the base table. If columns can be found to satisfy all join columns, a relationship will be created to the dimension. All the semantic elements within the dimension (hierarchies, dimensional and metrical attributes, and embedded dimensions) are then included in the model and available to query.
  • References to Existing Objects: If the table is used in another model and has existing measures defined on it, those measures will be included in the model for querying. No new objects will be created in this case, rather a reference to the existing measure object is added to the model.
  • Creation of New Objects: If a column is not being used for another purpose, it may be used to create a new element. If it’s numerical and doesn’t appear to be used as a key, then a sum’d measure will be created from it. If it is a string, a degenerate dimension is created which is a dimension created directly on the dataset column. Occasionally the intention of a column cannot be determined in which case it is not used in the semantic layer. These cases are the exception, not the rule.

Time Series Analysis

Some of the most common and powerful analyses performed utilize a time-based dimension. If one or more columns are found in the dataset of type date or datetime, they will be used to join to a date dimension. If a conformed time dimension is found in the repository and its join key data types are consistent with the source column, relationship(s) will be created and the hierarchies and other elements will be available for querying. This gives users access to doing time series analysis immediately in their favorite BI tool and ensures their definitions of time elements are consistent across their models.

If an existing time dimension cannot be found, a new one will be created automatically. It is defined on top of a standardized query definition for the dataset. The supported query engines are BigQuery, Databricks, Snowflake, and Postgres. If the model is going to be used for production or a significant number of queries will be used, the query definition should be replaced using one of these options:

  • Materialize the query
  • Swap out the query for a table
  • Use another time dimension built on a table

Looking Ahead

AtScale plans to enhance One-Click Modeling further by incorporating more advanced data profiling features. Future updates may include AI that better understands column data and statistics, leading to even more accurate and useful model generation.

In conclusion, AtScale’s Automatic Model Generation represents a significant leap forward in making data more accessible and consistent. Organizations can reduce manual effort, ensure consistency, and empower users to derive insights more quickly and effectively by automating the semantic modeling process. As AI continues to evolve, the possibilities for automating and enhancing the semantic layer are boundless, paving the way for more intuitive and efficient data experiences.

WHITEPAPER
Enable Natural Language Prompting with AtScale’s Semantic Layer and Generative AI