Introducing Semantic Modeling Language (SML)

In this episode, AtScale’s CTO and Co-founder, Dave Mariani, and VP of Engineering, John Langton, discuss the exciting open-source release of the Semantic Modeling Language (SML) and its transformative potential for the data analytics industry. They dive into how SML establishes the first open standard for the semantic layer, enabling businesses to seamlessly define, share, and scale their data models across platforms like Power BI, Tableau, and Looker. John also explores the broader industry impact, highlighting how SML will foster cross-platform collaboration, enhance AI capabilities, and drive innovation by creating a unified data language that empowers businesses to unlock the full potential of their data.

For additional details on SML, check out this blog post.


Meet our Guest

John Langton

VP of Global Engineering at AtScale

John has an extensive background in ML/AI and enterprise software architecture. He was previously CTO at Linus Health, led AI at Wolters Kluwer, and founded VisiTrend, where he served as CEO until its acquisition by Bit9 + VMware Carbon Black. John holds a Ph.D. in computer science from Brandeis University and is a frequent speaker on topics related to advanced technology. John joined the AtScale executive team to lead engineering efforts for the company’s fast-growing AI business. He oversees the development of technical strategy and delivery of core analytics capabilities.

Meet our Host

Dave Mariani

Chief Technology Officer, Founder, AtScale

Dave is the founder of AtScale and is the Chief Technology Officer. Prior to AtScale, he ran engineering and data at Klout and Yahoo! where he built the world’s largest multi-dimensional cube.

"By open-sourcing the Semantic Modeling Language (SML), we're not just introducing a new tool—we’re creating a universal standard for how data teams can collaborate across platforms and scale analytics effortlessly.” — David Mariani, CTO & Co-Founder of AtScale

“SML is set to transform how organizations integrate semantic layers across their data ecosystem. It’s a game-changer for both data engineers and business analysts, enabling faster insights and true collaboration.” — John Langton, VP of Engineering of AtScale

Transcript

Dave Mariani: Hi everyone, and welcome to another episode of the Data-Driven Podcast. I’m Dave Mariani, the CTO and founder of AtScale, and today I have with me John Langton. John is the VP of Global Engineering at AtScale, and he’s the inspiration for what we’re about to announce, which is open sourcing our semantic modeling language, or SML for short. So John, welcome to the podcast.

John Langton: Thanks so much Dave and thanks for calling me out as the inspiration but I think this was something that you’ve been thinking about for quite some time and of course you’re really the tip of the spear here.

Dave Mariani: Yeah, well, it took a team, right? So, a little inside baseball here, but at AtScale, I run product and John runs engineering, but we both sort of do each other’s jobs all the time, right? Because when it comes to the total product, you need both. And so it’s a great partnership. And really, what I want to do right now is, first of all, just take you through what the announcement entails and why it should matter.

And then really get John’s perspective on what it means as engineers who are working in this new semantic layer space that’s become so popular and what the future can hold for us. So let me share my screen real quickly because what you’re gonna see if you go to this URL is we now have a public repository. So it’s in GitHub.

The organization is called Semantic Data Layer. So this is not about AtScale, right? This is about the community. We really want to drive a community where we can take semantic models and really provide a single standard so that businesses, enterprises, and developers can share and collaborate on building these models.

So what is SML? It’s an object-oriented, YAML-based language for describing your data in a business-friendly way. John, stop me if I miss anything here. But it’s meant to be very comprehensive, so it can handle any type of vertical or industry. It’s based on open standards, with integration with Git and using YAML as the basis of the language, so it’s familiar and it’s extensible. So we expect there to be lots of community participation in making this language even better.

So what we’re announcing is the open sourcing of the specification for SML, again, short for semantic modeling language. Also pre-built semantic models that have been written using SML. And then a bunch of tools and translators that will allow different platforms that produce semantics to share those proprietary semantic models or definitions with each other, using SML as sort of like the Rosetta Stone that allows you to bring them all together.

So, what you’ll see in the repository is a sample of a semantic model, and you’ll see here is our object hierarchy. So, this is all the different semantic objects, and then you’ll see documentation for what the definitions of those semantic objects are. And if I click on a dimension semantic object, for example, you see an example of what a dimension looks like, and then an entire entity diagram for how everything relates together, as well as descriptions and documentation for each of the properties of each element. And then finally, besides just the definition of the language itself, it’s already pre-seeded with a model library.

So we have some tutorials, including AdventureWorks, which is sort of like the standard that Microsoft invented for describing a fictitious business for which to build semantics on. And you can see that all the source code is all here, including calculations, connections, data sets, what have you. So lots of stuff to get busy with.

And I hope that we’re gonna really spur some collaboration across the different semantic layer vendors, as well as different enterprises that are looking to make analytics much easier for consumers to consume without having to be database experts. So John, now that everybody knows what this is, tell us a little bit about what you think it means for the industry.
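As a rough illustration of the "object-oriented" idea described above (datasets, dimensions, and metrics as separate objects that refer to each other by name), here is a minimal Python sketch. Every class and field name in it is an assumption made for illustration; none is taken from the actual SML specification, which is documented in the public repository.

```python
from dataclasses import dataclass, field

# NOTE: every class and field name here is a hypothetical illustration.
# None of them is taken from the actual SML specification.

@dataclass
class Dataset:
    name: str
    table: str
    columns: list[str]

@dataclass
class Dimension:
    name: str
    label: str
    dataset: str      # name of the Dataset this dimension reads from
    key_column: str

@dataclass
class Metric:
    name: str
    label: str
    dataset: str      # name of the Dataset this metric reads from
    column: str
    aggregation: str = "sum"

@dataclass
class Model:
    name: str
    datasets: list[Dataset] = field(default_factory=list)
    dimensions: list[Dimension] = field(default_factory=list)
    metrics: list[Metric] = field(default_factory=list)

    def broken_references(self) -> list[str]:
        """List every dimension or metric that points at a missing dataset or column."""
        columns_by_dataset = {d.name: set(d.columns) for d in self.datasets}
        problems = []
        for obj, column in [(d, d.key_column) for d in self.dimensions] + [
            (m, m.column) for m in self.metrics
        ]:
            if obj.dataset not in columns_by_dataset:
                problems.append(f"{obj.name}: unknown dataset {obj.dataset}")
            elif column not in columns_by_dataset[obj.dataset]:
                problems.append(f"{obj.name}: unknown column {column}")
        return problems

sales = Model(
    name="internet_sales",
    datasets=[Dataset("fact_sales", "sales", ["order_id", "product_key", "sales_amount"])],
    dimensions=[Dimension("product", "Product", "fact_sales", "product_key")],
    metrics=[Metric("total_sales", "Total Sales", "fact_sales", "sales_amount")],
)
print(sales.broken_references())  # an empty list means every reference resolves
```

Keeping each object small and linking by name is one plausible reason such a format fits well in Git: a change to a single metric shows up as a small, reviewable diff.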

John Langton: Absolutely. So of course, there’s a lot of open source projects out there. I think one thing to point out and emphasize is that this is a standard. It’s not specific to AtScale. You can use this to define semantics that could be used in Power BI, Tableau, Looker, any one of these different products, anything that uses a semantic layer. And a lot of the feedback and the challenges that we’ve heard folks having are that there are so many different semantic layers out there, and every vendor ends up creating a new one, right? I was gonna throw some examples out there, but I won’t. I know of multiple that are being created right now by different vendors, and none of them speak each other’s language.

And of course, where AtScale sits in the environment, we integrate with a lot of different things. And so we’ve always been about integrating with these different tools and wanting to speak the same language. But what we realized is that was valuable not only for us, but it was actually valuable for the community to have this open standard, even outside of AtScale entirely. If there’s someone who’s familiar with Power BI and they’re trying to do something for a semantic layer, but they have to do it in a different tool, or they want to migrate, or there’s some other use case that they’re tackling, they have one consistent open standard.

And just to be clear, this is an open standard. So anyone can contribute, anyone can provide feedback. This is meant to evolve and serve the community. This is not only for AtScale. I think that’s a huge part of this to emphasize. It’s not just open sourcing one set of code that does one thing, but it’s actually an open standard that could be implemented and interpreted in many different ways.

On top of that, as you pointed out, we also do have code that’s part of that open standard that can help you translate from one semantic layer to a different kind of semantic layer. So just to be clear on that, if you’re using Tableau and you want to go to Power BI, you can use this open source project to translate from one to the other. Or if you’re using Looker and you want to translate to Power BI, you can use this to do that. And I love your use of the term Rosetta Stone. I use it a lot myself. I don’t know how many people know what that is, but that’s exactly what it is. It’s basically that universal translator between all of these different semantic layers. So I think that’s huge, and folks can develop tools that do different things with SML.

But yeah, I think it’s a massive step forward, and the fact that anyone can contribute.
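The "Rosetta Stone" idea above can be sketched as a hub-and-spoke translator: each tool needs only one parser into a common form and one emitter out of it, rather than a dedicated converter for every pair of tools. The sketch below is a toy under stated assumptions. The names "tool_a" and "tool_b" and their dictionary layouts are hypothetical stand-ins, not the real formats of Tableau, Power BI, Looker, or SML, and this is not the translator code shipped in the open source project.

```python
import re

# NOTE: "tool_a" and "tool_b" and their dictionary layouts are hypothetical
# stand-ins, not the real file formats of any BI product or of SML itself.

def parse_tool_a(defn: dict) -> dict:
    """Convert a tool_a measure into the common form."""
    return {
        "name": defn["measureName"],
        "column": defn["sourceColumn"],
        "aggregation": defn["agg"].lower(),
    }

def emit_tool_a(common: dict) -> dict:
    """Convert the common form into a tool_a measure."""
    return {
        "measureName": common["name"],
        "sourceColumn": common["column"],
        "agg": common["aggregation"].upper(),
    }

def parse_tool_b(defn: dict) -> dict:
    """Convert a tool_b measure (an 'AGG(column)' expression) into the common form."""
    match = re.fullmatch(r"(\w+)\((\w+)\)", defn["sql"])
    if match is None:
        raise ValueError(f"unsupported expression: {defn['sql']}")
    return {
        "name": defn["measure"],
        "column": match.group(2),
        "aggregation": match.group(1).lower(),
    }

def emit_tool_b(common: dict) -> dict:
    """Convert the common form into a tool_b measure."""
    expression = f"{common['aggregation'].upper()}({common['column']})"
    return {"measure": common["name"], "sql": expression}

PARSERS = {"tool_a": parse_tool_a, "tool_b": parse_tool_b}
EMITTERS = {"tool_a": emit_tool_a, "tool_b": emit_tool_b}

def translate(defn: dict, source: str, target: str) -> dict:
    """Translate by going source -> common form -> target."""
    return EMITTERS[target](PARSERS[source](defn))

original = {"measureName": "Total Sales", "sourceColumn": "sales_amount", "agg": "SUM"}
converted = translate(original, "tool_a", "tool_b")
print(converted)
```

With N tools this design needs N parsers and N emitters, while direct pairwise conversion would need on the order of N times (N - 1) converters, which is the practical appeal of a shared intermediate standard.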

Dave Mariani: So, John, I used Rosetta Stone. You just said universal translator, which is obviously a Star Trek term. You just, like, threw that in there.

John Langton: Star Trek. I’m outing myself. Yeah.

Dave Mariani: Yeah, you know, one of the reasons why I think we chose at this time to open source SML is that we’ve been working at this for 10, 11 years now. We’ve worked on this semantic modeling language for so long that we feel like we have the superset of everything out there. Some of the Johnny-come-latelies, the newer semantic layers, haven’t implemented the full feature spec. And so I think one of the reasons why we can become that universal translator is that we do have the superset of functionality that can allow a Tableau to describe itself to a Power BI, or vice versa.

John Langton: Yeah, that’s a fantastic point. So we were looking at what is the intersection across all of these, or I think more appropriately, as you were describing, it’s really a superset, not just the intersection. It’s so fully featured. And so there are things that you can do in it that some tools might not support. But I think it’s hugely powerful to be able to translate from one to the other and really define your data. And I remember, much earlier in my career, when I was sort of like director of data science, I would have loved to have something like this, because there’s so much complexity after all of the ETL workloads when you’re trying to define what is a metric, what is a measure, what is a KPI, not only for measuring performance but to be able to influence it, change it, optimize it. Those definitions get really, really complicated.

Dave Mariani: Yeah, yeah, no doubt. You know, John, we’ve written about some of this ourselves, but a lot of the industry is recognizing that semantics and a semantic layer are really critical for making gen AI work. Can you talk a little bit about that?

John Langton: Absolutely. So I want to be careful about the folks I mention, because I don’t want to run afoul of anything. But we have active partnerships with some very large companies that you would absolutely recognize, where we’re working on LLM projects, and the semantic layer is absolutely critical. I think this is becoming common knowledge. There was a really popular publication that came out from data.world where folks talked about a semantic layer and they showed an evaluation. We actually reproduced that exercise, but we wanted to use real-world data. So thank you for your support on that project, Dave. Dave’s very familiar, of course, a veteran in the industry. And so we used a very common benchmark called TPC-DS to do that, and we again demonstrated that with a semantic layer the accuracy is massively improved compared to when you don’t have one.

And it makes obvious sense, because an LLM is a large language model. It’s all about the semantics of language, and that includes language of code, language of metrics, but also just text, business language. And so it’s all about mapping the semantics. If you want to ask a natural language question, what does it semantically mean? What are you asking about? And so having a semantic layer is very obviously something that’s critical in that technology.

And so in these projects that we’re working on with these larger companies, in fact, some of them are the very examples I was gonna mention before but shied away from, they’re inventing their own semantic layer, but it’s such a small subset of what we have in SML that some of the functionality is crippled. So we’re able to just be a massive accelerator to that technology. So that’s why we’re partnering with them and exploring those opportunities now. And that’s gonna be the case with any generative technology that this really unlocks. It takes it from being a nifty project that looks kind of neat if you touch it just right and do only the happy path into something that’s actually practically usable in a real business environment, a real business use case.

Dave Mariani: Yeah, because a lot of the tools out there that do NLQ are really demoware, aren’t they? I mean, once you start to ask them a question using company-specific terminology, like a customer of ours said, it’s like GD, which means gross demand. They write, they speak in GD. They don’t spell it out. There’s no way an LLM would ever understand that.

And then there’s the complexity of the actual query itself because schemas are really complex, right? You got hundreds or thousands of tables. And so how is an LLM going to figure out how to join all those tables together in a consistent deterministic fashion to give you the same and the right answer every single time? Very, very difficult without a semantic layer or a semantic layer engine, isn’t it?
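To make the two problems just described concrete (company-specific terms like "GD", and getting the same join every time), here is a minimal sketch of what a semantic layer contributes. It assumes a tiny hand-written glossary, metric catalog, and join map. All table, column, and metric names are hypothetical, and this is not how AtScale or any particular engine generates SQL.

```python
# NOTE: all table, column, and metric names below are hypothetical.

# Business vocabulary mapped to governed metric names.
SYNONYMS = {"gd": "gross_demand", "gross demand": "gross_demand"}

# One agreed-upon definition per metric and per dimension.
METRICS = {
    "gross_demand": {"table": "orders", "expression": "SUM(orders.demand_amount)"},
}
DIMENSIONS = {
    "region": {"table": "customers", "column": "customers.region"},
}

# Join conditions are declared once instead of being guessed per question.
JOINS = {
    ("orders", "customers"): "orders.customer_id = customers.customer_id",
}

def build_query(term: str, dimension: str) -> str:
    """Resolve a business term and a dimension into one deterministic SQL string."""
    key = term.strip().lower()
    metric_name = SYNONYMS.get(key, key)
    metric = METRICS[metric_name]
    dim = DIMENSIONS[dimension]
    sql = (
        f"SELECT {dim['column']}, {metric['expression']} AS {metric_name} "
        f"FROM {metric['table']}"
    )
    if dim["table"] != metric["table"]:
        condition = JOINS[(metric["table"], dim["table"])]
        sql += f" JOIN {dim['table']} ON {condition}"
    sql += f" GROUP BY {dim['column']}"
    return sql

print(build_query("GD", "region"))
```

In a setup like this, a language model only has to map the question onto known metric and dimension names. The abbreviation lookup and the join path come from the model definitions, so the same question yields the same query each time.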

John Langton: That’s right. And that’s even if your database schema makes sense and is just complicated. Anyone who’s worked for long enough knows that the schemas usually don’t make sense, right? A lot of the time, I’ve seen crazy stuff where people will add one more column and just do this special join that exports the value into this new column because they can’t change the schema, right? Because there are too many applications that depend on it. And so the way it evolves over time is hack after hack after hack. It’s just the nature of the beast. So actually making sense across those schemas is exceedingly challenging. So having this unified standard, it’s not under the hood of AtScale. Everyone can see it now. It’s completely out in the open. Not only that, you can contribute to it.

Dave Mariani: Mm-hmm.

John Langton: And so that supports all kinds of business applications. Like every dashboard you’ve ever seen can now be supported by this. You can define these metrics. So you can take this spec and create your own code, your own tool that actually understands the spec, utilizes the spec. It’s meant to be a really comprehensive way to describe data, right? It’s that metadata. So any of these tools that you’re used to using, I was going to start throwing out examples, but it’ll show the last time I coded, I was going to say Glue and EMR and Amazon, basically that penultimate or ultimate layer where you’re trying to show that business translation or the KPI or the metric or the feature if you’re doing AI, all of that now can be specified within this kind of YAML that’s completely open. Anyone can contribute to it.

If there’s something it’s not addressing, then let us know. And you can even contribute directly yourself, make the suggestions. So yeah, I think this is going to be hugely powerful. It’s going to grow much larger than AtScale. And I’m really excited to see the directions it takes.

Dave Mariani: And you know, John, it even changes our own product development process too, right? Because now that SML is open source, any change we want to make, even to our own AtScale semantic layer platform, we’ll have to contribute or suggest as changes in SML and have those be made in the open source project, right? If we want to continue to support and use SML in our own product.

John Langton: That’s absolutely right. Yeah, I think a really exciting parallel is the data format, IceCube, or Iceberg, I’m messing it up. Yeah, the one that Snowflake and Databricks are using. So now that’s a completely open standard. Anyone can create a query engine that runs on top of that. So this is a very similar analog, but for semantics and analytics.

Dave Mariani: Mm-hmm. Iceberg. Iceberg.

Dave Mariani: Yeah, well, there you have it. John, you did a great job explaining it. And we’re really looking forward to getting you all involved. And we’ll be reaching out to other vendors and bringing them into the fold. And hopefully, we’ll develop a real healthy ecosystem around business semantics and semantic layers. So John, thanks a lot for joining me today. And to all of you out there, thank you for listening to our Data-Driven Podcast. And like I said, stay data-driven. Thanks, John.

John Langton: Thanks, Dave.
