Revolutionizing Data: New Research on Natural Language Prompting with AtScale’s Semantic Layer and Generative AI

Data-Driven Podcast

Welcome to the latest episode of the Data-Driven Podcast! Host Dave Mariani is joined by Jeff Curran, Lead Data Scientist at AtScale, for a deep dive into the transformative power of Generative AI in business intelligence. Jeff shares his pioneering research, demonstrating how AtScale’s semantic layer bridges the gap between natural language queries and accurate business insights.

Explore the challenges and solutions in integrating AI with BI tools and learn how this innovation can make data-driven decision-making more efficient and precise. Don’t miss this opportunity to stay ahead in the evolving landscape of AI and business intelligence.

For more information, download the white paper now!


“There's been a lot of good news about the power of Gen AI and LLMs. ChatGPT has taken over the world and is being applied to every possible use case. You see these kinds of cycles happen with AI as a field in general, where some huge model or breakthrough comes along and people think it can be applied to anything in the world. And then what happens is you end up finding, okay, maybe it's not usable for everything, and maybe it's not going to solve world peace or anything like that, but it has some really great applications to specific domains.”

Transcript

Dave Mariani: Okay, so welcome, everybody, to another Data-Driven Podcast. Today I have a special guest: Jeff Curran, who's Lead Data Scientist at AtScale. I invited Jeff on the podcast because he did some really important research. Everybody talks about Gen AI and how important it is to our everyday lives. Well, it's also really important to businesses, so we're going to be diving into how we can apply Gen AI to actually help answer business questions. So Jeff, first of all, welcome to the podcast.

Jeff Curran: Yeah, thanks for having me.

Dave Mariani: Jeff, I always like to start these things out by letting the listeners know a little bit about you and how you ended up being Lead Data Scientist at AtScale. So can you tell us a little bit about your path?

Jeff Curran: Yeah, of course. I've been with AtScale about two years now. Before that I was a data scientist at a large financial firm, helping stand up their data science teams from scratch. What I saw was that I spent about half my time doing the actual data science work, but the other half was spent in meetings with folks, trying to redefine business definitions and make sure all of the features I was including in my models actually matched up with what they were using in their day-to-day work. When I was thinking about moving back to Boston, I stumbled upon AtScale and saw that the semantic layer as a whole actually solved a lot of the issues I was seeing in my previous role. And I thought it had a great application not only for BI tools, but also for these data science use cases. So I joined the team, and it's been a great time ever since.

Dave Mariani: Yeah. For listeners out there, at AtScale we provide a universal semantic layer for businesses, so we're really focused on BI. But as of late, the worlds of business intelligence and artificial intelligence have crossed paths. Traditional BI, which is hunting for the needle in the haystack, is an outdated way of doing things. I think the future is clearly combining artificial intelligence with business intelligence to make users much smarter and allow them to do their jobs faster. So Jeff, you just made some news here, because you just released an important white paper called "Enabling Natural Language Prompting with AtScale's Semantic Layer and Generative AI." You led the research on this, and it's really focused on how you can take the LLMs out there and give them business context so they can do a better job of answering business questions. So can you talk a little bit about what made you want to go explore this and run the experiments behind the white paper?

Jeff Curran: Yeah, definitely. There's been a lot of good news about the power of Gen AI and LLMs. ChatGPT has taken over the world and is being applied to every possible use case. You see these kinds of cycles happen with AI as a field in general, where some huge model or breakthrough comes along and people think it can be applied to anything in the world. And then what happens is you end up finding, okay, maybe it's not usable for everything, and it's not going to solve world peace or anything like that, but it has some really great applications to specific domains. And what the current crop of generative AI models has turned out to be good at is generating novel text from inbound user prompts.

Jeff Curran: So think about this as the text autocomplete on your phone dialed up to 11. Instead of suggesting one or two words, it might be able to write an entire paper based off of a user's inbound question or prompt. In our case, it solves a really interesting pain point that we've seen in businesses as a whole, where someone wants to gain access to their business's underlying core data. Traditionally, for anyone who isn't super tech-savvy, that requires them to go through, say, a business analyst or a BI tool to act as an interface between the question and the underlying data. And as I was just saying, LLMs are really good at translating from one body of text to another.

Jeff Curran: So the idea was: can we get an LLM to translate a natural language question into a SQL query so that it could actually access these underlying tables directly? In other words, let people interface with their underlying data directly, without another human element in between. This is something that has been explored in a couple of other white papers, but there have been some significant drawbacks hit in the past, specifically around understanding business KPIs, or key performance indicators. These things are hyper-specific to individual businesses, maybe even to individual teams. If you ask an LLM, "what were my quarterly sales by team," it's going to generate something that it thinks satisfies those conditions based on its training set; it's not necessarily going to use your team's definition.
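To make that idea concrete, here is a minimal sketch of the "naive" text-to-SQL pattern Jeff describes: hand the raw warehouse schema and the user's question straight to an LLM and ask for SQL. The model name, prompt wording, and schema snippet are illustrative assumptions, not necessarily the exact setup used in the white paper's control system.

```python
# Minimal sketch of naive text-to-SQL: raw schema + question -> SQL from an LLM.
# Model name, prompt, and schema hint are illustrative, not AtScale's implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SCHEMA_HINT = """
Tables:
  store_sales(ss_sold_date_sk, ss_item_sk, ss_quantity, ss_net_paid, ...)
  date_dim(d_date_sk, d_year, d_qoy, ...)
"""

def question_to_sql(question: str) -> str:
    """Ask the LLM to translate a natural-language question into SQL."""
    prompt = (
        "You are a SQL generator. Using only the tables below, write one SQL "
        f"query that answers the user's question.\n{SCHEMA_HINT}\n"
        f"Question: {question}\nSQL:"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice; the paper's model may differ
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(question_to_sql("What were quarterly sales by team last year?"))
```

The failure mode Jeff describes shows up exactly here: nothing in this prompt tells the model how this particular business defines "sales by team," so the generated SQL reflects the model's generic training data rather than the organization's definitions.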

Dave Mariani: Right, because, Jeff, these LLMs are trained on the internet; they're not trained on a company's own data, typically. So how would it know what a term even meant to Procter & Gamble, for example? What is gross margin? How do you calculate that for business A versus business B? It's going to be completely different unless the model is given the context of the business. So taking a general approach, a ChatGPT approach that's been trained on really generic data out in the ether, can't really work for a particular business. Is that what you're getting at?

Jeff Curran: Exactly, yes. These things are generative and generalized. Out of the box, they are really good at dealing with things based on this massive training set covering the entire internet. But when you want something hyper-specific to one use case, you need to give it all the context relevant to solving that use case. And that's where something like a semantic layer, which provides all that context in a way that's consistent with the business, can really provide value to the LLM.

Dave Mariani: So it’s even kind of harder when you get to clearing databases, right Like your Snowflake or your Databricks or your BigQuery, because not only do you have to figure out the business, the business context, those business definitions, those KPIs, but you also have to construct a query against random tables that again, every business is gonna store their data in a different way, in a different set of tables with a different set of relationships. So you kind of even really amplify that problem of not just sort of terminology, but also if you gotta generate a query, how do you make sure you’re querying the right tables to come up with the right, the right KPI

Jeff Curran: Exactly. Yeah. So you’re not only moving from a business term to say a warehouse term, which may be, you know, how you and I talk about, say our business teams or something like that might have a completely different column name that it maps to on the under on the underlying warehouse. And without context, you know, the element’s never gonna be able to make that jump or not be able to make that jump consistently. And then it gets even more complicated when you start dealing with multi table operations where joins are required or complex aggregations, these things that require very specific join methodologies and join paths to create the correct answer. So again, this is something where LLM struggle

Dave Mariani: And you used the word "correct." If I go on ChatGPT and say, give me an agenda for my outing in Rome, Italy, and it's wrong, it's not the end of the world, right? But if I'm a CEO asking what my gross margin was this month versus last month, and I go into a board meeting and it's wrong, the consequences are much more acute. So really it sounds like we have the problem of how you get a model that hasn't been trained on your data to define a KPI. We have the additional problem of how you translate that into a database query, where each database and each company's schema may look completely different. And then we have the problem of accuracy, because if you're making financial decisions based on business questions, they had better be right. The first time they're wrong, you're likely to be looking for another job.

Jeff Curran: Exactly. Yeah, exactly.

Dave Mariani: Okay. So talk to me a little bit about what you did in this white paper. We've long had the feeling that business context needs to be combined with an LLM to really make natural language query work. It's always been the true north: allowing users to just use English to ask a question of their data, rather than having to point and click and do all these other things. It's so much more natural, and it opens up data to so many more people. But the fact is that the three problems we mentioned have been a big barrier. And I think much credit goes to Juan Sequeda at data.world, who did the first paper here: a test that compared an LLM doing text-to-SQL on its own versus an LLM with a semantic layer doing text-to-SQL. He found a big benefit, a big lift from combining semantics with the LLM. So how do your methodology and the white paper you authored differ from that previous analysis?

Jeff Curran: Yeah, so it differs in a few key areas. The first one is the dataset that we used. The dataset for our white paper was a little bit more fleshed out than what was used in the data.world paper. We used the TPC-DS set of tables, which is about two dozen tables and a few hundred million rows. The idea was that if we were going to do a stress test, we really wanted to use data that's been vetted by the industry and used for a number of other benchmarks in the past. Not only that, but this data comes preloaded on, I think, most Postgres instances and a ton of data warehouses across the industry. So if we wanted people to be able to recreate this and test it out on their own, we wanted the data set to be easily available.

Jeff Curran: So that was one of the differences. The second big one is that instead of using the knowledge graphs that data.world used for their semantic layer, we used the AtScale semantic layer, which provides a few key benefits: one, its own semantic layer definition, and two, the ability to interface directly with the AtScale query engine. Where the data.world paper had to deal with complex join operations and things like that, we were able to leverage the AtScale query engine and basically remove the need for the LLM to ever define joins or KPIs, meaning measures, aggregations, things like that. We drastically reduced the scope of the questions we're asking the LLM by leveraging AtScale's existing technology, and we were also able to provide it the rich metadata that comes from the semantic layer, enabling the LLM to map between human-readable terms and how those objects are defined within the semantic layer.

Dave Mariani: Okay, so let's recap here. It sounds like you solved two of the three problems that I mentioned. You solved the problem of how to define a KPI, because that KPI is defined in the semantic layer. And then, how do you turn that user question into a query against a complex schema? Well, it turns out the semantic layer does that mapping, doesn't it? If you use the query engine of the semantic layer, all those relationships are already defined in the semantic layer. So what you're telling me is that the LLM doesn't have to do as much work: it just needs to map the end user's prompt into a semantic query, using the terminology that's been defined and exposed in the semantic layer. Is that right?

Jeff Curran: Exactly, yeah. We're able to ask the LLM to write queries against AtScale's single logical table that represents the semantic layer. So instead of having to pass a ton of metadata about all these individual tables that exist in the warehouse, we're able to just say, hey, we have one massive table; write a select statement against this table and include the filters that are relevant to the user's question. That drastically reduces the scope of the question for the LLM into a realm where it's known to be really, really good.
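As a rough illustration of how the scope shrinks with a semantic layer in the middle, here is a sketch of a prompt constrained to a single logical table. The table name, dimension and measure names, and dialect details are hypothetical stand-ins, not AtScale's actual interface, which is documented in the white paper.

```python
# Sketch of the simplified prompt once a semantic layer sits between the LLM
# and the warehouse. Metadata values below are illustrative assumptions.
SEMANTIC_MODEL = {
    "logical_table": "tpcds_retail_model",
    "dimensions": ["Date.Year", "Date.Quarter", "Store.Name", "Item.Category"],
    "measures": ["Total Net Sales", "Units Sold", "Average Discount"],
}

def semantic_prompt(question: str) -> str:
    """Build a prompt that constrains the LLM to one logical table."""
    return (
        "Write a single SELECT against the one logical table "
        f"`{SEMANTIC_MODEL['logical_table']}`.\n"
        f"Available dimensions: {', '.join(SEMANTIC_MODEL['dimensions'])}\n"
        f"Available measures: {', '.join(SEMANTIC_MODEL['measures'])}\n"
        "Do not write joins or aggregation formulas; the query engine "
        "resolves those from the semantic model.\n"
        f"Question: {question}\nSQL:"
    )

print(semantic_prompt("What were total net sales by item category in 2002?"))
```

The point of the sketch is the shrunken search space: the model picks names from a short, governed list instead of inventing join paths and KPI formulas.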

Dave Mariani: Okay, so the third problem was accuracy: can I trust these numbers? I'm looking through the results of your paper, which has a very nice table comparing results with and without a semantic layer for the same queries against the TPC-DS schema. The TPC-DS schema is very close to my heart, because I use it for our performance benchmarks. It's a great retail-oriented data set with a complex schema, so it's complex enough to be hard and realistic enough to be trustworthy. Looking at these numbers, Jeff, I see that the accuracy without the AtScale semantic layer is only 20%, but with the AtScale semantic layer you showed 92.5% accuracy overall, with really only one category where you had less than a hundred percent accuracy. So can you talk a little bit about how you designed the test and how you came up with these accuracy percentages?

Jeff Curran: Yeah. So, we fed the same question set to both our control system and our test system. Just to note, the control system used the same LLM as our test system, but instead of giving it AtScale semantic metadata, we gave it information about the underlying warehouse tables and all the joins associated with those tables. So it knew what existed and how to work with those tables in general. To evaluate the two systems, we had a pipeline where a user question would come in and we'd ask the LLM to translate it: in the control system's case, directly into a warehouse query; in the test system's case, into an AtScale semantic query, an AtScale inbound query. Those queries were then processed either by the warehouse or by AtScale, respectively.

Jeff Curran: In the case of a syntax error, we would ask the LLM to generate something else. For example, if the LLM had a minor hallucination in the name of a column, something we could catch and know was a hallucination based on the existing metadata, we'd ask it to try again and give us a different response. So we gave it a little bit of extra wiggle room there. Once either the control system or the test system gave us a table as a result, what we'd do is look at the number of categorical columns and numeric columns in the response, get the shape of the response, take averages over all the numeric columns, and compare that shape and those averages against a known expected value.
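Here is a minimal sketch of that feedback loop, where a failed query is retried with the error fed back to the LLM. The function and error names are hypothetical stand-ins, not the white paper's actual harness.

```python
# Sketch of retry-on-error: regenerate the query when it fails to parse or
# references a hallucinated column. Names below are hypothetical.
from typing import Callable, Optional
import pandas as pd

class QueryGenerationError(Exception):
    """Hypothetical error raised when a generated query fails to parse or bind."""

def answer_with_retries(
    question: str,
    generate_query: Callable[[str, Optional[str]], str],   # LLM call
    run_query: Callable[[str], pd.DataFrame],               # warehouse or AtScale engine
    max_attempts: int = 3,
) -> Optional[pd.DataFrame]:
    last_error = None
    for _ in range(max_attempts):
        sql = generate_query(question, last_error)
        try:
            return run_query(sql)
        except QueryGenerationError as err:
            # e.g. a hallucinated column name caught against known metadata;
            # feed the error back and ask for a fresh attempt
            last_error = str(err)
    return None  # give up after max_attempts
```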

Jeff Curran: What we would see consistently in the control was that it was pulling in columns that weren't relevant to the user's question. For example, this data set has a number of numeric columns around sales figures and numbers of products sold, as well as prices, discounts, and so on. So we would frequently see the control LLM grab the incorrect column, or grab the correct column and aggregate it incorrectly. It would give you a result for your question, but it wouldn't be an accurate result, or wouldn't be the result the user needed to deem it correct. On the AtScale side, things were a lot simpler, obviously, because the LLM just had to say "give me KPI X," and then AtScale would translate that KPI into the underlying SQL. So it ended up being that the control system struggled to define these KPIs pretty frequently. If you look in the paper, you can see that for our lowest-complexity questions, the cases where it didn't have to define KPIs, the control did okay; it got about 60% accuracy. But the second we started asking it to define KPIs or deal with joins, that number drops to at best 10%.
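A rough sketch of the scoring idea described above: compare a returned table's shape and numeric averages against a hand-verified expected answer. The tolerance value and helper name are illustrative assumptions, not the exact criteria from the paper.

```python
# Sketch of shape-and-averages scoring for a generated query result.
import pandas as pd

def matches_expected(result: pd.DataFrame, expected: pd.DataFrame,
                     rel_tol: float = 0.01) -> bool:
    """Treat a response as correct if its shape and per-column numeric
    averages line up with the expected answer."""
    if result.shape != expected.shape:
        return False
    res_num = result.select_dtypes("number")
    exp_num = expected.select_dtypes("number")
    if res_num.shape[1] != exp_num.shape[1]:
        return False
    # Compare column means order-insensitively, since a generated query may
    # alias or order columns differently from the reference answer.
    res_means = sorted(res_num.mean().tolist())
    exp_means = sorted(exp_num.mean().tolist())
    return all(
        abs(r - e) <= rel_tol * max(abs(e), 1e-9)
        for r, e in zip(res_means, exp_means)
    )
```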

Dave Mariani: Yeah. And listeners, I'm displaying that key chart; it's in the white paper, so go to atscale.com, go to Resources, and you can download it for free and get all the methodology as well as these results. What Jeff is saying is that even in the case of low question complexity, a simple question, and low schema complexity, which is basically the number of tables required to satisfy the query, 60% is not good, Jeff. That's only a little better than 50/50, and I wouldn't feel comfortable relying on a system that gives me a little better than a 50/50 chance of the right answer. And you can see that as you get into the higher complexities: high/high is 0%. That's crazy. So what does high/high mean in terms of question complexity versus schema complexity? Can you explain that a little bit?

Jeff Curran: Yes, definitely. High/high basically means the LLM is being asked to define KPIs and also perform an operation that requires joins across multiple tables. You can think of the varying types of complexity as: is it just pulling a raw column from the underlying warehouse table? If so, that's low complexity. Is it being asked to operate off a single table, or does it have to deal with joins? That also takes it from low to high complexity, respectively. So high/high means we're asking the LLM as complex a question as we can. It might require joins across three to four tables, which I think is the threshold we set for high complexity. So multiple joins across multiple tables, plus defining these complex KPIs.

Dave Mariani: Wow. So it only takes three or four tables, Jeff, to send it to zero without a semantic layer?

Jeff Curran: Pretty much, yeah.

Dave Mariani: That's pretty crazy. If anybody has written SQL queries before, like I have, having three or four tables send it to zero is pretty scary. Okay, so talk to me a little bit about how it gets better. I love the hundred percent; I don't love the 70%. It's definitely better than anything else we can get, but what are some of your thoughts, Jeff, on where we go from here? How do we get to a hundred percent across the board, do you think?

Jeff Curran: Yeah, so the surest way of improving an LLM in general is to provide a training set. As we were talking about earlier, these LLMs are built to work off of this massive training set across the whole internet that tries to prepare them for a little bit of everything. So when we're working with a generalized LLM like we are now, and we're asking it to write SQL that's only relevant to one table no matter what, we're telling it to go against all that training a little bit. You can get away with that through prompt engineering and providing examples within the prompt, which we do, but the best way is to give it an additional training set, which the LLM will weigh more heavily.

Jeff Curran: So if we really wanted this to be rock solid and bump that 70% number up, we could provide a training set that says, hey, this is how you interact with the AtScale semantic layer. Instead of giving it three or four examples in the prompt, we give it a few thousand, and we do that as a one-off. Honestly, it ends up being better for a number of reasons: one, performance, and two, cost, because you can provide the LLM less information per prompt and still get that great performance. So that would be one way to produce some improvements. The other way is to have better interactions between the semantic layer and the LLM. I mentioned that feedback loop where the LLM gets a little bit of information saying, hey, we need you to regenerate this query, we saw an error. If we can make those errors more informative and tell the LLM how to process them a little more robustly, we might be able to get that number up a bit higher as well. So those are the two main areas.
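As a sketch of what such a training set might look like, here is one way to assemble question-to-semantic-query pairs into a fine-tuning file. The JSONL "messages" layout follows OpenAI-style chat fine-tuning; other providers use different formats, and the example pairs and logical table name are made up for illustration.

```python
# Sketch of building a fine-tuning file from question -> semantic-query pairs.
# Pairs, table name, and field names are illustrative assumptions.
import json

examples = [
    ("What were total net sales by year?",
     "SELECT `Date.Year`, `Total Net Sales` FROM tpcds_retail_model"),
    ("How many units did we sell per item category in 2002?",
     "SELECT `Item.Category`, `Units Sold` FROM tpcds_retail_model "
     "WHERE `Date.Year` = 2002"),
]

with open("semantic_query_train.jsonl", "w") as f:
    for question, semantic_sql in examples:
        record = {"messages": [
            {"role": "system",
             "content": "Translate questions into queries against the "
                        "tpcds_retail_model logical table."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": semantic_sql},
        ]}
        f.write(json.dumps(record) + "\n")
```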

Dave Mariani: And given the fact that the semantic layer is operationalizing queries, in other words, we're seeing end users submit queries through a variety of means, through their BI tools, through their Jupyter notebooks, through applications, and now through a chatbot, whether that's Teams or Slack or a prompt in an application, we can probably also use that query history data to improve accuracy, right? Because we have all the explicit questions people are asking and the results we're returning. That's pretty valuable data in itself, and it's all specific to an individual organization.

Jeff Curran: Exactly. Yeah. That, that’s the glory of these things is as you get more information, you know, they’re meant to handle these tremendous, tremendous amounts of input data. So if you just get users working with it and save all the information, it’s just gonna make your, your end product more and more powerful.

Dave Mariani: So Jeff, this was a lot of work, and I think the white paper goes a long way toward providing some valuable data. The methodology is all spelled out in the white paper, so I love that you took a very scientific approach here and didn't hide anything. People can reproduce these results, especially since you can go to atscale.com and download our free developer community edition. We have the TPC-DS model right there as one of our sample models, and in Snowflake and other data platforms like Databricks, you also have those TPC data sets already available. So it's something you can reproduce, and I encourage the industry to go out and reproduce it. Given all this research in the white paper, I'm guessing there's probably a product feature this relates to. Can you tell us a little bit more about that?

Jeff Curran: Yeah, definitely. The end goal of any research paper is hopefully to get a product or a feature out of it. The current version of this LLM integration is going to be included in private preview in AtScale's August container release. It'll work with a select couple of LLM providers to start, in a private preview capacity, but we're going to keep iterating on this and hopefully get it out to public preview and GA sometime this fall. So hopefully we'll be able to get folks using this more and more.

Dave Mariani: Well, I love it. My whole point in starting the company was to provide data to anyone and everyone, wherever they live, and that means providing analytics and database access to people who aren't database experts. To date, our integration with Excel has probably been our most democratized interface, because anybody with Excel can query data in Databricks and Snowflake and BigQuery, all using that business-friendly semantic layer. So we really opened up analytics to a lot more people with those direct integrations. But what we're talking about now, Jeff, is putting that into Slack or Teams so people can literally write English questions and get their data. I'm really excited about this, because it means we're now opening up analytics not just to the Excel jockeys, of which there are literally a billion in the world, but to anybody who can construct a sentence. So, super important research; thank you for doing it. And I want to thank everybody for joining us today on this podcast. Like I said, go to atscale.com and you can download the white paper under Resources. It's a really good read, and a short one. So go do that, and also sign up for our private preview so you can give the product a spin yourself and provide us feedback. So Jeff, thanks for all your work.

Jeff Curran: Yeah, thank you for having me.

Dave Mariani: All right, thanks, everybody. And stay data-driven, all.
