The Semantic Layer, Data Analytics and Artificial Intelligence with David Jayatillake, Co-Founder & CEO at Delphi Labs

Data-Driven Podcast

In this interview, Dave and David J. discuss the value of the semantic layer in data management. They explore two different perspectives on delivering this value and share the paths that led each of them to co-found a company. The conversation covers data analytics, the role of the semantic layer, and their experiences in the field of data management.


I absolutely love the concept of the data mesh, or hub and spoke, or however you want to call it. For people who aren’t familiar with what a data mesh or hub-and-spoke model is, it is the organizations within your business who own the business aspects, be it finance, operations, HR, sales, whichever group it is, that own and manage the data within their organization.

I’d have a centralized team of people who are highly skilled and focused in that area to help design and build out those patterns. So that they could enable the other people across the business to follow them. So that the people who are in finance could just leverage that knowledge, or vice versa.

Transcript

Dave: Hi everyone, and welcome to another edition of the Data-Driven podcast. I’m Dave Mariani, I’m the CTO and Co-founder of AtScale. And today’s guest is David Jayatillake. And David is the co-founder and CEO of Delphi Labs. So, David, welcome to the podcast. 

David J.: Thanks for having me. It’s really great to be here. 

Dave: Great. It’s great to have you. As we mentioned, we got to know each other on LinkedIn because we’re both fans of the semantic layer, so we’re going to be talking a lot about the semantic layer and the different angles on delivering its value. We’re taking two different paths that are pretty interesting. So, David, why don’t you tell the listeners a little bit about yourself and your path into co-founding and starting Delphi Labs?

David J.: Yeah, sure. So I’ve been in data for about 12 or 13 years now. I started out as an analyst, but in that time I did many different roles and activities, which we’d now call analytics engineering, data engineering, and a little bit of data science. In the back end of that time I led data teams at different organizations in different industries: e-commerce, grocery, FinTech. Teams ranged from two people, where I was quite a heavy IC, up to a 25-person data org where I was quite hands-off and managing teams of teams. So that’s been my journey in data. After that, I entered startup land and worked at a couple of startups, like Metaplane and Avora, before moving on to found Delphi, which is what I’m doing today.

Dave: So how did you actually get started in data? Can you tell me a little bit about your background in terms of education and the like? Was that something you always wanted to do, or is it something you ended up falling into from another route?

David J.: It’s very hard to be at school thinking, I’m going to be a data person. I don’t think school children think that way even today.

Dave: Not at all, not at all.

David J.: But I was always that child. If you ask my mom, she’ll say I was always asking why or how much, and I was always quantitatively minded. So when I was at school, I studied science, maths and computer science predominantly. Then when I went to university, I studied maths with a mixture of economics, finance and management. So I think what I studied is actually quite well geared towards working in data. When I left university, I ended up in Big Four accounting at EY, and I realized there that I really enjoyed the analytical side of the work. It was the advisory side rather than the audit side of the business. I did some modeling in that time, I really enjoyed that, and I realized I wanted to do that full time. So I looked for analyst jobs. Little did I know that this involved me learning databases and SQL. I had no idea. I just knew I liked analysis. So that’s how I ended up looking for my first role as an analyst.

Dave: Yeah, I really liked that. I always ask people how they got into it, and there’s never a direct path into analytics. It’s always from another angle. But I like what you said about how, as a kid, you asked a lot of questions about why and how. I feel the same way. For my path, it was economics, so nothing really to do with data or databases at all, or even engineering for that matter. But here we are, co-founders of our respective companies centered on databases and analytics. So with that, tell me a little bit about Delphi Labs and the kinds of problems that you guys are attacking for customers.

David J.: Sure. So at Delphi, what we’re trying to solve is the overall workflow of how someone who is probably a non-technical user at an organization asks a question that should be answered with data, and then how they get to that answer. The whole workflow is really what we want to solve. So think about how they might ask a question. They might ask something like, “Can you tell me what our revenue was by marketing channel last week?” What we do is use the large language models, which are taking over the world today, to interpret that question into a semantic layer API request. When we do that, we can find out what objects in the semantic layer are related to that question, and therefore whether we have existing work, existing questions similar to this one, that could be offered as a good solution. And finally, if that’s not possible, we will generate a new semantic layer request to pull data to answer the question.

Dave: Yeah. So there are BI platforms out there that use natural language query, and I always looked at those and thought, how do they do that without a semantic layer behind it? Because how can you actually map the language to actual entities and objects to be able to ask the right question? So talk to me a little bit about that: the approach of natural language query tools versus your approach, which really leverages the semantic layer. What’s the difference? Is it about fidelity, or is there more than that?

David J.: So if you think about the tools that existed before large language models became front and center, and I would say things like Metabase and ThoughtSpot are the ones that most people think of, they require users to know how to speak in the right syntax and have the right sentence structure, so they’re expecting a certain sentence with certain verbs. They then require you to input exact metrics and dimensions that exist in the semantic layer in order for it to work; otherwise it won’t even allow you to put in a noun which isn’t a metric or dimension in that semantic layer. So in some ways, they’re not as intuitive as you might hope. But with Delphi, someone can just ask a question and there’s no restriction on how they can write their question.

David J.: It’s just like talking to a person. Then, using the large language models to do things like vector similarity, and generating a prompt using them as well, we will find the appropriate metrics and dimensions of the semantic layer as much as we can. Obviously there’s a probabilistic element to it as well, but that’s how we work. We don’t ask people to write in a rigid way or necessarily even know the names of the metrics exactly. So for example, things like gross merchandise value and revenue are kind of synonymous at times. Delphi can handle those kinds of similar names inside an organization for the same thing.
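The vector-similarity matching David describes can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the three-dimensional vectors are hand-written stand-ins for embeddings that a real system would get from a model, and the names `METRIC_VECTORS`, `PHRASE_VECTORS`, and `closest_metric` are hypothetical, not Delphi’s implementation.

```python
import math

# Hypothetical, hand-written "embeddings" for metrics in a semantic layer.
# A real system would obtain these from an embedding model.
METRIC_VECTORS = {
    "revenue": [0.9, 0.1, 0.0],
    "order_count": [0.1, 0.9, 0.1],
    "active_users": [0.0, 0.2, 0.9],
}

# Hypothetical embeddings for phrases a user might type.
PHRASE_VECTORS = {
    "gross merchandise value": [0.85, 0.2, 0.05],
    "number of orders": [0.15, 0.95, 0.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_metric(phrase_vector, threshold=0.8):
    """Return the best-matching metric name, or None if nothing is close enough."""
    best_name, best_score = max(
        ((name, cosine(phrase_vector, vec)) for name, vec in METRIC_VECTORS.items()),
        key=lambda pair: pair[1],
    )
    return best_name if best_score >= threshold else None


print(closest_metric(PHRASE_VECTORS["gross merchandise value"]))
```

The threshold reflects the probabilistic element David mentions: a phrase that is not close to any known metric returns no match rather than a forced guess.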

Dave: Yeah, that’s what I suspected, I guess I should say. It sounds like you really tackled the problem by leveraging that semantic layer and the large language model, putting those together so that people can ask questions naturally versus having to actually understand what those entities are and getting it exactly right. So that’s really fascinating. So, David, you’re a fellow entrepreneur and a co-founder. What drove you to take that leap? Have you been a co-founder before? What was your path to becoming a co-founder and starting a company to solve this problem?

David J.: Yes, I have been a co-founder before. My first role outside of being a data team lead was as Chief Product and Strategy Officer at a company called Avora. It was actually a company that was being run by someone I’d known for a long time. He was looking after a couple of businesses at the same time and was struggling a bit. And I thought, not only can I help you by trying to spin this new business out of the existing one, but I also have ideas of what I’d want to do with that business. Incidentally, it’s in a not too dissimilar space, which is metrics observability on top of a semantic layer. So that’s my first experience of co-founding. And the great thing at the time was that I had someone who was an experienced co-founder next to me, showing me the ropes. Without that, I think I would probably not have had the confidence or the understanding to become one. Whereas this time around, I feel like I’ve gained a lot and I have a good idea of how it works.

Dave: Well, it definitely takes a lot of courage to take the leap, so I’m glad you have. So let’s turn our attention to semantic layers. In your mind, what’s a semantic layer, and why is a semantic layer important in the data and analytics ecosystem?

David J.: So for me, a semantic layer is a map between objects in the real world, which could be customers, users, revenue, orders, and data structures. If you think about it, a table with a column is a data structure, and the column could be, you know, sales amount, but summing that column from that data structure gives you revenue, which is a known entity, right? That kind of mapping is what I think of as what a semantic layer holds. Except a semantic layer, in addition to having things like metric definitions, which a metrics layer would have, also has the mapping of what an entity is to a data structure as well. Like, this ID is a customer ID, and that’s related to a customer, and that’s how you uniquely measure customers.
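To make the mapping concrete, here is a minimal Python sketch, assuming a single hypothetical `orders` table with `sales_amount` and `customer_id` columns. The `Entity` and `Metric` classes and the `metric_by_entity_sql` helper are illustrative names, not the API of any semantic layer product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A real-world object and the column that uniquely identifies it."""
    name: str
    table: str
    key_column: str


@dataclass(frozen=True)
class Metric:
    """A business measure defined as an aggregation over a column."""
    name: str
    table: str
    column: str
    aggregation: str


# Hypothetical definitions: revenue is the sum of sales_amount,
# and a customer is identified by customer_id.
CUSTOMER = Entity(name="customer", table="orders", key_column="customer_id")
REVENUE = Metric(name="revenue", table="orders", column="sales_amount", aggregation="SUM")


def metric_by_entity_sql(metric, entity):
    """Translate 'metric per entity' into SQL against the underlying table."""
    if metric.table != entity.table:
        raise ValueError("this sketch only handles a metric and entity on one table")
    return (
        f"SELECT {entity.key_column}, "
        f"{metric.aggregation}({metric.column}) AS {metric.name} "
        f"FROM {metric.table} GROUP BY {entity.key_column}"
    )


print(metric_by_entity_sql(REVENUE, CUSTOMER))
```

The point of the sketch is that consumers ask for "revenue per customer" and never need to know that revenue lives in a column called `sales_amount`.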

Dave: So, you know, semantic layer and metric store sort of get used interchangeably as terms, David. What’s your opinion on that: same or different? If they’re different, how are they different?

David J.: So I think metrics layers are a subset of semantic layers. I feel like the main thing that people understand as additional in a semantic layer is this: a metrics layer can have many metrics or measures and dimensions about entities, but it doesn’t actually have to define what the entity is, because you don’t need to in order to use the data. Whereas a semantic layer will define the entity as well. And that allows for some additional ergonomics in how you can develop with it, because you can start doing things like inheritance of entities, so customer is an inherited class from user, and things like that, which aren’t as easy to do if you don’t have entities defined.
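Entity inheritance of the kind David mentions might look like the following Python sketch. The `EntityDef` class, its fields, and the example column names are assumptions for illustration only; real semantic layers express this in their own modeling languages.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EntityDef:
    """An entity that can inherit dimensions from a parent entity."""
    name: str
    key_column: str
    dimensions: dict = field(default_factory=dict)  # dimension name -> column
    parent: Optional["EntityDef"] = None
    filter_sql: Optional[str] = None  # condition narrowing the parent entity

    def all_dimensions(self):
        """Own dimensions plus everything inherited from ancestors."""
        inherited = self.parent.all_dimensions() if self.parent else {}
        return {**inherited, **self.dimensions}


# Hypothetical example: a customer is a user who has placed an order.
USER = EntityDef(
    name="user",
    key_column="user_id",
    dimensions={"signup_date": "created_at", "country": "country_code"},
)
CUSTOMER = EntityDef(
    name="customer",
    key_column="user_id",
    dimensions={"first_order_date": "first_order_at"},
    parent=USER,
    filter_sql="order_count > 0",
)

print(sorted(CUSTOMER.all_dimensions()))
```

Because `customer` is defined in terms of `user`, a dimension added to `user` later becomes available on `customer` without being redefined.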

Dave: Mm-hmm, I love that. That’s great. So, you know, semantic layers have been around probably since, I don’t know, Business Objects?

David J.: At least ’92, Universe.

Dave: Yeah. So why are we talking about them so much now, do you think, David? Why is it a popular theme, with people talking about semantic layers? I know I’m talking a lot about it, but I wanted to get your opinion on that.

David J.: I think it’s because, as much as I don’t like this sort of term, “data is the new oil.” I think of that era from the big data era onwards, where we’ve had disruption in the data stack and we’ve had increased interest, and with the increased interest we’ve had all this hype around data, but then it’s been very, very difficult to actually deliver on any of it. Because we’ve lacked fundamental skills and tools in data engineering, which have become better, but we still lacked some of the things that semantic layers offer and, on the ML side, that feature stores offer. And I feel like the two are somewhat aligned as well. So I think the demand has led to semantic layers coming about.

Dave: Yeah. And we sort of lived through the self-service revolution, didn’t we? It became sort of anything goes. It seems like the pendulum is swinging a little bit back towards the middle, hopefully. Before, we had IT delivering and doling out all the analytics to the business, then we had the business doing it themselves, and neither of those approaches really works very well. So it seems like we’re swinging a little bit back into the middle, where we have some governance over metrics and dimensions and the semantics of the business to allow for self-service, but not in an anything-goes kind of realm. So business intelligence and business intelligence tooling has changed a lot. From your perspective, how do independent semantic layers and BI tools play together, or how should they play together, in your mind?

David J.: So I think obviously we’re talking about some kind of integration, and I feel like the integration should happen. I’ve seen a few different ways. Sometimes the semantic layer pretends to be a database in the BI tool’s eyes, so the BI tool can speak Postgres SQL to the semantic layer, but the semantic layer is hiding the fact that the entities it’s showing as tables to the BI tool are actually abstractions on top of a real database that it’s hiding.

David J.: I feel like I understand the reasoning behind doing that, but I also feel like it’s not as good as a true integration between the BI tool and the semantic layer, which is maybe over a better API protocol like REST or GraphQL or something like that, where the BI tool is using the full features of the semantic layer, being able to just ask for the catalog of metrics and dimensions and then request data using those terms as well. I think that’s a much better way for them to play together.
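As a rough illustration of that catalog-then-request pattern, the Python sketch below models a semantic layer as two in-memory functions. The JSON request shape, the `get_catalog` and `handle_request` names, and the sample rows are all assumptions made for this example; they do not correspond to the API of AtScale, Delphi, or any other product, and a real integration would go over HTTP rather than direct function calls.

```python
import json
from collections import defaultdict

# Hypothetical rows standing in for the warehouse table behind the layer.
ROWS = [
    {"channel": "email", "sales_amount": 120.0},
    {"channel": "search", "sales_amount": 80.0},
    {"channel": "email", "sales_amount": 30.0},
]

# Hypothetical catalog: business names mapped to physical columns.
CATALOG = {
    "metrics": {"revenue": {"column": "sales_amount", "aggregation": "sum"}},
    "dimensions": {"marketing_channel": {"column": "channel"}},
}


def get_catalog():
    """What a BI tool would fetch first: the names it is allowed to ask for."""
    return {
        "metrics": sorted(CATALOG["metrics"]),
        "dimensions": sorted(CATALOG["dimensions"]),
    }


def handle_request(request_json):
    """Answer a request phrased in business terms, not tables and columns."""
    request = json.loads(request_json)
    metric = CATALOG["metrics"][request["metric"]]
    dimension = CATALOG["dimensions"][request["dimension"]]
    totals = defaultdict(float)
    for row in ROWS:
        totals[row[dimension["column"]]] += row[metric["column"]]
    return [
        {request["dimension"]: key, request["metric"]: value}
        for key, value in sorted(totals.items())
    ]


print(get_catalog())
print(handle_request(json.dumps({"metric": "revenue", "dimension": "marketing_channel"})))
```

The BI tool never sees `sales_amount` or `channel`; it only sees `revenue` and `marketing_channel`, which is the difference from the pretend-to-be-a-database approach.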

Dave: Yeah. You know, we’re still dealing with that generation of visualization tools, and they’re all sort of at fault here, because they’re really wired for talking to a database. So, as a semantic layer, the only way you can share the semantics of that data model is through a database, which means you’ve got to turn everything into database tables and columns, right? So we’ve been pushing our visualization tool partners to provide us a way to deliver those semantics. And yeah, you could still generate queries that look like SQL, but we need to be able to deliver the hierarchies, the metric definitions, the dimensions and their descriptions, and all the goodness that comes with a fully baked-out semantic layer.

David J.: Yeah. And I fully understand why that kind of SQL workaround needs to happen, because they’re not willing to give you a different interface, and so you’ve worked with them as you can. So I get it. I just think, in an idealistic way, it’d be better to do it using a true API.

Dave: You know, David, that’s why we implemented MDX as an interface, as well as DAX, because at least in those interfaces you’re dealing with metadata and, obviously, the ability to generate queries. So it’s sort of like an ideal interface. For Tableau, which wants to speak SQL, we use the TDS as a way of transferring our information, so that it looks like the semantic layer that the semantic modeler designed. So there are different tricks for doing it, but it certainly isn’t as ideal as we’d like. So, given the state of the BI market, what’s your view of where things are going to go from here? Because I think we’re still dealing with that self-service first generation of BI tools. Do you think that we’re going to see them move in a different direction? Where do you think we go from here when it comes to consumption, BI tools, and the analytics consumer?

David J.: Yeah, I do think we’ll see the semantic layer split out from the BI tool, but then be designed to be used with the BI tool. As you saw recently, GCP pulled LookML out from Looker by offering Looker Modeler as a standalone semantic layer. And I think that’s probably going to be the first move of the hyperscalers to do such a thing. I imagine that this will happen more. Even the likes of Tableau, I know organizationally there are some challenges at the moment, but they have acquired semantic layers in the past and haven’t deployed them fully. So I can imagine, with Looker and GCP’s move, that some of the other players will also consider doing something similar in the near future.

David J.: Because I feel like for the organizations owning those tools, if you think of Microsoft and Google owning Power BI and Looker, what they really want is for people to increase their cloud spend and their overall spend with Microsoft or Google as an organization. Actually, what consumes the semantic layer isn’t so important. It’s more that the upstream resources from the semantic layer, the data warehouse or the cloud compute of some kind, are theirs. And so I think it does logically fit that they just want to allow people to consume it more readily through many different mediums.

Dave: Yeah. You know, one of the goals for starting AtScale for me was that it really seemed like, as we moved away from the all-in-one BI platforms and got more towards visualization tools, the visualization tools forced users to understand the data platforms. They forced them to understand how to write SQL, how to join tables, and how to create relationships. To me that was really a step back, because not everybody can write SQL, right? Most people are working within Excel or a tool like that. And so for me, a semantic layer, and I love what you’re doing because you added the natural language query part to it, makes it even more ubiquitous and easier to use for everyone, because really everyone in the organization should be an analyst. There shouldn’t even be a role called analyst. Everybody should be an analyst in asking questions of the data. But we’ve made it so hard with these kinds of tools, haven’t we, to truly democratize access to data. Right now it only works if you know how to write SQL, which seems like a bad idea.

David J.: And even the so-called self-serve tools, while they don’t require someone to be able to write SQL, do require a level of knowledge similar to being able to use a pivot table in Excel. And I feel like sometimes the tech industry forgets that that’s actually quite a high level of technical skill. If you think about a large organization, a huge percentage of that organization isn’t at that level of skill. So that’s one of the great things about Delphi: they don’t have to have that skill. They just need to ask things in a natural, logical way and they can get an answer.

Dave: Yeah. You think about it, right? Everybody can Google something, but can anybody create a pivot table or figure out how to connect to a database? I think there’s a big gap between those two things. So I do love what you’re doing with Delphi, because you’re really making it Google-like, right? Google search-like, except it’s all connected to data and powered by a semantic layer. So I love it. All right, so a related topic is large language models, as you said, generative AI. People know it as GPT, with ChatGPT being the first killer application that’s really rocked everybody’s world. So what do you think about that? Is it hype, or is it something real and a sea change? What’s your opinion on that, David?

David J.: I think it’s real. You see people like Bill Gates say this is the most important thing since the graphical user interface, and I don’t think someone like him would say that lightly. That was the thing that made him and Microsoft, his company. And I agree with that, I think. I also think it’s a technological step change. If you go through human history, you’ve got the wheel, the loom, the printing press, then you’ve got computers, databases, cloud. This is the next one of those things, and it will drive a step change just like the previous technological ones have done. But I don’t see it as being as scary as some people do, or as truly amazing. I don’t think they’re actually like a real mind. That’s not the way I see them.

Dave: Yeah. There is a lot of fear out there about people losing their jobs or, you know, getting to the point where Skynet is self-aware.

David J.: I think the former concern is probably more real, because fundamentally a lot of white-collar jobs in the Western world generate content of some kind, whether that’s emails or actual articles or blog posts or whatever. Even spreadsheets are kind of content. So those things can be automated, and there is a risk of losing jobs. But I think what we’ve seen in the past is that whenever humanity has had this increased ability to do work, they just do more work. They don’t do the same work with fewer people, they just do more work.

Dave: Yeah, I’m on the same page, being an economist by training. You definitely see that it makes people more productive. Just last week, there was a prospect, and sales said, “Hey, we got this RFP and here are the questions.” And I just thought, you know what, this already exists. I’ve already written this a hundred times, seriously, but now I’ve got to come up with another version of it. So I just went and used ChatGPT and it came up with a great version. I literally just changed one or two words and was there, and it freed up a bunch of time. It would’ve taken me half an hour to do what I was able to do in about two minutes. That’s pretty incredible.

David J.: Yeah. Anything repetitive like that, I can just see it being automated, which is a good thing, I think.

Dave: I think it is too. So how do you think that affects the job of the data scientists and machine learning, that whole ecosystem that has been so top of mind in the past few years?

David J.: So if you think of the ML world, I think what will happen is that, rather than spending ages writing a lot of boilerplate code to access data or munge data, they’ll be able to just ask some ChatGPT-type or Copilot-type interface, and it’ll generate those functions and scripts for them. They’ll read it, having a good understanding of the language and understanding what it’s doing. And because it runs and they understand it, great. So that will just save them days of time. Then, when it gets to some of the more intricate things, which there isn’t so much pre-trained data on, that’s when they’ll actually put their hands to work. But that’s good, because that’s where they’re adding the most value. They’re not spending their time doing all of the boilerplate work. So I think it will just increase productivity.

Dave: So David, sticking with the data scientist theme here, do you think that a data scientist can take advantage of a semantic layer, tying those two together? And have you seen examples of that in your work?

David J.: Yes, I have. One of the first semantic layers I played around with very heavily was LookML. At the organization where we had Looker and LookML, we had data scientists around the business who did use Looker quite a lot. Whether they used it for just monitoring their models or even possibly generating inputs to their models, they were doing that. And it made their code DRYer, because they could just use something and then reuse it in another model. So there are benefits. If you think about the input side, I think feature stores and semantic layers have a lot in common. And on the monitoring side, if you think about ML metrics monitoring and normal BI, there’s very little technically different between them at all.

Dave: Yeah, totally agree. We see the same thing. With semantic layers, the focus has always been on that business analyst persona, but we see a whole lot of value coming out of data science for the same reason. Being able to get access to verified, consistent metrics without having to engineer those yourself is a big time saver and drives consistency. So it makes sense.

David J.: I think one of the main differences is probably that semantic layers, especially in the way they’re sometimes set up, are not as concerned with the state of things at a given time. Feature stores are very, very concerned with that, because they always want to make sure that, for the event horizon they’re predicting, they have data at that state. And I think as semantic layers provide better support for things like SCDs, like SCD Type 2, they’ll become more powerful as feature stores as well.
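The point-in-time concern David raises can be sketched in Python against a small slowly changing dimension (SCD) Type 2 history. The table contents, the `valid_from`/`valid_to` column names, and the `tier_as_of` helper are hypothetical; the half-open validity interval (start inclusive, end exclusive, open-ended current row) is one common convention, assumed here.

```python
from datetime import date

# Hypothetical SCD Type 2 history: one row per version of a customer's tier.
# valid_to is None for the current version.
CUSTOMER_TIER_HISTORY = [
    {"customer_id": 1, "tier": "bronze",
     "valid_from": date(2023, 1, 1), "valid_to": date(2023, 6, 1)},
    {"customer_id": 1, "tier": "gold",
     "valid_from": date(2023, 6, 1), "valid_to": None},
]


def tier_as_of(customer_id, as_of):
    """Return the tier that was in effect on the given date, or None."""
    for row in CUSTOMER_TIER_HISTORY:
        if row["customer_id"] != customer_id:
            continue
        started = row["valid_from"] <= as_of
        not_ended = row["valid_to"] is None or as_of < row["valid_to"]
        if started and not_ended:
            return row["tier"]
    return None


# A feature for a prediction made in March must use the March state,
# not the customer's current state.
print(tier_as_of(1, date(2023, 3, 15)))
```

Looking up the current tier instead would leak information from after the prediction date, which is why feature stores care about this and why a semantic layer would need the same support to serve features.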

Dave: Love it. That’s a great call out. So we’re just about up on time, David. Is there anything you want to predict for the next five years, or things that listeners should be aware of in this brave new world that we’re facing in data and analytics?

David J.: I mean, this is something that, as someone who used LookML and had to deal with a semantic layer baked into a BI tool, I think data leads will be less tolerant of, especially now that you’ve got the likes of AtScale, Cube and dbt having standalone semantic layers that they can use and then move around to different BI tools as they so please. It de-risks their situation as well. So I think we will see a move towards them, because nobody wants to be locked in and at the behest of one vendor. So I think we will see that move. And I definitely hope, we hope that you’ll,

Dave: Well, we obviously agree on that point. So, this has been great, David. I’m gonna leave it there. You’re doing some amazing things, and I hope to be able to combine the two technologies to get more people making data-driven decisions. So with that, David, thank you so much for joining me today in our conversation. And thanks to everyone out there who’s listening for another Data-Driven podcast. Thanks, and have a great day.

David J.: Thanks Dave. Thanks for having me. Great to speak. 
