In this episode of the Data Driven Podcast, Dave Mariani, CTO and Co-Founder of AtScale, sits down with Dael Williamson, EMEA CTO at Databricks, to explore the transformative role of open source standards and semantic layers in modern analytics. They discuss the importance of interoperability in data infrastructure, the impact of generative AI on democratizing analytics, and how the right semantic layer can bridge complex data to natural language for everyone. Together, AtScale and Databricks envision a future where data insights are accessible to all, empowering organizations to build with flexibility, trust, and freedom from vendor lock-in.
Transcript
Dave Mariani: Hi everyone and welcome to another Data Driven Podcast. I’m Dave Mariani, CTO and co-founder of AtScale. And today’s special guest is Dael Williamson. And Dael is the EMEA CTO for Databricks. So welcome to the podcast, Dael.
Dael Williamson: Awesome, thank you Dave, great to be here.
Dave Mariani: So I always like to start these things out, Dael, with just a little bit about your background and what was the path that got you to be the CTO, the EMEA CTO for Databricks? How did you get here?
Dael Williamson: I have the most convoluted path, so this could take an entire talk. I'm a biochemist. I had this really strange academic background, computer science on the one side and genetics and biochemistry on the other, and I found myself doing professional programming.
Dave Mariani: I love it. I love it. That’s crazy.
Dael Williamson: To make money while doing my academic work. So I've constantly been a full stack engineer in a variety of different domains: supply chain systems on mainframes, insurance systems, telephony frameworks and things like that. But on the academic side, I was collecting a lot of random data sets on X-ray diffraction patterns and things like that. And then
Dave Mariani: Okay.
Dael Williamson: Applying them to predictive, rational drug design or industrial waste bioremediation. I bridged into the more executive realm when I was a customer for many years, on the buy side. Then I was the global CTO of Accenture's Microsoft Business Group and Avanade's global COE for a good chunk of time, and I joined Databricks about three years ago. So I've sat in the executive buyer's seat, I've sat in the executive advisor's seat, and now I'm sitting on the vendor side. But I have this incredibly detailed background of being a real practitioner, which gives you a diversity of really weird, colorful thinking. Anyone who gets to know me knows I often take a kind of biochemistry left turn
Dave Mariani: Mm-hmm.
Dael Williamson: When I come up with explanations. My old boss used to joke with me: when are you going to talk about proteins?
Dave Mariani: I love it. I love that. You know, we're similar in that way. I've been on both sides too, as a customer and a vendor. Never a consultant or a biochemist, that's for sure.
But it does give you a different view. I mean, for me, I think it makes you more empathetic to your customers, since you've been in their shoes. So, yeah.
Dael Williamson: That's an insanely good point, because what you understand is the buyer's principles for making decisions and how they apply: what is your CFO going to look at? What is your procurement department going to look at? What are your risk departments going to look at? It informs the decision-making process, and it really gives you an understanding of the operating rhythm too.
Dave Mariani: Yeah.
Dael Williamson: It's really hard to appreciate both of those things unless you've spent time in that role. So I agree with you: the empathy is one of the key superpowers it gives you.
Dave Mariani: Mm-hmm.
Dave Mariani: So Dael, during the time leading up to your position with Databricks, were you more on the machine learning and AI side, the analytics side, or a bit of both? What was your focus during those years?
Dael Williamson: Well, I tended to be quite broad. Interdisciplinary is probably the best way to describe it. I was far more focused on how to apply these things to solve higher-order problems, and that's quite key. From a practitioner perspective, I was always a full stack engineer, so it was more end to end. And that meant
Dave Mariani: Mm-hmm.
Dael Williamson: needing to learn AI out of necessity, sometimes being a team of one with a constraint, or a team of maybe five with a skills constraint. That's something a lot of people don't appreciate: sometimes it's about picking up a new skill along the way. That's also why I appreciate that how teams are actually formed in companies might not match the philosophy of what a team should look like.
Dave Mariani: Mm-hmm.
Dael Williamson: That happens because they may have people who are schooled in SSIS and now have to modernize onto a new technology. I was always big on open source, which I think is where that alignment to communities comes from: democratizing how you build out technology, and having different contributors driving a project forward in a far
Dave Mariani: Mm-hmm.
Dael Williamson: Less biased way, I would say. That was always the camp I belonged to: anything that followed the open source trajectory. I often found that the walled gardens, the very proprietary closed systems, drove me mad. And because I was always in R&D on the biotech side, you wanted to build your own workflow, and open source allowed you to do that.
Dave Mariani: So my guess is that Databricks was pretty attractive given that Spark is open source and a core part of the platform. Is that how you sort of came to…
Dael Williamson: That's probably where my natural leaning was.
Dave Mariani: Came to look at Databricks, as something based on open source?
Dael Williamson: When I was doing my master's work and, in parallel, working in the telco engineering space, if I'd had Spark in the mid-2000s, I would probably have saved myself about 20 days of work per simulation. So when I saw Spark, I knew what it represented really early.
Dave Mariani: That’s crazy.
Dael Williamson: So I was arguably a big early adopter of Spark in the enterprise space, in a kind of DIY wrapper. Databricks made sense because it took away the maintenance overhead of using Spark. And as Databricks has progressed, things like Delta and MLflow, those donations, would have reduced that time even more. To put it in relative terms, and it's an analogy I've used quite often:
Dave Mariani: Awesome.
Dael Williamson: If I ran a proteomic simulation to figure out how to create the structure for drug discovery, it would take me on average about 45 days to run a series of simulations. With what Databricks represents today, it could probably be done in a few hours. So that's 45 days to a few hours over roughly a 20-year horizon. That's significant.
Dave Mariani: That’s insane.
Dael Williamson: In terms of compute innovation.
Dave Mariani: Insane, insane. So with GenAI being a big focus now, how do you think that's changed the market? How has it changed what Databricks focuses on? I mean, everybody is so focused on GenAI, and I see that enterprises are still struggling to put it to good use. So what's the angle for Databricks when it comes to making GenAI work for business?
Dael Williamson: So, I mean, Databricks is an ML-born company. Spark's first use case, if you dig into a very old Apache research paper, was ML, right? And classic ML has not gone away. Lots of people think it has, but in the last six months we've seen ML usage go through the roof. The usage patterns have never been more popular.
And I think what GenAI has done is shine a lens on AI as a whole as a discipline. GenAI specifically makes for a fascinating interface. What Databricks did incredibly well was data processing and machine learning on CPUs. Acquiring Mosaic allowed us to extend that capability to GPUs, optimizing model FLOP utilization for building generative models, whether transformer or diffusion models or a variety of other types. So it's given us the potential to build. To be honest, it's more interesting on the inference side: being able to build more natural systems.
Dave Mariani: Mm-hmm.
Dael Williamson: So one of the first use cases where we applied this internally was text-to-SQL, and lots of people now know that as Genie. That was basically what we refer to as a compound AI system; a lot of people would call it an agentic framework. It depends on whether you want to be scientific or marketing-led in the explanation, but in effect, you're building an application or workflow.
Dave Mariani: Okay.
Dael Williamson: What we built was a really good model that was excellent at text-to-SQL. The whole ambition there was that Databricks had key personas around data engineers. We were a big tool with a huge following because of Spark, Delta and MLflow. We had a huge following in the data science practitioner community, and a growing following in the analytics community. If you went to our summit,
out of the 16,000 people, you'd have 15,000 practitioners. Now that's special. Where generative AI gives us a huge opportunity is in pushing that out to far more of the consumers of data across the enterprise. It creates a natural language bridge. But it's only the beginning. It's basically a new kind of translation interface, because code is really sort of
Dave Mariani: Yeah, yeah.
Dael Williamson: Deterministic in nature in terms of its rules, and being able to link that to natural language is super interesting. The thing, though, is that in order for it to be incredibly successful, it needs enterprise domain context. And that's where we're moving: how do we build these solutions so they can become more domain specific.
That is, specific to the humans in the enterprise. You know this, because AtScale probably has the same thing: we have a lot of language that's internal to Databricks. The simple one is how we run our fiscal cycles. If I want to look at a quarter, the system needs to know that our first fiscal quarter starts at the beginning of February, and it needs to be able to do that math and that semantic interpretation to come up with the right WHERE clause
Dave Mariani: Mm-hmm.
Dael Williamson: in SQL. That kind of semantic layer is the holy grail.
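The fiscal-quarter example is essentially a date computation that has to be right before any SQL is generated. A minimal sketch, assuming a fiscal year that starts on February 1 (as described above) and, as a hypothetical convention not stated in the conversation, labelling each fiscal year by the calendar year in which it ends; the function and column names are illustrative only.

```python
from datetime import date

# Assumption: fiscal year starts February 1 and is labelled by the calendar
# year in which it ends (e.g. FY2025 = 2024-02-01 .. 2025-01-31).
FISCAL_YEAR_START_MONTH = 2


def fiscal_quarter_bounds(fiscal_year: int, quarter: int) -> tuple[date, date]:
    """Return the half-open [start, end) date range of a fiscal quarter."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1-4")
    # Month offsets counted from January of the calendar year the fiscal year starts in.
    start_offset = (FISCAL_YEAR_START_MONTH - 1) + 3 * (quarter - 1)
    end_offset = start_offset + 3
    base_year = fiscal_year - 1

    def month_start(offset: int) -> date:
        # Offsets of 12 or more roll over into the next calendar year.
        return date(base_year + offset // 12, offset % 12 + 1, 1)

    return month_start(start_offset), month_start(end_offset)


def fiscal_quarter_where(column: str, fiscal_year: int, quarter: int) -> str:
    """Build a WHERE predicate for a hypothetical date column."""
    start, end = fiscal_quarter_bounds(fiscal_year, quarter)
    return f"{column} >= DATE '{start.isoformat()}' AND {column} < DATE '{end.isoformat()}'"


print(fiscal_quarter_where("order_date", 2025, 1))
# order_date >= DATE '2024-02-01' AND order_date < DATE '2024-05-01'
```

The half-open range avoids having to know how many days the last month of the quarter has, and fiscal Q4 correctly spans the calendar-year boundary (November through January).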
Dave Mariani: Mm-hmm.
Dave Mariani: Yeah, we did some studies, really following on the heels of Juan Sequeda; I think you've been on some podcasts with Juan. And it's really pretty shocking, because for text-to-SQL to be successful, it's got to be deterministic. It can't be wrong. So even an accuracy of 95% is a little bit scary, right? And so
You know, one of our customers, they run the business on GD. Do you know what GD is? You shouldn’t know what GD is. It means gross demand for them. But that’s their own internal metric, right? And you can scrape the internet all day long and you’re never going to find GD on the general internet. So there needs to be some way of contextualizing, like you said, bringing your own business language to the power of that LLM.
So that you can put the two together and generate the right query for your business. So I'm super excited about that, Dael, because it's a really important use case for the semantic layer. I started the company around democratization, and then Databricks and your peers moved data and analytics to the cloud, which added another value proposition: cost management, performance improvement, and acceleration. And now we have this business context. Given that you're working on that, you've got Genie and AI/BI, and at your last conference you also announced Unity Catalog metrics. Is that the mechanism through which we can partner to bring that context into the enterprise?
Dael Williamson: Yeah, and it's cool you mentioned Juan, because we collaborate beyond just the podcast; he and I have a lot of conversations. What I love about where he is, is that it's far more in the ontology space. So you have semantics that are involved in the business intelligence space, and then you have this more
Dave Mariani: Is it?
Dael Williamson: Like: how do you create a representation of the different domains in a company? I think all of these things are related. What I find super interesting is that we tend to think in abstractions as we move forward. If you think about the stack, and this is something I actually talked about on the Data.World podcast,
Dave Mariani: Mm-hmm. Mm-hmm.
Dael Williamson: If you look at the OS layer, Linux is the de facto standard. If you then move one layer up to containerization, you've got Kubernetes. These are two open source standards that have allowed the world to keep focusing on higher-order problems. Then think of Parquet as the de facto standard for storage. There's always a big tussle: there are multiple standards, and then the world gravitates to one. Then you start to think about Delta, Iceberg and Hudi, and with us acquiring Tabular this year, we started to unify those in a massive way. And we open-sourced Unity Catalog with the view of making it the interoperable catalog. That leaves us at a very interesting point. It's a funny story: UC Metrics came out of our own internal implementation at Databricks.
Dave Mariani: Mm-hmm. Mm-hmm. Mm-hmm.
Dael Williamson: We were finding that we needed a space where there was an immutable, clear understanding of where metrics lived, along the lines of the metric layer paper the Airbnb team wrote about Minerva a few years ago. We then saw how powerful that was internally, so we turned it into a product. Now, with us open-sourcing UC, UC Metrics will be a big
Dave Mariani: Okay.
Dael Williamson: Feature in there. What Unity Catalog and UC Metrics enable is the ability to connect with other semantic layers, open source and, to a degree, closed source. And that's where the relationship with AtScale is so powerful, because what you've been solving for is how to unify the semantic layer.
Dave Mariani: Mm-hmm.
Dael Williamson: And your recent open-sourcing of it was one of the most exciting things. I even mentioned it on the podcast with Juan, because it's a big deal. It turns out, and this was something he brought up, that as you move higher up the stack, you get closer and closer to humans. But as you move higher up the stack, you also move into a far more proprietary format problem.
Dave Mariani: Yeah.
Dave Mariani: Mm-hmm.
Dael Williamson: Arguably some of the most locked-in proprietary formats sit in that layer, and they house business logic. They house the context of a company. They house huge amounts of material value that, number one, makes model training more accurate and more domain specific, so they're incredible fuel for training and fine-tuning models. But the second thing is they're a huge
Dave Mariani: Mm-hmm.
Dael Williamson: Lock-in that works against interoperability. Typically, if somebody were to migrate from one semantic tool to another, say a big company with a huge BI lock-in layer, that migration could take five years because of the impact and how close it is to humans. That's why I find this to be the next frontier.
Dave Mariani: Yeah.
Dave Mariani: Yeah. And look, you called it out, right? Lock-in. I mean, semantic layers are not new. They've been around since BI tools were born, and I think BusinessObjects gets credit for the universe as the first real example of a semantic layer. But they've always been tightly coupled with consumption.
And like you said, that creates lock-in, and no customer wants to be locked in to a single vendor. So we open-sourced what we call SML, Semantic Modeling Language, to really make these semantic layer platforms interoperable, so that we can work with partners like Databricks to make it the default language for describing your business and keep that separate and distinct
From how the data gets consumed. Obviously there are other approaches. I mean, we know there are companies out there trying to use the semantic layer as the lock-in layer to capture not just your analytics consumers but also your lakehouse and data warehouse workloads. And I think you and I are the same in that
You've got to decouple those layers so that you have maximum flexibility to mix and match technologies and move with the maturity of these technologies. We know how fast things move, and we know there's going to be some shiny new tool you'll want to use, and it's not 10 years or five years away anymore; it's maybe two to three years, given these technology cycles.
So yeah, I feel very, very passionate about that. So go and look up SML, Semantic Modeling Language, which we open-sourced about a month ago and which got a lot of really great feedback. Thank you, Dael, for your feedback as well. And so we really are looking to do that. I'm very excited about UC Metrics because it gives us a way of integrating our two products seamlessly and giving customers real text-to-SQL, with accuracy, and making it deterministic.
So if I'm asking for gross demand last fiscal quarter for a product category, it's going to give me the same answer every time, because it understands product category, it understands gross demand, and it understands my time calendar and knows what my fiscal quarter is.
So that’s really exciting.
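To make the determinism point concrete: once a question such as "gross demand last fiscal quarter by product category" has been mapped to a governed metric, a dimension and a filter, compiling that request to SQL can be a pure function, so the same request always yields the same query. This is a minimal sketch, not SML or the UC Metrics API; the metric expression, table names, column names and the "FY2025-Q1" label are all hypothetical placeholders, and it assumes the language model's only job is picking those names and values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str  # SQL aggregate evaluated over the fact table alias "f"
    fact_table: str


@dataclass(frozen=True)
class Dimension:
    name: str
    column: str  # qualified column on a joined table


# Hypothetical semantic model: every table, column and value is a placeholder.
METRICS = {
    "gross_demand": Metric("gross_demand", "SUM(f.gross_demand_amount)", "sales.orders f"),
}
DIMENSIONS = {
    "product_category": Dimension("product_category", "p.category"),
    "fiscal_quarter": Dimension("fiscal_quarter", "c.fiscal_quarter"),
}
JOINS = (
    "JOIN sales.products p ON f.product_id = p.product_id",
    "JOIN sales.fiscal_calendar c ON f.order_date = c.calendar_date",
)


def quote(value: str) -> str:
    """Render a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def compile_query(metric: str, group_by: str, filters: dict[str, str]) -> str:
    """Compile a metric request into SQL; identical inputs give identical SQL."""
    m = METRICS[metric]  # KeyError for undefined metrics instead of guessing
    g = DIMENSIONS[group_by]
    sql = (
        f"SELECT {g.column} AS {g.name}, {m.expression} AS {m.name} "
        f"FROM {m.fact_table} " + " ".join(JOINS)
    )
    # Sort filters so dict insertion order cannot change the generated text.
    conditions = [f"{DIMENSIONS[k].column} = {quote(v)}" for k, v in sorted(filters.items())]
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + f" GROUP BY {g.column}"


print(compile_query("gross_demand", "product_category", {"fiscal_quarter": "FY2025-Q1"}))
```

Resolving "last fiscal quarter" to a concrete label is assumed to happen beforehand against the fiscal calendar table. An unknown metric such as "GD" raises a KeyError rather than letting a model improvise a definition, which is the kind of guarantee the conversation attributes to a semantic layer.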
Dael Williamson: I think so too. I think one of the cool things is that there's a two-worlds problem, right? I often talk about this with customers. I ask: do you believe that in five years' time you will have a single vendor that does everything for you? And the answer is typically no. And that means there are components of your stack that are highly valuable to you.
Dave Mariani: Mm-hmm. No.
Dael Williamson: Those components have to be interoperable. One, so you have access to those material assets. And two, it positions you for the future, because I believe the interface is going to fundamentally change over the next two to three years. My daughter said something to me the other day; she's six, for context. I use Perplexity quite a lot at home, and she talks to Perplexity because she can't really type yet. So she said to me, Daddy,
Dave Mariani: Mm-hmm.
Dael Williamson: Why do you and Mommy look so silly when you're typing on your phones? Why don't you just speak? It's such an innocent, astute observation, because she's right. As you become more aware of the dumbest, smallest keyboard in the world and how humans are doing this at scale, what's fascinating is: what does a future interface look like if we don't have to do that?
Dave Mariani: Yeah. Okay.
Dael Williamson: And is our tech stack supporting whatever that future interface will be? That's where this partnership is so powerful, because it supports a pivot from a dashboard-like experience to maybe a voice-enabled or natural-language-to-SQL experience enriched with semantics.
That is the direction of travel. So when you're choosing, you want to be able to future-proof, but you also want to avoid being locked in and keep access to all of this really valuable data. Semantics is actually a form of data; you can think of it as a class of data asset that is super valuable.
Dave Mariani: Yeah.
Dave Mariani: Yeah, my prediction is this. Look, the whole semantic layer and how it democratizes access to data was, like I said, the original motivation to build a semantic layer company. And what I really loved is that when we turned on the Excel users with live access to the Databricks lakehouse, crazy stuff happened, because all of a sudden there was a class of users who could never use a tool like Power BI or Tableau, because they want a spreadsheet interface and they live in a spreadsheet interface. So we brought the data to them, and they were able to use those skills to do analytics where they couldn't before. Text-to-SQL is that on steroids, right? Because now anybody who can speak can be running analytics. And the whole idea of dragging and dropping, pointing and clicking, and creating charts by dragging things around is going to look insanely antiquated in a very short period of time. If, Dael, we get this right. And I think that's the key: we've got to get it right. We only have one choice, and it's about trust.
Dael Williamson: Hmm. I know.
Dave Mariani: And trust means the customer can trust that interface. Because if we can make the customer trust the accuracy of those queries, then we're truly bringing analytics to everyone, and you don't have to be a data expert. I'm an OLAP guy, and when we moved away from OLAP and gave users Tableau, all of a sudden they had to be database experts.
Dael Williamson: I agree. I agree.
Dave Mariani: They needed to be database people. They needed to understand tables, foreign keys, primary keys, and how to relate tables together, and they did all this data modeling. We forced them to do this data modeling before they could ask their first question. That was a step backwards for me. So I see that changing with the semantic layer and a Databricks lakehouse, where you can put any type of data in one place.
Now we can describe it for the business, and then we can ask a question using just English or whatever your native language is. And voila, everybody is a business analyst. And so I’m really excited about that.
Dael Williamson: I think it does. I think you're solid there. Democratization is going to be achieved in a different way from how anyone envisioned it two years ago, and I think that's the key. I was in Denmark the other day, and they were talking about the fact that they have a Danish model at the start of the chain. The Danish model takes the prompt and converts it to English.
Dave Mariani: Is it?
Dael Williamson: Their underlying database structure and everything, because of the inherent nature of some of the technologies they've acquired over the years, is somewhere between English and Danish. And then there's a Danish model at the end of the chain that takes the interpreted results, converts them back to Danish, and converts them into
Dave Mariani: Wow.
Dael Williamson: More of the dashboards, tables and representative structures it expects to see at the end. That already tells you that these models are going to be used in chains, in fascinating ways, each with a specific purpose. So translation will play a role, both translation down into the key systems and translation back up out of them.
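The Danish setup described above is a chain of single-purpose stages. A minimal sketch of that composition, assuming each stage is a text-in, text-out function; the four stage functions are hypothetical stubs that only tag their input so the execution order is visible, not real translation or text-to-SQL models.

```python
from typing import Callable

# A stage takes text in and returns text out (prompt, SQL, or rendered results).
Stage = Callable[[str], str]


def chain(*stages: Stage) -> Stage:
    """Compose stages left to right into a single callable pipeline."""
    def run(text: str) -> str:
        for stage in stages:
            text = stage(text)
        return text
    return run


# Hypothetical stand-ins for models and the query engine.
def danish_to_english(prompt: str) -> str:
    return f"en({prompt})"


def english_to_sql(question: str) -> str:
    return f"sql({question})"


def execute(sql: str) -> str:
    return f"rows({sql})"


def english_to_danish(results: str) -> str:
    return f"da({results})"


pipeline = chain(danish_to_english, english_to_sql, execute, english_to_danish)
print(pipeline("hvad var salget?"))  # da(rows(sql(en(hvad var salget?))))
```

Keeping each stage behind the same narrow signature is what lets one model in the chain be swapped for another without touching the rest, which echoes the interoperability argument made throughout the conversation.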
And I love the fact that you brought up OLAP, because of what we called Project 1B. The whole project was: can we reach a billion users on the platform? One billion. And when I first saw it, my reaction was, my word, that looks like an OLAP cube, just with natural language.
Dave Mariani: Yeah.
Dael Williamson: And it's funny, because that was over a year ago, and look at how far we've come in a short space of time. So I do agree with you, and I think it's about more people having access to these key data assets. We often think about data as just the structured data we pull through, but increasingly, and this is something Juan and I talked about a lot, knowledge is a data asset. So is the semantic logic that drives your business.
Dave Mariani: Absolutely. Yep.
Dael Williamson: So is code. Most companies are going to start to look at how to protect these assets and how to harness them to create really cool future applications. And I think that's where we're at. We're in builder mode. It's brilliant.
Dave Mariani: It's a fun time. It's a fun, fun time. I'm a BI guy, I'm an analytics guy, and it's such a fun time. GenAI has really opened a lot of people's eyes to doing things differently and to really making text-to-SQL work. Again, text-to-SQL isn't new.
It's been around for a while. There have been vendors trying to pull it off, but never with the right tools. I think we have the right tools in place now. And I love what you said: don't build a proprietary stack, make sure you use open source, assemble things together, focus on what your core assets are, and make sure those are translatable and will stand the test of time. And to do that, go open source.
Dael Williamson: Mm.
Dave Mariani: So Dael, this has been a great talk. I've learned a lot, and I hope our listeners have learned a lot too. I'm really looking forward to making some big announcements and doing some very exciting, flashy demos very soon. I can't wait to get back on with you and show the world what we've done together.
Dael Williamson: Awesome, thanks, Dave. It's been awesome hanging out and talking about how things have progressed. One of the things that would be really cool is to go back in time and tell ourselves that there is hope.
Dave Mariani: Yeah, and I'm telling everybody out there that there is. I think we're very close to really changing the way people ask questions. So with that, Dael, thank you so much. Thanks to Databricks, and thanks to all of you out there. Stay data driven. Thanks a lot, everybody.
Dael Williamson: Awesome. Thank you.