← All episodes
Metamuse Episode 41 — October 14, 2021

Local-first software with Martin Kleppmann

Local-first is a set of principles that enables collaborative software without the loss of data ownership associated with the cloud. Martin is a computer scientist on the frontier of this movement, and he joins Mark and Adam to discuss how creative people put their souls into their work; a vision for a generic AWS syncing service; and why local-first could be a breakthrough for indie app developers.

Episode notes

Transcript

00:00:00 - Speaker 1: And I feel like this idea really changes the abstractions that operating systems should provide, because maybe OSs should not just be providing this model of files as a sequence of bytes, but this higher level, CRDT-like model, and how does that impact the entire way software is developed?

00:00:23 - Speaker 2: Hello and welcome to Metamuse. Muse is a tool for thought on iPad. This podcast isn’t about Muse the product, it’s about Muse the company and the small team behind it. I’m Adam Wiggins here with my colleague Mark McGranaghan. Hey, Adam. And joined today by Martin Kleppmann from the University of Cambridge. Hello. And we talked before about Mark’s dabbling in playing the piano. I understand this is a hobby you’re starting to look into as well, Martin.

00:00:49 - Speaker 1: Oh yes, I’ve been playing the piano, like trying to do it a bit more consistently for the last year and a half or so. My lockdown projects.

00:00:57 - Speaker 2: And do you have a technique for not annoying your neighbors, or is this an electronic piano, or how do you do that?

00:01:03 - Speaker 1: It’s an electric piano, although I don’t think it’s too bad for the neighbors. Lately I’ve been trying to learn a four-hands piece that I can play together with my wife, so she’ll play two hands and I’ll play the other two.

00:01:15 - Speaker 2: Nice. I suspect a lot of our listeners know you already, Martin. I think you’re within your small, narrow niche, you’re a pretty high profile guy, but for those that don’t, it’d be great to hear a little bit about your background. What brought you on the journey to the topic we’re gonna talk about today?

00:01:31 - Speaker 1: Yeah, well, I’m a computer scientist, I guess. I started out as an entrepreneur and started two startups some years ago. I ended up at LinkedIn through the acquisition of the 2nd startup, and there I worked on large scale stream processing with Apache Kafka and was part of that sort of stream processing world for a while. And then I wanted to share what I had learned about building large scale distributed data systems. And so I then took some time out to write a book which is called Designing Data-Intensive Applications, which has turned out to be surprisingly popular.

00:02:10 - Speaker 2: Yeah, you wrote a nice kind of tell-all, you showed the numbers on it, which, it’s been financially successful for you, but also one of the more popular O’Reilly books just by kind of copies sold in recent times. I like that post, like the candor there, but yeah, it makes you a pretty successful author, right?

00:02:28 - Speaker 1: Yeah, it’s sold over 100,000 copies, which is, wow, way more than what I was expecting for something that it’s a pretty technical, pretty niche book, really.

But the goal of the book really is to help people figure out what sort of storage technologies and data processing technologies are appropriate for their particular use case. So it’s a lot about the trade-offs and the pros and cons of different types of systems. And there’s not a whole lot on that sort of thing out there, you know, there’s a lot of sort of vendor talk hyping the capabilities of their particular database or whatever it might be, but not so much on this comparison between different approaches. So that’s what my book tries to provide.

Yeah, and then after writing that book, I sort of slipped into academia, sort of half by accident, half by design. So I then found a job at the University of Cambridge where I could do research full time. And since then I’ve been working on what we have come to call local-first software, which we’re going to talk about today. The nice thing there is that now in academia, compared to the startup world, I have the freedom to work on really long term ideas, big ideas which might take 5 or 10 years until they turn into viable technologies that might be used in everyday software development. But if they do work, they’ll be really impactful and really important, and so I’m enjoying that freedom to work on really long term things now as an academic.

00:03:53 - Speaker 2: And certainly it struck me when we got the chance to work together through these Ink & Switch projects that you have both the commercial world, including being a startup founder, but obviously you’re very immersed in the academic kind of machinery now, and again just that long-term mindset and thinking about creating public goods and all that sort of thing. And I found that I actually really like now working with people that have both of those. Another great example there would be another former podcast guest, Geoffrey Litt. He was also in the startup world, now he’s doing academic work at MIT.

00:04:26 - Speaker 1: Yes, and I’m actually doing a project with him right now.

00:04:29 - Speaker 2: Right, I forgot about that. There’s a current Ink & Switch project there.

So I find that maybe if you live your whole life in one of those two worlds, kind of commercial slash industry or academia, you get like a fish doesn’t know what water is kind of thing, but if you have experienced both models, then it’s easier to know the pros and cons and understand the shape of the venue you’re doing your work in, in the end.

The point is to have some meaningful impact on humanity through your work, whatever small piece of the world you hope you’re making better. In our case, it’s computer things, but the venue you’re in is not the point, that’s just a vehicle for getting to where you want to go, and each of these styles of venue has different trade-offs, and being aware of those maybe makes it easier to have your work have an impact.

00:05:19 - Speaker 1: Yes, I think it is really helpful to have seen both sides, and I find it allows me to be a little bit more detached from the common mindset that you get. Like in every domain, you know, there are certain things that everyone believes, but they’re kind of unspoken, maybe not really written down either. And so in academia, that’s the publishing culture and the competitiveness of publication venues and that sort of stuff, which seems ridiculous to outsiders. But if you’re in it, you kind of get accustomed to it.

And likewise in startups, it’s the hype, the need to be constantly selling and marketing and promoting what you’re doing to the max, crushing it, always crushing it, exactly, and to an outsider that seems really... it’s kind of a ridiculous show that people put on, frankly.

But to an insider, you know, you just get used to it and that’s just your everyday life. I find that having seen both makes me a bit more detached from both of them and I don’t know, maybe I see a little bit more through the bullshit.

00:06:21 - Speaker 2: So as you hinted, our topic today is local first software.

So this is an essay that I’ll link to in the show notes. It’s about 2 years old, and notably there’s 4 authors on this paper, 3 of them are here, kind of almost a little reunion, and actually the 4th author, Peter van Hardenberg, we hope to have on as a future guest.

But I thought it would be really fun to not only kind of summarize what that philosophy is, particularly because we’re actively pursuing that for the Muse syncing persistence model, but also to look at sort of what we’ve learned since we published that essay and revisit it a little bit. What do we wish we’d put in, how’s the movement, if that’s the right word for it, how’s that evolved, what have we learned in that time? But I guess before getting into all that, maybe Martin, you can give us the elevator pitch, if I’m to reference the startup terminology, the brief summary of what is local first software.

00:07:18 - Speaker 1: Yeah, local first software is a reaction to cloud software, and by cloud software I mean things like Google Docs, where you have a browser window and you type into it and you can share it really easily. You can have several people contributing to a document really easily, you can send it around for comments very easily, and so on. So it has made collaboration a ton easier, but it’s come at a great cost to our control and our ownership of the data, because whenever you’re using some cloud software, the data is stored on the cloud provider’s servers, like Google’s servers, for example. And you know, as users, we are given access to that data temporarily.

Until that day where Google suddenly decides to lock your account and you are locked out of all of the documents that you ever created with Google Docs, or until the startup whose software-as-a-service product you’re using suddenly goes bust and decides to shut down their product with 2 weeks’ notice and maybe allows you to download a zip file full of JSON files as your data export. And I find that tragic because as creative people, we put a ton of effort, time and our souls and really our personalities into the things that we create. And so much now, the things that we create are computer-based things, you know, whether you’re writing the script for a play or whether you’re negotiating a contract or whether you’re doing any sort of endeavor, it’s probably a file on a computer somewhere. And if that file is in some cloud software, then there’s always this risk that it might disappear and that you might lose access to it. And so what we try to do with local first software is to articulate a vision for the future where that does not happen, where we have the same convenience that we have with cloud software, that is, we have the same ability to do real-time collaboration. It’s not back to the old world of sending files back and forth by email. We still want the same real-time collaboration that we get with Google Docs, but at the same time we also want the files stored on our own computers. Because if the files are on our own computers, then nobody can take them away. They are there, we can back them up ourselves. We can optionally back them up to a cloud service if we want to. There’s nothing wrong with using a cloud service as long as the software still continues working without the cloud service. Moreover, we want the software to continue working offline, so that if you’re working on a plane or working on a train that’s going through a tunnel or whatever, the software should just continue to work. And we want better security and privacy, because we don’t want cloud services scanning through the content of all of our files. I think for creativity, it’s important to have that sense of privacy and ownership over your workspace. And so those are some of the ideas that we try to encapsulate in this idea of local first software: how can we have the best of both worlds, the convenience of cloud software but the data ownership of having the files locally on your own device?

00:10:15 - Speaker 2: Yeah, for me, the core of it is really agency, and much of the value of cloud, and I think there’s a version of this also for mobile apps, let’s say, and app stores and that sort of thing, which is not what we’re addressing in the paper, but maybe there’s a theme in computing that we’ve made computers vastly more accessible by, in many cases, taking agency from people, and that’s actually a good thing in many cases, right? You don’t need to defrag your hard drive anymore. You lose your device, and your email and your photos and all those things are still in this cloud that’s managed by experienced admins and product managers and so forth at companies like Google and so forth, and they can often do a better job of it in a lot of cases than an individual can. I mean, I think of managing my own email servers, SMTP servers, years back and needing to deal with data backup and spam filtering and all that kind of thing, and Gmail came along and I was just super happy to outsource the problem to them.

Absolutely. They did a better job managing it.

So I think that’s basically in many ways a good trend or as a net good in the world, and I don’t think we feel like we necessarily want to go back to everyone needs to do more of those data management tasks, but I think for the area of creative tools or more, I guess you call them power users, but it’s like you said, if you’re writing a play, that’s just a very different kind of interaction with a computer than the average person doing some calendar and email and messaging.

Yeah, maybe they want different trade-offs. It’s worth doing a little bit more management and taking a little more ownership to get that greater agency over something like, yeah, my work product, the script of my play or my master thesis or whatever it is that I’m working on is something that really belongs to me and I want to put a little extra effort to have that ownership.

00:12:08 - Speaker 1: Right, exactly. And I feel like it’s not reasonable to expect everyone to be a sys admin and to set up their own services, you know, you get this self-hosted cloud software, but most of it is far too technical for the vast majority of users, and that’s not where we want to go with this.

I think you still want exactly the same kind of convenience of clouds software that, you know, it just works out of the box and you don’t have to worry about the technicalities of how it’s set up.

But one part of local first software is that because all of the interesting app specific work happens client side on your own device, it now means that the cloud services that you do use for syncing your data, for backing up your data, and so the cloud services become generic. And so you could imagine Dropbox or Google Drive or AWS or some other big cloud provider just giving you a syncing service for local first apps.

And the way we’re thinking about this, you could have one generic service that could be used as the syncing infrastructure for many different pieces of software.

So regardless of whether the software is a text editor or a spreadsheet or a CAD application for designing industrial products or music software or whatever it might be, all of those different apps could potentially use the same backup and syncing infrastructure in the cloud, and you can have multiple cloud providers that are compatible with each other and you could just switch from one to the other.

So at that point, then it just becomes like, OK, who do you pay 6 cents a month to in order for them to store your data; it becomes just a very generic and fungible service. And that, I think, actually makes the cloud almost more powerful, because it removes the lock-in where you have to use the one cloud service provided by the software author. Instead, you could switch from one cloud provider to another very easily, and you still retain the property that you’re using the cloud provider’s expertise in providing a highly available service and you don’t have to do any admin yourself. It’s not like running your own SMTP server. So I feel like this is a really promising direction that local first software enables.
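
To make the idea of a generic, fungible syncing service a bit more concrete, here is a minimal sketch of what such an app-agnostic service could look like. This is not any real provider’s API; every name here is hypothetical. The point is that the service only stores and relays opaque change blobs per document and never needs to understand the app’s data model.

```typescript
// Hypothetical sketch of a generic, app-agnostic sync service.
// The service never interprets the blobs; each app encodes its own
// CRDT changes (for example, Automerge change sets) as byte arrays.
interface GenericSyncService {
  // Append opaque (possibly end-to-end encrypted) changes to a document's log.
  push(docId: string, changes: Uint8Array[]): Promise<void>;
  // Fetch all changes appended after the cursor the client last saw.
  pull(docId: string, afterCursor: number): Promise<{ changes: Uint8Array[]; cursor: number }>;
}

// In-memory stand-in, just to show how little the server needs to know.
class InMemorySyncService implements GenericSyncService {
  private logs = new Map<string, Uint8Array[]>();

  async push(docId: string, changes: Uint8Array[]): Promise<void> {
    const log = this.logs.get(docId) ?? [];
    log.push(...changes);
    this.logs.set(docId, log);
  }

  async pull(docId: string, afterCursor: number) {
    const log = this.logs.get(docId) ?? [];
    return { changes: log.slice(afterCursor), cursor: log.length };
  }
}
```

Because the service only ever sees opaque blobs, a text editor, a spreadsheet, and a CAD app could all sync through the same two endpoints, and switching providers amounts to pointing push and pull at a different host.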

00:14:24 - Speaker 3: Yeah, for sure, indeed, and you could even describe local first software, I think, as sort of generalizing and distributing the capabilities of the different nodes.

So in the classic cloud model, you have these thin clients: they can dial into the server and render whatever the server tells them. And then you have the servers, and they can store data and process it and return it to clients, and when you have both of those at the same time, you know, it works great. But then if you’re a client, like you said, who’s in a tunnel, well, too bad, you can’t do anything. And the local first model is more that any node in that system can do anything: it can process the data, it can validate it, it can store it, it can communicate it, it can sync it, and then you can choose what kind of topologies you want. So it might be that you just want to work alone in your tunnel, or it might be that you want to subscribe to a cloud backup service that does the synchronization and storage part for you while you still maintain the ability to process and render data locally. This actually gets to how I first got into what we’re now calling local first software. I was in a coffee shop with Peter van Hardenberg, who’s one of the other authors that Adam mentioned. And we were talking about working together at the lab, when he was a principal there, he’s now the director, and he showed me the Pixel Pusher prototype. So Pixel Pusher was this pixel art app where you color individual pixels to make a kind of retro graphic thing, and it was real time collaborative, but the huge thing was that there was no server, so that you had this one code base and this one app, and you got real time collaboration across devices. And that was the moment that I realized, you know, I was a fish in the cloud infrastructure water and I didn’t realize it. I just assumed, oh, you need servers and AWS, you need a whole ops team, you’re gonna be running that for the rest of your life, it’s the whole thing. Well, actually, no, you could just write the app and point it at the other laptop and there you go. And we eventually kind of realized all these other benefits that we would eventually articulate as the desiderata in the local first software article, but that was the thing that really actually kicked it off for me.

00:16:19 - Speaker 1: Yeah, and that aspect, that the apps become really self-contained and that you just don’t have a server anymore, or if you have a server, it’s a really simple and generic thing. You don’t write a specific server just for your app anymore. That’s something that I’m not sure we really explored very well in the local first essay as it was published, but I’ve been continuing to think about it since, because, you know, this has really profound implications for the economics of software development.

Because right now, as you said, like if you’re a startup and you want to provide some SaaS product, you need your own ops team that is available 24/7 on pager duty, so that when the database starts running slow or a node falls over and you need to reboot something or whatever, you know, there’s just all this crap that you have to deal with, which makes it really expensive to provide cloud software, because you need all of these people on call and you need all of these people to write these scalable cloud services, and it’s really complicated, as evidenced by my book, a lot of which is basically like, oh crap, how do I build a scalable cloud service.

And with local first software, potentially that problem simply goes away, because you’ve just got each local client which just writes to storage on its own local hard disk. You know, there are no distributed systems problems to deal with, no network timeouts and so on. You just write some data locally, and then you have this syncing code, where you just use an open source library like automerge, which will do the data syncing between your device and maybe a cloud service and maybe the other devices. And the server side is just non-existent. And you’ve just removed the entire backend team from the cost of developing a product, and you don’t have the ops team problem anymore because you’re using some generic service provided by some other cloud provider. And you know, that has the potential to make the development of collaborative software so much cheaper, which then in turn will mean that we get more software developed by smaller teams, faster; it’ll improve the competitiveness of software development in general. Like, it seems to have so many positive effects once you start thinking it through.
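
As a rough illustration of what "just write locally and let a library handle the syncing" can look like, here is a minimal sketch using the Automerge JavaScript library Martin mentions. The document shape and card titles are made up, and API details vary between Automerge versions, so treat this as a sketch rather than the exact current API.

```typescript
import * as Automerge from 'automerge'

type Board = { cards: { title: string }[] }

// Each device keeps its own local document; no app-specific server involved.
let deviceA = Automerge.from<Board>({ cards: [] })

// Writes are plain local mutations wrapped in a change block.
deviceA = Automerge.change(deviceA, 'Add card', doc => {
  doc.cards.push({ title: 'Ship local-first sync' })
})

// Another device starts from a copy of the same document and edits independently.
let deviceB = Automerge.merge(Automerge.init<Board>(), deviceA)
deviceB = Automerge.change(deviceB, 'Add another card', doc => {
  doc.cards.push({ title: 'Write the essay' })
})

// Whenever the devices can talk (directly, or via a dumb relay or generic
// cloud service), merging reconciles both histories without a backend
// resolving conflicts.
deviceA = Automerge.merge(deviceA, deviceB)
console.log(deviceA.cards.map(c => c.title)) // both cards are present
```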

00:18:22 - Speaker 2: Yeah, absolutely. For me, yeah, maybe similar to both of you, my motivations were both as a user and as, let’s say, a software creator or provider. On the user side, we have these 7 different points we articulate, and in fact we even set it

up this way: you can give yourself a little scorecard and see which of the boxes you tick. It’ll be fun to do that for the Muse syncing service when that’s up and running. But the offline capability is a huge one to me, and it’s not just the convenience.

I mean, every time I’m working on the train and my train goes through a tunnel, suddenly I can’t type into my documents anymore, for example, or, I don’t know, I like to go to more remote places to work and have solitude, but then I can’t load up Figma or whatever else. Yeah, that for me as a user is this feeling that comes back to the loss of agency, but also just practically it’s annoying. And you know, we assume always-on internet connections, but I wonder how much of that is because the software engineers are people sitting in offices, or maybe now at home in San Francisco, on fast internet connections with always connected devices, versus kind of the realities of life, walking around in this well connected but not perfectly so world we all live in. That’s on the user side.

00:19:42 - Speaker 1: Yeah, I feel like there’s a huge bias there towards like, oh, it’s fine, we can assume everyone always has an internet connection because yes, we happen to be that small segment of the population that does have a reliable internet connection most of the time. There’s so many situations in which you simply can’t assume that and that might be anything from a farmer working on their fields using an app to manage what they’re doing to their crops and something like that and you know, they won’t necessarily have reliable cellular data coverage even in industrialized countries, let alone in other parts of the world where you just can’t assume that sort of level of network infrastructure at all.

00:20:17 - Speaker 3: Yeah. Yeah, it’s funny you mention this because we often run into this on the summits that we have for Muse. So we were recently in rural France and we had pretty slow internet, especially on upload. I think it was a satellite connection, and we always had this experience where there are 4 of us sitting around a table and you’re looking at each other, but you can’t, you know, send files around cause it needs to go to, you know, whatever, Virginia and come all the way back.

00:20:43 - Speaker 1: It’s crazy if you think about it, it’s ridiculous.

00:20:46 - Speaker 2: Yeah, and I don’t think you even need to reach as far as a farmer or a team summit at a remote location.

I had a kitchen table in the house I lived in right before this one that was like a perfect place to sit and work with my laptop, but the location of the refrigerator, which really couldn’t be any other place, just exactly blocked the path to my router, and the router couldn’t really be any other place.

I guess I could run a wire or something, but I really wanted to sit right there and work. But again, it’s this ridiculous thing where you can’t even put a character into a document, and I could pick up the laptop and walk a meter to the left, and now suddenly I can type again. And you can compare that to something like Git, which does have more of a local model and is probably one of the closest things to true local first software, where you can work, and yes, you need an internet connection to share that work with others, but you’re not stopped from that moment to moment typing of things into your computer.

00:21:38 - Speaker 3: Yeah, and furthermore, from the implementation perspective, even when you have a very fast internet connection, you’re still dealing with this problem.

So if I’m using an app and I type in a letter on my keyboard, between the time when I do that and when the round trip happens with the AWS server, which might be 50 or 100 milliseconds, the app needs to do something useful. I can’t wait for that full round trip. It needs to immediately show me what’s happening. So you inevitably have this little distributed system where you have your local process and app trying to display something immediately, and you have the remote server.

And the great elegance, I think, of the local first approach is that that’s just another instance of the general problem of synchronizing across nodes, whereas often in other apps, that’s sort of an ad hoc kind of second special case thing, like, oh, it’s only going to be like this for 100 milliseconds, so just kind of do a hacky solution and make it so that most of the time the right letter shows up. And that’s why you have this behavior where apps will have an offline mode, but it never works, because, I think we mentioned this on the podcast before, there are systems that you use all the time and systems that don’t work. That’s a maxim we can link to. But again, with local first, you’re kind of exercising that core synchronization approach all the time, including when it’s just you and another server on a good connection.

00:22:50 - Speaker 1: Yeah, and from a sort of fundamentals of distributed systems point of view, I find that very satisfying, because I just see this as different amounts of network latency. Like, if you’re online you have network latency of 50 or 100 milliseconds. If you’re offline, you have network latency of 3 hours, or however long it’s going to be until you next come back online again. To me those are exactly the same, you know, I don’t care that they’re a few orders of magnitude apart. Both of those network latencies need to be dealt with, and if we can use the same techniques for dealing with both standard online latency and being offline, that just simplifies the software dramatically.
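
A small sketch of the point Martin is making here: the same change-based sync path applies whether the gap is 100 milliseconds or 3 hours. This again uses Automerge’s change functions; the transport (sendToPeer) is hypothetical, and return shapes differ slightly across Automerge versions, so treat it as illustrative.

```typescript
import * as Automerge from 'automerge'

type Doc = { notes: string[] }

// Both devices were in sync at some point in the past.
let lastSynced = Automerge.from<Doc>({ notes: [] })
let peerDoc = Automerge.merge(Automerge.init<Doc>(), lastSynced)
let current = lastSynced

// Edits keep accumulating locally, whether we are online or not.
current = Automerge.change(current, doc => { doc.notes.push('written on the train') })
current = Automerge.change(current, doc => { doc.notes.push('written in the tunnel') })

// Whenever the peer is reachable, after 100 ms or after 3 hours, we send
// exactly the changes made since the last sync point.
const outgoing = Automerge.getChanges(lastSynced, current)
// sendToPeer(outgoing)  // hypothetical transport: WebSocket, HTTP, Bluetooth, ...
lastSynced = current

// The receiving side applies them the same way regardless of how late they
// arrive. (In newer Automerge versions, applyChanges returns a [doc, patch] pair.)
peerDoc = Automerge.applyChanges(peerDoc, outgoing)
```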

00:23:25 - Speaker 2: Going back to sort of the infrastructure, fewer moving parts thing, and speaking to our personal motivations: for me, the experience of running Heroku was a big part of my motivation, or fed into my interest in this, because Heroku was an infrastructure business. I didn’t quite grasp what that meant when we went into it. I just wanted a better way to deploy apps, and in the end, I enjoy writing software, I enjoy creating products that solve problems for people, but infrastructure is a whole other game. And you know, it gets to the point where once you’re, I don’t know if mission critical is the right word, but just something people really care about working well, you’re in the critical path.

So for example, our routing infrastructure, if it was down for 3 seconds, people would complain. So the slightest hiccup, and as they should, that was part of the service that that company is providing, and so that’s fair enough.

But then when I go, OK, well, I’m building software, when I think of, for example, Muse, where I’m providing this productivity tool to help people think and that sort of thing. I don’t want to be paged because someone went to move a card 5 centimeters to the right and our server was down or overloaded or something, so then they can’t move the card and so then they’re writing into support angrily. I’m pretty comfortable with there’s some kind of cloud syncing problem and OK, I can’t easily like push my changes to someone else, and that is still a problem, but it feels like it’s on this slightly different timeline. You’re not just blocking the very most basic fundamental operation of the software.

And so the idea that, exactly as you said, it changes the economics: for me personally, I want to spend more of my time writing software and building products and less of my time setting up, maintaining and running infrastructure. So I guess looking back on the two years that have elapsed, I would say that this is probably, it’s hard to know for sure, but of the Ink & Switch essays, there’s a number of them that I think had a really good impact, but this one, just from my anecdotal feeling of seeing people cite it in Twitter comments and things like that, feels like one of the bigger impact pieces that we published. And I do really see quite a lot of people referencing that term, you know, we’ve sort of injected that term into discussion, at least among a certain very niche, narrow world of things. So, yeah, I’d be curious to hear from both of you, first, whether there’s things that looking back you wish we’d put in or you would add now, and then how that interacts with what you make of the local first movement or other work that people are doing on that now.

00:25:59 - Speaker 1: I’m very happy that we gave the thing a name.

That’s something we didn’t have initially when we started writing this, when we were just writing this like manifesto for software that works better, basically.

And then at some point we thought it would be really good to have some way of referring to it, and, you know, people talk about offline first or mobile first, and these were all kind of established things and terms that people would throw around. And we also wanted some term X where we could say, like, I’m building an X type app. And so I’m very glad that we came up with this term local-first, because I’ve also seen people even outside of our direct community starting to use it, and just, you know, put it in an article casually without even necessarily explaining what it means, just assuming that people know what it is. And I think that’s a great form of impact if we can give people a term to articulate what it is they’re thinking about.

00:26:50 - Speaker 2: Yeah, language, a shared vocabulary to describe something, is a very powerful way to, one, just sort of advance our ability to communicate clearly with each other, but also, yeah, there’s so many ideas. I mean, it’s a 20 something page paper and there’s so many ideas, but you wrap this up in this one term, and for someone who has downloaded some or most of these ideas, that one term can carry all the weight and then you can build on that. You can take all that as a given and then build forward from there.

00:27:19 - Speaker 1: Yeah. One thing I wish we had done more on is I think trying to get a bit more into the economic implications of it.

I guess that would have made the essay another 5 pages longer, and so at some point we just had to stop. But I feel like it’s quite an important aspect, like what we talked about earlier, of not having to worry about back ends, or even just not having to worry generally about the distributed systems problems of: you make a request to a server, the request times out, you have no idea whether the server got the request or not. Like, do you retry it? If so, how do you make the retry idempotent so that it’s safe to retry, and so on. All of those problems just go away if you’re using a general purpose syncing infrastructure that somebody else has written for you. And there are other implications as well that are less clear, like what about the business model of software as a service? Because a lot of companies’ business model right now is basically: pay us, otherwise you’re going to get locked out of your data. So it’s using this idea of holding data hostage almost as the reason why you should pay for a product. And you know, it’s like that with Slack. Like, you put all of your messages in Slack; those messages were written by you and your colleagues. There’s nothing really Slack did to own those, they just facilitated the exchange of those messages between you and your colleagues. But then once you go over the, whatever it is, 10,000 messages limit, then suddenly you have to pay Slack to see the messages that you wrote yourself. And generally that’s the business model with a lot of software as a service. And with local first, it’s not clear that that business model will still work. But of course, software developers still have to be paid for their time somehow. So how do we find a way of building sustainable software businesses for collaboration software, but without holding data hostage? I think that’s a really deep and interesting question.

00:29:07 - Speaker 3: Yeah, I think as an aside that uh you might call it the political economy of software is understudied and underconsidered, and I would put in here like the economics of software business, but also the interaction with things like regulation and governments and the huge amount of path dependence that’s involved.

I think that’s just a huge deal.

I think we’re starting to realize it, but yeah, there’s a ton of stuff we could do and think about just for local first. Like, just one kind of branch that I hope we get to explore is, we mentioned how local first enables you to do totally different topologies. So with cloud software, almost by definition, you have this hub and spoke model where everything goes through the central server and the central corporation. Well, with local first, you can very easily, for example, have a federated system where you dial into one of many synchronization nodes, and you could even have more like a mesh system where you request and incentivize packets to be forwarded through a mesh to their destination, sort of like TCP/IP networking but for the application layer. And it may be, you know, it’s still kind of TBD, but it may be that a mesh or a distributed approach has totally different political implications from a centralized model, and that might become important. So I just think there’s a lot to think about and do here.

00:30:15 - Speaker 1: Yeah, I think so too.

And like, I would take email as an analogy maybe, which is a federated system just like what you described, like you send your email to your local SMTP server and it forwards it to the recipient’s SMTP server and the system works really well.

Certainly it has its criticisms, like spam filtering is difficult to do in a really decentralized way.

Maybe spam is not a problem that local first software will have as much because it’s intended more like for collaboration between people who know each other rather than as a way of contacting people you don’t know yet.

But certainly like I think taking some inspiration from that federation and seeing how that can be applied to other domains, I think would be very interesting.

00:30:55 - Speaker 3: Yeah, and this brings us to the topic of industrialization and commercialization, and I feel like there’s more promise than ever around local first and more people are excited about it, but I still feel like we’re just in the beginning phases of getting the ball rolling on industrial and commercial applications. And if I’m being really honest, I feel like it might have been slower than I had initially hoped over the past few years, so I’m just curious if Adam and Martin, you would reflect on that.

00:31:21 - Speaker 2: It’s always hard to say, right? The thing with any technology, but certainly in my career in computing this has always proven to be the case, is that something seems right around the corner and it stays so for 20 years. I don’t know, maybe VR is in that category, but then there’ll be a moment, suddenly it’ll just be everywhere, broadband internet or something like that.

So as people who are both trying to advance the state of the art but also making business decisions, you know, should I start a company? Should I invest in a company? Should I keep working on the company I’m working on, based on what technologies exist or where you see things going? Yeah, you’re always trying to make accurate predictions. So yeah, I agree, on one hand it felt very close to me on the basis of the prototypes we’d built, the Automerge library, the reference implementation Martin wrote; I’ll link that in the notes here. But basically that’s a JavaScript implementation of something called CRDTs, which, just I guess as a sidebar, it could be easy to think that CRDTs and local first software are kind of one and the same because they are often mentioned together, and in fact our paper talks about them together, but CRDTs are a technology we find incredibly promising for helping to deliver local first software, whereas local first is a set of principles. It doesn’t require any particular technological solution.

Yeah, based on the strength of those prototypes, many of which worked really well, there’s the networking side of it and whether you can have that be fully kind of decentralized versus needing more of a central coordination server. But once you get past that hump, it does work really, really well.

But I think that comes back to the point you both made there about the economic model side of things, which is we have a whole software industry that’s built around people will pay for software when there’s a service connected to it.

Right, so SaaS, in particular B2B SaaS, is just a fantastic business to be in, and as a result, we’ve seen a huge explosion of software around that. But connected to that is, for example, the freemium model, exactly like what you mentioned with Slack; Google Docs is one of those, Notion is one of those. They do this kind of free for individuals, but then you pay when you’re a business, and then you need to come up with the features, the kinds of features that seem to be selecting for you being a business with more serious needs, and something like retaining your message history is one of those.

I wrote a whole other shorter essay about paying for software, I’ll link that in the notes, but I think we got into a weird corner. The industry kind of painted itself into a corner, because of things like Google giving you so much incredibly high quality software, Gmail, Google Docs, Google Maps, etc., for quote unquote free, but then how you’re really paying for it is with your attention and your data, right? And that being kind of monetizable through being able to essentially serve you ads.

And I think that’s fine and I’m very glad for Google’s existence and that they found that model, but it almost feels like it then taught people that good software should be free and that you shouldn’t pay for it. Maybe that’s a little bit connected to the concept that software R&D basically costs nothing once it exists: you make this big upfront investment and then the software exists and you can duplicate it endlessly. But I think there’s a lot of things flawed about all of that. The end place that gets you to is: OK, if someone has my data and I’m paying them to maintain it and run the servers that it’s on, I can stomach that. OK, now I’ll pay $5 a month, $10 a month for my Dropbox account or something like that. But other than that, we’ve become accustomed to, oh, if it’s an app on the App Store, and the App Store is a good example of these kinds of consumer economics, we just expect it to be vastly lower cost or free. And good software costs money to make, and as we kind of talked about earlier, I would rather be building the software, not maintaining the infrastructure. But when you set it up so that the only way you can make money is to build software that has infrastructure, you’re actually incentivized to build that back end as soon as you can and get the user’s data in there, and not necessarily hold it hostage, but just take ownership of it, because that’s what people will pay for. They won’t pay for software where they own the data themselves.

00:35:34 - Speaker 1: Yes, one thing that a friend has suggested is that when talking about the business model of local first software, we should just call it SaaS. Like, label it as SaaS, market it in exactly the same way as SaaS. Don’t even tell people that it’s local first software, and just use the fact that it’s a lot cheaper and easier to implement local first software for your own benefit in order to build the software more cheaply. But don’t actually market the local first aspect.

And I thought that’s quite an interesting idea because, you know, it is a model that people are accustomed to, and to be honest, I think the amount of piracy that you would get from people ripping out the syncing infrastructure and pointing it at something else and then continuing to use the app without paying for it is probably pretty limited. So you probably only need to put in a very modest hurdle there of saying, OK, this is the point at which you pay.

Regardless of whether that point for payment is, you know, necessarily enforced in the infrastructure; it might just be an if statement in your client-side app, and maybe that’s fine.

00:36:38 - Speaker 2: Muse is basically an example of that. We have this membership model where, you know, subscription is your only option, and there are a lot of folks that complain about that or take issue with it, and I think there are many valid complaints you can make, but I think in many cases it is just a matter of what folks are accustomed to.

We want to be building and delivering great software that improves and changes over time and maps to the changing world around it, and that’s something where, as long as you’re getting value, you pay for it, and if you’re not getting value anymore, you don’t have to pay anymore. And then a model like that basically works best for everyone, we think; again, not everyone agrees.

But then again, you do get this pushback of we are running a small service, but it’s not super critical to the application, but maybe that would be a good moment to speak briefly about the explorations we’re doing on the local first sync side, Mark.

00:37:32 - Speaker 3: Yeah, so right now Muse is basically a local-only app. Like, it’s a traditional desktop app where files are just saved to the local device and that’s about it, and you can manually move bundles across devices, but otherwise it just runs locally. And the idea is to extend Muse, first with syncing across your devices and then eventually collaboration across users, using a local first approach.

Now, we don’t plan to do, at least initially, the kind of fully distributed mesh networking peer-to-peer thing. It will be a sync service provided by Muse and kind of baked into the app, but it will have all those nice local first properties: it works offline, it’s very fast, all the different nodes are first class and so forth, while eventually supporting syncing and collaboration.

So, yeah, we’re going through this journey of, we had a lot of experience with basic prototypes in a lab, but there’s a big jump to have a commercialized and industrialized product, not just in terms of charging for it and the business model and stuff, but in terms of the performance, and all the weird things that you deal with in the real world, like versioning and schemas and the idiosyncrasies of networking, and all the things that go around the core functionality. Like, one thing we’re thinking a lot about is visibility into the sync status and how it’s different in a local first world. Yeah, so I’m excited that we are now investing a lot in bringing local first into the real world with Muse.

00:38:57 - Speaker 1: Yeah, and I feel like more generally, if we want the local first ideas to be adopted, we need to make it easy, in a way that people can just take an open source library off the shelf, not have to think too much about it, plug it into their app, have a server that’s ready to go, either already hosted for them or one they can spin up themselves, and make that path super easy and straightforward. And that’s kind of where my research is focusing: trying to get the technologies to that point.

So right now, we have some basic implementations of this stuff. So automerge is a library that does this kind of data synchronization. It includes a JSON like data model that you can use to store the state of your application. It has a sort of basic network protocol that can be used to sync up two nodes. But there’s so much more work to be done on making the performance really good, like at the moment it’s definitely not very good. We’re making progress with that, but it’s still a long way to go.
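
For a sense of what that JSON-like data model looks like from application code, here is a small sketch using the Automerge JS API of roughly this era, where nested maps, lists, and special Text and Counter types are the building blocks. The document shape is invented for illustration, and details such as the exact serialization type differ between Automerge versions.

```typescript
import * as Automerge from 'automerge'

// The application state is an ordinary-looking JSON tree: maps, lists,
// plus CRDT-aware types like Text (collaborative strings) and Counter.
let doc = Automerge.from({
  title: new Automerge.Text('Untitled board'),
  openTasks: new Automerge.Counter(0),
  cards: [] as { title: string; done: boolean }[],
})

doc = Automerge.change(doc, 'Add a card', d => {
  d.cards.push({ title: 'Draft outline', done: false })
  d.openTasks.increment()
  d.title.insertAt(0, '*')   // character-level edits merge cleanly
})

// The whole document serializes to a compact blob that any generic storage
// or sync service can hold without understanding the schema.
const saved = Automerge.save(doc)
const restored = Automerge.load(saved)
```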

Making the data sync protocol efficient over all sorts of different types of network links in different scenarios, making it work well if you have large numbers of files, for example, not just a single file, and so on. And so there’s a ton of work still to be done there on the technical side, I think, before this is really in a state where people can just pick up the open source library and run with it. Part of it is also just getting the APIs right, making sure it has support across all the platforms. Just having a JavaScript implementation is fine for initial prototypes, but obviously iOS apps are written in Swift and Android apps will be written in Kotlin or whatever people use, and so you need to have support across all of the commonly used platforms, and we’re gradually getting there, but it’s a ton of work.

00:40:43 - Speaker 2: And conceptually, seeing how Automerge is evolving,

and how people are trying to use it, sometimes very successfully, sometimes less so, I see this as a case of technology transfer, which is an area I’m incredibly interested in, because I think it’s kind of a big unsolved problem in HCI research, computer science, honestly maybe all research, but I’ll stick to my lane in terms of what I know. Which is: there is often this very excellent cutting edge research that does sit in the labs,

so to speak, and never graduates, or it’s very hard, or there isn’t a good path often, for it to jump over that hump into what’s needed in the production world. And of course in the research world you’re trying to do something new and different and push the boundaries of what was possible before, and on the production, commercial side you want to choose boring technologies and do things that are really reliable and known and stable, and between those two there’s often a gap that’s hard to bridge.

Sitting in your seat as, again, someone who’s enmeshed in the academic world right now, and you’re creating this library, you know, it started as, call it a proof of concept for lack of a better term, and then you have customers, if that’s the right way to put it; as an academic you sort of shouldn’t have customers, but you sort of do, because people want to use this library, and in fact are, for their startups and things like that.

How do you see that transition happening or is there a good role model you’ve seen elsewhere or just kind of figure it out as you go?

00:42:16 - Speaker 1: Well, I think where we’re trying to go with this is it’s great for automerge to have users. I don’t think of them as customers. I don’t care about getting any money from them.

But I do care about getting bug reports from them, and experience reports of how they’re getting on with the APIs, and reports of performance problems and so on.

And those things are all tremendously valuable because they actually feed back into the research process and so I’m essentially using the open source users and the contributors as a source of research problems.

So with my research hat on, this is great because I have essentially here right in front of me a goldmine of interesting research problems to work on.

I just take like the top issue that people are complaining about on GitHub, have a think about how we might solve that. And often there’s enough of a nugget of research problem in there that when we solve the problem, we can write a paper about it.

It can be an academic contribution as well as moving the open source ecosystem gradually towards a point where we’ve ironed out all of those major technical problems and hopefully made something that is more usable in production.

So, I actually feel those worlds are pretty compatible at the moment. There are some things which are a bit harder to make compatible, like sort of the basic work of porting stuff to new languages and new platforms; that’s necessary for real life software engineering, but there’s no interesting research to be done there, to be honest. But so far I’ve found that quite a lot of the problems that we have run into actually do have interesting research that needs to be done in order to solve them. And as such, I think they’re quite well compatible at the moment.

00:43:56 - Speaker 2: And I like imagining, or the mental picture of, someone submits a bug report and one year later you come back and say, here’s the fix, and also the paper we published about it.

00:44:08 - Speaker 1: I’ve literally had cases where somebody turns up on Slack and says, I found this problem here. What about it? And I said, oh yeah, I wrote a paper about it. And the paper has a potential algorithm for fixing it, but I haven’t implemented it yet, sorry.

And they go like, WTF, you put all of this thought into it, you’ve written the paper, and you haven’t implemented it? And I go, well, actually, sorry, for me that’s easier, because if I want to implement it, I have to put in all of the thought and convince myself that it’s correct. And then I also have to implement it, and then I also have to write all the tests for it, and then I have to make sure that it doesn’t break other features and it doesn’t break the APIs, and I need to come up with good APIs for it, and so on.

So for me, actually, implementing it is a lot more work than just doing the research, in a sense. But actually, doing the research and implementing it can be a really useful part of making sure that we’ve understood it properly from a research point of view, so that at the end, what we write in the paper ends up being correct.

In this particular case, actually, it turned out that the algorithm I had written down in the paper was wrong because I just haven’t thought about it deeply enough and a student in India emailed me to say, hey, there’s a bug in your algorithm, and I said, yeah, you’re right, there’s a bug in our algorithm. We better fix it.

And so probably through implementing it maybe I would have found the bug, maybe not, but I think this just shows that it is hard getting this stuff right. But the engagement with the open source community I have found to be a very valuable way of both working towards a good product and also doing interesting research.

00:45:39 - Speaker 3: I think it’s also useful to think of this in terms of the research and development frame.

So research is coming up with the core insights, the basic ideas, those universal truths to unlock new potential in the world, and it’s my opinion that with local first, there’s a huge amount of development that is needed, and that’s a lot of what we’re doing with Muse. So an analogy I might use is like a car and an internal combustion engine. If you came up with the idea of an internal combustion engine, that’s amazing. It’s pretty obvious that that should be world changing. You can spin this shaft at 5000 RPM with 300 horsepower, you know, it’s amazing, but you’re really not there yet. Like, you need to invent suspension and transmission and cooling, and it’s kind of not obvious how much work that’s gonna be until you go to actually build the car and run it at 100 miles an hour.

So I think there’s a lot of work that is still yet to be done on that front. And eventually, that kind of does boil down or emit research ideas and bug reports and things like that, but there’s also kind of its own whole thing, and there’s a lot to do there.

There’s also, continuing the analogy: I think once the research and the initial, most obvious development examples get far enough along, you should have some unanticipated applications of the original technology. So this should be someone saying, like, what if we made an airplane with an internal combustion engine, right? I don’t think we’ve quite seen that with local first, but I think we will once it’s more accessible, cause right now to use local first, you gotta be basically a world expert on local first stuff to even have a shot. But once it’s packaged enough and people see enough examples in real life, they should be able to more easily come up with their own new wild stuff.

00:47:11 - Speaker 1: Yeah, we have seen some interesting examples of people using our software in unexpected ways.

One that I quite like is the Washington Post, as in the newspaper, everyone knows. They have an internal system for allowing several editors to update the layout of the home page. So the placement of which article goes where, with which headline, with which image, in which font size, in which column. All of that is set manually, of course, by editors.

And they adopted automerge as a way of building the collaboration tool that helps them manage this homepage. Now, this is not really a case that needs local first, particularly because it’s not like one editor is going to spend a huge amount of time editing offline and then sharing their edits to the homepage.

But what they did want is a process whereby multiple editors can each be responsible for a section of the homepage and they can propose changes to their section, and then hand those changes over to somebody else who’s going to review them and maybe approve them or maybe decline them. And so what they need essentially is this process of version control, Git-style version control almost, but for the structure representing the homepage.

And they want the ability for several people to update that independently. And that’s not because people are working offline, but because people are using essentially branches using the Git metaphor.

So different editors will be working on their own local branch until they’ve got it right, and then they’ll hit a button where they say, OK, send this to another editor for approval. And that I found really interesting. It’s sort of using the same basic technologies that we’ve developed with CRDTs, tracking the changes to these data structures, being able to automatically merge changes made by different users, but applying it in this interesting, unexpected context. And I hope, as these tools mature, we will expand the set of applications for which they can be sensibly used, and in that expansion, we will then also see more interesting unexpected applications where people start doing things that we haven’t anticipated.
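
Here is a sketch of that branch-and-review pattern. It is not the Washington Post’s actual code, just the shape of the workflow Martin describes, expressed with Automerge’s clone, change, and merge functions; the document fields and the approval flag are invented for illustration.

```typescript
import * as Automerge from 'automerge'

type Homepage = { slots: { headline: string; column: number }[] }

let main = Automerge.from<Homepage>({
  slots: [{ headline: 'Morning edition', column: 1 }],
})

// An editor takes a "branch": an independent copy they can edit freely.
let draft = Automerge.clone(main)
draft = Automerge.change(draft, 'Propose new lead story', doc => {
  doc.slots[0].headline = 'Breaking: budget deal reached'
})

// Meanwhile, the main document keeps moving; another editor adds a slot.
main = Automerge.change(main, 'Add weather slot', doc => {
  doc.slots.push({ headline: 'Weather outlook', column: 2 })
})

// "Approving" the proposal is just merging the branch back in; declining it
// means never merging. Both edits survive because they touch different
// parts of the document.
const approved = true // stand-in for a human review decision
if (approved) {
  main = Automerge.merge(main, draft)
}
```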

00:49:19 - Speaker 2: Maybe this reflects my commercial world bias or maybe I’m just a simple man, but I like to see something working more than I like to read a proof that it works.

And both are extremely important, right? So the engineering approach to seeing if something works is you write a script, you know, fuzz testing, right, you try a million different permutations and if it all seemed to work, kind of the Monte Carlo simulation test of something, and it seems to work in all the cases you can find, so it seems like it’s working.

And then there’s, I think, the more proof style, in the sense of mathematical proof: here is an airtight, logical, deductive reasoning case or mathematical case that shows that it works in all scenarios. It’s not a Monte Carlo calculation of the area under the curve, it’s calculus to determine precisely, to infinite resolution, the area under the curve.

And I think they both have their place kind of to Mark’s point, you need to both kind of conceptually come up with the combustion engine and then build one, and then all the things that are gonna go with that.

And I think we all have our contributions to make.

I think I probably, much as I like the research world, at some point when there’s an idea that truly excites me enough, and local first broadly and CRDTs are in this category, I wanna see it, I want to try it, I want to see how it feels.

In fact, that was our first project together, Martin: we did this sort of Trello clone, essentially, that was local first software and could basically merge together the changes if two people worked offline, and it had a little bit of a version history. I did a little demo video, I’ll link that in the show notes.

But for me it was really exciting to see that working, and I think maybe your reaction was a bit of a, like, well, of course, you know, we have 5 years of research, look at all these papers that prove that it would work. But I want to see it working, and moreover feel what it will be like, because I had this hunch that it would feel really great for the end user to have that agency. But seeing it, slash experiencing it, for me that drives it home and creates internal motivation far more than the thought experiment, if that’s the right word, the conceptual realm work, even though I know that there’s no way we could have built that prototype without all that deep thinking and hard work that went into the science that led up to it.

00:51:42 - Speaker 1: Yeah, and it’s totally amazing to see something like working for the first time and it’s very hard to anticipate how something is going to feel, as you said, like you can sort of rationalize about its pros and cons and things like that, but that’s still not quite the same thing as the actual firsthand experience of really using the software.

00:52:01 - Speaker 2: All right, so local-first, the paper and the concept. I think we're pretty happy with the impact it has made and how it's changed a lot of industry discussion, and furthermore, while the technology maybe is not as far along as we'd like, it has come a long way, and we're starting to see it make its way into more real-world applications, including Muse in the very near future. But looking forward, for either that general movement or the technology: what do you both hope to see in the coming, say, next two years, or even further out?

00:52:38 - Speaker 3: Well, the basic thing that I’d like to see is this development of the core idea and see it successfully applied in commercial and industrial settings. Like I said, I think there’s a lot of work to do there and some people have started, but I’d like to see that really land. And then assuming we’re able to get the basic development landed, a particular direction I’m really excited about is non-centralized topologies. I just think that’s going to become very important and it’s a unique potential of local first software. So things like federated syncing services, mesh topologies, end to end encryption, generalized sync services like we talked about, really excited to see those get developed and explored.

00:53:18 - Speaker 1: Yeah, those are all exciting topics. For me, one thing that I don't really have a good answer to, but which seems very interesting, is: what does the relationship between apps and the operating system look like in the future? Because right now, we're still essentially using the same 1970s Unix abstraction: we have a hierarchical file system, a file is a sequence of bytes, and that's it. A file has a name, and the content has no further structure other than being a sequence of bytes.

But if you want to allow several users to edit a file at the same time and then merge those changes together again, you need more knowledge about the structure of what's inside the file; you can't do that with an opaque sequence of bytes. And I think CRDTs essentially provide a sort of general-purpose, higher-level file format that apps can use to express and represent the data they want to have, just like JSON and XML are general-purpose data representations. CRDTs further refine this by capturing not just the current state, but also all the changes that were made to that state, and thereby they much better encapsulate what the user's intent was when they made a certain change. Capturing those intents through the operations users perform is what then allows different users' changes to be merged in a sensible way.
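
To make that change-capturing model a bit more concrete, here's a small sketch loosely based on the Automerge library, a CRDT library Martin works on; the exact import path and function signatures differ between versions, so treat the specific calls here as assumptions rather than a definitive API.

```typescript
import * as Automerge from "@automerge/automerge";

// The document has structure the library understands (maps, lists, text),
// rather than being an opaque sequence of bytes.
type Card = { title: string; done: boolean };
type Board = { cards: Card[] };

let ours = Automerge.from<Board>({ cards: [] });

// Each edit is recorded as a change with a description, capturing the
// user's intent rather than just overwriting the whole file.
ours = Automerge.change(ours, "add a card", (doc) => {
  doc.cards.push({ title: "write up local-first notes", done: false });
});

// A second replica edits concurrently with the first...
let theirs = Automerge.clone(ours);
theirs = Automerge.change(theirs, "mark card done", (doc) => {
  doc.cards[0].done = true;
});
ours = Automerge.change(ours, "retitle card", (doc) => {
  doc.cards[0].title = "publish local-first notes";
});

// ...and because the history of operations is kept, both users' intents
// survive the merge instead of one state clobbering the other.
const merged = Automerge.merge(ours, theirs);
console.log(merged.cards[0]);
// expected: { title: "publish local-first notes", done: true }
```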

And I feel like this idea really changes the abstractions that operating systems should provide, because maybe OSes should not just be providing this model of files as a sequence of bytes, but this higher-level, CRDT-like model. And how does that impact the entire way software is developed?

I think there’s a potential for just rethinking a lot of the stack that has built up a huge amount of craft over the past decades. And potential to like really simplify and make things more powerful at the same time.

00:55:17 - Speaker 2: Yeah, a local-first file system to me is kind of the end state, and maybe that's not quite a file system in the sense of how we think about it today, but a persistence layer that has these concepts baked into it, and I think it also reflects changing user expectations.

People want Google Docs and Notion and Figma. They expect that their email and calendar will seamlessly sync across all their devices, and then you have other collaborators in the mix. So your files go from being these pretty static things on disk, where you press Command-S or Control-S and every once in a while it does a binary dump of your work that you can load later, to instead being a continuous stream of changes coming from a lot of different sources.

They come from, I’ve got my phone and my computer and Mark’s got his tablet and his phone, and Martin, you’ve got your computer and we’re all contributing to a document and those changes are all streaming together and need to be coalesced and made sense of.

And I think that’s the place where, for example, Dropbox, much as I love it, or iCloud, which I think in a lot of ways is a really good direction, but both of those are essentially dead ends, because they just take the classic static binary file and put it on the network, which is good, but it only takes you so far, cause again, people want Google Docs, that’s just the end of it.

And that means every single company that's going to build an application of this sort has to build the kind of infrastructure necessary to do it. We've seen this where, I think, Figma is the most dramatic example: they just took Sketch and ported it to a kind of real-time, collaborative, web-first environment, and the word just there is carrying a lot of weight, because in fact it was a huge engineering project and they needed to raise a bunch of venture capital. But once they had it, it was so incredibly valuable to have that collaboration, and then of course they built far beyond the initial let's-just-do-Sketch-on-the-web. Any company that wants to do something like that, and increasingly that's now table stakes from a user-expectation standpoint, people want to be able to do that, has to do the same thing: you've got to drop tens of millions of dollars on big teams to do it at all, let alone keep it running over time and scale it up. That seems strange when I think many, or at least most, productivity applications want something similar. So imagine if that were built into the operating system the same way a file system is, or, you know, we proposed this idea of a Firebase-style thing for local-first and CRDTs, which could be more developer infrastructure, and maybe that's also what you were speaking about earlier with the idea that AWS could run a generic sync service. I don't know exactly what the interface looks like, whether it's more of a developer thing or an end-user thing, but basically every single application needs this, and the fact that it's a huge endeavor that costs so much money and requires really top talent to build at all, let alone keep running and scale up, just seems like a mismatch with what the world needs and wants right now.
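
Nobody has settled what such a generic sync service would look like, but here's a purely hypothetical sketch of the shape the developer-facing piece might take; every name in it is invented for illustration and doesn't correspond to any real product or library.

```typescript
// Hypothetical "Firebase-for-CRDTs" style sync service. The service relays
// and stores opaque CRDT changes; it never needs to interpret the app's data,
// which is also what would make end-to-end encryption possible.
interface SyncService {
  open(docId: string): Promise<SyncedDoc>;
}

interface SyncedDoc {
  // The latest locally merged state, always available offline.
  snapshot(): Uint8Array;
  // Send locally generated CRDT changes to be relayed to other devices.
  push(changes: Uint8Array[]): Promise<void>;
  // Receive changes made on other devices and by other collaborators.
  subscribe(onChanges: (changes: Uint8Array[]) => void): () => void;
}

// Usage sketch: the app owns the CRDT document; the service is a dumb pipe.
async function connect(service: SyncService, docId: string) {
  const doc = await service.open(docId);
  const unsubscribe = doc.subscribe((incoming) => {
    // apply the incoming changes to the local CRDT document here
  });
  return { doc, unsubscribe };
}
```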

00:58:25 - Speaker 3: Yeah, and now I want to riff on this, Adam, because the strongest version of that vision is not only are all these apps using a local-first file system, they're all using the same one, in the same way that now, for our legacy apps, all your files from different applications are written to the same disk in the same way.

And furthermore, any application can access and read and write any other application's data, so you sort of disconnect the data from the application, and you can layer applications on top of each other.

And then this gets to the final thing here, which is that one of those programs could be a program that users write themselves.

So you sort of get end-user programming against real-time synced and collaborative data.

And not only is that cool because end-user programming is interesting, but programming against data doesn't really work when the data is halfway around the world; just physically, if you need to navigate the data or follow links, it's too slow. You need all the data locally, which is indeed the whole promise of what we're talking about here.

00:59:23 - Speaker 2: Well, not to mention maybe the auth token dance and whatever, and you've got to register your application. It just comes back to this: yeah, end-user programming is 100% about agency, which, as we said at the start, is kind of at the core of local-first. And yeah, it's gotten increasingly harder to program your own stuff for a bunch of reasons, but one is that the data is way over there, in the care of this company, and they give you their one front end to it.

If you’re very lucky, they’ll build an API and if you’re even luckier, they’ll.

You allocate an API token as an individual, not a company, to just write a little script to do something, whereas I did a lot more automation in my personal life back when so much was just a Unix shell and a file system on my local computer and a world where you can write, not so much scripts, but I think of them more as bots.

I think we even prototyped this at the very tail end of that Trello clone project: as we said, now that you've got this stream of changes that you're consuming from different places, a bot could be just one more of those.

And if you want to do something like automatically moving a card to a new location when something is triggered, that should be straightforward to do.
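
As a purely hypothetical sketch of what such a bot could look like, consuming the same stream of changes as any human collaborator, here's a short TypeScript example; the document shape and the watchDocument and applyChange helpers are invented for illustration and aren't from any real library.

```typescript
// A bot is just one more participant producing and consuming changes.
type Card = { id: string; title: string; column: string };
type Board = { cards: Card[] };

// watchDocument: calls back whenever anyone's changes arrive and merge.
// applyChange: records the bot's own edit as another change in the stream.
function runArchiveBot(
  watchDocument: (onUpdate: (board: Board) => void) => void,
  applyChange: (description: string, mutate: (board: Board) => void) => void,
) {
  watchDocument((board) => {
    for (const card of board.cards) {
      // Trigger: a card whose title is tagged "[done]" gets moved automatically.
      if (card.title.includes("[done]") && card.column !== "Archive") {
        applyChange(`archive card ${card.id}`, (doc) => {
          const target = doc.cards.find((c) => c.id === card.id);
          if (target) target.column = "Archive";
        });
      }
    }
  });
}
```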

In some world like that, where you have these streams of events, streams of data being coalesced, that includes not just the devices of all the people, but also the individual programs that you may choose to write, which contribute to this whole evolving document. That's a very exciting future for me.

01:00:55 - Speaker 1: Yes, and if we can get to the point where it's so easy to write collaborative software that software is collaborative by default, and so easy to have this streaming integration with bots that you just do it by default, then we're in a situation where this can actually be used in practical reality.

01:01:15 - Speaker 2: Well, I think we should wrap it there. Thanks everyone for listening.

If you have feedback, write to us on Twitter @museapphq, or by email at hello@museapp.com.

You can help us out with a review on Apple Podcasts, and Martin, I’m so glad that you’re pushing this vision of the world forward, even though we’re not working together as directly right at the moment.

We hope that our efforts over here to prove out local-first in a commercial context, showing both that it can be viable for a small team and that it produces a great user experience, are, at least for now, our contribution, and you're continuing to push the state of the art in the science world.

Hopefully we can, together and along with all the other folks who are doing great work in this field, reconvene maybe in two years and have some good news to report.

01:02:04 - Speaker 1: Definitely. Thanks for having me.


Metamuse is a podcast about tools for thought, product design & how to have good ideas.

Hosted by Mark McGranaghan and Adam Wiggins