← All episodes
Metamuse Episode 70 — December 08, 2022

Launchers with Thomas Paul Mann

A command line and a GUI are two completely different ways to operate a computer—but quick launchers and command palettes have found a way to bring them together. Thomas is building Raycast, an extensible quick launcher for macOS. He joins Mark and Adam to discuss the evolution of launchers from Quicksilver to Spotlight to the Chrome address bar; reasons to embed web technologies into a native app; and how voice interfaces like Siri and Alexa fit into this story.

Episode notes

Transcript

00:00:00 - Speaker 1: We can do the basics that Spotlight can do, but also much better. We invested a lot in the speed to make it faster to launch. We invested in file search to search files in a more predictable way. And then when you have those basics, then there’s the question, what else can you bring to this so you can start navigating and controlling your computer in a new way.

00:00:26 - Speaker 2: Hello and welcome to Meta Muse. Muse is a tool for deep work on iPad and Mac, but this podcast isn’t about Muse the product, it’s about the small team and the big ideas behind it. I’m Adam Wiggins here with my colleague Mark McGranaghan. Hey, Adam. Joined today by Thomas Paul Mann of Raycast.

00:00:43 - Speaker 1: Hey there, happy to be here.

00:00:46 - Speaker 2: And Thomas, I understand you have some travel coming up for you and your team.

00:00:50 - Speaker 1: Yeah, that’s correct. So yeah, Raycast is a fully distributed company, but once a year we get together with the whole team, and it’s gonna happen soon. So next week, we’re all gonna go to Greece and have a good time there. And we really enjoy it. It’s the second time we’re doing it. The first one we did was a huge success. It was especially the moment when the pandemic came a little bit to an end as well, so it was really good for everybody getting there. It just makes a huge difference as a remote company to see each other in person.

00:01:19 - Speaker 2: Yeah, that’s been sort of a secret weapon for us, or maybe not so secret, which is that those in-person summits fulfill quite a lot of what you get out of being in an office together, and it gets coupled with getting to go to nice destinations and so forth.

00:01:33 - Speaker 1: It’s also cool because last year we had a few people joining us before they actually worked at Raycast, which was also the perfect onboarding for those kinds of people, because, yeah, in a remote company you usually don’t see everybody in person, but it made a huge difference for them.

00:01:50 - Speaker 2: And tell us a little about Raycast.

00:01:52 - Speaker 1: Sure, yeah. So for the ones who don’t know about Raycast, we often describe it as a general productivity tool, mostly targeted towards developers, but also designers and other people who really work on a computer use it.

For Mac users, the easiest way to describe it is actually a Spotlight on steroids. So everybody who works on a Mac knows Spotlight.

The basics are to launch an app, search files, do a few calculations. But with Raycast, we put another level on top of that.

So we’re connecting to third-party apps like GitHub, Linear, Figma, and have a public store that people can build extensions for and other people can experience.

So you can think of it a little bit like an app store. So people can build something, share it with others, others can immediately install it. So it makes your work more productive, faster to do.

It’s all driven by keyboard shortcuts. It came out of an idea from me and my co-founder.

We’re, like, hugely obsessed with productivity, and we were a little bit frustrated that nowadays on a computer, oftentimes there’s a lot of friction in the small and little tasks that pile up. And we thought we could do better and basically built Raycast as this layer on top of all the other apps so that you can use them in a frictionless way.

And so far that seems to be working very well. A lot of people enjoy that.

People are building extensions together with us. We have a huge community behind us that’s helping us build those experiences. And sometimes they range to more fun things, like a GIF search where you can put in a request and get a nice GIF, those kinds of things.

00:03:24 - Speaker 2: And we’d love to hear a little about your background, what brought you to this venture.

00:03:28 - Speaker 1: Yeah. So I’m a software engineer and my career started in mobile development.

So I worked in iOS and Android. For me, the passion there was I could build something that I can immediately experience.

And that’s basically what I’ve enjoyed doing since then: building something that I can experience and share with others.

Before Raycast, I worked at Facebook on a desktop application, also on the Mac, which was called Spark AR.

Which I often described as a Photoshop for augmented reality.

So for the ones who don’t know it, it looks a little bit like Photoshop. You have a viewport in the middle, you can drag in 3D objects, and then you can, for example, attach one to your nose and it sticks to your nose with the augmented reality effects that were there.

What was really interesting there, it was also community driven. So it was a tool to create something and then you can share it with others on Instagram and Facebook, and they can use those effects.

And this community aspect is really something that I fell in love with, because if you build a tool that other people can produce something with, it’s really interesting to see what they’re gonna produce with it.

And so with Raycast early on, what we did there is we wanted to make our own workflows faster, right? So we built that stuff for ourselves. And then after a while, we realized there were so many tools out there that we may have never heard of, so a platform where people can build extensions and share them with others is actually the way to go.

So now we have an API. People who are familiar with React can use the API very seamlessly, and then they can get creative, building those extensions and sharing them with others. So now, for pretty much every service you know of, you can find one of those extensions, install it immediately, and basically gain little productivity boosts throughout the day.

Which then oftentimes cuts away entire friction points of interacting with slower tools, and that’s what brought us initially to Raycast, right? We wanted to make little things faster, which then has this compound effect that you just enjoy your work more, and to this day that is still the mission we operate on every day.

00:05:34 - Speaker 2: And we’ll link the Raycast store in the show notes.

I can certainly see the connection with Spark AR; you know, that’s a creative tool, certainly you’re helping other people create things, and the joy one gets from seeing someone make something with a tool you have created does seem to be a common theme across people that are drawn to building tools as opposed to, sort of, end-user experiences.

I’d be curious to hear a little bit about the technical stack. So it is a native Mac app, but I noticed when I just briefly poked at trying to build an extension for Raycast that the hello world is very much like a React component, feels very web technology-ish. How do you do that? Is it ultimately kind of all a pretty fancy electron app or is it just the extensions are kind of like using web technologies, but you use classic native development for the core app?

00:06:24 - Speaker 1: Yeah. It’s actually a question which we get asked quite often. So the app itself is 100% native. It’s written in Swift and doesn’t involve any HTML or CSS. So everything is rendered through Apple’s AppKit.

Actually, we don’t use SwiftUI yet. So that was an early decision, because we felt like we’re building this app which sits on top of the system and we want to make it really part of the system, with the look and feel, but also what you can integrate it with.

So early on, we thought like, hey, Swift is the way to go.

Also, we worked on iOS and macOS before, so we knew the tech stack really well, which helped us initially to just bootstrap the app really, really quickly.

But then when it comes to building an extension platform, you have a different problem to solve, right? There you actually want to abstract the system away and rather make it accessible to as many developers as possible. And we went on a bit of a journey there to really figure out how we should build those extensions, and especially the API for the extensions.

So initially, we started with more of a version where you basically define a JSON schema that you give the app, and then the app renders what you describe in this JSON file.

But then this brought a lot of issues when you want to build something more complex. Think about networking requests and then dependent networking requests, maybe some optimistic updates to make it snappy.

So what we then saw is like, OK, there are already really good UI frameworks out there, and React is one of the most known ones.

So why not use React to build extensions? What we did, you can almost describe it as a lightweight React Native. So what we do is you literally write React, but instead of rendering HTML, we’re actually rendering Swift components. We’re exposing components like a list and a form, and then you can use those elements to build your extension. And then we just render that with our native engine in the application.

And it has two benefits. One, every developer who knows React can immediately write a Raycast extension without learning anything new. And two, we keep it very consistent across extensions, because we expose these high-level components like a list, and then a list has list items with a leading icon and a title and a subtitle. So all of the extensions look and feel very similar, which was very important to us.
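To make that concrete, here is roughly what a minimal list command looks like on the extension side: a small sketch that assumes the List component from Raycast’s @raycast/api package; the items and their contents are invented for illustration.

```tsx
// Plain React code, but <List> and <List.Item> are rendered by the native
// (AppKit) side of the app rather than as HTML in a web view.
import { List } from "@raycast/api";

export default function Command() {
  return (
    <List searchBarPlaceholder="Search bookmarks...">
      <List.Item icon="🔖" title="Metamuse" subtitle="museapp.com/podcast" />
      <List.Item icon="🔖" title="Raycast Store" subtitle="raycast.com/store" />
    </List>
  );
}
```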

But we also have basically the flexibility of React, where you can write something really, really complex. So you see now extensions like GitLab, which is a good one. It integrates with everything from GitLab and is nowadays quite complex. It involves OAuth and optimistic updates, caching, and makes it really, really fast. So it’s a nice abstraction. And it’s funny, having built those things initially natively and now looking at our extensions API, you actually can oftentimes build those things much faster with the extensions API than what we did initially natively.

00:09:24 - Speaker 2: It’s a pretty clever way to slice it, because for sure, something like a quick launcher of this sort first of all needs to be really fast, and second, absolutely has to be integrated with the operating system in a way that I think would be hard with one of these web technology shims. But on the other hand, extensions are a different story.

Using some variation of web technologies fits naturally there, because so many developers know it, and then maybe there are other benefits as well in terms of, I don’t know, sandboxing or something like that. But yeah, you’re using each technology for the thing it best suits, and then sort of bridging that gap through your system.

00:10:03 - Speaker 1: Yeah, exactly. I think it also is just a nice separation of concerns, right? So you have natively where you can make this pixel perfect UI components, and then you expose a very high level API that extension developers can use.

We’re working at the moment on a file picker, for example. It’s entirely built natively because, well, you need to interact with the operating system to pick files, right? You need to open the Finder and so on. And then on the UI side, or on the extension side, you can just make it a lot easier and just say, I want to pick this file or this directory, and show me hidden files as well if you want to. So you’re abstracting a lot of stuff away that an extension developer just doesn’t need to care about anymore.
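As a sketch of what that split can look like from the extension author’s side, here is a hypothetical shape for such a high-level file picker API; the prop names below are made up for illustration and are not necessarily Raycast’s actual API.

```typescript
// Everything behind these props (opening Finder, permissions, hidden files)
// would be handled by the native side; the extension only declares intent.
interface FilePickerProps {
  id: string;
  title: string;
  canChooseDirectories?: boolean;   // allow picking folders, not just files
  allowMultipleSelection?: boolean;
  showHiddenFiles?: boolean;
  onChange?: (paths: string[]) => void;  // selected paths reported back
}
```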

00:10:47 - Speaker 3: Yeah, I was so interested when I saw the extensions angle on Raycast, because it connects to this idea that we’ve been thinking about for years in the lab, and it’s still in the background for us in Muse: end-user programming, extensibility, and so on.

Yeah, and the holy grail that I’ve been after is how you get the very high performance of a low-level language like C or Objective-C with the security of something like a high-level language and the end-user approachability of something like JavaScript and React. And I think fortunately, in your case, it’s constrained enough that the performance, for example, of extensions isn’t as big of a deal, in the sense that it’s not like you’re doing wild computations in the extension itself, right? So that’s kind of a degree of freedom that you have.

But the end game that I’ve long imagined is being able to write extensions that are no compromises, and that can eventually be promoted all the way up into the app and even the system, so that you don’t have the extension world and the app world and the OS world. It’s more like a continuum where you move back and forth according to your degree of certainty and trust. So I’m always interested to find out how people are tackling this problem, because as much as I want that thing, that thing doesn’t exist, you know; it’s an open research problem to determine if it even can be made. So I’m always curious to see how people are tackling it.

00:12:05 - Speaker 1: Yeah, it’s a super tough problem, right? You want to have flexibility, but on the same side, you want to constrain a little bit that it fits still in the system.

So one thing which we did initially, when we built the first extensions, is we just built them natively to figure out essentially what we need to build and to understand the UI and the UX of something. And then we quickly came up with a paradigm: everything you do in Raycast is launching a command, and that command is basically a standalone thing that can operate on its own.

And then this is basically a constraint you’re giving to a developer. Hey, as soon as this thing is launched, you can do what you want to do, but you need to launch it, right? It cannot run just randomly.

That adds certain constraints.

And then when we came to the extension world, that was really nicely applicable, because we can then say, OK, you build commands, they get executed when you launch them, and they run within Raycast. And then we had enough of these primitives like lists and forms that we can expose, that they can use. They’re very high performance and they look and feel like the system. So it blurs this line of what is actually part of Raycast versus what is an extension to it. A lot of people nowadays no longer know that, right? Initially there was what we had, like core extensions, and then some third-party ones, but nowadays it’s just a blurred line, because all of them look and behave very similarly. And one missing ingredient that I haven’t mentioned before is that we also have all of the extensions open source, and we review them. So to submit an extension that goes into the store, you essentially open a pull request with your extension. That helps us keep the UX and the UI and all the behaviors very similar across extensions, because especially for Raycast, which is a tool that is basically about muscle memory at some point, it’s very important that things behave very, very similarly.

And then on the performance aspect of things, one thing which we did is we run Node as our JavaScript runtime. With Node, it’s actually very performant for the little operations we do. You also have the benefit that if performance becomes an issue, you could bring native modules in as well, to get more performance out of it. And React is also, for the size of these extensions, fast enough to produce the UI. And then we are not constrained in rendering the UI, because that again we do natively. And then one thing which I think is very interesting: when you integrate something into your main app, you want to make sure there is a certain boundary between the main app and the extension. So if an extension crashes, the app should stay alive, right? So what we do is we run all of that extension code out of process. It has a separate process, so if there is something corrupt going on in the extension, that doesn’t block the main app; it stays responsive and you can go back.
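A minimal sketch of that out-of-process idea, not Raycast’s actual code: the host forks a separate Node process per command, so a crash in extension code cannot take down the main process. The file names and the message shape are hypothetical.

```typescript
import { fork } from "node:child_process";

function runCommand(entryPoint: string) {
  // The hypothetical "extension-host.js" would load the extension bundle
  // and talk back to the host over Node's built-in IPC channel.
  const child = fork("extension-host.js", [entryPoint]);

  child.on("message", (renderTree) => {
    // The host receives a description of the UI (e.g. a list with items)
    // and renders it with native components.
    console.log("render", renderTree);
  });

  child.on("exit", (code) => {
    if (code !== 0) {
      // The extension crashed or misbehaved; the launcher stays responsive.
      console.warn(`command ${entryPoint} exited with code ${code}`);
    }
  });
}

runCommand("./extensions/github/list-issues.js");
```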

And I think that’s just a good user experience, right? Because we all know as developers, there are gonna be bugs, unpredictable things, network requests fail and maybe you don’t handle it properly. So you want to make sure that the app itself behaves correctly, especially with an app like Raycast, which I use hundreds of times a day. You can’t really afford that this thing is gonna crash when something goes wrong in a third-party developer’s code.

00:15:21 - Speaker 3: Yeah, I feel like we could do a whole podcast on this area. I think we’ve sometimes called it on the show the platform problem: how do you navigate the performance, security, sandboxing, isolation, consistency, developer experience? I also suspect, Adam, that you want to talk about this space more.

00:15:40 - Speaker 2: Indeed, as you were talking there, Thomas, I was flashing back to our platforms episode with Joe Webkin, where we talked about maybe not some of the OS-level stuff you’re referencing there, Mark, but definitely some kind of store slash plug-in directory and a review process, as well as the constraints that are created for the extension developers. You mentioned, for example, these lists, and an icon next to an entry is kind of a standard thing to get back as a result from one of your extensions, and that’s potentially desirable. When Joe talked about building a Slack app, he said, this is really nice, we don’t need to do much design because there are so many constraints; it’s just an icon and some text.

There are only kind of so many ways to do it, and in a way that is nice, because you have fewer decisions to make and you can just focus on getting the thing built.

00:16:27 - Speaker 1: Yeah, I think what’s also interesting is that with constraints comes creativity. I mean, that’s a common phrase that you probably hear a lot around tools that allow people to create something. But it’s actually really true. Yes, we provide just lists, but if you look around, with lists you can actually build a lot of stuff, right? And then with forms, you can do a lot of data input. And then we have things like grids, where you can do a little bit more visual style, like showing images. But you would be surprised what people come up with.

One thing which I remember: we can render Markdown, and we have this detail view where you can render the Markdown, and somebody just came up with playing Snake by just rendering Markdown. It’s obviously a huge constraint if you just have Markdown, but developers are creative, right? And so you can build an entire game with just Markdown rendering.

So it’s always inspiring, and that’s what I mentioned initially with communities. You have certain ideas of what you can build with it, right? When you design an API, you think, oh, there are just certain use cases, and you maybe prototype, but the minute you put it out, people interpret it differently and come up with something new. And that’s super exciting, to wait and see other use cases that you haven’t thought of before. I think that’s always the interesting bit with pretty much every platform that is built out there. It was like that initially with the iOS App Store as well. There were fun apps initially, and then people figured out what good apps are. And then it shaped this whole ecosystem which we nowadays live in. But I bet at the beginning, there wasn’t really a plan for where this would lead; it’s this iterative process. You put it out and see what works, what doesn’t work, and iterate on it. And a lot of people are involved in this process, sometimes without even knowing about it, right, because they’re just building for the platform and coming up with something that pushes the boundaries.

00:18:23 - Speaker 2: So our topic today is launchers and a closely related element, which is command palettes.

Now, for me, launchers is usually the term I use for the category to describe, you mentioned Spotlight earlier there, Thomas, that’s the macOS built-in one. There’s a similar one for iOS and iPadOS: if you swipe down on your home screen on your phone or your tablet, you get a kind of search bar slash launcher thing. There’s quite a long history of this stuff.

One of my first introductions to it was actually the KDE Linux desktop system, and I think somewhere in, I don’t know, 2002-ish, they introduced a feature, I think it’s called KRunner, where essentially you would press a hot key, Alt+F2, and you’d get this little mini command line where you could just run a program or do some very basic things. It was an absolute revelation, because before there was always this trade-off: you’re in the terminal, and the command line’s great for a lot of things, but of course it can’t do many things from the GUI world; or you’re in the GUI world and everything’s about clicking on menus or the occasional hot key.

So that was my introduction to it, but I feel like there’s a pretty rich history of this. You’re probably one of the most knowledgeable people on it, so I’d love to hear you walk through that a little bit.

00:19:38 - Speaker 1: Yeah, I mean, we’re working in this space, right, and have looked into a lot of those things.

And as I say, it has a long history.

I think, actually, I would almost take a step back and like, you touched on the terminal, right? I think that’s kind of where all of this sparked.

I mean, it was the first interaction we had with computers, where you could just interact with them by text input. And I think that is, for launchers and command palettes, the foundation.

It’s still the thing today, right? You navigate this without a mouse, by text input and keyboard shortcuts. I think the true roots come from the terminal, and also, as I mentioned, in Raycast you run commands, which is what you have in a terminal.

And we also have arguments for those commands, so you can give them arguments to do different things.

And when I look back into the history, I mean, after the terminal, the GUI came, right, where we made functionality available with elements you can click, which obviously made it a lot more user friendly. But it also came with a little bit of a downside: you just have limited real estate, right? You have buttons, and you can only place so many buttons on a screen. And I think that at some point you run into the limitations of that, and it becomes clutter to your eyes. One that’s quite known for it is Photoshop, which has just a ton of menus that you lose yourself in, and there are tutorials on how to use it. But I think it just came out of the need of software growing in functionality; you’re adding more and more, but the display stays the same, right? The real estate you have to put those things in stays the same, and at some point it becomes very cluttered.

00:21:14 - Speaker 2: A metaphor I used to use when explaining to people why I used the terminal.

This was, I don’t know, decades ago, once, for example, Windows came along and made GUIs pretty mainstream, and folks would see me using the computer in what they saw as a more archaic way, and I would usually describe it as: OK, well, menus are like going into a restaurant and ordering from a menu where you’re pointing to pictures on the menu, but, exactly, you can’t have a lot of nuance. You can point to this picture or that picture, but that’s kind of it. Whereas if you want to have a more in-depth conversation with the chef about all the fine flavors that are in it and how you’re going to tweak it and that sort of thing, then you need the power of full language. And that’s what a command line is more like to me: having a conversation with the computer.

00:22:03 - Speaker 1: Yeah, I think that describes it actually very nicely because it’s oftentimes even a back and forth, right, where you give the computer one command, it gives you back an answer.

You use this answer to pipe it to a different command and do something with it. And this is just very hard to replicate in a GUI, right? If not impossible.

But then in the GUI, at some point, people realized that there is too much, and they tried to put in a search for that functionality.

And one of the first ones that I remember was on the Mac, the Help menu. So if you click on Help, you have this search field, you can put something in and you search all the menus that you have there, and then you can click that. And that was actually quite nice for using the software. It made it more accessible, you can find those things. But it was a rather hidden feature, right? It was behind the Help button, and usually you don’t like to click Help, right? It feels like you can’t use the software if you need to press the Help button. So it’s really not something you want to click that often.

00:23:02 - Speaker 2: Yeah, I use the macOS Help search as a quick launcher all the time for, yeah, functions I don’t use all that often, maybe like spell check in Sublime Text. The way I invoke it is I click Help and I type S-P-E and then I click, you know, basically the first result. And there’s a similar thing with my video editing software, where there are just so many functions in it. And yeah, for that one there is some key command I could memorize, but I just don’t use it quite enough; but I know what to search for and it’s very quick to type it in, so I just do that.

00:23:34 - Speaker 1: Yeah, and it’s quick enough, right? And the other thing is, what the Help menu struggles with, I think, is that it’s just constrained to one application at a time.

So you couldn’t use it in your video editor to search something completely unrelated to the video editor and do an action outside of it, like launching a link or another app, right? So it wasn’t possible back then.

But then other apps also picked up this behavior. One of the first ones that I know of that had kind of a built-in command palette was, for me, Sublime. I think initially it was just the file search, but it was extremely efficient. You just popped it up with a keyboard shortcut and searched for it. And that’s how I learned navigating around files. After that, the sidebar wasn’t really relevant for me anymore, right? It was really just opening it up, searching for the file, continuing where you want to program, and then opening it again. And then they also added functionality in a similar menu, with a different keyboard shortcut, where you can search the actions that you can do. These were just menu items initially, I think, and then even more functionality which wasn’t available as a menu item. And I think a big difference there was that it was just front and center. You press this one keyboard shortcut, and you know you can do everything with it. So for me, that was the point when I almost stopped using keyboard shortcuts that heavily, because I knew there was a lot of functionality in there that I don’t use that regularly, but I know how to find it and it’s reliable. And that, for me, was one of the first experiences where I felt like, this is a really good user interface. It still has a very clean UI, it’s not distracting, but you have this full power available via the keyboard without touching a mouse or navigating around in the menu, which is quite cumbersome.

00:25:22 - Speaker 3: Yeah, I actually find it helpful to think of all these UI inputs holistically as follows. So imagine you have a huge grid and the rows in the grid are all the operations you can do in your app.

It’s like jump to file, increase text size, indent here, collapse code block, and the columns are the different ways to send inputs to the UI. So you have the menus. You have keyboard shortcuts, you have maybe the command bar, maybe you have Siri, and you have the help menu, and I think the best systems have a few properties.

One is they actually use all those inputs. They’re systematically connected.

The example of the help menu was a good one, where you go to the Help menu and it literally shines a light on the menu where the command is. And likewise, in the best systems, all of the operations are available in as many of the columns as possible, and ideally the user has agency over managing those mappings, so they can change the key binding. And that might be reflected in the menu; you know, the little gray hint that shows what the shortcut key is next to it changes as well.
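A small sketch of that grid idea in code, purely illustrative and not tied to any particular app: every operation is registered once, and the menu, shortcut, and command palette are all generated from the same table.

```typescript
interface Operation {
  id: string;
  title: string;        // shown in the command palette and menus
  menuPath?: string[];  // e.g. ["View", "Increase Text Size"]
  shortcut?: string;    // user-remappable key binding
  run: () => void;
}

const operations: Operation[] = [
  {
    id: "increase-text-size",
    title: "Increase Text Size",
    menuPath: ["View", "Increase Text Size"],
    shortcut: "cmd+plus",
    run: () => console.log("text size +1"),
  },
  {
    id: "jump-to-file",
    title: "Jump to File…",
    shortcut: "cmd+p",
    run: () => console.log("open file switcher"),
  },
];

// The command palette is just a filter over the same registry, so it is
// automatically a complete enumeration of everything the app can do.
function paletteEntries(query: string): Operation[] {
  return operations.filter((op) =>
    op.title.toLowerCase().includes(query.toLowerCase())
  );
}

// Remapping a shortcut updates the single source of truth, so the menu's
// shortcut hint changes along with it.
function remapShortcut(id: string, shortcut: string) {
  const op = operations.find((o) => o.id === id);
  if (op) op.shortcut = shortcut;
}

console.log(paletteEntries("text"));
```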

00:26:32 - Speaker 1: Yeah, I think that’s interesting like to basically expose the functionality in different ways.

What’s interesting about that one is also that you talk to different users, right? I think not everybody wants to use a command palette. On the one end it’s a simple system, but it might also be more for advanced users who really rely on the keyboard all day, while you can expose the core functionality also through a very good GUI, right? It’s still very useful to have those buttons, because they also tell a story about what is an important action you want to do at the moment. It can highlight something, like you have on Zoom calls, the leave button, it’s red, right? So it says to you, hey, there is a button; if you press that one, it’s red, so be careful about that. It teaches the user certain interactions that are very important in the context. But then, as you mentioned, there are probably too many actions that you can take at any given moment, and that’s where the other, more utilitarian things like a menu and a command palette can shine, to give you access to those actions in a more concise way.

00:27:36 - Speaker 2: Yeah, they’re much more discoverable and approachable.

00:27:39 - Speaker 3: Yeah, and in fact, a key benefit of both the menu and the command palette is that they’re a complete enumeration of the options. One of the most annoying things for me about software is when I can’t discover the full set of things that are possible; it’s like hidden and there’s no way to enumerate them. But typically, if you open up a command palette and then don’t type anything, you can just press down arrow a bunch and find out all the cool and obscure stuff the app can do.

00:28:03 - Speaker 1: Yeah, definitely. It’s a good way to explore the functionality of applications, especially like the more hidden ones as you mentioned that are maybe further down.

00:28:11 - Speaker 3: And while we’re talking about the properties of these systems in general, I just want to make two kind of theoretical comments.

One is, we call them different things, like launchers and command bars, and I think there’s a bit of a dichotomy in here. So there’s what I would call launchers, which is like you type an app name and it launches the app. There’s search, which is you type plain text and it finds documents that have that text in them, or just lookup, where you know the name of your document and you type that and it opens the document for you. There are commands, like calc, where you get a calculator. And then there are hybrid systems that do a mix of all of these, and I don’t think any of those are better or worse. I just think it’s useful to understand there’s quite a spectrum, and that often it’s pretty useful to just combine them all into one thing, as Raycast does. The other point I wanted to make, and you knew this was coming, was the importance of speed and performance in these systems, and it’s subtle, because it’s not just that the system responds quickly to input, although they usually do, and that’s often a benefit of these things. It’s that you often don’t need any branches at all. So if I want to increase text size by going through a menu, I have to look at my screen, find the place I want to go, move the mouse there, visually confirm that I’m over the menu, click, confirm that it comes down, move the mouse down, confirm. Each of those visual confirmations is a branch, and anytime you’re round-tripping through your whole sensory system and making a conscious decision to click or not click, you’re kind of already hosed; it’s already hundreds of milliseconds. Whereas, as another example, if I want to open the Sublime app, I just hit command space SUB enter, and I don’t need to look at my keyboard. I don’t need to think. I don’t need to check any branches. I can do that all basically in one string, and a little bit later, the app will pop up. That’s a huge benefit of these systems and of other kinds of keyboard input systems in general.

00:29:52 - Speaker 1: Yeah, I totally agree. Speed is like a fundamental thing to this.

And it’s not only speed, it’s also the predictability, because as you described, you type in SUB for Sublime, hit enter, right? At some point, it becomes just muscle memory, so you don’t really think about it anymore. You know you need to go to Sublime, you do the sequence that you described, command space, SUB, enter, and then you’re there. So the system also needs to be predictable in a way. And that’s also sometimes a challenge, right? Being fast and predictable sometimes conflicts. You can’t do many things in parallel, because then it becomes unpredictable what finishes first, or you need to sequence it somehow. So there are huge technical implications there. And then also, what’s very interesting is that you can optimize as well, because it becomes a very fundamental part of how you navigate your computer. You do a lot of interactions through it, so it becomes smarter as you type in there. So if you type SUB all the time, it recognizes that maybe even earlier; if you just type SU, it already ranks it up, because it knows, well, you’re gonna type Sublime, and it makes sure that you hit that even faster. And that’s also an interesting angle which you can’t really have with the UI you described in the menu, where you need to do the steps yourself, and it’s just a lot slower than what the system is capable of doing.
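A hypothetical sketch of that idea, ranking matches by how often a given query prefix previously led to launching a given item; this is not Raycast’s actual implementation, just an illustration of the mechanism.

```typescript
// prefix -> item -> how many times that prefix ended in launching that item
type LaunchHistory = Map<string, Map<string, number>>;

const history: LaunchHistory = new Map();

function recordLaunch(query: string, item: string) {
  // Remember every prefix of the query that ended in this launch,
  // so "s", "su", "sub" all start pointing at Sublime over time.
  for (let i = 1; i <= query.length; i++) {
    const prefix = query.slice(0, i);
    const counts = history.get(prefix) ?? new Map<string, number>();
    counts.set(item, (counts.get(item) ?? 0) + 1);
    history.set(prefix, counts);
  }
}

function rank(query: string, candidates: string[]): string[] {
  const counts = history.get(query) ?? new Map<string, number>();
  // Sort purely by past usage; a real launcher would combine this with
  // match quality, recency, and so on.
  return [...candidates].sort(
    (a, b) => (counts.get(b) ?? 0) - (counts.get(a) ?? 0)
  );
}

recordLaunch("sub", "Sublime Text");
recordLaunch("sub", "Sublime Text");
console.log(rank("su", ["Subler", "Sublime Text"])); // Sublime Text first
```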

00:31:12 - Speaker 2: Predictability is a huge one for me, and this is a place where, unfortunately, the default system ones fall a little short for me. Spotlight on macOS, for example: I use the file lookup aspect quite often. You talked about launching applications, looking up files, searching, and for me those first two are the most important. But Spotlight updates its cache sort of lazily, which is fine, but what happens is you type something in, you think you see the result, you hit enter, and then it changes the moment before your finger comes down. To me that’s just a no-go. Similarly, on the phone and on the iPad, I do use the home screen search quite a bit, often for launching applications, but sometimes for looking up documents. And you can tune it to turn off a bunch of junk, mainly the Siri suggestions that basically go out to the web, but that just takes time, especially if your network connection is not ideal, and so you tend to get this thing where it just changes underneath your finger, and to me that’s just a total no-go.

00:32:13 - Speaker 3: Yeah, I feel like I’ve seen this on my Windows machine, which I don’t use very often, but occasionally I’m on it, and you open up, whatever it is now, when I was a kid it was the Start menu, and it starts searching for news stories and looking at online help articles, and it’s like, I’m looking for todo.txt on my computer, calm down.

00:32:30 - Speaker 1: Yeah. That’s one thing which we deliberately did in Raycast.

So if you open that, it’s very predictable. We basically make sure that it’s a fast algorithm that matches all your entries, but it doesn’t do async operations like going to the network, trying to fetch something, which just ruins the predictability, or it makes it just a lot slower.

So there was a lot of engineering work that went into the initial version, and we recently did an iteration on top of that to make it basically as fast and predictable as possible.

And then functionality that needs to go to the network, for example, to search your Linear issues.

They are in a separate command.

So you launch this command and then you’re in the command, and then it can perform an async operation. But even there, we basically build it in a way that there is always a cache available, so that it is fast by default, and if you need additional data, it gets updated in the background.
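A generic sketch of that "cached by default, refresh in the background" pattern (often called stale-while-revalidate); this is not Raycast’s actual code, and the endpoint below is made up for illustration.

```typescript
const cache = new Map<string, unknown>();

async function cachedFetch<T>(
  key: string,
  fetcher: () => Promise<T>,
  onUpdate: (value: T) => void
): Promise<void> {
  // 1. Show whatever we already have immediately, so the UI never waits.
  if (cache.has(key)) {
    onUpdate(cache.get(key) as T);
  }
  // 2. Refresh in the background and update the UI once fresh data arrives.
  try {
    const fresh = await fetcher();
    cache.set(key, fresh);
    onUpdate(fresh);
  } catch {
    // Network failed: keep showing the cached data instead of an error state.
  }
}

// Hypothetical usage: list issues assigned to me from some tracker API.
cachedFetch(
  "my-issues",
  () => fetch("https://example.com/api/issues?assignee=me").then((r) => r.json()),
  (issues) => console.log("render", issues)
);
```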

But yeah, I think this is sometimes undervalued. Making something predictable and keeping it predictably fast is sometimes tricky, but it’s hugely important for such user interfaces.

00:33:40 - Speaker 3: Yeah, and now that I’m thinking about it more, I’m realizing that the moment you stray from totally deterministic predictable, user controlled search, or basically an algorithm for that, that’s totally within the user’s control, it just becomes overwhelmingly tempting for the platform to do nefarious stuff.

The example I’m thinking of is Twitter, where Twitter, like every few days, will try to opt you into their algorithmic timeline, but you can go in there and say, no, just show me the tweets of the people that I’ve explicitly followed in the order that they posted them. And the reason I do that, like, you know, a lot of times Twitter has interesting suggestions, but they kind of can’t help themselves but suggest clickbait. And I feel like you kind of get the same dynamic whenever you have algorithmic lists. And so this is a little bastion of user control that I’m trying to maintain in my computing environments, both Twitter and the launchers.

00:34:28 - Speaker 1: Yeah. It’s interesting. One thing we did: we made it configurable how sensitive you want the search to be, because search is very personal.

People search for things differently. There’s obviously a huge overlap, but certain groups search differently. So we initially had just searching for prefixes, and we switched recently to fuzzy search, where you can basically search for letters that don’t directly follow each other. And that opens up just a lot more search results, so you need to rank them differently and cut them off differently.

So what we did is we basically added a preference, and we’re not very keen on preferences, because we feel like we should ship really good defaults. But here and there you maybe need a few preferences, and this was one of the ones we went for, because we test everything with our team and we already realized there are different styles of searching. So we went for a preference and made it a nice slider with which you can basically configure the sensitivity of how you want those matches to appear in the root search in Raycast.
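A toy sketch of fuzzy matching with an adjustable sensitivity threshold, just to illustrate the trade-off; it is not Raycast’s algorithm, and the scoring is deliberately simplistic.

```typescript
function fuzzyScore(query: string, candidate: string): number {
  const q = query.toLowerCase();
  const c = candidate.toLowerCase();
  let qi = 0;
  let score = 0;
  let lastMatch = -2;
  for (let ci = 0; ci < c.length && qi < q.length; ci++) {
    if (c[ci] === q[qi]) {
      // Reward consecutive matches more than scattered ones.
      score += lastMatch === ci - 1 ? 2 : 1;
      lastMatch = ci;
      qi++;
    }
  }
  if (qi < q.length) return 0;     // not every query letter was found
  return score / (q.length * 2);   // normalize to roughly 0..1
}

// Higher sensitivity keeps only tight, mostly consecutive matches;
// lower values let more scattered matches through.
function search(query: string, items: string[], sensitivity: number): string[] {
  return items
    .map((item) => ({ item, score: fuzzyScore(query, item) }))
    .filter(({ score }) => score >= sensitivity)
    .sort((a, b) => b.score - a.score)
    .map(({ item }) => item);
}

console.log(search("sbl", ["Sublime Text", "Safari", "Simulator"], 0.3));
```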

00:35:29 - Speaker 2: And we had started a little bit on the history of this stuff, and the discussion of fuzzy search also reminds me of the first time I saw that, which was in TextMate. I think it was command T, as kind of a different way to quickly pull up files, and it felt like an amalgamation of search and the command line, which maybe is in the same realm as all the stuff we’re talking about here.

And we started to talk about the history a little bit. I’d be curious to hear where you think this stuff went mainstream, Thomas, because clearly, yeah, this is built into macOS, iPadOS, iOS; whether or not you like the system default, it’s acknowledged by the platform makers that this is something that should be available to everyone.

Windows indeed has it also. Again, I don’t know if Start menu is the right term for it these days, but I know when you use a Windows computer these days, you hit the Windows key and your cursor focuses on a field that is pretty simple, but is still kind of one of these launchers.

So clearly all the platforms have said this is a core feature, but it wasn’t always that way. There were third-party apps at the beginning, and it’s interesting also in the case of Raycast that you’re kind of coming full circle and saying, well, actually we could do a lot better than what the core operating system does.

00:36:41 - Speaker 1: Yeah, definitely. I think the early 2000s, when you look back, was the time when a bunch of those third-party launchers appeared, and Mark touched on launchers as basically things that launch applications. But I think one critical thing for launchers is that they work globally on your system. So they don’t live in an app; they’re an app themselves, which basically sits on top of everything else. And the first ones were, I think, LaunchBar and Quicksilver, both of them.

00:37:10 - Speaker 2: I have great memories of Quicksilver, yeah.

00:37:11 - Speaker 1: Yeah, Quicksilver got a lot of love back in the day. And basically, they started, I think, with launching applications, but then also thought a step further: what else do you do? Files were big in the early 2000s, right? And you need to do something with these files. You maybe open them in specific tools, you may want to send them via email. So there was more of this.

I think verb, noun input, like you find something and then you do something with it.

Initially, on the Mac at least, this was located in the top right. I think that’s also where the Spotlight position initially was, in the top right, where you had this little search symbol, the magnifying glass. You clicked on it and there was a search field popping up.

So it was highly inspired by the Help menu that we chatted about before, but it was just global, right? You clicked on it, you could search for an app and then launch the app, or you could search for files and it opened the file. And that was the very early days of this. Then later in the 2000s, this got a redesign. When Spotlight became, I think, very mainstream was when they did the redesign in Yosemite to make it a front-and-center bar: you have the hot key command space, it pops up, and then you have this one big search field to input something. Then it finds results and you can execute on that. That was, for me, the tipping point when it became really mainstream, because it was a really big feature on the Mac. It was basically how you launch your apps, how you find your files. It was a core part of the system by then. And then a few years later, this also became part of iOS, basically having the same experience you described, where you can search apps and launch them as well.

00:38:58 - Speaker 3: Yeah, this history overview is quite the trip down memory lane, because this is where you start your computing when you sit down. It’s sort of like a series of pictures of all the living rooms of all the houses you’ve ever lived in. It’s pretty wild.

00:39:09 - Speaker 1: Yeah, good memories; it’s also fun with old operating systems to see how those evolved over time.

00:39:16 - Speaker 2: It is always vaguely shocking to see screenshots of even the relatively recent past, you know, macOS from 10 years ago, or really any operating system, certainly a phone screen, which of course will be massively lower resolution than what we have today, and therefore tiny when rendered 1 to 1. And yeah, technology moves fast, both in the sense of what computers can do, but also the fashion of it; I think the stylistic elements are something that is constantly evolving, for good or for ill.

00:39:48 - Speaker 1: Yeah, definitely. And I think after that, Spotlight became this main thing and other third-party apps built out similar feature sets. Another tipping point that I saw is just a few years ago. We chatted about text editors like Sublime or TextMate before, and VS Code is a modern version of those as well, which has these command palettes inside. But there were other apps outside of developer tooling coming up with command palettes integrated. There’s Superhuman, which is an email client that is very focused on keyboard shortcuts and has this command palette to make all the actions on emails accessible. There’s Linear, an issue tracker, similarly, where you can navigate through it with a command palette that is built into the tool. And then also other apps like Notion, which oftentimes focus only on search, but even there they follow a similar interface, right, where you have this keyboard shortcut, oftentimes either command K or command P, which seems to be the primary keyboard shortcut that those apps select. I think that was something which made this even more mainstream, because then it got out of this more niche developer space, and people experienced it in those other applications or sometimes websites. It even goes so far that companies nowadays advertise with it, right? You see on a homepage, like, oh, we have this fast user interface which is totally accessible by command palette. And that’s just super fascinating to see, when such a user interface change happens, which we haven’t really had that many of in the past. We started with buttons, we’re still with buttons; a lot of the things are still the same primitives. I think that’s one of the few primitives, at least that I remember, that popped up rather recently in modern UI development.

00:41:42 - Speaker 2: Yeah, absolutely. Also, to give a shout out to one of the friends of the podcast, which is the Arc browser: they quite cleverly took what I think almost feels like a natural extension of the fact that you have a URL bar in browsers, and people know you go there to type in the website you want to visit.

At some point, Chrome merged that with search, so right there, Mark, that almost covers two of the three you were talking about: you’ve got search and the sort of lookup by name.

It’s kind of the web version of that, and then Arc takes it a step further, which is now when you press that same keyboard shortcut that you would normally press to make a new tab or to activate the URL bar, that’s command T or command L, you get something that can indeed be used for search or URL entry, but is basically a command palette quick launcher. So I thought that was quite a nice evolution; it feels like this gradual enhancement of what was originally just a place you typed in a web address.

00:42:38 - Speaker 1: Yeah, that’s true. I think even just what you mentioned with Chrome is interesting, right? Initially, you just type in an address, then it became search of your history, then it became just search with suggestions. It’s a nice evolution of how what seems to be a simple text input can actually be quite powerful, again saving a bunch of clicks or navigations that you’d need to do if you didn’t have that.

00:43:04 - Speaker 2: And how do you think about the fact that this is a built-in platform feature essentially everywhere now, and you’re building, presumably, what is a better version of that? I mean, I’m a Raycast user, so I can definitely say it is better than the built-in Spotlight, but do you see that as a challenging, I don’t know, marketing problem or sales problem, to pitch the value prop of: install this extra app, it does what you already have, but more?

00:43:31 - Speaker 1: Yeah, it’s definitely a problem we’re thinking about. I mean, one thing, Adam, that you mentioned before is people are sometimes frustrated with the built-in one due to the unpredictable results. So sometimes people come to us with those frustrations. One thing that we always say is, you will always have one of those global launchers or command palettes installed, right? Because you have this one keyboard shortcut that you remember and then you’re gonna use that.

So at some point, there is a situation where you go from the built-in Spotlight to Raycast and hopefully replace it with the keyboard shortcut that you had before, to keep your muscle memory. And so one of the things that we did very early on is basically say, hey, there is this moment when you switch, so what we need to make sure is that we can do the basics that Spotlight can do, but also much better. That’s where we invested a lot in the speed, to make it faster to launch those things. We invested in file search, to search files in a more predictable way. And then when you have those basics, there’s the question: what else can you bring to this, right? And that’s when we decided on the platform aspect, because then you can integrate with pretty much anything else that is on your computer, so you can start really navigating and controlling your computer in a new way. And that often goes so far that people use third-party services like Jira exclusively in Raycast, because the daily operations they have are, oh, I need to create issues, or I need to see what is assigned to me. And when I see my assigned issues in Raycast, I can also modify them and update my status. So there is nowadays really full flexibility and functionality in there; it’s no longer just searching and launching. We rather think about what the workflow is that you actually want to do. For example, I want to create a bug report. I’m writing in my editor, but I don’t want to jump to the browser, navigate to Jira, open the link, open the create-issue form. I would rather just press my global hot key for Raycast, search for the command, type in what I want to have for the bug report, create it, and continue where I left off. So we’re really on the path of covering full workflows instead of just finding and opening, because we believe that’s obviously some part of it, but it’s much better when you can close the loop entirely. And that’s what we kind of set out to do with Raycast: it really removes the friction that you usually have in a bunch of other things as well.

00:46:03 - Speaker 2: It also occurs to me that you’ve gone a little bit full circle in terms of being a platform provider. You started your career building mobile apps, which meant you were dealing with the often frustrating process of going through app review. So now you’re in the position of reviewing people’s extensions, and the way I understand it is you can run your own local extensions as much as you want, that doesn’t need to go through a review, but if you want to put it in your store to make it really easy to share with other people, now you have to, yeah, review that pull request, right?

00:46:35 - Speaker 1: Yes, that’s correct. So you can start developing and happily use the extensions yourself. But then when you share it, we want to make sure that other people have a really good experience.

And the motivation for that actually came from a different angle that Mark mentioned: we want to blur the lines between what is built in and what is third-party contributed to Raycast.

And for that, we really want to make sure that every extension is as high quality as possible and follows certain guidelines to make this a seamless experience. So the way we really thought about it is, one, we need to have a good API that restricts things enough that you can’t really break too much out of the system. But you also need to have a little bit of a review to make sure that the UI patterns are followed properly. So that led us to doing reviews, which, having gone through Apple reviews on iOS, can sometimes be a little bit frustrating, especially if you work on something which pushes the boundaries here and there a little bit. So what we decided to do is be very transparent about that. We thought a lot about how we do reviews, and we work with developers directly, right? It’s not that there’s some marketing department in between; we really work directly with developers. And when we think about development, there is one review process that all of us know, and it’s the pull request review process, right? We do that every day in our companies. So we thought, why not do the same. For our platform, we decided on having one big repository that all the extensions are in. And if you want to put an extension into the store, you just open a pull request with your extension, and then as soon as it’s merged, it gets pushed into our store, and other people can install it right from Raycast. So I think that’s really good transparency, because on the pull request we discuss with the author, hey, how about you do this, give a few hints here, help them, maybe making the code here and there a little bit better, and then it gets merged. And so far we haven’t had any pushback here, because it is so transparent. I think that makes it just a no-brainer for a developer, right? You just open a pull request, no questions asked, really. There is one downside to it that we experience nowadays: we have, I think, more than 600 extensions by now in the repository, and this becomes quite a big repository, so the collaboration is a little bit harder. But on the flip side, if you now build a new extension, you have 600 other extensions to look at for how you do something. So it’s a huge source of inspiration. It’s a huge set of templates, essentially, because a lot of the commands are similar. So you can copy from other things, or you can also contribute, right? That’s also happening very often right now, where people use an extension and think, hey, actually, I would like to have this functionality, and then they can just go to the source code, modify it, spin up a pull request, and then the author can look over it, and then we can merge it together.

00:49:34 - Speaker 2: I like that a lot, and maybe it also works well because of the scale you’re at, or the fact that these are largely kind of developer or developer-ish people, certainly power users who have some level of programming capability, solving their own problems and wanting to share that with others, so maybe it doesn’t quite have the huge scale problem the iOS App Store does.

Now, have you been in a position where you’ve needed to, I don’t know, reject something, or reject isn’t quite the right word, I guess, say we’re not ready to accept this because you’re not complying with things that are maybe almost more subjective? Obviously, if it flat out breaks, or it’s, you know, abusive or tries to do something nefarious with the system, I think that’s an obvious case. But if it’s something that’s a little bit more of a judgment call, this doesn’t quite comply with our UI, and as you said, someone thinks, well, yeah, but I’m pushing the boundaries in an interesting way. Essentially, you disagree and it becomes contentious. Has that happened yet? And if so, have you found a good way to sort it out?

00:50:30 - Speaker 1: Thankfully, not that often. We had a few situations where there were a few discussions, but then you usually find a way to compromise on a few angles from all sides and then we push this thing through.

Also, oftentimes we’re very proactive and just help people: hey, this is how you could do it, here are examples. Sometimes we even push to the same branch and help them modify it in the right direction, because we’re also aware that people oftentimes build those extensions in their free time, and we want to make sure to respect that.

But so far we’ve been lucky. Like, we didn’t have that many outliers there. And I think it’s interesting when you work with developers together.

It’s obviously a specific audience, right? I know their handcraft very well, and they’re also, usually in our community, very friendly and very collaborative. And at the end of the day, everybody wants to build something which serves a purpose for other people as well.

And I think now that we have this big amount of extensions, it probably became a little bit easier for us, because there are so many examples that you can just follow. So it became more of, hey, there is a standard you should follow. And if you’re a Raycast user, you see basically what a good extension UI is, you experience it yourself. So when you then build one, you’re just following the same patterns.

Initially, we had to form that, right? At first we had to form it ourselves, and then we had to bring it to the community.

And nowadays, I think that's easier because we just have more people in the community helping with that as well, and people who are building their third or fourth extension and already know what a good extension is. We also wrote a bit of guidelines around that. Well, documentation is something that not everybody reads, right, as we all know, but at least we have a reference to point people towards, and once they've read it, they know it for the next time.

00:52:25 - Speaker 2: Yeah, I can see how the element of, let's collaborate on making this extension you've made fit into our ecosystem in a way that meets our standards and will be good for everybody, will feel quite different from the distant reviewer who only has 10 seconds to look at your thing and issue some kind of judgment, where it's often hard to even understand what they're complaining about, and, you know, they gesture vaguely at a rule from this long set of guidelines that you feel maybe doesn't even apply.

And again, that's partially a scale thing, but I do think it's very powerful to be able to come back and suggest a modification in the branch, like, well, actually, you do it this way, and then it's like we're building it together. Even though they're basically doing most of the work, you're working together to make this thing.

00:53:14 - Speaker 1: Yeah, we're basically relying on open source there, right? And because it's open source, I think people also want to present themselves well and be collaborative in this way. I think that's a benefit of it, and the transparency of this model is just much higher and nicer for everybody who is involved.

00:53:36 - Speaker 2: So we've mapped out this transition over the history of computing, right: going from the terminal command line, which itself, I think, was a step forward from punch cards with their very long feedback loop; the terminal evolved into something that feels like having a conversation with the computer; then going to the GUI, which obviously has its pros and cons; and then perhaps some of the merging of those two together in launchers and command palettes and so forth.

Something that certainly comes to mind for me is audio interfaces, which are maybe less of a hot thing right at this moment than they were a couple years ago, but I do feel like Siri and Amazon's Alexa are something that could potentially fit into this story, right? We talked about how terminals are like having a conversation with a computer, while these voice interfaces are actually having a conversation with the computer. How do you see those as fitting into the story, or is that different because natural language is just fundamentally a different branch in the user interface history of computing?

00:54:38 - Speaker 1: Yeah, I think, as you mentioned, they became quite prominent over the last couple of years on various devices. There were the standalone devices like the Amazon Echo, but then you also have things like Siri, built into iOS initially and nowadays also on the Mac.

It's certainly a new way to interact with your computer via voice, which is interesting, but it's also, I think, very challenging. I think they come from a similar pattern, especially when we look back to the terminal, right? As you say, you really have a conversation. They might be a little bit more forgiving in that you don't need to specify a command directly; you can rather talk about prompts or intents, like, oh, I want to buy something, or tell me something about this.

That is, I think, a big step forward to make it more forgiving.

The hard part about it, as I see it, is the feedback loop that you have. You say something, but you don't really know exactly what you're getting back. And it's very hard to learn; you fall back into certain patterns and try to figure out how you really need to communicate with the computer so that it understands you, which I think is the tricky bit. It almost reminds me a bit of the terminal, where you also need to figure out how to talk to this computer.

Because there is not that much help. We now have things like autocompletion there, but it’s still like a little bit bumpy to use a terminal.

I think it's similar with audio. Whereas when I compare it to launchers and command palettes, there you have a really quick feedback loop: you enter something, you get suggestions, you're essentially filtering down to what you're looking for, and then you execute that. So it's very, well, predictable, as you mentioned before. But it's also very intuitive, whereas audio is much more abstract and you don't really have a good grasp on what the computer can do for you, or how it recognizes what you're saying, which I think is the main downside of it.

On the flip side, there are use cases for it. I think of things in the car, where you basically have your hands and your concentration on something else. The input mechanism of voice is just super interesting for those kinds of things. But I find it hard to believe that this is how we're gonna work professionally with computers, because it's so different from what we're used to, though I might be a little bit old school here.

00:57:02 - Speaker 3: Yeah, this is a very interesting prompt. So I'll be honest that I was not, and am not, a fan of the original black-box audio interfaces, meaning Siri and Amazon Echo, and it was for two reasons.

One is they were totally black-boxed and cloud-connected, and I felt like you were just putting an always-on microphone in your home, which was always extremely suspicious to me. But also, they were black boxes in the sense that you kind of didn't know what they could do or what they were thinking. But now I'm thinking back to our big grid, and it would be amazing if we made audio just one more column, like keyboard shortcuts and command palettes, and if you had the same level of agency and visibility.

So imagine if you opened up your launcher app, and you started saying things, and it started narrowing down the commands and highlighting more brightly those that sounded closer to what you were actually talking about. That would be a great way to get feedback and to be able to understand the full palette of options, if you will, for the command interface.
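
As a rough sketch of that idea, here is how a hypothetical launcher might re-rank its command list every time the speech recognizer emits an updated partial transcript. All names and the scoring heuristic are illustrative, not any real launcher's API:

```ts
// Hypothetical: re-rank launcher commands as partial voice transcripts arrive.
type Command = { id: string; title: string };

// Crude similarity: fraction of spoken words that appear in the command title.
function score(transcript: string, command: Command): number {
  const words = transcript.toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length === 0) return 0;
  const title = command.title.toLowerCase();
  const hits = words.filter((w) => title.includes(w)).length;
  return hits / words.length;
}

// Called on each partial transcript; returns commands sorted by score. The UI
// could map the score to highlight brightness so likely matches "light up".
function rankCommands(transcript: string, commands: Command[]) {
  return commands
    .map((c) => ({ ...c, score: score(transcript, c) }))
    .sort((a, b) => b.score - a.score);
}

// Example: speaking "set a timer" floats timer-related commands to the top.
const commands: Command[] = [
  { id: "timer", title: "Set Timer" },
  { id: "window", title: "Maximize Window" },
  { id: "emoji", title: "Search Emoji" },
];
console.log(rankCommands("set a timer", commands).map((c) => `${c.title}: ${c.score}`));
```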

And now going back to our discussion about performance, it’s interesting to think a little bit theoretically about the speed, if you will, of these different input methods. So typing is quite fast because it’s very precise as well as having a high number of hits per second.

Each key you hit is, I don't know, probably a couple bytes, and you can do a lot in quick succession. And voice, if you think about where it might be most useful, it's probably cases where you have a relatively high number of bits to input and you can have a relatively high degree of confidence that you're gonna get the right answer, because, as you were alluding to before, the feedback and follow-up is a bit of a mess on voice. I have this in the car sometimes, where I say, give me directions to the gas station, and it goes, "Calling Mom," and I'm like, what? No, what are you talking about? But if you have a relatively long input and a high degree of confidence that it's gonna get the right answer, then voice is pretty good.

And then as well, there are just cases where, for whatever reason, you don't want to be using your hands: you're in the car, or you're doing something else with your hands, or you're out walking. So I think it could be interesting. The avenue that seems most promising to me is integrating it as a complementary mechanism versus a wholly different vertical silo.

00:59:11 - Speaker 2: Yeah, I certainly love multimodal interfaces and the idea of using hands, voice, but also, yeah, keyboard, touchscreens, and putting all of those together rather than just picking one or the other, picking the thing that's right for the moment, whether it's because of discoverability, whether it's because of performance.

Yes, I can certainly speak to needing to, or wishing to operate a computing device when I barely even have one hand free, which is often the case when you’ve got a young child. So there’s a lot of value there.

But it's funny, Thomas, because the way you described it, I was originally thinking, OK, voice naturally slots in a bit more with being like a command line, because it is literally a conversation in language with the computer, as opposed to the GUI, pointing at pictures of what you want. But the way you described it, it occurs to me: well, earlier you said the key things about these launchers and command palettes are that they're predictable and they're fast, and those are the two things that voice is not. And some of that is probably weaknesses in our current voice recognition that will get better with time, but some of it is fundamental to the format, right? You speak more slowly than you type, at least for a power user.

And the feedback mechanism, it can vary; maybe if you have something on screen, you can get some degree of live feedback, but of course, if it's not a screen-oriented thing and you need to wait for the computer to talk back to you, that's a very slow feedback loop. And then the predictability suffers, at a minimum because we tend to lean towards natural language in those interfaces rather than a direct analog of a command line, right? If I were actually speaking Unix commands or something of that nature, would that be more precise or more predictable? I'm not sure.

01:01:01 - Speaker 3: Well, now I’m thinking of really leaning into this.

So in the same way that you can set a keyboard shortcut, what if you could set a voice shortcut for stuff that you use all the time? For example, something I would love to have is a timer. What if I just said "T 5 minutes," and I could specify that when I say T, that means set a timer? Versus, like, "Hey Siri" ... "Yes, Mark?" ... "Siri, set a timer for 5 minutes" — you know, I'm not gonna do that.
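
As a rough sketch of the kind of voice shortcut Mark describes, here's how a hypothetical launcher might map a user-defined spoken prefix to an action; the shortcut table, the `setTimer` helper, and the parsing are all illustrative, not an existing feature:

```ts
// Hypothetical user-defined voice shortcuts: a short spoken prefix maps to an action.
type VoiceShortcut = {
  prefix: string;                 // what the user says first, e.g. "t"
  run: (args: string[]) => void;  // action invoked with the remaining words
};

// Stand-in for whatever the launcher would actually do.
function setTimer(minutes: number) {
  console.log(`Timer set for ${minutes} minute(s)`);
}

const shortcuts: VoiceShortcut[] = [
  {
    prefix: "t",
    run: (args) => {
      // "t 5 minutes" -> set a 5-minute timer
      const minutes = parseInt(args[0] ?? "0", 10);
      setTimer(minutes);
    },
  },
];

function handleUtterance(transcript: string) {
  const [first, ...rest] = transcript.toLowerCase().trim().split(/\s+/);
  const shortcut = shortcuts.find((s) => s.prefix === first);
  if (shortcut) shortcut.run(rest);
}

handleUtterance("T 5 minutes"); // -> "Timer set for 5 minute(s)"
```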

I'm also wondering now, and this ties back to our platform conversation a little bit: it might, just as of the last few weeks, have become viable to do this outside of the big behemoths like Microsoft and Google and Apple, because of these open-source voice recognition algorithms. So I think it would be a very interesting test of the extensions platform: can you run those programs within it? And the answer is probably no, not right now, but it'll be an interesting test case.

01:01:51 - Speaker 1: That would indeed be interesting. I like the idea of the shortcuts, right? Because I think one of the problems that Adam touched on is that voice is just fundamentally slower. And when we talk about productivity, we want to make everything as fast as possible.

But what Mark mentioned with the "T 5 minutes," interpreting that as a timer, that's an interesting twist to it, right, where you tweak it to your personal needs, because you oftentimes don't need it to be very general. You have a few specific use cases that you know about.

I sometimes turn my lights off in the evening and use voice commands for that. It's a low-friction task; I don't really care if they're turned off immediately or a few seconds later.

Similar with the timer, you probably also don’t care much about it. You just want to have the timer to cook your pasta or whatever.

I think those are really good use cases for it, where the performance doesn't really matter. That, I think, conflicts with the work environment, where performance really matters, which is why every professional user still sticks with the keyboard, right, and types as fast as they can to get input into the computer.

01:03:01 - Speaker 3: Yeah, and this is also where I think it's fruitful to try to be pretty precise and scientific about speed, because there's speed as in bits of information per second, in which case I think voice is actually quite fast. I think it's faster than typing.

Don't quote me on that; we'd have to, you know, type it out and say it out and compare, but I think it is. But there's also a higher startup time for voice, kind of a higher overhead, and then there's the sort of loss factor that you get from the reduced precision of voice. But again, that suggests ways in which you can use and mitigate these different technologies: you could introduce voice short codes, and you can target areas that have low loss from reduced accuracy. I think just going in there with a little bit of precision is gonna be helpful.
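
As a back-of-the-envelope illustration of that comparison (the rates and the entropy figure below are rough, commonly cited ballpark assumptions, not measurements from the episode): conversational speech is often estimated around 130–160 words per minute, while fast typing might be 60–90, so raw word throughput does favor voice; the startup overhead and the retry cost of misrecognition are what eat the advantage.

```ts
// Rough throughput comparison; all numbers are illustrative assumptions.
const wordsPerMinute = { typingFast: 80, speech: 150 };
const avgWordLengthChars = 5;            // common rough convention
const bitsPerChar = Math.log2(26 + 1);   // ~4.75 bits: letters + space, uniform (crude)

function bitsPerSecond(wpm: number): number {
  return (wpm * avgWordLengthChars * bitsPerChar) / 60;
}

console.log(`typing ≈ ${bitsPerSecond(wordsPerMinute.typingFast).toFixed(1)} bits/s`);
console.log(`speech ≈ ${bitsPerSecond(wordsPerMinute.speech).toFixed(1)} bits/s`);
// Raw rate favors speech, but add a fixed startup cost and a retry probability
// ("loss factor") and short, precise commands can easily come out faster typed.
```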

01:03:41 - Speaker 2: Yeah, you can even imagine shaping the voice commands around what the computer can understand more precisely.

I'll pull in my interest in dog training here. Even though dogs have very good hearing, their ability to make out things in the frequency range that humans use to speak is actually not very good. So it's much easier to speak to them in a way they can understand if you use single-syllable words with really sharp consonants. This is why it's good to name your dog something like Spot or Spark, versus something like Tobias, because that really sharp single syllable is more likely to be recognized. And maybe there's a version of that, not unlike the Unix commands of yore that cut out unnecessary vowels to get down to three or four letters. Maybe there's a version of this where we get some kind of voice shorthand for speaking to a computer that's more efficient, more comprehensible to the computer, easier to recognize, easier to disambiguate in a noisy setting or from different speakers, but where we learn to adapt our speech a little bit to add this new kind of input to our repertoire of ways to interface with our devices.

01:04:57 - Speaker 3: Yeah, and this is a bit of an aside, but it reminds me of some noodling we had done in the lab around probabilistic interfaces. The context there was touch, and touch, like voice, is not precise: you've got these big fat fingers that cover like 1,000 pixels, you've got oil on your hands, and the cat's walking across your screen. It's a mess. But our intuition was that you could use the domain that you're in to more reasonably interpret touch input.

So, for example, if you get touch input right where the person usually rests their non-writing hand, there's a higher probability that it's a palm and should be rejected. And if a finger goes down right next to the OK button, there's a higher probability that it's a real press.

And we were thinking, is there a way to incorporate this all probabilistically, so you don't need these super deterministic yes/no answers until the very end of the pipeline; that is, when you're spitting out an application action, not when you're spitting out an X/Y coordinate on the screen. And I can imagine something similar with voice, where they know that the overwhelming thing this user does with Alexa is set a timer. So when you hear anything that sounds like T or timer or time or set or clock, there's like a 99% chance that it's the timer, so just call that. And I had the intuition that if you consider the problem holistically like that, these messier inputs could become actually quite useful and precise.
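
A minimal sketch of that kind of probabilistic pipeline, assuming a hypothetical recognizer that emits candidate interpretations with confidence scores and a per-user usage history that supplies priors; everything here, including the action names and the threshold, is illustrative:

```ts
// Hypothetical: combine recognizer confidence with a per-user usage prior,
// and only commit to an action once one interpretation is clearly dominant.
type Candidate = { action: string; recognizerConfidence: number };

const usagePrior: Record<string, number> = {
  "set-timer": 0.7,    // this user sets timers constantly
  "play-music": 0.2,
  "call-contact": 0.1,
};

function pickAction(candidates: Candidate[], threshold = 0.8): string | null {
  const scored = candidates.map((c) => ({
    action: c.action,
    score: c.recognizerConfidence * (usagePrior[c.action] ?? 0.01),
  }));
  const total = scored.reduce((sum, s) => sum + s.score, 0);
  if (total === 0) return null;
  // Normalize into posteriors; keep the ambiguity until the end of the pipeline.
  const best = scored
    .map((s) => ({ ...s, posterior: s.score / total }))
    .sort((a, b) => b.posterior - a.posterior)[0];
  return best.posterior >= threshold ? best.action : null; // else ask for clarification
}

// An ambiguous utterance heard as either "set a timer" or "play music":
// the usage prior breaks the tie in favor of the timer.
console.log(
  pickAction([
    { action: "set-timer", recognizerConfidence: 0.55 },
    { action: "play-music", recognizerConfidence: 0.45 },
  ])
); // -> "set-timer"
```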

01:06:18 - Speaker 1: I think an interesting angle to that is just general awareness of the context you're in. The time of day when you execute those commands might matter, or, if you're on a computer, what you're using at the moment.

They have different functionality.

I think all of those things are not used in the most efficient way yet.

There are things like Siri Suggestions, which try to be smart, and sometimes it works, sometimes it doesn't. I think a huge problem with that is accuracy.

If those things are wrong too often, you lose trust in them and you no longer rely on them, which is unfortunate.

But I think context awareness can basically speed things up, right? Because the computer can make certain predictions about what you're most likely doing. So, as you mentioned: hey, when you say something that sounds similar to timer, it's probably going to be timer, because you've said that a hundred times before.

01:07:12 - Speaker 3: Yeah, and by the way, the thing with these probabilistic systems is that the hard part is getting the labeled data. The statistics to spit out an answer, given labeled data and an input, are quite elementary.

This is another benefit of using the unified table of actions, because you can see that when they're using the keyboard and the command palette and the menu and the help, they're always going to, say, set timer or whatever it is, or increase font size, or open file todo.txt, and then you can use that as labeled inputs into the probability calculator to determine what you're likely to be saying.

Whereas if you're Alexa or Siri starting from scratch, it's kind of hard to assume anything. Maybe they know that people ask for music or timers a lot, but what can you really assume? Whereas if you have all this labeled data from your unified table of operations for your app or your OS or whatever, it's quite good for bootstrapping.
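
Continuing the earlier sketch, the priors could simply be bootstrapped from the launcher's own action log; again, the data shapes here are illustrative, not a real logging API:

```ts
// Hypothetical: derive per-action priors from a log of actions the user has
// already invoked via keyboard, command palette, menus, etc.
type ActionLogEntry = { action: string; invokedAt: Date };

function buildUsagePrior(log: ActionLogEntry[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const entry of log) {
    counts[entry.action] = (counts[entry.action] ?? 0) + 1;
  }
  const total = log.length || 1;
  // Normalize counts into a probability distribution over actions.
  return Object.fromEntries(
    Object.entries(counts).map(([action, count]) => [action, count / total])
  );
}

// Example: a log dominated by "set-timer" yields a prior that favors it.
const log: ActionLogEntry[] = [
  { action: "set-timer", invokedAt: new Date() },
  { action: "set-timer", invokedAt: new Date() },
  { action: "open-file", invokedAt: new Date() },
];
console.log(buildUsagePrior(log)); // roughly { "set-timer": 0.67, "open-file": 0.33 }
```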

01:07:58 - Speaker 2: Well, Thomas, I don't know how soon we're gonna get Raycast on the phone responding to probabilistic voice inputs. So, perhaps more near term, what kind of things are on your roadmap? What's your team working on at the moment?

01:08:10 - Speaker 1: Yeah, definitely. So what we work on is, I think, a little bit more in the here and now. We have a very active community that basically sends us suggestions, feature requests, bug reports, day in and day out.

And we read all of them, answer all of them, and those things very often feed into our roadmap.

So we're working on a lot of the features that people brought up throughout the year; we want to address them over the rest of the year and then start on bigger efforts next year. I think that's an interesting thing when you have this community of very loyal users: they want to push the system forward with you, and we're very happy about that and try to make as much as possible happen with the small team that we are.

01:08:55 - Speaker 2: Yeah, like being able to spend some time directly addressing the most common feedback, just, you know, give the people what they want. But you also need some periods of time where you can focus on bigger things that maybe people wouldn't have known to ask for, because they reflect your long-term vision, or opportunities you see as the creator of the product who is thinking about it night and day, more than any user or customer ever could. So I think it's healthy on a team to have some time devoted to each.

01:09:29 - Speaker 1: Yeah, definitely. You want to strike a good balance there. I think that's one thing which is unique when you work together with developers: they give you a lot of input, and we really value that, and the least we can do is come back as quickly as possible with answers. Sometimes the answer is, we're not gonna do that, and then we're honest about it, but we try to build as much as possible into the tool. Because at the end of the day, we're building the tool for other people, not only us, so we listen first and try to formulate our own opinions afterwards.

01:10:05 - Speaker 2: Well, let's wrap it there. Thanks everyone for listening. If you have feedback, write us on Twitter @museAppHQ or via email, hello@museapp.com. And Thomas, thank you for continuing to push forward the world of launchers and command lines and doing things with keyboards here in 2022, because I feel like even though that may be something that has been part of computers from the beginning, I don't think we've taken it all the way to its terminus.

01:10:36 - Speaker 1: Thanks for having me. It was really fun being here. And yeah, we’re not gonna stop working on the next world of keyboard navigation.


Metamuse is a podcast about tools for thought, product design & how to have good ideas.

Hosted by Mark McGranaghan and Adam Wiggins