Rich text editing is a foundational interaction in productivity software. Slim joins Mark and Adam to explain how rich text is more than just bold and italics for prose, but also includes math equations, diagrams, slideshows, and sheet music. Their discussion includes WYSIWYG versus markup languages for end users; how block-based editors change our understanding of rich text; and why Pandoc is Slim’s favorite piece of software. Plus: how to choose the best wagon in Oregon Trail.
00:00:00 - Speaker 1: There’s been very little innovation and research more generally into what is a good interface for inputting equations. So I think most people are probably familiar with Microsoft Word or Excel have these equation editors where you basically open this palette and there is a preview and there is a button for every possible mathematical symbol or operator you can imagine.
00:00:28 - Speaker 2: Hello and welcome to Metamuse. Muse is a tool for thought on iPad and Mac. This podcast isn’t about Muse the product, it’s about Muse the company and the small team behind it. I’m Adam Wiggins here today with my colleague Mark McGranaghan. Hey, Adam. And joined by our guest Sarah Lim, who goes by Slim. Hello, hello, and Slim, you’ve got various interesting affiliations including UC Berkeley, Notion, and Ink & Switch, but what I’m interested in right now is the lessons you’ve learned from playing classic video games. Tell me about that.
00:01:01 - Speaker 1: So this arose when I was deciding whether to get the 14 inch or 16 inch M1 MacBook Pro and a critical question of our age, let’s be
00:01:10 - Speaker 1: honest. Exactly, exactly. I couldn’t decide. I posted a request for comments on Twitter, and then I had this realization that when I was 6 years old playing Oregon Trail 5, which is a remake of Oregon Trail 2, which is itself a remake of the original, I was in the initial outfitting stage, and you have 3 choices for your farm wagon. You can get the small farm wagon, the large farm wagon, and the Conestoga wagon. I actually don’t know if I’m pronouncing that correctly, but let’s assume I am. So I just naively chose the Conestoga wagon because as a 6 year old, I figured that bigger must be better and being able to store more supplies for your expedition would make it more successful. I eventually learned that the fact that the wagon is much larger and can store a lot more weight means that it’s a lot easier to overload it. Among other things, this requires constantly abandoning supplies to cut weight. It makes the river fording minigame much more perilous. It’s a lot harder to control the wagon. And yeah, I never chose that wagon again on subsequent playthroughs, and I decided to get the 14-inch laptop.
00:02:12 - Speaker 2: Makes perfect sense to me and and what a great lesson for a six year old trade-offs, I feel like it’s one of the most important kind of fundamental concepts to understand as a human in this world, and I think many folks struggle with that well into adulthood. At least I feel like I’ve often been in certainly business conversations where trying to explain trade-offs is met with confusion.
00:02:35 - Speaker 1: They should just play Oregon Trail.
00:02:37 - Speaker 2: Clearly that’s the solution. And tell us a little bit about your background.
00:02:42 - Speaker 1: Yeah, so I’ve been interested in basically all permutations really of user interfaces and programming languages for a really long time, so this includes the very different topics of programming languages as user interfaces and programming languages for user interfaces, and then, you know, the combination of the two. So right now I’m doing a PhD in programming languages, interested in more of the theoretical perspective, but in the past, I’ve worked on, I guess, end user computing, which is really the broader vision of Notion, and I was at Khan Academy for a while on the long-term research team.
00:03:18 - Speaker 2: Yeah, and there I think you worked with Andy Matuschak, who’s a good friend of ours and a previous guest on the podcast.
00:03:24 - Speaker 1: Yes, definitely. That was the first time I worked with Andy in real depth, and I still really enjoy talking to him and occasionally collaborating with him today.
So, I guess, prior to that, I was doing a lot of research at the intersection of HCI, or human-computer interaction, and programming tools, programming systems, I guess. So, one of the big projects that I worked on as an undergrad was focused on inspecting CSS on a webpage, or more generally trying to understand what are the properties of the code that influence how the page looks or a visual outcome of interest. There I was really motivated by the fact that these software tools have their own mental model, I guess, or just model of how code works and how different parts of the program interact to produce some output, and then you have the user, who often has this entirely different intuitive model of what matters, what’s important.
So they don’t care if this line of code is or isn’t evaluated, they care whether it actually has a visible effect on the output. So trying to reconcile those two paradigms, I think is a recurring theme in a lot of my work.
00:04:30 - Speaker 2: And I remember seeing a little demo maybe of some of the, I don’t know if it was a prototype or a full open source tool, but essentially a visualizer that helps you better understand which CSS rules are being applied. Am I remembering that right?
00:04:43 - Speaker 1: Yeah, so that was both part of the prototype and the eventual implementation in Firefox, but the idea there is the syntax of CSS really elides the complexity, I think, because syntactically it looks like you have all of these independent properties like color: red, you know, font-size: 16 pixels, and they seem to be all equally important and at the same level of nesting, I guess, and what that really hides is the fact that there are a lot of dependencies between properties. So a certain property like z-index, you know, the perennial favorite z-index: 9999999, doesn’t take effect unless the element has like position: relative, for example, and it’s not at all apparent if you’re writing those two properties that there is a dependency between them.
So I was working on visualizing kind of what those dependencies were. This actually arose because I wrote to Bert, who is one of the co-creators of CSS, and was like, Hi, I’m interested in building a tool that visualizes these dependencies. Where can I find the computer readable list of all such dependencies? And he was like, oh, we don’t have one, you know, we have this SVG that tries to map out the dependencies between CSS 2.1 modules, and even there you can see all these circular dependencies, but we don’t have anything like what you’re looking for. That to me was totally bananas because it was the basic blocker to most people being able to go from writing really trivial CSS to more complicated layouts. So I was like, well, I guess this thing doesn’t exist, so I’d better go invent it.
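To make the kind of dependency described above concrete, here is a minimal Python sketch. The rule table is a hand-written, simplified assumption for illustration (it is not the machine-readable list discussed in the conversation, and it ignores cases such as flex and grid items, where z-index can also apply), and `inactive_properties` is a hypothetical helper name.

```python
# Simplified, hand-written dependency table (an assumption for illustration):
# each property maps to a predicate over the element's other declarations
# that must hold for the property to have any visible effect.
DEPENDENCIES = {
    # z-index needs a positioned element (flex/grid items are omitted here).
    "z-index": lambda style: style.get("position", "static") != "static",
    # top/left offsets likewise need a non-static position.
    "top": lambda style: style.get("position", "static") != "static",
    "left": lambda style: style.get("position", "static") != "static",
}


def inactive_properties(style):
    """Return the declared properties whose prerequisite is not met."""
    return sorted(
        prop
        for prop in style
        if prop in DEPENDENCIES and not DEPENDENCIES[prop](style)
    )


ignored = inactive_properties({"color": "red", "z-index": "9999999"})
applied = inactive_properties({"position": "relative", "z-index": "9999999"})
print(ignored)  # ['z-index']
print(applied)  # []
```

Nothing in the declaration syntax hints that the first rule set is partly inert; the dependency only shows up once it is written down explicitly, which is roughly what a visualizer of this kind would need as input.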
00:06:12 - Speaker 2: Perfect way to find good research problems. Now, more recently, two projects I wanted to make sure we reference because they connect to what we’ll talk about today, which is you recently worked on the equation editor at Notion, and then you worked on a rich text CRDT called Peritext at Ink & Switch. I would love to hear a little bit about those projects.
00:06:34 - Speaker 1: Yeah, definitely. So I guess the Peritext project, which was the most recent one, was a collaboration with Geoffrey Litt, Martin Kleppmann, and Peter van Hardenberg, and that one was really exciting because we were trying to build a CRDT that could handle rich text formatting. Traditionally, you have all of these CRDTs that are designed for fairly bespoke applications. They’re things like a counter data type or a set data type that has certain behavior when you combine two sets, and we’re still at the stage of CRDT development where aside from things like JSON CRDTs like Automerge, we don’t really have a one size fits all CRDT framework or solution. You still mostly have to hand design and implement the CRDT for a given application.
And it turns out that in the case of something like rich text, it’s a lot harder than just saying, oh, you know, we’ll store annotations in an array and call it a day, because the semantics for how you want different types of formatting to combine when people split and rejoin sessions and things like that are all very complex. And it turns out that we have a lot of learned behaviors that arise, even from just design decisions in Microsoft Word, where you expect certain annotations to be able to extend, certain annotations to not extend, things like that. Capturing all of the nuance in that behavior turns out to be really difficult and requires a lot of domain specific thinking.
But we think we have an approach that works and I would really encourage everyone to read the essay that we published and try to poke holes in it too. This was like the 5th version of the algorithm, right? So like months ago, we were like, all right, let’s start writing, and then Martin, who has just an incredible talent for these things, is like, hey, everyone, you know, I found some issues with the approach, and, you know, oh no, and so we fix those, we’re like, all right, you know, this one’s good, and just repeat this week after week. So I really have to give him a ton of credit for both coming up with a lot of these problems and also figuring out ways to work around them.
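For readers unfamiliar with the bespoke counter and set data types mentioned above, here is a minimal Python sketch of two standard textbook examples, a grow-only counter and a grow-only set. These are illustrations only and are not taken from the Peritext essay; the class and function names are chosen for this sketch.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        current = self.counts.get(self.replica_id, 0)
        self.counts[self.replica_id] = current + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Taking the max per replica makes merging order-independent
        # and safe to repeat.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)


def merge_gset(a, b):
    """Grow-only set: merging is plain set union."""
    return a | b


alice, bob = GCounter("alice"), GCounter("bob")
alice.increment(2)  # edits made while disconnected from each other
bob.increment(3)
alice.merge(bob)
bob.merge(alice)
alice.merge(bob)    # merging again changes nothing
print(alice.value(), bob.value())  # 5 5
```

Both types converge because their merge functions are commutative, associative, and idempotent. Rich text is harder precisely because no such simple merge rule captures how formatting is expected to behave.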
00:08:33 - Speaker 2: We talked with Peter a little bit recently, Peter van Hardenberg, about the pencils down element of the lab, but also just research generally, which is there’s always more to solve, you know, it’s the classic XKCD, more research needed is always the end of every paper ever written, which is indeed the pursuit of the unknown. That’s part of what makes science and seeking new knowledge exciting and interesting, but at some point you do have to say we have a new quantum of knowledge and it’s worth publishing that. But then I think if it’s just straight up wrong or you see major problems that you feel embarrassed by, then you want to invest more.
00:09:09 - Speaker 1: Right, exactly. I think in this case there was a distinction between, there’s always more we can tack on versus we wanted to get it right, you know, and in particular, the history of both operational transforms, or OT, and CRDTs for rich text, just text in general, is such that it’s this minefield of, I guess to use kind of a gruesome visual metaphor, just dead bodies everywhere.
You’re like, oh, you know, such and such algorithm was published at such and such time and it was the new hotness for a while, and then we realized, oh, it was actually wrong, and this new paper came out which proved like 4 of the algorithms were wrong, and so on.
And so with correctness being such an important part of any algorithm, of course, but also kind of this white whale in the rich text field, we thought it was important to at least make a credible effort to having a correct algorithm.
00:09:57 - Speaker 2: Yeah, makes sense. Yeah, I can highly recommend the Peritext essay.
One of the things I found interesting about it, maybe just for anyone who’s listening, whose head is spinning from all the specialized jargon here, CRDTs are a data structure for doing collaborative software, collaborative documents, and then, yeah, rich text, the Microsoft Word is the canonical example there.
You can bold things, you can italic things, you can make things bigger and smaller.
Well, part of what I enjoyed about this paper was actually that I felt, even if you have no interest in CRDTs, it has these lovely visualizations that show kind of the data representation of a sentence like the quick brown fox, and then if you bold quick, and then later someone else bolds fox, you know, how do those things merge together.
But even aside from the merging and the collaborative aspect, which obviously is the research, the novel research here, I felt it gave me a greater understanding of just how rich text editing works under the hood, which I guess I had a vague idea of, but hadn’t thought about so deeply. So, highly recommend that paper. Just skim the figures, even if you don’t want to read the thousands of words.
00:11:05 - Speaker 1: I’m glad you like the figures. They were a real labor of Figma.
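The "quick brown fox" scenario above can be sketched in a few lines of Python. This is a deliberately naive illustration, not the algorithm from the essay: it assumes the text itself never changes, represents bold as a set of character indices, and merges by set union. The helper names are hypothetical.

```python
TEXT = "The quick brown fox"


def bold_word(text, word):
    """Return the set of character indices covered by bolding `word`."""
    start = text.index(word)
    return set(range(start, start + len(word)))


def render(text, bold):
    """Render with Markdown-style ** markers around each bold run."""
    out = []
    for i, ch in enumerate(text):
        if i in bold and (i - 1) not in bold:
            out.append("**")  # a bold run starts here
        out.append(ch)
        if i in bold and (i + 1) not in bold:
            out.append("**")  # a bold run ends here
    return "".join(out)


user_a = bold_word(TEXT, "quick")  # first user's edit
user_b = bold_word(TEXT, "fox")    # second user's concurrent edit
merged = user_a | user_b           # naive merge: union of bolded indices
print(render(TEXT, merged))  # The **quick** brown **fox**
```

Plain indices stop working as soon as someone inserts or deletes characters concurrently, and a union cannot express one user un-bolding what another bolded. Those are the kinds of cases that make the real problem hard.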
00:11:08 - Speaker 2: Perfect, yeah, so.
00:11:10 - Speaker 1: The one thing I would add is that CRDTs are a technology for collaboration, but the way they differ from operational transforms, or OTs, is that a CRDT is basically designed to operate in a decentralized setting, so you don’t need a persistent network connection to all the peers. You don’t need a centralized server. The idea is you can fluidly recover from network partitions by merging all of the data and operations that happened while you were offline, and this turns out to be really important to our vision of how collaborative editing should work because we think it’s really important for people to be able to do things like not always be editing in the same document at the same time as everyone. Maybe I want to take some space for myself to write in private and then have my changes sync up with everyone else thereafter. Maybe I’m, you know, self-conscious about other people editing or seeing my work in progress, but I think that it would be interesting and helpful to look at what the main document looks like and how that’s evolving while I’m working in private, and you can have that kind of one-way visibility with something like a CRDT, versus with something like Google Docs, where it’s just sort of always online, or else you’re editing in your own personal editor. Conversely, maybe I’m OK with everyone else seeing the work that I’m doing in progress, but I just find it really visually jarring to have all these cursors in different colors jumping around and people inserting text, bumping my paragraphs down the page. I’ve definitely been there. I’m not particularly precious about people seeing my work in progress, but I just cannot focus on writing when the page is just changing all around me.
So in that situation, maybe I would want to allow other people to see my work in progress, so that we don’t duplicate effort or something like that, but I just have like a focus mode where incoming changes don’t disrupt my writing environment. And these kinds of fork-join, one-way window, micro-Git-style branching paradigms are really only enabled by a technology like CRDTs, where you have the flexibility to separate and then come back together.
00:13:12 - Speaker 2: And I’m incredibly excited by the design research that needs to go into that.
Now at this point, I think we’re still on the technology level, you know, one way to think of it is Google Docs came along, I don’t know, 15, it’s almost 20 years ago now, I can’t even remember, let’s say 15 years ago, and this novel idea that we could both have a shared document or several people could have a shared document, all see the up-to-date version and type into it and get, you know, a reasonable response or have that be coherent was an amazing breakthrough at the time and has since been kind of widely copied: Notion, Figma, many others.
But now maybe we can go beyond that, much more granularity, like you said, maybe borrowing from the developer version control workflows a little bit in a lightweight way, giving a lot more control and flexibility, and giving us a lot more choices about how we want to work most effectively.
But before we can even get onto those design decisions and how do we present all these different things to the user, what are the different options? We need this fundamental underlying merge technology, hence the endless fascination that we at the lab, and increasingly the technology industry generally, have with CRDTs, because it has the potential to enable all that.
00:14:23 - Speaker 1: Yeah, when we were working on the Peritext project, Peter was pushing really hard for, don’t make this just a technology project.
It’s a socio-technical endeavor and we need to invest a lot of time in the design component, also just doing user interviews, identifying how people interact with and how people collaborate in the status quo on text. And Geoffrey and I actually did do a bunch of user interviews with people from all kinds of backgrounds. We’ve talked to people who write plays, people who produce a dramatic podcast kind of in the style of Night Vale.
I love Night Vale. Yeah, people who are in the writer’s room kind of working together with their collaborators on that, people who write lessons, video lessons for educational platforms. And there was a ton of really interesting insights into user behavior around collaborative text.
We ended up just torn because we had this 12 week project and we were like, how should we best spend our time? Clearly, this is not just a technical area and we need to invest a lot in getting the design right, understanding what the design space even looks like since it hasn’t really been explored.
I really want to avoid, and this is a recurring theme in my work, I really want to avoid publishing or shipping something and having it be this very broad, very shallow exploration into all the things that are possible. I think that this kind of work plays an important role, and there are a lot of people who do this well, just fermenting the space of possibilities and getting these ideas in a lot of people’s heads, who can then go on and do really cool things with them.
My personal style is that I never want to feel like something is half baked, I guess. I would much rather ship this cohesive contribution, like, here is an algorithm for building rich text. We think that this is a technical prerequisite to all of these interesting design choices. But the alternative, with a 12-week period, and in fact, you know, the correctness and revision phase extended way over that. So thanks a lot to Martin and Geoffrey for leading during that part.
But it’s just already so hard to get it correct that trying to tack on a really substantive design exploration that does the area justice on top of that, I was just really worried it would be stretched too thin.
So absolutely lots of room for future work in this particular project. It’s very much a challenge in any area where you have simultaneously this rich design space that’s just asking to be explored with tons of prototypes and things like that, and then also to even realize the most simple of those prototypes, you require fundamentally new technology.
00:16:53 - Speaker 2: Yeah, I’ve been down that same path on many research projects as well, and often it’s that I’m excited for what the technology will enable, but also that in many cases it’s a combination, you know, some kind of peer-to-peer networking thing that will enable us to provide a certain benefit to the user, and I want to explore both of those things, but then that’s too much and then the whole thing is half baked, exactly as you said. I’ve never found a perfect or even a good way to really manage that tradeoff. You just kind of pick your battles and hope for the best. Yeah, definitely. Well, I do want to hear about the equation editor project, but first I feel I should introduce our topic here, which I think folks could probably have gleaned is going to be rich text and rich text editing, and maybe we could just step back a moment and define that a little bit.
I think we know that text, you know, symbolic representation of language, is a pretty key thing, writing and the printing press and all that sort of thing. We wrote about that a little bit in our text blocks memo, which I’ll link in the show notes. But typically, I think text on computers for a lot of their early time, and even now with something like computer code, is plain text. The .txt file is kind of almost the native style of text that you have, and then rich text typically layers something on top of that. I don’t know, so maybe you could better define rich text for us to have a more concrete discussion about it.
00:18:21 - Speaker 1: Yeah, I think rich text for most people basically evokes things like bold, italic, underline, the ability to augment plain text with annotations that are useful in formatting. Actually, I think Notepad to WordPad is the archetypal jump in software, if you’re thinking about it from the old Windows perspective.
In the past few years, I think we’ve started to see a real expansion of what rich text can look like. So, of course, we started out with something like Markdown, which is, of course, a plain text representation. But it’s designed to be able to capture more nuance in plain text and be rendered to something like HTML which very much supports rich text.
So in Markdown, you have not only these kinds of inline formatting elements like bold and italic and hyperlinks as well.
You also have support for images, which you could think of as more block level rich text elements, I guess, and I don’t think there’s a real clear consensus across editors on how block level rich text elements should be displayed.
Of course, in between you have things like bulleted lists, and those tend to be handled in a fairly standard manner with nested lists and so on, but it quickly becomes a question of taste which kinds of annotations you support.
So in editors like Coda or Notion, you have all these different block types where the block is really the atom of collaboration and editing, and then you can have things like, you know, file embeds or even database views, things like that.
So I think we’re at a point now with block-based editors, and I’m using block-based editors in the text or writing sense, not the structured editors for programming sense, although I have other things to say about that, but we’re at a point where you’re starting to see these block-based editors appear, and I think that there are a lot of really interesting patterns that this permits that the paragraph as a linear sequence of characters, including new lines and whitespace, does not permit, or at least doesn’t allow you to build as structured tooling around.
00:20:30 - Speaker 2: I’m trying to think what is actually the core of the difference between a block-based editor, that’s Notion, Roam, and Muse is working on its own block text implementation, and a flow of characters, so that’s Microsoft Word, Google Docs, maybe even text editors. I guess it’s sort of like paragraphs are separated as these sort of nested elements, or have a parent in the document, versus like two new lines embedded in the stream of characters, but I don’t know, that seems too unsophisticated, maybe you have a better definition for us.
00:21:03 - Speaker 1: So, I actually think about this very similarly to in the like programming languages and editor tools space.
There is a distinction between structured editors and regular plain text editors for programs. The idea is that you might have a text-based programming language and you can write that perfectly fine in any buffer that allows you to put sequential characters; often vi is sufficient for some languages. And then on the other hand, these programs might have a lot of inherent structure. A simple example is with Lisps, which are built out of these parenthesized S-expressions; everything is, you know, an S-expression. You can think about the structure of the tree formed by, I guess a forest, formed by having these S-expressions with subelements and stuff like that, and then you can do manipulations directly on the structure in a way that allows you to always have a syntactically correct program, or at least a partial syntactically correct program, by doing things like I’m just going to take this subtree, which is a subexpression, and move it somewhere else where there’s room for another subexpression. So, I think of block-based editors as capturing a very similar zeitgeist to structured editors for code, because instead of just having this linear buffer of characters that can have, you know, formatting or things like that, you can have new lines, you actually have more of a forest structure where you have lots of individual blocks, and then you can have blocks that are children of other blocks and so on, and that allows you to do things like move an entire subtree representing an outline to another position in the document without selecting all of the characters, you know, cutting them and then pasting them somewhere else. So things like reparenting become a lot easier, things like setting the background of an entire subtree become a lot easier. Just in general, you have more structure and there’s more things you can do with that structure, I guess is how I would phrase it.
One of my favorite things that you can do with this model in Notion is you can change the type of a block very easily. So let’s say I have a bullet list item, and then I hit enter and enter these sub-notes or something like that as children of the initial bullet list item. I can turn the bullet list item into a page, and then all of a sudden it’s just a subpage in the document, and the sub-bullets that were there before are just top level bullets in that page. And this is particularly important for my workflow because I care a lot about starting out with something really rough and sketchy and then progressively improving it, or moving up and down the ladder of fidelity into something more polished. So you might, for instance, start off with just an outline list or even a one-dimensional list of to-do blocks when you’re trying to do project planning or something. And then later on, let’s say I want to put these into a tasks database with support for a kanban view or something like that. I don’t actually want to sit there and recreate all of these tasks in Jira. I’ve been there, you know, I’ve been the person making all the tasks in Jira after the meeting and then assigning them to people. The workflow that I think Notion is poised to enable, and can certainly do a better job in this regard, but already offers some benefits on, is like, can I just highlight all of these blocks, because everything is a block, move them into some existing database and have them match the schema. That kind of allowing people to do fast and loose prototyping with very unstructured primitives and then promote them into something more structured, like in a relational database setting or similar, I think is the sweet spot. Structured editing provides the sweet spot between just completely unstructured text and these very high fidelity, high effort interfaces, and allows you to kind of move between them.
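The forest-of-blocks model described above can be sketched with a small tree type. This is a minimal illustration under assumed names (`Block`, `kind`, `move_to`); it does not reflect how Notion or any other editor actually stores documents.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare blocks by identity, not by field values
class Block:
    kind: str  # e.g. "page", "bullet", "todo"
    text: str
    children: List["Block"] = field(default_factory=list)
    parent: Optional["Block"] = field(default=None, repr=False)

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def move_to(self, new_parent):
        """Reparent this block; its whole subtree comes along."""
        if self.parent is not None:
            self.parent.children.remove(self)
        new_parent.add(self)

    def outline(self, depth=0):
        lines = ["  " * depth + f"[{self.kind}] {self.text}"]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


doc = Block("page", "Project notes")
plan = doc.add(Block("bullet", "Launch plan"))
plan.add(Block("bullet", "Write announcement"))
plan.add(Block("bullet", "Update docs"))
archive = doc.add(Block("page", "Archive"))

plan.kind = "page"     # change the block type; the children are untouched
plan.move_to(archive)  # move the subtree without selecting any characters
print("\n".join(doc.outline()))
```

Turning the bullet into a page is a one-field change and the move is a single pointer update, whereas in a flat character buffer both would mean cutting and re-pasting text. A real editor would also need to guard against moving a block into its own descendant, which this sketch omits.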
00:24:47 - Speaker 3: Yeah, I really like that direction and framing, and if I can extend it a little bit, I think we can also look at a continuum of richness in terms of the content itself.
So you have plain text, what you might classically call rich text with links and bold and underline. And then you maybe start to throw a few images in, and then what if you can put in videos, and what if you have a whole table, and that table is actually a database query, and you can nest a Figma document, and this way you can see that there’s sort of a continuum on the richness of the document. One reason I think Notion has been so successful is they’ve been pushing along that continuum while maintaining a sort of foundation of rich textness, which is very familiar and the important basic use case for a lot of people.
A related idea is that I think we’re seeing a lot of the classic document types converge. So if you look at rich text, like a Microsoft Word, and a PowerPoint and increasingly spreadsheets, those all used to be 3 distinct Microsoft Office applications, and we’re seeing the value of them being in, or being, the same document.
This is actually one of the motivating ideas behind Muse and a lot of the research we’ve done in the lab, and kind of something Slim was saying: you want to take your idea continuously through different media and different modalities and different degrees of fidelity, and you don’t want to jump between different applications to do that. You want to be able to do it on the same canvas. That’s, by the way, one of the reasons I like canvas. It’s not only because it’s a free multimedia surface, but also it evokes this idea of flexibility and potentiality, and I think that’s one of the things that’s really exciting about these mixed media documents.
00:26:16 - Speaker 2: And I know if Geoffrey were here, he might jump in and say that one downside to our current application silo world is that the only way to have this deeply rich text where it’s images, video, a table, a database query, something like that, is to have the uber application, to have the everything app, and certainly Notion has probably gotten pretty far on that, but others in some ways are forced to do that, like we have to do some of that in Muse as well. People come in and ask for all these different types here as well, and there’s more of an OpenDoc-inspired or Unix-inspired future that maybe Geoffrey and others, including me, would hope for, which would be more that applications could be these individual data types and you could put them all together through some kind of more operating system connection.
But that is so completely reversed from kind of how all our computing devices work today. It’s hard to see how we might get to that.
00:27:14 - Speaker 3: Yeah, I’m certainly sympathetic to that concern, although I suspect the way out is through, and you get platforms from working killer apps.
And so the way we got the whole Unix ecosystem was they wanted to build a computer for, you know, writing and running programs and then eventually got all this generalized text processing stuff, but it’s not like they started with, oh, I’m gonna make a generalized text processing machine.
I don’t think that was really the way that they approached it and developed a success. So, I’m still hopeful we could do this, but I think you got to extract it from something that’s already working as an app, but it always helps to have an eye towards that, and I think we’ve done some of that with Muse.
00:27:46 - Speaker 1: I was just going to say that it’s not me talking about text unless I bring up my favorite piece of software of all time, which is Pandoc.
And I think that Pandoc actually is very relevant to this discussion. So for those who aren’t as familiar with it, Pandoc brands itself as this Swiss Army knife for document formats, and its sort of headline contribution is that it allows you to convert between all kinds of documents.
For instance, I can take a Word document and convert it to a PDF, or Word documents to something like, I don’t know, an IPython notebook, a Jupyter notebook, back and forth across this incredible bipartite graph of formats. But I think that the subtler contribution that Pandoc makes, which is extremely significant, is that Pandoc has this form of Markdown called Pandoc Markdown that essentially aligns and supersedes all of the different fragments of Markdown that we’ve seen before.
So the problem with markdown basically is that the original specification is sort of ill-defined. There are several cases in which the behavior is not super clear and then on top of that, it’s not very expressive.
There aren’t very many constructs. So things like fenced code blocks, which many people associate very closely with Markdown today, that was only added by GitHub Flavored Markdown, which is certainly widely used among the programming community, but not everyone is on GitHub, of course. And then you have things like table formatting or even strikethrough; really, strikethrough wasn’t defined in the original Markdown specification either. And so you have Markdown and then you have GitHub Flavored Markdown, CommonMark is sort of this unifying effort, R Markdown, all these different ones. It’s the Markdown cinematic universe. I tried to make a joke about this. I had this joke ready for the Markdown cinematic universe when the last Marvel movie came out. But then it didn’t get nearly the traction in my timeline as Dune did, perhaps understandably. So really, I’m just going to have to wait till the next movie comes out. It’s a real, real tragedy. No, but I guess you have this real pluralism of forms and it becomes very difficult to use Markdown truly as a portable format, because the way it renders or even parses can very much differ from editor to editor. So, Pandoc provides this format that essentially serves as an IR, or intermediate representation, between all these kinds of documents, using a Markdown superset that somehow magically encapsulates everything.
00:30:18 - Speaker 2: And that includes not just markdown, but also like PDFs or Microsoft Word, that seems.
00:30:24 - Speaker 1: Well, so the way it works is it’s this compilation pipeline, I guess, that allows you to go from a markdown document.
It compiles it to PDF using pdfLaTeX or something. It outputs LaTeX, it outputs HTML, various things, and you can think of it as being this intermediate representation because you start with this Word document, you can turn that into Markdown, and you can go from that Markdown format into any of these output formats. That turns out to be really powerful, because the main issue with these kinds of conversions is that they’re often lossy: there are features supported by LaTeX, for instance, that aren’t supported by the web natively, there are features that are part of Word documents that aren’t necessarily supported by HTML, and so on and so forth.
So Pandoc serves this role of basically saying, OK, what is an intermediate language that can encapsulate all the different implementations of the same concept across different input and output formats?
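To make the intermediate-representation idea concrete, here is a minimal sketch in Python. It is not Pandoc's actual document model or API: the `Heading` and `Para` classes, the reader and writer functions, and the `convert` helper are all hypothetical names made up for illustration, and the toy reader only understands `#` headings and plain paragraphs. The point is the shape of the pipeline: every input format gets one reader into the shared model, and every output format gets one writer out of it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


# Hypothetical, drastically simplified document model (not Pandoc's real AST).
@dataclass
class Heading:
    level: int
    text: str


@dataclass
class Para:
    text: str


Doc = List[Union[Heading, Para]]


def read_markdown(src: str) -> Doc:
    """Toy reader: understands only '#' headings and plain paragraphs."""
    doc: Doc = []
    for block in src.strip().split("\n\n"):
        block = block.strip()
        if block.startswith("#"):
            hashes = len(block) - len(block.lstrip("#"))
            doc.append(Heading(hashes, block[hashes:].strip()))
        elif block:
            doc.append(Para(" ".join(block.split())))
    return doc


def write_html(doc: Doc) -> str:
    """Toy writer: no escaping, just enough to show the idea."""
    out = []
    for node in doc:
        if isinstance(node, Heading):
            out.append(f"<h{node.level}>{node.text}</h{node.level}>")
        else:
            out.append(f"<p>{node.text}</p>")
    return "\n".join(out)


def write_latex(doc: Doc) -> str:
    """Toy writer: maps heading levels onto LaTeX sectioning commands."""
    commands = {1: "section", 2: "subsection"}
    out = []
    for node in doc:
        if isinstance(node, Heading):
            out.append(f"\\{commands.get(node.level, 'paragraph')}{{{node.text}}}")
        else:
            out.append(node.text)
    return "\n\n".join(out)


# Adding a format means adding one reader or one writer,
# not one converter per pair of formats.
READERS: Dict[str, Callable[[str], Doc]] = {"markdown": read_markdown}
WRITERS: Dict[str, Callable[[Doc], str]] = {"html": write_html, "latex": write_latex}


def convert(src: str, from_format: str, to_format: str) -> str:
    return WRITERS[to_format](READERS[from_format](src))
```

With N input formats and M output formats, this design needs N + M pieces rather than N × M direct converters, which is one reason a well-chosen intermediate model matters so much.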
And what I think is so remarkable about it is that oftentimes when you are using a piece of software and you’re like, oh darn, you know, now I need to support this other thing too, you quickly end up in a situation where you have this snowball and things start to feel tacked on.
So you’re like, oh man, it’s very clear that they just glommed on this additional syntax for this feature. And with Pandoc, everything feels very principled in its inclusion. And at the same time, whenever I’m using Pandoc and I’m like, darn, I really wish there was a construct that I could use to express this particular thing, I look it up in the documentation and it’s always supported. So, as one of my favorite examples, one of the output formats that Pandoc supports is various slideshow frameworks: Beamer for people who use LaTeX, and reveal.js for people who use HTML and CSS. These slideshow frameworks basically allow you to replace something like PowerPoint, Keynote, or Google Slides with essentially a text-based format. I really like doing slideshows in Pandoc Markdown. There are a few reasons for that. The first reason is that it’s really useful to be able to reuse some of the same content from my blog post or essay in the slideshow. There are also some really minor and almost petty, but really significant reasons. Like, I like to have equations or code blocks with syntax highlighting in my slideshows, and there’s not really a good solution to putting a syntax-highlighted code block in Keynote right now.
00:32:39 - Speaker 2: Last I remembered, the gold standard at the Ruby conferences I used to frequent was to take a screenshot of TextMate and paste that in.
00:32:47 - Speaker 1: Yeah, it’s awful. I don’t want to see your Monokai editor with the weird background that contrasts weirdly with the slide background. I just, ah, and it doesn’t scale on a huge conference display. Anyway, I digress, but the other reason why I really like doing my slideshows in text is actually that there is often a hierarchical structure to my presentations, right? I’ll have these main top-level sections, and then I’ll have subsections, and then I’ll have sub-subsections, and all of these manifest in slides. But the GUI thumbnail view of most of these existing slideshow editors, like PowerPoint or Google Slides, reduces it all to this linear list. It’s like, here are all of your thumbnails in order. And it makes it very hard, as soon as I have an hour-long conference talk: how do I jump to this subsection that I know exists, aside from scrolling past 117 thumbnails and trying to find the right one, right? And moreover, let’s say I want to reorder a certain part of the talk because I think it better fits the narrative structure. Now I have to figure out which thumbnails I need to drag to which other place or, worse, go into the individual slide, select the text from that, and move that somewhere else. It’s just way, way clunkier, actually, than reordering some text in a bullet-list outline in my editor.
And then the other part is that I was talking about how Pandoc has really great, expressive support for idioms of different formats, and one thing you often have in slideshows is that I have some element on the screen, and then I press, you know, the next button again, and then another element will appear.
So in Pandoc you can denote this with just like an ellipsis basically so like dot dot dot and then if I have a slide where I have a paragraph and then the dot dot dot and then another paragraph, it will render with just the first paragraph visible and then I press next and then like the subsequent paragraph comes in.
And that’s just a very lightweight way to handle these stepped animations, compared to going to the animation pane and then clicking the element that I want to animate in and so on and so forth.
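As a rough illustration of how a pause marker can turn into build steps, here is a small Python sketch. It is not how Pandoc implements pauses; `PAUSE` and `incremental_steps` are hypothetical names. The marker used is a paragraph containing three dots separated by spaces, which is the form the Pandoc manual describes for inserting pauses in slides.

```python
from typing import List

# A paragraph consisting of three dots separated by spaces marks a pause.
PAUSE = ". . ."


def incremental_steps(slide_source: str) -> List[List[str]]:
    """Return what is visible after each press of 'next' on one slide.

    Each entry is the cumulative list of paragraphs shown at that step.
    """
    paragraphs = [p.strip() for p in slide_source.strip().split("\n\n") if p.strip()]
    steps: List[List[str]] = []
    visible: List[str] = []
    for para in paragraphs:
        if para == PAUSE:
            # A pause closes the current step; later content waits for 'next'.
            steps.append(list(visible))
        else:
            visible.append(para)
    steps.append(list(visible))
    return steps


slide = "First point.\n\n. . .\n\nSecond point."
for number, shown in enumerate(incremental_steps(slide), start=1):
    print(number, shown)
```

The author's side of the interaction is a single extra paragraph in the source, which is the ergonomic point being made here.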
So it started off with me being like, I’ll just prototype in this format, but then it ended up supporting columns, it supports all these things that you actually want. And I was like, this is in many ways a more ergonomic way to handle long technical slideshows. Anyway, I have to shill for Pandoc anytime I talk about rich text. I’m contractually obligated to do so.
00:35:08 - Speaker 2: Yeah, it’s a great piece of software. I use it here and there. I think I was doing some AsciiDoc kind of manuals many years ago. And yeah, just in general, it’s also worth looking at the homepage: the plot you mentioned, where it shows all the different formats it can convert between, is quite fun. You click on that, you can zoom in.
00:35:26 - Speaker 1: Yeah, I had this really elaborate plan when I decided to go to Berkeley, that I was going to print out a door-sized poster of that graph that shows all the formats it converts between and then show up at John MacFarlane’s door and ask him to sign it. But then the pandemic interfered with some of those plans. Nonetheless, it remains on my list.
00:35:48 - Speaker 2: Good bucket list item, pretty unique one at that.
00:35:51 - Speaker 1: Also, I found my tweet, or I found the draft of my tweet, which is about Eternals, and I said: directed by Chloé Zhao, the latest entry in the Markdown Cinematic Universe features an ensemble cast of MultiMarkdown, GitHub Flavored Markdown, PHP Markdown Extra, R Markdown, and CommonMark as they join forces in battle against mankind’s ancient enemy, DOCX. Nice.
00:36:12 - Speaker 2: Wow. You would have gotten the like from me.
00:36:16 - Speaker 1: Yeah, we’ll see if it ever sees the light of Twitter.com.
00:36:20 - Speaker 2: You briefly mentioned there equations and LaTeX, and maybe that’s a good chance to talk about the equation project you did for Notion. And part of what I thought was so interesting, or what I think in general is interesting about equations, is that they are obviously an extremely important symbolic format, but in many ways extremely different from the prose we’ve been talking about.
So English or other languages, even languages that are right to left or something like that, they all have the same kind of basic flow and the way that we represent sounds with these little squiggly symbols. Even though the symbols themselves and the sounds vary, and how we put them together into words varies across languages, that’s a common thing. If you go to the mathematical realm, you have symbolic representation, but equations are their whole own beast, and I think one that has gotten a lot less attention from kind of the software and editing world. So tell us about that rabbit hole.
00:37:16 - Speaker 1: Yeah, so just as context for people, Notion and many other applications actually have long supported block equations, an equation that basically takes up, you know, most of the page horizontally.
What is much more uncommon in editors is support for inline equations, and this can be something as simple as saying you want to type “let X be a variable,” and X should be formatted or stylized mathematically.
Being able to refer to elements of a block-level equation in inline text is a prerequisite for being able to do any kind of serious mathematical writing, yet because this is kind of a niche area that has historically been the purview of Overleaf and other LaTeX editors, it’s really not implemented in most editors.
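For readers who have not used LaTeX, the distinction between inline and block equations looks roughly like this in source form. The particular sentence and function are illustrative only:

```latex
% Inline math: the symbol sits inside the sentence.
Let $x$ be a variable, and suppose $x > 0$.

% Display (block) math: the equation gets its own line.
\[
  f(x) = x^2 + 1
\]

% Referring back to parts of the block equation needs inline math again.
Here $f(x)$ is always at least $1$, because $x^2 \ge 0$.
```

Without the inline form, the last sentence could not mention $f(x)$ or $x^2$ in a typographically consistent way, which is why block-only support falls short for mathematical writing.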
So I pushed really hard to add inline equations and inline math to Notion, because I was like, there’s a huge opportunity for people to write scientific or mathematical documents that take advantage of all of Notion’s other features, like being able to embed Figma or embed illustrations and things like that, right? So, it turns out that it’s kind of difficult, exactly as you’re describing, to do this equation format.
There’s been very little innovation and research more generally into what is a good interface for inputting equations. So I think most people are probably familiar with how Microsoft Word or Excel have these equation editors, or sometimes even at the operating system level, where you basically open this palette, and there is a preview and there is a button for every possible mathematical symbol or operator you can imagine. And then for composite symbols like the fraction bar or integral or something like that, you find the button for that, you click it, and then you click into the little subboxes, and then you find whatever symbol you want and you put those there too. So it’s kind of a structured editor, but in an unimaginably cumbersome interface. This is what I used to do my lab reports in high school, for example. And then at the other end of the spectrum, you have things like LaTeX. LaTeX is basically how everyone, at least in computer science and mathematics, chooses to typeset their work, to typeset complex mathematics. One of the real selling points of LaTeX, I think, is its handling of operator spacing. It turns out that operator spacing is really important, and there’s a big difference between, say, a hyphen or a dash character that’s used in text, and a hyphen or a dash character that’s used as a minus sign in an equation. The spacing is subtly different.
And one of the big things that LaTeX does is it basically allows you to declare certain operations in certain contexts as a math operator versus just a symbol versus just a tagged group of characters, and it correctly handles the spacing depending on what kinds of characters are around the operator in question. And so LaTeX basically produces really nice-looking mathematics at the cost of this markup, which looks like I kind of smashed a keyboard that only had 3 characters. It’s the exact opposite of the equation editors: instead of having a button for every imaginable character, you only have 3 buttons. The buttons are backslash, open curly brace, and close curly brace, and somehow permuting those characters is supposed to get you any possible mathematical output. Those are just the two ends of the spectrum.
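A few lines of LaTeX show the spacing behavior being described. These are illustrative fragments rather than a complete document, and `\operatorname` assumes the `amsmath` package is loaded:

```latex
% Text mode: a hyphen joins words, with no surrounding space.
a well-known result

% Math mode: the same key produces a minus sign, spaced as a binary operator.
$a - b$

% Unary minus: TeX sees no left operand, so the spacing is tighter.
$-b$

% Declaring a custom operator so it is set upright and spaced like \sin or \log.
$\operatorname{rank}(A) \le n$

% Backslash and curly braces doing most of the work:
$\frac{\partial f}{\partial x}$
```

The author never specifies the spacing by hand; it follows from what kind of object each symbol has been declared to be.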
00:40:41 - Speaker 3: Yeah, I used to do my analysis homework in college in LaTeX, and I remember when I first looked up how you would input these formulas in LaTeX, I was like, that can’t be right. This is not the best way in the world to do this. In fact, that’s it, that’s the one and only way.
00:40:53 - Speaker 1: It really is, it’s terrifying. It’s the one and only way, and the wild part is there are people who are super, super good at LaTeX. They can live-TeX their lecture notes. I was never nearly that fast, but some people can do it, usually with extensive use of macros. Macros are another selling point of LaTeX, as you can define this kind of custom shorthand for operators you use a lot. But anyway, yeah, so you have LaTeX sort of at the other end of the spectrum, really quite unreadable oftentimes. It’s like a write-only format, many times.
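As a small illustration of the macro shorthand mentioned here, the following LaTeX defines two abbreviations and then uses them. The macro names `\R` and `\norm` are arbitrary choices for this example, not standard commands:

```latex
% In the preamble: define shorthand once.
\newcommand{\R}{\mathbb{R}}              % needs amssymb or amsfonts
\newcommand{\norm}[1]{\lVert #1 \rVert}  % needs amsmath

% In the notes: the shorthand keeps the source short enough to type live.
Let $v \in \R^n$ with $\norm{v} = 1$.
```

The trade-off is that a document full of personal macros is even harder for someone else to read, which feeds the write-only reputation.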
00:41:23 - Speaker 2: And regular expressions come to mind on that as well, yeah.
00:41:26 - Speaker 1: It’s exactly the same zeitgeist, I think. It turns out that figuring out how to have a combination GUI and plain-text interface that allows you to be in a rich text editor like Notion, then go into an inline equation field to have an inline symbol, and then go back into the GUI editor was just very unexplored territory.
And it kind of makes sense that lots of people don’t prioritize this, because many people at Notion rightfully had the question, like, oh, is this something we should be working on? But first of all, it turned out that if you actually tallied up our user requests, inline math was near the top of editor feature requests. And then more generally, it turns out that because this is a prerequisite for many researchers and for students, you can get a lot of people on your platform who rely on it, you know, as a student to take notes and something like that, because there’s literally no alternative. And then they are able to stick around and use the platform for all kinds of other things.
So this is just kind of a plug that more editors should implement this.
But yeah, I thought that this project was really interesting because in the interaction paradigm, you want to capture a lot of the things that are very fluid about editing regular text. So for instance, we knew it was important that you should be able to use the arrow keys to move left and right, kind of straight through a token without editing it if you wanted, or, if you wanted, to be able to go into a token and edit it using the arrow keys. You shouldn’t have to use the mouse to click, although, of course, you should also be able to use the mouse to click. And when you have this formatted equation, we made the decision that the rendered equation would be represented as this atomic token. So if you were highlighting text to copy and paste and move around, it would be like highlighting a single character that would just be the whole equation. But of course, you could go in and edit the equation any way you wanted in kind of this pop-up text editing interface.
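One way to picture the atomic-token decision is the following Python sketch. It is not Notion's implementation; the `Equation` class, the helper names, and the choice to serialize a copied equation between dollar signs are all assumptions made for illustration. The idea it shows is that the cursor and selection count an equation as one unit, no matter how long its markup is.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Equation:
    source: str  # LaTeX markup behind the rendered token


Unit = Union[str, Equation]


def to_units(segments: List[Unit]) -> List[Unit]:
    """Flatten a line into cursor units: one per character, one per equation."""
    units: List[Unit] = []
    for seg in segments:
        if isinstance(seg, Equation):
            units.append(seg)
        else:
            units.extend(seg)  # a string contributes one unit per character
    return units


def move(cursor: int, delta: int, units: List[Unit]) -> int:
    """Arrow-key movement, clamped to the line; an equation costs one keypress."""
    return max(0, min(len(units), cursor + delta))


def copy_selection(units: List[Unit], start: int, end: int) -> str:
    """Serialize a selection; a selected equation always comes out whole."""
    return "".join(
        u if isinstance(u, str) else f"${u.source}$" for u in units[start:end]
    )


line = ["Let ", Equation(r"\hat{x}"), " be"]
units = to_units(line)
cursor = move(4, 1, units)  # one press of the right arrow steps over the equation
```

Entering the token to edit its markup, as described above, would be a separate mode layered on top of this; the sketch only covers moving past and selecting the token.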
I think another thing that’s a subtle interface challenge here is that, like Mark was saying, there is often a disproportionately large number of characters used to represent the equivalent of one character of formatted output. And so that’s something you don’t really take into account. The output is like X with a hat in a sans-serif font, and then there’s like 25 characters of markup that go into that, and you just need to scale the interface appropriately to take that into account.
But I think that it’s really interesting because it shows the power of combining different input and output formats in the same atom, right? So you have a single line of text, and you want to have rich text that’s formatted and stylized and so on, hyperlinks, and then also equations or whatever inline rendered output of another input format that you have. I think that’s really where GUI editors and WYSIWYG editors can shine: being able to combine these input formats and output formats in the same line, in situ. Yeah, I guess you can’t really do that at all with the terminal or something like that, and I say this as someone who uses CLI Vim for everything.
00:44:34 - Speaker 3: This is bringing back so many memories. I wish I had Notion with equation support back when I was a math undergrad. It’s so nice.
00:44:41 - Speaker 1: I’m like the Notion math stan, guardian, I don’t know, something like that. And I’m always keeping track of all the cool things people are doing using equations in Notion.
A lot of people are doing math blogs in Notion, which is really awesome for me to see. Also, I just feel like, having tried lots of other things, there just really isn’t a good alternative short of actually writing LaTeX for your blog, which no one really likes. And yeah, I mean, certainly it’s the kind of thing that I implemented originally, kind of, I was like, I’m gonna do this for myself, and then realized that lots of people would be able to benefit from it.
It’s been really cool to see the reception it gets. The inline math tweets on the Notion Twitter account overwhelmingly get the most engagement and interaction.
And initially, the marketing team was shocked. They thought this would be this super niche feature, but no, it turns out that people love math, and they may not be the most vocal proponents, or they’re used to no one caring about math typesetting, things like that.
For a while, I think it was the case that when I did find an editor that had support for equations of some kind, to me, it was overwhelmingly obvious that the people who implemented it did not regularly use equations for writing. I think you can often tell that with different features. So I think that having that kind of representation, which is not quite the right word, but being able to see a feature that was designed by someone who really cares about using it themselves is really cool for students, researchers, people who are interested in typesetting more mathematical text.
00:46:11 - Speaker 3: Yeah, and I think it’s really important, like you were saying, that it’s mixed media, because you’re combining the equations, the inline equation and the block equation, by the way, in the world-class form, which is LaTeX-based, with a world-class rich text editor with text and images and stuff. It’s really nice. I do think there’s still one frontier here, especially for math, which is the fully gradual process from taking handwritten notes, working out a problem, and drawing squiggly diagrams all the way up through your finished homework. I remember when I was a math undergrad, I would basically have to do the homework twice. You do it once on paper. Nobody could read that, including myself, so then, you know, you do it in LaTeX again. And I always wished there was a way to do it incrementally, where you sort of change equation by equation and diagram by diagram into the final product. And I know there has been some research on, uh, turning equations into LaTeX formulas with machine learning. I don’t know if it can do handwriting, but perhaps someday we’ll get the new support for equations and you can go all the way to the end.
00:47:02 - Speaker 1: Yeah, like you, I share exactly the same frustration that you have to essentially do lots of things twice, and the relative position of everything is ambiguous. LaTeX is what allows you to do things like have subscripts of subscripts, which would be really inscrutable in most people’s handwriting, including my own, and, you know, subscripts of subscripts along with superscripts and things like that. There are just so many ambiguous details, and it turns out, in my experience with anything that tries to automate the transition, that I always end up going through and really rewriting all of the details to be structured in a readable way.
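The nesting that handwriting leaves ambiguous is explicit in LaTeX source, because braces fix the grouping. A few illustrative fragments:

```latex
% A subscript with its own subscript, plus a superscript.
$x_{i_j}^{2}$

% Braces make the grouping unambiguous; compare these two:
$a_{n+1}$   % the subscript is the whole expression n+1
$a_n + 1$   % the subscript is n, and then 1 is added
```

A recognizer working from handwriting has to guess which of these structures was meant from the relative size and position of the glyphs, which is where the manual cleanup described above tends to come from.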
You have this other problem which, back in the days of WYSIWYG web editors like Dreamweaver and Microsoft FrontPage and things like that, you would often end up with, where you try to do any edit on the WYSIWYG side and then you look at the generated HTML and it’s ridiculous. There’s just like 16 nested empty span tags, and no one would ever be able to maintain that.
And my worry is basically that when you automatically create markup for something that has a very complex graphical representation, it’s really one way. You know, maybe it will help you produce a compiled output, but it doesn’t actually help you go back in and edit and tweak the representation later, or it’s just so inscrutable if you do that. It’s kind of also a regex-type situation.
I think we really need to get to some kind of like good intermediate representation that allows you to flexibly go both ways.
And that goes back to something that I think Adam and I were chatting about earlier, which is that a lot of people gripe and complain that LaTeX is the best we have, and, you know, I’m one of them. But it really is the case that LaTeX was just this monumental effort by really a few people, an amount of effort that would be considered really impressive if I were to try to do the same thing but better today, and not a lot of people just have spare time to do this all-in-one text formatting, packaging, document representation project, even though it would have huge impact on the way people write and publish these kinds of documents. And so in many ways we’re sort of just bottlenecked on the fact that it’s hard to do incremental improvements to this particular area. We really depend on these software monoliths to keep us afloat.
00:49:19 - Speaker 2: I’m not nearly as mathy as either of you, but I can’t help but make the comparison from this equation editing to what you mentioned earlier with kind of structured editors and programming, whether there’s lightweight help from your text editor, things like code folding, syntax highlighting and autocomplete, or full structured editing, some of the visual programming stuff we talked about with Maggie Appleton, like Scratch, for example, or these flow-based systems that are fully graphical, and you sort of can’t have it in a bad state. And I can’t help but think there might be some direction like that, that is not necessarily the write-only inscrutable LaTeX, but is not the Microsoft Word one button literally for every symbol you might ever want.
It does seem like there might be some other path, and yeah, I agree it’s a monumental effort, but I mean, mathematics is so important and foundational and so much of human endeavor that certainly seems like one worth investing in, although perhaps hard to reap a profit from, and that makes it harder to put concentrated capital behind it.
00:50:20 - Speaker 1: Yeah, I think that there’s definitely very clear demand for something exactly like what you’re describing, which is somewhere in between the two extremes, and it is really relevant because ACM, which is the Association for Computing Machinery, the academic and professional body really for computer science, they are currently undergoing this, fiasco, maybe. I probably shouldn’t go on the record as calling it a fiasco.
The ACM is currently undergoing this initiative called TAPS, which is The ACM Publishing System, where they are attempting to revise the template by which all computer science research is published and disseminated. The idea behind this is that right now, computer science research is published to these PDFs. Initially they were all two-column PDFs; now I think there are some one-column PDFs. They want to output HTML as the archival format for various reasons, including that it offers a much better reading experience on different screen widths, so phones or tablets, which are increasingly how people are reading papers, not just printed out. And HTML documents are much more accessible than PDFs. PDFs are just really quite inaccessible, especially to screen readers and other assistive technologies that are trying to parse out all the different math or whatever arbitrary formatting you’ve decided to use. The upshot of this, I guess, is that there is currently a group of very smart people who are trying to figure out how in the world we’re going to get people to start writing all of their papers and outputting them in a different format, in a world where everyone is already used to preparing their publications and preprints in LaTeX. And it turns out that even if you solve the problem of what the input syntax should be, rendering math in the browser is an extremely unsolved problem.
00:52:05 - Speaker 3: Yeah, isn’t the state of the art that it like generates PNG and sticks it in the web page?
00:52:09 - Speaker 1: Not exactly, but almost. OK. So MathML, which is an XML dialect, the Mathematical Markup Language, was this effort to build an HTML- or XML-style syntax for typesetting mathematics.
Naturally, it is only implemented in Firefox, so that’s really unfortunate. So in terms of the state of the art, there are basically two libraries that you can use to typeset mathematics. There’s MathJax and KaTeX.
MathJax supports basically all valid LaTeX, including, you know, different environments and equations and things like that.
The problem is that MathJax is very slow. So if you ever go on MathOverflow or another related Stack Exchange site and you see all of these answers with weird gaps, and then, as you watch, the page starts to load all of the rendered equations, bumping everything down one level at a time, that’s MathJax in action.
And oftentimes it is doing what you’re describing, where it is outputting an SVG or a PNG or something like that, and it’s just reflowing the page with every equation.
So then you have KaTeX, which was a library developed at Khan Academy, where they realized that MathJax’s performance was basically just not satisfactory for their exercises and things like that. So KaTeX supports a much more limited subset of all of LaTeX syntax, but it does it all using CSS, basically, and it doesn’t reflow the page for every equation. It’s basically instant to render.
So KaTeX is what we use at Notion. It’s also what’s used in Facebook Messenger, which supports equations, if you’ve ever tried that, and many other websites. And basically it means that your options, if you want to render math, are: only target Firefox; use the limited subset of math that’s supported by KaTeX; or consign yourself to extremely slow, dozens-of-reflows, full-expressive-power rendering to inline PNGs.
And so that’s just not a great situation to be in, and we haven’t even gotten to the question of how people write math. So I would say that people underestimate how open this problem space is.
00:54:17 - Speaker 3: Yeah, man.
00:54:19 - Speaker 1: Just take a moment of silence to like recognize the gravity of the situation.
00:54:23 - Speaker 3: This is an aside, I don’t know if you want to put this in the episode, but now I’m curious. It sounds like both of those are interpreted in the sense that the equations are rendered at load time instead of being compiled down to some like HTML and CSS that you can render without JavaScript. Like, basically, do you need JavaScript to render these pages?
00:54:39 - Speaker 1: Yeah, basically. I should say you also need JavaScript unless you’re pre-compiling to MathML and then hoping that people are using Firefox.
00:54:47 - Speaker 3: Man, I feel like there’s no way that that stuff loads in 10 years, but we’ll see.
00:54:52 - Speaker 1: I actually had this exact argument, again, I don’t know if you want to put this in the episode.
I had this exact argument with Jonathan Aldrich, who’s on the TAPS committee, when we were talking about this, and I think the point was not so much that you can guarantee that the artifact loads exactly the same way in 10 years, but that the representation is rich enough that one could feasibly build software that renders it the same way in 10 years. So it’s more about the fidelity of the underlying representation, where a team of, I guess, digital, you know, archaeologists could recover the work that we were doing, and not so much that we trust the vendors to keep everything stable, which is obviously never going to happen. You know, the only reason PDFs are stable is because of how many trillions of dollars of IP depend on being able to load the PDF the same way as it was written, you know, 30 years ago.
00:55:45 - Speaker 3: Yeah, interesting.
00:55:46 - Speaker 1: Nice. Going back to this idea earlier that Mark mentioned of the spectrum of plain text, rich text, WYSIWYG editors.
One recurring theme for me is thinking about decoupling this spectrum into what is the format, and then what are the editors and tools that we can use to interact with this format, so are they structured, unstructured, etc.
I want to call out Bear, which is a native application for macOS and iOS that does a really great job with this. Bear is basically something in between a WYSIWYG and a plain text editor, in that you’re always editing Markdown documents, and indeed, when you have something that’s bold, you can see the asterisks around it that delimit those characters.
But all of these standard, you know, Control-B, Control-U editor shortcuts work as you would expect.
And more importantly, you can see like the formatting applied in real time.
So when you type star star, hello, star star, it suddenly becomes boldface in this GUI.
And so in many ways it combines the fluidity and the real-time preview of a rich text editor or previewer with the flexibility of ultimately just writing plain text characters. And I think this is a really unexplored area.
I don’t just mean something like: open VS Code or Vim and type characters and then see different formatting labels attached to the results.
I mean like a native application that’s really designed like for end use or end users, that doesn’t fully obscure the input syntax but does real time rendering in place.
It’s not even in a monospace font, right? It makes it feel much more like this is actually the output that you’re targeting, and not just an input step that needs to be pre-processed. I think that there is a lot of room for applications that are kind of in between and in that same space, where it doesn’t entirely obscure what you are writing, but it does give you a lot of the benefits of previewing things and having a GUI application outside of the terminal in terms of capturing the richness of the possible results.
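The in-place rendering described here can be sketched as a styling pass that never deletes the input syntax. This is a hypothetical Python illustration, not Bear's implementation: `style_runs` and the style names `plain`, `marker`, and `bold` are made up, and only the double-asterisk bold construct is handled.

```python
import re
from typing import List, Tuple

# Non-greedy match for **bold** spans; nesting and escapes are ignored here.
BOLD = re.compile(r"\*\*(.+?)\*\*")


def style_runs(text: str) -> List[Tuple[str, str]]:
    """Split text into (style, chunk) runs without dropping any characters.

    The '**' delimiters stay in the output as 'marker' runs, so an editor
    could dim them while drawing the enclosed text in boldface.
    """
    runs: List[Tuple[str, str]] = []
    pos = 0
    for match in BOLD.finditer(text):
        if match.start() > pos:
            runs.append(("plain", text[pos:match.start()]))
        runs.append(("marker", "**"))
        runs.append(("bold", match.group(1)))
        runs.append(("marker", "**"))
        pos = match.end()
    if pos < len(text):
        runs.append(("plain", text[pos:]))
    return runs


for style, chunk in style_runs("say **hello** now"):
    print(f"{style:6} {chunk!r}")
```

Because the runs always concatenate back to the original string, the file on disk stays plain Markdown; only the display changes.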
00:57:52 - Speaker 3: Yeah, I like the Bear approach a lot. Now, are there particular domains or types of documents that you think would be susceptible to this approach, or is it just for rich text specifically?
00:58:01 - Speaker 1: So I was making a list of all of the different traditionally graphical outputs that have corresponding plain text representations, and there are a lot of them. I was thinking about, for example, engraving sheet music, right? Traditionally you would use a desktop program like Finale or Sibelius. Nowadays you have options like MuseScore and Flat, which are more web-based editors, but you see the staff and you click notes into the staff corresponding to where you want the note, and, you know, you use the quarter note or the 8th note cursor to pick the duration and so on.
And then at the other end of the spectrum you have LilyPond, which is kind of like LaTeX, I guess, for engraving sheet music, where you type a very LaTeX-esque syntax and out comes, you know, beautifully typeset sheet music. For me this is a little bit too Gnu edgy, just because when I think of composing music, I’m very much thinking about what the staff looks like, just to be able to visualize chords and counterpoints and things like that.
But I think the upshot is that you could very easily have something in between, where you have a text-based or non-binary representation of a piece of music or a composition, and then you can edit it either using the text editor or using the structured editor of an existing WYSIWYG GUI composition software, or notation software rather, and edit the same representation both ways. And then likewise, you have diagram generation. This is an area that’s been a real pain point for me historically, because you can basically do something really low fidelity, like sketching on paper, but then, if you don’t want to take a picture and upload it to whatever document, all of the other options are very high fidelity. There’s OmniGraffle and Whimsical and Figma, which is even more involved, where you get all of these nice things like lots of styles and force-directed layout and so on and so forth, but it’s quite cumbersome to input a diagram that you sketched in all of 30 seconds into OmniGraffle in its full glory. And then on the plain text end of the spectrum, you have software like Graphviz, and TikZ for LaTeX. Things I really like are Mermaid, which is a Markdown-type syntax for quickly generating diagrams. There’s Svgbob, which is incredible. It basically lets you turn ASCII art into formatted SVG, though, as a brief aside, I don’t actually know what problem this is solving, aside from being incredibly cool, because at least for me, and I consider myself someone who’s fairly artistic, it takes at least as much effort to figure out how to make a really nice ASCII art thought bubble as it does to figure out how to actually do the SVG.
I’ve always really wanted an intermediate representation for diagrams where you can edit it either on the text end, using something like Mermaid to do really fast prototyping for a flow chart or something like that, and then, if I wanted to have more precision and control, I could also pull it into software like OmniGraffle or Figma and make fine-grained tweaks, get my nice force-directed layout, or control where individual nodes were if I wanted fine-grained control over positioning, things like that. I guess I think there are lots of different areas outside of just traditional documents that are ripe for an editor or a representation that learns some things from the plain text approach and some things from the WYSIWYG approach. I think that we are, yeah, we’re getting close to being able to explore those, but I would love to see more work in this area.
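The shared-representation idea described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Mermaid, OmniGraffle, or Figma actually work: the `Diagram` class, the `A -> B` line syntax, and the functions `apply_text_edit`, `apply_drag`, and `to_text` are all hypothetical names invented for this example. The point is only that a text edit and a drag can both update one representation without clobbering each other.

```python
from dataclasses import dataclass, field


@dataclass
class Diagram:
    """Hypothetical intermediate representation shared by both editing modes."""
    edges: list = field(default_factory=list)      # list of (source, target) names
    positions: dict = field(default_factory=dict)  # only hand-placed nodes appear here

    def nodes(self):
        """Node names in order of first appearance in the edge list."""
        seen = []
        for src, dst in self.edges:
            for name in (src, dst):
                if name not in seen:
                    seen.append(name)
        return seen


def apply_text_edit(diagram, text):
    """Text-side edit: re-read 'A -> B' lines, keeping manual positions."""
    edges = []
    for line in text.splitlines():
        if "->" in line:
            src, dst = (part.strip() for part in line.split("->", 1))
            edges.append((src, dst))
    updated = Diagram(edges=edges)
    surviving = set(updated.nodes())
    # Preserve hand-placed coordinates for nodes that still exist.
    updated.positions = {
        name: xy for name, xy in diagram.positions.items() if name in surviving
    }
    return updated


def apply_drag(diagram, node, x, y):
    """Visual-side edit: pin one node without touching the structure."""
    return Diagram(edges=list(diagram.edges),
                   positions={**diagram.positions, node: (x, y)})


def to_text(diagram):
    """Serialize the structure back to the toy text syntax."""
    return "\n".join(f"{src} -> {dst}" for src, dst in diagram.edges)


# Prototype quickly in text, tweak visually, then keep editing in text.
d = apply_text_edit(Diagram(), "Idea -> Draft\nDraft -> Publish")
d = apply_drag(d, "Draft", 120.0, 40.0)
d = apply_text_edit(d, to_text(d) + "\nDraft -> Review")
print(d.nodes())
print(d.positions)
```

In this sketch the text syntax deliberately carries only structure, while positions live alongside it in the representation; that separation is one possible design choice for keeping visual tweaks alive across text edits, not the only one.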
01:01:40 - Speaker 3: Yeah, this is very interesting. One challenge here I think is with plain text and rich text, the structure of the text and structure of the final output are going to be pretty close.
And so that makes it most feasible to have the thing where you’re seeing both worlds superimposed, with the double asterisks on both sides and the bold text, for example. With something like a diagram, if you were to represent it as just a pure text input, it would be a complete mess. It would bear basically no resemblance to the final output. It would just be a string of really opaque characters, and then it would compile out to a nice graph, but it’s kind of hard to go back and forth because of that.
One way to combine these two worlds would be to invoke the command palette metaphor that we see emerging so often. You can imagine, OK, you’re editing a score or you’re editing a graph. And instead of having 1000 buttons around the edge of your screen like you do with these typical applications, the only interface is that you can click on stuff and then you can type stuff in the command palette. So you click where you want to add a note and you say, like, B, you know, BQ, and it puts in the B quarter note, and so on. And similarly with graphs, you could click on a node and you could invoke little commands with your text editor, or perhaps edit the little node locally, represented as a little text box. That’s kind of a way to bridge this issue that a pure text representation would have no obvious correspondence to a 2D or a 3D image, but if you have some way to get more local nodes, it could work well.
01:03:00 - Speaker 1: Yeah, definitely.
01:03:01 - Speaker 2: And the thing that brings to mind for me is our oft-cited favorite tool for thought, which is the spreadsheet. It’s a very, very simple version of that, this 2D layout, but in fact you do click on cells and type something symbolic in there, so you are mixing a visual spatial layout, a very lightweight one, with some symbolic representation.
01:03:23 - Speaker 3: Spreadsheet remains undefeated.
01:03:25 - Speaker 1: One thing that I find really interesting about spreadsheets, that I think is often very underexplored, is that many applications like Airtable, and Notion is also very much guilty of this,
try to capture the power of the spreadsheet as a relational database, or ask what happens if we impose better structure onto the different columns and things like that, but there’s a separate, totally untapped, underexplored area of spreadsheets, which is that it’s basically this canvas, right? Spreadsheets capture everything that people liked about table-based layouts in HTML with none of the stigma associated with it. And so you can create these really complex interfaces that basically just do data and things like that, and place things, like, OK, I’m going to copy this data and bring it over closer to where I’m working now so I can reference it more easily. It’s basically just this grid, right? And that’s totally unstructured. It doesn’t correspond to any kind of relational format, but it’s also a really powerful computation paradigm.
01:04:20 - Speaker 3: Yeah, totally. I think people really love to be able to click somewhere and put stuff there. And a lot of spreadsheet use is just that. They just want to click there and put text or put a color, and there are no formulas at all. And by the way, this goes back to our idea of convergence of the Office document types. I see people using Figma for this a lot. They’re not designers, they’re not designing interfaces. They want to click and put pictures on a 2D canvas, and they want to click and put text there, and you could see a sort of continuation of this world where these things continue to merge as the software gets more sophisticated.
01:04:49 - Speaker 1: Yeah, and then on the subject of diagrams real quickly, I remember that I want to mention Sketch-n-Sketch, which is this project by Brian Hempel, Justin Lubin, and Ravi Chugh at the University of Chicago from a couple of years ago, and the idea there is you have direct manipulation programming for SVG.
So you have this editor, and on the left side you might see the code that outputs a certain SVG, and on the right side you see the SVG itself, and you should be able to do things like directly go in with the mouse, click an anchor point and drag it somewhere, or do other kinds of transformations that people are used to when SVG editing, and it should obviously be reflected in the output, but also change the code that goes into it, and then you can make changes to the code and it will modify the output.
I think this is one of the most successful examples I’ve seen of an editor that actually manages to keep this bidirectional linkage working: when you make manual, direct manipulation edits with the cursor, it doesn’t totally botch your code, and when you make changes with the code, it doesn’t lose all of your edits on the visual side. I think it would be great to see more things like this for more structured areas like diagramming or things like that.
01:05:59 - Speaker 3: So many research projects to do.
01:06:02 - Speaker 1: Yes, lots to do.
01:06:04 - Speaker 2: So Slim, I see a recurring theme in how you think about all of this, whether it’s equations, prose, rich text, musical scores, or diagrams, which is this intermediate format concept. And maybe a straw man or an outside view might come at this thinking, well, being able to see something like Markdown is sort of exposing plumbing that nerdy programmer types might like, but the reason we invented what-you-see-is-what-you-get word processors, whatever, 40 years ago or whatever it was, was to potentially liberate us from that.
But I see that you see the future as not one where those go away.
We want to expose that. There’s some value to that, separately from a fully visual, 100% mapping, with the rendered output and the way you edit it looking precisely the same.
So I think that illuminates somewhat how I would imagine you would answer the question I was going to ask you about the future, but with that in mind, I’ll basically say, yeah, if you look forward, say 5 or 10 years, to what advances either have happened or that you hope to see happen in terms of how rich text works on our computing devices, what does that look like?
01:07:16 - Speaker 1: Yeah, I think it’s exactly like you were describing. We originally had this idea that you would be able to get a WYSIWYG editor, something like Microsoft Word, and totally decouple yourself from this underlying representation. I think that works up until the point where you have lots of different output formats or different ways of viewing the document that people would like to use.
And as soon as you are in a world where even something like, let’s say I want to have two different views in a GUI application, all of a sudden it becomes much more beneficial to have some kind of intermediate format so that you don’t have to do like N times different renderers and parsers and compilation pipelines for all of these internal things.
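The argument for an intermediate format can be made concrete with a small hub-and-spoke sketch. Everything here is an assumption made for illustration: the `Block` representation, the two toy input syntaxes, and the names `READERS`, `WRITERS`, and `convert` are invented, and real document models are far richer. What it shows is that each input needs one reader and each output needs one writer, rather than one converter per input and output pair.

```python
from html import escape

# Hypothetical intermediate representation: a list of (kind, text) blocks,
# where kind is either "heading" or "para".


def read_markdownish(text):
    """Reader for a toy syntax where '# ' starts a heading."""
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# "):
            blocks.append(("heading", line[2:]))
        else:
            blocks.append(("para", line))
    return blocks


def read_plain(text):
    """Reader that treats every non-empty line as a paragraph."""
    return [("para", line.strip()) for line in text.splitlines() if line.strip()]


def write_html(blocks):
    """Writer that renders blocks as minimal HTML."""
    tags = {"heading": "h1", "para": "p"}
    return "".join(f"<{tags[kind]}>{escape(text)}</{tags[kind]}>"
                   for kind, text in blocks)


def write_outline(blocks):
    """Writer that renders an indented plain-text outline."""
    return "\n".join(text.upper() if kind == "heading" else "  " + text
                     for kind, text in blocks)


READERS = {"markdownish": read_markdownish, "plain": read_plain}
WRITERS = {"html": write_html, "outline": write_outline}


def convert(text, source, target):
    """Any source to any target, always passing through the shared blocks."""
    return WRITERS[target](READERS[source](text))


print(convert("# Title\nHello & bye", "markdownish", "html"))
print(convert("# Title\nHello & bye", "markdownish", "outline"))
```

Adding a third output view in this sketch means writing one more writer, and every existing reader can use it immediately.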
01:08:01 - Speaker 2: So there’s a simple example of that. Earlier you mentioned reading academic papers on different size screens, you know, a phone versus a desktop versus a printout, and that even just the basic reflow of the text to a narrower or wider screen, simple as that seems, actually is pretty complicated. There was an approach of designing for several different screen sizes, but now we know that that’s not very futureproof and doesn’t work the way we want. And so as soon as you have anything that’s even slightly dynamic, even something as simple as text reflowing, that’s the place where you think an intermediate format is necessary.
01:08:36 - Speaker 1: Yeah, exactly, it’s not tractable to design a phone version of the website for every possible phone, and then a tablet version for all the tablets, and then a desktop version, but also a projector version, things like that.
So the layout and the appearance is driven by the content itself. And I think that there’s an idea of that for outputting a paper.
If you’re thinking about outputting another artifact like a diagram or something, I think there are situations where it’s really useful to be able to do standard direct manipulation diagram editing, and then also situations where it’s really useful to be able to select all of the text that corresponds to a certain subgraph and just move it somewhere else. Allowing people the flexibility of choosing between those different edit options, depending on what task they’re trying to perform and what problem they’re trying to solve, is a really big area of opportunity.
So I think we’re still at a stage where, with all of these different new editors like Coda or Notion or even Bear or Craft, editors are still very much borrowing from each other a lot and periodically striking out in the direction of, here’s a new kind of block or a new kind of cell or type of text that you can have. And I think that while we’re still in the stage of feature churn around what the editing primitives are that people care about, and what things go in a document, it’s going to be hard to develop any kind of unifying framework or IR for these documents to work together.
I’m hopeful that once we reach a scenario where there’s a little more stasis and maybe more overlap in the capabilities and interests of different editors, you could have this intermediate platform that extends from things like Roam to Notion, or Notion to Airtable, or something like that, for the components that make sense to go into those other platforms, and then you can actually really flexibly move your data around between these areas. And likewise, within applications, maybe you want to be able to start off with something really low fidelity and gradually get something higher fidelity. It would be really nice to have a slider, almost, that allows you to move up and down the ladder of abstraction, but failing that, an intermediate tool that you can plug in and be like, OK, I want to take this bullet list of to-do items and upgrade it into a database for, like, things or something like that. Something that’s more plug and play, that also handles structured data in the same way we have a tool like Pandoc for text. That’s what I’m really excited to see, because I really think of rich text as slowly expanding to include all the things you might want to have in a document, which might include embedded views of other databases or things like that. So just having a more expansive interpretation of rich text that is less constraining with respect to the kinds of artifacts that you can produce, allows you to combine more things together, and has a notion of structure that enables these kinds of really powerful edits, like re-parenting an entire subtree, while also allowing you to do things like select a linear region of text and copy it somewhere else. I think that’s kind of the direction we’re moving in, where we combine a lot of the flexibility of plain text editors that we’ve seen to date with some of the power of having more structure.
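The two kinds of edit mentioned above, re-parenting a whole subtree and selecting a linear run of text, can coexist on one block tree. The following is a minimal sketch assuming a hypothetical `Block` type and hypothetical helpers `flatten`, `find_parent`, and `reparent`; it does not describe the data model of any particular editor.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical document block: some text plus nested child blocks."""
    text: str
    children: list = field(default_factory=list)


def flatten(block):
    """Linear, reading-order view of the tree, which a text selection walks over."""
    lines = [block.text]
    for child in block.children:
        lines.extend(flatten(child))
    return lines


def find_parent(root, target):
    """Return the block whose children include target (by identity), or None."""
    for child in root.children:
        if child is target:
            return root
        found = find_parent(child, target)
        if found is not None:
            return found
    return None


def reparent(root, target, new_parent):
    """Structural edit: move target and its entire subtree under new_parent."""
    old_parent = find_parent(root, target)
    if old_parent is None:
        raise ValueError("target is not in the tree")
    if new_parent is target or find_parent(target, new_parent) is not None:
        raise ValueError("cannot move a block into its own subtree")
    old_parent.children = [c for c in old_parent.children if c is not target]
    new_parent.children.append(target)


todo = Block("Ship release", [Block("Write changelog"), Block("Tag build")])
ideas = Block("Ideas")
doc = Block("Notes", [todo, ideas])

before = flatten(doc)
reparent(doc, todo, ideas)   # one structural edit moves three blocks
after = flatten(doc)
selection = after[2:4]       # a linear selection that ignores nesting
print(before)
print(after)
print(selection)
```

The structural edit operates on the tree, while the selection operates on the flattened reading order, so neither style of editing has to give way to the other.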
01:11:51 - Speaker 3: It’s a pretty exciting future.
01:11:53 - Speaker 1: Yeah, let’s hope we get there.
01:11:54 - Speaker 2: Well, let’s wrap it there. Thanks everyone for listening. If you have feedback, write us on Twitter at @museapphq. You can reach us on email, hello at museapp.com. You can also leave us a review on Apple Podcasts. And Slim, your drive and passion for all things text, and in fact, my mind has been expanded on what we would even think of text as being and what these intermediate formats can do for us in the future. So I’m really excited that you’re on the forefront of this and pushing forward our tools.
01:12:29 - Speaker 1: Yeah, thanks so much for having me. This was great to talk about.