Structural OCaml Editing in Emacs

dan-robertson · on March 15, 2020

Extract expression looks amazing.

These days I write a lot of ocaml. I used to write a bit of Common Lisp though I don’t write much of that nowadays.

The thing I miss most when writing ocaml is paredit, a mode for emacs to edit sexps keeping them balanced. There is a more generic mode, smartparens, which can be used in other modes but it doesn’t work super well for ocaml: it can’t automatically add the required delimiters (though it isn’t clear that would always be wanted) and it struggles with multi-char delimiters, eg:

  (* comments *)
  [| arrays |]
  {| unescaped strings |}
  {foo| harder unescaped strings |foo}

And because the structure of ocaml involves other delimiters that smartparens can’t cope with, you miss out on a lot of structure editing.
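For readers less familiar with the syntax, here is a small sketch of the delimiters listed above in a compilable file. The names (arr, raw, tagged, total, M) are made up for illustration, and treating keyword pairs such as begin/end and struct/end as the "other delimiters" is an assumption about what is meant, not something stated above.

```ocaml
(* A comment; (* comments nest *) so the closing delimiter must be matched. *)
let arr = [| 1; 2; 3 |]
let raw = {|no "escaping" needed here|}
let tagged = {foo|can contain |} without ending the string|foo}

(* Keyword pairs also behave like delimiters (assumption, see the lead-in). *)
let total =
  begin
    let sum = Array.fold_left ( + ) 0 arr in
    sum * 2
  end

module M = struct
  let answer = total
end
```

A pairing mode has to treat (* and *), [| and |], {foo| and |foo}, begin and end, and struct and end each as one matched unit, which is more than single-character bracket matching.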

But that said, maybe it is hard to edit structure that is less plain to see. Swapping two lines like this:

  let foo = x y in
  let bar = a b in
  ...

Seems like a natural operation but the same thing in lisp is quite an unnatural operation (the indentation style of ocaml hides that the second let is really nested inside the first). Incidentally paredit has an operation to do this but it’s pretty confusing (something like: take the sexps backwards from the point to the start of the smallest list strictly containing the point, and swap them with the sexps going backwards from (excluding) the smallest list strictly containing the point to the start of the smallest list strictly containing the smallest list strictly containing the point).
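A minimal sketch of the nesting, assuming some placeholder definitions for x, y, a and b (they are hypothetical, only there so the example compiles):

```ocaml
(* Hypothetical definitions so that "x y" and "a b" are valid applications. *)
let x n = n + 1
let a n = n * 2
let y = 10
let b = 20

(* As usually written: the two bindings look like siblings. *)
let flat =
  let foo = x y in
  let bar = a b in
  foo + bar

(* The same expression with the nesting spelled out in parentheses. *)
let nested =
  let foo = x y in
  (let bar = a b in
   (foo + bar))

(* Swapped: equivalent here only because bar does not mention foo. *)
let swapped =
  let bar = a b in
  let foo = x y in
  foo + bar

let () = assert (flat = nested && nested = swapped)
```

So a "swap lines" command is really swapping an outer binding with an inner one, and it is only safe when the second binding does not use the first.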

Another ocaml operation would be merging/splitting lets into a let ... and. But this would be even more unnatural in the equivalent lisp structure (perhaps one reason is that parallel lets are idiomatic in lisp but sequential lets are idiomatic in ocaml so it isn’t really a common operation to do in lisp).
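To make the merge/split concrete, here is a small sketch with made-up bindings. It also shows a case where merging changes the meaning, because in a let ... and the right-hand sides cannot see each other:

```ocaml
(* Sequential lets. *)
let sequential =
  let p = 1 in
  let q = 2 in
  p + q

(* Merged into one parallel let ... and; same result since q ignores p. *)
let merged =
  let p = 1
  and q = 2 in
  p + q

(* An outer binding that the parallel form below will pick up. *)
let a = 100

(* Sequential: b sees the inner a, so b = 2 and the result is 3. *)
let seq_dep =
  let a = 1 in
  let b = a + 1 in
  a + b

(* Parallel: b sees the outer a, so b = 101 and the result is 102. *)
let par_dep =
  let a = 1
  and b = a + 1 in
  a + b
```

An editor command for merging would presumably need to check for this kind of dependency before rewriting.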

There are a few structure editing features that sort of work in the normal ocaml modes of Tuareg with merlin. One is a “select more” operation which understands the syntax a bit. Another is a forward/backward defun operation which works a lot of the time and can be useful. Merlin also offers the ability to expand patterns based on the type they are matching (eg to expand _ into (_,_,_) or to the names of the record fields). There is also a “construct” operation which I’ve not really been able to figure out (I usually just abuse the destruct operation somehow) and some kind of type hole syntax (like this: (??)) which only merlin understands and which really confuses the compiler proper.
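A rough before/after of the pattern expansion, written by hand rather than copied from Merlin output, so the exact text Merlin produces may differ. The type and function names are hypothetical:

```ocaml
type shape = { name : string; sides : int }

(* Before expansion the pattern would be a single wildcard:
     match t with _ -> ...
   After expansion it becomes (_, _, _), which can then be named. *)
let sum3 (t : int * int * int) =
  match t with
  | (p, q, r) -> p + q + r

(* Before: match s with _ -> ...
   After: the wildcard is replaced by the record field names. *)
let describe (s : shape) =
  match s with
  | { name; sides } -> Printf.sprintf "%s with %d sides" name sides
```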

I’m pretty excited to try out this structure editing improvement, or indeed any structure editing or type-directed editing features.

gopiandcode · on March 16, 2020

As it happens, my main inspiration for writing this plugin was the extensive support Emacs provides for performing structural edits in lisp. After having experienced the ease and speed with which you can transform lisp programs using the expression-level movement and editing commands, it was rather annoying to find myself frequently relying on character-level movements to work around small issues with tuareg/merlin's movement commands.

Also, after actually setting this whole thing up, I now believe ocaml has the potential to provide an even better editing experience than lisp if structural editing is properly implemented, as the comparative lack of delimiters means that ocaml is quite quick to type in (and structural support should mean that this doesn't come at the cost of movement speed). Hopefully I've got the ball rolling with regards to structural editing of ocaml code, and I'm eager to see how this area progresses.

j88439h84 · on March 16, 2020

> The thing I miss most when writing ocaml is paredit

Have you tried smartparens?

https://smartparens.readthedocs.io/

dan-robertson · on March 16, 2020

I describe it in and after the paragraph you took that quotation from.

j88439h84 · on March 16, 2020

pbiggar · on March 15, 2020

Very cool.

We built a structural editor for an OCaml-like language (Darklang). Our first version was very AST-based, with all movement using the AST. The feedback we got was that AST-based editing is hard to grok and that you need line-based editing too. So we re-implemented and now we have both (although we broke and are re-adding a bunch of the editor refactorings).

gopiandcode · on March 16, 2020

That makes sense - I'd assume restricting movement to just ast-based edits would be too constrained and would ultimately just limit the types of transformations users could perform (especially important when the syntax is wrong, or when making the AST temporarily invalid).

Have you had a look at the various Emacs plugins for editing lisp? Given the simplicity of the syntax, it would be fairly simple to make a lisp editor that only performs AST-based transformations; however, most lisp plugins support a mixture of both AST and line-based transformations. I find this approach to be very natural, allowing me to rely on the AST when I need it, and then being able to ignore it when I don't need it. My main motivation for gopcaml-mode was just to achieve a similar editing experience for OCaml code.

pbiggar · on March 16, 2020

Yeah, I was mostly influenced by paredit, though I had looked at parinfer and a few others.

Question about your approach: what happens if the code is syntactically invalid? Dark prevents code from being syntactically invalid cause it's AST only, but for example ocamlformat just refuses to run if it can't parse the whole file.

gopiandcode · on March 16, 2020

At the moment, it just gracefully disables itself so the movement commands revert to calling the corresponding Tuareg and Merlin operations. This is mainly because I currently rely on the vanilla OCaml parser to build the AST which doesn't handle recovery from syntax errors.
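The fallback idea can be sketched roughly as follows. This is not gopcaml-mode's code: the parser here is a toy stand-in that only checks balanced parentheses, and parse, forward_line and forward are hypothetical names chosen for the illustration.

```ocaml
(* Toy stand-in for a real parser: succeeds only when parentheses balance,
   returning the offsets just past each top-level closing parenthesis. *)
let parse (src : string) : int list option =
  let depth = ref 0 and ends = ref [] and ok = ref true in
  String.iteri
    (fun i c ->
      match c with
      | '(' -> incr depth
      | ')' ->
          decr depth;
          if !depth < 0 then ok := false
          else if !depth = 0 then ends := (i + 1) :: !ends
      | _ -> ())
    src;
  if !ok && !depth = 0 then Some (List.rev !ends) else None

(* Line-based fallback: offset just past the next newline, or end of text. *)
let forward_line (src : string) (pos : int) : int =
  match String.index_from_opt src pos '\n' with
  | Some i -> i + 1
  | None -> String.length src

(* Structural movement when the buffer parses, line movement otherwise. *)
let forward (src : string) (pos : int) : int =
  match parse src with
  | Some ends -> (
      match List.find_opt (fun e -> e > pos) ends with
      | Some e -> e
      | None -> String.length src)
  | None -> forward_line src pos
```

With a buffer that parses, forward jumps to the end of the next top-level form; with an unbalanced buffer it quietly behaves like a plain line movement.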

In the future I plan to move over to the Merlin parser, which has partial support for syntactically invalid buffers (I believe it wraps invalid regions with another syntactic construct), so it should still be possible to move around in other parts of the buffer.

As an aside, as I mentioned on the OCaml forum, I haven’t found this to be a major issue, as most of the time when I’m moving around (i.e. not inserting text) the code is syntactically correct. I’ve also bound M-RET to insert a type hole (??) so that movement still works when I haven’t completed a function definition, and overall this leads to quite a natural editing experience.