Modifications-0.3

Readable specification version 0.3 modifications

These items were discussed and resolved before the "readable" specification version 0.3 freeze date of July 31, 2012.

  • () [] {} can be postfixed on anything, not just function/macro invocations or symbols.
    • Status: Accepted
  • Non-whitespace character for indentation
    • Status: Accepted, "!" used.
  • Precise semantics and syntax for GROUP, and related syntactical concepts like SPLICE/SPLIT and ENLIST.
    • Status: Agreed: Use SPLIT semantics with marker \\.
  • QUOTE et al. + blank/tab in indentation-expressions
    • Status: Agreement to continue with SRFI-49 meaning: This quotes (et al.) the whole expression following.
  • Blank line means END of top-level expression
    • Status: Accepted
  • Indentation at the top-level DISABLES indentation-expressions
    • Status: Accepted
  • SUBLIST
    • Status: Accepted
  • Precise and formal parser specification
    • Status: To be done, pending current set of items under discussion

() [] {} can be postfixed on anything.

As of the 0.2 spec, (), [], and {} could be postfixed only on symbols or lists. However, this actually complicated the parser implementation: you needed to specifically guard against #f, #t, .2, etc. It also complicated the explanation of the rule, and it limited potential future uses. Removing this limitation simplifies the parser implementation: now we always check for a postfix, and no further checking is needed.

Status: Accepted

Pros:

  • Simplifies the spec and implementation
  • Makes specification more generally consistent
    • Disabling the postfix meaning of () after e.g. numbers used to be the only way that the general rule "all parameters are separated by whitespace" could be violated, i.e. f(9(x)) would be a function call on two parameters, even though no space exists between them. Now this is a function call on a single parameter, the invocation of 9 on the variable x.

Cons:

  • Increases the number of existing s-expressions that get misparsed
    • Arguably, if there are existing s-expressions that could get misparsed by the interpretation of 9(x) as (9 x), those are not properly formatted.
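
As a rough illustration of the "always check for a postfix" rule, here is a minimal Python sketch. The name read_neoteric is hypothetical, the sketch handles only ( ) and bare atoms (no [] or {}, no strings or comments), and it is not the project's actual reader.

```python
def read_neoteric(text):
    """Parse one expression into nested Python lists of atom strings.

    Hypothetical sketch: only "(" ... ")" and whitespace-separated atoms
    are understood.
    """
    pos = 0

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def read_items():
        # Called just after "(": read expressions up to the matching ")".
        nonlocal pos
        items = []
        while True:
            skip_spaces()
            if pos >= len(text):
                raise ValueError("missing closing parenthesis")
            if text[pos] == ")":
                pos += 1
                return items
            items.append(read_expr())

    def read_expr():
        nonlocal pos
        skip_spaces()
        if pos >= len(text):
            raise ValueError("unexpected end of input")
        if text[pos] == "(":
            pos += 1
            value = read_items()
        else:
            start = pos
            while (pos < len(text) and not text[pos].isspace()
                   and text[pos] not in "()"):
                pos += 1
            if pos == start:
                raise ValueError("unexpected character: " + text[pos])
            value = text[start:pos]
        # The 0.3 rule: whatever was just read, always check for a postfix.
        # No special cases for numbers, #t, #f, and so on.
        while pos < len(text) and text[pos] == "(":
            pos += 1
            value = [value] + read_items()
        return value

    return read_expr()
```

With this sketch, read_neoteric("f(9(x))") gives ['f', ['9', 'x']], a call with a single parameter, while read_neoteric("f(9 (x))") gives ['f', '9', ['x']], where whitespace separates the two parameters.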

Non-whitespace character for indentation

In most languages, indentation whitespace is just tab or space. Great flame wars have been fought over which side, the tabs or the spaces, is the One True Way to write indentation. As outcasts of the Great Lisp Way That Eschews Indentation-Based-Languages (IBLs), we the readable-discuss team now back yet another whitespace character: the lowly "." or period.

This idea arose from the problem of indenting code in HTML transport e-mails. Such e-mails would often elide real space indentation, considerably confusing everyone. Eventually the mailing list formed the habit of using "." in place of indentation spaces. dwheeler noted it, and almkglor took it seriously enough to push it.

Status: Provisionally Accepted pending final decision regarding what symbol/character to use for GROUP, SPLICE/SPLIT, or ENLIST: If "." is used for any of those, we will probably have to drop "." as "whitespace" indentation, as we currently consider those to be higher priority/giving more expressive value.

Err, REAL Status: Accepted, but with "!" as the non-whitespace indentation character.

Pros:

  • The original purpose: ensuring that indentation whitespace does not get lost in HTML transport e-mails.

  • It allows the programmer to "draw" indentation levels in very long definitions or sub-clauses, effectively creating a brand-new kind of commentary:

    define foo(x)
      define bar(x y)
      . let ((w {x + y}))
      .   execute-stuff(w)
      .   execute-more-stuff(x)
      .   execute-even-more-stuff(y)
      define quux(u v)
      . execute-stuff(u)
      . execute-even-more-stuff(v)
      cond
        {x < 42}
        . bar(x {x + 42})
        . quux(x {x - 42})
        . bar({x + 42} x)
        . quux({x - 42} x)
        {x < 0}
        . quux(x {x - 42})
        . bar({x + 42} x)
        . quux({x - 42} x)
        . bar(x {x + 42})
        else
          #t
    

Cons:

  • This is not common with other languages.
    • Leading to more than a few LOLs when discussing it!
    • Maybe because they haven't actually seriously thought about it
  • Conflicts with use of ... in Scheme macros:

    define-syntax do-it
      syntax-rules ()
        group
          do-it
            x
          group x
        group
          do-it
            x
            body
            ...
          group begin
                  x
                  do-it
                    body
                    ...
    
    • But we could still use "group ...", especially with properly-designed GROUP syntax. If GROUP gets reduced to a single character like "\":

      define-syntax do-it
        syntax-rules ()
          \
          . do-it
          .   x
          . \ x
          \
          . do-it
          .   x
          .   body
          .   \ ...
          . \ begin
          .   . x
          .   . do-it
          .   .   body
          .   .   \ ...
      

Possible "column 1" variants:
1. Have period only mean indentation in column 1
2. Have period only mean indentation if in a sequence of periods starting at column 1.

These column 1 variants would mean that symbols beginning with period would only need to be escaped if they started in column 1 (less likely).

Another variant: Period has an indentation meaning only if it's followed by tab or space. This reduces the need for escaping (e.g., for the symbol "...").

One question: Should a period used as indentation be treated as exactly equivalent to a space, or should exact character-for-character matching be required when comparing indentation?
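
To make the "only if followed by tab or space" variant concrete, here is a small Python sketch. The function name split_indent and its marker parameter are hypothetical. It defaults to the accepted "!" character, and it deliberately leaves open the question of how two indentation strings should be compared.

```python
def split_indent(line, marker="!"):
    """Split a line into (indentation, rest).

    Hypothetical sketch of one variant: space and tab always count as
    indentation, and the marker counts only when it is immediately
    followed by a space or tab.
    """
    i = 0
    while i < len(line):
        ch = line[i]
        if ch in " \t":
            i += 1
        elif ch == marker and i + 1 < len(line) and line[i + 1] in " \t":
            i += 1
        else:
            break
    return line[:i], line[i:]
```

Under this variant, split_indent("  . ...", marker=".") returns ("  . ", "..."), so the Scheme ellipsis symbol survives without escaping, and a symbol such as !important at the start of a line is not mistaken for indentation.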

Precise semantics and syntax for GROUP, additional/extended semantics ideas such as SPLICE/SPLIT and ENLIST

This is strictly speaking not a single proposal but rather a set of related proposals. Some are intercompatible, others are not.

For our indentation system we were using Scheme SRFI-49, but as we experimented with it we had various issues. Thus, we are tweaking it based on our experimentation to work better. There are also various edge cases we'd like to nail down completely.

Currently, if a line begins with "group", the "group" is silently removed. But "group" is a purely alphabetic symbol, and a useful one at that; abbreviations are typically punctuation in Lisp-based languages. Alternative symbols for doing this, with possibly tweaked semantics, are under discussion.

Definitions

First, let us define what GROUP, SPLICE, SPLIT, ENLIST etc. mean:

  • GROUP - in the old SRFI-49, the group symbol (group in SRFI-49), when occurring as the first non-indentation item on a line, would be an "invisible" symbol that gets removed. The main effect is that when on a line by itself, it would define an indentation level, but would be removed from the indentation - the net effect being that an additional indentation level gets inserted.
  • SPLICE - after finalizing sweet-expressions 0.2, an issue was raised regarding Arc's if syntax and the "keyword" syntax of most Lisp-like languages. SPLICE was proposed as an extension composed of 3 rules: (1) \ at the start of a line is ignored; (2) \ in the middle of the line causes the current line to be ended at that point, and the rest of the line pretends to be a new line on the same indentation; (3) \ at the end of a line causes the next line's indent to be ignored, and that line becomes a continuation of the current line. Examples:

    ; GUILE
    define-modules (example module) \
      :export (example1 example2) \
      :import (ice-9 match)
    ; ARC
    if
      cond1() \ expr1()
      cond2() \ expr2()
      \         expr3()
    
  • SPLIT - A final refinement of the SPLICE concept, originating from a proposal to use the same character for GROUP and SPLICE. With a careful definition in the spec, SPLICE can actually implicitly include the GROUP rules. This generally requires removal of the SPLICE-at-end-of-line rule, and since the term "splice" refers to splicing separate lines together into a single line (which actually only occurs in that rule), removing that rule also calls for renaming the overall rule. SPLIT on a line by itself and SPLIT at the start of a line act much like GROUP, and only the SPLICE-inline rule is actually added on top of the GROUP rule; this allows the SPLIT and GROUP symbols to be the same.

  • ENLIST - a different treatment of the GROUP rules means that if it occurs at the beginning of a line, it immediately inserts a list level, regardless of whether it is by itself or with other elements on the line. This variation was called "ENLIST" as it always causes the insertion of a new list level, in opposition to the initial GROUP rules, which cause this behavior as a side effect of the rule when occurring on a line by itself.
  • GRIT - A variant of SPLIT; again, a single SYMBOL is used. Semantics:
    1. If SYMBOL is at the beginning of the line (after indent), and nothing else on the line (other than maybe space/tab/;-comment), then it creates a list without header using all children lines, like GROUP or SPLIT. Put another way, it stands for the empty first parameter.
    2. If SYMBOL is between datums it works like SPLIT. (What if it's the last one?)
    3. If SYMBOL is at the beginning of the line, and something else is on the line afterwards, it ALWAYS creates a list from what follows, EVEN IF there is only one datum. Note that this is the same as no SYMBOL at all when there is more than one datum on the line or a child line. The theory here is that, e.g., with "let", you can always use SYMBOL at the beginning of the line for the first parameter; if there's only 1 variable/value pair, the "extra" list that wouldn't normally be added will be added; otherwise, nothing different happens, so that you can use a simple mental rule: "with let, the next line is a child line that always begins with SYMBOL".

TODO: other details

Here are some GRIT examples, using \. If you use let, you can simply say "the first parameter better start with an indented SYMBOL". If it's complicated, it can look like this:

. let
. . \ a(a0) b(b0)
. . . c calculate-c0()
. . . d calculate-d0()
. . do-stuff-inside-let()

Ah, but then when you go down to one variable it still works:

. let
. . \ a(a0)
. . do-stuff-inside-let()
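
The difference GRIT rule 3 makes on a single line can be sketched in Python as follows. The name grit_line is hypothetical, the line is assumed to be already split into tokens, child lines are ignored, and \ stands in for SYMBOL as in the examples above.

```python
SYMBOL = "\\"  # a single backslash, as used in the examples above


def grit_line(tokens):
    """Hypothetical sketch of GRIT rule 3 for one line without children.

    tokens is the line already split into datums (kept as strings).
    """
    if tokens and tokens[0] == SYMBOL and len(tokens) > 1:
        # Leading SYMBOL: always a list, even for a single datum.
        return list(tokens[1:])
    if len(tokens) == 1:
        # Ordinary rule: a lone datum stays a bare datum.
        return tokens[0]
    return list(tokens)
```

So grit_line(["\\", "a(a0)"]) yields the one-element list ['a(a0)'], which is what let needs for a single binding, while grit_line(["a(a0)"]) yields the bare datum 'a(a0)'. With two or more datums, the leading SYMBOL changes nothing.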

Aside from the discussion of the above set of rules, there is also a proposal to reduce the length of the symbols used for syntax. In 0.2 the only rule existing was GROUP and it used the lengthy symbol group. Proposals currently exist to use shorter symbols, preferably one-character non-alphabetic ones to serve for the above rule variations:

  • \\ (BACKSLASH)

    • originally proposed for the SPLICE rule, since a backslash at the end of the line usually means that the next line is a continuation of the current line in most other languages and syntaxes (Bourne Shell, C, etc.).
    • the SPLIT proposal (which removes the SPLICE-at-end-of-line rule and merges the GROUP rule with SPLICE-at-the-start) inherits this proposed character from the SPLICE proposal, but the removal of the SPLICE-at-the-end-of-line rule makes a backslash at the end of a line act differently from the expected meaning from other languages.
  • . (PERIOD)

    • proposed as an alternative to the group symbol for the GROUP rule.
    • proposed also as an alternative symbol for the SPLIT rule (also known as "END" rule in this case - a period ends a sentence, and a period in the middle of a line ends the previous expression and starts a new expression at the same indentation level).
    • conflicts with the "PERIOD as indent whitespace" proposal.
  • ~ (TILDE)

    • proposed as a symbol for ENLIST - this proposal allows both the original GROUP and the new ENLIST behavior to exist side-by-side, using different symbols for the two behaviors.
    • conflicts with the meaning of ~ in Arc.
  • @ (AT)

    • another proposed symbol, not assigned to any specific rule proposal.

Positions

As of July 15, the situation is:

Pool of symbols: ~ \ . ! $ % ^

Then we have a bunch of concrete proposals:

  1. almkglor:

    Don't use GROUP
    Don't use SPLICE
    Use SPLIT
    Use ENLIST

    Use 2 symbols from our pool of syntax symbols. Current proposal: \ = SPLIT, ~ = ENLIST

  2. dwheeler:

    Don't use GROUP
    Don't use SPLICE
    Use SPLIT
    Don't use ENLIST

    Use 1 symbol from our pool of syntax symbols. Various: ! = SPLIT, ~ = SPLIT, \ = SPLIT (in order of preference)

  3. arne_bab:

    Use GROUP + GROUP-inline=dotted list
    Use SPLICE - SPLICE-at-beginning rule
    Don't use SPLIT
    Don't use ENLIST

    Use 2 symbols from our pool of syntax symbols. Current proposal: . = GROUP, \ = SPLICE

Examples

SPLIT example. The code below assumes: \ = GROUP = SPLICE, .-at-indent = whitespace.

define-syntax doing
  syntax-rules (let)
    \
    . doing
    .   let var \ value
    .   body
    .   \ ...
    ..\ let ((var value))
    .     doing
    .       body
    .       \ ...
    \
    . doing
    .   x
    ..\ x
    \
    . doing
    .   x
    .   body
    .   \ ...
    ..\ begin
    .     x
    .     doing
    .       body
    .       \ ...

; inspired by Haskell, of course!
doing
  let x \ compute-x(foo bar)
  let y \ compute-y(foo bar)
  {x + y}

An older description of splicing using "\":

  1. If it's between items on a line, it's interpreted as a line break to
    the same indentation level.
  2. Otherwise, if it's at the beginning of a line (after 0+
    spaces/tabs), it's ignored - but the first non-whitespace
    character's indentation level is used.

This is mainly to handle named parameters more gracefully, e.g.:

  myfunction
    :option1 \ f(a)
    :option2 \ g(b)
    :option3
    \ h(c)

could map to (myfunction :option1 (f a) :option2 (g b) :option3 (h c)). Note that f(a) or g(b) could be the beginning of a complex program using indentation, since \ does not turn off indentation.
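
The two rules of this older description can be sketched in Python for the simple case shown above. The name read_child_lines is hypothetical, and the sketch assumes every child line sits at the same indentation with no nested children. Datums such as f(a) are kept as strings rather than parsed.

```python
SPLIT = "\\"  # a single backslash, as in the example above


def read_child_lines(head, child_lines):
    """Hypothetical sketch: build one list from a head and its child lines.

    A marker between items acts as a line break at the same indentation;
    a marker at the start of a line is simply ignored.
    """
    items = [head]
    for line in child_lines:
        segment = []
        # The extra marker at the end flushes the last segment of the line.
        for token in line.split() + [SPLIT]:
            if token != SPLIT:
                segment.append(token)
            elif segment:
                # One datum stays bare; several datums form a list.
                items.append(segment[0] if len(segment) == 1 else segment)
                segment = []
    return items
```

Feeding it the head myfunction and the four child lines of the example gives ['myfunction', ':option1', 'f(a)', ':option2', 'g(b)', ':option3', 'h(c)'], matching the mapping described above once f(a) and friends are read as (f a).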

QUOTE in indentation-expressions

In the current spec, during indentation processing, if the first non-indentation character is the abbreviation for quote followed by whitespace, the rest of the expression is processed at the given indentation level starting at the next whitespace; then that expression is quoted. Similar for the other usual abbreviations. Thus:

' x y z

means (quote (x y z)).

While:

'x y z

means ((quote x) y z).

This is straight from (and consistent with) SRFI-49. Is this the right way to go?

Pros:

  • Provides an easy way to escape/quote complex expressions
  • Consistent with Scheme SRFI-49.

Cons:

  • Yet another rule to explain and implement.
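
The two readings contrasted in this section can be sketched in Python. The name read_quote_line is hypothetical, only the ' abbreviation is handled (the other abbreviations would follow the same pattern), and the line is assumed to have no child lines.

```python
def read_quote_line(line):
    """Hypothetical sketch of the SRFI-49 quote rule for a single line."""

    def datum(tokens):
        # One datum stays bare; several datums form a list.
        return tokens[0] if len(tokens) == 1 else tokens

    tokens = line.split()
    if tokens and tokens[0] == "'":
        # Quote followed by whitespace: quote the whole rest of the line.
        return ["quote", datum(tokens[1:])]
    # Otherwise a quote attached to a datum quotes only that datum.
    parsed = [["quote", t[1:]] if t.startswith("'") and len(t) > 1 else t
              for t in tokens]
    return datum(parsed)
```

Here read_quote_line("' x y z") gives ['quote', ['x', 'y', 'z']], while read_quote_line("'x y z") gives [['quote', 'x'], 'y', 'z'].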

SUBLIST

Initially proposed 2012-07-19.

Alan Manuel Gloria noted that he had to code this expression and found it ugly:

force(car(force(unwrap-box(s))))

Proposal: If the SUBLIST symbol "$" is found in the middle of a line, then it is an implied "promise" to insert a "(" at that point with a ")" automagically inserted at the end of the block (i.e. before the next line with the same or lesser indent than this one). So:

force $ car $ force $ unwrap-box s

is equivalent to:

(force ( car ( force ( unwrap-box s ))))

Thoughts:

  1. Haskell uses it for a highly similar meaning
  2. Think of $ as a tiny open parenthesis (, a vertical line, and a tiny close parenthesis ). So $ is an implied promise to open a list, which will automatically get closed somewhere down the road (the vertical line).

Pros:

  1. Simplifies this case by reducing parentheses

Cons:

  1. Yet another rule.
  2. Not clear that this case is common enough to be worth creating a special abbreviation.

Concrete SUBLIST proposal

Formal description

Add the following production for (i-expr lvl):

(i-expr lvl) -> head hspace* SUBLIST hspace* (i-expr lvl)
  ; head yields a list
  append($1 list($5))

Note that further consideration must be made, in that SUBLIST
is not allowed at the start of a line - i.e. head shouldn't be empty.

Implementation-centric description

In readblock-internal, if the SUBLIST symbol is encountered,
clear any remaining horizontal whitespace after it and recurse
into readblock-clean. Take readblock-clean's returned object,
wrap it in a singleton list, and return with the sub-call's
exited indent.

If SUBLIST symbol is encountered in readblock-internal while
first? is #t (indicating start of line), signal an error instead.

Example Transformations

foo $ bar
=> (foo bar) ; the implied promise isn't taken

foo $ bar nitz
=> (foo (bar nitz))

foo bar $ nitz quux
=> (foo bar (nitz quux))

foo bar $ nitz
  quux meow
=> (foo bar (nitz (quux meow)))
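
The production and the transformations above can be checked with a short Python sketch. The name read_sublist is hypothetical, the line is assumed to be already split into tokens, and any child lines are passed in as already-read datums.

```python
SUBLIST = "$"


def read_sublist(tokens, children=()):
    """Hypothetical sketch of the SUBLIST production.

    tokens: the datums on the line, as a list of strings.
    children: the already-read child lines (each a datum), if any.
    """
    if SUBLIST in tokens:
        at = tokens.index(SUBLIST)
        head = tokens[:at]
        if not head:
            raise ValueError("SUBLIST is not allowed at the start of a line")
        # append($1 list($5)): the rest of the block becomes ONE element.
        return head + [read_sublist(tokens[at + 1:], children)]
    items = list(tokens) + list(children)
    # A lone datum stays bare, which is why "foo $ bar" is just (foo bar).
    return items[0] if len(items) == 1 else items
```

For instance, read_sublist("force $ car $ force $ unwrap-box s".split()) gives ['force', ['car', ['force', ['unwrap-box', 's']]]], and read_sublist("foo bar $ nitz".split(), children=[["quux", "meow"]]) gives ['foo', 'bar', ['nitz', ['quux', 'meow']]].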

Indentation at top-level

This proposal is that if the first line is indented, indentation processing is disabled for that line.

Pros:

  • This makes it easy to temporarily disable indentation processing
  • Improves backward compatibility even more
  • Easily handles the case "text does not start in column 1" which was inconsistently handled before.

Accepted.
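
Together with the accepted rule that a blank line ends a top-level expression, this decision can be sketched in Python. The name top_level_blocks is hypothetical. The sketch treats only space and tab as leading indentation and ignores the fact that a blank line inside parentheses would not end an expression.

```python
def top_level_blocks(text):
    """Hypothetical sketch: yield (lines, indentation_enabled) per block.

    Blocks are separated by blank lines. Indentation processing is
    enabled only when the block's first line starts in column 1.
    """

    def starts_in_column_1(line):
        return line[0] not in " \t"

    block = []
    for line in text.splitlines():
        if line.strip() == "":
            if block:
                yield block, starts_in_column_1(block[0])
                block = []
        else:
            block.append(line)
    if block:
        yield block, starts_in_column_1(block[0])
```

Given the text "define x\n  5\n\n  (display x)\n", the first block is read with indentation processing and the second, which starts indented, is read without it.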

FORMAL Parser Spec

This will need to be done once the major decisions are made. The expectation is that this would be in the git repository, since this is more like code. It will be based on the text in [Solution].


Related

Wiki: Join
Wiki: Modifications-0.4
Wiki: Solution
