CHAPTER SIX

Editing

One tool which has not been discussed is a way of creating files that can be manipulated by the other tools which have been presented. copyprog, like its Unix analogue, cat, is usable as an editor, for very simple tasks.

A better editor can be built around the regular expression engine presented in the last chapter, a line editor similar to the Unix editor ed.

PROGRAM

  edit - edit text files

USAGE

  edit [ file ]

FUNCTION

  edit is an interactive text editor that reads command lines from its
  input and writes display information, upon command, to its output.
  It works by reading text files on command into an internal "buffer"
  (which may be quite large), displaying and modifying the buffer
  contents by other commands, then writing all or part of the buffer
  to text files, also on command.  The buffer is organized as a
  sequence of lines, numbered from 1; lines are implicitly renumbered
  as text is added or deleted.

  Context searches and substitutions are specified by writing text
  patterns, following the same rules for building patterns as used by
  find.  Substitutions specify replacement text following the same
  rules as used by the program change.

  Line numbers are formed from the following components:

    n             a decimal number
    .             the current line ("dot")
    $             the last line
    /pattern/     a forward context search
    \pattern\     a backward context search

  Components may be combined with + or -, as in, for example,

    .+1           sum of . and 1
    $-5           five lines before $

  Line numbers are separated by commas or semicolons; a semicolon sets
  the current line to the most recent line number before proceeding.

  Commands may be preceded by an arbitrary number of line numbers
  (except for e, f, and q, which require that none be present).  The
  last one or two are used as needed.  If two line numbers are needed
  and only one is specified, it is used for both.  If no line numbers
  are specified, a default rule is applied.

  In alphabetical order, the commands and their default line numbers
  are:

    (.)    a            append text after line (text follows)
    (.,.)  c            change text (text follows)
    (.,.)  dp           delete text
           e file       edit file after discarding all previous text,
                        remember file name
           f file       print file name, remember file name
    (.)    i            insert text before line (text follows)
    (.,.)  m line3      p move text to after line3
    (.,.)  p            print text
           q            quit
    (.)    r file       read file, appending after line
    (.,.)  s/pat/new/gp substitute new for occurrence of pat (g
                        implies each occurrence across line)
    (1,$)  w file       write file (leaves current state unaltered)
    (.)    =p           print line number
    (.+1)  newline      print one line

  The trailing p, which is optional, causes the last affected line to
  be printed.  Dot is set to the last affected line, except for f, w,
  and =, for which it is unchanged.

  Text entered with a, c, and i is terminated with a line containing
  just a '.'.

  The global prefixes cause repeated execution of a command, once for
  each line that matches (g) or does not match (x) a specified text
  pattern:

    (1,$)  g/pattern/command
    (1,$)  x/pattern/command

  command can be anything but a, c, i, or q, and may be preceded by
  line numbers as usual.  Dot is set to the matched line before
  command is done.

  If the command line argument file is present, then the editor
  behaves as if its input began with the command e file.  The first
  filename used is remembered, so that a subsequent e, f, r, or w
  command can be written with no filename to refer to the remembered
  filename.  A filename given with e or f replaces any remembered
  filename.

EXAMPLE

  Don't be silly.

BUGS

  The p suffix is only implemented for = and d.

Regex pattern parsing revisited

Before getting into the editor, one problem should be addressed. The getpat and getpat' functions from the previous chapter are unspeakably ugly as well as being difficult to understand and maintain. Since the editor promises to require more parsing than that, it seems reasonable to dig up more powerful parsing tools.

I would rather keep this entire project as one Haskell source, so I did not use Happy, the Haskell yacc-alike. Fortunately, the Haskell library comes with Parsec, "a fast combinator parser". Using Parsec, the parser is built using a series of rules, each defined as a Haskell monadic function.

In a sense, using Parsec to parse regular expressions for use in my home-made nondeterministic regex engine is something like using an automated factory to produce stone tools. But, where would we be if we didn't occasionally knap ourselves some stone tools, and why shouldn't we be comfortable while doing so?

(Note: this parser is the first I have written using Parsec. Take it with a grain of salt.)

One minor change is that, instead of directly parsing to the Pattern functions, this parser builds an intermediate ParsePattern list. While developing and debugging the parser, it is helpful to see the intermediate data structures, and it is easy to show a pure data structure such as ParsePattern while it is very difficult to show a function like Patterns.

data ParsePattern = EOL
                  | Any
                  | Lit String
                  | Class String
                  | NClass String
                  | Star ParsePattern
                    deriving Show

There are a couple of places in the regex pattern grammar where characters can be escaped using '@' as an escape character. escChar provides a grammar rule for handling those instances.

It is a monadic action that matches a '@', followed by any character and returns a string made up of the escaped character. If the '@' is not found, or is not followed by a character, the action will fail (and hopefully some other action will take up the slack).

escChar     :: P.Parser String
escChar     = do { P.char '@'; ch <- P.anyChar; return [esc ch] }

dollar is a similar action; it matches a '$' character, and returns it as a string. This rule is used to match '$' characters in the body of the expression, as opposed to the $ end-of-line anchor. See the endAnchor function for that rule.

dollar      :: P.Parser String
dollar      = do { P.char '$'; return "$" }

literalChar is an action that matches any character except the ones listed, "$@?*[". (These are the active metacharacters in the body of the expression.) Like escChar and dollar, it returns the character as a string.

literalChar :: P.Parser String
literalChar = do { c <- P.noneOf "$@?*["; return [c] }

literalP (which is a poor name) is an action that combines the three previous actions, collecting the result of whichever one succeeds, and returns that result as a literal pattern element, Lit. The '<|>' combinator is the choice operator; if the left parser operand succeeds, its result is the result of the expression; if it fails, the right parser operand is attempted.

Keeping this rule (and the patterns rule below) simple is the reason the first three functions return strings.

literalP    :: P.Parser ParsePattern
literalP    = do
              l <- dollar <|> escChar <|> literalChar
              return (Lit l)

endAnchor matches a '$' character like the dollar rule above, but in this case it must be followed by the end of the input. If the rule matches, the result will be the EOL pattern element.

This rule is responsible for the correct discovery of the '$' end-of-line anchor; in a combination of it and literalP above, this rule must be checked first, to avoid having literalP eat the '$' as a literal character. See pattern below.

endAnchor   :: P.Parser ParsePattern
endAnchor   = do { P.char '$'; P.eof; return EOL }

dotChar is trivial (and poorly named).

dotChar     :: P.Parser ParsePattern
dotChar     = do { P.char '?'; return Any }     -- '?' is normally spelled '.'

The next several rules are used to match character classes. range identifies the "c1-c2" syntax in classes, and returns a string containing all of the characters between c1 and c2.

The rule below uses Parsec's alphaNum rule to force b (playing the role of c1) and e (likewise, c2) to be upper or lower case letters or numbers.

range       :: P.Parser String
range       = do
              b <- P.alphaNum
              P.char '-'
              e <- P.alphaNum
              return [ b..e ]

classChar is similar to literalChar above; it matches any character other than the one that would terminate the character class.

classChar   :: P.Parser String
classChar   = do { c <- P.noneOf "]"; return [c] }

classMbrs introduces another aspect of Parsec: lookahead. Consider the two patterns, with the character currently being examined by the parser indicated by ^:

[abc]
 ^

[a-z]
 ^

The question is, does the 'a' represent a classChar, i.e., a literal character, or the beginning of a range? Given a rule like:

range <|> escChar <|> classChar

the 'a' would be assumed to be the start of a range, and the point would move to the next character. With the first pattern, the range rule would fail and as a result of the way <|> works, the whole rule would fail.

One way to handle this would be to left-factor this rule in such a way that <|> could be used successfully, but the result would likely be complex and ungraceful. Instead, classMbrs uses 'try', which backtracks when its argument (range, in this case) fails, allowing escChar and classChar (which cannot eat characters before failing) to examine the same input.

Order is important, since classChar would happily accept the same characters as range.

(According to the Parsec docs, it appears that this particular use of try is not as efficient as it could be if it were partially left-factored (it should be up in range itself). However, I think it is clearer here, and the performance of the expression parser is not wildly important.)

classMbrs   :: P.Parser String
classMbrs   = (P.try range) <|> escChar <|> classChar

The next two rules illustrate one of the advantages of the ParsePattern data representation over the Pattern function: a negated class is a normal class that has '^' at the beginning. In the two functions below, cls is used to parse the class contents for both, and ncls merely re-marks the contents as a negated class if necessary.

cls uses the Parsec 'many' combinator, to indicate that there are zero or more classMbrs making up the class. Since the result is a list of classMbrs' results, which are strings, and the final result of cls and ncls is a string, cls forms the concatenation of the list of strings.

cls,ncls    :: P.Parser ParsePattern
cls         = do { c <- P.many classMbrs; return (Class (concat c)) }
ncls        = do { P.char '^'; (Class c) <- cls; return (NClass c) }

Finally, the character class rule, charClass. The overall expression for a character class is surrounded by square braces.

charClass   :: P.Parser ParsePattern
charClass   = do { P.char '['; c <- (ncls <|> cls); P.char ']'; return c }

An individual basic pattern in a regular expression is one of:

  • a '?' any-character,
  • a literal character, or
  • a character class.
pattern     :: P.Parser ParsePattern
pattern     = dotChar <|> literalP <|> charClass

The star modifies the immediately preceding pattern; this rule first tries to match that pattern, then find a '*'. If it succeeds, the first ParsePattern is wrapped in the Star to be later used in an instance of kStarP.

star        :: P.Parser ParsePattern
star        = do { p <- pattern; P.char '*'; return (Star p) }

patterns is the heart of the regular expression parser. It attempts to match zero or more of either

  • a star (a pattern followed by '*'; using lookahead to avoid eating patterns unnecessarily),
  • a $ end-of-line anchor (using lookahead to avoid eating literal '$'s), or
  • a pattern.

Parsec's many joins the results in a list. Then patterns matches the end of the regular expression.

patterns also applies an optimization allowed by the ParsePattern data structure and the strings returned by the original three rules: joinlits finds sequences of Lit's in the list and combines them into a single Lit with a longer string for easier matching.

patterns :: P.Parser [ParsePattern]
patterns = do
           pats <- P.many ( P.try star <|> P.try endAnchor <|> pattern )
           P.eof
           return (joinlits pats)            -- primitive optimization
    where
    joinlits [] = []
    joinlits (Lit s1 : Lit s2 : rest) = joinlits (Lit (s1++s2) : rest)
    joinlits (p : rest) = p : (joinlits rest)

anchoredpattern performs the same function as genpat: it determines whether or not the pattern is anchored at the beginning of the line and adds this information to the parsed pattern.

anchoredpattern :: P.Parser (Bool,[ParsePattern])
anchoredpattern = do
                  anc <- False `P.option` (do { P.char '^'; return True })
                  pat <- patterns
                  return (anc,pat)

parsePattern is the final piece of the Parsec regular expression parser. Its job is two-fold:

  • It uses Parsec's parse function to execute the parser on the input string, picking out the success or failure and the results, and
  • it translates the resulting ParsePattern list into a Pattern function, building it up as it walks the list.
toPattern :: (Bool, [ParsePattern]) -> AnchoredPattern
toPattern (anch, pat) = (anch, toPattern' pat)
    where
    toPattern' []                = doneP
    toPattern' [EOL]             = eolP
    toPattern' (Any      : rest) = anyP `seqP` (toPattern' rest)
    toPattern' (Lit s    : rest) = (litP s) `seqP` (toPattern' rest)
    toPattern' (Class s  : rest) = (classP s) `seqP` (toPattern' rest)
    toPattern' (NClass s : rest) = (nclassP s) `seqP` (toPattern' rest)
    toPattern' (Star p   : rest) = (toPattern' [p]) `kStarP` (toPattern' rest)

parsePattern :: String -> Maybe AnchoredPattern
parsePattern str = case P.parse anchoredpattern "" str
                   of Left _ -> Nothing
                      Right p -> Just (toPattern p)

Buffer representation

For edit, K&P focus on a seven-function interface rather than a specific buffer representation, although the implementation of their interface uses a familiar representation:

Alternatively, we could keep a single array of positions of text lines, and rearrange the position values as lines are deleted or moved, as we did for sort in Chapter 4. Since appending, deleting and reading all change the number of lines, bookkeeping is needed to keep the active positions in a compact set. But finding any given line is fast.

As it turns out, the position-array organization is substantially easier to work with than the linked-list (we tried that once before and it was a mess), and for most operations is just as efficient.

The interface they use is:

  • setbuf - initial buffer
  • clrbuf - clear buffer
  • puttxt - copies a text line after the current line; makes it current
  • gettxt - returns line n
  • blkmove - n1 n2 n3 - puts lines n1-n2 after n3
  • putmark - places the mark m on line n
  • getmark - returns the mark on line n

For my implementation, I have discarded both their interface and their representation. Instead, I am representing the buffer as a zipper'ed list. (A "zipper" is the derivative of a data structure, originally by Gerard Huet, ("Functional Perl: The Zipper", Journal of Functional Programming, Sep. 1997). The zipper is a modification of the data structure to open it up and make some internal position immediately accessible.)

A ZipList is like a List, except that rather than recording the head and tail of the list, it points to a particular location in the list (or more precisely, to a link between two nodes in the list). The ZipList has a r element containing a list representing the tail of the list following the current position, and a l element containing a list representing the elements preceding the current position, in reverse order.

(It also has a p element identifying by number the current position. That is just to make numerical movements easier.)

For convenience (it makes some of the subsequent operations easier), the "current element" of the ZipList is the head of the l list. As a result, a position p of 0 has l empty, and is the position before the first element---this is the same as the interface used by edit.

data ZipList a = ZL { p :: Int, l :: [a], r :: [a] } deriving Show

ziplist :: ZipList a
ziplist = ZL { p = 0, l = [], r = [] }

Two primary operations on a ZipList is to move one or more elements left or right. Moving right means transferring the head of the r list to become the head of the l list (and updating p). Moving left is the reverse. Finally moving to a specific position is a sequence of left or right moves.

zmoveRight, zmoveLeft :: ZipList a -> ZipList a
zmoveRight (ZL { p = n, l = l,  r = (h:t) })
    = ZL { p = (n+1), l = (h:l), r = t }
zmoveRight _ = error "zmoveRight: at right end"

zmoveLeft (ZL { p = n, l = (h:t), r = r })
    = ZL { p = (n-1), l = t, r = (h:r) }
zmoveLeft _ = error "zmoveLeft: at left end"

zgoto :: ZipList a -> Int -> ZipList a
zgoto zl n = if (p zl) < n
             then zgoto (zmoveRight zl) n
             else if (p zl) > n
                  then zgoto (zmoveLeft zl) n
                  else zl

Predicates on the ZipList determine

  • whether the ZL is empty, or
  • whether the current position is at the left or
  • right end of the ZL.
zempty, zleftEnd, zrightEnd :: ZipList a -> Bool
zempty zl = zleftEnd zl && zrightEnd zl
zleftEnd zl = null (l zl)
zrightEnd zl = null (r zl)

The (numeric) current position in the ZipList is directly accessible, as is the current element's contents.

zposition :: ZipList a -> Int
zposition zl = p zl

zget :: ZipList a -> a
zget (ZL { l = (h:t) }) = h
zget _ = error "zget: at left end"

The length of the ZipList, however, requires some computation. The position, p, of the current element is the length of the l list, so adding that to the length of the r list produces the overall length.

zlength :: ZipList a -> Int
zlength zl = (p zl) + (length (r zl))

Inserting a new element into the ZipList is relatively simple; a new ZL is created (this is a functional data structure, after all), with the new element at the head of the l list and the position p incremented. Notice that the rest of the l list, as well as the r list, are unchanged. Inserting is sort of fast, if it's at the current position.

Note that this operation invalidates the positions of the r elements. However, since the only way to get to those elements is to move right, and that operation updates the positions, that invalidation is irrelevant.

zinsert :: ZipList a -> a -> ZipList a
zinsert zl a = zl { l = (a:(l zl)), p = ((p zl) + 1) }

Replacing the current line is also trivial (as long as there is a current line).

zreplace :: ZipList a -> a -> ZipList a
zreplace zl@(ZL { l = (h:t) }) a = zl { l = (a:t) }
zreplace _ _ = error "zreplace: no element"

A slice of a ZipList takes the elements beginning at a given position and accumulating them up to a subsequent position. (Note: the positions are inclusive; this simplifies the edit code using this operation, but may not be the right choice in general.)

zslice :: ZipList a -> Int -> Int -> [a]
zslice zl i j | i <= j = zslice' zl i j []
    where
    zslice' zl i j acc = let zl' = zgoto zl i
                         in if i == j
                            then reverse ((zget zl'):acc)
                            else zslice' zl' (i+1) j ((zget zl'):acc)
zslice _ _ _ = error "zslice"

A ZipList can be converted to or from a regular list.

ztoList :: ZipList a -> [a]
ztoList zl = (reverse (l zl)) ++ (r zl)

zfromList :: [a] -> ZipList a
zfromList lst = ZL { p = 0, l = [], r = lst }

Deleting a range of elements from a ZipList is currently implemented using regular list operations and transitions to and from those regular lists. This may not be the most efficient way to do this, but it is simple.

zdelete :: ZipList a -> Int -> Int -> ZipList a
zdelete zl i j = let lst = ztoList zl
                     (lft,rst) = splitAt i lst
                     (mid,rht) = splitAt (j - i) rst
                 in zfromList (lft ++ rht)

Finally, a ZipList operation needed by blkmove: reversing a slice of a ZipList, once again implemented using list operations.

zreverse :: ZipList a -> Int -> Int -> ZipList a
zreverse zl b e = let lst = ztoList zl
                      (lft,rst) = splitAt b lst
                      (mid,rht) = splitAt (e - b) rst
                  in zfromList (lft ++ (reverse mid) ++ rht)

If anyone is looking for an example of beautiful code in Software Tools, here is my candidate. The original blkmove function handles moving a group of lines to a position following another line in an incredibly elegant manner.

The original has:

blkmove moves lines around; it does most of its work with the help of a subordinate, reverse, which reverses part of an array. Three well-chosen reversals cause the desired group of lines to move. We have illustrated one case here; you should verify the other yourself.
n1 ... n2 x ... y z ... n3 original positions
n2 ... n1 x ... y z ... n3 first reversal
n2 ... n1 n3 ... z y ... x second reversal
x ... y z ... n3 n1 ... n2 third reversal
procedure blkmove (n1,n2,n3 : integer);
begin
  if (n3 < n1 - 1) then begin
    reverse(n3 + 1, n1 - 1);
    reverse(n1, n2);
    reverse(n3 + 1, n2);
  end
  else if (n3 > n2) then begin
    reverse(n1, n2);
    reverse(n2 + 1, n3);
    reverse(n1, n3);
  end
end;

A few things to notice about this function:

  • reverse manipulates the elements [x .. y]; my zreverse handles [x..y).
  • It implicitly assumes n1 <= n2.
  • If n1 = n3 + 1 or n3 = n2, it does nothing, correctly.
  • If n1 <= n3 <= n2, the function does nothing. This is a safe choice, and arguably correct. In a language supporting exceptions, that would be the other option. In any case, the situation should be documented, since the caller is responsible for gracefully handling it.

The Haskell translation preserves the reversing algorithm, although it is somewhat less attractive here.

This is also the only one of the seven basic operations from the original that is preserved by this code. It is only used by the "m" command, too.

blkmove :: ZipList a -> Int -> Int -> Int -> ZipList a
blkmove zl n1 n2 n3 | n1 <= n2 && (n1 > n3 || n2 < n3)
    = if n3 < (n1 - 1) then let zl'   = zreverse zl   (n3 + 1) n1
                                zl''  = zreverse zl'  n1       (n2 + 1)
                                zl''' = zreverse zl'' (n3 + 1) (n2 + 1)
                            in (zgoto zl''' (n2 + 1))
      else if n3 > n2 then let zl'   = zreverse zl   n1       (n2 + 1)
                               zl''  = zreverse zl'  (n2 + 1) (n3 + 1)
                               zl''' = zreverse zl'' n1       (n3 + 1)
                           in (zgoto zl''' (n2 + 1))
      else zl
blkmove _ _ _ _ = error "blkmove: illegal arguments"

There is one down side to this implementation. The original:

Now that we have a working editor, we can concentrate on making it better. Our first concern is the buffer storage, which is intentionally rudimentary in the current version. We can gain considerable capacity by keeping only position information in memory and maintaining the bulkier text on a scratch file.

Only puttxt, gettxt, setbuf, and clrbuf need to be modified to use the external storage.

That is not the case with my implementation. Using a scratch file would mean IO, and ZipList does not touch any IO actions, so a simple rewrite would not be feasable. Sorry.

This is a downside to the separation between stateful IO operations and stateless functional code: it becomes difficult to refactor a stateless code to use (externally invisible) stateful operations. (A similar condition exists with respect to code tracing during debugging.)

Line numbers

There are a number of possibilities for line identifiers in an edit command:

  • A given line number,
  • a relative line number (currently '.' or '$'),
  • a forward-searching regular expression, either given in the command or referring to the previous regular expression ("//").
  • a backward-searching regular expression, or
  • an arithmetic expression involving a line identifier plus or minus some constant.
data ParsedLineNumber = Number Int
                      | Rel Char
                      | EmptyFRe
                      | FRe AnchoredPattern
                      | EmptyBRe
                      | BRe AnchoredPattern
                      | Expr ParsedLineNumber Char Int

Since AnchoredPatterns are built from functions and thus not showable (not instances of the Show class, to be specific), I defined an instance of Show for ParsedLineNumber. An alternative is to define an instance of Show for Pattern or AnchoredPattern. Unfortunately, those are simple types (aliases, in other words, rather than data definitions), and creating an instance for them would be a fairly significant change to existing---one that would not ultimately matter much.

instance Show ParsedLineNumber where
    show (Number i) = "Number " ++ (show i)
    show (Rel ch) = "Rel " ++ (show ch)
    show (EmptyFRe) = "EmptyFRe"
    show (FRe _) = "FRe"
    show (EmptyBRe) = "EmptyBRe"
    show (BRe _) = "BRe"
    show (Expr p c i) = "Expr ("++(show p)++" "++[c]++" "++(show i)++")"

Finally, a series of line identifiers from an editor command are represented by a ParsedLineNumbers type, which is recursive.

  • None is an empty line specification (or a recursive base case).
  • Single is a single line specification (or a recursive base case).
  • Comma is a line specification followed by a comma and further specifications.
  • Semi is similar to Comma, but with a semicolon separator.

The final two cases are used to construct sequences of line specifications, with the difference being that semicolons update the current line, while commas do not.

data ParsedLineNumbers = None
                       | Single ParsedLineNumber
                       | Comma ParsedLineNumber ParsedLineNumbers
                       | Semi ParsedLineNumber ParsedLineNumbers
                         deriving Show

When looking at a ParsedLineNumbers value, one question useful to the editor commands is whether or not the PLN describes a single location or a range. A single location is described by

  • None or Single. The first represents a default single line, the latter an explicit single line.
  • A PLN where the last join-character was a semicolon.

A range location is described by a PLN where the last join-character was a comma.

isUnary :: ParsedLineNumbers -> Bool
isUnary None = True
isUnary (Single _) = True
isUnary pln = case lastJoin pln
              of Comma _ _ -> False
                 Semi _ _ -> True
    where
    lastJoin j@(Comma _ (Single _)) = j
    lastJoin (Comma _ j) = lastJoin j
    lastJoin j@(Semi _ (Single _)) = j
    lastJoin (Semi _ j) = lastJoin j
    lastJoin _ = error "isUnary"

Another useful question is whether any line was explicitly specified.

isDefault None = True
isDefault _ = False

lnNo and lnRel are simple parsers for the simple line specification cases: constant numbers or relative locations.

lnNo :: P.Parser ParsedLineNumber
lnNo = do { n <- P.many1 P.digit; return (Number (read n)) }

lnRel :: P.Parser ParsedLineNumber
lnRel = do { ch <- (P.char '.' <|> P.char '$'); return (Rel ch) }

Regular expressions appear in editing commands in several places: both forward and backward searches, global command, and the intra-line search-and-replace command. The first two differ in the delimiter character around the expression, and the latter two allow the user to specify the delimiter.

delimitedRegex handles parsing all of these cases, with special handling for the escape character followed by the delimiter (other escape sequences are passed through completely untouched).

delimitedRegex :: Char -> P.Parser AnchoredPattern
delimitedRegex delimit = do
                         pat <- P.between reQuote reQuote (P.many1 reChar)
                         case parsePattern pat
                              of Nothing -> fail "illegal regular expression"
                                 Just p -> return p
    where
    reQuote = P.char delimit
    reChar = (P.noneOf (delimit:"@")) <|> escDelimit
    escDelimit = do
                 P.char '@'
                 (reQuote >> return delimit) <|> return '@'

lnRe defines the rule for parsing forward and backward regex searches, including empty regexes that refer to previous patterns.

lnRe :: P.Parser ParsedLineNumber
lnRe = (P.try nullFrwdRe) <|> (P.try nullBkwdRe) <|> forwardRe <|> backwardRe
    where
    nullFrwdRe = do { P.string "//"; return EmptyFRe }
    forwardRe  = do { p <- delimitedRegex '/'; return (FRe p) }
    nullBkwdRe = do { P.string "\\\\"; return EmptyBRe }
    backwardRe = do { p <- delimitedRegex '\\'; return (BRe p) }

lnExp defines the rule for parsing line specification expressions. Currently, these expressions are limited to:

  • a constant line number, relative line identifier, or regular expression search,
  • one '+' or '-' operator, and
  • a constant number.
lnExp :: P.Parser ParsedLineNumber
lnExp = do
        l <- (lnNo <|> lnRe <|> lnRel)
        c <- (P.char '+' <|> P.char '-')
        n <- do { n' <- P.many1 P.digit; return (read n') }
        return (Expr l c n)

lineNumber combines the previous specifier rules to describe a single line identifier...

lineNumber :: P.Parser ParsedLineNumber
lineNumber = (P.try lnExp <|> lnNo <|> lnRe <|> lnRel)

...and lineNumbers describes the sequence of line identifiers.

lineNumbers :: P.Parser ParsedLineNumbers
lineNumbers = P.try lineNumbers_ <|> lineNumber_ <|> return None
    where
    lineNumber_ = do
                  l <- lineNumber
                  return (Single l)
    lineNumbers_ = do
                   l <- lineNumber
                   s <- (P.char ',' <|> P.char ';')
                   ls <- lineNumbers
                   case s of ',' -> return (l `Comma` ls)
                             ';' -> return (l `Semi` ls)

Finally, some of the commands may allow internal whitespace (between the command and an argument, for example).

whitespace :: P.Parser ()
whitespace = do
             P.many1 (P.space <|> P.tab)
             return ()

Command parsing

Each command is parsed from the input line into a Command value. Each Command has associated necessary information---many ParsedLineNumbers indicating where in the buffer the command is to apply. Quit, Filename, and Edit do not---these commands affect the overall editor state.

Quit has no parameters. Filename Nothing (the 'f' command) prints the current file name, while (Filename Just s) renames the buffer to s. Edit requires a string naming the file to be read in, replacing the current buffer.

Equal, the '=' command to print a line number, has an associated Bool; Equal a True indicates that the line should be printed as well as the line number, while Equal a False indicates that it should not. (Delete uses this same 'p' suffix.)

Move requires another ParsedLineNumber, indicating the location to which the lines should be moved.

Read takes a optional string naming the file to be inserted into the buffer, while Write Nothing writes the buffer to the current name and (Write Just s) writes the buffer to the file s.

Subst calls for a regex pattern, a replacement string, and a flag indicating whether or not all occurrences on the line should be changed, or just the first.

GCmd (for the 'g' and 'x' commands) takes a regex pattern, a subcommand, and a flag indicating whether it is a 'g' or an 'x'.

data Command = Quit
             | Print    ParsedLineNumbers
             | Newline  ParsedLineNumbers
             | Equal    ParsedLineNumbers Bool
             | Add      ParsedLineNumbers
             | Insert   ParsedLineNumbers
             | Delete   ParsedLineNumbers Bool
             | Change   ParsedLineNumbers
             | Move     ParsedLineNumbers ParsedLineNumber
             | Filename (Maybe String)
             | Edit     (Maybe String)
             | Read     ParsedLineNumbers (Maybe String)
             | Write    ParsedLineNumbers (Maybe String)
             | Subst    ParsedLineNumbers (Maybe AnchoredPattern) String Bool
             | GCmd     ParsedLineNumbers (Maybe AnchoredPattern) Command Bool

The next group of functions attempt to parse each of the possible commands, returning the resulting Command.

equalcommand, returning Equal, is currently the only command that supports the 'p' suffix modifier, indicating that the current line should be printed. Other commands that should allow this suffix are 'd', 'm', and 's'.

qcommand, fcommand, and ecommand are "raw"; they do not accept line numbers before the command character. This pattern is factored out into the rawcommand parser, which transforms a parser for the command character (and any arguments) and a Command constructor to a Command Parser..

qcommand :: P.Parser Command
qcommand = rawcommand (P.char 'q') (\_ -> Quit)

rawcommand :: P.Parser a -> (a -> Command) -> P.Parser Command
rawcommand pars comm = do { p <- pars; P.eof; return (comm p) }

fcommand and ecommand (as well as rcommand and wcommand below) accept an optional filename argument, and thus use the fileopt parser for that option.

fcommand, ecommand :: P.Parser Command
fcommand = rawcommand (P.char 'f' >> fileopt) Filename
ecommand = rawcommand (P.char 'e' >> fileopt) Edit

fileopt :: P.Parser (Maybe String)
fileopt = (filename <|> return Nothing)
    where
    filename = do
               P.optional whitespace
               s <- P.many1 (P.noneOf " \t")
               return (Just s)

A second group of commands (p, a, i, c, and nl) only allow the line numbers before the command character. These are parsed with the help of locatedcommand. (nl is special---there is no actual command character, so its parser accepts the empty string.)

pcommand, acommand, icommand, ccommand, nlcommand :: P.Parser Command
pcommand  = locatedcommand (P.char 'p') Print
acommand  = locatedcommand (P.char 'a') Add
icommand  = locatedcommand (P.char 'i') Insert
ccommand  = locatedcommand (P.char 'c') Change
nlcommand = locatedcommand (return ()) Newline

locatedcommand :: P.Parser a -> (ParsedLineNumbers -> Command)
               -> P.Parser Command
locatedcommand pars comm = do
                           ln <- lineNumbers
                           pars
                           P.eof
                           return (comm ln)

A third group of commands (=, d, m, r, and w) take both line numbers and an argument. (The argument for = and d is the optional trailing p suffix.) This group of commands is parsed with the help of argcommand. (The p suffix is assisted by suffix.)

equalcommand, dcommand, mcommand, rcommand, wcommand :: P.Parser Command
equalcommand = argcommand (P.char '=' >> (suffix 'p')) Equal
dcommand     = argcommand (P.char 'd' >> (suffix 'p')) Delete
mcommand     = argcommand mpars Move
    where
    mpars = P.char 'm' >> P.optional whitespace >> lineNumber
rcommand     = argcommand (P.char 'r' >> fileopt) Read
wcommand     = argcommand (P.char 'w' >> fileopt) Write

argcommand :: P.Parser a -> (ParsedLineNumbers -> a -> Command)
           -> P.Parser Command
argcommand pars comm = do
                       ln <- lineNumbers
                       p <- pars
                       P.eof
                       return (comm ln p)

suffix :: Char -> P.Parser Bool
suffix ch = (P.char ch >> return True) <|> (return False)

scommand requires multiple parsing steps. First, it gets the line numbers, then the 's' command itself. Then, it looks ahead to the next character to identify the regex pattern quotations using P.getInput. (This step can fail if there is no such character.) Finally, based on that, it reads the pattern itself followed by the substitution string and a final, possible 'g' suffix. (The pattern is read by subre, a parser for a sub-regex also used by gcommand below.)

Additional complexity is called for by the possibility that the substitution regex is empty, thus using the EditState's recorded pattern.

scommand :: P.Parser Command
scommand = do
           ln <- lineNumbers
           P.char 's'
           inp <- P.getInput
           case inp of [] -> fail "illegal pattern"
                       (h:t) -> do
                                p <- subre h
                                r <- P.many (P.noneOf [h])
                                P.char h
                                g <- suffix 'g'
                                P.eof
                                return (Subst ln p r g)

subre :: Char -> P.Parser (Maybe AnchoredPattern)
subre delim = subre' <|> esubre'
    where
    subre' = do
             re <- P.try (delimitedRegex delim)
             return (Just re)
    esubre' = do
              P.string [delim, delim]
              return Nothing

gcommand, parsing both the 'x' and 'g' commands, is relatively more complex. It gathers the line numbers, the command character, then examines the input for the regex quotation character. Finally, it collects the pattern and then the subcommand, which is a subset of the general editor commands, including their line numbers and associated arguments.

gcommand :: P.Parser Command
gcommand = do
           ln <- lineNumbers
           cmd <- P.char 'g' <|> P.char 'x'
           inp <- P.getInput
           case inp of [] -> fail "illegal pattern"
                       (h:t) -> do
                                p <- subre h
                                c <- globalcommand
                                return (GCmd ln p c (cmd == 'g'))
    where
    globalcommand = P.choice [ P.try dcommand, P.try ecommand,
                               P.try fcommand, P.try mcommand,
                               P.try pcommand, P.try rcommand,
                               P.try scommand, P.try wcommand,
                               P.try equalcommand, P.try nlcommand ]

inLine (which is poorly named) parses an individual command line and returns the Command itself and its related information.

There's an old joke about a car designed by the original Unix crew. It seems that this car only has a single instrument in front of the driver: a question mark that lights up under any failing circumstance. The experienced driver, you see, should immediately understand what the problem is.

To support edit's error handling capabilities, when parsing the command line fails, inLine returns Nothing; when it succeeds, inLine returns Just the useful information. (This translation of Either to Maybe is not necessary, but sometimes the code should be subordinate to the joke.)

inLine :: String -> Maybe Command
inLine str = case P.parse line "" str
             of Left _ -> Nothing
                Right cmd -> Just cmd
    where
    line = P.choice [ qcommand,           P.try pcommand,
                      P.try equalcommand, P.try acommand,
                      P.try icommand,     P.try dcommand,
                      P.try ccommand,     P.try mcommand,
                      P.try fcommand,     P.try ecommand,
                      P.try rcommand,     P.try wcommand,
                      P.try scommand,     P.try gcommand,
                      P.try nlcommand ]

Editor state

The editor's buffer is based around a zippered list, described previously, of Strings.

type Buffer = ZipList String

The state of the editor is made up of a buffer, the last recorded regex pattern, and the name of the file being edited.

I am using the State monad to thread this state through the editor's actions. To gain easier access to the individual components of the state, the following functions return (State EditState a) actions acting on the components. These actions will be combined to form an overall (State EditState a) computation handling each command.

  • getBuf and putBuf allow the computation to act on the current Buffer from the state, or return a modified Buffer to the current state respectively.
  • getRe and putRe recover and store the regular expression pattern component.
  • getName and putName recover and store the filename component.
  • Finally, getES recovers the entire EditorState (as an alias for the State monad's get); this is typically used when transferring the state out of the monadic computation.
data EditState = ES {
                     buf :: Buffer,
                     re  :: Maybe AnchoredPattern,
                     fn  :: String
                    }

getBuf :: State EditState Buffer
getBuf = do { st <- get; return (buf st) }

putBuf :: Buffer -> State EditState ()
putBuf b = do { st <- get; put (st { buf = b }) }

getRe :: State EditState (Maybe AnchoredPattern)
getRe = do { st <- get; return (re st) }

putRe :: AnchoredPattern -> State EditState ()
putRe r = do { st <- get; put (st { re = (Just r) }) }

getName :: State EditState String
getName = do { st <- get; return (fn st) }

putName :: String -> State EditState ()
putName n = do { st <- get; put (st { fn = n }) }

getES :: State EditState EditState
getES = get

Line movement

The first, common, part of handling editor commands is in identifying which line or lines to operate on. These functions use provide (State EditState a) monadic computations, typically acting on the buffer component.

First, two helping functions: forwardMatch and backwardMatch take a regex pattern and a buffer, and beginning at the current location in the buffer search forward or backward respectively to find the first line containing a match for the pattern.

forwardMatch :: AnchoredPattern -> Buffer -> Int
forwardMatch pat buf = if match pat (zget buf) then zposition buf
                       else if zrightEnd buf   then error "no match"
                       else forwardMatch pat (zmoveRight buf)

backwardMatch :: AnchoredPattern -> Buffer -> Int
backwardMatch pat buf = if match pat (zget buf) then zposition buf
                        else if zleftEnd buf    then error "no match"
                        else backwardMatch pat (zmoveLeft buf)

doLine interprets an individual ParsedLineNumber (a '.', regex, number, etc.), updating the state as necessary. It returns the line number described by the ParsedLineNumber.

For the FRe and BRe (forward and backward regex searches), the state is updated with the last regex pattern seen. This state component is used by the EmptyFRe and EmptyBRe ParsedLineNumber elements.

doLine :: ParsedLineNumber -> State EditState Int
doLine (Number j) = do
                    b <- getBuf
                    if j < 1 || j > (zlength b)
                       then error "no line"
                       else return j
doLine (Rel '.') = do
                   b <- getBuf
                   return (zposition b)
doLine (Rel '$') = do
                   b <- getBuf
                   return (zlength b)
doLine (FRe pat) = do
                   b <- getBuf
                   putRe pat
                   return (forwardMatch pat (zmoveRight b))
doLine EmptyFRe = do
                  p <- getRe
                  case p of Nothing -> error "no pattern"
                            Just pat -> doLine (FRe pat)
doLine (BRe pat) = do
                   b <- getBuf
                   putRe pat
                   return (backwardMatch pat (zmoveLeft b))
doLine EmptyBRe = do
                  p <- getRe
                  case p of Nothing -> error "no pattern"
                            Just pat -> doLine (BRe pat)
doLine (Expr pln '+' i) = do
                          pos <- doLine pln
                          b <- getBuf
                          if (pos + i) > (zlength b)
                             then error "end of buffer"
                             else return (pos + i)
doLine (Expr pln '-' i) = do
                          pos <- doLine pln
                          if ((pos - i) < 0)
                             then error "beginning of buffer"
                             else return (pos - i)

doLines interprets the collected ParsedLineNumbers, using doLine to handle the individual components. Note the difference between the Comma and Semi definitions. The former remembers the previous component as part of the two components describing a range, while the latter also updates the buffer state with the new current line.

doLines returns a State EditState (Int,Int,Buffer) action; the first Int is the second-to-last line number, the second is the last line number. (These may be important, if the line number specified is unary. In that case, the first Int is current line number of the buffer entering the computation, while the second Int is the specified line.) The Buffer returned is the current Buffer from the state; this element was added after it was noted that all the handlers using a version of doLines immediately recovered the buffer component from the state.

doLines :: ParsedLineNumbers -> State EditState (Int,Int,Buffer)
doLines pln = do
              b <- getBuf
              doLines' (zposition b) pln
    where
    doLines' i None = do
                      b <- getBuf
                      return (i, zposition b, b)
    doLines' i (Single ln) = do
                             j <- doLine ln
                             b <- getBuf
                             return (i,j,b)
    doLines' i (Comma ln lns) = do
                                j <- doLine ln
                                doLines' j lns
    doLines' i (Semi ln lns) = do
                               j <- doLine ln
                               b <- getBuf
                               putBuf (zgoto b j)
                               doLines' j lns

Command interpretation

runStep accepts the current state of the editor and a command handler, an EditStep, that advances the state of the editor. There are two ways for an editing step to change the editor's state:

  • an "internal" state change, manipulating the internal editing data structures, similar to the line movement operations above, and
  • an "external" state change, involving reading lines from the user (for the "a" command, say) or writing file contents to the output. (Some IO operations may not change the editor's state, such as printing a line for the "p" command, but it is simpler to treat all such IO operations together.)

To handle both of these state changes, an editing step is an action in the State monad that returns an action in the IO monad. Each has an opportunity to modify or use the editor state, as seen below in the individual command handlers.

type EditStep = State EditState (IO EditState)

Here is a question for you: If a monadic function returns an "action" representing a computation, how does the computation actually get executed? A Haskell program, overall, is an IO () action---it is an IO action returning unit. How does this action actually do anything?

Part of the answer is below. runStep takes the current EditState and an EditStep (a (State Editstate (IO EditState)) action), and returns an IO (Maybe EditState) action. To do that, it has to execute the State monad's computation, and to do that it uses the runState function. runState has the following type:

runState :: State s a -> s -> (a, s)

The first argument, (State s a), is a State-monadic computation threading state s and returning a value a. The second argument is the initial state, s. The result is a pair consisting of the final return value, a, and the terminal state, s. To produce that a and s, in this case, involves executing the editor command handler beginning in the current state, including the line numbers interpreter above and the individual command handlers below.

The final return value is IO (Maybe EditState), an IO action possibly returning a new editor state. This IO action is needed to read new lines from the user, for example, or to read or write to a file.

Almost all of the commands return Just a new EditState; the only exception is Quit which returns Nothing, signaling the mainloop to terminate.

So, if runState is necessary to execute a State computation, what executes an IO computation? What executes the whole program, overall? A magic bit of the runtime system, that's what. (I believe there are alien astronauts and giant pyramids involved.)

My original version of the editor's code had the editor's main loop read and parse a command from the user, then call doCommand below (determining the command handler), which in turn called runStep to invoke the handler. runStep then invoked the main loop:

runStep :: EditState -> EditStep -> IO ()
runStep es step =
    do
    es' <- (C.catch step' onError)
    mainloop es'
    where
    step' = fst (runState step es)
    onError err = do { putStrLn "?"; return es }

doCommand :: EditState -> Command -> IO ()
doCommand es (Print pln) = runStep es (doprint pln)
...

This mutually tail-recursive loop formed the core of the editor. It had the advantage that the quit command could simply return (), rather than invoking runStep.

Unfortunately, when I began working on the global g and x commands, I wanted to use doCommand inside that command's loop (a loop entered from the main loop and returning to the main loop when all of the lines identified by g or x have been handled). Two options presented themselves:

  • Rewrite mainloop to be a closed loop itself, invoking doCommand and runStep on each iteration, or
  • Wandering into continuation passing style, where doCommand and runStep each took an extra parameter indicating the function to call when runStep was done.

The first option actually seemed easier, although it required returning information back to the main loop about the nature of the command. (A Quit command returns Nothing, to indicate that there is no next state and the editor should exit; other commands return Just a new EditState, giving the loop the next state to iterate on.) The result is runStep, doCommand, and mainloop as seen below.

runStep uses catch to stop error propagation in the command execution, printing the "?" message and returning the state unchanged. If no error occurs, the new state is returned.

runStep :: EditState -> EditStep -> IO (Maybe EditState)
runStep es step = do
                  es' <- (C.catch step' onError)
                  return (Just es')
    where
    step' = fst (runState step es)
    onError _ = do { putStrLn "?"; return es }

doCommand fundamentally demultiplexes the editor commands; it accepts an editor state and a Command, and either

  • returns Nothing for a Quit command, terminating the editor, or
  • identifies the proper command handler, provides it with the information from the Command, and passes it to runStep.

The two complexities here are the substitution command, which has (or does not have) the "g" flag, and the g/x commands, which are identical except for the reversed test. The dosub handler is passed a function transforming each matching line created from the Subst values. The doglobal handler is passed a line predicate determined by the g or x command and the pattern.

doCommand :: EditState -> Command -> IO (Maybe EditState)
doCommand es Quit = return Nothing                                 -- q
doCommand es (Print pln) = runStep es (doprint pln)                -- p
doCommand es (Newline pln) = runStep es (donewline pln)            -- \n
doCommand es (Equal pln p) = runStep es (doequal pln p)            -- =
doCommand es (Add pln) = runStep es (doadd pln)                    -- a
doCommand es (Insert pln) = runStep es (doinsert pln)              -- i
doCommand es (Delete pln p) = runStep es (dodelete pln p)          -- d
doCommand es (Change pln) = runStep es (dochange pln)              -- c
doCommand es (Filename nm) = runStep es (dofile nm)                -- f
doCommand es (Move pln dst) = runStep es (domove pln dst)          -- m
doCommand es (Edit fn) = runStep es (donewfile fn)                 -- e
doCommand es (Read pln fn) = runStep es (doread pln fn)            -- r
doCommand es (Write pln nm) = runStep es (dowrite pln nm)          -- w
doCommand es (Subst pln p r glob) = runStep es (dosub pln sub)     -- s/sg
      where
      sub = (if glob then subline else subline1) (findpat es p) (makesub r)
doCommand es (GCmd pln p c pos) = runStep es (doglobal pln test c) -- g/x
      where
      test = let t = match $ findpat es p in if pos then t else (not . t)

findpat _ (Just p) = p
findpat es Nothing = case re es of Nothing -> error "no pattern"
                                   Just p -> p

Command handlers

Each of the command handlers is an EditStep, a State EditState computation building and returning an IO EditState computation. There may well be a better way to mix-n-match two Monadic computations. However, this seems to work pretty well and produce decent code (with the possible exception of the occasional double return. (When you see

return (return es)

the first return, acting in the State monad, is acting on an IO computation, the second return, which is itself simply returning a EditState into the IO action.)

doprint is the prototypical EditStep. First, it interprets the ParsedLineNumbers, getting two line numbers and the current buffer. Then, it gets a new buffer, moving to the second line specification, and stores this buffer back in the State. Finally, it gets the overall EditState and uses it to return an IO action.

In this case the IO action depends on whether the original PLN was unary or binary. If unary, there is only one line to be printed (the current line in the buffer); if binary, there is a range of lines to be printed.

The final IO action is built with a helper, used in common with several other handlers. This helper, putStrES, prints a string and then returns the EditState in the IO monad.

Note the (i', j'); this ordering ensures that the line numbers are in order. I believe the original would print its error message if they were in the wrong order; helpfully interpreting the range seems to me to be a better thing to do.

doprint :: ParsedLineNumbers -> EditStep
doprint pln =
    do
    (i,j,b) <- doLines pln
    let b' = zgoto b j
    putBuf b'
    es <- getES
    if isUnary pln
       then return (putStrES (zget b') es)
       else let (i',j') = ((min i j), (max i j))
                ls = concat $ intersperse "\n" (zslice b i' j')
            in return (putStrES ls es)

putStrES :: String -> EditState -> IO EditState
putStrES s es = do { putStrLn s; return es }

The newline command is similarly simple, printing a single line and advancing through the buffer if no line was specified with the command.

donewline :: ParsedLineNumbers -> EditStep
donewline pln =
    do
    if isDefault pln
       then do
            b <- getBuf
            putBuf (zmoveRight b)
       else do
            (_,j,b) <- doLines pln
            putBuf (zgoto b j)
    es <- getES
    return (putStrES (zget (buf es)) es)

I mentioned earlier that the '=' command was the only one that currently implemented the 'p' suffix. While the parser above reads the suffix, doequal here is charged with actually performing it.

To do so, the doequal handler is built of two other handlers:

  • an internal, plain doequal' handler, which simply prints the line number, in the manner of doprint, and
  • possibly the doprint handler, if the Bool flag is true.

If the flag is true, the two actions are combined with a (&>) operator, shown below. This operator uses the first EditStep to get a first IO action, uses the second EditStep to get a second IO action, then returns an IO action performing both of the IO actions in order. In this case, doequal' prints the line number and (doprint None) prints the current line.

There are a number of limitations in this combination; in particular, if the first EditStep returns an IO action which changes the EditState, the new EditState will be lost; the second IO action is using the same state as the first action, rather than the result of the first action. In this case, however (and in all uses of (&>)), the first action does not change the state, so the combination works.

doequal :: ParsedLineNumbers -> Bool -> EditStep
doequal pln p = if p then (doequal' pln) &> (doprint None) else doequal' pln
    where
    doequal' pln =
        do
        (_,j,b) <- doLines pln
        putBuf (zgoto b j)
        es <- getES
        return (putStrES (show $ zposition $ buf es) es)

(&>) :: EditStep -> EditStep -> EditStep
(&>) e1 e2 = do { a1 <- e1; a2 <- e2; return (do { a1; a2 }) }

doadd is the primary reason an EditStep returns an IO EditState, rather than an IO () action. In order to allow the user to add lines to the buffer in the state, the IO action has to have the opportunity to fiddle with it before returning it to the main loop.

The fiddling, in this case, is done by addlines, an IO action which reads lines from a file descriptor until it sees a line consisting of only a '.'.

doadd :: ParsedLineNumbers -> EditStep
doadd pln =
    do
    (_,j,b) <- doLines pln
    putBuf (zgoto b j)
    es <- getES
    return (addlines stdin es)

addlines :: Handle -> EditState -> IO EditState
addlines fd es =
    do
    ln <- getline fd
    case ln
         of Nothing -> return es
            Just l -> if l == ".\n"
                      then return es
                      else let b = buf es
                               es' = es { buf = (zinsert b (init l)) }
                           in addlines fd es'

doinsert is a variant of doadd, except that it moves one further line up in the buffer (left, for those of you thinking about the ZipList) before allowing doadd to add the lines.

doinsert :: ParsedLineNumbers -> EditStep
doinsert pln =
    do
    (_,j,b) <- doLines pln
    putBuf (zgoto b (j-1))
    doadd None

dodelete is an example of an EditStep that performs only internal state modifications. It deletes some specified lines from the buffer before doing the double return---one return give back to the State computation an IO computation returning the EditState.

dodelete :: ParsedLineNumbers -> Bool -> EditStep
dodelete pln p = if p then (dodelete' pln) &> (doprint None) else dodelete' pln
    where
    dodelete' pln =
        do
        (i,j,b) <- doLines pln
        if isUnary pln
           then do
                let b' = zdelete b (j-1) j
                putBuf (zgoto b' (prevln j))
           else do
                let (i',j') = ((min i j), (max i j))
                    b' = zdelete b (i'-1) j'
                putBuf (zgoto b' (prevln i'))
        es <- getES
        return (return es)
    prevln i = if i <= 1 then 1 else (i-1)

Given dodelete and doadd, dochange is a simple combination---it deletes the lines to be changed, then uses add to get the replacements.

dochange :: ParsedLineNumbers -> EditStep
dochange pln = (dodelete pln False) &> (doadd None)

Ah domove, the user of that wonderful blkmove monstrosity. And, another example of an internal-only EditStep.

You know, I have used ed moderately frequently in previous years, and I have never used the move command that I can recall. Strange.

domove :: ParsedLineNumbers -> ParsedLineNumber -> EditStep
domove pln dst =
    do
    (i,j,b) <- doLines pln
    k <- doLine dst
    if isUnary pln
       then putBuf (blkmove b (j-1) (j-1) (k-1))
       else let (i',j') = ((min i j), (max i j))
            in putBuf (blkmove b (i'-1) (j'-1) (k-1))
    es <- getES
    return (return es)

dofile: change the state's filename, or show the current name.

dofile :: Maybe String -> EditStep
dofile s = do
           case s of Nothing -> return ()
                     Just f -> putName f
           es <- getES
           return (putStrES (fn es) es)

Here is a strange, yet lovely, EditStep: donewfile simply returns an IO action throwing away the current EditState and replacing it with a shiny new one, read from a file. Due to the magic of readfile, the file does not have to exist before the operation.

donewfile :: Maybe String -> EditStep
donewfile fn' =
    do
    es <- getES
    case fn' of Nothing -> return ( do
                                    b <- readfile (fn es)
                                    return (es { buf = b }) )
                Just f -> return ( do
                                   b <- readfile f
                                   return (es { buf = b, fn = f }) )

doread performs simple internal state changes, simply moving to the correct spot in the buffer, before handing the work over to the IO action, which reads the file and adds the lines to the buffer.

doread :: ParsedLineNumbers -> Maybe String -> EditStep
doread pln f =
    do
    (_,j,b) <- doLines pln
    putBuf (zgoto b j)
    es <- getES
    let f' = case f of Nothing -> (fn es)
                       Just fn' -> fn'
    return (bracket (mustopen f' IO.ReadMode)
                    close
                    (insertfile es))
    where
    insertfile es fd =
        do
        ln <- getline fd
        case ln
             of Nothing -> return es
                Just l -> let b = buf es
                              es' = es { buf = (zinsert b (init l)) }
                          in insertfile es' fd

dowrite is the most complex use of PLNs, because the behavior is different in every possible case. The default, no PLN specified, indicates (1,$). A unary PLN indicates writing a single line and a range calls for writing only the given range. The actual IO action is provided by a simple helper.

dowrite :: ParsedLineNumbers -> Maybe String -> EditStep
dowrite pln nm =
    case nm
         of Nothing -> getName >>= write
            Just s -> write s
    where
    write fn =
        do
        if isDefault pln
           then do
                b <- getBuf
                es <- getES
                return (writefileES fn (ztoList b) es)
           else if isUnary pln
                then do
                     (_,j,b) <- doLines pln
                     es <- getES
                     return (writefileES fn [zget (zgoto b j)] es)
                else do
                     (i,j,b) <- doLines pln
                     let (i',j') = ((min i j), (max i j))
                     es <- getES
                     return (writefileES fn (zslice b i' j') es)
    writefileES fn s es = do { writefile fn s; return es }

dosub, the substitution command, is remarkably simple, given that it exercises a big chunk of the editor's code. It is an internal-only EditStep, so its IO action is trivial. Fundamentally, it recovers the range of lines to operate on, then uses the slines and sline helpers to loop through those lines performing the requisite substitutions.

The substitution itself is passed in, as a function that maps a string (one line) to another (the resulting line). See doCommand above for the construction of that function.

dosub :: ParsedLineNumbers -> (String -> String) -> EditStep
dosub pln subst =
    do
    (i,j,b) <- doLines pln
    if isUnary pln
       then slines b j j
       else let (i',j') = ((min i j), (max i j)) in slines b i' j'
    es <- getES
    return (return es)
    where
    slines b n m = do
                   sline b n
                   if n == m
                      then return ()
                      else do
                           b' <- getBuf
                           slines b' (n+1) m
    sline b n = let b' = zgoto b n
                in putBuf $ zreplace b' (subst (zget b'))

doglobal, although it required changes is runStep, doCommand, and mainloop, wound up fairly simple. It is built on almost the same skeleton as dosub, looping through the lines in a range and performing an action.

Note that, however, the loop in doglobal is in the IO monad, not the State monad. Because doglobal is calling other EditStep handlers, its behavior is external rather than dosub's internal. I originally missed that distinction when I wrote this, and the result failed to compile due to type errors. To help pin them down, I added the type annotations for the helper functions and thought about them. Then I realized that doCommand returns an IO action, not a State action, and thus global and global' would need to be IO computations.

A few changes to the type annotations, then to make the functions match the annotations, and it compiled and worked. (There are many stories of programs in strongly-typed, ML-ish languages "just working" once the types were figured out. This is my current one.

doglobal :: ParsedLineNumbers -> (String -> Bool) -> Command -> EditStep
doglobal pln test cmd =
    do
    (i,j,b) <- doLines pln
    es <- getES
    if isDefault pln
       then return (global es 1 (zlength (buf es)))
       else
       if isUnary pln
          then return (global es j j)
          else let (i',j') = ((min i j), (max i j))
               in return (global es i' j')
    where
    global :: EditState -> Int -> Int -> IO EditState
    global es n m = do
                    es' <- global' es n
                    if n == m
                       then return es'
                       else global es' (n+1) m
    global' :: EditState -> Int -> IO EditState
    global' es n = do
                   let b' = zgoto (buf es) n
                       es' = es { buf = b' }
                   if test (zget b')
                      then do
                           s <- doCommand es' cmd
                           case s
                                of Nothing -> return es'
                                   Just es'' -> return es''
                      else return es'

...and the editor

And that's that. From here on, edit is relatively simple. Although the code above is a weird dance through two monads (three, counting the regular expression engine's use of Maybe), the actual usage of the code is relatively simple.

Argument parsing

parseargs determines whether the user has invoked edit with a filename or not. readfile brings in a file as a Buffer, while writefile outputs a list of String lines (which is a slightly more useful interface, given that it may be called on a subset of the lines in a buffer).

parseargs :: IO (Maybe String)
parseargs = do
            args <- getArgs
            case args of [filename] -> return (Just filename)
                         [] -> return Nothing
                         _ -> error "usage: edit [file]"

File input/output

readfile :: String -> IO Buffer
readfile fn = do
              s <- readFile fn `catch` (\_ -> return "")
              return (zfromList (lines s))

writefile :: String -> [String] -> IO ()
writefile fn ss = let s = (concat $ intersperse "\n" ss) ++ "\n"
                  in writeFile fn s

Main program

And main generates an EditState, based on the command line argument, while the mainloop reads a command from stdin, parses it (printing "?" if parsing fails), and performs the command.

mainloop :: EditState -> IO ()
mainloop es =
    do
    line <- getline stdin
    case line
         of Nothing -> return ()
            Just ln -> case inLine (init ln)
                       of Nothing -> do { putStrLn "?"; mainloop es }
                          Just cmd -> do
                                      s <- doCommand es cmd
                                      case s of Nothing -> return ()
                                                Just es' -> mainloop es'

main :: IO ()
main = do
       let es = ES { buf = undefined, re = Nothing, fn = undefined }
       arg <- parseargs
       case arg of Nothing -> mainloop ( es { buf = ziplist, fn = "" } )
                   Just fn -> do
                              b <- readfile fn
                              mainloop ( es { buf = b, fn = fn } )

gloria i ad inferni
faciamus opus

Return to Top | About this site...
Last edited Sat Aug 8 03:29:10 2009.
Copyright © 2005-2012 Tommy M. McGuire