CHAPTER SEVEN

Formatting text

One of the first uses for Unix was as a text processing system, producing well-formatted documents from text input. The tools used to produce that output eventually evolved to the familiar n- and troff (and related) programs.

As an example of this form of text processing the following output was produced by the input file which follows:

                             PROGRAM

  format - produce formatted output

USAGE

  format

FUNCTION

  format reads its input a line at a time  and  writes  a  neatly
  formatted  version  of  the input text to the output, with page
  headers and footers and with output lines filled to  a  uniform
  right margin. Input text lines may have interspersed among them
  command lines that alter this default  mode  of  formatting.  A
  command  line  consists  of a leading period, followed by a two
  letter code, possibly with  optional  arguments  following  the
  first sequence of blanks and tabs.
     Certain  commands cause a "break" in the processing of input
  text lines, i.e., any partially filled line is output and a new
  line  is  begun. In the following command summary, the letter n
  stands for an optional numeric argument. If a numeric  argument
  is preceded by a + or a -, the current value is changed by this
  amount; otherwise the argument represents the new value. If  no
  argument is given, the default value is used.


    command  break?  default  function
    .bp n    yes     n=+1     begin page numbered n
    .br      yes              cause break
    .ce n    yes     n=1      center next n lines
    .fi                       start filling
    .fo str  no      empty    footer title
    .he str  no      empty    header title
    .in n    no      n=0      indent n spaces
    .ls n    no      n=1      line spacing is n
    .nf      yes              stop filling
    .pl n    no      n=66     set page length to n
    .rm n    no      n=60     set right margin to n
    .sp n    yes     n=1      go down n lines or to end of page
    .ti n    yes     n=0      temporary indent of n
    .ul n    no      n=1      underline words from next n lines


  A  blank  input line causes a break and is passed to the output
  unchanged. Similarly, an input line  that  begins  with  blanks
  causes  a  break  and is written to the output with the leading
  blanks preserved. Thus a document formatted in the conventional
  manner  by  hand  will retain its original paragraph breaks and
  indentation.







Page 1                                            15 Sep 2007

The input that produced it is:

.he format.hs
.fo Page #                                            15 Sep 2007
.rm 65

.ce 1
PROGRAM

.ti 2
format - produce formatted output

USAGE

.ti 2
format

FUNCTION

.in 2
format reads its input
a line at a time
and writes a neatly
formatted version of
the input text to the output,
with page headers
and footers and with output
lines filled to a uniform right margin.
Input text
lines may have interspersed
among them command
lines that
alter this default mode of formatting.
A command line consists of a
leading period, followed
by a two letter code, possibly with
optional arguments
following the first sequence of blanks and tabs.
.br
.ti 3
Certain commands cause a "break" in the processing of input text
lines, i.e., any partially filled line is output and a new line is
begun.  In the following command summary, the letter n stands for an
optional numeric argument.  If a numeric argument is preceded by a +
or a -, the current value is changed by this amount; otherwise the
argument represents the new value.  If no argument is given, the
default value is used.
.sp 2
.nf
.in +1
command  break?  default  function
 .bp n    yes     n=+1     begin page numbered n
 .br      yes              cause break
 .ce n    yes     n=1      center next n lines
 .fi                       start filling
 .fo str  no      empty    footer title
 .he str  no      empty    header title
 .in n    no      n=0      indent n spaces
 .ls n    no      n=1      line spacing is n
 .nf      yes              stop filling
 .pl n    no      n=66     set page length to n
 .rm n    no      n=60     set right margin to n
 .sp n    yes     n=1      go down n lines or to end of page
 .ti n    yes     n=0      temporary indent of n
 .ul n    no      n=1      underline words from next n lines
.in -1
.fi
.sp 2
A blank input line causes a break
and is passed to the output unchanged.
Similarly, an input line that begins with blanks causes
a break and is written to the output with the leading blanks
preserved.
Thus a document formatted in the conventional manner by
hand will retain its original paragraph breaks and indentation.

Note: underlining is not implemented by this code. I have no idea whether any terminal I have handy would correctly handle backspaces.

Formatting implementation

K&P's implementation begins with the main loop and command parser, so I have begun with a skeletal loop and handlers. I initially sketched the code, relying on Haskell's undefined function and the type declarations.

One thing I know that will be necessary is FormatState, a blob of data similar to EditState from edit. It encapsulates the entire state necessary to format the output.

-- command :: FormatState -> String -> IO ()
-- command = undefined
--
-- text :: FormatState -> String -> IO ()
-- text = undefined
--
-- flushoutput :: FormatState -> IO ()
-- flushoutput = undefined

The implementations of the functions were filled out below as the implementation proceeded.

This code is similar to the ubiquitous copy, with the exception that it partially buffers output (requiring flushoutput) and that it incorporates commands, which are specified as lines beginning with a period.

mainloop :: FormatState -> String -> IO FormatState
mainloop fs ('.':l) = command fs l
mainloop fs l       = text fs l

main = do
       text <- getContents
       formatstate' <- foldM mainloop formatstate $ lines text

There are a small number of easily identifiable pure functions in the original, used as supporting infrastructure for the major heavy lifting of formatting.

isSpace, skipblank, and skiptoblank are used to pick apart strings, finding the first non-whitespace and first whitespace in a string.

gettl finds the first non-blank following a command such as .fo or .he and returns it as a title or footer. A single or double quote can be used to include spaces in the title, in which case the title is everything following the quote, including the spaces:

.fo '    a spaced footer

getval and setparam work together to manipulate the numeric parameters for commands such as .bp. getval returns a Value selector, including the numeric argument if one is present; setparam uses this Value selector and the current and default values for the parameter, along with lower and upper bounds (and minmax), to return the appropriate new value.

isSpace c   = c == ' ' || c == '\t'
skipblank   = dropWhile isSpace
skiptoblank = dropWhile (not . isSpace)

gettl :: String -> String
gettl s = case skipblank (skiptoblank s) of [] -> ""
                                            ('"':s') -> s'
                                            ('\'':s') -> s'
                                            s' -> s'

data Value = Absolute Int
           | Plus Int
           | Minus Int
           | Default

getval :: String -> Value
getval s = case skipblank (skiptoblank s) of []       -> Default
                                             ('+':s') -> Plus (fst (ctoi s'))
                                             ('-':s') -> Minus (fst (ctoi s'))
                                             s'       -> Absolute (fst (ctoi s'))

setparam (Default)    cur def l u = minmax def     l u
setparam (Absolute i) cur def l u = minmax i       l u
setparam (Plus i)     cur def l u = minmax (cur+i) l u
setparam (Minus i)    cur def l u = minmax (cur-i) l u

minmax v l u = max l (min u v)

Constants

pagelen   = 66
pagewidth = 60
huge      = 10000

K&P say,

The majority of the parameters are kept in global variables in the main routine, since they are needed throughout the program and there are far too many to pass around as arguments.

Fortunately or unfortunately, I don't have that option. Instead, I get to create a massive FormatState record, with elements containing each of the individual values necessary for formatting.

The first group of elements are parameters used for formatting, such as whether or not filling should be done, the line spacing, and so forth.

(An odd thing: spval. It is only used by the .sp command, but seems to be persistent. So, repeated uses of .sp +1 would generate an ever-increasing number of blank lines? Why?)

The second group provides state information for the page and line numbers.

The final group provides buffer storage for output filling. Side and dir are part of this mechanism; for their use, see the function spread below.

data Side = L | R deriving Show

data FormatState = FS {
                       -- Formatting parameters
                       fill :: Bool,          -- fill if true; initially true
                       lsval :: Int,            -- line spaceing; initially 1
                       spval :: Int,                   -- # of lines to space
                       inval :: Int,     -- current indent; >= 0; initially 0
                       rmval :: Int,     -- right margin; initially pagewidth
                       tival :: Int,  -- current temporary indent; initially 0
                       ceval :: Int,    -- no of lines to center; initially 0
                       plval :: Int,                  -- page length in lines
                       m1val :: Int,      -- margin before & including header
                       m2val :: Int,                   -- margin after header
                       m3val :: Int,           -- margin after last text line
                       m4val :: Int,             -- bottom margin with footer
                       bottom :: Int,                    -- last line on page
                       header :: String,                 -- top of page title
                       footer :: String,              -- bottom of page title
                       -- State information
                       curpage :: Int,          -- current output page number
                       newpage :: Int,                    -- next page number
                       lineno :: Int,                     -- next line number
                       -- Output area
                       outbuf :: [String], -- lines to be filled collect here
                       dir :: Side                  -- side for blank padding
                      }
                 deriving Show

Along with the FormatState record definition comes a definition of the initial state used in main.

formatstate = FS {
                  -- Formatting parameters
                  fill = True,
                  lsval = 1,
                  spval = 0,
                  inval = 0,
                  rmval = pagewidth,
                  tival = 0,
                  ceval = 0,
                  -- State information
                  curpage = 0,
                  newpage = 1,
                  lineno = 0,
                  plval = pagelen,
                  m1val = 3,
                  m2val = 2,
                  m3val = 2,
                  m4val = 3,
                  bottom = pagelen - 2 - 3,   -- plval - m3val - m4val
                  header = "\n",
                  footer = "\n",
                  -- Output area
                  outbuf = [],
                  dir = R
                 }

The first problem facing format is how to thread that massive FormatState through the IO operations involved in writing output. In edit, which faced a similar probelm, the input and output behavior formed a consistent alternation that made combining the two manually tractible, more or less (see my comments about duplicated returns). edit read a command, performed internal state transitions, then handed the state off to IO code to either write output or to modify the state further (when reading new lines, for example).

format does not have that simple alternation. Consider filling text:

  1. format reads a line of words, combines those words with previously read words, determines that enough has been read to fill a line, then
  2. writes the filled line, and then
  3. records in its state any remaining words that go onto the next line.

Meanwhile, writing the line involves

  1. checking the state to determine whether printing a page header is needed,
  2. indentation, also potentially updating the state,
  3. actually writing the line including line spacing, and
  4. potentially printing the page footer.

Further, at the end of a paragraph (or after seeing some of the dot-commands), format needs to flush any remaining words, forming the end of a paragraph.

This complex interaction would seem to make it difficult to use both the State monad to futz with internal state changes and the IO monad to handle external state changes, since that means combining actions in the two monads in complex ways. The other option would be to fall back into only the IO monad, and using functional code to handle the byzantine state changes.

Fortunately, Haskell has a way of combining the features of two (or more) monads into one computation: monad transformers.

Monad transformers are special variants of standard monads [such as State] that facilitate the combining of monads.

(From "All About Monads", http://www.haskell.org/all_about_monads/html/ which has more information.)

The transformer version of the State monad is StateT, which accepts an extra parameter for a contained monad, producing the combined monad. (StateT s m a) is the type of values of the combined monad where State is the base monad, s is the state, m is the inner monad, and a is the type returned from the computation. For format, s will be FormatState, and m (the inner monad) will be IO, producing the (FormatStep a) type for a computation in the combined monad producing a value of type a.

type FormatStep a = StateT FormatState IO a

Get operations

As with edit, one of the primary things needed for format is a collection of functions to access the contents of the FormatState from the monadic computation. getfs returns the overall FormatState, gettival returns element tival, and so on.

In order to generalize the state element accessors, I defined getfstate, getfstate2, etc., which allow access of one or more elements in one operation, by element name (actually, by the (FormatState -> v) function created by the record definition). This reduces the amount of code required to access several state elements within the operations below.

getfs :: FormatStep FormatState
getfs = get

getfstate f = do { fs <- getfs; return $ f fs }
getfstate2 f g = do { fs <- getfs; return (f fs, g fs) }
getfstate3 f g h = do { fs <- getfs; return (f fs, g fs, h fs) }
getfstate4 f g h i = do { fs <- getfs; return (f fs, g fs, h fs, i fs) }
getfstate5 f g h i j = do { fs <- getfs; return (f fs, g fs, h fs, i fs, j fs) }

gettival, getceval, getm2val, getm3val, getm4val :: FormatStep Int

gettival = getfstate tival
getceval = getfstate ceval
getm2val = getfstate m2val
getm3val = getfstate m3val
getm4val = getfstate m4val

Put operations

A further collection of functions is used to update the state from within the monad. putfs accepts a FormatState and inserts it into the monadic state, while putfill sets the fill/don't fill flag in the current state.

Unfortunately, there does not seem to be a technique allowing as neat a trick as the getfstate functions.

putfs :: FormatState -> FormatStep ()
putfs = put

putfill :: Bool -> FormatStep ()
putfill v = modify (\s -> s { fill = v })

putlsval, putspval, putinval, putrmval, puttival, putceval,
    putcurpage, putnewpage, putlineno, putplval, putm1val, putm2val,
    putm3val, putm4val, putbottom :: Int -> FormatStep ()

putlsval v =   modify (\s -> s { lsval = v })
putspval v =   modify (\s -> s { spval = v })
putinval v =   modify (\s -> s { inval = v })
putrmval v =   modify (\s -> s { rmval = v })
puttival v =   modify (\s -> s { tival = v })
putceval v =   modify (\s -> s { ceval = v })
putcurpage v = modify (\s -> s { curpage = v })
putnewpage v = modify (\s -> s { newpage = v })
putlineno v =  modify (\s -> s { lineno = v })
putplval v =   modify (\s -> s { plval = v })
putm1val v =   modify (\s -> s { m1val = v })
putm2val v =   modify (\s -> s { m2val = v })
putm3val v =   modify (\s -> s { m3val = v })
putm4val v =   modify (\s -> s { m4val = v })
putbottom v =  modify (\s -> s { bottom = v })

putheader, putfooter :: String -> FormatStep ()
putheader v =  modify (\s -> s { header = v })
putfooter v =  modify (\s -> s { footer = v })

While these functions are generally fairly simple, it is useful to have more complex and less general operators for some state manipulations. reversedir flips the Side value of dir. (See spread below for more about what dir does.)

reversedir :: FormatStep ()
reversedir = do { fs <- get; put fs { dir = swap (dir fs) } }
    where
    swap d = case d of L -> R; R -> L

I briefly mentioned the words manipulated while filling text. The current list of words awaiting output are stored in the outbuf element of FormatState. Two primary operations on outbuf are to clear it, replacing it with an empty list, and to add a word to it. (The words are stored in reverse order, to simplify stuffing an new one in.) The basic method of recovering the list of words from the state is getoutbuf, which reverses the list before returning it to the FormatStep.

However, there are two more complex operations used to access the outbuf state: getoutput and spread.

  • getoutput returns the buffered words as a string using the minimal width possible, with each word separated by a single space.
  • spread (below) fills the words to a given width.
clearoutput :: FormatStep ()
clearoutput = modify (\st -> st { outbuf = [] })

addoutput :: String -> FormatStep ()
addoutput word = modify (\st -> st { outbuf = (word : (outbuf st)) })

getoutbuf :: FormatStep [String]
getoutbuf = do { fs <- get; return $ reverse (outbuf fs) }

getoutput :: FormatStep String
getoutput = do { outbuf <- getoutbuf; return (concat $ intersperse " " outbuf) }

spread, according to K&P,

...is tricky, (which is not a compliment), but it performs an elaborate function and performs it correctly.

(I don't know about correctly, here....) This is not K&P's spread, but it does perform the same elaborate function. Where getoutput returns the waiting-to-be output words as a minimal length string, spread returns the same words as a string of length w. To do this, it computes:

  • base, the length of the words,
  • holes, the number of spaces between words,
  • extra, the number of extra spaces needed to fill out width w, in addition to the words with the minimal one space between them,
  • spc, the base number of spaces between words (>= 1),
  • spr, a list representing the "spreading" of the extra spaces (a prefix of 1's followed by 0's, where a 1 indicates that an extra space should be included in the matching hole), and
  • t and t', where t is a list of strings made up of the appropriate number of spaces for each hole and t' is t possibly reversed (based on dir).

dir (and t') are used to alternate between adding the extra spaces to the left of the column of text and to the right, avoiding "rivers" of whitespace.

The resulting string is of length w, made up of a word and a sequence of spaces and words ending with a word. The spaces are arranged so as to be as evenly divided as possible.

spread :: Int -> FormatStep String
spread w = do
           dir <- getfstate dir
           words <- getoutbuf
           let base = sum $ map (\ w -> length w) words
               holes = (length words) - 1
               extra = w - base - holes
               spc = 1 + (extra `div` holes)
               spr = (replicate (extra `mod` holes) 1) ++ (repeat 0)
               t = take holes [ replicate (spc+y) ' ' | y <- spr ]
               t' = case dir of L -> t;  R -> reverse t
           reversedir
           return (concat $ zipWith (++) words (t' ++ [""]))

Basic formatting: lifted IO

"Lifting" is an operation that transforms a non-monadic value or function into a monadic value or function. In this case, that means raising an IO operation (already a monadic value) into the (StateT FormatState IO a) monad.

About lifting in this context, "All About Monads" says,

When using combined monads created by the monad transformers, we avoid having to explicitly manage the inner monad types, resulting in clearer, simpler code. Instead of creating additional do-blocks within the computation to manipulate values in the inner monad type, we can use lifting operations to bring functions from the inner monad into the combined monad.

(fputStrLn uses an explicit s argument to putStrLn, while fputChar and fputStr use the pointfree style, for no apparent reason.)

fputStrLn :: String -> FormatStep ()
fputStrLn s = liftIO (putStrLn s)

fputChar :: Char -> FormatStep ()
fputChar = liftIO . putChar

fputStr :: String -> FormatStep ()
fputStr = liftIO . putStr

putblanks :: Int -> FormatStep ()
putblanks i = mapM_ (\ _ -> fputChar ' ') [ 1 .. i ]

skiplines :: Int -> FormatStep ()
skiplines i = mapM_ (\ _ -> fputChar '\n') [ 1 .. i ]

Now to the meat of the formatter. write prints one line from the output text. It is also responsible (normally) for writing the page headers and footers, indenting the line, and handling line spacing. To do this, it calls a small stack of other functions:

  • write
    • puthead
      • skiplines
        • fputChar
      • puttl
        • fputChar
        • fputStr
    • putblanks
      • fputChar
    • skiplines
      • fputChar
    • putfoot
      • skiplines
        • fputChar
      • puttl
        • fputChar
        • fputStr

The major decisions here are:

  • To print the page header when the current line number is <= 0 or > the bottom of the page.
  • To indent the line by the sum of the persistent indentation and any temporary indentation.
  • To put extra space between the line and the next line by the minimum of the line spacing (minus 1) and the bottom of the page minus the current line number (to avoid extra spaces at the end of the page).
  • To print the page footer when the line number is greater than the bottom of the page.
write :: String -> FormatStep ()
write str = do
            -- Possible header
            (lno,bot) <- getfstate2 lineno bottom
            when (lno <= 0 || lno > bot) puthead
            -- Indentation
            (ind,ti) <- getfstate2 inval tival
            putblanks (ind + ti)
            puttival 0
            -- Line contents
            fputStrLn str
            -- Line spacing
            (lsval, lno, bot) <- getfstate3 lsval lineno bottom
            skiplines (min (lsval - 1) (bot - lno))
            putlineno (lno + lsval)
            -- Possible footer
            (lno,bot) <- getfstate2 lineno bottom
            when (lno > bot) putfoot

Printing a either header or footer is a relatively simple process that is handled by puttl---the complication is that a '#' character should be replaced with the current page number. The original version of puttl that I wrote looked something like:

-- puttl str = do
--             cp <- getfstate curpage
--             format tl cp
--   where
--   ...

(with format defined similar to below), which required the caller to execute something like:

-- hdr <- getfstate header
-- puttl hdr

That did not seem very elegant to me. For one thing, every caller had to recover the header or footer from the state itself, even though puttl was a FormatStep. Also, puttl itself was two lines longer than I thought it needed to be. Finally, and perhaps most importantly, the callers of puttl never modified header or footer after recovering them from the state and before passing them to puttl. What I wanted to write was:

-- puttl header
--
-- puttl field = format (getfstate field) (getfstate curpage)
--    where
--    ...

It seemed possible to get there by lifting puttl, so I rewrote it to have a type:

-- puttl' :: String -> Int -> FormatStep ()

and then applied liftM2 (a generic monad lifting operator for 2-argument functions). The result of (liftM2 puttl') would have the type:

-- (Monad m) => m String -> m Int -> m (FormatStep ())

while the result of (liftM2 puttl' (getfstate header) (getfstate curpage)) would be:

-- FormatStep (FormatStep ())

which is a bit odd looking. Conveniently, there is a "join" monad operator:

-- join :: Monad m => m (m a) -> m a

which removes one level of monadic structure, exactly what needs to be done. Applying it to the lifted and specialized puttl', and parameterizing the resulting function over the (FormatState -> String) function (header or footer, provided by the FormatState record) produces the current definition of puttl, as a simplified FormatStep accepting a state element accessor:

-- puttl header
puttl :: (FormatState -> String) -> FormatStep ()
puttl tlf = join $ (liftM2 puttl') (getfstate tlf) (getfstate curpage)
    where
    puttl' :: String -> Int -> FormatStep ()
    puttl' tl pg = format tl
        where
        format []      = return ()
        format ('#':s) = do { fputStr (show pg); format s }
        format (c:s)   = do { fputChar c;        format s }

In addition to puttl, there was is one other function that frequently takes a state element unchanged as an argument: skiplines. Unfortunately, unlike puttl, it is also called with values from the state that have been modified (by addition, for example). I am providing two versions of it: the simple one above, and a lifted version that accepts a FormatStep Int function:

skiplinesM :: FormatStep Int -> FormatStep ()
skiplinesM = join . (liftM skiplines)

puthead demonstrates all of puttl, skiplines and skiplinesM while it advances the page number, prints a header, and fills out the page top and header/text margins. The function puthead' prints the top-of-page margin (using skiplines with the value of m1val modified by subtracting 1) and the header. The rest of puthead uses skiplinesM to advance by the value of m2val, and finally advances the current line number.

Most, if not all, of these multi-step monadic functions could be re-written functionally, using lifted lower-level functions; given the following three definitions:

-- getm1val :: FormatStep Int
-- getm1val = getfstate m1val
--
-- mminus :: (Monad m, Num a) => m a -> m a -> m a
-- mminus = liftM2 (-)
--
-- ival :: (Monad m, Num a) => a -> m a
-- ival i = return i

the call to skiplines in puthead' could be done by:

-- skiplinesM (getm1val `mminus` (ival 1))

This approach is a nice exercise, but is not terribly useful (in this case; see The Haskell School of Expression for longer, continuing examples that make more sense) because it would be hard to know when to stop creating lifted helpers; puthead' below would require a lifted when as well. The result might well be longer and more complicated than the existing, somewhat skewed, code.

puthead :: FormatStep ()
puthead = do
          newpage <- getfstate newpage
          putcurpage newpage
          putnewpage (newpage + 1)
          puthead'
          skiplinesM getm2val
          (m1,m2) <- getfstate2 m1val m2val
          putlineno (m1 + m2 + 1)
    where
    puthead' = do
               m1val <- getfstate m1val
               when (m1val > 0) (do { skiplines (m1val - 1); puttl header })

putfoot is almost the inverse of puthead, slightly simplified by not fooling with the page number.

putfoot :: FormatStep ()
putfoot = do
          skiplinesM getm3val
          m4val <- getm4val
          when (m4val > 0) putfoot'
    where
    putfoot' = do
               puttl footer
               m4val <- getm4val
               skiplines (m4val - 1)

space is used to implement the .sp command; it advances down the page, potentially handing the bottom of page footer. The minor complexity ensures that the page does not get too long.

The .sp command also terminates a paragraph, showing the first call to parbreak.

space :: Int -> FormatStep ()
space i = do
          parbreak
          (lno,bot) <- getfstate2 lineno bottom
          when (lno <= bot) space'
    where
    space' = do
             lno <- getfstate lineno
             when (lno <= 0) puthead
             (lno,bot) <- getfstate2 lineno bottom
             skiplines (min i (bot + 1 - lno))
             let lno' = lno + i
             putlineno lno'
             when (lno' > bot) putfoot

page is used when flushing the final part of the output after the end of the input, and by the .bp command. It is very similar to space, but more unconditional.

page :: FormatStep ()
page = do
       parbreak
       (lno,bot) <- getfstate2 lineno bottom
       when ((lno > 0) && (lno <= bot))
                (do { skiplines (bot + 1 - lno); putfoot })
       putlineno 0

parbreak is used in space and page above and several places below. It writes the current contents of the output buffer (using the short, one-space-between-words representation) on a line and clears the output buffer. This forms the end of a paragraph in the text.

The guard on the write means that multiple sequential paragraph breaks do nothing.

parbreak :: FormatStep ()
parbreak = do
           outbuf <- getoutput
           when ((length outbuf) > 0) (write outbuf)
           clearoutput

Back in the main loop, there were three important functions: command, flushoutput, and text. These functions result in IO actions; the first and last in IO actions returning a FormatState, and the second returning () (because at that point, the state no longer matters). Perhaps the most important of these, at least in terms of producing output, is text, which accepts a FormatState and a String representing the current input line. text is responsible for the IO action which prints the body of formatted text.

text itself is primarily concerned with translating between the IO monad of the mainloop and the FormatSteps of the internal functions; it uses runStateT to invoke a primary helper, text'. It discards the first element of the result, which is a unit returned by text' and passes back the resulting FormatState.

Functions assisting text are:

  • text' is the FormatStep () that handles a line of text. Its three primary decisions are whether the input line is empty or indented, both of which represent a paragraph break, whether the line should be centered, or whether the line should be filled or written as-is.
  • leadbl handles the case of a paragraph break, primarily by parbreak and checking for leading spaces. leading spaces are translated to a temporary indent used for the first line of the next paragraph. This passes along part of the existing formatting from the input.
  • center centers the line of text, using a temporary indent. The .ce center operations provides a number of lines to act on; center counts those down.
  • putwords handles filling text, one word at a time (write handles unfilled text directly). putword takes each word and does one of two things:
    1. It examines the current output buffer and decides that the new word plus the output buffer would be too wide for a line, in which case it formats and prints the output buffer using spread, then clears the buffer and finally adds the new word to it.
    2. Otherwise, it simply adds the new word to the output buffer.

The calling tree for this function looks like:

  • mainloop
    • ...
      • text
        • text'
          • leadbl
            • parbreak
            • countblank
          • center
            • write
          • write
          • putwords
            • putword
              • dooutline
                • width
                • outline
                  • spread
                  • write
                  • clearoutput
              • addoutput
text :: FormatState -> String -> IO FormatState
text fs str = (liftM snd) $ runStateT text' fs
    where
    text' :: FormatStep ()
    text' = do
            str' <- leadbl str
            (cv,fl) <- getfstate2 ceval fill
            if cv > 0                          then center str'
               else if (null str') || (not fl) then write str'
                    else                            putwords str'

    leadbl :: String -> FormatStep String
    leadbl str | null str = do
                            parbreak
                            return str
    leadbl str | (head str) == ' ' = do
                                     parbreak
                                     let (i,str') = countblank str 0
                                     unless (null str')
                                                (do
                                                 tival <- gettival
                                                 puttival (tival + i))
                                     return str'
               | otherwise = return str

    countblank [] i = (i,[])
    countblank (' ':s) i = countblank s $! i+1
    countblank s@(_:_) i = (i,s)

    center :: String -> FormatStep ()
    center str = do
                 (rm,ti) <- getfstate2 rmval tival
                 puttival (max ((rm + ti - (length str)) `div` 2) 0)
                 write str
                 ceval <- getceval
                 putceval (ceval - 1)

    putwords str = mapM_ putword $ words str

    putword :: String -> FormatStep ()
    putword word = do
                   outbuf <- getoutbuf
                   unless (null outbuf) dooutline
                   addoutput word
        where
        dooutline = do
                    (rm,ind,ti) <- getfstate3 rmval inval tival
                    outbuf <- getoutput
                    let w = (length word) + (length outbuf) + 1
                        llval = rm - ti - ind
                    when (w > llval) (outline llval)

        outline width = do
                        outbuf <- spread width
                        when ((length outbuf) > 0) (write outbuf)
                        clearoutput

Flushing the final output is simple, since the operation is already handled by page. flushoutput, like text, uses runStateT to invoke the page FormatStep, but it returns the first element of the result, the return value of page. This value is (), conveniently matching the desired IO return of flushoutput.

flushoutput :: FormatState -> IO ()
flushoutput fs = (liftM fst) $ runStateT page fs

The final function of the trio is command. This function uses commands, a list of pairs mapping the string "dot-command" to a short String -> FormatStep function which performs the command. Most of the command simply update some element of the state, although many do break paragraphs and a couple perform some other output.

commands :: [ (String, String -> FormatStep ()) ]
commands = [
            ("fi", fi),
            ("nf", nf),
            ("br", br),
            ("ls", ls),
            ("bp", bp),
            ("sp", sp),
            ("in", ind),
            ("rm", rm),
            ("ti", ti),
            ("ce", ce),
            ("he", he),
            ("fo", fo),
            ("pl", pl)
           ]

fi _ = do
       parbreak
       putfill True

nf _ = do
       parbreak
       putfill False

br _ = parbreak

ls s = do
       lsval <- getfstate lsval
       putlsval (setparam (getval s) lsval 1 1 huge)

ce s = do
       parbreak
       ceval <- getceval
       putceval (setparam (getval s) ceval 1 0 huge)

he s = putheader (gettl s)

fo s = putfooter (gettl s)

bp s = do
       page
       curpage <- getfstate curpage
       let i = setparam (getval s) curpage (curpage + 1) (-huge) huge
       putcurpage i
       putnewpage i

sp s = do
       spval <- getfstate spval
       let i = setparam (getval s) spval 1 0 huge
       space i
       putspval i

ind s = do
        (ind,rm) <- getfstate2 inval rmval
        putinval (setparam (getval s) ind 0 0 (rm - 1))

rm s = do
       (ind,ti,rm) <- getfstate3 inval tival rmval
       putrmval (setparam (getval s) rm pagewidth (ind + ti + 1) huge)

ti s = do
       parbreak
       (ti,rm) <- getfstate2 tival rmval
       puttival (setparam (getval s) ti 0 (-huge) rm)

pl s = do
       (m1,m2,m3,m4,pl) <- getfstate5 m1val m2val m3val m4val plval
       let m = m1 + m2 + m3 + m4 + 1
           i = setparam (getval s) pl pagelen m huge
       putplval i
       putbottom (i - m3 - m4)

command's primary concern is to get into the FormatStep using runStateT, and to get back out into the IO monad. The str argument is the incoming line, which is known to be a dot command.

command' meanders down the commands list, testing each string (for whether it is a prefix of the incoming line---this allows the possibility of longer commands, like .fill or .header).

command'' does the actual testing, as well as invoking runStateT. It returns a Maybe FormatState; Nothing indicates the command did not apply. It would be the responsibility of command' to handle an unrecognized command. Like the original, this code simply ignores the offending line.

command :: FormatState -> String -> IO FormatState
command fs str = doCommand $ find (\c -> fst c `isPrefixOf` str) commands
    where
    doCommand (Just (_,cmd)) = liftM snd $ runStateT (cmd str) fs
    doCommand Nothing        = return fs

gloria i ad inferni
faciamus opus

Return to Top | About this site...
Last edited Sat Aug 8 03:29:10 2009.
Copyright © 2005-2012 Tommy M. McGuire