CHAPTER SEVEN
Formatting text
One of the first uses for Unix was as a text processing system, producing well-formatted documents from text input. The tools used to produce that output eventually evolved to the familiar n- and troff (and related) programs.
As an example of this form of text processing the following output was produced by the input file which follows:
PROGRAM
format - produce formatted output
USAGE
format
FUNCTION
format reads its input a line at a time and writes a neatly
formatted version of the input text to the output, with page
headers and footers and with output lines filled to a uniform
right margin. Input text lines may have interspersed among them
command lines that alter this default mode of formatting. A
command line consists of a leading period, followed by a two
letter code, possibly with optional arguments following the
first sequence of blanks and tabs.
Certain commands cause a "break" in the processing of input
text lines, i.e., any partially filled line is output and a new
line is begun. In the following command summary, the letter n
stands for an optional numeric argument. If a numeric argument
is preceded by a + or a -, the current value is changed by this
amount; otherwise the argument represents the new value. If no
argument is given, the default value is used.
command break? default function
.bp n yes n=+1 begin page numbered n
.br yes cause break
.ce n yes n=1 center next n lines
.fi start filling
.fo str no empty footer title
.he str no empty header title
.in n no n=0 indent n spaces
.ls n no n=1 line spacing is n
.nf yes stop filling
.pl n no n=66 set page length to n
.rm n no n=60 set right margin to n
.sp n yes n=1 go down n lines or to end of page
.ti n yes n=0 temporary indent of n
.ul n no n=1 underline words from next n lines
A blank input line causes a break and is passed to the output
unchanged. Similarly, an input line that begins with blanks
causes a break and is written to the output with the leading
blanks preserved. Thus a document formatted in the conventional
manner by hand will retain its original paragraph breaks and
indentation.
Page 1 15 Sep 2007
The input that produced it is:
.he format.hs .fo Page # 15 Sep 2007 .rm 65 .ce 1 PROGRAM .ti 2 format - produce formatted output USAGE .ti 2 format FUNCTION .in 2 format reads its input a line at a time and writes a neatly formatted version of the input text to the output, with page headers and footers and with output lines filled to a uniform right margin. Input text lines may have interspersed among them command lines that alter this default mode of formatting. A command line consists of a leading period, followed by a two letter code, possibly with optional arguments following the first sequence of blanks and tabs. .br .ti 3 Certain commands cause a "break" in the processing of input text lines, i.e., any partially filled line is output and a new line is begun. In the following command summary, the letter n stands for an optional numeric argument. If a numeric argument is preceded by a + or a -, the current value is changed by this amount; otherwise the argument represents the new value. If no argument is given, the default value is used. .sp 2 .nf .in +1 command break? default function .bp n yes n=+1 begin page numbered n .br yes cause break .ce n yes n=1 center next n lines .fi start filling .fo str no empty footer title .he str no empty header title .in n no n=0 indent n spaces .ls n no n=1 line spacing is n .nf yes stop filling .pl n no n=66 set page length to n .rm n no n=60 set right margin to n .sp n yes n=1 go down n lines or to end of page .ti n yes n=0 temporary indent of n .ul n no n=1 underline words from next n lines .in -1 .fi .sp 2 A blank input line causes a break and is passed to the output unchanged. Similarly, an input line that begins with blanks causes a break and is written to the output with the leading blanks preserved. Thus a document formatted in the conventional manner by hand will retain its original paragraph breaks and indentation.
Note: underlining is not implemented by this code. I have no idea whether any terminal I have handy would correctly handle backspaces.
Formatting implementation
K&P's implementation begins with the main loop and command parser, so I have begun with a skeletal loop and handlers. I initially sketched the code, relying on Haskell's undefined function and the type declarations.
One thing I know that will be necessary is FormatState, a blob of data similar to EditState from edit. It encapsulates the entire state necessary to format the output.
-- command :: FormatState -> String -> IO () -- command = undefined -- -- text :: FormatState -> String -> IO () -- text = undefined -- -- flushoutput :: FormatState -> IO () -- flushoutput = undefined
The implementations of the functions were filled out below as the implementation proceeded.
This code is similar to the ubiquitous copy, with the exception that it partially buffers output (requiring flushoutput) and that it incorporates commands, which are specified as lines beginning with a period.
mainloop :: FormatState -> String -> IO FormatState
mainloop fs ('.':l) = command fs l
mainloop fs l = text fs l
main = do
text <- getContents
formatstate' <- foldM mainloop formatstate $ lines text
There are a small number of easily identifiable pure functions in the original, used as supporting infrastructure for the major heavy lifting of formatting.
isSpace, skipblank, and skiptoblank are used to pick apart strings, finding the first non-whitespace and first whitespace in a string.
gettl finds the first non-blank following a command such as .fo or .he and returns it as a title or footer. A single or double quote can be used to include spaces in the title, in which case the title is everything following the quote, including the spaces:
.fo ' a spaced footer
getval and setparam work together to manipulate the numeric parameters for commands such as .bp. getval returns a Value selector, including the numeric argument if one is present; setparam uses this Value selector and the current and default values for the parameter, along with lower and upper bounds (and minmax), to return the appropriate new value.
isSpace c = c == ' ' || c == '\t'
skipblank = dropWhile isSpace
skiptoblank = dropWhile (not . isSpace)
gettl :: String -> String
gettl s = case skipblank (skiptoblank s) of [] -> ""
('"':s') -> s'
('\'':s') -> s'
s' -> s'
data Value = Absolute Int
| Plus Int
| Minus Int
| Default
getval :: String -> Value
getval s = case skipblank (skiptoblank s) of [] -> Default
('+':s') -> Plus (fst (ctoi s'))
('-':s') -> Minus (fst (ctoi s'))
s' -> Absolute (fst (ctoi s'))
setparam (Default) cur def l u = minmax def l u
setparam (Absolute i) cur def l u = minmax i l u
setparam (Plus i) cur def l u = minmax (cur+i) l u
setparam (Minus i) cur def l u = minmax (cur-i) l u
minmax v l u = max l (min u v)
Constants
pagelen = 66 pagewidth = 60 huge = 10000
K&P say,
The majority of the parameters are kept in global variables in the main routine, since they are needed throughout the program and there are far too many to pass around as arguments.
Fortunately or unfortunately, I don't have that option. Instead, I get to create a massive FormatState record, with elements containing each of the individual values necessary for formatting.
The first group of elements are parameters used for formatting, such as whether or not filling should be done, the line spacing, and so forth.
(An odd thing: spval. It is only used by the .sp command, but seems to be persistent. So, repeated uses of .sp +1 would generate an ever-increasing number of blank lines? Why?)
The second group provides state information for the page and line numbers.
The final group provides buffer storage for output filling. Side and dir are part of this mechanism; for their use, see the function spread below.
data Side = L | R deriving Show
data FormatState = FS {
-- Formatting parameters
fill :: Bool, -- fill if true; initially true
lsval :: Int, -- line spaceing; initially 1
spval :: Int, -- # of lines to space
inval :: Int, -- current indent; >= 0; initially 0
rmval :: Int, -- right margin; initially pagewidth
tival :: Int, -- current temporary indent; initially 0
ceval :: Int, -- no of lines to center; initially 0
plval :: Int, -- page length in lines
m1val :: Int, -- margin before & including header
m2val :: Int, -- margin after header
m3val :: Int, -- margin after last text line
m4val :: Int, -- bottom margin with footer
bottom :: Int, -- last line on page
header :: String, -- top of page title
footer :: String, -- bottom of page title
-- State information
curpage :: Int, -- current output page number
newpage :: Int, -- next page number
lineno :: Int, -- next line number
-- Output area
outbuf :: [String], -- lines to be filled collect here
dir :: Side -- side for blank padding
}
deriving Show
Along with the FormatState record definition comes a definition of the initial state used in main.
formatstate = FS {
-- Formatting parameters
fill = True,
lsval = 1,
spval = 0,
inval = 0,
rmval = pagewidth,
tival = 0,
ceval = 0,
-- State information
curpage = 0,
newpage = 1,
lineno = 0,
plval = pagelen,
m1val = 3,
m2val = 2,
m3val = 2,
m4val = 3,
bottom = pagelen - 2 - 3, -- plval - m3val - m4val
header = "\n",
footer = "\n",
-- Output area
outbuf = [],
dir = R
}
The first problem facing format is how to thread that massive FormatState through the IO operations involved in writing output. In edit, which faced a similar probelm, the input and output behavior formed a consistent alternation that made combining the two manually tractible, more or less (see my comments about duplicated returns). edit read a command, performed internal state transitions, then handed the state off to IO code to either write output or to modify the state further (when reading new lines, for example).
format does not have that simple alternation. Consider filling text:
- format reads a line of words, combines those words with previously read words, determines that enough has been read to fill a line, then
- writes the filled line, and then
- records in its state any remaining words that go onto the next line.
Meanwhile, writing the line involves
- checking the state to determine whether printing a page header is needed,
- indentation, also potentially updating the state,
- actually writing the line including line spacing, and
- potentially printing the page footer.
Further, at the end of a paragraph (or after seeing some of the dot-commands), format needs to flush any remaining words, forming the end of a paragraph.
This complex interaction would seem to make it difficult to use both the State monad to futz with internal state changes and the IO monad to handle external state changes, since that means combining actions in the two monads in complex ways. The other option would be to fall back into only the IO monad, and using functional code to handle the byzantine state changes.
Fortunately, Haskell has a way of combining the features of two (or more) monads into one computation: monad transformers.
Monad transformers are special variants of standard monads [such as State] that facilitate the combining of monads.
(From "All About Monads", http://www.haskell.org/all_about_monads/html/ which has more information.)
The transformer version of the State monad is StateT, which accepts an extra parameter for a contained monad, producing the combined monad. (StateT s m a) is the type of values of the combined monad where State is the base monad, s is the state, m is the inner monad, and a is the type returned from the computation. For format, s will be FormatState, and m (the inner monad) will be IO, producing the (FormatStep a) type for a computation in the combined monad producing a value of type a.
type FormatStep a = StateT FormatState IO a
Get operations
As with edit, one of the primary things needed for format is a collection of functions to access the contents of the FormatState from the monadic computation. getfs returns the overall FormatState, gettival returns element tival, and so on.
In order to generalize the state element accessors, I defined getfstate, getfstate2, etc., which allow access of one or more elements in one operation, by element name (actually, by the (FormatState -> v) function created by the record definition). This reduces the amount of code required to access several state elements within the operations below.
getfs :: FormatStep FormatState
getfs = get
getfstate f = do { fs <- getfs; return $ f fs }
getfstate2 f g = do { fs <- getfs; return (f fs, g fs) }
getfstate3 f g h = do { fs <- getfs; return (f fs, g fs, h fs) }
getfstate4 f g h i = do { fs <- getfs; return (f fs, g fs, h fs, i fs) }
getfstate5 f g h i j = do { fs <- getfs; return (f fs, g fs, h fs, i fs, j fs) }
gettival, getceval, getm2val, getm3val, getm4val :: FormatStep Int
gettival = getfstate tival
getceval = getfstate ceval
getm2val = getfstate m2val
getm3val = getfstate m3val
getm4val = getfstate m4val
Put operations
A further collection of functions is used to update the state from within the monad. putfs accepts a FormatState and inserts it into the monadic state, while putfill sets the fill/don't fill flag in the current state.
Unfortunately, there does not seem to be a technique allowing as neat a trick as the getfstate functions.
putfs :: FormatState -> FormatStep ()
putfs = put
putfill :: Bool -> FormatStep ()
putfill v = modify (\s -> s { fill = v })
putlsval, putspval, putinval, putrmval, puttival, putceval,
putcurpage, putnewpage, putlineno, putplval, putm1val, putm2val,
putm3val, putm4val, putbottom :: Int -> FormatStep ()
putlsval v = modify (\s -> s { lsval = v })
putspval v = modify (\s -> s { spval = v })
putinval v = modify (\s -> s { inval = v })
putrmval v = modify (\s -> s { rmval = v })
puttival v = modify (\s -> s { tival = v })
putceval v = modify (\s -> s { ceval = v })
putcurpage v = modify (\s -> s { curpage = v })
putnewpage v = modify (\s -> s { newpage = v })
putlineno v = modify (\s -> s { lineno = v })
putplval v = modify (\s -> s { plval = v })
putm1val v = modify (\s -> s { m1val = v })
putm2val v = modify (\s -> s { m2val = v })
putm3val v = modify (\s -> s { m3val = v })
putm4val v = modify (\s -> s { m4val = v })
putbottom v = modify (\s -> s { bottom = v })
putheader, putfooter :: String -> FormatStep ()
putheader v = modify (\s -> s { header = v })
putfooter v = modify (\s -> s { footer = v })
While these functions are generally fairly simple, it is useful to have more complex and less general operators for some state manipulations. reversedir flips the Side value of dir. (See spread below for more about what dir does.)
reversedir :: FormatStep ()
reversedir = do { fs <- get; put fs { dir = swap (dir fs) } }
where
swap d = case d of L -> R; R -> L
I briefly mentioned the words manipulated while filling text. The current list of words awaiting output are stored in the outbuf element of FormatState. Two primary operations on outbuf are to clear it, replacing it with an empty list, and to add a word to it. (The words are stored in reverse order, to simplify stuffing an new one in.) The basic method of recovering the list of words from the state is getoutbuf, which reverses the list before returning it to the FormatStep.
However, there are two more complex operations used to access the outbuf state: getoutput and spread.
- getoutput returns the buffered words as a string using the minimal width possible, with each word separated by a single space.
- spread (below) fills the words to a given width.
clearoutput :: FormatStep ()
clearoutput = modify (\st -> st { outbuf = [] })
addoutput :: String -> FormatStep ()
addoutput word = modify (\st -> st { outbuf = (word : (outbuf st)) })
getoutbuf :: FormatStep [String]
getoutbuf = do { fs <- get; return $ reverse (outbuf fs) }
getoutput :: FormatStep String
getoutput = do { outbuf <- getoutbuf; return (concat $ intersperse " " outbuf) }
spread, according to K&P,
...is tricky, (which is not a compliment), but it performs an elaborate function and performs it correctly.
(I don't know about correctly, here....) This is not K&P's spread, but it does perform the same elaborate function. Where getoutput returns the waiting-to-be output words as a minimal length string, spread returns the same words as a string of length w. To do this, it computes:
- base, the length of the words,
- holes, the number of spaces between words,
- extra, the number of extra spaces needed to fill out width w, in addition to the words with the minimal one space between them,
- spc, the base number of spaces between words (>= 1),
- spr, a list representing the "spreading" of the extra spaces (a prefix of 1's followed by 0's, where a 1 indicates that an extra space should be included in the matching hole), and
- t and t', where t is a list of strings made up of the appropriate number of spaces for each hole and t' is t possibly reversed (based on dir).
dir (and t') are used to alternate between adding the extra spaces to the left of the column of text and to the right, avoiding "rivers" of whitespace.
The resulting string is of length w, made up of a word and a sequence of spaces and words ending with a word. The spaces are arranged so as to be as evenly divided as possible.
spread :: Int -> FormatStep String
spread w = do
dir <- getfstate dir
words <- getoutbuf
let base = sum $ map (\ w -> length w) words
holes = (length words) - 1
extra = w - base - holes
spc = 1 + (extra `div` holes)
spr = (replicate (extra `mod` holes) 1) ++ (repeat 0)
t = take holes [ replicate (spc+y) ' ' | y <- spr ]
t' = case dir of L -> t; R -> reverse t
reversedir
return (concat $ zipWith (++) words (t' ++ [""]))
Basic formatting: lifted IO
"Lifting" is an operation that transforms a non-monadic value or function into a monadic value or function. In this case, that means raising an IO operation (already a monadic value) into the (StateT FormatState IO a) monad.
About lifting in this context, "All About Monads" says,
When using combined monads created by the monad transformers, we avoid having to explicitly manage the inner monad types, resulting in clearer, simpler code. Instead of creating additional do-blocks within the computation to manipulate values in the inner monad type, we can use lifting operations to bring functions from the inner monad into the combined monad.
(fputStrLn uses an explicit s argument to putStrLn, while fputChar and fputStr use the pointfree style, for no apparent reason.)
fputStrLn :: String -> FormatStep () fputStrLn s = liftIO (putStrLn s) fputChar :: Char -> FormatStep () fputChar = liftIO . putChar fputStr :: String -> FormatStep () fputStr = liftIO . putStr putblanks :: Int -> FormatStep () putblanks i = mapM_ (\ _ -> fputChar ' ') [ 1 .. i ] skiplines :: Int -> FormatStep () skiplines i = mapM_ (\ _ -> fputChar '\n') [ 1 .. i ]
Now to the meat of the formatter. write prints one line from the output text. It is also responsible (normally) for writing the page headers and footers, indenting the line, and handling line spacing. To do this, it calls a small stack of other functions:
- write
- puthead
- skiplines
- fputChar
- puttl
- fputChar
- fputStr
- skiplines
- putblanks
- fputChar
- skiplines
- fputChar
- putfoot
- skiplines
- fputChar
- puttl
- fputChar
- fputStr
- skiplines
- puthead
The major decisions here are:
- To print the page header when the current line number is <= 0 or > the bottom of the page.
- To indent the line by the sum of the persistent indentation and any temporary indentation.
- To put extra space between the line and the next line by the minimum of the line spacing (minus 1) and the bottom of the page minus the current line number (to avoid extra spaces at the end of the page).
- To print the page footer when the line number is greater than the bottom of the page.
write :: String -> FormatStep ()
write str = do
-- Possible header
(lno,bot) <- getfstate2 lineno bottom
when (lno <= 0 || lno > bot) puthead
-- Indentation
(ind,ti) <- getfstate2 inval tival
putblanks (ind + ti)
puttival 0
-- Line contents
fputStrLn str
-- Line spacing
(lsval, lno, bot) <- getfstate3 lsval lineno bottom
skiplines (min (lsval - 1) (bot - lno))
putlineno (lno + lsval)
-- Possible footer
(lno,bot) <- getfstate2 lineno bottom
when (lno > bot) putfoot
Printing a either header or footer is a relatively simple process that is handled by puttl---the complication is that a '#' character should be replaced with the current page number. The original version of puttl that I wrote looked something like:
-- puttl str = do -- cp <- getfstate curpage -- format tl cp -- where -- ...
(with format defined similar to below), which required the caller to execute something like:
-- hdr <- getfstate header -- puttl hdr
That did not seem very elegant to me. For one thing, every caller had to recover the header or footer from the state itself, even though puttl was a FormatStep. Also, puttl itself was two lines longer than I thought it needed to be. Finally, and perhaps most importantly, the callers of puttl never modified header or footer after recovering them from the state and before passing them to puttl. What I wanted to write was:
-- puttl header -- -- puttl field = format (getfstate field) (getfstate curpage) -- where -- ...
It seemed possible to get there by lifting puttl, so I rewrote it to have a type:
-- puttl' :: String -> Int -> FormatStep ()
and then applied liftM2 (a generic monad lifting operator for 2-argument functions). The result of (liftM2 puttl') would have the type:
-- (Monad m) => m String -> m Int -> m (FormatStep ())
while the result of (liftM2 puttl' (getfstate header) (getfstate curpage)) would be:
-- FormatStep (FormatStep ())
which is a bit odd looking. Conveniently, there is a "join" monad operator:
-- join :: Monad m => m (m a) -> m a
which removes one level of monadic structure, exactly what needs to be done. Applying it to the lifted and specialized puttl', and parameterizing the resulting function over the (FormatState -> String) function (header or footer, provided by the FormatState record) produces the current definition of puttl, as a simplified FormatStep accepting a state element accessor:
-- puttl header
puttl :: (FormatState -> String) -> FormatStep ()
puttl tlf = join $ (liftM2 puttl') (getfstate tlf) (getfstate curpage)
where
puttl' :: String -> Int -> FormatStep ()
puttl' tl pg = format tl
where
format [] = return ()
format ('#':s) = do { fputStr (show pg); format s }
format (c:s) = do { fputChar c; format s }
In addition to puttl, there was is one other function that frequently takes a state element unchanged as an argument: skiplines. Unfortunately, unlike puttl, it is also called with values from the state that have been modified (by addition, for example). I am providing two versions of it: the simple one above, and a lifted version that accepts a FormatStep Int function:
skiplinesM :: FormatStep Int -> FormatStep () skiplinesM = join . (liftM skiplines)
puthead demonstrates all of puttl, skiplines and skiplinesM while it advances the page number, prints a header, and fills out the page top and header/text margins. The function puthead' prints the top-of-page margin (using skiplines with the value of m1val modified by subtracting 1) and the header. The rest of puthead uses skiplinesM to advance by the value of m2val, and finally advances the current line number.
Most, if not all, of these multi-step monadic functions could be re-written functionally, using lifted lower-level functions; given the following three definitions:
-- getm1val :: FormatStep Int -- getm1val = getfstate m1val -- -- mminus :: (Monad m, Num a) => m a -> m a -> m a -- mminus = liftM2 (-) -- -- ival :: (Monad m, Num a) => a -> m a -- ival i = return i
the call to skiplines in puthead' could be done by:
-- skiplinesM (getm1val `mminus` (ival 1))
This approach is a nice exercise, but is not terribly useful (in this case; see The Haskell School of Expression for longer, continuing examples that make more sense) because it would be hard to know when to stop creating lifted helpers; puthead' below would require a lifted when as well. The result might well be longer and more complicated than the existing, somewhat skewed, code.
puthead :: FormatStep ()
puthead = do
newpage <- getfstate newpage
putcurpage newpage
putnewpage (newpage + 1)
puthead'
skiplinesM getm2val
(m1,m2) <- getfstate2 m1val m2val
putlineno (m1 + m2 + 1)
where
puthead' = do
m1val <- getfstate m1val
when (m1val > 0) (do { skiplines (m1val - 1); puttl header })
putfoot is almost the inverse of puthead, slightly simplified by not fooling with the page number.
putfoot :: FormatStep ()
putfoot = do
skiplinesM getm3val
m4val <- getm4val
when (m4val > 0) putfoot'
where
putfoot' = do
puttl footer
m4val <- getm4val
skiplines (m4val - 1)
space is used to implement the .sp command; it advances down the page, potentially handing the bottom of page footer. The minor complexity ensures that the page does not get too long.
The .sp command also terminates a paragraph, showing the first call to parbreak.
space :: Int -> FormatStep ()
space i = do
parbreak
(lno,bot) <- getfstate2 lineno bottom
when (lno <= bot) space'
where
space' = do
lno <- getfstate lineno
when (lno <= 0) puthead
(lno,bot) <- getfstate2 lineno bottom
skiplines (min i (bot + 1 - lno))
let lno' = lno + i
putlineno lno'
when (lno' > bot) putfoot
page is used when flushing the final part of the output after the end of the input, and by the .bp command. It is very similar to space, but more unconditional.
page :: FormatStep ()
page = do
parbreak
(lno,bot) <- getfstate2 lineno bottom
when ((lno > 0) && (lno <= bot))
(do { skiplines (bot + 1 - lno); putfoot })
putlineno 0
parbreak is used in space and page above and several places below. It writes the current contents of the output buffer (using the short, one-space-between-words representation) on a line and clears the output buffer. This forms the end of a paragraph in the text.
The guard on the write means that multiple sequential paragraph breaks do nothing.
parbreak :: FormatStep ()
parbreak = do
outbuf <- getoutput
when ((length outbuf) > 0) (write outbuf)
clearoutput
Back in the main loop, there were three important functions: command, flushoutput, and text. These functions result in IO actions; the first and last in IO actions returning a FormatState, and the second returning () (because at that point, the state no longer matters). Perhaps the most important of these, at least in terms of producing output, is text, which accepts a FormatState and a String representing the current input line. text is responsible for the IO action which prints the body of formatted text.
text itself is primarily concerned with translating between the IO monad of the mainloop and the FormatSteps of the internal functions; it uses runStateT to invoke a primary helper, text'. It discards the first element of the result, which is a unit returned by text' and passes back the resulting FormatState.
Functions assisting text are:
- text' is the FormatStep () that handles a line of text. Its three primary decisions are whether the input line is empty or indented, both of which represent a paragraph break, whether the line should be centered, or whether the line should be filled or written as-is.
- leadbl handles the case of a paragraph break, primarily by parbreak and checking for leading spaces. leading spaces are translated to a temporary indent used for the first line of the next paragraph. This passes along part of the existing formatting from the input.
- center centers the line of text, using a temporary indent. The .ce center operations provides a number of lines to act on; center counts those down.
- putwords handles filling text, one word at a time (write handles
unfilled text directly). putword takes each word and does one of
two things:
- It examines the current output buffer and decides that the new word plus the output buffer would be too wide for a line, in which case it formats and prints the output buffer using spread, then clears the buffer and finally adds the new word to it.
- Otherwise, it simply adds the new word to the output buffer.
The calling tree for this function looks like:
- mainloop
- ...
- text
- text'
- leadbl
- parbreak
- countblank
- center
- write
- write
- putwords
- putword
- dooutline
- width
- outline
- spread
- write
- clearoutput
- addoutput
- dooutline
- putword
- leadbl
- text'
- text
- ...
text :: FormatState -> String -> IO FormatState
text fs str = (liftM snd) $ runStateT text' fs
where
text' :: FormatStep ()
text' = do
str' <- leadbl str
(cv,fl) <- getfstate2 ceval fill
if cv > 0 then center str'
else if (null str') || (not fl) then write str'
else putwords str'
leadbl :: String -> FormatStep String
leadbl str | null str = do
parbreak
return str
leadbl str | (head str) == ' ' = do
parbreak
let (i,str') = countblank str 0
unless (null str')
(do
tival <- gettival
puttival (tival + i))
return str'
| otherwise = return str
countblank [] i = (i,[])
countblank (' ':s) i = countblank s $! i+1
countblank s@(_:_) i = (i,s)
center :: String -> FormatStep ()
center str = do
(rm,ti) <- getfstate2 rmval tival
puttival (max ((rm + ti - (length str)) `div` 2) 0)
write str
ceval <- getceval
putceval (ceval - 1)
putwords str = mapM_ putword $ words str
putword :: String -> FormatStep ()
putword word = do
outbuf <- getoutbuf
unless (null outbuf) dooutline
addoutput word
where
dooutline = do
(rm,ind,ti) <- getfstate3 rmval inval tival
outbuf <- getoutput
let w = (length word) + (length outbuf) + 1
llval = rm - ti - ind
when (w > llval) (outline llval)
outline width = do
outbuf <- spread width
when ((length outbuf) > 0) (write outbuf)
clearoutput
Flushing the final output is simple, since the operation is already handled by page. flushoutput, like text, uses runStateT to invoke the page FormatStep, but it returns the first element of the result, the return value of page. This value is (), conveniently matching the desired IO return of flushoutput.
flushoutput :: FormatState -> IO () flushoutput fs = (liftM fst) $ runStateT page fs
The final function of the trio is command. This function uses commands, a list of pairs mapping the string "dot-command" to a short String -> FormatStep function which performs the command. Most of the command simply update some element of the state, although many do break paragraphs and a couple perform some other output.
commands :: [ (String, String -> FormatStep ()) ]
commands = [
("fi", fi),
("nf", nf),
("br", br),
("ls", ls),
("bp", bp),
("sp", sp),
("in", ind),
("rm", rm),
("ti", ti),
("ce", ce),
("he", he),
("fo", fo),
("pl", pl)
]
fi _ = do
parbreak
putfill True
nf _ = do
parbreak
putfill False
br _ = parbreak
ls s = do
lsval <- getfstate lsval
putlsval (setparam (getval s) lsval 1 1 huge)
ce s = do
parbreak
ceval <- getceval
putceval (setparam (getval s) ceval 1 0 huge)
he s = putheader (gettl s)
fo s = putfooter (gettl s)
bp s = do
page
curpage <- getfstate curpage
let i = setparam (getval s) curpage (curpage + 1) (-huge) huge
putcurpage i
putnewpage i
sp s = do
spval <- getfstate spval
let i = setparam (getval s) spval 1 0 huge
space i
putspval i
ind s = do
(ind,rm) <- getfstate2 inval rmval
putinval (setparam (getval s) ind 0 0 (rm - 1))
rm s = do
(ind,ti,rm) <- getfstate3 inval tival rmval
putrmval (setparam (getval s) rm pagewidth (ind + ti + 1) huge)
ti s = do
parbreak
(ti,rm) <- getfstate2 tival rmval
puttival (setparam (getval s) ti 0 (-huge) rm)
pl s = do
(m1,m2,m3,m4,pl) <- getfstate5 m1val m2val m3val m4val plval
let m = m1 + m2 + m3 + m4 + 1
i = setparam (getval s) pl pagelen m huge
putplval i
putbottom (i - m3 - m4)
command's primary concern is to get into the FormatStep using runStateT, and to get back out into the IO monad. The str argument is the incoming line, which is known to be a dot command.
command' meanders down the commands list, testing each string (for whether it is a prefix of the incoming line---this allows the possibility of longer commands, like .fill or .header).
command'' does the actual testing, as well as invoking runStateT. It returns a Maybe FormatState; Nothing indicates the command did not apply. It would be the responsibility of command' to handle an unrecognized command. Like the original, this code simply ignores the offending line.
command :: FormatState -> String -> IO FormatState
command fs str = doCommand $ find (\c -> fst c `isPrefixOf` str) commands
where
doCommand (Just (_,cmd)) = liftM snd $ runStateT (cmd str) fs
doCommand Nothing = return fs
