REVISED: Wednesday, January 24, 2024
Haskell Parsers.
I. HASKELL BASIC PARSER
Almost every program uses some form of parser to pre-process its input. A parser is normally a program that analyses a piece of text to determine its syntactic structure. A parser is also normally a function that takes a string and returns some form of tree. Our first example will be parsing a "comma separated value" (CSV) file. Each line in a CSV file is a record, and each field in the record is separated from the next by a comma. We will work from the following skeleton:
A. FILE
A CSV file contains 0 or more lines, each of which is terminated by the end-of-line character \n.
A CSV file contains 0 or more lines, each of which is terminated by the end-of-line character \n.
B. LINE
A line contains 1 or more cells, separated by a comma.
A line contains 1 or more cells, separated by a comma.
C. CELL
A cell contains 0 or more characters, which must be neither the comma nor the end-of-line character \n.
A cell contains 0 or more characters, which must be neither the comma nor the end-of-line character \n.
D. END-OF-LINE
The end-of-line character is the newline, \n.
The end-of-line character is the newline, \n.
II. HASKELL PARSEC
Prelude>
Parsing is divided into two stages. The first stage is lexical analysis, the domain of tools like Flex. The second stage is parsing itself, performed by programs such as Bison. Both lexical analysis and parsing can be performed by Parsec.
Parsec is a monadic library. Parsec defines its own parsing monad, GenParser. Parsec provides both simple parsing functions, and helper functions to tie them all together. We will use Parsec's parser library to combine small parsing functions into more sophisticated parsers.
III. HASKELL PARSEC EXAMPLE PARSER 1
"Copy Paste" the following csv1 parser example into your text editor and "File, Save As" csv1.hs to your working directory:
-- file: csv1.hs
import Text.ParserCombinators.Parsec{- A CSV file contains 0 or more lines, each of which is terminated
by the end-of-line character (eol). -}
csvFile :: GenParser Char st [[String]] -- First function is csvFile.
csvFile =
do result <- many line
eof
return result
{- Each line contains 1 or more cells, separated by a comma. -}
line :: GenParser Char st [String]
line =
do result <- cells
eol -- end of line
return result
{- Build up a list of cells. Try to parse the first cell, then figure out
what ends the cell. -}
cells :: GenParser Char st [String]
cells =
do first <- cellContent
next <- remainingCells
return (first : next)
{- The cell either ends with a comma, indicating that 1 or more cells follow,
or it doesn't, indicating that we are at the end of the cells for this line. -}
remainingCells :: GenParser Char st [String]
remainingCells =
(char ',' >> cells) -- Found comma? More cells coming
<|> (return []) -- No comma? Return [], no more cells
{- Each cell contains 0 or more characters, which must not be a comma or
EOL. -}
cellContent :: GenParser Char st String
cellContent =
many (noneOf ",\n")
{- The end of line character is \n. -}
eol :: GenParser Char st Char
eol = char '\n'
parseCSV :: String -> Either ParseError [[String]] -- Function parseCSV.
parseCSV input = parse csvFile "(unknown)" input
IV. HASKELL PARSEC PARSER FUNCTION DEMONSTRATION 1
Open GHCi as shown below:
GHCi, version 9.8.1: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude>
Load csv1 into GHCi as follows:
Prelude> :load csv1.hs
[1 of 1] Compiling Main ( csv1.hs, interpreted )
Ok, modules loaded: Main.
Prelude>
Call the function parseCSV without input:
Prelude> parseCSV ""
Loading package transformers-0.3.0.0 ... linking ... done.
Loading package array-0.5.0.0 ... linking ... done.
Loading package deepseq-1.3.0.2 ... linking ... done.
Loading package bytestring-0.10.4.0 ... linking ... done.
Loading package mtl-2.1.3.1 ... linking ... done.
Loading package text-1.1.0.0 ... linking ... done.
Loading package parsec-3.1.5 ... linking ... done.
Right []
it :: Either ParseError [[String]]
Prelude>
Call the function parseCSV with input:
Prelude> parseCSV "Hello, World!\n"
Right [["Hello"," World!"]]
it :: Either ParseError [[String]]
Prelude>
V. HASKELL PARSEC EXAMPLE PARSER 2
"Copy Paste" the following csv2 example into your text editor and "File, Save As" csv2.hs to your working directory:
import Text.ParserCombinators.Parsec
csvFile = endBy line eol
line = sepBy cell (char ',')
cell = many (noneOf ",\n")
eol = char '\n'
parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" input
VI. HASKELL PARSEC PARSER FUNCTION DEMONSTRATION 2
Load csv2 into GHCi as shown below:
Prelude> :load csv2.hs
[1 of 1] Compiling Main ( csv2.hs, interpreted )
Ok, modules loaded: Main.
Prelude>
Right []
it :: Either ParseError [[String]]
Prelude>
Prelude>
Right [["Hello"," World!"]]
it :: Either ParseError [[String]]
Prelude>
VII. HASKELL PARSEC EXAMPLE PARSER 3
"Copy Paste" the following example into your text editor and then "File, Save As" csv3.hs to your working directory:
-- file: csv3.hs
import Text.ParserCombinators.Parsec
csvFile = endBy line eol
line = sepBy cell (char ',')
cell = quotedCell <|> many (noneOf ",\n\r")
quotedCell =
do char '"'
content <- many quotedChar
char '"' <?> "quote at end of cell"
return content
quotedChar =
noneOf "\""
<|> try (string "\"\"" >> return '"')
eol = try (string "\n\r")
<|> try (string "\r\n")
<|> string "\n"
<|> string "\r"
<?> "end of line"
parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" input
main =
do c <- getContents
case parse csvFile "(stdin)" c of
Left e -> do putStrLn "Error parsing input:"
print e
Right r -> mapM_ print r
VIII. HASKELL PARSEC PARSER FUNCTION DEMONSTRATION 3
Load csv3 into GHCi as shown below:
Prelude> :load csv3.hs
Call parseCSV without input:
Prelude> parseCSV "\n"
Call parseCSV with correctly formatted input:
Prelude> parseCSV "Hello, World!\n"
Call parseCSV with incorrectly formatted input:
Prelude> parseCSV "Hello, World!"
Call parseCSV with correctly formatted input:
Prelude> parseCSV "NA,NA\n"
[1 of 1] Compiling Main ( csv3.hs, interpreted )
Ok, modules loaded: Main.
Prelude>
Right [[""]]
Prelude>Prelude> parseCSV "Hello, World!\n"
Right [["Hello"," World!"]]
Prelude>
Left "(unknown)" (line 1, column 14):
unexpected end of input
expecting "," or end of line
Prelude>
Right [["NA","NA"]]
Prelude>
IX. CONCLUSION
This tutorial has given you an introduction to the Haskell parsec parser function!
Bird, R. (2015). Thinking Functionally with Haskell. Cambridge, England: Cambridge University Press.
Davie, A. (1992). Introduction to Functional Programming Systems Using Haskell. Cambridge, England: Cambridge University Press.
Goerzen, J. & O'Sullivan, B. & Stewart, D. (2008). Real World Haskell. Sebastopol, CA: O'Reilly Media, Inc.
Hutton, G. (2007). Programming in Haskell. New York: Cambridge University Press.
Lipovača, M. (2011). Learn You a Haskell for Great Good!: A Beginner's Guide. San Francisco, CA: No Starch Press, Inc.
Thompson, S. (2011). The Craft of Functional Programming. Edinburgh Gate, Harlow, England: Pearson Education Limited.