Monday, March 4, 2013

HASKELL PARSERS

HASKELL PARSERS

REVISED: Wednesday, January 24, 2024




Haskell Parsers.

I.  HASKELL BASIC PARSER

Almost every program uses some form of parser to pre-process its input. A parser is normally a program that analyses a piece of text to determine its syntactic structure. A parser is also normally a function that takes a string and returns some form of tree. Our first example will be parsing a "comma separated value" (CSV) file. Each line in a CSV file is a record, and each field in the record is separated from the next by a comma. We will work from the following skeleton:

A. FILE

A CSV file contains 0 or more lines, each of which is terminated by the end-of-line character \n.

B. LINE

A line contains 1 or more cells, separated by a comma.

C. CELL

A cell contains 0 or more characters, which must be neither the comma nor the end-of-line character \n.

D. END-OF-LINE

The end-of-line character is the newline, \n

II.  HASKELL PARSEC

You can import the Parsec library which was written by Graham Hutton and Daan Leijen as follows:

Prelude> import Text.ParserCombinators.Parsec
Prelude> 

Parsing is divided into two stages. The first stage is lexical analysis, the domain of tools like Flex. The second stage is parsing itself, performed by programs such as Bison. Both lexical analysis and parsing can be performed by Parsec.

Parsec is a monadic library. Parsec defines its own parsing monad, GenParser.  Parsec provides both simple parsing functions, and helper functions to tie them all together. We will use Parsec's parser library to combine small parsing functions into more sophisticated parsers. 

III.  HASKELL PARSEC EXAMPLE PARSER 1

"Copy Paste" the following csv1 parser example into your text editor and "File, Save As" csv1.hs to your working directory:

-- file: csv1.hs
import Text.ParserCombinators.Parsec

{- A CSV file contains 0 or more lines, each of which is terminated
   by the end-of-line character (eol). -}
csvFile :: GenParser Char st [[String]]   -- First function is csvFile.
csvFile = 
    do result <- many line
       eof
       return result

{- Each line contains 1 or more cells, separated by a comma. -}
line :: GenParser Char st [String]
line = 
    do result <- cells
       eol                       -- end of line
       return result
       
{- Build up a list of cells.  Try to parse the first cell, then figure out 
 what ends the cell. -}
cells :: GenParser Char st [String]
cells = 
    do first <- cellContent
       next <- remainingCells
       return (first : next)

{- The cell either ends with a comma, indicating that 1 or more cells follow,
 or it doesn't, indicating that we are at the end of the cells for this line. -}
remainingCells :: GenParser Char st [String]
remainingCells =
    (char ',' >> cells)            -- Found comma?  More cells coming
    <|> (return [])               -- No comma?  Return [], no more cells

{- Each cell contains 0 or more characters, which must not be a comma or
EOL.  -}
cellContent :: GenParser Char st String
cellContent = 
    many (noneOf ",\n")

{- The end of line character is \n.  -}
eol :: GenParser Char st Char
eol = char '\n'

parseCSV :: String -> Either ParseError [[String]]   -- Function parseCSV.
parseCSV input = parse csvFile "(unknown)" input

IV. HASKELL PARSEC PARSER FUNCTION DEMONSTRATION 1

Open GHCi as shown below:

GHCi, version 9.8.1: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> 

Load csv1 into GHCi as follows:

Prelude>  :load csv1.hs
[1 of 1] Compiling Main             ( csv1.hs, interpreted )
Ok, modules loaded: Main.
Prelude> 

Call the function parseCSV without input:

Prelude>  parseCSV ""
Loading package transformers-0.3.0.0 ... linking ... done.
Loading package array-0.5.0.0 ... linking ... done.
Loading package deepseq-1.3.0.2 ... linking ... done.
Loading package bytestring-0.10.4.0 ... linking ... done.
Loading package mtl-2.1.3.1 ... linking ... done.
Loading package text-1.1.0.0 ... linking ... done.
Loading package parsec-3.1.5 ... linking ... done.
Right []
it :: Either ParseError [[String]]
Prelude>

Call the function parseCSV with input:

Prelude>  parseCSV "Hello, World!\n"
Right [["Hello"," World!"]]
it :: Either ParseError [[String]]
Prelude>
  
V.  HASKELL PARSEC EXAMPLE PARSER 2

"Copy Paste" the following csv2 example into your text editor and "File, Save As" csv2.hs to your working directory:

-- file: csv2.hs
import Text.ParserCombinators.Parsec

csvFile = endBy line eol
line = sepBy cell (char ',')
cell = many (noneOf ",\n")
eol = char '\n'

parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" input

VI. HASKELL PARSEC PARSER FUNCTION DEMONSTRATION 2

Load csv2 into GHCi as shown below:

Prelude>  :load csv2.hs
[1 of 1] Compiling Main             ( csv2.hs, interpreted )
Ok, modules loaded: Main.
Prelude> 

Call the function parseCSV without input:

Prelude>  parseCSV ""
Right []
it :: Either ParseError [[String]]
Prelude> 

Call the function parseCSV with input:

Prelude>  parseCSV "Hello, World!\n"
Right [["Hello"," World!"]]
it :: Either ParseError [[String]]
Prelude>  
  
VII.  HASKELL PARSEC EXAMPLE PARSER 3

"Copy Paste" the following example into your text editor and then "File, Save As" csv3.hs to your working directory:

-- file: csv3.hs
import Text.ParserCombinators.Parsec

csvFile = endBy line eol
line = sepBy cell (char ',')
cell = quotedCell <|> many (noneOf ",\n\r")

quotedCell = 
    do char '"'
       content <- many quotedChar
       char '"' <?> "quote at end of cell"
       return content

quotedChar =
        noneOf "\""
    <|> try (string "\"\"" >> return '"')

eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "end of line"

parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" input

main =
    do c <- getContents
       case parse csvFile "(stdin)" c of
            Left e -> do putStrLn "Error parsing input:"
                         print e
            Right r -> mapM_ print r

VIII. HASKELL PARSEC PARSER FUNCTION DEMONSTRATION 3

Load csv3 into GHCi as shown below:

Prelude>  :load csv3.hs
[1 of 1] Compiling Main             ( csv3.hs, interpreted )
Ok, modules loaded: Main.
Prelude>

Call parseCSV without input:

Prelude>  parseCSV "\n"
Right [[""]]
Prelude>

Call parseCSV with correctly formatted input:

Prelude>  parseCSV "Hello, World!\n"
Right [["Hello"," World!"]]
Prelude>

Call parseCSV with incorrectly formatted input:

Prelude>  parseCSV "Hello, World!"
Left "(unknown)" (line 1, column 14):
unexpected end of input
expecting "," or end of line
Prelude>

Call parseCSV with correctly formatted input:

Prelude>  parseCSV "NA,NA\n"
Right [["NA","NA"]]
Prelude>  

IX. CONCLUSION

This tutorial has given you an introduction to the Haskell parsec parser function!

X. REFERENCES

Bird, R. (2015). Thinking Functionally with Haskell. Cambridge, England: Cambridge University Press.

Davie, A. (1992). Introduction to Functional Programming Systems Using Haskell. Cambridge, England: Cambridge University Press.

Goerzen, J. & O'Sullivan, B. &  Stewart, D. (2008). Real World Haskell. Sebastopol, CA: O'Reilly Media, Inc.

Hutton, G. (2007). Programming in Haskell. New York: Cambridge University Press.

Lipovača, M. (2011). Learn You a Haskell for Great Good!: A Beginner's Guide. San Francisco, CA: No Starch Press, Inc.

Thompson, S. (2011). The Craft of Functional Programming. Edinburgh Gate, Harlow, England: Pearson Education Limited.