Parsec Example Revisited: Custom Configuration File Format Meets the Token Module

As promised, here is a solution to one of the exercises I posed here — namely, to read and parse a configuration file to determine mount options to pass along to the mount command. (No one’s reading this stuff right now, but whatever. Haskell is still that much fun for me!) That is, the previous solution used a rather crude, hard-coded function, fileSystemArgs, as follows:

fileSystemArgs :: String -> [(String, String)]
fileSystemArgs user =
    [ ("ext2", "ext2 -o rw,relatime")
    , ("vfat", "vfat -o rw,uid=" ++ user ++ ",gid=" ++ user)
    ]

I’ve removed this code in favor of reading such options from a configuration file at runtime, as per the exercise. The code has increased to 452 lines (only about 100 more lines than before, with my verbose style), and we make use of the Text.Parsec.Token module to define a super-simple mini-language that we use in the configuration file (two curly brace blocks that define mount options). We only use bits and pieces of it, but it’s still handy because we get some super-convenient parser combinators for free — namely, braces, stringLiteral, and whiteSpace (which are actually aliased to p_braces, p_stringLiteral, and p_whiteSpace in the code). What’s more, these functions intelligently evade trailing whitespace and comment lines automatically. This is why I love Parsec!!!

So without further ado, here’s the code (PUBLIC DOMAIN):

-- LICENSE: PUBLIC DOMAIN
{-# LANGUAGE DeriveDataTypeable, RecordWildCards #-}
module Main where

import System.Console.CmdArgs.Implicit
import System.IO
import System.Environment
import System.Exit
import System.Process
import Text.Parsec.Char hiding (upper)
import Text.Parsec.Combinator
import Text.Parsec.Prim
import Text.Parsec.String
import qualified Text.Parsec.Token as PT
import Text.Parsec.Language (emptyDef)
import Control.Monad.Identity

data Opts = Opts
    { all_devices :: Bool
    , unmount :: Bool
    , unmount_all :: Bool
    , discover :: Bool
    , no_color :: Bool
    } deriving (Data, Typeable, Show, Eq)

progOpts :: Opts
progOpts = Opts
    { all_devices = def &= help "mount all USB devices"
    , unmount = def &= help "choose a USB device to unmount"
    , unmount_all = def &= name "U" &= help "unmount all USB devices"
    , discover = def &= help "list all mounted/unmounted USB devices"
    , no_color = def &= help "disable colors"
    }
    &= details
        [ "Notes:"
        , ""
        , "The default behavior without any options is to try to mount a USB device."
            ++ " Here, `device' means a device under the /dev directory, and in our context, is actually a file system partition."
            ++ " Many USB drives have only a single partition, in which case the term `device' means both the USB drive and the single partition it has."
        , ""
        , "Also, allowing the $USER to execute the mount and umount commands with sudo privileges (sudo visudo) will make things less clunky."
        ]

getOpts :: IO Opts
getOpts = cmdArgs $ progOpts
    &= summary (_PROGRAM_INFO ++ ", " ++ _COPYRIGHT)
    &= program _PROGRAM_NAME
    &= help _PROGRAM_DESC
    &= helpArg [explicit, name "help", name "h"]
    &= versionArg [explicit, name "version", name "v", summary _PROGRAM_INFO]

_PROGRAM_NAME, _PROGRAM_VERSION, _PROGRAM_INFO, _PROGRAM_DESC, _COPYRIGHT :: String
_PROGRAM_NAME = "usbmnt"
_PROGRAM_VERSION = "0.1.0"
_PROGRAM_INFO = _PROGRAM_NAME ++ " version " ++ _PROGRAM_VERSION
_PROGRAM_DESC = "mount/unmount USB device(s)"
_COPYRIGHT = "PUBLIC DOMAIN"

data BlockDevice = BlockDevice
    { shortname :: String
    , uuid :: UUID
    , fsys :: String
    , mountPoint :: MountPoint
    } deriving (Eq)

data MountPoint
    = MPath { path :: FilePath }
    | Swap
    | Unmounted
    | UnknownBlkidVal
    deriving (Eq)

instance Show BlockDevice where
    show BlockDevice{..} = unwords
        [ shortname
        , fsys
        , uuid
        , show mountPoint
        ]

instance Show MountPoint where
    show (MPath path) = path
    show Swap = "Swap"
    show Unmounted = "Unmounted"
    show UnknownBlkidVal = "UnknownBlkidVal"

blockdeviceDefault :: BlockDevice
blockdeviceDefault = BlockDevice
    { shortname = ""
    , uuid = ""
    , fsys = ""
    , mountPoint = MPath {path = ""}
    }

data Config = Config
    { fsyss :: [(String, String)]
    , uuids :: [(UUID, String)]
    } deriving (Eq, Show)

_ALPHANUM :: String
_ALPHANUM = ['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9']

data Color
    = Red
    | Green
    | Yellow
    | Blue
    | CNone
    deriving (Show, Eq)

colorize :: Color -> String -> String
colorize c s = case c of
    Blue -> "\x1b[1;34m" ++ s ++ "\x1b[0m"
    Green -> "\x1b[1;32m" ++ s ++ "\x1b[0m"
    Red -> "\x1b[1;31m" ++ s ++ "\x1b[0m"
    Yellow -> "\x1b[1;33m" ++ s ++ "\x1b[0m"
    _ -> s

main :: IO ()
main = do
    hSetBuffering stdout NoBuffering
    hSetBuffering stderr NoBuffering
    opts <- getOpts
    homeDir <- getEnv "HOME"
    errNo <- argsCheck opts homeDir
    when (errNo > 0) $ exitWith $ ExitFailure errNo
    (devs, takenPaths) <- getDevices opts
    let configLoc = homeDir ++ "/.usbmnt"
    configSrc <- readFile configLoc
    (confErrNo, config) <- parseConfig configSrc configLoc
    when (confErrNo > 0) $ exitWith $ ExitFailure confErrNo
    let mountablePaths = filter (\p -> not $ elem p takenPaths) $ map (\p -> "/mnt/u" ++ show p) [(0::Int)..]
        devsKV = zip (map show [(1::Int)..]) . zip devs $ mountablePaths
    prog opts config devsKV

argsCheck :: Opts -> String -> IO Int
argsCheck Opts{..} homeDir
    | null homeDir = e "could not get environment variable $HOME" 1
    | otherwise = return 0
    where
        e :: String -> Int -> IO Int
        e str num = errMsg str >> return num

prog :: Opts -> Config -> [(String, (BlockDevice, FilePath))] -> IO ()
prog opts@Opts{..} config devsKV
    | discover = do
        putStrLn "all devices:"
        mapM_ (\(_, (d, _)) -> putStrLn $ cshow d) devsKV
    | otherwise = do
        putStrLn (if (unmount || unmount_all)
            then "USB device(s) to unmount:"
            else "USB device(s) to mount:")
        mapM_ (\(n, (d, _)) -> putStrLn $ "    " ++ n ++ ") " ++ show' d) devsKV
        putStrLn ""
        mountMenu opts config devsKV
    where
        cshow :: BlockDevice -> String
        cshow b@BlockDevice{..}
            | no_color = show b
            | otherwise = case mountPoint of
                Unmounted -> colorize Green $ show b
                MPath _ -> if not $ null $ getUSBMountPath b
                    then colorize Blue $ show b
                    else show b
                _ -> show b
        show' :: BlockDevice -> String
        show' = if not (unmount || unmount_all)
            then show
            else unwords . init . words . show

mountMenu :: Opts -> Config -> [(String, (BlockDevice, FilePath))] -> IO ()
mountMenu Opts{..} config devsKV
    | unmount = if length devsKV == 1
        then do
            putStrLn "only 1 USB device to unmount"
            tryMount False config (snd . head $ devsKV) >>= exitWith
        else chooseDev prompt devsKV (tryMount False config)
    | unmount_all = do
        putStrLn "unmounting all USB devices..."
        mapM_ (tryMount False config) (map snd devsKV)
        return ()
    | all_devices = do
        putStrLn "mounting all USB devices..."
        mapM_ (tryMount True config) (map snd devsKV)
        return ()
    | length devsKV == 1 = do
        putStrLn "only 1 USB device to mount"
        tryMount True config (snd . head $ devsKV) >>= exitWith
    | otherwise = chooseDev prompt devsKV (tryMount True config)
    where
        prompt :: String
        prompt = if (unmount || unmount_all)
            then "choose USB device to unmount (q to exit)"
            else "choose USB device to mount (q to exit)"

chooseDev :: String -> [(String, (BlockDevice, FilePath))] -> ((BlockDevice, FilePath) -> IO ExitCode) -> IO ()
chooseDev prompt devsKV func = do
    putStrLn prompt
    key <- getLine
    case lookup key devsKV of
        Just dev -> func dev >>= exitWith
        _ -> case key of
            "q" -> return ()
            _ -> chooseDev prompt devsKV func

tryMount :: Bool -> Config -> (BlockDevice, FilePath) -> IO ExitCode
tryMount mount config@Config{..} (bd@BlockDevice{..}, mp)
    | (null margs) = do
        errMsg $ "UUID " ++ squote uuid ++ " was not found in config file"
        errMsg $ "filesystem " ++ squote fsys ++ " was also not found in config file"
        errMsg $ "supported file systems: " ++ (unwords $ map fst fsyss)
        exitWith (ExitFailure 1)
    | otherwise = do
    when mount $ do
        if (null $ mountArgsUUID config uuid)
            then putStrLn $ "filesystem " ++ squote fsys ++ " found in config file"
            else putStrLn $ "UUID " ++ squote uuid ++ " found in config file"
        putStrLn $ "using these arguments: " ++ squote margs
    putStr $ (if mount then "" else "un")
        ++ "mounting "
        ++ shortname
        ++ " (" ++ fsys ++ ") "
        ++ (if mount then "to " ++ mp else "from " ++ show mountPoint)
        ++ ".."
    (_, _, _, p) <- createProcess $ cmd margs shortname
    exitStatus <- waitForProcess p
    if (exitStatus == ExitSuccess)
        then do
            putStrLn "OK"
            return ExitSuccess
        else do
            putStr "FAILED\n"
            errMsg $ (if mount
                then "mount error (perhaps " ++ squote mp ++ " does not exist)"
                else "unmount error")
            return (ExitFailure 1)
    where
        margs = mountArgs config bd
        cmd arguments devPath = CreateProcess
            { cmdspec = ShellCommand (if mount
                then "sudo mount -t " ++ arguments ++ " " ++ devPath ++ " " ++ mp ++ " &>/dev/null"
                else "sudo umount " ++ show mountPoint)
            , cwd = Nothing
            , env = Nothing
            , std_in = CreatePipe
            , std_out = CreatePipe
            , std_err = Inherit
            , close_fds = False
            }

mountArgs :: Config -> BlockDevice -> String
mountArgs Config{..} BlockDevice{..} = case lookup uuid uuids of
    Just a -> a
    _ -> case lookup fsys fsyss of
        Just a -> a
        _ -> []

mountArgsUUID :: Config -> UUID -> String
mountArgsUUID Config{..} uuid' = case lookup uuid' uuids of
    Just a -> a
    _ -> []

getDevices :: Opts -> IO ([BlockDevice], [String])
getDevices Opts{..} = do
    (_, sout, _, p) <- createProcess cmdBlkid
    devs <- case sout of
        Just h -> hGetContents h
        Nothing -> return []
    _ <- waitForProcess p
    let devs' = (map (unwords . words)) . drop 2 . lines $ devs
    devs'' <- mapM parseBlkid devs'
    let toMount = filter (\BlockDevice{..} -> mountPoint == Unmounted) devs''
        toUnmount = filter (\dev -> not $ null $ getUSBMountPath dev) devs''
        takenPaths = filter (not . null) . map getUSBMountPath $ devs''
    when (not discover && null toMount && (not (unmount || unmount_all))) $ do
        errMsg $ "cannot find USB devices to mount"
        exitWith (ExitFailure 1)
    when (not discover && null toUnmount && (unmount || unmount_all)) $ do
        errMsg $ "cannot find USB devices to unmount"
        exitWith (ExitFailure 1)
    return $ formatDevs devs'' toMount toUnmount takenPaths
    where
        formatDevs :: [BlockDevice] -> [BlockDevice] -> [BlockDevice] -> [String] -> ([BlockDevice], [String])
        formatDevs ds m um takenPaths
            | discover = (ds, takenPaths)
            | unmount || unmount_all = (um, takenPaths)
            | otherwise = (m, takenPaths)
        cmdBlkid = CreateProcess
            { cmdspec = ShellCommand ("sudo blkid -o list")
            , cwd = Nothing
            , env = Nothing
            , std_in = CreatePipe
            , std_out = CreatePipe
            , std_err = Inherit
            , close_fds = False
            }

getUSBMountPath :: BlockDevice -> String
getUSBMountPath BlockDevice{..} = case mountPoint of
    MPath str -> if take 6 str == "/mnt/u" && (all (\c -> elem c ['0'..'9']) (drop 6 str))
        then str
        else ""
    _ -> ""

errMsg :: String -> IO ()
errMsg msg = hPutStrLn stderr $ "error: " ++ msg

squote :: String -> String
squote s = "`" ++ s ++ "'"

-- Parsing

-- for parsing the computer-generated output of `sudo blkid -o list'
parserIdentifier :: Parser String
parserIdentifier = many1 $ oneOf $ _ALPHANUM ++ "/-_"

parserWhitespace :: Parser String
parserWhitespace = many1 $ oneOf " \t\n\r"

parserMP :: Parser MountPoint
parserMP =
    try ( do
        a <- oneOf "<("
        b <- manyTill anyChar (lookAhead $ (oneOf ">)"))
        _ <- oneOf ">)"
        let mp = case a of
                '<' -> Swap
                '(' -> case b of
                    "not mounted" -> Unmounted
                    _ -> UnknownBlkidVal
                _ -> UnknownBlkidVal
        return mp
        )
    <|> (parserIdentifier >>= (\s -> return MPath {path = s}))
    <?> "blkid's mount point description"

blkidParser :: Parser BlockDevice
blkidParser =
    try ( do
        sname <- parserIdentifier
        _ <- parserWhitespace
        fs <- parserIdentifier
        _ <- parserWhitespace
        _ <- parserIdentifier -- leave out the "label" column, even if it exists
        _ <- parserWhitespace
        mp <- parserMP
        _ <- parserWhitespace
        uid <- parserIdentifier
        eof
        return BlockDevice
           { shortname = sname
           , uuid = uid
           , fsys = fs
           , mountPoint = mp
           }
        )
    <|>
    do  sname <- parserIdentifier
        _ <- parserWhitespace
        fs <- parserIdentifier
        _ <- parserWhitespace
        mp <- parserMP
        _ <- parserWhitespace
        uid <- parserIdentifier
        eof
        return BlockDevice
            { shortname = sname
            , uuid = uid
            , fsys = fs
            , mountPoint = mp
            }
    <?> "5 or 4 fields to parse"

parseBlkid :: String -> IO BlockDevice
parseBlkid src =
    case parse blkidParser "output of `sudo blkid -o list'" src of
        Left parseError -> errMsg (show parseError) >> return blockdeviceDefault
        Right result -> return result

-- we use a LanguageDef so that we can get whitespace/newline parsing for FREE
-- in our .usbmnt file
configDef :: PT.LanguageDef st
configDef = emptyDef
    { PT.commentStart   = ""
    , PT.commentEnd     = ""
    , PT.commentLine    = "#"
    , PT.nestedComments = False
    -- the identStart/identLetter define what a UUID will look like (a
    -- dash-separated hex number)
    , PT.identStart     = oneOf $ ['0'..'9'] ++ ['a'..'f'] ++ ['A'..'F']
    , PT.identLetter    = oneOf $ ['0'..'9'] ++ ['a'..'f'] ++ ['A'..'F'] ++ "-"
    , PT.opStart        = char '.'
    , PT.opLetter       = char '.'
    , PT.reservedOpNames= []
    , PT.reservedNames  = []
    , PT.caseSensitive  = True
    }

-- we call makeTokenParser def and pick out just those we need
lexer :: PT.TokenParser ()
lexer = PT.makeTokenParser configDef

p_identifier :: ParsecT String () Identity String
p_identifier = PT.identifier lexer
p_stringLiteral :: ParsecT String () Identity String
p_stringLiteral = PT.stringLiteral lexer
p_whiteSpace :: ParsecT String () Identity ()
p_whiteSpace = PT.whiteSpace lexer
p_braces :: ParsecT String () Identity a -> ParsecT String () Identity a
p_braces = PT.braces lexer
p_commaSep :: ParsecT String () Identity a -> ParsecT String () Identity [a]
p_commaSep = PT.commaSep lexer
p_symbol :: String -> ParsecT String () Identity String
p_symbol = PT.symbol lexer

type UUID = String

assocParser :: Parser String -> Parser (UUID, String)
assocParser keyParser = do
    key <- keyParser
    _ <- many $ oneOf " \t"
    _ <- string "="
    _ <- many $ oneOf " \t"
    mountOpts <- p_stringLiteral
    return (key, mountOpts)
    <?> "a key-value association"

hashParser :: String -> Parser String -> Parser [(String, String)]
hashParser hashName keyParser = do
    _ <- p_symbol hashName
    _ <- p_symbol "="
    a <- p_braces (p_commaSep $ assocParser keyParser)
    return a
    <?> "a " ++ hashName ++ " curly brace block"

configParser :: Parser Config
configParser = do
    p_whiteSpace -- take care of leading whitespace/comments as defined by configDef
    -- parse FSYS_HASH first
    fsyss' <- hashParser "FSYS_HASH" (many1 alphaNum)
    p_whiteSpace
    -- now parse UUID_HASH
    uuids' <- hashParser "UUID_HASH" (p_identifier)
    eof
    return $ Config {fsyss = fsyss', uuids = uuids'}
    <?> "config with FSYS_HASH and UUID_HASH blocks"

parseConfig :: String -> String -> IO (Int, Config)
parseConfig src loc =
    case parse configParser ("config file at " ++ squote loc) src of
        Left parseError -> errMsg (show parseError) >> return (1, Config [] [])
        Right result -> return (0, result)

And here is the new ~/.usbmnt configuration file that it parses:

# UUID/filesystem mountoptions

# the FSYS_HASH is used as a fallback, if the detected device does not match one of the UUIDs listed above
FSYS_HASH =
    { ext2 = "ext2 -o rw,relatime"
    , vfat = "vfat -o rw,uid=$USER,gid=$USER"
    }

# UUID is pretty straightforward; use blkid to figure it out for a particular device
# as for mount options, these are passed directly into the shell (/bin/sh; see createProcess in System.Process), so you can use things like $USER

# UUID_HASH = {} # if you have no particular device-specific settings

UUID_HASH =
    { 7CBF-B36F =                             "vfat -o rw,uid=$USER,gid=$USER"    # (256 MiB) SD card
    , 9c359666-a4e6-a894-3475-e6cd53660de8 =  "ext2 -o rw,relatime"               # (2 GiB) USB 2.0 thumbdrive
    }

See how there are a bunch of comment lines starting with ‘#‘? With our configDef function which defines a language with “#” as the commentLine, our convenience functions p_whiteSpace, p_symbol, etc. all avoid comment lines automatically! Sure, it’s a trivial automation here, but you can also define multiline comments and more (which I didn’t bother to do since the configuration file format I designed is so simple). I really like how I can robustly parse comment-laden text strings with such ease!

Can you imagine doing the same thing with regular expressions? I used to use regexes for parsing things like this back when I was still a very naive programmer, and then Parsec opened up the wonderful world of parsers for me (especially all the cool convenience functions you get for FREE!). The new rule I have is: use regular expressions to strictly do search-and-replace for unchanging/uniform input (using the Text.Regex.PCRE module), and to use Parsec for everything else. There’s no excuse to put off learning how to use Parsec!

Anyway, the code I posted is a working example and I really do use it in my machines! And all you need is a POSIX system with GHC, Parsec, and CmdArgs to use it.

Next time, I’ll do the second exercise and read in and parse ByteStrings instead of native Haskell Strings — just to show you how trivial the changes are!

Parsec and CmdArgs in Action: A Small Example

In this post, I mentioned that I wrote a ~300 line Haskell program to take care of mounting/unmounting USB drives (especially useful for window-manager-only users like myself). Well, I’ve been using my program (creatively named usbmnt) very happily so far and would like to release it for public consumption. It’s released into the PUBLIC DOMAIN, because that’s how I roll, baby!

-- LICENSE: PUBLIC DOMAIN
{-# LANGUAGE DeriveDataTypeable, RecordWildCards #-}
module Main where

import Control.Monad (when)
import System.Console.CmdArgs.Implicit
import System.IO
import System.Environment
import System.Exit
import System.Process
import Text.Parsec.Char hiding (upper)
import Text.Parsec.Combinator
import Text.Parsec.Prim
import Text.Parsec.String

data Opts = Opts
    { all_devices :: Bool
    , unmount :: Bool
    , unmount_all :: Bool
    , discover :: Bool
    , no_color :: Bool
    } deriving (Data, Typeable, Show, Eq)

progOpts :: Opts
progOpts = Opts
    { all_devices = def &= help "mount all USB devices"
    , unmount = def &= help "choose a USB device to unmount"
    , unmount_all = def &= name "U" &= help "unmount all USB devices"
    , discover = def &= help "list all mounted/unmounted USB devices"
    , no_color = def &= help "disable colors"
    }
    &= details
        [ "Notes:"
        , ""
        , "The default behavior without any options is to try to mount a USB device."
            ++ " Here, `device' means a device under the /dev directory, and in our context, is actually a file system partition."
            ++ " Many USB drives have only a single partition, in which case the term `device' means both the USB drive and the single partition it has."
        , ""
        , "Also, allowing the $USER to execute the mount and umount commands with sudo privileges (sudo visudo) will make things less clunky."
        ]

getOpts :: IO Opts
getOpts = cmdArgs $ progOpts
    &= summary (_PROGRAM_INFO ++ ", " ++ _COPYRIGHT)
    &= program _PROGRAM_NAME
    &= help _PROGRAM_DESC
    &= helpArg [explicit, name "help", name "h"]
    &= versionArg [explicit, name "version", name "v", summary _PROGRAM_INFO]

_PROGRAM_NAME, _PROGRAM_VERSION, _PROGRAM_INFO, _PROGRAM_DESC, _COPYRIGHT :: String
_PROGRAM_NAME = "usbmnt"
_PROGRAM_VERSION = "0.0.1"
_PROGRAM_INFO = _PROGRAM_NAME ++ " version " ++ _PROGRAM_VERSION
_PROGRAM_DESC = "mount/unmount USB device(s)"
_COPYRIGHT = "PUBLIC DOMAIN"

data BlockDevice = BlockDevice
    { shortname :: String
    , uuid :: String
    , fsys :: String
    , mountPoint :: MountPoint
    } deriving (Eq)

data MountPoint
    = MPath { path :: FilePath }
    | Swap
    | Unmounted
    | UnknownBlkidVal
    deriving (Eq)

instance Show BlockDevice where
    show BlockDevice{..} = unwords
        [ shortname
        , fsys
        , uuid
        , show mountPoint
        ]

instance Show MountPoint where
    show (MPath path) = path
    show Swap = "Swap"
    show Unmounted = "Unmounted"
    show UnknownBlkidVal = "UnknownBlkidVal"

blockdeviceDefault :: BlockDevice
blockdeviceDefault = BlockDevice
    { shortname = ""
    , uuid = ""
    , fsys = ""
    , mountPoint = MPath {path = ""}
    }

_ALPHANUM :: String
_ALPHANUM = ['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9']

data Color
    = Red
    | Green
    | Yellow
    | Blue
    | CNone
    deriving (Show, Eq)

colorize :: Color -> String -> String
colorize c s = case c of
    Blue -> "\x1b[1;34m" ++ s ++ "\x1b[0m"
    Green -> "\x1b[1;32m" ++ s ++ "\x1b[0m"
    Red -> "\x1b[1;31m" ++ s ++ "\x1b[0m"
    Yellow -> "\x1b[1;33m" ++ s ++ "\x1b[0m"
    _ -> s

main :: IO ()
main = do
    hSetBuffering stdout NoBuffering
    hSetBuffering stderr NoBuffering
    opts <- getOpts
    user <- getEnv "USER"
    errNo <- argsCheck opts user
    when (errNo > 0) $ exitWith $ ExitFailure errNo
    (devs, takenPaths) <- getDevices opts
    let mountablePaths = filter (\p -> not $ elem p takenPaths) $ map (\p -> "/mnt/u" ++ show p) [(0::Int)..]
        devsKV = zip (map show [(1::Int)..]) . zip devs $ mountablePaths
    prog opts user devsKV

argsCheck :: Opts -> String -> IO Int
argsCheck Opts{..} user
    | null user = e "could not get environment variable $USER" 1
    | otherwise = return 0
    where
        e :: String -> Int -> IO Int
        e str num = errMsg str >> return num

prog :: Opts -> String -> [(String, (BlockDevice, FilePath))] -> IO ()
prog opts@Opts{..} user devsKV
    | discover = do
        putStrLn "all devices:"
        mapM_ (\(_, (d, _)) -> putStrLn $ cshow d) devsKV
    | otherwise = do
        putStrLn (if (unmount || unmount_all)
            then "USB device(s) to unmount:"
            else "USB device(s) to mount:")
        mapM_ (\(n, (d, _)) -> putStrLn $ "    " ++ n ++ ") " ++ show' d) devsKV
        putStrLn ""
        mountMenu opts user devsKV
    where
        cshow :: BlockDevice -> String
        cshow b@BlockDevice{..}
            | no_color = show b
            | otherwise = case mountPoint of
                Unmounted -> colorize Green $ show b
                MPath _ -> if not $ null $ getUSBMountPath b
                    then colorize Blue $ show b
                    else show b
                _ -> show b
        show' :: BlockDevice -> String
        show' = if not (unmount || unmount_all)
            then show
            else unwords . init . words . show

mountMenu :: Opts -> String -> [(String, (BlockDevice, FilePath))] -> IO ()
mountMenu Opts{..} user devsKV
    | unmount = if length devsKV == 1
        then do
            putStrLn "only 1 USB device to unmount"
            tryMount False user (snd . head $ devsKV) >>= exitWith
        else chooseDev prompt user devsKV (tryMount False)
    | unmount_all = do
        putStrLn "unmounting all USB devices..."
        mapM_ (tryMount False user) (map snd devsKV)
        return ()
    | all_devices = do
        putStrLn "mounting all USB devices..."
        mapM_ (tryMount True user) (map snd devsKV)
        return ()
    | length devsKV == 1 = do
        putStrLn "only 1 USB device to mount"
        tryMount True user (snd . head $ devsKV) >>= exitWith
    | otherwise = chooseDev prompt  user devsKV (tryMount True)
    where
        prompt :: String
        prompt = if (unmount || unmount_all)
            then "choose USB device to unmount (q to exit)"
            else "choose USB device to mount (q to exit)"

chooseDev :: String -> String -> [(String, (BlockDevice, FilePath))] -> (String -> (BlockDevice, FilePath) -> IO ExitCode) -> IO ()
chooseDev prompt user devsKV func = do
    putStrLn prompt
    key <- getLine
    case lookup key devsKV of
        Just dev -> func user dev >>= exitWith
        _ -> case key of
            "q" -> return ()
            _ -> chooseDev prompt user devsKV func

tryMount :: Bool -> String -> (BlockDevice, FilePath) -> IO ExitCode
tryMount mount user (BlockDevice{..}, mp) = do
    when (null $ mountArgs fsys user) $ do
        errMsg $ "unsupported file system " ++ squote fsys ++ "\nsupported file systems: " ++ (unwords $ map fst (fileSystemArgs user))
        exitWith (ExitFailure 1)
    putStr $ (if mount then "" else "un")
        ++ "mounting "
        ++ shortname
        ++ " (" ++ fsys ++ ") "
        ++ (if mount then "to " ++ mp else "from " ++ show mountPoint)
        ++ ".."
    (_, _, _, p) <- createProcess $ cmd (mountArgs fsys user) shortname
    exitStatus <- waitForProcess p
    if (exitStatus == ExitSuccess)
        then do
            putStrLn "OK"
            return ExitSuccess
        else do
            putStr "FAILED\n"
            errMsg $ (if mount
                then "mount error (perhaps " ++ squote mp ++ " does not exist)"
                else "unmount error")
            return (ExitFailure 1)
    where
        cmd arguments devPath = CreateProcess
            { cmdspec = ShellCommand (if mount
                then "sudo mount -t " ++ arguments ++ " " ++ devPath ++ " " ++ mp ++ " &>/dev/null"
                else "sudo umount " ++ show mountPoint)
            , cwd = Nothing
            , env = Nothing
            , std_in = CreatePipe
            , std_out = CreatePipe
            , std_err = Inherit
            , close_fds = False
            }

fileSystemArgs :: String -> [(String, String)]
fileSystemArgs user =
    [ ("ext2", "ext2 -o rw,relatime")
    , ("vfat", "vfat -o rw,uid=" ++ user ++ ",gid=" ++ user)
    ]

mountArgs :: String -> String -> String
mountArgs fsys user = case lookup fsys (fileSystemArgs user) of
    Just a -> a
    _ -> []

getDevices :: Opts -> IO ([BlockDevice], [String])
getDevices Opts{..} = do
    (_, sout, _, p) <- createProcess cmdBlkid
    devs <- case sout of
        Just h -> hGetContents h
        Nothing -> return []
    _ <- waitForProcess p
    let devs' = (map (unwords . words)) . drop 2 . lines $ devs
    devs'' <- mapM parseBlkid devs'
    let toMount = filter (\BlockDevice{..} -> mountPoint == Unmounted) devs''
        toUnmount = filter (\dev -> not $ null $ getUSBMountPath dev) devs''
        takenPaths = filter (not . null) . map getUSBMountPath $ devs''
    when (not discover && null toMount && (not (unmount || unmount_all))) $ do
        errMsg $ "cannot find USB devices to mount"
        exitWith (ExitFailure 1)
    when (not discover && null toUnmount && (unmount || unmount_all)) $ do
        errMsg $ "cannot find USB devices to unmount"
        exitWith (ExitFailure 1)
    return $ formatDevs devs'' toMount toUnmount takenPaths
    where
        formatDevs :: [BlockDevice] -> [BlockDevice] -> [BlockDevice] -> [String] -> ([BlockDevice], [String])
        formatDevs ds m um takenPaths
            | discover = (ds, takenPaths)
            | unmount || unmount_all = (um, takenPaths)
            | otherwise = (m, takenPaths)
        cmdBlkid = CreateProcess
            { cmdspec = ShellCommand ("sudo blkid -o list")
            , cwd = Nothing
            , env = Nothing
            , std_in = CreatePipe
            , std_out = CreatePipe
            , std_err = Inherit
            , close_fds = False
            }

getUSBMountPath :: BlockDevice -> String
getUSBMountPath BlockDevice{..} = case mountPoint of
    MPath str -> if take 6 str == "/mnt/u" && (all (\c -> elem c ['0'..'9']) (drop 6 str))
        then str
        else ""
    _ -> ""

errMsg :: String -> IO ()
errMsg msg = hPutStrLn stderr $ "error: " ++ msg

squote :: String -> String
squote s = "`" ++ s ++ "'"

-- Parsing
parserIdentifier :: Parser String
parserIdentifier = many1 $ oneOf $ _ALPHANUM ++ "/-_"

parserWhitespace :: Parser String
parserWhitespace = many1 $ oneOf " \t\n\r"

parserMP :: Parser MountPoint
parserMP =
    try ( do
        a <- oneOf "<(" -- "(not mounted)" or "<swap>"
        b <- manyTill anyChar (lookAhead $ (oneOf ">)"))
        _ <- oneOf ">)"
        let mp = case a of
                '<' -> Swap
                '(' -> case b of
                    "not mounted" -> Unmounted
                    _ -> UnknownBlkidVal
                _ -> UnknownBlkidVal
        return mp
        )
    <|> (parserIdentifier >>= (\s -> return MPath {path = s})) -- e.g., "/mnt/blah"
    <?> "blkid's mount point description"

blkidParser :: Parser BlockDevice
blkidParser =
    try ( do
        sname <- parserIdentifier
        _ <- parserWhitespace
        fs <- parserIdentifier
        _ <- parserWhitespace
        _ <- parserIdentifier -- leave out the "label" column, even if it exists
        _ <- parserWhitespace
        mp <- parserMP
        _ <- parserWhitespace
        uid <- parserIdentifier
        eof
        return BlockDevice
           { shortname = sname
           , uuid = uid
           , fsys = fs
           , mountPoint = mp
           }
        )
    <|>
    do  sname <- parserIdentifier
        _ <- parserWhitespace
        fs <- parserIdentifier
        _ <- parserWhitespace
        mp <- parserMP
        _ <- parserWhitespace
        uid <- parserIdentifier
        eof
        return BlockDevice
            { shortname = sname
            , uuid = uid
            , fsys = fs
            , mountPoint = mp
            }
    <?> "5 or 4 fields to parse"

parseBlkid :: String -> IO BlockDevice
parseBlkid src =
    case parse blkidParser "output of `sudo blkid -o list'" src of
        Left parseError -> errMsg (show parseError) >> return blockdeviceDefault
        Right result -> return result

This example compiles with ghc –make -O2 -Wall -Werror. The only 2 dependencies are Parsec, and CmdArgs.

Looking back at it, I think it’s a great example of a real-world Haskell program out in the wild. It uses Parsec to correctly parse the output from blkid, CmdArgs for sane argument handling, and also does some system calls with the createProcess command. Coding-style-wise, it uses the simple where expression wherever possible, for maximum readability and tries to keep leading whitespace to a minimum. The code is written very verbosely, with explicit type signatures for every function, because, well, it helps me keep my sanity.

I think the code is pretty straightforward. It only supports 2 file systems: the ext2 type (recommended for Linux-only USB drives), and vfat (the FAT file system used for legacy Windows support), but this could be easily extended to support ext3, ext4, or any other file system, since it just wraps around the standard mount command.

The only tricky part is blkidParser, which tries to parse 5 fields or 4 fields. The reason behind this is because blkid -o’s output looks like this:

$ sudo blkid -o list
device       fs_type label    mount point      UUID
-----------------------------------------------------------------------------------
/dev/sda1    ntfs             /mnt/windows-xp  XXXXXXXXXXXXXXXX
/dev/sda2    ext4             /                XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
/dev/sda3    ext4             /home            XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
/dev/sda6    swap             <swap>           XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
/dev/sda5    ext4             /mnt/data        XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
/dev/sdb1    ext2             (not mounted)    XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
/dev/sdc1    vfat             (not mounted)    XXXX-XXXX

Sometimes, the label field is empty, as in the example above. So, we will end up with just 4 fields instead of 5.

I hope this sample, working program will help newbies out (esp. you who are frustrated by Real World Haskell’s clunky Parsec examples…). And for those looking to actually use it themselves for easy USB mounting/unmounting, here are some additional notes:

  • Create mount points (i.e., directories) /mnt/u0, /mnt/u1, /mnt/u2, etc. so that the mount command won’t choke.
  • Use shell aliases or keyboard hotkeys to avoid typing out “usbmnt -U” every single time.
  • Tweak the options passed to the mount command to suit your needs (see the fileSystemArgs function).

(1) A simple exercise would be to extend usbmnt so that it reads a very simple configuration file to detect the right user-defined options to pass to mount depending on the partition’s UUID. The only additional thing you would have to do is read a file from disk and store it as a String.

(2) An eye-opening exercises for Parsec newcomers would be to pass to the parsing functions a ByteString instead of String. (The transition to a ByteString is extremely straight-forward, and requires minimal changes.)

I will do the exercises myself later on and post them in a couple weeks at the latest.

UPDATE: December 1, 2011: Here are some convenience links for those coming in from Google for the solutions to the 2 exercises: Solution for #1 Solution for #2

Happy hacking!

EDIT: December 1, 2011: Fix typo.

wxHaskell: Scaling a Bitmap Image

I ran into this page while trying to figure out how to scale a bitmap image from within wxHaskell. The solution posted by the OP there scales the drawing context (e.g., the canvas) and not the bitmap itself.

To scale the bitmap itself, just do it like this:

import Graphics.UI.WX
import Graphics.UI.WXCore

scaleBitmap ... = do
    imgScaled <- imageConvertToBitmap =<< imageScale img (sz newWidth newHeight)
    -- do stuff with imgScaled; e.g., draw it somewhere with drawBitmap
    bitmapDelete imgScaled
    where
        img = image "foo.bmp"
        newWidth = 100
        newHeight = 20

The key is to load up the bitmap with the generic image function, and not the usual bitmap function. This way, you can make use of the imageScale function to scale your image, and then you can convert it back again to bitmap format with imageConvertToBitmap. Finally, when we are done with drawing this image somewhere, we free the memory used up to create the bitmap version of it, by calling bitmapDelete. A bit clunky, but simple enough. I tested it with a Windows BMP with an alpha channel and it worked quite nicely.

By the way, wxHaskell is very easy to learn — you just need to skim Daan Leijen’s paper whenever you get stuck. (Leijen also wrote the beautifully powerful Parsec library — another library I enjoyed using once I got the hang of it.)

Detecting Unmounted Partitions With Blkid

Did you know that you can instantly check all partitions on your system (including USB thumb drives), and see if they’re mounted or not? The hero command of the day is sudo blkid -o list:

$ sudo blkid -o list
device       fs_type label    mount point      UUID
-----------------------------------------------------------------------------------
/dev/sda1    ntfs             /mnt/windows-xp  XXXXXXXXXXXXXXXX
/dev/sda2    ext4             /                XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
/dev/sda3    ext4             /home            XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
/dev/sda6    swap             <swap>           XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
/dev/sda5    ext4             /mnt/data        XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
/dev/sdb1    ext2             (not mounted)    XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
/dev/sdc1    vfat             (not mounted)    XXXX-XXXX

As you can see, I have two USB drives (one in ext2 and the other in vfat format) plugged in, but not mounted, and blkid detects this for me. Pretty useful, don’t you think? There’s no need to parse /proc/mounts or fiddle with the (very) verbose output of sudo fdisk -l. I stumbled upon blkid’s obscure “-o list” option while trying to write a shell script to automatically mount unmounted USB drives.

The only troublesome aspects of “-o list” are that you can’t customize which columns are displayed (e.g., for me, I’d like to drop the label column), and also you can’t separate the columns by whitespace because of how there is a space inside the (not mounted) mount point. Looking at the sources for blkid, it seems that there are also two other descriptions with whitespace in them: (in use) and (mounted, mtpt unknown) (see http://git.kernel.org/?p=utils/util-linux-ng/util-linux-ng.git;a=blob_plain;f=misc-utils/blkid.c;hb=HEAD). These deficiencies make it hard to easily and reliably parse the output with just a shell script.

I actually ended up writing a 300 line Haskell program that uses Parsec to reliably parse the output of blkid. It took a while to write, considering my newbie Haskell skills (aren’t most Haskellers late bloomers?). Anyway, it also leverages the CmdArgs library, and automatically mounts/unmounts USB devices with ease and grace. Speaking of Haskell, I’m slowly in the process of converting my various error-prone shell scripts into robust, mini Haskell programs, and I’ve been very satisfied with the results. And porting shell scripts into Haskell is a great way to learn more Haskell, too!

UPDATE December 1, 2011: Here is a convenience link to said ~300 line program for you Googlers: Parsec and CmdArgs in Action: A Small Example.

UPDATE December 11, 2011: Fixed broken link. Also, the recent kernel.org fiasco has changed the home of util-linux-ng to: https://github.com/karelzak/util-linux/, and the blkid source is at https://github.com/karelzak/util-linux/blob/master/misc-utils/blkid.c.

Xmonad.hs: The Joy of Refactoring

The more I use Haskell, the more I learn to appreciate its beauty. The thesis of this post is: Haskell code is so easy to refactor. What follows is an account of my experience the other day with extending my XMonad configuration file, and how easy/fun it was for me. It’s written in a very newbie-friendly way for all XMonad users who don’t know much Haskell, if at all.

The other day, I ended up writing something like the following into my xmonad.hs file:

myManageHook :: ManageHook
myManageHook = composeAll
    [
    ...
    , resource  =? "atWorkspace0" --> doShift "0"
    , resource  =? "atWorkspace1" --> doShift "1"
    , resource  =? "atWorkspace2" --> doShift "2"
    , resource  =? "atWorkspace3" --> doShift "3"
    , resource  =? "atWorkspace4" --> doShift "4"
    , resource  =? "atWorkspace5" --> doShift "5"
    , resource  =? "atWorkspace6" --> doShift "6"
    , resource  =? "atWorkspace7" --> doShift "7"
    , resource  =? "atWorkspace8" --> doShift "8"
    , resource  =? "atWorkspace9" --> doShift "9"
    , resource  =? "atWorkspaceF1" --> doShift "F1"
    , resource  =? "atWorkspaceF2" --> doShift "F2"
    , resource  =? "atWorkspaceF3" --> doShift "F3"
    , resource  =? "atWorkspaceF4" --> doShift "F4"
    , resource  =? "atWorkspaceF5" --> doShift "F5"
    , resource  =? "atWorkspaceF6" --> doShift "F6"
    , resource  =? "atWorkspaceF7" --> doShift "F7"
    , resource  =? "atWorkspaceF8" --> doShift "F8"
    , resource  =? "atWorkspaceF9" --> doShift "F9"
    , resource  =? "atWorkspaceF10" --> doShift "F10"
    , resource  =? "atWorkspaceF11" --> doShift "F11"
    , resource  =? "atWorkspaceF12" --> doShift "F12"
    ]

Even if you don’t know Haskell, you can tell that the above is very repetitive. It looks a bit stupid. Vim’s visual block mode makes editing the above lines in bulk pretty easy, but your xmonad.hs file is a program’s source code, and source code must obey the Do not Repeat Yourself rule. It will make life easier down the road.

Let’s first consider what the code above means. First, we observe that myManageHook is a function, of type ManageHook. The composeAll function’s type signature is as follows:

composeAll :: [ManageHook] -> ManageHook

(I loaded up XMonad from GHCi to figure this out.) So composeAll merely “compresses” a list of ManageHook types into a single ManageHook. Great! Now we know exactly what each element in the list really means:

myManageHook = composeAll
    [ a ManageHook
    , a ManageHook
    , a ManageHook
    , a ManageHook
    ]

So to covert the various “atWorkspace0”, “atWorkspace1”, etc. lines we just need to create a function that generates a list of ManageHook types, and append this list to the one that already exists! Like this:

myManageHook :: ManageHook
myManageHook = composeAll $
    [ ... existing MangeHook items
    ]
    ++ workspaceShifts
    where
        workspaceShifts = another list of ManageHooks

Notice how we had to add in a “$” symbol, because:

myManageHook = composeAll [list 1] ++ [list 2]

means

myManageHook = (composeAll [list 1]) ++ [list 2]

which means

myManageHook = ManageHook ++ [list 2]

which is definitely not what we want. We want this:

myManageHook = composeAll ([list 1] ++ [list 2])

which is

myManageHook = composeAll [lists 1 and 2 combined]

and we can do this with the “$” dollar symbol. We could just use the explicit parentheses instead; ultimately it is a matter of style/taste.

So, let’s keep going. The workspaceShifts function needs to generate a list of ManageHooks. From the first code listing, it’s clear that all the generated ManageHooks need only vary slightly from each other:

    [
    ...
    , resource  =? "atWorkspace0" --> doShift "0"
    , resource  =? "atWorkspace1" --> doShift "1"
    , resource  =? "atWorkspace2" --> doShift "2"
    , resource  =? "atWorkspace3" --> doShift "3"
    , resource  =? "atWorkspace4" --> doShift "4"
    , resource  =? "atWorkspace5" --> doShift "5"
    , resource  =? "atWorkspace6" --> doShift "6"
    , resource  =? "atWorkspace7" --> doShift "7"
    , resource  =? "atWorkspace8" --> doShift "8"
    , resource  =? "atWorkspace9" --> doShift "9"
    , resource  =? "atWorkspaceF1" --> doShift "F1"
    , resource  =? "atWorkspaceF2" --> doShift "F2"
    , resource  =? "atWorkspaceF3" --> doShift "F3"
    , resource  =? "atWorkspaceF4" --> doShift "F4"
    , resource  =? "atWorkspaceF5" --> doShift "F5"
    , resource  =? "atWorkspaceF6" --> doShift "F6"
    , resource  =? "atWorkspaceF7" --> doShift "F7"
    , resource  =? "atWorkspaceF8" --> doShift "F8"
    , resource  =? "atWorkspaceF9" --> doShift "F9"
    , resource  =? "atWorkspaceF10" --> doShift "F10"
    , resource  =? "atWorkspaceF11" --> doShift "F11"
    , resource  =? "atWorkspaceF12" --> doShift "F12"
    ]

So the only thing that really changes is the suffix after “atWorkspace”; it goes from “0” to “F12”. Let’s express this idea in code:

myManageHook :: ManageHook
myManageHook = composeAll $
    [ ...
    ]
    ++ workspaceShifts
    where
        workspaceShifts = genList (s1 ++ s2)
        genList :: [String] -> [ManageHook]
        genList ss = map (\s -> resource =? ("atWorkspace" ++ s) --> doShift s) ss

We create a new genList function, which needs an argument (a [String], or “list of strings” to be exact). We creatively call them ss in the code above. The map function simply modifies each item in a list in the same way. So here, we modify every string (s) to become a MangeHook, by giving map‘s first argument as:

(\s -> resource =? ("atWorkspace" ++ s) --> doShift s)

The backslash followed by s binds a single string from the ss list as the variable s. The right arrow (->) signals the start of the function definition. Do you see the resemblance?

    ...
    , resource  =? "atWorkspace0" --> doShift "0"
    , resource  =? "atWorkspace1" --> doShift "1"
    , resource  =? "atWorkspace2" --> doShift "2"
    , resource  =? "atWorkspace3" --> doShift "3"
    ...

    vs.

    resource =? ("atWorkspace" ++ s) --> doShift s

Nice, clean, and succinct.

Now we just need to define those strings to feed into genList. Here’s s1:

        s1 :: [String] -- a list of strings
        s1 = map show [0..9]

Again, we use the map function. It’s such a handy little function! Haskell is very much built around such useful little named primitives (by convention, most things that look like “operators” such as multi-punctuation arrows, ampersands, and colons are part of some niche library, and not Haskell the core language). The show function merely converts its argument (here, a list of Ints), into strings, so the

s1 = map show [0..9]

part really means: [“0″,”1″,”2″,”3″,”4″,”5″,”6″,”7″,”8″,”9”].

So that’s s1. The second list, s2, is almost identical:

s2 = map (("F" ++) . show) [1..12]

The only difference is that it adds “F” as a prefix, so that we get “F1” instead of “1”, “F2” instead of “2”, and so on.

So, that’s it! The complete code is as follows:

myManageHook :: ManageHook
myManageHook = composeAll $
    [ ...
    ]
    ++ workspaceShifts
    where
        workspaceShifts = genList (s1 ++ s2)
        genList :: [String] -> [ManageHook]
        genList ss = map (\s -> resource =? ("atWorkspace" ++ s) --> doShift s) ss
        s1, s2 :: [String]
        s1 = map show [0..9]
        s2 = map (("F" ++) . show) [1..12]

We can actually shorten it even more:

myManageHook :: ManageHook
myManageHook = composeAll $
    [ ...
    ]
    ++ map (\s -> resource =? ("atWorkspace" ++ s) --> doShift s) (s1 ++ s2)
    where
        s1 = map show [0..9]
        s2 = map (("F" ++) . show) [1..12]

In the end, we’ve reduced 22 lines of stupid, repetitive code prone to human error and typos down to 4 lines of smart, modular code. I really enjoy this kind of refactoring: reduction of human-error-prone code.

What’s more, the whole experience was quite easy and enjoyable. I did not need to know what exactly a ManageHook type represented; instead, all I needed to know were the type signatures of the various existing parts. And that’s why Haskell is so fun to refactor: type signatures, combined with industrial-strength type checking, makes things safe. Plus, if you look at the code, everything matters. It’s really hard to write spaghetti code in Haskell (even for relative newbies like myself).

For the curious, here is a list of type signatures for composeAll, resource, doShift, (–<), and (=?):

Prelude XMonad> :t composeAll
composeAll :: [ManageHook] -> ManageHook
Prelude XMonad> :t resource
resource :: Query String
Prelude XMonad> :t doShift
doShift :: WorkspaceId -> ManageHook
Prelude XMonad> :t (-->)
(-->) :: Query Bool -> ManageHook -> ManageHook
Prelude XMonad> :t (=?)
(=?) :: Eq a => Query a -> a -> Query Bool

UPDATE July 3, 2011: Hendrik mentions a more concise approach using list comprehensions (arguably more Haskell-y, and simper):

myManageHook :: ManageHook
myManageHook = composeAll $
    [ ...
    ]
    ++ [resource =? ("atWorkspace" ++ s) --> doShift s 
       | s <- map show [0..9] ++ map (('F':) . show) [1..12] ]

UPDATE November 8, 2011: Fixed typo.

Haskell: Using CmdArgs (Single and Multi-Mode)

I use the CmdArgs module all the time for all of my personal Haskell projects. CmdArgs is really the one-stop solution for all of your command-line option handling needs. Its “killer” feature is the ability to support multi-mode options (e.g., “myprog mode1 [mode1 options]”, “myprog mode2 [mode2 options]”, etc.). Unfortunately, not many people know about CmdArgs, it seems. So, I’m writing this post to help spread its popularity among fellow newbie/intermediate Haskellers. I’m including two sample programs — a traditional single-mode program, and a multi-mode program, for your tinkering pleasure. They are hereby released into the PUBLIC DOMAIN.

The following is a simple single-mode program, singleMode.hs.

-- singleMode.hs
-- License: PUBLIC DOMAIN
{-# LANGUAGE DeriveDataTypeable, RecordWildCards #-}

import System.Console.CmdArgs
import System.Environment (getArgs, withArgs)
import System.Exit
import Control.Monad (when)

data MyOptions = MyOptions
    { color :: Bool
    , first_name :: String
    , age :: Int
    , directory :: FilePath
    } deriving (Data, Typeable, Show, Eq)

-- Customize your options, including help messages, shortened names, etc.
myProgOpts :: MyOptions
myProgOpts = MyOptions
    { color = def &= help "use color"
    , first_name = def &= help "your first name"
    , age = def &= explicit &= name "g" &= name "age" &= help "your age"
    , directory = def &= typDir &= help "your first name"
    }

getOpts :: IO MyOptions
getOpts = cmdArgs $ myProgOpts
    &= verbosityArgs [explicit, name "Verbose", name "V"] []
    &= versionArg [explicit, name "version", name "v", summary _PROGRAM_INFO]
    &= summary (_PROGRAM_INFO ++ ", " ++ _COPYRIGHT)
    &= help _PROGRAM_ABOUT
    &= helpArg [explicit, name "help", name "h"]
    &= program _PROGRAM_NAME

_PROGRAM_NAME = "myProg"
_PROGRAM_VERSION = "0.1.2.3"
_PROGRAM_INFO = _PROGRAM_NAME ++ " version " ++ _PROGRAM_VERSION
_PROGRAM_ABOUT = "a sample CmdArgs program for you tinkering pleasure"
_COPYRIGHT = "(C) Your Name Here 2011"

main :: IO ()
main = do
    args <- getArgs
    -- If the user did not specify any arguments, pretend as "--help" was given
    opts <- (if null args then withArgs ["--help"] else id) getOpts
    optionHandler opts

-- Before directly calling your main program, you should warn your user about incorrect arguments, if any.
optionHandler :: MyOptions -> IO ()
optionHandler opts@MyOptions{..}  = do
    -- Take the opportunity here to weed out ugly, malformed, or invalid arguments.
    when (null first_name) $ putStrLn "--first-name is blank!" >> exitWith (ExitFailure 1)
    when (age < 5) $ putStrLn "you must be at least 5 years old to run this program" >> exitWith (ExitFailure 1)
    -- When you're done, pass the (corrected, or not) options to your actual program.
    exec opts

exec :: MyOptions -> IO ()
exec opts@MyOptions{..} = do
    putStrLn $ "Hello, " ++ firstname ++ "!"
    putStrLn $ "You are " ++ showAge ++ " years old."
    where
        firstname = if color
            then "\x1b[1;31m" ++ first_name ++ "\x1b[0m"
            else first_name
        showAge = if color
            then "\x1b[1;32m" ++ show age ++ "\x1b[0m"
            else show age

Screenshot:

Don’t you just love how CmdArgs handles all the formatting and line breaking for you automagically? Anyway, onto the discussion.

You first specify your program’s options by declaring a data type (in this case, MyOptions). Pretty self-explanatory. The myProgOpts function is where you specify your program’s options. You can pass in “annotations” (as CmdArgs calls them) to your options. Here, I used the explicit annotation to prevent CmdArgs from auto-generating the shorter name for the age option. The typ &= “NAME” makes it so that we get “NAME” in the help message for the –first-name option. typDir is just a shortcut for typ &= “DIR” because it’s so common.

The def function is just a sane default type for many data types, such as Bool (False), String (“”), Int (0), and others. Generally, you should use def unless you really want a different default value for an option.

Notice how the option full_name turned into –full-name? The use of underscores in your options is converted into dashes by CmdArgs. Neat!

The getOpts function has some customizations: we use -v for –version instead of -V (the default), and also use -h for the short name for –help (-? is the default). We also specify the program’s metadata (copyright info, etc.).

In the main function, if no arguments are given, we display the default help message (and exit). It took me a while to figure this one out, so take heed!

If you don’t want the –verbose or –quiet options, then just remove the line “&= verbosityArgs …”.

The RecordWildCards pragma makes the code extremely concise and easy to use. I must admit, ever since stumbling upon this pragma, I’ve used it everywhere in my own code in other projects as well.

OK, so that’s what a simple single-mode program looks like. How about multi-mode programs? Here is one, multiMode.hs:

-- multiMode.hs
-- License: PUBLIC DOMAIN
{-# LANGUAGE DeriveDataTypeable, RecordWildCards #-}

import System.Console.CmdArgs
import System.Environment (getArgs, withArgs)
import System.Exit
import Control.Monad (when)

data MyOptions =
    Mode1   { first_name :: String
            , last_name :: String
            }
    |
    Mode2   { height :: Double
            , weight :: Double
            } deriving (Data, Typeable, Show, Eq)

mode1 :: MyOptions
mode1 = Mode1
    { first_name = "FIRSTNAME" &= help "your first name"
    , last_name = "LASTNAME" &= help "your last name"
    }
    &= details  [ "Examples:"
                , "Blah blah blah."
                ]

mode2 :: MyOptions
mode2 = Mode2
    { height = def &= help "your height, in centimeters"
    , weight = def &= help "your weight, in kilograms"
    }
    &= details  [ "Examples:"
                , "Blah blah blah again."
                ]

myModes :: Mode (CmdArgs MyOptions)
myModes = cmdArgsMode $ modes [mode1, mode2]
    &= verbosityArgs [explicit, name "Verbose", name "V"] []
    &= versionArg [explicit, name "version", name "v", summary _PROGRAM_INFO]
    &= summary (_PROGRAM_INFO ++ ", " ++ _COPYRIGHT)
    &= help _PROGRAM_ABOUT
    &= helpArg [explicit, name "help", name "h"]
    &= program _PROGRAM_NAME

_PROGRAM_NAME = "myProg"
_PROGRAM_VERSION = "0.1.2.3"
_PROGRAM_INFO = _PROGRAM_NAME ++ " version " ++ _PROGRAM_VERSION
_PROGRAM_ABOUT = "a sample CmdArgs program for you tinkering pleasure"
_COPYRIGHT = "(C) Your Name Here 2011"

main :: IO ()
main = do
    args <- getArgs
    -- If the user did not specify any arguments, pretend as "--help" was given
    opts <- (if null args then withArgs ["--help"] else id) $ cmdArgsRun myModes
    optionHandler opts

optionHandler :: MyOptions -> IO ()
optionHandler opts@Mode1{..}  = do
    when (null first_name) $ putStrLn "warning: --first-name is blank"
    when (null last_name) $ putStrLn "warning: --last-name is blank"
    exec opts
optionHandler opts@Mode2{..}  = do
    when (height == 0.0) $ putStrLn "warning: --height is 0.0"
    when (weight == 0.0) $ putStrLn "warning: --weight is 0.0"
    exec opts

exec :: MyOptions -> IO ()
exec opts@Mode1{..} = putStrLn $ "Hello, " ++ first_name ++ " " ++ last_name ++ "!"
exec opts@Mode2{..} = putStrLn $ "You are " ++ show height ++ "cm tall, and weigh " ++ show weight ++ "kg!"

Screenshot:

The program uses 2 modes, mode1 and mode2. If you use modes without overlapping letters, then you can invoke them with the least amount of unambiguous letters. E.g., if the modes were named foo and bar, you could just do “multiMode f …” and foo mode would be in effect. CmdArgs does this for us.

If your modes have lots and lots of options, then calling “multiMode -h” would give a brief summary, instead of giving each mode’s detailed help message. Here, the modes have only a couple options each, so that’s why the default help message is very detailed.

You might have noticed that I included a details annotation for both modes. This annotation is useful if you want to give examples on which options to use in what way.

Happy coding!

UPDATE April 26, 2011: Victor from the comments asked a question about including a custom type as an option parameter. I never needed this option, but, it’s do-able (with the new CmdArgs 0.6.9). Anyway, here is my minimal example:

instance Default MyComplexType where
    def = MyComplexType { brand = "BRAND"
                        , eggs = 0
                        , flowers = 0
                        , nest = Nestor { books = 0, tag = "TAG", author = "AUTHOR" }
                        }

data MyComplexType = MyComplexType
    { brand :: String
    , eggs :: Int
    , flowers :: Integer
    , nest :: Nestor
    } deriving (Data, Typeable, Show, Eq)

data Nestor = Nestor
    { books :: Int
    , tag :: String
    , author :: String
    } deriving (Data, Typeable, Show, Eq)

data MyOptions = MyOptions
    { color :: Bool
    , first_name :: String
    , age :: Int
    , directory :: FilePath
    , custom :: MyComplexType
    } deriving (Data, Typeable, Show, Eq)

Can you believe it? CmdArgs can even handle nested, custom complex types! Here’s how CmdArgs renders the above, in the help message:

CmdArgs will automatically expect a string of comma-separate values (CSVs). If you don’t like the auto-generated “ITEM,INT,INT,INT,ITEM,…” description, you could easily replace it with a custom string, with a &= typ “X,Y,Z,…” annotation.

If you don’t like the CSV format that CmdArgs imposes on your custom complex type, you’re out of luck. Your best option is to just use a single String type as the option argument, and then parse that with Parsec on your own (probably in the optionHandler function, in case of any errors) and then pass on the parsed data into your main program.

UPDATE July 21, 2011: I tried out CmdArgs with a custom algebraic datatype (something like “data Color = Red | Green | Blue | ColorNone”) and it still works! Your argument for this sort of data type will be, e.g., “–color-rgb red” and the “red” argument will be interpreted as the “Red” type. Actually, CmdArgs is very lenient and will accept any non-ambiguous argument, so here, “–color-rgb r” will also be interpreted as “Red”. Here is an example of the relevant portions:

instance Default Color where
    def = ColorNone

data Color = Red
           | Green
           | Blue
           | ColorNone
    deriving (Data, Typeable, Show, Eq)

myProgOpts = MyOptions
    { ...
    , color_rgb = def &= help "select an RGB color (Red, Green, or Blue)"
    ...
    }

UPDATE August 5, 2011: You can even specify an option as a list:

data Format     = Text
                | TeX
                | HTML
                | FNone
    deriving (Data, Typeable, Show, Eq)

data MyOptions = MyOptions
    { ...
    , format :: [Format]
    , ...
    } deriving (Data, Typeable, Show, Eq)

myProgOpts = MyOptions
    { ...
    , format = [] &= help "format of output"
    , ...
    }

Here, you can do something like “-f TeX -f HTML -f Text” to set the format option to the value [TeX, HTML, Text]. This possibility to use a list as an option parameter is quite useful (such as the example here, where you want to specify multiple output targets).

My Newbie Experience With Haskell’s IO Monad

So I’ve been on the Haskell train for about two and a half months now, and wrote 1804 lines of Haskell code according to cloc. There are oodles of articles out there extolling the virtues of Haskell, so I want to limit this post to just one thing: the I/O Monad (with a discussion on bind operators, do-notation, and the return function) and what I’ve learned about them through many lines of trial and error. As you bump into do-notation more and more, you’ll have to eventually learn what it really means, and how the return and bind ((>>) and (>>=)) functions play a big role in any monadic operation. This post is meant for Haskell newbies, like myself two and a half months ago.

DISCLAIMER: Before I start, I must warn you: I don’t have a “true” understanding of Monads, even after a couple thousand lines! I’ve greatly simplified everything down so that even an ultra-beginner-newbie would understand what I’ve written here. Some of my explanations are probably not “correct,” but hey, there’s no wrong in trying, right? Besides, it’s probably better to understand things imperfectly, like a child, and then slowly refine things over time. If you insist on at least a preview of the true understanding behind Monads, go to this page first (but if you really are a Haskell newbie I seriously doubt you’d benefit from it).

In a simple “Hello World” program, you have something like this:

main :: IO ()
main = putStrLn "Hello world!"

The type signature above looked very scary to me for a long time. Not any more. The type “IO ()”, in plain English, means: “This is a function that does some I/O stuff at some point, and in the end, does not return any value to the calling function.” The () (aka “unit”) is Haskell’s rough equivalent of a NULL value in C.

An equivalent (as far as types are concerned) way to write the above function is:

main :: IO ()
main = do   {
            ; putStrLn "Hello world!"
            ; return ()
            }

or (since do-notation in the above example is a little clunky, even without the braces and semicolons)

main :: IO ()
main = putStrLn "Hello world!" >> return ()

But, the putStrLn function already results in a “IO ()” by itself:

putStrLn :: String -> IO ()

so doing

main :: IO ()
main = do   {
            ; putStrLn "Hello world!"
            ; return ()
            }

is redundant. It’s like doing this:

main :: IO ()
main = do   {
            ; putStrLn "Hello world!"
            ; return ()
            ; return ()
            ; return ()
            ; return ()
            ; return ()
            }

Which is still valid, but obviously redundant. If the above code’s validity surprised you, then you are in good shape. (See the discussion about the return function below.) Anyway, sorry for the detour. Let’s focus on this example:

main :: IO ()
main = putStrLn "Hello world!" >> return ()

The (>>) operator just means “I don’t care what is on the left side; just compute what’s on the right hand side.” In this case, we explicitly force the computation of “return ()”, after computing the “putStrLn” portion. But the use of a “return ()” explicitly is often unnecessary and wordy in actual practice, at least with the IO monad, because functions like putStrLn (as I stated above), and putStr already give back a “IO ()”.

Here is a funny example: you could do something (meaningless) like this:

main :: IO (Int)
main = putStrLn "Hello world!" >> return 123

And the code will compile and work just as well as the preceding example. The only conceptual flaw is that since main is the only function, no other function actually uses the main function’s (return 123) computation, so it becomes absolutely meaningless.

Let’s look at something a little more complicated.

import IO

main :: IO ()
main = do   {
            ; hSetBuffering stdout NoBuffering
            ; hSetBuffering stdin NoBuffering
            ; putStr "Press a number key: "
            ; c <- getChar
            ; putStrLn ""
            ; natureOfC <- isThisADigit c
            ; if natureOfC == True
                    then putStrLn "You pressed a number key."
                    else putStrLn "You pressed something other than a number key."
            }

isThisADigit :: Char -> IO Bool
isThisADigit c = if elem c ['0'..'9']
                    then return (True)
                    else do {
                            ; putStrLn "Error: non-number character detected."
                            ; return (False)
                            }

I want you to understand the “isThisADigit” function. Here, the “IO Bool” type tells us that this function, at some point, does some IO, before resulting in a “IO Bool” value. The caller must “extract” the pure Bool value out of this “tainted” IO Bool type, with the left arrow <-. In the example above, the variable “natureOfC” captures this extracted Bool value.

The key about the IO Monad (like all other monads) is that they sort of act as a warning sign for all of the rest of your program’s pure code. Any function with a monad in its type signature is a big red flag that says, “Hey! I’m a dangerous function that results in lots of side effects/changed state! Be careful when calling me!” This is great because now we can easily tell, based on a function’s type signature alone, whether it’s pure or not. Hiding impure code in Haskell is thus impossible to do, as long as you explicitly write down your functions’ type signatures.

Here’s another example (just to illustrate the separation of pure vs. impure code in a real example):

import IO

main :: IO ()
main = do   {
            ; hSetBuffering stdout NoBuffering
            ; hSetBuffering stdin NoBuffering
            ; putStr "Press a number key: "
            ; c <- getChar
            ; putStrLn ""
            ; natureOfC <- isThisADigit c
            ; if natureOfC == True
                    then putStrLn "You pressed a number key."
                    else putStrLn "You pressed something other than a number key."
            ; putStrLn $ "You get " ++ (show $ handleBool natureOfC) ++ " points for your answer."
            }

isThisADigit :: Char -> IO Bool
isThisADigit c = if elem c ['0'..'9']
                    then return (True)
                    else do {
                            ; putStrLn "Error: non-number character detected."
                            ; return (False)
                            }

handleBool :: Bool -> Int
handleBool b = if b == True
                    then 100
                    else 0

The new handleBool function is the only pure function above, and its job is to convert a bool value into a single Int value, of either 100 or 0. The fact that the line “natureOfC <- isThisADigit c” extracts the pure value out of the impure “IO Bool” allows us to use the handleBool function with it.

It may help to write “IO (Bool)” instead of “IO Bool” if you want to get the visual image of a pure value being “wrapped” inside the IO monad.

Here’s another example (a rather contrived, un-idiomatic Haskell example), using lots of if/then/else statements just to illustrate the notion that the function return used in the context of a monad does NOT mean the same thing as “return” in C.

import IO
import qualified System.Exit as SE

main :: IO ()
main = do   {
            ; hSetBuffering stdout NoBuffering
            ; hSetBuffering stdin NoBuffering
            ; putStr "Press a lowercase character key that is between 'h' and 'o' (inclusive): "
            ; c <- getChar
            ; putStrLn ""
            ; isLower <- isThisLowercase c
            ; if isLower == True
                    then return ()
                    else SE.exitWith $ SE.ExitFailure 1
            ; isLessThanH <- isThisLessThanH c
            ; if (not isLessThanH)
                    then return ()
                    else SE.exitWith $ SE.ExitFailure 2
            ; isGreaterThanO <- isThisGreaterThanO c
            ; if (not isGreaterThanO)
                    then return ()
                    else SE.exitWith $ SE.ExitFailure 3
            ; putStrLn $ "You pressed `" ++ [c] ++ "'! Congrats!"
            }

isThisLowercase :: Char -> IO Bool
isThisLowercase c = if elem c ['a'..'z']
                        then return True
                        else do {
                                ; putStrLn $ "Error: char `" ++ [c] ++ "' not lowercase"
                                ; return False
                                }

isThisLessThanH :: Char -> IO Bool
isThisLessThanH c = if elem c ['a'..'g']
                        then do {
                                ; putStrLn "Error: char less than `h'"
                                ; return True
                                }
                        else return False

isThisGreaterThanO :: Char -> IO Bool
isThisGreaterThanO c = if elem c ['p'..'z']
                            then do {
                                    ; putStrLn "Error: char greater than `o'"
                                    ; return True
                                    }
                            else return False

I want you to focus on the series of if/then/else statements in the main function. Here, we can see that we test 3 properties for the user-given character, c: if it is lowercase, if it is less than ‘h’, and if it is greater than ‘o’. If any three of these conditions are met, we immediately exit the program with an error, using the System.Exit module. Otherwise, we just “return ()” for this stage of the overall do-notation computation. Here’s some pseudocode (with the exception of the (>>) operator, which retains its meaning) to explain:

TEST WAS PASSED, so:
return () >> carry on with the next computation that follows…

Maybe that made it a little clearer. The big idea with monadic code (and do-notation) is this: every step of the computation needs to be monadic! So in the main function above, if we pass a test, we have to sort of “glue” the remaining function together by “injecting” a “return ()” if we pass the test. Or, more accurately, we “inject” the unit value “()” into the IO monad with the “return” function. This allows us to carry on with the rest of the do-notation.

Why do we have to have monadic values at each step of a monadic operation (do-notation)? Well, it’s because do-notation is syntactic sugar for a series of computations requiring the use of the bind (>> or >>=) operators. And the bind operators’ type signatures are mandated as:

(>>=) :: m a -> (a -> m b) -> m b
(>>)  :: m a -> m b -> m b

So in the case of the IO monad (hooray Haskell typeclasses!), the bind operators mean:

(>>=) :: IO something -> (something -> IO something_else) -> IO something_else
(>>)  :: IO something -> IO something_else -> IO something_else

Notice how both bind operators requires that its left-side value (they are used as infix operators) be a monadic value, “m a” or in our example, IO something. So that’s why we are required to put in the “return ()” in the if/then/else statements in our last big example: the do-notation that we use in there requires that every step of the do-notation results in a monadic value, be it () or whatever. The return function allows us to meet this requirement: it injects a pure value into a monad:

return :: a -> m a

If the meaning of return still eludes you, you can think of it as a train conductor: a “return (140)” means “I, return, as train conductor of the Monad typeclass train, now allow you, pure value 140, to enter this train. The only way for you to get off the Monad train is with the (<-) operator, understood?”

Here is another example to illustrate this last point:

main :: IO ()
main = do   {
            ; pureNum <- boardTrain (140)
            ; putStrLn $ show pureNum
            }

boardTrain = return

So now you know what the return function is all about: injecting pure values into the Monad typeclass, so that you can use them as part of (larger) monadic operations, be it a big do-notation function or a simple monadic function that figures out a pure value, like the isThisLowercase function.

Here are some more explanations of the bind operator (using the IO monad) in plain English:

(>>=) :: IO something -> (something -> IO something_else) -> IO something_else

Plain English: Take a (IO) monadic value, and a function that takes a pure version of this value and converts it into some new monadic value, and give back this new monadic value. The (>>=) operator is thus useful if you want to “pass along” the results of one monadic computation to another monadic computation.

(>>)  :: IO something -> IO something_else -> IO something_else

Plain English: Take a (IO) monadic value, and another monadic value, and give back this latter monadic value. I.e., compute the first value, but we don’t care what the result of this first monadic operation is; just continue on with the second monadic value (IO something_else) after we’re done with the first one (IO something). The (>>) operator is thus just like (>>=), except that we do not pass along any values from one monadic operation to the next.

The reason why the (>>) and (>>=) operators are called “bind” operators might have dawned upon you now: they allow you to “chain” together multiple monadic computations together! This is why some Haskellers like to think of Monads as just another form of function composition, but for impure functions. Indeed, do-notation is just syntactic sugar for these two bind operators that allow you to easily “glue” together small, impure functions into a larger *sequence*.

There is also a “flipped” or “reverse” bind operator, (=<<), but with the arguments switched around. It’s useful if, for syntactic reasons you want to pass along a monadic value in the opposite “direction.” Usually, you use it when you want to visualize one monadic operation as an “argument” for another, to aid reading from left to right. For example:

teamWork :: IO ()
teamWork = putStrLn =<< getLine

Here, the contents of getLine act as an argument to the function putStrLn. Here’s another example:

C version:
        x = get_x_value();
        printf("%f\n", (round(abs(sqrt(x)))));

Haskell version:
teamWork :: IO ()
teamWork = putStrLn =<< round =<< abs =<< sqrt =<< get_x_value

See how the reverse bind operator allows us to mimic what C looks like? The reverse bind operator lets us write things in a more natural way, mimicking the syntax of pure function composition, such as “e . d . c . b $ a”, where the order of computation reads right-to-left. Personally, I rarely use the (=<<) operator because I use do-notation whenever I need to do some serious monadic computations — but, it's there when you need it.

Here's an example using a bunch of bind operators to give you their feel.

import IO

main :: IO ()
main     = putStr "Tell me your name: "
        >> hFlush stdout
        >> getLine
        >>= greet1
        >> putStr "Tell me your age: "
        >> hFlush stdout
        >> (greet2 =<< getLine)
        >> putStrLn "Bye!"
        where   greet1 str = putStrLn ("Hello, " ++ decorate str ++ "!")
                greet2 str = putStrLn ("You are " ++ decorate str ++ " years old!")

decorate :: String -> String
decorate xs = "-=xX" ++ xs ++ "Xx=-"

Notice how avoiding do-notation obviates the need to use the left arrow (<-) operator for getLine.

Let us revisit do-notation one more time:

import IO

main :: IO ()
main = do   {
            ; putStr "Tell me your name: "
            ; hFlush stdout
            ; name <- getLine
            ; putStrLn ("Hello, " ++ decorate name ++ "!")
            }

decorate :: String -> String
decorate xs = "-=xX" ++ xs ++ "Xx=-"

The point I want to make in this example is, believe it or not, the main function’s type signature. After everything I told you about how bind-operator-this and do-notation-that all work together to chain together sequences of monadic operations, I want you to reflect back on main‘s type signature: after all the fancy work that it does in the example above, main‘s type signature is just “IO ()”! The type signature of main is the type signature of the last monadic operation: in this case, the type signature for putStrLn.

Here’s a contrived example just for kicks, to encourage you to avoid do-notation for simple functions:

everyManForHimself :: SomeMonad (func4's return type)
everyManForHimself = func1 >> func2 >> func3 >> func4

The function above shows how the weak bind operator (>>) merely sequences independent monadic operations together, who all refuse to interact with each other. The component functions func1, func2, etc. may or may not have meaningful return types; i.e., func1 may be a “return ()” type, or a “return (Bool)” or something else — but the point is that the (>>) operator forcefully discards whatever is on the left and just moves on to the next operation.

Hopefully, this post shed some light into some non-intuitive examples on the IO monad. I hope things like IO Char, IO Int, IO Bool, IO String, and other types feel more natural now to you.

Reference: The “Gentle Introduction to Haskell, Version 98” has a good discussion of do-notation here.

UPDATE April 7, 2011: Apparently, the use of if/then (some monadic operation)/else (no monadic operation) pattern is so common that there is a shorter way. Simply import the when function from the Control.Monad module. So instead of

if (blah blah)
    then putStrLn "OK"
    else return ()

you can just do

when (blah blah) $ putStrLn "OK"

Moral of the story: if you ever feel like you are repeating yourself too much, look into the standard libraries (like Control.Monad).

UPDATE August 4, 2011: Edited discussion of the reverse bind operator.