# Random Haskell Ephiphany: Using the liftM function

I’ve been tinkering with some monadic code and it just dawned on me that the often-used liftM function is just a shortcut (less typing) for a common use scenario which arises whenever you need to use a pure function along with an impure one.

For example, consider the following pseudocode:

```randoms :: IO [Int]
```

Calling randoms generates a list of random Int values. Suppose we want just the first value from this list, so we naively write the following:

```main :: IO ()
main = do
rs <- randoms
let r = head rs
...
```

But then you realize that you will only use just a single random Int, and that the variable rs is useless. So, you remember what return in Haskell means and then write this instead:

```main :: IO ()
main = do
r <- return . head =<< randoms
...
```

But this looks a bit awkward. This is where liftM comes in:

```main :: IO ()
main = do
r <- liftM head \$ randoms
...
```

The code is now much simpler and cleaner. What’s not to like?

UPDATE July 15, 2012: I just realized that there is even a shorter solution, using the (<\$>) function from the very useful Control.Applicative module:

```main :: IO ()
main = do
r <- head <\$> randoms
...
```

# The KISS PRNG (2011 version) in C and Haskell

Did you know that Dr. George Marsaglia (1924-2011), creator of the famed Diehard battery of randomness tests, devised a super-simple PRNG algorithm, just a month prior to his passing? Called the KISS PRNG (or Super-KISS, as there have been multiple KISS versions released previously by Marsaglia), this algorithm boasts a period in excess of 10^40million (10^40,000,000) — an astoundingly large number, many orders of magnitude larger than the famed Mersenne Twister‘s period (only 2^19,937 − 1, or only about 4.3 x 10^6,001 according to gcalctool). Plus, it’s so simple codewise, and also very fast.

Here’s the C implementation (adapted from Marsaglia’s own code):

```/* AUTHOR: Shinobu (zuttobenkyou.wordpress.com) */
/* LICENSE: PUBLIC DOMAIN */
#include <stdint.h>

typedef uint64_t u64;

#define QSIZE 0x200000
#define CNG (cng = 6906969069ULL * cng + 13579)
#define XS (xs ^= (xs << 13), xs ^= (xs >> 17), xs ^= (xs << 43))
#define KISS (B64MWC() + CNG + XS)

static u64 QARY[QSIZE];
static int j;
static u64 carry;
static u64 xs;
static u64 cng;

void randk_reset(void)
{
j = QSIZE - 1;
carry = 0;
xs = 362436069362436069ULL;
cng = 123456789987654321ULL; /* use this as the seed */
}

u64 B64MWC(void)
{
u64 t, x;
j = (j + 1) & (QSIZE - 1);
x = QARY[j];
t = (x << 28) + carry;
carry = (x >> 36) - (t < x);
return (QARY[j] = t - x);
}

/* Initialize PRNG with default seed */
void randk_seed(void)
{
u64 i;
/* Seed QARY[] with CNG+XS: */
for (i = 0; i < QSIZE; i++)
QARY[i] = CNG + XS;
}

void randk_seed_manual(u64 seed)
{
cng ^= seed;
xs ^= cng;
randk_seed();
}

void randk_warmup(int rounds)
{
int i;
/* Run through several rounds to warm up the state */
for (i = 0; i < rounds; i++)
randk();
}

/* Generate a pseudorandom 64-bit unsigned integer. */
u64 randk(void)
{
return KISS;
}
```

Simple, eh? This algorithm is actually 3 PRNG’s in one: the 64-bit Multiply-With-Carry PRNG (B64MWC()), the XOR-Shift PRNG (XS), and the simple Linear Congruential PRNG (CNG). The exorbitant period comes from the fact that this algorithm relies on three different states of the three PRNGs to generate a random number.

Now, where does Haskell come into the picture? Well, I ported the code to Haskell because I wanted a simple PRNG that was of higher quality than the default System.Random RNG. Plus, if you look into the actual source of System.Random, here is an unnerving bit of code:

```stdSplit            :: StdGen -> (StdGen, StdGen)
stdSplit std@(StdGen s1 s2)
= (left, right)
where
-- no statistical foundation for this!
left    = StdGen new_s1 t2
right   = StdGen t1 new_s2

new_s1 | s1 == 2147483562 = 1
| otherwise        = s1 + 1

new_s2 | s2 == 1          = 2147483398
| otherwise        = s2 - 1

StdGen t1 t2 = snd (next std)
```

See, the RandomGen type class requires the definition of next, split, and genRange functions (see this page). The split function’s purpose is to take one PRNG state, and give two distinct PRNG states, so that you can get multiple unique PRNG’s to work with (this comes up in functional programming in real practice — I speak from experience). The thing is, the statistical robustness of the split function for the StdGen PRNG that comes with Haskell, as can be seen in the source listing, is a bit… annoying/worrying.

Well, when I saw this, I thought: “Hey, why not use KISS? It has 3 PRNGs built into one, so when implementing split, it could just change the state of one of the PRNGs, and you’d get a *completely* different PRNG!” And so that’s exactly what I did:

```-- AUTHOR: Shinobu (zuttobenkyou.wordpress.com)
-- LICENSE: PUBLIC DOMAIN
{-# LANGUAGE RecordWildCards #-}

module KISS where

import Data.Array.Unboxed
import Data.Bits
import Data.List
import Data.Word
import System.Random (RandomGen(..))

type U64 = Word64

-- | This is the last KISS-type RNG (2011) that Dr. George Marsaglia (1924-2011)
-- released to the internet before his death. The only difference with this
-- version is that the kissMWCArraySize is 0xfff (4096), instead of 0x200000
-- (2,097,152), for purposes of speed. The period of the original was
-- approximated by Marsaglia as 10^40million, which is practically infinity for
-- most, if not all, needs for everyday programs. The reduced state size for the
-- MWC algorithm from 0x200000 to 0xfff should shorten the period, but it should
-- still be excellent for general usage; because KISS combines not only MWC but
-- also CNG (Congruential) and XSHF (XOR-shift) generators, the period should
-- still be very large.
--
-- TODO: Determine period of this KISS rng.
data KISSRNG = KISSRNG
{ kissMWCArray :: UArray Int U64
, kissMWCArraySize :: U64
, kissMWCIndex :: U64
, kissMWCCarry :: U64
, kissCNG :: U64
, kissXSHF :: U64
}

kissStateSize :: U64
kissStateSize = 0xfff

roundCNG :: U64 -> U64
roundCNG cng = 6906969069 * cng + 13579

roundXSHF :: U64 -> U64
roundXSHF = round3 . round2 . round1
where
round1 b = xor b (unsafeShiftL b 13)
round2 b = xor b (unsafeShiftR b 17)
round3 b = xor b (unsafeShiftL b 43)

roundB64MWC :: KISSRNG -> KISSRNG
roundB64MWC kiss@KISSRNG{..} = kiss
{ kissMWCArray = array'
, kissMWCIndex = index'
, kissMWCCarry = carry'
}
where
index' = (kissMWCIndex + 1) .&. (kissMWCArraySize - 1)
x = kissMWCArray ! (fromIntegral index')
t = unsafeShiftL x 28 + kissMWCCarry
carry' = unsafeShiftR x 36 - (if t < x then 1 else 0)
array' = kissMWCArray // [(fromIntegral index', t - x)]

makeKISSRNG :: U64 -> KISSRNG
makeKISSRNG seed = KISSRNG
{ kissMWCArray = array (0, (fromIntegral kissStateSize - 1)) \$ zip [0..] kissArray
, kissMWCArraySize = kissStateSize
, kissMWCIndex = kissStateSize - 1
, kissMWCCarry = 0
, kissCNG = cngWarmed
, kissXSHF = xshfWarmed
}
where
-- seed the MWC array with the Congruential and XOR-Shift RNG's
(kissArray, cngWarmed, xshfWarmed) = foldl' step ([], seedCNG, seedXSHF) [0..(kissStateSize - 1)]
step (ary, cng, xshf) _ = ((cng' + xshf'):ary, cng', xshf')
where
cng' = roundCNG cng
xshf' = roundXSHF xshf
-- default Congruential RNG seed
seedCNG = 123456789987654321
-- default XOR-Shift RNG seed
seedXSHF = 362436069362436069

randKISS :: KISSRNG -> (U64, KISSRNG)
randKISS k = (kissMWC + cng + xshf, k' {kissCNG = cng, kissXSHF = xshf})
where
k' = roundB64MWC k
kissMWC = (kissMWCArray k') ! (fromIntegral \$ kissMWCIndex k')
cng = roundCNG \$ kissCNG k
xshf = roundXSHF \$ kissXSHF k

instance Show KISSRNG where
show KISSRNG{..} = "kissMWC: [kissMWC..]"
++ "\nkissMWCIdx: " ++ show kissMWCIndex ++ "\n"
++ "kissMWCArray!!Idx: " ++ show (kissMWCArray ! (fromIntegral kissMWCIndex)) ++ "\n"
++ "kissMWCArray!!Head: " ++ show (kissMWCArray ! 0) ++ "\n"
++ "kissMWCArray!!Last: " ++ show (kissMWCArray ! (fromIntegral \$ kissStateSize - 1)) ++ "\n"
++ "kissMWCCarry: " ++ show kissMWCCarry ++ "\n"
++ "kissCNG: " ++ show kissCNG ++ "\n"
++ "kissXSHF: " ++ show kissXSHF ++ "\n"

instance RandomGen KISSRNG where
next rng = (fromIntegral n, rng')
where
(n, rng') = randKISS rng
split rng = (rng1, rng2)
where
rng1 = warmupXSHF rng
rng2 = warmupMWC rng
genRange _ = (0, 0xffffffffffffffff)

warmupRNG :: RandomGen g => g -> Int -> g
warmupRNG g rounds = foldl' warmup g [1..rounds]
where
warmup g' _ = snd \$ next g'

warmupMWC :: KISSRNG -> KISSRNG
warmupMWC rng = roundB64MWC rng

warmupXSHF :: KISSRNG -> KISSRNG
warmupXSHF rng = rng { kissXSHF = roundXSHF \$ kissXSHF rng}
```

In the implementation of split, you can clearly see that we simply warm up one of PRNGs (move on to the next state in the period) to get a new PRNG. Again, since KISS depends on all three PRNGs, simply changing the state of one of the PRNGs will give you a completely different PRNG.

Oh, the only weakness of the Haskell version is that its QSIZE is only 0xfff, not 0x200000 as in the original, for performance reasons. I certainly hope someone could improve the performance of the code and release it on Hackage (my code is hereby released into the PUBLIC DOMAIN, so do what you like with it), restoring the state size to 0x200000 as in Marsaglia’s original. Also, I’m not sure how large the period is, but judging by how the XOR-Shift PRNG has a large period on its own, it should still be very, very large with a 0xfff state size for the MWC algorithm.

I would sincerely appreciate it if someone more familiar with combinatorics/compsci could tell me what the size of the period is with a 0xfff state size for the MWC.

I was also pleasantly surprised by the very good quality of KISS. I used my code to write some random bits into a file, and used the ent program to judge the entroy of it. Here are the results:

```Entropy = 7.999829 bits per byte.

Optimum compression would reduce the size
of this 1048576 byte file by 0 percent.

Chi square distribution for 1048576 samples is 248.29, and randomly
would exceed this value 60.65 percent of the times.

Arithmetic mean value of data bytes is 127.5231 (127.5 = random).
Monte Carlo value for Pi is 3.141895835 (error 0.01 percent).
Serial correlation coefficient is 0.001437 (totally uncorrelated = 0.0).
```

The results show that the KISS RNG has excellent quality random numbers. These figures make it as good (randomness-wise) as, e.g., the one based on AES encryption (AES in counter mode), which has also been analyzed with ent, as stated on the github page:

Using ent, a randomness property maker on one 1Mb sample.

cprng-AES:

Entropy = 7.999837 bits per byte.
Optimum compression would reduce the size of this 1048576 byte file by 0 percent.
Chi square distribution for 1048576 samples is 237.02.
Arithmetic mean value of data bytes is 127.3422 (127.5 = random).
Monte Carlo value for Pi is 3.143589568 (error 0.06 percent).

The rather ugly code I used to generate this file (and believe me, it took forever to generate a 1MiB file because the code is horribly unoptimized…) is below:

```-- AUTHOR: Shinobu (zuttobenkyou.wordpress.com)
-- LICENSE: PUBLIC DOMAIN
{-# LANGUAGE RecordWildCards #-}

module Main where

import Data.Array.Unboxed
import Data.Bits
import Data.List
import Data.Word
import System.Random (RandomGen(..))
import qualified Data.ByteString as BS

type U64 = Word64

data KISSRNG = KISSRNG
{ kissMWCArray :: UArray Int U64
, kissMWCArraySize :: U64
, kissMWCIndex :: U64
, kissMWCCarry :: U64
, kissCNG :: U64
, kissXSHF :: U64
}

kissStateSize :: U64
kissStateSize = 0xfff

roundCNG :: U64 -> U64
roundCNG cng = 6906969069 * cng + 13579

roundXSHF :: U64 -> U64
roundXSHF = round3 . round2 . round1
where
round1 b = xor b (unsafeShiftL b 13)
round2 b = xor b (unsafeShiftR b 17)
round3 b = xor b (unsafeShiftL b 43)

roundB64MWC :: KISSRNG -> KISSRNG
roundB64MWC kiss@KISSRNG{..} = kiss
{ kissMWCArray = array'
, kissMWCIndex = index'
, kissMWCCarry = carry'
}
where
index' = (kissMWCIndex + 1) .&. (kissMWCArraySize - 1)
x = kissMWCArray ! (fromIntegral index')
t = unsafeShiftL x 28 + kissMWCCarry
carry' = unsafeShiftR x 36 - (if t < x then 1 else 0)
array' = kissMWCArray // [(fromIntegral index', t - x)]

makeKISSRNG :: U64 -> KISSRNG
makeKISSRNG seed = KISSRNG
{ kissMWCArray = array (0, (fromIntegral kissStateSize - 1)) \$ zip [0..] kissArray
, kissMWCArraySize = kissStateSize
, kissMWCIndex = kissStateSize - 1
, kissMWCCarry = 0
, kissCNG = xor cngWarmed seed
, kissXSHF = xor xshfWarmed seed
}
where
-- seed the MWC array with the Congruential and XOR-Shift RNG's
(kissArray, cngWarmed, xshfWarmed) = foldl' step ([], seedCNG, seedXSHF) [0..(kissStateSize - 1)]
step (ary, cng, xshf) _ = ((cng' + xshf'):ary, cng', xshf')
where
cng' = roundCNG cng
xshf' = roundXSHF xshf
-- default Congruential RNG seed
seedCNG = 123456789987654321
-- default XOR-Shift RNG seed
seedXSHF = 362436069362436069

randKISS :: KISSRNG -> (U64, KISSRNG)
randKISS k = (kissMWC + cng + xshf, k' {kissCNG = cng, kissXSHF = xshf})
where
k' = roundB64MWC k
kissMWC = (kissMWCArray k') ! (fromIntegral \$ kissMWCIndex k')
cng = roundCNG \$ kissCNG k
xshf = roundXSHF \$ kissXSHF k

instance Show KISSRNG where
show KISSRNG{..} = "kissMWC: [kissMWC..]"
++ "\nkissMWCIdx: " ++ show kissMWCIndex ++ "\n"
++ "kissMWCArray!!Idx: " ++ show (kissMWCArray ! (fromIntegral kissMWCIndex)) ++ "\n"
++ "kissMWCArray!!Head: " ++ show (kissMWCArray ! 0) ++ "\n"
++ "kissMWCArray!!Last: " ++ show (kissMWCArray ! (fromIntegral \$ kissStateSize - 1)) ++ "\n"
++ "kissMWCCarry: " ++ show kissMWCCarry ++ "\n"
++ "kissCNG: " ++ show kissCNG ++ "\n"
++ "kissXSHF: " ++ show kissXSHF ++ "\n"

instance RandomGen KISSRNG where
next rng = (fromIntegral n, rng')
where
(n, rng') = randKISS rng
split rng = (rng1, rng2)
where
rng1 = warmupXSHF rng
rng2 = warmupMWC rng
genRange _ = (0, 0xffffffffffffffff)

warmupRNG :: RandomGen g => g -> Int -> g
warmupRNG g rounds = foldl' warmup g [1..rounds]
where
warmup g' _ = snd \$ next g'

warmupMWC :: KISSRNG -> KISSRNG
warmupMWC rng = roundB64MWC rng

warmupXSHF :: KISSRNG -> KISSRNG
warmupXSHF rng = rng { kissXSHF = roundXSHF \$ kissXSHF rng}

main :: IO ()
main = do
let
rng = makeKISSRNG 0
(bytes1MiB, _) = genBytesKISS 0x100000 rng
BS.writeFile "data" bytes1MiB

genBytesKISS :: U64 -> KISSRNG -> (BS.ByteString, KISSRNG)
genBytesKISS len kissrng = foldl' step (BS.empty, kissrng) [1..(div len 8)] -- divide by 8, b/c e.g., to generate 8 bytes, we only need 1 U64
where
step (bs, rng) _ = (foldl' BS.snoc bs \$ octets u64, rng')
where
(u64, rng') = randKISS rng

smallerChunks :: [U64] -> [Word8]
smallerChunks = concatMap octets

-- | Get a number and split it up into 8 8-bit parts (64 bits total).
octets :: (Bits a, Integral a) => a -> [Word8]
octets w = map (\n -> fromIntegral \$ shiftR w n) . reverse \$ take 8 [0,8..]
```

Here are the C and Haskell standalone versions that prove that the Haskell port behaves in the same way as the C version, given the right starting seed values and state size (both 0xfff for the MWC PRNG):

C standalone version (compile with gcc -o ckiss kiss.c):

```/* AUTHOR: Shinobu (zuttobenkyou.wordpress.com) */
/* LICENSE: PUBLIC DOMAIN */
#include <stdio.h>
#include <inttypes.h>
#include <stdint.h>

typedef uint64_t u64;

#define QSIZE 0xfff
#define CNG (cng = 6906969069ULL * cng + 13579)
#define XS (xs ^= (xs << 13), xs ^= (xs >> 17), xs ^= (xs << 43))
#define KISS (B64MWC() + CNG + XS)

static u64 QARY[QSIZE];
static int j;
static u64 carry;
static u64 xs;
static u64 cng;

u64 B64MWC(void)
{
u64 t, x;
j = (j + 1) & (QSIZE - 1);
x = QARY[j];
t = (x << 28) + carry;
carry = (x >> 36) - (t < x);
return (QARY[j] = t - x);
}

/* Initialize PRNG with default seed */
void randk_seed(void)
{
u64 i;
j = QSIZE - 1;
carry = 0;
xs = 362436069362436069ULL;
cng = 123456789987654321ULL;
/* Seed QARY[] with CNG+XS: */
for (i = 0; i < QSIZE; i++)
QARY[i] = CNG + XS;
}

/* Generate a pseudorandom 64-bit unsigned integer. */
u64 randk(void)
{
return KISS;
}

int main(void)
{
randk_seed();
printf("randk_seed called!\n");
printf("KISS idx: %"PRIu64"\n", j);
printf("qary[idx] is: %"PRIu64"\n", QARY[j]);
printf("qary[0] is: %"PRIu64"\n", QARY[0]);
printf("qary[QSIZE - 1] is: %"PRIu64"\n", QARY[QSIZE - 1]);
printf("carry: %"PRIu64"\n", carry);
printf("cng: %"PRIu64"\n", cng);
printf("xs: %"PRIu64"\n", xs);
u64 x = KISS;
printf("\nKISS called! KISS num is: %"PRIu64"\n", x);
printf("\nKISS idx: %"PRIu64"\n", j);
printf("qary[idx] is: %"PRIu64"\n", QARY[j]);
printf("qary[0] is: %"PRIu64"\n", QARY[0]);
printf("qary[QSIZE - 1] is: %"PRIu64"\n", QARY[QSIZE - 1]);
printf("carry: %"PRIu64"\n", carry);
printf("cng: %"PRIu64"\n", cng);
printf("xs: %"PRIu64"\n", xs);

printf("x + 18334599312639636657 is: %"PRIu64"\n", x + 18334599312639636657ULL);
}
```

Haskell standalone version (run with runhaskell kiss.hs):

```-- AUTHOR: Shinobu (zuttobenkyou.wordpress.com)
-- LICENSE: PUBLIC DOMAIN
{-# LANGUAGE RecordWildCards #-}

module KISS where

import Data.Array.Unboxed
import Data.Bits
import Data.List
import Data.Word
import System.Random (RandomGen(..))

type U64 = Word64

data KISSRNG = KISSRNG
{ kissMWCArray :: UArray Int U64
, kissMWCArraySize :: U64
, kissMWCIndex :: U64
, kissMWCCarry :: U64
, kissCNG :: U64
, kissXSHF :: U64
}

kissStateSize :: U64
kissStateSize = 0xfff

roundCNG :: U64 -> U64
roundCNG cng = 6906969069 * cng + 13579

roundXSHF :: U64 -> U64
roundXSHF = round3 . round2 . round1
where
round1 b = xor b (unsafeShiftL b 13)
round2 b = xor b (unsafeShiftR b 17)
round3 b = xor b (unsafeShiftL b 43)

roundB64MWC :: KISSRNG -> KISSRNG
roundB64MWC kiss@KISSRNG{..} = kiss
{ kissMWCArray = array'
, kissMWCIndex = index'
, kissMWCCarry = carry'
}
where
index' = (kissMWCIndex + 1) .&. (kissMWCArraySize - 1)
x = kissMWCArray ! (fromIntegral index')
t = unsafeShiftL x 28 + kissMWCCarry
carry' = unsafeShiftR x 36 - (if t < x then 1 else 0)
array' = kissMWCArray // [(fromIntegral index', t - x)]

makeKISSRNG :: U64 -> KISSRNG
makeKISSRNG seed = KISSRNG
{ kissMWCArray = array (0, (fromIntegral kissStateSize - 1)) \$ zip [0..] kissArray
, kissMWCArraySize = kissStateSize
, kissMWCIndex = kissStateSize - 1
, kissMWCCarry = 0
, kissCNG = cngWarmed
, kissXSHF = xshfWarmed
}
where
-- seed the MWC array with the Congruential and XOR-Shift RNG's
(kissArray, cngWarmed, xshfWarmed) = foldl' step ([], seedCNG, seedXSHF) [0..(kissStateSize - 1)]
step (ary, cng, xshf) _ = ((cng' + xshf'):ary, cng', xshf')
where
cng' = roundCNG cng
xshf' = roundXSHF xshf
-- default Congruential RNG seed
seedCNG = 123456789987654321
-- default XOR-Shift RNG seed
seedXSHF = 362436069362436069

randKISS :: KISSRNG -> (U64, KISSRNG)
randKISS k = (kissMWC + cng + xshf, k' {kissCNG = cng, kissXSHF = xshf})
where
k' = roundB64MWC k
kissMWC = (kissMWCArray k') ! (fromIntegral \$ kissMWCIndex k')
cng = roundCNG \$ kissCNG k
xshf = roundXSHF \$ kissXSHF k

instance Show KISSRNG where
show KISSRNG{..} = "kissMWC: [kissMWC..]"
++ "\nkissMWCIdx: " ++ show kissMWCIndex ++ "\n"
++ "kissMWCArray!!Idx: " ++ show (kissMWCArray ! (fromIntegral kissMWCIndex)) ++ "\n"
++ "kissMWCArray!!Head: " ++ show (kissMWCArray ! 0) ++ "\n"
++ "kissMWCArray!!Last: " ++ show (kissMWCArray ! (fromIntegral \$ kissStateSize - 1)) ++ "\n"
++ "kissMWCCarry: " ++ show kissMWCCarry ++ "\n"
++ "kissCNG: " ++ show kissCNG ++ "\n"
++ "kissXSHF: " ++ show kissXSHF ++ "\n"

instance RandomGen KISSRNG where
next rng = (fromIntegral n, rng')
where
(n, rng') = randKISS rng
split rng = (rng1, rng2)
where
rng1 = warmupXSHF rng
rng2 = warmupMWC rng
genRange _ = (0, 0xffffffffffffffff)

warmupRNG :: RandomGen g => g -> Int -> g
warmupRNG g rounds = foldl' warmup g [1..rounds]
where
warmup g' _ = snd \$ next g'

warmupMWC :: KISSRNG -> KISSRNG
warmupMWC rng = roundB64MWC rng

warmupXSHF :: KISSRNG -> KISSRNG
warmupXSHF rng = rng { kissXSHF = roundXSHF \$ kissXSHF rng}

main :: IO ()
main = do
let rng = makeKISSRNG 0
putStrLn \$ show rng
```

EDIT May 3, 2012: Sorry about the somewhat redundant-looking code listings, but I’m too lazy/busy to clean them up.
EDIT May 4, 2012: Alternate period number for the Mersenne Twister, for easier comparison of period size with KISS.

# Shinobu’s Haskell Coding Style (SHCS)

I’ve been coding in Haskell for a couple years. Now, Haskell does not really have an established coding style, like C (e.g., the excellent Linux Kernel Coding Style (LKCS)). This is partly due to the extremely flexible nature of the indentation syntax alone — Haskell enjoys the unique honor of having significant whitespace, but at the same time allowing the use of optional curly braces and semicolons for “imperative-styled” do-notation (do-notation is quite common in Haskell). I present to you my own coding style, aptly named “Shinobu’s Haskell Coding Style” — it stresses, above all, consistency and predictable indentation. Since I love the LKCS so much, I decided to use tabs — how you set the actual length of the tab is up to you and your editor-of-choice, but I personally use 4-space-wide tabs.

A picture is worth a thousand words, so here is a “picture” of SHCS in action, with ample pseudocode for your reading pleasure:

```{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE OverloadedStrings #-}
module Foo.Bar where

import Some.Standard.System.Library
import Some.Standard.System.Library2 (func)

import Foo
import Quux

-- | A haddock-compatible comment.
funcWithLongTypeSignature
:: Eq a
=> Int
-> Int
-> Int
-> Int
-> IO Int
funcWithLongTypeSignature a b c d = do
let a' = foo a
e <- blah a' b
let
c' = foo c
d' = foo d
f <- bleh c' d'
-- A comment about (someTest f). Blah blah blah blah blah blah blah blah
-- blah blah blah.
when (someTest f)
. func1
. func2
\$ func3 arg
when (someTest2 f)
. func1
. func2
. func3
. func4
\$ arg
return \$ e + f
where
e a b = do
...
return x
f c d = case convert c of
Z n -> return n
_ -> return y
where
-- Break up 80+ character lines into parts.
convert c = hoolaaloolaa'
. hoolaaloolaa''
. hoolaaloolaa'''
\$ hoolaaloolaa'''' c
-- This version is also acceptable, if it increases legibility:
--
-- convert c = hoolaaloolaa'
--	. hoolaaloolaa''
--	. hoolaaloolaa'''
--	. hoolaaloolaa''''
--  \$ c

someLongList :: [(Int, String)]
someLongList =
[ (1, "a")
, (2, "b")
, (3, "c")
, (4, "d")
, (5, "e")
, (6, "f")
]

listComprehension :: [Int]
listComprehension =
[ (a, b)
| a <- [1..]
, b <- [2..]
, a + b < 100
]

baz :: Int -> Int -> Char
baz a b c
| a == 0 = 'o'
| b == 100 = if True
then if True
then 'p'
else 's'
else 'q'
| c == 22 = 'z'
| otherwise = f 'k'
where
f c = blah \$ bleh c

longStringConcatenation :: String
longStringConcatenation = "The quick "
++ "fox "
++ "jumps "
++ "over "
++ "the "
++ "lazy "
++ "evaluation."

data Geometry = Geometry
{ geoGeo :: Int
, geoGeoGeo :: Double
} deriving (Eq)

instance Show Geometry where
show Geometry{..} = show geoGeo
++ show geoGeoGeo
++ "!"

recordConstruction :: Geometry
recordConstruction = Geometry
{ geoGeo = 1
, geoGeoGeo = 2.0
}

data SomeEnum
= SEA
| SEB
| SEC
| SED
| SEE
deriving (Eq, Enum, Ord, Show)

quaff :: SomeEnum -> Int
quaff s = case s of
SEA -> 1
SEB -> 2
SEC -> 3
SED -> 4
SEE -> 5
```

WordPress renders the indentation tabs as spaces — sorry about that. Anyway, my greatest “triumph” was when I discovered that you could write the “where” clause at the same level of indentation as all the functions below it (e.g., functionWithLongTypeSignature). This reduced a lot of my code’s indentation, because I prefer to use where clauses wherever possible.

Here are the rules:

• Place LANGUAGE extensions at the beginning of the file, with one extension per line.
• Import standard, “system” modules first on its own block. Add all of your own modules on their own block below it. List all modules in alphabetical order.
• 80 character limit. Try very hard to keep lines within 80 characters. This rule is especially pertinent to type signatures. Speaking of which…
• Write explicit type signatures for all top-level functions. You may also want to write type signatures for functions underneath “where” clauses (but I think you have to use the ScopedTypeVariables extension to do this).
• Use the “leading comma” rule when writing long lists or data structure definitions (e.g., the “Geometry” example above).
• Indent long string concatenations (++).
• Use GHC with at least the -Wall and -Werror flags.
• When writing a case expression for a type that is like a C enum (e.g., SomeEnum above), explicitly write all the possible enumeration symbols for that type, instead of writing (n – 1) of them and using the “_” catch-all syntax at the end.
• Use spaces between arithmetic components (e.g., “(n + 1)” instead of “(n+1)”.
• Avoid use of curly braces/semicolons for do-notation blocks.
• Avoid use of let statements, and instead use where clauses. The use of a where clause forces you to write in a top-down manner, with the “overall picture” first, followed by smaller component functions. I’d rather be faced with a one-liner function followed by 100 helper functions under a where clause, rather than first seeing the 100 helper functions in a gigantic let expression, followed by the one-liner.
• Try to avoid nesting where clauses as much as possible.
• When confronted with a long line, break it up into smaller parts (by function, if you’re composing many of them, or by argument, if it’s just one function with many arguments) and indent the “tail” of these parts, each on their own line (see “convert c” in the example).
• Prefer to use the dollar symbol operator (\$) instead of using parentheses. E.g., (foo \$ bar x) instead of (foo (bar x)).
• Instead of (foo (bar (baz (quux x)))), write (foo . bar . baz \$ quux x). I believe this is an “unstated” style rule among Haskellers, as it is. And so I formally state it here.
• Catch-all rule: if the code is short enough, it is optional whether you write it in one line or multiple (indented) lines (e.g., list comprehensions, lists, etc.)

These are just a handful of rules, but they go a long way to make things look consistent. And, because the indentation is predictable, it makes the code easier to read.

# Intro to QuickCheck and Hpc

And you thought my Haskell honeymoon was over.

So the other day, I finally got around to using the legendary QuickCheck testing library for a semi-serious project of mine. And I must say that I am very impressed with QuickCheck — it caught two subtle (but serious) errors in my core functions — in day one! I’m going to explain some basic concepts to help you get started with QuickCheck immediately; afterwards I’ll throw in a brief discussion about the Haskell Program Coverage (“Hpc”) tool, too, since it goes well with QuickCheck.

## QuickCheck

The point of QuickCheck is to generate tons of random, junk values (the nastier-looking, the better) to feed into your functions. Well, they’re not really junk values, from a type system perspective, but the point is to generate lots of edge-case values that we forget to consider when writing the code. QuickCheck supports testing of pure functions as well as monadic ones (ST monad), but I’ll just focus on the pure functions because that’s what I have experience with.

So the next question is, how does QuickCheck generate random values? Well, if your function uses any of the basic types like Int, [Integer], Double, Word64, etc., QuickCheck knows how to generate the random values already. But if you use a custom type, or a unique combination of ordinary types (e.g., (Maybe Int, Maybe Int)), you have to instruct QuickCheck how to generate it. You do this by writing an instance for QuickCheck’s Arbitrary typeclass.

So let’s say my custom type is (Maybe Int, Maybe Int), and that it has a special requirement: if it is (Just a, Just b), then b must be greater than a. No problem.

```newtype MPair = MPair (Maybe Int, Maybe Int)
deriving (Show)

instance Arbitrary MPair where
arbitrary = do
-- First, generate a random Int.
a <- arbitrary :: Gen Int
-- Now generate another random Int, but such that it is greater than *a*.
b <- suchThat (arbitrary :: Gen Int) (>a)
-- Now choose between either (Just a), or Nothing.
a' <- elements [Just a, Nothing]
-- Repeat the same coin flip for *b*.
b' <- elements [Just b, Nothing]
-- Return the random result.
return \$ MPair (a', b')
```

There is a method to the madness! There are other useful combinators besides suchThat and elements, of course.

So now you’ve told QuickCheck how to generate the random values to stress-test your functions. The next step is to define the stress-tests. In the QuickCheck world, such testing functions are used to test very specific properties of your functions that must hold true no matter what you throw at it. By convention, these functions are named “prop_[function name]”.

So here’s an example, prop_isTouching, which, appropriately enough, tests the isTouching function..

```prop_isTouching :: MPair -> MPair -> Bool
prop_isTouching x y =
isTouching x' y' == isTouching y' x'
where
x' = fromMPair x
y' = fromMPair y

isTouching :: (Ord a) => (Maybe a, Maybe a) -> (Maybe a, Maybe a) -> Bool
isTouching p@(pStart, pEnd) q@(qStart, qEnd)
-- if no pair has both of its values defined, then just return False;
-- technically this notion is incorrect, but the greater codebase behind this
-- interprets it this way, so it's OK
| not (isBoth p) || not (isBoth q) = False
-- if one or both pairs are empty, then return False
| isBlank p || isBlank q = False
| isLeft q = inP qStart'
| isRight q = inP qEnd'
| isLeft p = inQ pStart'
| isRight p = inQ pEnd'
| otherwise = inP qStart'
|| inP qEnd'
|| inQ pStart'
|| inQ pEnd'
where
pStart' = fromJust pStart
pEnd' = fromJust pEnd
qStart' = fromJust qStart
qEnd' = fromJust qEnd
inP = isInside (pStart', pEnd')
inQ = isInside (qStart', qEnd')

isLeft
, isRight
, isBoth
, isBlank :: (Maybe a, Maybe a) -> Bool
isLeft (a, b) = isJust a && isNothing b
isRight (a, b) = isNothing a && isJust b
isBoth (a, b) = isJust a && isJust b
isBlank (a, b) = isNothing a && isNothing b

fromMPair :: MPair -> (Maybe Int, Maybe Int)
fromMPair (MPair a) = a

isInside :: (Ord a) => (a, a) -> a -> Bool
isInside (a, b) c = a <= c && c <= b
```

You might be wondering, “Hey, I thought you were going to test a function that takes (Maybe Int, Maybe Int), not (Maybe a, Maybe a)!” Well, believe me, it pays off a lot to write higher-order functions like this that work on multiple types. The fact that we can test it using Maybe Ints (a very simple type) is just one benefit. Notice how I’ve made sure to restrict isTouching‘s arguments to the Ord typeclass, since we expect the right hand value in the pair to be greater than the one on the left (if it exists). The fancy Arbitrary instance up above was not in vain.

Anyway, the isTouching function merely checks to see if one given pair “falls in” or “touches” the other pair of values. It’s a pretty mundane function, but such functions often form the backbone of the rest of your code, so it’s really important to get these 100% right. The various helper functions like isLeft, isRight, fromMPair, etc. may seem annoying, as if they get in the way of the example to test isTouching itself. But think about it: all of these auxiliary functions will, by virtue of their necessity, be tested by calling prop_isTouching! And if prop_isTouching keeps failing (and your custom type or function is too complex for a 10-minute debugging session), you can always add more prop_ functions to test these auxiliary functions in isolation. Haskell embraces small, cute functions, and so should you!

The prop_isTouching function itself is straightforward enough: it tests the “commutative” property of isTouching — that the order of the arguments does not matter.

So far so good. Now we just need to run prop_isTouching hundreds (or thousands) of times to see if it holds. QuickCheck defines some basic test running functions, for use in the IO monad. The simplest one is aptly named quickCheck. So, we can run the function above like so:

```import Test.QuickCheck

import [Your testing module, where prop_isTouching resides]

main :: IO ()
main = quickCheck prop_isTouching
```

This will run prop_isTouching 100 times with the default testing settings. But default settings are usually not desirable (for me, I only caught errors when I “upped” the number of runs to 1000, among other things). Oh, and quickCheck will not tell you the name of the function it is testing. So here is a more useful version:

```{-# LANGUAGE RecordWildCards #-}
import System.Exit
import Test.QuickCheck

import [Your testing module, where prop_isTouching resides]

-- Rigorous test arguments.
rigorous :: Args
rigorous = Args
{ replay = Nothing
, maxSuccess = 1000 -- tests to run
, maxDiscard = 1000 -- the number of tests that are thrown out and ignored b/c of "==>" conditions, before "giving up" and failing due to too many discarded tests
, maxSize = 1000 -- if a prop_ function uses a list ([]) type, maxSize is the max length of the randomly generated list
, chatty = True
}

-- Quick test arguments.
quick :: Args
quick = Args
{ replay = Nothing
, maxSuccess = 100
, maxDiscard = 100
, maxSize = 100
, chatty = True
}

runTests :: [String] -> IO ()
runTests as = case as of
[] -> runTests' quick
a -> case head a of
"1" -> runTests' quick
"2" -> runTests' rigorous
_ -> runTests' quick
where
runTests' :: Args -> IO ()
runTests' testArgs = do
-- if all of your prop_ functions are of the same type, you can put
-- them in a list and use mapM_ instead
f prop_isTouching "prop_isTouching"
f prop_someOtherFunc1 "someOtherFunc1"
f prop_someOtherFunc2 "someOtherFunc2"
f prop_someOtherFunc3 "someOtherFunc3"
f prop_someOtherFunc4 "someOtherFunc4"
where
f prop str = do
putStrLn str
quitOnFail =<< quickCheckWithResult testArgs prop
quitOnFail r = case r of
-- pattern match with just two dots with RecordWildCards because I'm lazy
Success{..} -> return ()
_ -> exitFailure

main :: IO ()
main = getArgs >>= runTests
```

If you compile the above as “test”, then running “./test 2” will use the “rigorous” test settings. The key difference is that instead of quickCheck, we use quickCheckWithResult. With it, we can provide our choice of test settings (the Args type), and also get some feedback on what the test results were. For simplicity’s sake, we only check if the test was a complete success; if it’s anything else, we abort immediately.

Here’s a quick note about the term “shrink” that you might encounter: if QuickCheck spots a failure, it will first try to shrink the size of the random input repeatedly while maintaining the “failure” result (and QuickCheck will tell you about how many shrinks were performed). This is to help you work with reasonably small values (esp. useful if you have list arguments with hundreds of items each, like in prop_foo above).)

The careful reader would have wondered what the “==>” was about in the comments. Well, the “==>” function is called the “implication” function, and is used to throw out invalid values before running the prop_ function. It’s another way to customize the random value, sort of like how we defined a custom Arbitrary instance up above for the MPair type synonym. For example,

```prop_foo :: [Int] -> [Int] -> Prop
prop_foo xs ys =
not (null xs) ==> -- xs must not be null
(length ys > 3) ==> -- ys must have at least 4 elements
foo xs ys == foo ys xs
```

and it only gets to the “foo xs ys == bar xs ys” part if the two statements above are true. The only difference is that we have to end up with a Prop type instead of Bool, as was the case in prop_isTouching. No other part of your testing code needs to change. The advantage in using (==>) is its ease of use — you can trivially write any rare edge-case condition that you know does not conform to the spec, without bothering to write an Arbitrary instance. However, the disadvantage is that QuickCheck will waste time generating invalid inputs before it gets to the test the function in question. Because of this, you should first try to create an instance of Arbitrary before going with (==>).

Here is a version without using (==>) just for illustrative purposes:

```import Data.List
import Test.QuickCheck

main = verboseCheck prop_foo

newtype Xs = Xs [Int]
deriving (Show)
newtype Ys = Ys [Int]
deriving (Show)

prop_foo :: Xs -> Ys -> Bool
prop_foo (Xs xs) (Ys ys) =
foo xs ys == foo ys xs

foo :: [Int] -> [Int] -> [Int]
foo xs ys = sort \$ xs ++ ys

instance Arbitrary Xs where
arbitrary = do
-- xs must not be null
xs <- suchThat (arbitrary :: Gen [Int]) (not . null)
return \$ Xs xs

instance Arbitrary Ys where
arbitrary = do
-- ys must have at least 4 elements
ys <- suchThat (arbitrary :: Gen [Int]) ((>3) . length)
return \$ Ys ys
```

That’s it! You now know enough to get started with QuickCheck today. No excuses! Make it a New Year’s resolution for 2012, if you have to!

QuickCheck has taught me to be skeptical of the “if it compiles, it is OK” Haskell attitude — it has really opened my eyes. I now want QuickCheck as a bare minimum testing setup for all of my Haskell code. No tests = poor design (Medieval II: Total War, I’m looking at you…).

QuickCheck can even be used to test C code with the Foreign Function Interface (FFI), so I’m forever tempted to go back to my other C projects and test them with QuickCheck as well (can you imagine the millions of bugs that lie hidden in C code everywhere — that could be exposed with just a little bit of Haskell knowledge and QuickCheck by the everyday developer? Suddenly, I am overcome with optimism… yet I fear that it is really pessimism underneath… hmm.)

## Hpc

I said I would mention Hpc at the beginning. So, what does it do?

Simply put, Hpc tells you which codepaths were used (or not used) after running a binary executable. This is a perfect match for the binary you used for running QuickCheck tests, because during the course of its run QuickCheck will have fed tens of thousands of random values into your functions and have visited most, if not all, parts of your code. Thankfully, Hpc is included with GHC, so you already have it! Just compile your binary with the -fhpc flag. Here is the process:

1. Compile your program with -fhpc. (ghc -fhpc –make foo.hs)
2. Run your program. (./foo)
3. Your program should have generated a *.tix metadata file, as well as a .hpc folder with *.mix files for hpc to analyze.
4. Run hpc report foo for a quick plaintext report on your codepaths.
5. Run hpc markup foo to generate prettified HTML files for a more comprehensive overview.
6. Rinse and repeat. Remove the old *.tix files if you don’t want combined results.

One big caveat: make sure to delete any existing object files (*.o, *.hi) that you had earlier which were produced without the -fhpc flag! Otherwise, those portions of code will not generate hpc metadata! Alternatively, instead of deleting every *.o or *.hi file you have down your project’s folder tree, you could also pass the -fforce-recompile flag to ghc (this is the new name for the deprecated -no-recomp flag, which was used (sans explanation!) in Real World Haskell, Chapter 11).

Go forth, fellow Haskell newbies — embrace testing and code coverage analysis today!

# Simple Password Generation with Haskell

So, I’ve been using a custom password generator for a while. It’s great because it has some useful settings like “only generate alphanumeric passwords” or “only generate letters” to deal with stupid, legacy-code websites that refuse to modernize their password handling code. I use this generator to create 50-character passwords, because hey, if it’s randomly generated, then you might as well generate really long passwords! I recently upgraded the pseudorandom number generator (PRNG) to use a cryptographically secure PRNG (CPRNG), just for fun.

Anyway, here is the program:

```-- LICENSE: PUBLIC DOMAIN
module Main where

import Control.Monad.State
import Crypto.Random.AESCtr
import Data.Binary (decode)
import qualified Data.ByteString.Lazy as B
import Data.List (nub)
import IO
import System (getArgs)
import System.IO (hSetEcho)

keysChar, keysNum, keysPunc, keysCharNum, keysAll, keysHex :: String
keysChar = ['a'..'z'] ++ ['A'..'Z']
keysHex = ['a'..'f']
keysNum = ['0'..'9']
keysPunc = "`~!@#\$%^&*()-_=+[{]}\\|;:'\",<.>/? "
keysCharNum = keysChar ++ keysNum
keysAll = keysChar ++ keysNum ++ keysPunc

giveKey ::  String -> Char -> Int -> Char
giveKey keysCustom c n = extractChar \$ case c of
'i'  -> (keysNum ++ keysHex)
'j'  -> keysNum
'k'  -> keysChar
'l'  -> keysCharNum
';'  -> keysPunc
'h'  -> (keysCharNum ++ keysCustom)
'\n' -> ['\n']
_    -> keysAll
where
extractChar xs = xs!!mod n (length xs)

showRandomKey :: String -> StateT AESRNG IO ()
showRandomKey keysCustom = handleKey =<< liftIO getChar
where
handleKey key = case key of
'\n' -> liftIO (putChar '\n') >> showRandomKey keysCustom
'q' -> (liftIO \$ putStrLn "\nBye!") >> return ()
_ -> mapM_ f [0..(49)::Int] >> (liftIO \$ putStrLn []) >> showRandomKey keysCustom
where
f _ = liftIO
. putChar
. giveKey keysCustom key
. (\n -> mod n (length (keysAll ++ keysCustom) - 1))
=<< aesRandomInt

aesRandomInt :: StateT AESRNG IO Int
aesRandomInt = do
aesState <- get
let (bs, aesState') = genRandomBytes aesState 16
put aesState'
return (decode \$ B.fromChunks [bs])

main :: IO ()
main = do
hSetBuffering stdin NoBuffering -- disable buffering from STDIN
hSetBuffering stdout NoBuffering -- disable buffering from STDOUT
hSetEcho stdin False -- disable terminal echo
as <- getArgs
let as' = filter (\c -> elem c keysAll) . nub \$ unwords as
mapM_ putStrLn
[ []
, "poke: 'q'     quit"
, "      'j'     number"
, "      'k'     letter"
, "      'l'     alphanumeric"
, "      ';'     punctuation"
, "      'h'     alphanumeric" ++ (if null as' then [] else " + " ++ as')
, "      'i'     hexadecimal"
, "      'ENTER' newline"
, "      else    any"
, []
]
aesState <- makeSystem -- gather entropy from the system to use as the initial seed
_ <- runStateT (showRandomKey as') aesState -- enter loop
return ()
```

Yes, the CPRNG used is the Advanced Encryption Standard (AES) algorithm in counter mode. It sure beats using the default System.Random module in terms of… coolness (for the purposes of this application, the use of a CPRNG over a regular PRNG really gives no benefit). Anyway, here is a sample run:

```\$ poke \!@#\$%
poke: 'q'     quit
'j'     number
'k'     letter
'l'     alphanumeric
';'     punctuation
'h'     alphanumeric + !@#\$%
'i'     hexadecimal
'ENTER' newline
else    any

%g@c@geisBaqK81ihYkEC5NyUrWXU2ndCKr3wHkklpjvCWF9I0
zaJ,!z{db.|~vR7,MvOnPU-5v7N;cCy'],bl/d;^s[hI}RM?j>
KfdJKVygPSOufgllMAZlEaLWHSHpDrHIcgmryETcEsx5uUWlQb
49388c0958e9c51f2f4dc06a8097eb8169f715b4fcfb3ca555
```

I named it “poke” because I couldn’t think of any other short, UNIX-y name.

The program takes a string as input and treats it as a special class of characters to consider when generating a password with the ‘h’ key. This way, you can fit into the requirements of legacy website code that say something like, “You may use alphanumeric characters and also ‘!’ ‘@’ ‘#’ ‘\$’ and ‘%’ symbols”.

The output above is from pressing ‘h’, then SPACE, then ‘l’ and finally ‘i’. You could easily extend it to take more fine-tuned options, such as a shorter password length instead of the default 50. (Hint: use this post.)

The interesting side effect of using 50-character long, randomly generated passwords is that I myself do not know these passwords! The only thing I remember is the single password used for my GnuPG private key, used to decrypt the master password file.

In case you are curious, my complete setup regarding passwords is as follows: I store all of my passwords in plaintext in a “master” file, then encrypt it with GnuPG. I use a git repo to track this encrypted file, so that I can sanely remove/delete old passwords without worrying about how to get it back if I need it. Once in a while, I use a simple shell command, gpg2 -d mypasswords.gpg | less, to view the passwords for entry into some website (hooray for copy and paste!). If I need to update/add/delete passwords, I just decrypt the master file, then edit it, and re-encrypt it and commit the change into git (btw, I decrypt it into a RAM partition to avoid leaving any traces of the plaintext file).

The GnuPG private key used to encrypt the master file is itself encrypted with CAST5 (again using GnuPG) and tracked in git. The CAST5 encryption (with the “-c” flag) is a way to encrypt a file with symmetric encryption — i.e., you supply the same password for encryption and decryption, and plus you don’t need a GnuPG key to do it! (You wouldn’t want to encrypt it using another GnuPG key, because then you’d need to encrypt the key for that GnuPG key as well — recursion!) I constantly sync the git repo holding these important, encrypted files across all of my personal machines, thumb drives, etc., for reliability and convenience.

I could learn to use KeyPass or some other software to manage my passwords, but, I’m too lazy. I’m also too paranoid to trust anyone other than myself to handle my passwords. I also take care not to store anything *too* valuable into the passwords file, or anything I type into any file anywhere, just to play it safe.

EDIT: Hey, this is my 100th post! Congratulations to me!

# Parsec Example Revisited (Again): Parsing Lazy ByteStrings

So in this post, I said that I would parse ByteStrings next time just to show how easy it is. Well, I’ve done just that. Since most of the code is identical, I’m going to post a diff after the full code.

```-- LICENSE: PUBLIC DOMAIN
{-# LANGUAGE DeriveDataTypeable, RecordWildCards #-}
module Main where

import qualified Data.ByteString.Lazy as BL
import System.Console.CmdArgs.Implicit
import System.IO
import System.Environment
import System.Exit
import System.Process
import Text.Parsec.Char hiding (upper)
import Text.Parsec.Combinator
import Text.Parsec.Prim
import Text.Parsec.String
import qualified Text.Parsec.ByteString.Lazy as PB
import qualified Text.Parsec.Token as PT
import Text.Parsec.Language (emptyDef)
import Control.Monad.Identity

data Opts = Opts
{ all_devices :: Bool
, unmount :: Bool
, unmount_all :: Bool
, discover :: Bool
, no_color :: Bool
} deriving (Data, Typeable, Show, Eq)

progOpts :: Opts
progOpts = Opts
{ all_devices = def &= help "mount all USB devices"
, unmount = def &= help "choose a USB device to unmount"
, unmount_all = def &= name "U" &= help "unmount all USB devices"
, discover = def &= help "list all mounted/unmounted USB devices"
, no_color = def &= help "disable colors"
}
&= details
[ "Notes:"
, ""
, "The default behavior without any options is to try to mount a USB device."
++ " Here, `device' means a device under the /dev directory, and in our context, is actually a file system partition."
++ " Many USB drives have only a single partition, in which case the term `device' means both the USB drive and the single partition it has."
, ""
, "Also, allowing the \$USER to execute the mount and umount commands with sudo privileges (sudo visudo) will make things less clunky."
]

getOpts :: IO Opts
getOpts = cmdArgs \$ progOpts
&= summary (_PROGRAM_INFO ++ ", " ++ _COPYRIGHT)
&= program _PROGRAM_NAME
&= help _PROGRAM_DESC
&= helpArg [explicit, name "help", name "h"]
&= versionArg [explicit, name "version", name "v", summary _PROGRAM_INFO]

_PROGRAM_NAME, _PROGRAM_VERSION, _PROGRAM_INFO, _PROGRAM_DESC, _COPYRIGHT :: String
_PROGRAM_NAME = "usbmnt"
_PROGRAM_VERSION = "0.1.0"
_PROGRAM_INFO = _PROGRAM_NAME ++ " version " ++ _PROGRAM_VERSION
_PROGRAM_DESC = "mount/unmount USB device(s)"
_COPYRIGHT = "(C) Linus Arver 2011"

data BlockDevice = BlockDevice
{ shortname :: String
, uuid :: UUID
, fsys :: String
, mountPoint :: MountPoint
} deriving (Eq)

data MountPoint
= MPath { path :: FilePath }
| Swap
| Unmounted
| UnknownBlkidVal
deriving (Eq)

instance Show BlockDevice where
show BlockDevice{..} = unwords
[ shortname
, fsys
, uuid
, show mountPoint
]

instance Show MountPoint where
show (MPath path) = path
show Swap = "Swap"
show Unmounted = "Unmounted"
show UnknownBlkidVal = "UnknownBlkidVal"

blockdeviceDefault :: BlockDevice
blockdeviceDefault = BlockDevice
{ shortname = ""
, uuid = ""
, fsys = ""
, mountPoint = MPath {path = ""}
}

data Config = Config
{ fsyss :: [(String, String)]
, uuids :: [(UUID, String)]
} deriving (Eq, Show)

_ALPHANUM :: String
_ALPHANUM = ['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9']

data Color
= Red
| Green
| Yellow
| Blue
| CNone
deriving (Show, Eq)

colorize :: Color -> String -> String
colorize c s = case c of
Blue -> "\x1b[1;34m" ++ s ++ "\x1b[0m"
Green -> "\x1b[1;32m" ++ s ++ "\x1b[0m"
Red -> "\x1b[1;31m" ++ s ++ "\x1b[0m"
Yellow -> "\x1b[1;33m" ++ s ++ "\x1b[0m"
_ -> s

main :: IO ()
main = do
hSetBuffering stdout NoBuffering
hSetBuffering stderr NoBuffering
opts <- getOpts
homeDir <- getEnv "HOME"
errNo <- argsCheck opts homeDir
when (errNo > 0) \$ exitWith \$ ExitFailure errNo
(devs, takenPaths) <- getDevices opts
let configLoc = homeDir ++ "/.usbmnt"
configSrc <- BL.readFile configLoc
(confErrNo, config) <- parseConfig configSrc configLoc
when (confErrNo > 0) \$ exitWith \$ ExitFailure confErrNo
let mountablePaths = filter (\p -> not \$ elem p takenPaths) \$ map (\p -> "/mnt/u" ++ show p) [(0::Int)..]
devsKV = zip (map show [(1::Int)..]) . zip devs \$ mountablePaths
prog opts config devsKV

argsCheck :: Opts -> String -> IO Int
argsCheck Opts{..} homeDir
| null homeDir = e "could not get environment variable \$HOME" 1
| otherwise = return 0
where
e :: String -> Int -> IO Int
e str num = errMsg str >> return num

prog :: Opts -> Config -> [(String, (BlockDevice, FilePath))] -> IO ()
prog opts@Opts{..} config devsKV
| discover = do
putStrLn "all devices:"
mapM_ (\(_, (d, _)) -> putStrLn \$ cshow d) devsKV
| otherwise = do
putStrLn (if (unmount || unmount_all)
then "USB device(s) to unmount:"
else "USB device(s) to mount:")
mapM_ (\(n, (d, _)) -> putStrLn \$ "    " ++ n ++ ") " ++ show' d) devsKV
putStrLn ""
mountMenu opts config devsKV
where
cshow :: BlockDevice -> String
cshow b@BlockDevice{..}
| no_color = show b
| otherwise = case mountPoint of
Unmounted -> colorize Green \$ show b
MPath _ -> if not \$ null \$ getUSBMountPath b
then colorize Blue \$ show b
else show b
_ -> show b
show' :: BlockDevice -> String
show' = if not (unmount || unmount_all)
then show
else unwords . init . words . show

mountMenu :: Opts -> Config -> [(String, (BlockDevice, FilePath))] -> IO ()
mountMenu Opts{..} config devsKV
| unmount = if length devsKV == 1
then do
putStrLn "only 1 USB device to unmount"
tryMount False config (snd . head \$ devsKV) >>= exitWith
else chooseDev prompt devsKV (tryMount False config)
| unmount_all = do
putStrLn "unmounting all USB devices..."
mapM_ (tryMount False config) (map snd devsKV)
return ()
| all_devices = do
putStrLn "mounting all USB devices..."
mapM_ (tryMount True config) (map snd devsKV)
return ()
| length devsKV == 1 = do
putStrLn "only 1 USB device to mount"
tryMount True config (snd . head \$ devsKV) >>= exitWith
| otherwise = chooseDev prompt devsKV (tryMount True config)
where
prompt :: String
prompt = if (unmount || unmount_all)
then "choose USB device to unmount (q to exit)"
else "choose USB device to mount (q to exit)"

chooseDev :: String -> [(String, (BlockDevice, FilePath))] -> ((BlockDevice, FilePath) -> IO ExitCode) -> IO ()
chooseDev prompt devsKV func = do
putStrLn prompt
key <- getLine
case lookup key devsKV of
Just dev -> func dev >>= exitWith
_ -> case key of
"q" -> return ()
_ -> chooseDev prompt devsKV func

tryMount :: Bool -> Config -> (BlockDevice, FilePath) -> IO ExitCode
tryMount mount config@Config{..} (bd@BlockDevice{..}, mp)
| (null margs) = do
errMsg \$ "UUID " ++ squote uuid ++ " was not found in config file"
errMsg \$ "filesystem " ++ squote fsys ++ " was also not found in config file"
errMsg \$ "supported file systems: " ++ (unwords \$ map fst fsyss)
exitWith (ExitFailure 1)
| otherwise = do
when mount \$ do
if (null \$ mountArgsUUID config uuid)
then putStrLn \$ "filesystem " ++ squote fsys ++ " found in config file"
else putStrLn \$ "UUID " ++ squote uuid ++ " found in config file"
putStrLn \$ "using these arguments: " ++ squote margs
putStr \$ (if mount then "" else "un")
++ "mounting "
++ shortname
++ " (" ++ fsys ++ ") "
++ (if mount then "to " ++ mp else "from " ++ show mountPoint)
++ ".."
(_, _, _, p) <- createProcess \$ cmd margs shortname
exitStatus <- waitForProcess p
if (exitStatus == ExitSuccess)
then do
putStrLn "OK"
return ExitSuccess
else do
putStr "FAILED\n"
errMsg \$ (if mount
then "mount error (perhaps " ++ squote mp ++ " does not exist)"
else "unmount error")
return (ExitFailure 1)
where
margs = mountArgs config bd
cmd arguments devPath = CreateProcess
{ cmdspec = ShellCommand (if mount
then "sudo mount -t " ++ arguments ++ " " ++ devPath ++ " " ++ mp ++ " &>/dev/null"
else "sudo umount " ++ show mountPoint)
, cwd = Nothing
, env = Nothing
, std_in = CreatePipe
, std_out = CreatePipe
, std_err = Inherit
, close_fds = False
}

mountArgs :: Config -> BlockDevice -> String
mountArgs Config{..} BlockDevice{..} = case lookup uuid uuids of
Just a -> a
_ -> case lookup fsys fsyss of
Just a -> a
_ -> []

mountArgsUUID :: Config -> UUID -> String
mountArgsUUID Config{..} uuid' = case lookup uuid' uuids of
Just a -> a
_ -> []

getDevices :: Opts -> IO ([BlockDevice], [String])
getDevices Opts{..} = do
(_, sout, _, p) <- createProcess cmdBlkid
devs <- case sout of
Just h -> hGetContents h
Nothing -> return []
_ <- waitForProcess p
let devs' = (map (unwords . words)) . drop 2 . lines \$ devs
devs'' <- mapM parseBlkid devs'
let toMount = filter (\BlockDevice{..} -> mountPoint == Unmounted) devs''
toUnmount = filter (\dev -> not \$ null \$ getUSBMountPath dev) devs''
takenPaths = filter (not . null) . map getUSBMountPath \$ devs''
when (not discover && null toMount && (not (unmount || unmount_all))) \$ do
errMsg \$ "cannot find USB devices to mount"
exitWith (ExitFailure 1)
when (not discover && null toUnmount && (unmount || unmount_all)) \$ do
errMsg \$ "cannot find USB devices to unmount"
exitWith (ExitFailure 1)
return \$ formatDevs devs'' toMount toUnmount takenPaths
where
formatDevs :: [BlockDevice] -> [BlockDevice] -> [BlockDevice] -> [String] -> ([BlockDevice], [String])
formatDevs ds m um takenPaths
| discover = (ds, takenPaths)
| unmount || unmount_all = (um, takenPaths)
| otherwise = (m, takenPaths)
cmdBlkid = CreateProcess
{ cmdspec = ShellCommand ("sudo blkid -o list")
, cwd = Nothing
, env = Nothing
, std_in = CreatePipe
, std_out = CreatePipe
, std_err = Inherit
, close_fds = False
}

getUSBMountPath :: BlockDevice -> String
getUSBMountPath BlockDevice{..} = case mountPoint of
MPath str -> if take 6 str == "/mnt/u" && (all (\c -> elem c ['0'..'9']) (drop 6 str))
then str
else ""
_ -> ""

errMsg :: String -> IO ()
errMsg msg = hPutStrLn stderr \$ "error: " ++ msg

squote :: String -> String
squote s = "`" ++ s ++ "'"

-- Parsing

-- for parsing the computer-generated output of `sudo blkid -o list'
parserIdentifier :: Parser String
parserIdentifier = many1 \$ oneOf \$ _ALPHANUM ++ "/-_"

parserWhitespace :: Parser String
parserWhitespace = many1 \$ oneOf " \t\n\r"

parserMP :: Parser MountPoint
parserMP =
try ( do
a <- oneOf "<("
b <- manyTill anyChar (lookAhead \$ (oneOf ">)"))
_ <- oneOf ">)"
let mp = case a of
'<' -> Swap
'(' -> case b of
"not mounted" -> Unmounted
_ -> UnknownBlkidVal
_ -> UnknownBlkidVal
return mp
)
<|> (parserIdentifier >>= (\s -> return MPath {path = s}))
<?> "blkid's mount point description"

blkidParser :: Parser BlockDevice
blkidParser =
try ( do
sname <- parserIdentifier
_ <- parserWhitespace
fs <- parserIdentifier
_ <- parserWhitespace
_ <- parserIdentifier -- leave out the "label" column, even if it exists
_ <- parserWhitespace
mp <- parserMP
_ <- parserWhitespace
uid <- parserIdentifier
eof
return BlockDevice
{ shortname = sname
, uuid = uid
, fsys = fs
, mountPoint = mp
}
)
<|>
do  sname <- parserIdentifier
_ <- parserWhitespace
fs <- parserIdentifier
_ <- parserWhitespace
mp <- parserMP
_ <- parserWhitespace
uid <- parserIdentifier
eof
return BlockDevice
{ shortname = sname
, uuid = uid
, fsys = fs
, mountPoint = mp
}
<?> "5 or 4 fields to parse"

parseBlkid :: String -> IO BlockDevice
parseBlkid src =
case parse blkidParser "output of `sudo blkid -o list'" src of
Left parseError -> errMsg (show parseError) >> return blockdeviceDefault
Right result -> return result

-- we use a LanguageDef so that we can get whitespace/newline parsing for FREE
-- in our .usbmnt file
configDef :: PT.GenLanguageDef BL.ByteString () Identity
configDef = emptyDef
{ PT.commentStart   = ""
, PT.commentEnd     = ""
, PT.commentLine    = "#"
, PT.nestedComments = False
-- the identStart/identLetter define what a UUID will look like (a
-- dash-separated hex number)
, PT.identStart     = oneOf \$ ['0'..'9'] ++ ['a'..'f'] ++ ['A'..'F']
, PT.identLetter    = oneOf \$ ['0'..'9'] ++ ['a'..'f'] ++ ['A'..'F'] ++ "-"
, PT.opStart        = char '.'
, PT.opLetter       = char '.'
, PT.reservedOpNames= []
, PT.reservedNames  = []
, PT.caseSensitive  = True
}

-- we call makeTokenParser def and pick out just those we need
lexer :: PT.GenTokenParser BL.ByteString () Identity
lexer = PT.makeTokenParser configDef

p_identifier :: ParsecT BL.ByteString () Identity String
p_identifier = PT.identifier lexer
p_stringLiteral :: ParsecT BL.ByteString () Identity String
p_stringLiteral = PT.stringLiteral lexer
p_whiteSpace :: ParsecT BL.ByteString () Identity ()
p_whiteSpace = PT.whiteSpace lexer
p_braces :: ParsecT BL.ByteString () Identity a -> ParsecT BL.ByteString () Identity a
p_braces = PT.braces lexer
p_commaSep :: ParsecT BL.ByteString () Identity a -> ParsecT BL.ByteString () Identity [a]
p_commaSep = PT.commaSep lexer
p_symbol :: String -> ParsecT BL.ByteString () Identity String
p_symbol = PT.symbol lexer

type UUID = String

assocParser :: PB.Parser String -> PB.Parser (UUID, String)
assocParser keyParser = do
key <- keyParser
_ <- many \$ oneOf " \t"
_ <- string "="
_ <- many \$ oneOf " \t"
mountOpts <- p_stringLiteral
return (key, mountOpts)
<?> "a key-value association"

hashParser :: String -> PB.Parser String -> PB.Parser [(String, String)]
hashParser hashName keyParser = do
_ <- p_symbol hashName
_ <- p_symbol "="
a <- p_braces (p_commaSep \$ assocParser keyParser)
return a
<?> "a " ++ hashName ++ " curly brace block"

configParser :: PB.Parser Config
configParser = do
p_whiteSpace -- take care of leading whitespace/comments as defined by configDef
-- parse FSYS_HASH first
fsyss' <- hashParser "FSYS_HASH" (many1 alphaNum)
p_whiteSpace
-- now parse UUID_HASH
uuids' <- hashParser "UUID_HASH" (p_identifier)
eof
return \$ Config {fsyss = fsyss', uuids = uuids'}
<?> "config with FSYS_HASH and UUID_HASH blocks"

parseConfig :: BL.ByteString -> String -> IO (Int, Config)
parseConfig src loc =
case parse configParser ("config file at " ++ squote loc) src of
Left parseError -> errMsg (show parseError) >> return (1, Config [] [])
Right result -> return (0, result)
```

And here is the diff:

```--- usbmnt.hsold	2011-11-09 11:43:02.871554967 -0800
+++ usbmnt.hsnew	2011-11-09 11:43:09.761453413 -0800
@@ -2,6 +2,7 @@
{-# LANGUAGE DeriveDataTypeable, RecordWildCards #-}
module Main where

+import qualified Data.ByteString.Lazy as BL
import System.Console.CmdArgs.Implicit
import System.IO
import System.Environment
@@ -11,6 +12,7 @@
import Text.Parsec.Combinator
import Text.Parsec.Prim
import Text.Parsec.String
+import qualified Text.Parsec.ByteString.Lazy as PB
import qualified Text.Parsec.Token as PT
import Text.Parsec.Language (emptyDef)
import Control.Monad.Identity
@@ -126,7 +128,7 @@
when (errNo > 0) \$ exitWith \$ ExitFailure errNo
(devs, takenPaths) <- getDevices opts
let configLoc = homeDir ++ "/.usbmnt"
-    configSrc <- readFile configLoc
+    configSrc <- BL.readFile configLoc
(confErrNo, config) <- parseConfig configSrc configLoc
when (confErrNo > 0) \$ exitWith \$ ExitFailure confErrNo
let mountablePaths = filter (\p -> not \$ elem p takenPaths) \$ map (\p -> "/mnt/u" ++ show p) [(0::Int)..]
@@ -379,7 +381,7 @@

-- we use a LanguageDef so that we can get whitespace/newline parsing for FREE
-- in our .usbmnt file
-configDef :: PT.LanguageDef st
+configDef :: PT.GenLanguageDef BL.ByteString () Identity
configDef = emptyDef
{ PT.commentStart   = ""
, PT.commentEnd     = ""
@@ -397,25 +399,25 @@
}

-- we call makeTokenParser def and pick out just those we need
-lexer :: PT.TokenParser ()
+lexer :: PT.GenTokenParser BL.ByteString () Identity
lexer = PT.makeTokenParser configDef

-p_identifier :: ParsecT String () Identity String
+p_identifier :: ParsecT BL.ByteString () Identity String
p_identifier = PT.identifier lexer
-p_stringLiteral :: ParsecT String () Identity String
+p_stringLiteral :: ParsecT BL.ByteString () Identity String
p_stringLiteral = PT.stringLiteral lexer
-p_whiteSpace :: ParsecT String () Identity ()
+p_whiteSpace :: ParsecT BL.ByteString () Identity ()
p_whiteSpace = PT.whiteSpace lexer
-p_braces :: ParsecT String () Identity a -> ParsecT String () Identity a
+p_braces :: ParsecT BL.ByteString () Identity a -> ParsecT BL.ByteString () Identity a
p_braces = PT.braces lexer
-p_commaSep :: ParsecT String () Identity a -> ParsecT String () Identity [a]
+p_commaSep :: ParsecT BL.ByteString () Identity a -> ParsecT BL.ByteString () Identity [a]
p_commaSep = PT.commaSep lexer
-p_symbol :: String -> ParsecT String () Identity String
+p_symbol :: String -> ParsecT BL.ByteString () Identity String
p_symbol = PT.symbol lexer

type UUID = String

-assocParser :: Parser String -> Parser (UUID, String)
+assocParser :: PB.Parser String -> PB.Parser (UUID, String)
assocParser keyParser = do
key <- keyParser
_ <- many \$ oneOf " \t"
@@ -425,7 +427,7 @@
return (key, mountOpts)
<?> "a key-value association"

-hashParser :: String -> Parser String -> Parser [(String, String)]
+hashParser :: String -> PB.Parser String -> PB.Parser [(String, String)]
hashParser hashName keyParser = do
_ <- p_symbol hashName
_ <- p_symbol "="
@@ -433,7 +435,7 @@
return a
<?> "a " ++ hashName ++ " curly brace block"

-configParser :: Parser Config
+configParser :: PB.Parser Config
configParser = do
p_whiteSpace -- take care of leading whitespace/comments as defined by configDef
-- parse FSYS_HASH first
@@ -445,7 +447,7 @@
return \$ Config {fsyss = fsyss', uuids = uuids'}
<?> "config with FSYS_HASH and UUID_HASH blocks"

-parseConfig :: String -> String -> IO (Int, Config)
+parseConfig :: BL.ByteString -> String -> IO (Int, Config)
parseConfig src loc =
case parse configParser ("config file at " ++ squote loc) src of
Left parseError -> errMsg (show parseError) >> return (1, Config [] [])
```

Like I said, pretty straightforward, right? The code actually only uses lazy bytestrings for parsing the configuration file. The output of the blkid command is still parsed with the old way (native Haskell strings). The only tricky part is to import the bytestring stuff (Data.ByteString.Lazy and Text.Parsec.ByteString.Lazy) as qualified imports to avoid namespace clashes. The configDef function’s type signature had to change a bit because Parsec 3.1.1 does not have a convenience type alias for LanguageDef that uses bytestrings. To be honest, I couldn’t figure out the type signatures for p_identifier, p_symbol, etc., so I had GHC do it for me. But I mean, it all took like 5-10 minutes. Easy.

So now you can parse bytestrings — lazily!

# Parsec Example Revisited: Custom Configuration File Format Meets the Token Module

As promised, here is a solution to one of the exercises I posed here — namely, to read and parse a configuration file to determine mount options to pass along to the mount command. (No one’s reading this stuff right now, but whatever. Haskell is still that much fun for me!) That is, the previous solution used a rather crude, hard-coded function, fileSystemArgs, as follows:

```fileSystemArgs :: String -> [(String, String)]
fileSystemArgs user =
[ ("ext2", "ext2 -o rw,relatime")
, ("vfat", "vfat -o rw,uid=" ++ user ++ ",gid=" ++ user)
]
```

I’ve removed this code in favor of reading such options from a configuration file at runtime, as per the exercise. The code has increased to 452 lines (only about 100 more lines than before, with my verbose style), and we make use of the Text.Parsec.Token module to define a super-simple mini-language that we use in the configuration file (two curly brace blocks that define mount options). We only use bits and pieces of it, but it’s still handy because we get some super-convenient parser combinators for free — namely, braces, stringLiteral, and whiteSpace (which are actually aliased to p_braces, p_stringLiteral, and p_whiteSpace in the code). What’s more, these functions intelligently evade trailing whitespace and comment lines automatically. This is why I love Parsec!!!

So without further ado, here’s the code (PUBLIC DOMAIN):

```-- LICENSE: PUBLIC DOMAIN
{-# LANGUAGE DeriveDataTypeable, RecordWildCards #-}
module Main where

import System.Console.CmdArgs.Implicit
import System.IO
import System.Environment
import System.Exit
import System.Process
import Text.Parsec.Char hiding (upper)
import Text.Parsec.Combinator
import Text.Parsec.Prim
import Text.Parsec.String
import qualified Text.Parsec.Token as PT
import Text.Parsec.Language (emptyDef)
import Control.Monad.Identity

data Opts = Opts
{ all_devices :: Bool
, unmount :: Bool
, unmount_all :: Bool
, discover :: Bool
, no_color :: Bool
} deriving (Data, Typeable, Show, Eq)

progOpts :: Opts
progOpts = Opts
{ all_devices = def &= help "mount all USB devices"
, unmount = def &= help "choose a USB device to unmount"
, unmount_all = def &= name "U" &= help "unmount all USB devices"
, discover = def &= help "list all mounted/unmounted USB devices"
, no_color = def &= help "disable colors"
}
&= details
[ "Notes:"
, ""
, "The default behavior without any options is to try to mount a USB device."
++ " Here, `device' means a device under the /dev directory, and in our context, is actually a file system partition."
++ " Many USB drives have only a single partition, in which case the term `device' means both the USB drive and the single partition it has."
, ""
, "Also, allowing the \$USER to execute the mount and umount commands with sudo privileges (sudo visudo) will make things less clunky."
]

getOpts :: IO Opts
getOpts = cmdArgs \$ progOpts
&= summary (_PROGRAM_INFO ++ ", " ++ _COPYRIGHT)
&= program _PROGRAM_NAME
&= help _PROGRAM_DESC
&= helpArg [explicit, name "help", name "h"]
&= versionArg [explicit, name "version", name "v", summary _PROGRAM_INFO]

_PROGRAM_NAME, _PROGRAM_VERSION, _PROGRAM_INFO, _PROGRAM_DESC, _COPYRIGHT :: String
_PROGRAM_NAME = "usbmnt"
_PROGRAM_VERSION = "0.1.0"
_PROGRAM_INFO = _PROGRAM_NAME ++ " version " ++ _PROGRAM_VERSION
_PROGRAM_DESC = "mount/unmount USB device(s)"
_COPYRIGHT = "PUBLIC DOMAIN"

data BlockDevice = BlockDevice
{ shortname :: String
, uuid :: UUID
, fsys :: String
, mountPoint :: MountPoint
} deriving (Eq)

data MountPoint
= MPath { path :: FilePath }
| Swap
| Unmounted
| UnknownBlkidVal
deriving (Eq)

instance Show BlockDevice where
show BlockDevice{..} = unwords
[ shortname
, fsys
, uuid
, show mountPoint
]

instance Show MountPoint where
show (MPath path) = path
show Swap = "Swap"
show Unmounted = "Unmounted"
show UnknownBlkidVal = "UnknownBlkidVal"

blockdeviceDefault :: BlockDevice
blockdeviceDefault = BlockDevice
{ shortname = ""
, uuid = ""
, fsys = ""
, mountPoint = MPath {path = ""}
}

data Config = Config
{ fsyss :: [(String, String)]
, uuids :: [(UUID, String)]
} deriving (Eq, Show)

_ALPHANUM :: String
_ALPHANUM = ['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9']

data Color
= Red
| Green
| Yellow
| Blue
| CNone
deriving (Show, Eq)

colorize :: Color -> String -> String
colorize c s = case c of
Blue -> "\x1b[1;34m" ++ s ++ "\x1b[0m"
Green -> "\x1b[1;32m" ++ s ++ "\x1b[0m"
Red -> "\x1b[1;31m" ++ s ++ "\x1b[0m"
Yellow -> "\x1b[1;33m" ++ s ++ "\x1b[0m"
_ -> s

main :: IO ()
main = do
hSetBuffering stdout NoBuffering
hSetBuffering stderr NoBuffering
opts <- getOpts
homeDir <- getEnv "HOME"
errNo <- argsCheck opts homeDir
when (errNo > 0) \$ exitWith \$ ExitFailure errNo
(devs, takenPaths) <- getDevices opts
let configLoc = homeDir ++ "/.usbmnt"
configSrc <- readFile configLoc
(confErrNo, config) <- parseConfig configSrc configLoc
when (confErrNo > 0) \$ exitWith \$ ExitFailure confErrNo
let mountablePaths = filter (\p -> not \$ elem p takenPaths) \$ map (\p -> "/mnt/u" ++ show p) [(0::Int)..]
devsKV = zip (map show [(1::Int)..]) . zip devs \$ mountablePaths
prog opts config devsKV

argsCheck :: Opts -> String -> IO Int
argsCheck Opts{..} homeDir
| null homeDir = e "could not get environment variable \$HOME" 1
| otherwise = return 0
where
e :: String -> Int -> IO Int
e str num = errMsg str >> return num

prog :: Opts -> Config -> [(String, (BlockDevice, FilePath))] -> IO ()
prog opts@Opts{..} config devsKV
| discover = do
putStrLn "all devices:"
mapM_ (\(_, (d, _)) -> putStrLn \$ cshow d) devsKV
| otherwise = do
putStrLn (if (unmount || unmount_all)
then "USB device(s) to unmount:"
else "USB device(s) to mount:")
mapM_ (\(n, (d, _)) -> putStrLn \$ "    " ++ n ++ ") " ++ show' d) devsKV
putStrLn ""
mountMenu opts config devsKV
where
cshow :: BlockDevice -> String
cshow b@BlockDevice{..}
| no_color = show b
| otherwise = case mountPoint of
Unmounted -> colorize Green \$ show b
MPath _ -> if not \$ null \$ getUSBMountPath b
then colorize Blue \$ show b
else show b
_ -> show b
show' :: BlockDevice -> String
show' = if not (unmount || unmount_all)
then show
else unwords . init . words . show

mountMenu :: Opts -> Config -> [(String, (BlockDevice, FilePath))] -> IO ()
mountMenu Opts{..} config devsKV
| unmount = if length devsKV == 1
then do
putStrLn "only 1 USB device to unmount"
tryMount False config (snd . head \$ devsKV) >>= exitWith
else chooseDev prompt devsKV (tryMount False config)
| unmount_all = do
putStrLn "unmounting all USB devices..."
mapM_ (tryMount False config) (map snd devsKV)
return ()
| all_devices = do
putStrLn "mounting all USB devices..."
mapM_ (tryMount True config) (map snd devsKV)
return ()
| length devsKV == 1 = do
putStrLn "only 1 USB device to mount"
tryMount True config (snd . head \$ devsKV) >>= exitWith
| otherwise = chooseDev prompt devsKV (tryMount True config)
where
prompt :: String
prompt = if (unmount || unmount_all)
then "choose USB device to unmount (q to exit)"
else "choose USB device to mount (q to exit)"

chooseDev :: String -> [(String, (BlockDevice, FilePath))] -> ((BlockDevice, FilePath) -> IO ExitCode) -> IO ()
chooseDev prompt devsKV func = do
putStrLn prompt
key <- getLine
case lookup key devsKV of
Just dev -> func dev >>= exitWith
_ -> case key of
"q" -> return ()
_ -> chooseDev prompt devsKV func

tryMount :: Bool -> Config -> (BlockDevice, FilePath) -> IO ExitCode
tryMount mount config@Config{..} (bd@BlockDevice{..}, mp)
| (null margs) = do
errMsg \$ "UUID " ++ squote uuid ++ " was not found in config file"
errMsg \$ "filesystem " ++ squote fsys ++ " was also not found in config file"
errMsg \$ "supported file systems: " ++ (unwords \$ map fst fsyss)
exitWith (ExitFailure 1)
| otherwise = do
when mount \$ do
if (null \$ mountArgsUUID config uuid)
then putStrLn \$ "filesystem " ++ squote fsys ++ " found in config file"
else putStrLn \$ "UUID " ++ squote uuid ++ " found in config file"
putStrLn \$ "using these arguments: " ++ squote margs
putStr \$ (if mount then "" else "un")
++ "mounting "
++ shortname
++ " (" ++ fsys ++ ") "
++ (if mount then "to " ++ mp else "from " ++ show mountPoint)
++ ".."
(_, _, _, p) <- createProcess \$ cmd margs shortname
exitStatus <- waitForProcess p
if (exitStatus == ExitSuccess)
then do
putStrLn "OK"
return ExitSuccess
else do
putStr "FAILED\n"
errMsg \$ (if mount
then "mount error (perhaps " ++ squote mp ++ " does not exist)"
else "unmount error")
return (ExitFailure 1)
where
margs = mountArgs config bd
cmd arguments devPath = CreateProcess
{ cmdspec = ShellCommand (if mount
then "sudo mount -t " ++ arguments ++ " " ++ devPath ++ " " ++ mp ++ " &>/dev/null"
else "sudo umount " ++ show mountPoint)
, cwd = Nothing
, env = Nothing
, std_in = CreatePipe
, std_out = CreatePipe
, std_err = Inherit
, close_fds = False
}

mountArgs :: Config -> BlockDevice -> String
mountArgs Config{..} BlockDevice{..} = case lookup uuid uuids of
Just a -> a
_ -> case lookup fsys fsyss of
Just a -> a
_ -> []

mountArgsUUID :: Config -> UUID -> String
mountArgsUUID Config{..} uuid' = case lookup uuid' uuids of
Just a -> a
_ -> []

getDevices :: Opts -> IO ([BlockDevice], [String])
getDevices Opts{..} = do
(_, sout, _, p) <- createProcess cmdBlkid
devs <- case sout of
Just h -> hGetContents h
Nothing -> return []
_ <- waitForProcess p
let devs' = (map (unwords . words)) . drop 2 . lines \$ devs
devs'' <- mapM parseBlkid devs'
let toMount = filter (\BlockDevice{..} -> mountPoint == Unmounted) devs''
toUnmount = filter (\dev -> not \$ null \$ getUSBMountPath dev) devs''
takenPaths = filter (not . null) . map getUSBMountPath \$ devs''
when (not discover && null toMount && (not (unmount || unmount_all))) \$ do
errMsg \$ "cannot find USB devices to mount"
exitWith (ExitFailure 1)
when (not discover && null toUnmount && (unmount || unmount_all)) \$ do
errMsg \$ "cannot find USB devices to unmount"
exitWith (ExitFailure 1)
return \$ formatDevs devs'' toMount toUnmount takenPaths
where
formatDevs :: [BlockDevice] -> [BlockDevice] -> [BlockDevice] -> [String] -> ([BlockDevice], [String])
formatDevs ds m um takenPaths
| discover = (ds, takenPaths)
| unmount || unmount_all = (um, takenPaths)
| otherwise = (m, takenPaths)
cmdBlkid = CreateProcess
{ cmdspec = ShellCommand ("sudo blkid -o list")
, cwd = Nothing
, env = Nothing
, std_in = CreatePipe
, std_out = CreatePipe
, std_err = Inherit
, close_fds = False
}

getUSBMountPath :: BlockDevice -> String
getUSBMountPath BlockDevice{..} = case mountPoint of
MPath str -> if take 6 str == "/mnt/u" && (all (\c -> elem c ['0'..'9']) (drop 6 str))
then str
else ""
_ -> ""

errMsg :: String -> IO ()
errMsg msg = hPutStrLn stderr \$ "error: " ++ msg

squote :: String -> String
squote s = "`" ++ s ++ "'"

-- Parsing

-- for parsing the computer-generated output of `sudo blkid -o list'
parserIdentifier :: Parser String
parserIdentifier = many1 \$ oneOf \$ _ALPHANUM ++ "/-_"

parserWhitespace :: Parser String
parserWhitespace = many1 \$ oneOf " \t\n\r"

parserMP :: Parser MountPoint
parserMP =
try ( do
a <- oneOf "<("
b <- manyTill anyChar (lookAhead \$ (oneOf ">)"))
_ <- oneOf ">)"
let mp = case a of
'<' -> Swap
'(' -> case b of
"not mounted" -> Unmounted
_ -> UnknownBlkidVal
_ -> UnknownBlkidVal
return mp
)
<|> (parserIdentifier >>= (\s -> return MPath {path = s}))
<?> "blkid's mount point description"

blkidParser :: Parser BlockDevice
blkidParser =
try ( do
sname <- parserIdentifier
_ <- parserWhitespace
fs <- parserIdentifier
_ <- parserWhitespace
_ <- parserIdentifier -- leave out the "label" column, even if it exists
_ <- parserWhitespace
mp <- parserMP
_ <- parserWhitespace
uid <- parserIdentifier
eof
return BlockDevice
{ shortname = sname
, uuid = uid
, fsys = fs
, mountPoint = mp
}
)
<|>
do  sname <- parserIdentifier
_ <- parserWhitespace
fs <- parserIdentifier
_ <- parserWhitespace
mp <- parserMP
_ <- parserWhitespace
uid <- parserIdentifier
eof
return BlockDevice
{ shortname = sname
, uuid = uid
, fsys = fs
, mountPoint = mp
}
<?> "5 or 4 fields to parse"

parseBlkid :: String -> IO BlockDevice
parseBlkid src =
case parse blkidParser "output of `sudo blkid -o list'" src of
Left parseError -> errMsg (show parseError) >> return blockdeviceDefault
Right result -> return result

-- we use a LanguageDef so that we can get whitespace/newline parsing for FREE
-- in our .usbmnt file
configDef :: PT.LanguageDef st
configDef = emptyDef
{ PT.commentStart   = ""
, PT.commentEnd     = ""
, PT.commentLine    = "#"
, PT.nestedComments = False
-- the identStart/identLetter define what a UUID will look like (a
-- dash-separated hex number)
, PT.identStart     = oneOf \$ ['0'..'9'] ++ ['a'..'f'] ++ ['A'..'F']
, PT.identLetter    = oneOf \$ ['0'..'9'] ++ ['a'..'f'] ++ ['A'..'F'] ++ "-"
, PT.opStart        = char '.'
, PT.opLetter       = char '.'
, PT.reservedOpNames= []
, PT.reservedNames  = []
, PT.caseSensitive  = True
}

-- we call makeTokenParser def and pick out just those we need
lexer :: PT.TokenParser ()
lexer = PT.makeTokenParser configDef

p_identifier :: ParsecT String () Identity String
p_identifier = PT.identifier lexer
p_stringLiteral :: ParsecT String () Identity String
p_stringLiteral = PT.stringLiteral lexer
p_whiteSpace :: ParsecT String () Identity ()
p_whiteSpace = PT.whiteSpace lexer
p_braces :: ParsecT String () Identity a -> ParsecT String () Identity a
p_braces = PT.braces lexer
p_commaSep :: ParsecT String () Identity a -> ParsecT String () Identity [a]
p_commaSep = PT.commaSep lexer
p_symbol :: String -> ParsecT String () Identity String
p_symbol = PT.symbol lexer

type UUID = String

assocParser :: Parser String -> Parser (UUID, String)
assocParser keyParser = do
key <- keyParser
_ <- many \$ oneOf " \t"
_ <- string "="
_ <- many \$ oneOf " \t"
mountOpts <- p_stringLiteral
return (key, mountOpts)
<?> "a key-value association"

hashParser :: String -> Parser String -> Parser [(String, String)]
hashParser hashName keyParser = do
_ <- p_symbol hashName
_ <- p_symbol "="
a <- p_braces (p_commaSep \$ assocParser keyParser)
return a
<?> "a " ++ hashName ++ " curly brace block"

configParser :: Parser Config
configParser = do
p_whiteSpace -- take care of leading whitespace/comments as defined by configDef
-- parse FSYS_HASH first
fsyss' <- hashParser "FSYS_HASH" (many1 alphaNum)
p_whiteSpace
-- now parse UUID_HASH
uuids' <- hashParser "UUID_HASH" (p_identifier)
eof
return \$ Config {fsyss = fsyss', uuids = uuids'}
<?> "config with FSYS_HASH and UUID_HASH blocks"

parseConfig :: String -> String -> IO (Int, Config)
parseConfig src loc =
case parse configParser ("config file at " ++ squote loc) src of
Left parseError -> errMsg (show parseError) >> return (1, Config [] [])
Right result -> return (0, result)
```

And here is the new ~/.usbmnt configuration file that it parses:

```# UUID/filesystem mountoptions

# the FSYS_HASH is used as a fallback, if the detected device does not match one of the UUIDs listed above
FSYS_HASH =
{ ext2 = "ext2 -o rw,relatime"
, vfat = "vfat -o rw,uid=\$USER,gid=\$USER"
}

# UUID is pretty straightforward; use blkid to figure it out for a particular device
# as for mount options, these are passed directly into the shell (/bin/sh; see createProcess in System.Process), so you can use things like \$USER

# UUID_HASH = {} # if you have no particular device-specific settings

UUID_HASH =
{ 7CBF-B36F =                             "vfat -o rw,uid=\$USER,gid=\$USER"    # (256 MiB) SD card
, 9c359666-a4e6-a894-3475-e6cd53660de8 =  "ext2 -o rw,relatime"               # (2 GiB) USB 2.0 thumbdrive
}
```

See how there are a bunch of comment lines starting with ‘#‘? With our configDef function which defines a language with “#” as the commentLine, our convenience functions p_whiteSpace, p_symbol, etc. all avoid comment lines automatically! Sure, it’s a trivial automation here, but you can also define multiline comments and more (which I didn’t bother to do since the configuration file format I designed is so simple). I really like how I can robustly parse comment-laden text strings with such ease!

Can you imagine doing the same thing with regular expressions? I used to use regexes for parsing things like this back when I was still a very naive programmer, and then Parsec opened up the wonderful world of parsers for me (especially all the cool convenience functions you get for FREE!). The new rule I have is: use regular expressions to strictly do search-and-replace for unchanging/uniform input (using the Text.Regex.PCRE module), and to use Parsec for everything else. There’s no excuse to put off learning how to use Parsec!

Anyway, the code I posted is a working example and I really do use it in my machines! And all you need is a POSIX system with GHC, Parsec, and CmdArgs to use it.

Next time, I’ll do the second exercise and read in and parse ByteStrings instead of native Haskell Strings — just to show you how trivial the changes are!

# Parsec and CmdArgs in Action: A Small Example

In this post, I mentioned that I wrote a ~300 line Haskell program to take care of mounting/unmounting USB drives (especially useful for window-manager-only users like myself). Well, I’ve been using my program (creatively named usbmnt) very happily so far and would like to release it for public consumption. It’s released into the PUBLIC DOMAIN, because that’s how I roll, baby!

```-- LICENSE: PUBLIC DOMAIN
{-# LANGUAGE DeriveDataTypeable, RecordWildCards #-}
module Main where

import Control.Monad (when)
import System.Console.CmdArgs.Implicit
import System.IO
import System.Environment
import System.Exit
import System.Process
import Text.Parsec.Char hiding (upper)
import Text.Parsec.Combinator
import Text.Parsec.Prim
import Text.Parsec.String

data Opts = Opts
{ all_devices :: Bool
, unmount :: Bool
, unmount_all :: Bool
, discover :: Bool
, no_color :: Bool
} deriving (Data, Typeable, Show, Eq)

progOpts :: Opts
progOpts = Opts
{ all_devices = def &= help "mount all USB devices"
, unmount = def &= help "choose a USB device to unmount"
, unmount_all = def &= name "U" &= help "unmount all USB devices"
, discover = def &= help "list all mounted/unmounted USB devices"
, no_color = def &= help "disable colors"
}
&= details
[ "Notes:"
, ""
, "The default behavior without any options is to try to mount a USB device."
++ " Here, `device' means a device under the /dev directory, and in our context, is actually a file system partition."
++ " Many USB drives have only a single partition, in which case the term `device' means both the USB drive and the single partition it has."
, ""
, "Also, allowing the \$USER to execute the mount and umount commands with sudo privileges (sudo visudo) will make things less clunky."
]

getOpts :: IO Opts
getOpts = cmdArgs \$ progOpts
&= summary (_PROGRAM_INFO ++ ", " ++ _COPYRIGHT)
&= program _PROGRAM_NAME
&= help _PROGRAM_DESC
&= helpArg [explicit, name "help", name "h"]
&= versionArg [explicit, name "version", name "v", summary _PROGRAM_INFO]

_PROGRAM_NAME, _PROGRAM_VERSION, _PROGRAM_INFO, _PROGRAM_DESC, _COPYRIGHT :: String
_PROGRAM_NAME = "usbmnt"
_PROGRAM_VERSION = "0.0.1"
_PROGRAM_INFO = _PROGRAM_NAME ++ " version " ++ _PROGRAM_VERSION
_PROGRAM_DESC = "mount/unmount USB device(s)"
_COPYRIGHT = "PUBLIC DOMAIN"

data BlockDevice = BlockDevice
{ shortname :: String
, uuid :: String
, fsys :: String
, mountPoint :: MountPoint
} deriving (Eq)

data MountPoint
= MPath { path :: FilePath }
| Swap
| Unmounted
| UnknownBlkidVal
deriving (Eq)

instance Show BlockDevice where
show BlockDevice{..} = unwords
[ shortname
, fsys
, uuid
, show mountPoint
]

instance Show MountPoint where
show (MPath path) = path
show Swap = "Swap"
show Unmounted = "Unmounted"
show UnknownBlkidVal = "UnknownBlkidVal"

blockdeviceDefault :: BlockDevice
blockdeviceDefault = BlockDevice
{ shortname = ""
, uuid = ""
, fsys = ""
, mountPoint = MPath {path = ""}
}

_ALPHANUM :: String
_ALPHANUM = ['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9']

data Color
= Red
| Green
| Yellow
| Blue
| CNone
deriving (Show, Eq)

colorize :: Color -> String -> String
colorize c s = case c of
Blue -> "\x1b[1;34m" ++ s ++ "\x1b[0m"
Green -> "\x1b[1;32m" ++ s ++ "\x1b[0m"
Red -> "\x1b[1;31m" ++ s ++ "\x1b[0m"
Yellow -> "\x1b[1;33m" ++ s ++ "\x1b[0m"
_ -> s

main :: IO ()
main = do
hSetBuffering stdout NoBuffering
hSetBuffering stderr NoBuffering
opts <- getOpts
user <- getEnv "USER"
errNo <- argsCheck opts user
when (errNo > 0) \$ exitWith \$ ExitFailure errNo
(devs, takenPaths) <- getDevices opts
let mountablePaths = filter (\p -> not \$ elem p takenPaths) \$ map (\p -> "/mnt/u" ++ show p) [(0::Int)..]
devsKV = zip (map show [(1::Int)..]) . zip devs \$ mountablePaths
prog opts user devsKV

argsCheck :: Opts -> String -> IO Int
argsCheck Opts{..} user
| null user = e "could not get environment variable \$USER" 1
| otherwise = return 0
where
e :: String -> Int -> IO Int
e str num = errMsg str >> return num

prog :: Opts -> String -> [(String, (BlockDevice, FilePath))] -> IO ()
prog opts@Opts{..} user devsKV
| discover = do
putStrLn "all devices:"
mapM_ (\(_, (d, _)) -> putStrLn \$ cshow d) devsKV
| otherwise = do
putStrLn (if (unmount || unmount_all)
then "USB device(s) to unmount:"
else "USB device(s) to mount:")
mapM_ (\(n, (d, _)) -> putStrLn \$ "    " ++ n ++ ") " ++ show' d) devsKV
putStrLn ""
mountMenu opts user devsKV
where
cshow :: BlockDevice -> String
cshow b@BlockDevice{..}
| no_color = show b
| otherwise = case mountPoint of
Unmounted -> colorize Green \$ show b
MPath _ -> if not \$ null \$ getUSBMountPath b
then colorize Blue \$ show b
else show b
_ -> show b
show' :: BlockDevice -> String
show' = if not (unmount || unmount_all)
then show
else unwords . init . words . show

mountMenu :: Opts -> String -> [(String, (BlockDevice, FilePath))] -> IO ()
mountMenu Opts{..} user devsKV
| unmount = if length devsKV == 1
then do
putStrLn "only 1 USB device to unmount"
tryMount False user (snd . head \$ devsKV) >>= exitWith
else chooseDev prompt user devsKV (tryMount False)
| unmount_all = do
putStrLn "unmounting all USB devices..."
mapM_ (tryMount False user) (map snd devsKV)
return ()
| all_devices = do
putStrLn "mounting all USB devices..."
mapM_ (tryMount True user) (map snd devsKV)
return ()
| length devsKV == 1 = do
putStrLn "only 1 USB device to mount"
tryMount True user (snd . head \$ devsKV) >>= exitWith
| otherwise = chooseDev prompt  user devsKV (tryMount True)
where
prompt :: String
prompt = if (unmount || unmount_all)
then "choose USB device to unmount (q to exit)"
else "choose USB device to mount (q to exit)"

chooseDev :: String -> String -> [(String, (BlockDevice, FilePath))] -> (String -> (BlockDevice, FilePath) -> IO ExitCode) -> IO ()
chooseDev prompt user devsKV func = do
putStrLn prompt
key <- getLine
case lookup key devsKV of
Just dev -> func user dev >>= exitWith
_ -> case key of
"q" -> return ()
_ -> chooseDev prompt user devsKV func

tryMount :: Bool -> String -> (BlockDevice, FilePath) -> IO ExitCode
tryMount mount user (BlockDevice{..}, mp) = do
when (null \$ mountArgs fsys user) \$ do
errMsg \$ "unsupported file system " ++ squote fsys ++ "\nsupported file systems: " ++ (unwords \$ map fst (fileSystemArgs user))
exitWith (ExitFailure 1)
putStr \$ (if mount then "" else "un")
++ "mounting "
++ shortname
++ " (" ++ fsys ++ ") "
++ (if mount then "to " ++ mp else "from " ++ show mountPoint)
++ ".."
(_, _, _, p) <- createProcess \$ cmd (mountArgs fsys user) shortname
exitStatus <- waitForProcess p
if (exitStatus == ExitSuccess)
then do
putStrLn "OK"
return ExitSuccess
else do
putStr "FAILED\n"
errMsg \$ (if mount
then "mount error (perhaps " ++ squote mp ++ " does not exist)"
else "unmount error")
return (ExitFailure 1)
where
cmd arguments devPath = CreateProcess
{ cmdspec = ShellCommand (if mount
then "sudo mount -t " ++ arguments ++ " " ++ devPath ++ " " ++ mp ++ " &>/dev/null"
else "sudo umount " ++ show mountPoint)
, cwd = Nothing
, env = Nothing
, std_in = CreatePipe
, std_out = CreatePipe
, std_err = Inherit
, close_fds = False
}

fileSystemArgs :: String -> [(String, String)]
fileSystemArgs user =
[ ("ext2", "ext2 -o rw,relatime")
, ("vfat", "vfat -o rw,uid=" ++ user ++ ",gid=" ++ user)
]

mountArgs :: String -> String -> String
mountArgs fsys user = case lookup fsys (fileSystemArgs user) of
Just a -> a
_ -> []

getDevices :: Opts -> IO ([BlockDevice], [String])
getDevices Opts{..} = do
(_, sout, _, p) <- createProcess cmdBlkid
devs <- case sout of
Just h -> hGetContents h
Nothing -> return []
_ <- waitForProcess p
let devs' = (map (unwords . words)) . drop 2 . lines \$ devs
devs'' <- mapM parseBlkid devs'
let toMount = filter (\BlockDevice{..} -> mountPoint == Unmounted) devs''
toUnmount = filter (\dev -> not \$ null \$ getUSBMountPath dev) devs''
takenPaths = filter (not . null) . map getUSBMountPath \$ devs''
when (not discover && null toMount && (not (unmount || unmount_all))) \$ do
errMsg \$ "cannot find USB devices to mount"
exitWith (ExitFailure 1)
when (not discover && null toUnmount && (unmount || unmount_all)) \$ do
errMsg \$ "cannot find USB devices to unmount"
exitWith (ExitFailure 1)
return \$ formatDevs devs'' toMount toUnmount takenPaths
where
formatDevs :: [BlockDevice] -> [BlockDevice] -> [BlockDevice] -> [String] -> ([BlockDevice], [String])
formatDevs ds m um takenPaths
| discover = (ds, takenPaths)
| unmount || unmount_all = (um, takenPaths)
| otherwise = (m, takenPaths)
cmdBlkid = CreateProcess
{ cmdspec = ShellCommand ("sudo blkid -o list")
, cwd = Nothing
, env = Nothing
, std_in = CreatePipe
, std_out = CreatePipe
, std_err = Inherit
, close_fds = False
}

getUSBMountPath :: BlockDevice -> String
getUSBMountPath BlockDevice{..} = case mountPoint of
MPath str -> if take 6 str == "/mnt/u" && (all (\c -> elem c ['0'..'9']) (drop 6 str))
then str
else ""
_ -> ""

errMsg :: String -> IO ()
errMsg msg = hPutStrLn stderr \$ "error: " ++ msg

squote :: String -> String
squote s = "`" ++ s ++ "'"

-- Parsing
parserIdentifier :: Parser String
parserIdentifier = many1 \$ oneOf \$ _ALPHANUM ++ "/-_"

parserWhitespace :: Parser String
parserWhitespace = many1 \$ oneOf " \t\n\r"

parserMP :: Parser MountPoint
parserMP =
try ( do
a <- oneOf "<(" -- "(not mounted)" or "<swap>"
b <- manyTill anyChar (lookAhead \$ (oneOf ">)"))
_ <- oneOf ">)"
let mp = case a of
'<' -> Swap
'(' -> case b of
"not mounted" -> Unmounted
_ -> UnknownBlkidVal
_ -> UnknownBlkidVal
return mp
)
<|> (parserIdentifier >>= (\s -> return MPath {path = s})) -- e.g., "/mnt/blah"
<?> "blkid's mount point description"

blkidParser :: Parser BlockDevice
blkidParser =
try ( do
sname <- parserIdentifier
_ <- parserWhitespace
fs <- parserIdentifier
_ <- parserWhitespace
_ <- parserIdentifier -- leave out the "label" column, even if it exists
_ <- parserWhitespace
mp <- parserMP
_ <- parserWhitespace
uid <- parserIdentifier
eof
return BlockDevice
{ shortname = sname
, uuid = uid
, fsys = fs
, mountPoint = mp
}
)
<|>
do  sname <- parserIdentifier
_ <- parserWhitespace
fs <- parserIdentifier
_ <- parserWhitespace
mp <- parserMP
_ <- parserWhitespace
uid <- parserIdentifier
eof
return BlockDevice
{ shortname = sname
, uuid = uid
, fsys = fs
, mountPoint = mp
}
<?> "5 or 4 fields to parse"

parseBlkid :: String -> IO BlockDevice
parseBlkid src =
case parse blkidParser "output of `sudo blkid -o list'" src of
Left parseError -> errMsg (show parseError) >> return blockdeviceDefault
Right result -> return result
```

This example compiles with ghc –make -O2 -Wall -Werror. The only 2 dependencies are Parsec, and CmdArgs.

Looking back at it, I think it’s a great example of a real-world Haskell program out in the wild. It uses Parsec to correctly parse the output from blkid, CmdArgs for sane argument handling, and also does some system calls with the createProcess command. Coding-style-wise, it uses the simple where expression wherever possible, for maximum readability and tries to keep leading whitespace to a minimum. The code is written very verbosely, with explicit type signatures for every function, because, well, it helps me keep my sanity.

I think the code is pretty straightforward. It only supports 2 file systems: the ext2 type (recommended for Linux-only USB drives), and vfat (the FAT file system used for legacy Windows support), but this could be easily extended to support ext3, ext4, or any other file system, since it just wraps around the standard mount command.

The only tricky part is blkidParser, which tries to parse 5 fields or 4 fields. The reason behind this is because blkid -o’s output looks like this:

```\$ sudo blkid -o list
device       fs_type label    mount point      UUID
-----------------------------------------------------------------------------------
/dev/sda1    ntfs             /mnt/windows-xp  XXXXXXXXXXXXXXXX
/dev/sda2    ext4             /                XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
/dev/sda3    ext4             /home            XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
/dev/sda6    swap             <swap>           XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
/dev/sda5    ext4             /mnt/data        XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
/dev/sdb1    ext2             (not mounted)    XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
/dev/sdc1    vfat             (not mounted)    XXXX-XXXX
```

Sometimes, the label field is empty, as in the example above. So, we will end up with just 4 fields instead of 5.

I hope this sample, working program will help newbies out (esp. you who are frustrated by Real World Haskell’s clunky Parsec examples…). And for those looking to actually use it themselves for easy USB mounting/unmounting, here are some additional notes:

• Create mount points (i.e., directories) /mnt/u0, /mnt/u1, /mnt/u2, etc. so that the mount command won’t choke.
• Use shell aliases or keyboard hotkeys to avoid typing out “usbmnt -U” every single time.
• Tweak the options passed to the mount command to suit your needs (see the fileSystemArgs function).

(1) A simple exercise would be to extend usbmnt so that it reads a very simple configuration file to detect the right user-defined options to pass to mount depending on the partition’s UUID. The only additional thing you would have to do is read a file from disk and store it as a String.

(2) An eye-opening exercises for Parsec newcomers would be to pass to the parsing functions a ByteString instead of String. (The transition to a ByteString is extremely straight-forward, and requires minimal changes.)

I will do the exercises myself later on and post them in a couple weeks at the latest.

UPDATE: December 1, 2011: Here are some convenience links for those coming in from Google for the solutions to the 2 exercises: Solution for #1 Solution for #2

Happy hacking!

EDIT: December 1, 2011: Fix typo.

# Xmonad.hs: The Joy of Refactoring

The more I use Haskell, the more I learn to appreciate its beauty. The thesis of this post is: Haskell code is so easy to refactor. What follows is an account of my experience the other day with extending my XMonad configuration file, and how easy/fun it was for me. It’s written in a very newbie-friendly way for all XMonad users who don’t know much Haskell, if at all.

The other day, I ended up writing something like the following into my xmonad.hs file:

```myManageHook :: ManageHook
myManageHook = composeAll
[
...
, resource  =? "atWorkspace0" --> doShift "0"
, resource  =? "atWorkspace1" --> doShift "1"
, resource  =? "atWorkspace2" --> doShift "2"
, resource  =? "atWorkspace3" --> doShift "3"
, resource  =? "atWorkspace4" --> doShift "4"
, resource  =? "atWorkspace5" --> doShift "5"
, resource  =? "atWorkspace6" --> doShift "6"
, resource  =? "atWorkspace7" --> doShift "7"
, resource  =? "atWorkspace8" --> doShift "8"
, resource  =? "atWorkspace9" --> doShift "9"
, resource  =? "atWorkspaceF1" --> doShift "F1"
, resource  =? "atWorkspaceF2" --> doShift "F2"
, resource  =? "atWorkspaceF3" --> doShift "F3"
, resource  =? "atWorkspaceF4" --> doShift "F4"
, resource  =? "atWorkspaceF5" --> doShift "F5"
, resource  =? "atWorkspaceF6" --> doShift "F6"
, resource  =? "atWorkspaceF7" --> doShift "F7"
, resource  =? "atWorkspaceF8" --> doShift "F8"
, resource  =? "atWorkspaceF9" --> doShift "F9"
, resource  =? "atWorkspaceF10" --> doShift "F10"
, resource  =? "atWorkspaceF11" --> doShift "F11"
, resource  =? "atWorkspaceF12" --> doShift "F12"
]
```

Even if you don’t know Haskell, you can tell that the above is very repetitive. It looks a bit stupid. Vim’s visual block mode makes editing the above lines in bulk pretty easy, but your xmonad.hs file is a program’s source code, and source code must obey the Do not Repeat Yourself rule. It will make life easier down the road.

Let’s first consider what the code above means. First, we observe that myManageHook is a function, of type ManageHook. The composeAll function’s type signature is as follows:

```composeAll :: [ManageHook] -> ManageHook
```

(I loaded up XMonad from GHCi to figure this out.) So composeAll merely “compresses” a list of ManageHook types into a single ManageHook. Great! Now we know exactly what each element in the list really means:

```myManageHook = composeAll
[ a ManageHook
, a ManageHook
, a ManageHook
, a ManageHook
]
```

So to covert the various “atWorkspace0”, “atWorkspace1”, etc. lines we just need to create a function that generates a list of ManageHook types, and append this list to the one that already exists! Like this:

```myManageHook :: ManageHook
myManageHook = composeAll \$
[ ... existing MangeHook items
]
++ workspaceShifts
where
workspaceShifts = another list of ManageHooks
```

Notice how we had to add in a “\$” symbol, because:

```myManageHook = composeAll [list 1] ++ [list 2]
```

means

```myManageHook = (composeAll [list 1]) ++ [list 2]
```

which means

```myManageHook = ManageHook ++ [list 2]
```

which is definitely not what we want. We want this:

```myManageHook = composeAll ([list 1] ++ [list 2])
```

which is

```myManageHook = composeAll [lists 1 and 2 combined]
```

and we can do this with the “\$” dollar symbol. We could just use the explicit parentheses instead; ultimately it is a matter of style/taste.

So, let’s keep going. The workspaceShifts function needs to generate a list of ManageHooks. From the first code listing, it’s clear that all the generated ManageHooks need only vary slightly from each other:

```    [
...
, resource  =? "atWorkspace0" --> doShift "0"
, resource  =? "atWorkspace1" --> doShift "1"
, resource  =? "atWorkspace2" --> doShift "2"
, resource  =? "atWorkspace3" --> doShift "3"
, resource  =? "atWorkspace4" --> doShift "4"
, resource  =? "atWorkspace5" --> doShift "5"
, resource  =? "atWorkspace6" --> doShift "6"
, resource  =? "atWorkspace7" --> doShift "7"
, resource  =? "atWorkspace8" --> doShift "8"
, resource  =? "atWorkspace9" --> doShift "9"
, resource  =? "atWorkspaceF1" --> doShift "F1"
, resource  =? "atWorkspaceF2" --> doShift "F2"
, resource  =? "atWorkspaceF3" --> doShift "F3"
, resource  =? "atWorkspaceF4" --> doShift "F4"
, resource  =? "atWorkspaceF5" --> doShift "F5"
, resource  =? "atWorkspaceF6" --> doShift "F6"
, resource  =? "atWorkspaceF7" --> doShift "F7"
, resource  =? "atWorkspaceF8" --> doShift "F8"
, resource  =? "atWorkspaceF9" --> doShift "F9"
, resource  =? "atWorkspaceF10" --> doShift "F10"
, resource  =? "atWorkspaceF11" --> doShift "F11"
, resource  =? "atWorkspaceF12" --> doShift "F12"
]
```

So the only thing that really changes is the suffix after “atWorkspace”; it goes from “0” to “F12”. Let’s express this idea in code:

```myManageHook :: ManageHook
myManageHook = composeAll \$
[ ...
]
++ workspaceShifts
where
workspaceShifts = genList (s1 ++ s2)
genList :: [String] -> [ManageHook]
genList ss = map (\s -> resource =? ("atWorkspace" ++ s) --> doShift s) ss
```

We create a new genList function, which needs an argument (a [String], or “list of strings” to be exact). We creatively call them ss in the code above. The map function simply modifies each item in a list in the same way. So here, we modify every string (s) to become a MangeHook, by giving map‘s first argument as:

```(\s -> resource =? ("atWorkspace" ++ s) --> doShift s)
```

The backslash followed by s binds a single string from the ss list as the variable s. The right arrow (->) signals the start of the function definition. Do you see the resemblance?

```    ...
, resource  =? "atWorkspace0" --> doShift "0"
, resource  =? "atWorkspace1" --> doShift "1"
, resource  =? "atWorkspace2" --> doShift "2"
, resource  =? "atWorkspace3" --> doShift "3"
...

vs.

resource =? ("atWorkspace" ++ s) --> doShift s
```

Nice, clean, and succinct.

Now we just need to define those strings to feed into genList. Here’s s1:

```        s1 :: [String] -- a list of strings
s1 = map show [0..9]
```

Again, we use the map function. It’s such a handy little function! Haskell is very much built around such useful little named primitives (by convention, most things that look like “operators” such as multi-punctuation arrows, ampersands, and colons are part of some niche library, and not Haskell the core language). The show function merely converts its argument (here, a list of Ints), into strings, so the

```s1 = map show [0..9]
```

part really means: [“0″,”1″,”2″,”3″,”4″,”5″,”6″,”7″,”8″,”9”].

So that’s s1. The second list, s2, is almost identical:

```s2 = map (("F" ++) . show) [1..12]
```

The only difference is that it adds “F” as a prefix, so that we get “F1” instead of “1”, “F2” instead of “2”, and so on.

So, that’s it! The complete code is as follows:

```myManageHook :: ManageHook
myManageHook = composeAll \$
[ ...
]
++ workspaceShifts
where
workspaceShifts = genList (s1 ++ s2)
genList :: [String] -> [ManageHook]
genList ss = map (\s -> resource =? ("atWorkspace" ++ s) --> doShift s) ss
s1, s2 :: [String]
s1 = map show [0..9]
s2 = map (("F" ++) . show) [1..12]
```

We can actually shorten it even more:

```myManageHook :: ManageHook
myManageHook = composeAll \$
[ ...
]
++ map (\s -> resource =? ("atWorkspace" ++ s) --> doShift s) (s1 ++ s2)
where
s1 = map show [0..9]
s2 = map (("F" ++) . show) [1..12]
```

In the end, we’ve reduced 22 lines of stupid, repetitive code prone to human error and typos down to 4 lines of smart, modular code. I really enjoy this kind of refactoring: reduction of human-error-prone code.

What’s more, the whole experience was quite easy and enjoyable. I did not need to know what exactly a ManageHook type represented; instead, all I needed to know were the type signatures of the various existing parts. And that’s why Haskell is so fun to refactor: type signatures, combined with industrial-strength type checking, makes things safe. Plus, if you look at the code, everything matters. It’s really hard to write spaghetti code in Haskell (even for relative newbies like myself).

For the curious, here is a list of type signatures for composeAll, resource, doShift, (–<), and (=?):

```Prelude XMonad> :t composeAll
composeAll :: [ManageHook] -> ManageHook
Prelude XMonad> :t resource
resource :: Query String
Prelude XMonad> :t doShift
doShift :: WorkspaceId -> ManageHook
Prelude XMonad> :t (-->)
(-->) :: Query Bool -> ManageHook -> ManageHook
Prelude XMonad> :t (=?)
(=?) :: Eq a => Query a -> a -> Query Bool
```

UPDATE July 3, 2011: Hendrik mentions a more concise approach using list comprehensions (arguably more Haskell-y, and simper):

```myManageHook :: ManageHook
myManageHook = composeAll \$
[ ...
]
++ [resource =? ("atWorkspace" ++ s) --> doShift s
| s <- map show [0..9] ++ map (('F':) . show) [1..12] ]
```

UPDATE November 8, 2011: Fixed typo.

# Haskell: Using CmdArgs (Single and Multi-Mode)

I use the CmdArgs module all the time for all of my personal Haskell projects. CmdArgs is really the one-stop solution for all of your command-line option handling needs. Its “killer” feature is the ability to support multi-mode options (e.g., “myprog mode1 [mode1 options]”, “myprog mode2 [mode2 options]”, etc.). Unfortunately, not many people know about CmdArgs, it seems. So, I’m writing this post to help spread its popularity among fellow newbie/intermediate Haskellers. I’m including two sample programs — a traditional single-mode program, and a multi-mode program, for your tinkering pleasure. They are hereby released into the PUBLIC DOMAIN.

The following is a simple single-mode program, singleMode.hs.

```-- singleMode.hs
-- License: PUBLIC DOMAIN
{-# LANGUAGE DeriveDataTypeable, RecordWildCards #-}

import System.Console.CmdArgs
import System.Environment (getArgs, withArgs)
import System.Exit
import Control.Monad (when)

data MyOptions = MyOptions
{ color :: Bool
, first_name :: String
, age :: Int
, directory :: FilePath
} deriving (Data, Typeable, Show, Eq)

-- Customize your options, including help messages, shortened names, etc.
myProgOpts :: MyOptions
myProgOpts = MyOptions
{ color = def &= help "use color"
, first_name = def &= help "your first name"
, age = def &= explicit &= name "g" &= name "age" &= help "your age"
, directory = def &= typDir &= help "your first name"
}

getOpts :: IO MyOptions
getOpts = cmdArgs \$ myProgOpts
&= verbosityArgs [explicit, name "Verbose", name "V"] []
&= versionArg [explicit, name "version", name "v", summary _PROGRAM_INFO]
&= summary (_PROGRAM_INFO ++ ", " ++ _COPYRIGHT)
&= help _PROGRAM_ABOUT
&= helpArg [explicit, name "help", name "h"]
&= program _PROGRAM_NAME

_PROGRAM_NAME = "myProg"
_PROGRAM_VERSION = "0.1.2.3"
_PROGRAM_INFO = _PROGRAM_NAME ++ " version " ++ _PROGRAM_VERSION
_PROGRAM_ABOUT = "a sample CmdArgs program for you tinkering pleasure"
_COPYRIGHT = "(C) Your Name Here 2011"

main :: IO ()
main = do
args <- getArgs
-- If the user did not specify any arguments, pretend as "--help" was given
opts <- (if null args then withArgs ["--help"] else id) getOpts
optionHandler opts

-- Before directly calling your main program, you should warn your user about incorrect arguments, if any.
optionHandler :: MyOptions -> IO ()
optionHandler opts@MyOptions{..}  = do
-- Take the opportunity here to weed out ugly, malformed, or invalid arguments.
when (null first_name) \$ putStrLn "--first-name is blank!" >> exitWith (ExitFailure 1)
when (age < 5) \$ putStrLn "you must be at least 5 years old to run this program" >> exitWith (ExitFailure 1)
-- When you're done, pass the (corrected, or not) options to your actual program.
exec opts

exec :: MyOptions -> IO ()
exec opts@MyOptions{..} = do
putStrLn \$ "Hello, " ++ firstname ++ "!"
putStrLn \$ "You are " ++ showAge ++ " years old."
where
firstname = if color
then "\x1b[1;31m" ++ first_name ++ "\x1b[0m"
else first_name
showAge = if color
then "\x1b[1;32m" ++ show age ++ "\x1b[0m"
else show age
```

Screenshot:

Don’t you just love how CmdArgs handles all the formatting and line breaking for you automagically? Anyway, onto the discussion.

You first specify your program’s options by declaring a data type (in this case, MyOptions). Pretty self-explanatory. The myProgOpts function is where you specify your program’s options. You can pass in “annotations” (as CmdArgs calls them) to your options. Here, I used the explicit annotation to prevent CmdArgs from auto-generating the shorter name for the age option. The typ &= “NAME” makes it so that we get “NAME” in the help message for the –first-name option. typDir is just a shortcut for typ &= “DIR” because it’s so common.

The def function is just a sane default type for many data types, such as Bool (False), String (“”), Int (0), and others. Generally, you should use def unless you really want a different default value for an option.

Notice how the option full_name turned into –full-name? The use of underscores in your options is converted into dashes by CmdArgs. Neat!

The getOpts function has some customizations: we use -v for –version instead of -V (the default), and also use -h for the short name for –help (-? is the default). We also specify the program’s metadata (copyright info, etc.).

In the main function, if no arguments are given, we display the default help message (and exit). It took me a while to figure this one out, so take heed!

If you don’t want the –verbose or –quiet options, then just remove the line “&= verbosityArgs …”.

The RecordWildCards pragma makes the code extremely concise and easy to use. I must admit, ever since stumbling upon this pragma, I’ve used it everywhere in my own code in other projects as well.

OK, so that’s what a simple single-mode program looks like. How about multi-mode programs? Here is one, multiMode.hs:

```-- multiMode.hs
-- License: PUBLIC DOMAIN
{-# LANGUAGE DeriveDataTypeable, RecordWildCards #-}

import System.Console.CmdArgs
import System.Environment (getArgs, withArgs)
import System.Exit
import Control.Monad (when)

data MyOptions =
Mode1   { first_name :: String
, last_name :: String
}
|
Mode2   { height :: Double
, weight :: Double
} deriving (Data, Typeable, Show, Eq)

mode1 :: MyOptions
mode1 = Mode1
{ first_name = "FIRSTNAME" &= help "your first name"
, last_name = "LASTNAME" &= help "your last name"
}
&= details  [ "Examples:"
, "Blah blah blah."
]

mode2 :: MyOptions
mode2 = Mode2
{ height = def &= help "your height, in centimeters"
, weight = def &= help "your weight, in kilograms"
}
&= details  [ "Examples:"
, "Blah blah blah again."
]

myModes :: Mode (CmdArgs MyOptions)
myModes = cmdArgsMode \$ modes [mode1, mode2]
&= verbosityArgs [explicit, name "Verbose", name "V"] []
&= versionArg [explicit, name "version", name "v", summary _PROGRAM_INFO]
&= summary (_PROGRAM_INFO ++ ", " ++ _COPYRIGHT)
&= help _PROGRAM_ABOUT
&= helpArg [explicit, name "help", name "h"]
&= program _PROGRAM_NAME

_PROGRAM_NAME = "myProg"
_PROGRAM_VERSION = "0.1.2.3"
_PROGRAM_INFO = _PROGRAM_NAME ++ " version " ++ _PROGRAM_VERSION
_PROGRAM_ABOUT = "a sample CmdArgs program for you tinkering pleasure"
_COPYRIGHT = "(C) Your Name Here 2011"

main :: IO ()
main = do
args <- getArgs
-- If the user did not specify any arguments, pretend as "--help" was given
opts <- (if null args then withArgs ["--help"] else id) \$ cmdArgsRun myModes
optionHandler opts

optionHandler :: MyOptions -> IO ()
optionHandler opts@Mode1{..}  = do
when (null first_name) \$ putStrLn "warning: --first-name is blank"
when (null last_name) \$ putStrLn "warning: --last-name is blank"
exec opts
optionHandler opts@Mode2{..}  = do
when (height == 0.0) \$ putStrLn "warning: --height is 0.0"
when (weight == 0.0) \$ putStrLn "warning: --weight is 0.0"
exec opts

exec :: MyOptions -> IO ()
exec opts@Mode1{..} = putStrLn \$ "Hello, " ++ first_name ++ " " ++ last_name ++ "!"
exec opts@Mode2{..} = putStrLn \$ "You are " ++ show height ++ "cm tall, and weigh " ++ show weight ++ "kg!"
```

Screenshot:

The program uses 2 modes, mode1 and mode2. If you use modes without overlapping letters, then you can invoke them with the least amount of unambiguous letters. E.g., if the modes were named foo and bar, you could just do “multiMode f …” and foo mode would be in effect. CmdArgs does this for us.

If your modes have lots and lots of options, then calling “multiMode -h” would give a brief summary, instead of giving each mode’s detailed help message. Here, the modes have only a couple options each, so that’s why the default help message is very detailed.

You might have noticed that I included a details annotation for both modes. This annotation is useful if you want to give examples on which options to use in what way.

Happy coding!

UPDATE April 26, 2011: Victor from the comments asked a question about including a custom type as an option parameter. I never needed this option, but, it’s do-able (with the new CmdArgs 0.6.9). Anyway, here is my minimal example:

```instance Default MyComplexType where
def = MyComplexType { brand = "BRAND"
, eggs = 0
, flowers = 0
, nest = Nestor { books = 0, tag = "TAG", author = "AUTHOR" }
}

data MyComplexType = MyComplexType
{ brand :: String
, eggs :: Int
, flowers :: Integer
, nest :: Nestor
} deriving (Data, Typeable, Show, Eq)

data Nestor = Nestor
{ books :: Int
, tag :: String
, author :: String
} deriving (Data, Typeable, Show, Eq)

data MyOptions = MyOptions
{ color :: Bool
, first_name :: String
, age :: Int
, directory :: FilePath
, custom :: MyComplexType
} deriving (Data, Typeable, Show, Eq)
```

Can you believe it? CmdArgs can even handle nested, custom complex types! Here’s how CmdArgs renders the above, in the help message:

CmdArgs will automatically expect a string of comma-separate values (CSVs). If you don’t like the auto-generated “ITEM,INT,INT,INT,ITEM,…” description, you could easily replace it with a custom string, with a &= typ “X,Y,Z,…” annotation.

If you don’t like the CSV format that CmdArgs imposes on your custom complex type, you’re out of luck. Your best option is to just use a single String type as the option argument, and then parse that with Parsec on your own (probably in the optionHandler function, in case of any errors) and then pass on the parsed data into your main program.

UPDATE July 21, 2011: I tried out CmdArgs with a custom algebraic datatype (something like “data Color = Red | Green | Blue | ColorNone”) and it still works! Your argument for this sort of data type will be, e.g., “–color-rgb red” and the “red” argument will be interpreted as the “Red” type. Actually, CmdArgs is very lenient and will accept any non-ambiguous argument, so here, “–color-rgb r” will also be interpreted as “Red”. Here is an example of the relevant portions:

```instance Default Color where
def = ColorNone

data Color = Red
| Green
| Blue
| ColorNone
deriving (Data, Typeable, Show, Eq)

myProgOpts = MyOptions
{ ...
, color_rgb = def &= help "select an RGB color (Red, Green, or Blue)"
...
}
```

UPDATE August 5, 2011: You can even specify an option as a list:

```data Format     = Text
| TeX
| HTML
| FNone
deriving (Data, Typeable, Show, Eq)

data MyOptions = MyOptions
{ ...
, format :: [Format]
, ...
} deriving (Data, Typeable, Show, Eq)

myProgOpts = MyOptions
{ ...
, format = [] &= help "format of output"
, ...
}
```

Here, you can do something like “-f TeX -f HTML -f Text” to set the format option to the value [TeX, HTML, Text]. This possibility to use a list as an option parameter is quite useful (such as the example here, where you want to specify multiple output targets).