Random Haskell Ephiphany: Using the liftM function

I’ve been tinkering with some monadic code and it just dawned on me that the often-used liftM function is just a shortcut (less typing) for a common use scenario which arises whenever you need to use a pure function along with an impure one.

For example, consider the following pseudocode:

randoms :: IO [Int]

Calling randoms generates a list of random Int values. Suppose we want just the first value from this list, so we naively write the following:

main :: IO ()
main = do
    rs <- randoms
    let r = head rs
    ...

But then you realize that you will only use just a single random Int, and that the variable rs is useless. So, you remember what return in Haskell means and then write this instead:

main :: IO ()
main = do
    r <- return . head =<< randoms
    ...

But this looks a bit awkward. This is where liftM comes in:

main :: IO ()
main = do
    r <- liftM head $ randoms
    ...

The code is now much simpler and cleaner. What’s not to like?

UPDATE July 15, 2012: I just realized that there is even a shorter solution, using the (<$>) function from the very useful Control.Applicative module:

main :: IO ()
main = do
    r <- head <$> randoms
    ...
Advertisements

The KISS PRNG (2011 version) in C and Haskell

Did you know that Dr. George Marsaglia (1924-2011), creator of the famed Diehard battery of randomness tests, devised a super-simple PRNG algorithm, just a month prior to his passing? Called the KISS PRNG (or Super-KISS, as there have been multiple KISS versions released previously by Marsaglia), this algorithm boasts a period in excess of 10^40million (10^40,000,000) — an astoundingly large number, many orders of magnitude larger than the famed Mersenne Twister‘s period (only 2^19,937 − 1, or only about 4.3 x 10^6,001 according to gcalctool). Plus, it’s so simple codewise, and also very fast.

Here’s the C implementation (adapted from Marsaglia’s own code):

/* AUTHOR: Shinobu (zuttobenkyou.wordpress.com) */
/* LICENSE: PUBLIC DOMAIN */
#include <stdint.h>

typedef uint64_t u64;

#define QSIZE 0x200000
#define CNG (cng = 6906969069ULL * cng + 13579)
#define XS (xs ^= (xs << 13), xs ^= (xs >> 17), xs ^= (xs << 43))
#define KISS (B64MWC() + CNG + XS)

static u64 QARY[QSIZE];
static int j;
static u64 carry;
static u64 xs;
static u64 cng;

void randk_reset(void)
{
	j = QSIZE - 1;
	carry = 0;
	xs = 362436069362436069ULL;
	cng = 123456789987654321ULL; /* use this as the seed */
}

u64 B64MWC(void)
{
	u64 t, x;
	j = (j + 1) & (QSIZE - 1);
	x = QARY[j];
	t = (x << 28) + carry;
	carry = (x >> 36) - (t < x);
	return (QARY[j] = t - x);
}

/* Initialize PRNG with default seed */
void randk_seed(void)
{
	u64 i;
	/* Seed QARY[] with CNG+XS: */
	for (i = 0; i < QSIZE; i++)
		QARY[i] = CNG + XS;
}

void randk_seed_manual(u64 seed)
{
	cng ^= seed;
	xs ^= cng;
	randk_seed();
}

void randk_warmup(int rounds)
{
	int i;
	/* Run through several rounds to warm up the state */
	for (i = 0; i < rounds; i++)
		randk();
}

/* Generate a pseudorandom 64-bit unsigned integer. */
u64 randk(void)
{
	return KISS;
}

Simple, eh? This algorithm is actually 3 PRNG’s in one: the 64-bit Multiply-With-Carry PRNG (B64MWC()), the XOR-Shift PRNG (XS), and the simple Linear Congruential PRNG (CNG). The exorbitant period comes from the fact that this algorithm relies on three different states of the three PRNGs to generate a random number.

Now, where does Haskell come into the picture? Well, I ported the code to Haskell because I wanted a simple PRNG that was of higher quality than the default System.Random RNG. Plus, if you look into the actual source of System.Random, here is an unnerving bit of code:

stdSplit            :: StdGen -> (StdGen, StdGen)
stdSplit std@(StdGen s1 s2)
                     = (left, right)
                       where
                        -- no statistical foundation for this!
                        left    = StdGen new_s1 t2
                        right   = StdGen t1 new_s2

                        new_s1 | s1 == 2147483562 = 1
                               | otherwise        = s1 + 1

                        new_s2 | s2 == 1          = 2147483398
                               | otherwise        = s2 - 1

                        StdGen t1 t2 = snd (next std)

See, the RandomGen type class requires the definition of next, split, and genRange functions (see this page). The split function’s purpose is to take one PRNG state, and give two distinct PRNG states, so that you can get multiple unique PRNG’s to work with (this comes up in functional programming in real practice — I speak from experience). The thing is, the statistical robustness of the split function for the StdGen PRNG that comes with Haskell, as can be seen in the source listing, is a bit… annoying/worrying.

Well, when I saw this, I thought: “Hey, why not use KISS? It has 3 PRNGs built into one, so when implementing split, it could just change the state of one of the PRNGs, and you’d get a *completely* different PRNG!” And so that’s exactly what I did:

-- AUTHOR: Shinobu (zuttobenkyou.wordpress.com)
-- LICENSE: PUBLIC DOMAIN
{-# LANGUAGE RecordWildCards #-}

module KISS where

import Data.Array.Unboxed
import Data.Bits
import Data.List
import Data.Word
import System.Random (RandomGen(..))

type U64 = Word64

-- | This is the last KISS-type RNG (2011) that Dr. George Marsaglia (1924-2011)
-- released to the internet before his death. The only difference with this
-- version is that the kissMWCArraySize is 0xfff (4096), instead of 0x200000
-- (2,097,152), for purposes of speed. The period of the original was
-- approximated by Marsaglia as 10^40million, which is practically infinity for
-- most, if not all, needs for everyday programs. The reduced state size for the
-- MWC algorithm from 0x200000 to 0xfff should shorten the period, but it should
-- still be excellent for general usage; because KISS combines not only MWC but
-- also CNG (Congruential) and XSHF (XOR-shift) generators, the period should
-- still be very large.
--
-- TODO: Determine period of this KISS rng.
data KISSRNG = KISSRNG
	{ kissMWCArray :: UArray Int U64
	, kissMWCArraySize :: U64
	, kissMWCIndex :: U64
	, kissMWCCarry :: U64
	, kissCNG :: U64
	, kissXSHF :: U64
	}

kissStateSize :: U64
kissStateSize = 0xfff

roundCNG :: U64 -> U64
roundCNG cng = 6906969069 * cng + 13579

roundXSHF :: U64 -> U64
roundXSHF = round3 . round2 . round1
	where
	round1 b = xor b (unsafeShiftL b 13)
 	round2 b = xor b (unsafeShiftR b 17)
	round3 b = xor b (unsafeShiftL b 43)

roundB64MWC :: KISSRNG -> KISSRNG
roundB64MWC kiss@KISSRNG{..} = kiss
	{ kissMWCArray = array'
	, kissMWCIndex = index'
	, kissMWCCarry = carry'
	}
	where
	index' = (kissMWCIndex + 1) .&. (kissMWCArraySize - 1)
	x = kissMWCArray ! (fromIntegral index')
	t = unsafeShiftL x 28 + kissMWCCarry
	carry' = unsafeShiftR x 36 - (if t < x then 1 else 0)
	array' = kissMWCArray // [(fromIntegral index', t - x)]

makeKISSRNG :: U64 -> KISSRNG
makeKISSRNG seed = KISSRNG
	{ kissMWCArray = array (0, (fromIntegral kissStateSize - 1)) $ zip [0..] kissArray
	, kissMWCArraySize = kissStateSize
	, kissMWCIndex = kissStateSize - 1
	, kissMWCCarry = 0
	, kissCNG = cngWarmed
	, kissXSHF = xshfWarmed
	}
	where
	-- seed the MWC array with the Congruential and XOR-Shift RNG's
	(kissArray, cngWarmed, xshfWarmed) = foldl' step ([], seedCNG, seedXSHF) [0..(kissStateSize - 1)]
	step (ary, cng, xshf) _ = ((cng' + xshf'):ary, cng', xshf')
		where
		cng' = roundCNG cng
		xshf' = roundXSHF xshf
	-- default Congruential RNG seed
	seedCNG = 123456789987654321
	-- default XOR-Shift RNG seed
	seedXSHF = 362436069362436069

randKISS :: KISSRNG -> (U64, KISSRNG)
randKISS k = (kissMWC + cng + xshf, k' {kissCNG = cng, kissXSHF = xshf})
	where
	k' = roundB64MWC k
	kissMWC = (kissMWCArray k') ! (fromIntegral $ kissMWCIndex k')
	cng = roundCNG $ kissCNG k
	xshf = roundXSHF $ kissXSHF k

instance Show KISSRNG where
	show KISSRNG{..} = "kissMWC: [kissMWC..]"
		++ "\nkissMWCIdx: " ++ show kissMWCIndex ++ "\n"
		++ "kissMWCArray!!Idx: " ++ show (kissMWCArray ! (fromIntegral kissMWCIndex)) ++ "\n"
		++ "kissMWCArray!!Head: " ++ show (kissMWCArray ! 0) ++ "\n"
		++ "kissMWCArray!!Last: " ++ show (kissMWCArray ! (fromIntegral $ kissStateSize - 1)) ++ "\n"
		++ "kissMWCCarry: " ++ show kissMWCCarry ++ "\n"
		++ "kissCNG: " ++ show kissCNG ++ "\n"
		++ "kissXSHF: " ++ show kissXSHF ++ "\n"

instance RandomGen KISSRNG where
	next rng = (fromIntegral n, rng')
		where
		(n, rng') = randKISS rng
	split rng = (rng1, rng2)
		where
		rng1 = warmupXSHF rng
		rng2 = warmupMWC rng
	genRange _ = (0, 0xffffffffffffffff)

warmupRNG :: RandomGen g => g -> Int -> g
warmupRNG g rounds = foldl' warmup g [1..rounds]
	where
	warmup g' _ = snd $ next g'

warmupMWC :: KISSRNG -> KISSRNG
warmupMWC rng = roundB64MWC rng

warmupXSHF :: KISSRNG -> KISSRNG
warmupXSHF rng = rng { kissXSHF = roundXSHF $ kissXSHF rng}

In the implementation of split, you can clearly see that we simply warm up one of PRNGs (move on to the next state in the period) to get a new PRNG. Again, since KISS depends on all three PRNGs, simply changing the state of one of the PRNGs will give you a completely different PRNG.

Oh, the only weakness of the Haskell version is that its QSIZE is only 0xfff, not 0x200000 as in the original, for performance reasons. I certainly hope someone could improve the performance of the code and release it on Hackage (my code is hereby released into the PUBLIC DOMAIN, so do what you like with it), restoring the state size to 0x200000 as in Marsaglia’s original. Also, I’m not sure how large the period is, but judging by how the XOR-Shift PRNG has a large period on its own, it should still be very, very large with a 0xfff state size for the MWC algorithm.

I would sincerely appreciate it if someone more familiar with combinatorics/compsci could tell me what the size of the period is with a 0xfff state size for the MWC.

I was also pleasantly surprised by the very good quality of KISS. I used my code to write some random bits into a file, and used the ent program to judge the entroy of it. Here are the results:

Entropy = 7.999829 bits per byte.

Optimum compression would reduce the size
of this 1048576 byte file by 0 percent.

Chi square distribution for 1048576 samples is 248.29, and randomly
would exceed this value 60.65 percent of the times.

Arithmetic mean value of data bytes is 127.5231 (127.5 = random).
Monte Carlo value for Pi is 3.141895835 (error 0.01 percent).
Serial correlation coefficient is 0.001437 (totally uncorrelated = 0.0).

The results show that the KISS RNG has excellent quality random numbers. These figures make it as good (randomness-wise) as, e.g., the one based on AES encryption (AES in counter mode), which has also been analyzed with ent, as stated on the github page:

Using ent, a randomness property maker on one 1Mb sample.

cprng-AES:

Entropy = 7.999837 bits per byte.
Optimum compression would reduce the size of this 1048576 byte file by 0 percent.
Chi square distribution for 1048576 samples is 237.02.
Arithmetic mean value of data bytes is 127.3422 (127.5 = random).
Monte Carlo value for Pi is 3.143589568 (error 0.06 percent).

The rather ugly code I used to generate this file (and believe me, it took forever to generate a 1MiB file because the code is horribly unoptimized…) is below:

-- AUTHOR: Shinobu (zuttobenkyou.wordpress.com)
-- LICENSE: PUBLIC DOMAIN
{-# LANGUAGE RecordWildCards #-}

module Main where

import Data.Array.Unboxed
import Data.Bits
import Data.List
import Data.Word
import System.Random (RandomGen(..))
import qualified Data.ByteString as BS

type U64 = Word64

data KISSRNG = KISSRNG
	{ kissMWCArray :: UArray Int U64
	, kissMWCArraySize :: U64
	, kissMWCIndex :: U64
	, kissMWCCarry :: U64
	, kissCNG :: U64
	, kissXSHF :: U64
	}

kissStateSize :: U64
kissStateSize = 0xfff

roundCNG :: U64 -> U64
roundCNG cng = 6906969069 * cng + 13579

roundXSHF :: U64 -> U64
roundXSHF = round3 . round2 . round1
	where
	round1 b = xor b (unsafeShiftL b 13)
 	round2 b = xor b (unsafeShiftR b 17)
	round3 b = xor b (unsafeShiftL b 43)

roundB64MWC :: KISSRNG -> KISSRNG
roundB64MWC kiss@KISSRNG{..} = kiss
	{ kissMWCArray = array'
	, kissMWCIndex = index'
	, kissMWCCarry = carry'
	}
	where
	index' = (kissMWCIndex + 1) .&. (kissMWCArraySize - 1)
	x = kissMWCArray ! (fromIntegral index')
	t = unsafeShiftL x 28 + kissMWCCarry
	carry' = unsafeShiftR x 36 - (if t < x then 1 else 0)
	array' = kissMWCArray // [(fromIntegral index', t - x)]

makeKISSRNG :: U64 -> KISSRNG
makeKISSRNG seed = KISSRNG
	{ kissMWCArray = array (0, (fromIntegral kissStateSize - 1)) $ zip [0..] kissArray
	, kissMWCArraySize = kissStateSize
	, kissMWCIndex = kissStateSize - 1
	, kissMWCCarry = 0
	, kissCNG = xor cngWarmed seed
	, kissXSHF = xor xshfWarmed seed
	}
	where
	-- seed the MWC array with the Congruential and XOR-Shift RNG's
	(kissArray, cngWarmed, xshfWarmed) = foldl' step ([], seedCNG, seedXSHF) [0..(kissStateSize - 1)]
	step (ary, cng, xshf) _ = ((cng' + xshf'):ary, cng', xshf')
		where
		cng' = roundCNG cng
		xshf' = roundXSHF xshf
	-- default Congruential RNG seed
	seedCNG = 123456789987654321
	-- default XOR-Shift RNG seed
	seedXSHF = 362436069362436069

randKISS :: KISSRNG -> (U64, KISSRNG)
randKISS k = (kissMWC + cng + xshf, k' {kissCNG = cng, kissXSHF = xshf})
	where
	k' = roundB64MWC k
	kissMWC = (kissMWCArray k') ! (fromIntegral $ kissMWCIndex k')
	cng = roundCNG $ kissCNG k
	xshf = roundXSHF $ kissXSHF k

instance Show KISSRNG where
	show KISSRNG{..} = "kissMWC: [kissMWC..]"
		++ "\nkissMWCIdx: " ++ show kissMWCIndex ++ "\n"
		++ "kissMWCArray!!Idx: " ++ show (kissMWCArray ! (fromIntegral kissMWCIndex)) ++ "\n"
		++ "kissMWCArray!!Head: " ++ show (kissMWCArray ! 0) ++ "\n"
		++ "kissMWCArray!!Last: " ++ show (kissMWCArray ! (fromIntegral $ kissStateSize - 1)) ++ "\n"
		++ "kissMWCCarry: " ++ show kissMWCCarry ++ "\n"
		++ "kissCNG: " ++ show kissCNG ++ "\n"
		++ "kissXSHF: " ++ show kissXSHF ++ "\n"

instance RandomGen KISSRNG where
	next rng = (fromIntegral n, rng')
		where
		(n, rng') = randKISS rng
	split rng = (rng1, rng2)
		where
		rng1 = warmupXSHF rng
		rng2 = warmupMWC rng
	genRange _ = (0, 0xffffffffffffffff)

warmupRNG :: RandomGen g => g -> Int -> g
warmupRNG g rounds = foldl' warmup g [1..rounds]
	where
	warmup g' _ = snd $ next g'

warmupMWC :: KISSRNG -> KISSRNG
warmupMWC rng = roundB64MWC rng

warmupXSHF :: KISSRNG -> KISSRNG
warmupXSHF rng = rng { kissXSHF = roundXSHF $ kissXSHF rng}

main :: IO ()
main = do
	let
		rng = makeKISSRNG 0
		(bytes1MiB, _) = genBytesKISS 0x100000 rng
	BS.writeFile "data" bytes1MiB

genBytesKISS :: U64 -> KISSRNG -> (BS.ByteString, KISSRNG)
genBytesKISS len kissrng = foldl' step (BS.empty, kissrng) [1..(div len 8)] -- divide by 8, b/c e.g., to generate 8 bytes, we only need 1 U64
	where
	step (bs, rng) _ = (foldl' BS.snoc bs $ octets u64, rng')
		where
		(u64, rng') = randKISS rng

smallerChunks :: [U64] -> [Word8]
smallerChunks = concatMap octets

-- | Get a number and split it up into 8 8-bit parts (64 bits total).
octets :: (Bits a, Integral a) => a -> [Word8]
octets w = map (\n -> fromIntegral $ shiftR w n) . reverse $ take 8 [0,8..]

Here are the C and Haskell standalone versions that prove that the Haskell port behaves in the same way as the C version, given the right starting seed values and state size (both 0xfff for the MWC PRNG):

C standalone version (compile with gcc -o ckiss kiss.c):

/* AUTHOR: Shinobu (zuttobenkyou.wordpress.com) */
/* LICENSE: PUBLIC DOMAIN */
#include <stdio.h>
#include <inttypes.h>
#include <stdint.h>

typedef uint64_t u64;

#define QSIZE 0xfff
#define CNG (cng = 6906969069ULL * cng + 13579)
#define XS (xs ^= (xs << 13), xs ^= (xs >> 17), xs ^= (xs << 43))
#define KISS (B64MWC() + CNG + XS)

static u64 QARY[QSIZE];
static int j;
static u64 carry;
static u64 xs;
static u64 cng;

u64 B64MWC(void)
{
	u64 t, x;
	j = (j + 1) & (QSIZE - 1);
	x = QARY[j];
	t = (x << 28) + carry;
	carry = (x >> 36) - (t < x);
	return (QARY[j] = t - x);
}

/* Initialize PRNG with default seed */
void randk_seed(void)
{
	u64 i;
	j = QSIZE - 1;
	carry = 0;
	xs = 362436069362436069ULL;
	cng = 123456789987654321ULL;
	/* Seed QARY[] with CNG+XS: */
	for (i = 0; i < QSIZE; i++)
		QARY[i] = CNG + XS;
}

/* Generate a pseudorandom 64-bit unsigned integer. */
u64 randk(void)
{
	return KISS;
}

int main(void)
{
	randk_seed();
	printf("randk_seed called!\n");
	printf("KISS idx: %"PRIu64"\n", j);
	printf("qary[idx] is: %"PRIu64"\n", QARY[j]);
	printf("qary[0] is: %"PRIu64"\n", QARY[0]);
	printf("qary[QSIZE - 1] is: %"PRIu64"\n", QARY[QSIZE - 1]);
	printf("carry: %"PRIu64"\n", carry);
	printf("cng: %"PRIu64"\n", cng);
	printf("xs: %"PRIu64"\n", xs);
	u64 x = KISS;
	printf("\nKISS called! KISS num is: %"PRIu64"\n", x);
	printf("\nKISS idx: %"PRIu64"\n", j);
	printf("qary[idx] is: %"PRIu64"\n", QARY[j]);
	printf("qary[0] is: %"PRIu64"\n", QARY[0]);
	printf("qary[QSIZE - 1] is: %"PRIu64"\n", QARY[QSIZE - 1]);
	printf("carry: %"PRIu64"\n", carry);
	printf("cng: %"PRIu64"\n", cng);
	printf("xs: %"PRIu64"\n", xs);

	printf("x + 18334599312639636657 is: %"PRIu64"\n", x + 18334599312639636657ULL);
}

Haskell standalone version (run with runhaskell kiss.hs):

-- AUTHOR: Shinobu (zuttobenkyou.wordpress.com)
-- LICENSE: PUBLIC DOMAIN
{-# LANGUAGE RecordWildCards #-}

module KISS where

import Data.Array.Unboxed
import Data.Bits
import Data.List
import Data.Word
import System.Random (RandomGen(..))

type U64 = Word64

data KISSRNG = KISSRNG
	{ kissMWCArray :: UArray Int U64
	, kissMWCArraySize :: U64
	, kissMWCIndex :: U64
	, kissMWCCarry :: U64
	, kissCNG :: U64
	, kissXSHF :: U64
	}

kissStateSize :: U64
kissStateSize = 0xfff

roundCNG :: U64 -> U64
roundCNG cng = 6906969069 * cng + 13579

roundXSHF :: U64 -> U64
roundXSHF = round3 . round2 . round1
	where
	round1 b = xor b (unsafeShiftL b 13)
 	round2 b = xor b (unsafeShiftR b 17)
	round3 b = xor b (unsafeShiftL b 43)

roundB64MWC :: KISSRNG -> KISSRNG
roundB64MWC kiss@KISSRNG{..} = kiss
	{ kissMWCArray = array'
	, kissMWCIndex = index'
	, kissMWCCarry = carry'
	}
	where
	index' = (kissMWCIndex + 1) .&. (kissMWCArraySize - 1)
	x = kissMWCArray ! (fromIntegral index')
	t = unsafeShiftL x 28 + kissMWCCarry
	carry' = unsafeShiftR x 36 - (if t < x then 1 else 0)
	array' = kissMWCArray // [(fromIntegral index', t - x)]

makeKISSRNG :: U64 -> KISSRNG
makeKISSRNG seed = KISSRNG
	{ kissMWCArray = array (0, (fromIntegral kissStateSize - 1)) $ zip [0..] kissArray
	, kissMWCArraySize = kissStateSize
	, kissMWCIndex = kissStateSize - 1
	, kissMWCCarry = 0
	, kissCNG = cngWarmed
	, kissXSHF = xshfWarmed
	}
	where
	-- seed the MWC array with the Congruential and XOR-Shift RNG's
	(kissArray, cngWarmed, xshfWarmed) = foldl' step ([], seedCNG, seedXSHF) [0..(kissStateSize - 1)]
	step (ary, cng, xshf) _ = ((cng' + xshf'):ary, cng', xshf')
		where
		cng' = roundCNG cng
		xshf' = roundXSHF xshf
	-- default Congruential RNG seed
	seedCNG = 123456789987654321
	-- default XOR-Shift RNG seed
	seedXSHF = 362436069362436069

randKISS :: KISSRNG -> (U64, KISSRNG)
randKISS k = (kissMWC + cng + xshf, k' {kissCNG = cng, kissXSHF = xshf})
	where
	k' = roundB64MWC k
	kissMWC = (kissMWCArray k') ! (fromIntegral $ kissMWCIndex k')
	cng = roundCNG $ kissCNG k
	xshf = roundXSHF $ kissXSHF k

instance Show KISSRNG where
	show KISSRNG{..} = "kissMWC: [kissMWC..]"
		++ "\nkissMWCIdx: " ++ show kissMWCIndex ++ "\n"
		++ "kissMWCArray!!Idx: " ++ show (kissMWCArray ! (fromIntegral kissMWCIndex)) ++ "\n"
		++ "kissMWCArray!!Head: " ++ show (kissMWCArray ! 0) ++ "\n"
		++ "kissMWCArray!!Last: " ++ show (kissMWCArray ! (fromIntegral $ kissStateSize - 1)) ++ "\n"
		++ "kissMWCCarry: " ++ show kissMWCCarry ++ "\n"
		++ "kissCNG: " ++ show kissCNG ++ "\n"
		++ "kissXSHF: " ++ show kissXSHF ++ "\n"

instance RandomGen KISSRNG where
	next rng = (fromIntegral n, rng')
		where
		(n, rng') = randKISS rng
	split rng = (rng1, rng2)
		where
		rng1 = warmupXSHF rng
		rng2 = warmupMWC rng
	genRange _ = (0, 0xffffffffffffffff)

warmupRNG :: RandomGen g => g -> Int -> g
warmupRNG g rounds = foldl' warmup g [1..rounds]
	where
	warmup g' _ = snd $ next g'

warmupMWC :: KISSRNG -> KISSRNG
warmupMWC rng = roundB64MWC rng

warmupXSHF :: KISSRNG -> KISSRNG
warmupXSHF rng = rng { kissXSHF = roundXSHF $ kissXSHF rng}

main :: IO ()
main = do
	let rng = makeKISSRNG 0
	putStrLn $ show rng

EDIT May 3, 2012: Sorry about the somewhat redundant-looking code listings, but I’m too lazy/busy to clean them up.
EDIT May 4, 2012: Alternate period number for the Mersenne Twister, for easier comparison of period size with KISS.

Easy Solutions to Hard Problems?

I’ve been trudging along knee-deep in tree manipulation algorithms recently, all in Haskell. I admit, calling them “algorithms” makes me sound much smarter than I really am… what I actually worked on was converting one type of tree into another type of tree (functionA) and then back again (functionB), such that I would get back the original tree — all in a purely functional, stateless, recursive way. My brain hurt a lot. I hit roadblock after roadblock; still, I managed to go through each hurdle by sheer effort (I never studied computer science in school).

See, I have a habit of solving typical Haskell coding problems in a very iterative fashion. Here are the steps I usually follow:

  1. Research the problem domain (Wikipedia, StackOverflow, etc.)
  2. Take lots of loose notes on paper (if drawing is required), or on the computer (using emacs’ Org-mode).
  3. Write temporary prototype functions to get the job done.
  4. Test said function with GHCi.
  5. Repeat Steps 1 through 4 until I am satisfied with the implementation.

The steps usually work after one or two iterations. But for hard problems, I would end up going through many more (failed) iterations. Over and over again, hundreds of lines of Haskell would get written, tested, then ultimately abandoned because their flawed design would dawn on me halfway through Step 3. Then I would get burned out, and spend the rest of the day away from the computer screen, doing something completely different. But on the following day, I would cook up a solution from scratch in an hour.

It’s such a strange feeling. You try really hard for hours, days, weeks even, failing again and again. Then, after a brief break, you just figure it out, with less mental effort than all the effort you put in previously.

What can explain this phenomenon? The biggest factor is obviously the learning process itself — that is, it takes lots of failures to familiarize yourself intimately with the problem at hand. But the way in which the solution comes to me only after a deliberate pause, a complete separation from the problem domain, fascinates me. I call it the “Pause Effect” (PE), because I’m too lazy to dig up the right psychological term for this that probably exists already.

So, here’s my new guideline for solving really hard problems:

  1. Try to solve problem in a “brute-force” manner. Don’t stop until you burn out. I call this the “Feynman Step”, after a quote from the televised “Take the World from Another Point of View” (1973) interview, where, toward the end, he describes a good conversation partner — someone who has gone as far as he could go in his field of study. Every time I’m in this step, I think of how animated Feynman gets in his interview, and it fires me up again.
  2. Rest one or two days, then come back to it. This is the “PE Step”.

The best part is when you ultimately find the solution — it feels incredible. You get this almost intoxicating feeling of empowerment in yourself and in your own abilities.

Shinobu’s Haskell Coding Style (SHCS)

I’ve been coding in Haskell for a couple years. Now, Haskell does not really have an established coding style, like C (e.g., the excellent Linux Kernel Coding Style (LKCS)). This is partly due to the extremely flexible nature of the indentation syntax alone — Haskell enjoys the unique honor of having significant whitespace, but at the same time allowing the use of optional curly braces and semicolons for “imperative-styled” do-notation (do-notation is quite common in Haskell). I present to you my own coding style, aptly named “Shinobu’s Haskell Coding Style” — it stresses, above all, consistency and predictable indentation. Since I love the LKCS so much, I decided to use tabs — how you set the actual length of the tab is up to you and your editor-of-choice, but I personally use 4-space-wide tabs.

A picture is worth a thousand words, so here is a “picture” of SHCS in action, with ample pseudocode for your reading pleasure:

{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE OverloadedStrings #-}
module Foo.Bar where

import Some.Standard.System.Library
import Some.Standard.System.Library2 (func)

import Foo
import Quux

-- | A haddock-compatible comment.
funcWithLongTypeSignature
	:: Eq a
	=> Int
	-> Int
	-> Int
	-> Int
	-> IO Int
funcWithLongTypeSignature a b c d = do
	let a' = foo a
	e <- blah a' b
	let
		c' = foo c
		d' = foo d
	f <- bleh c' d'
	-- A comment about (someTest f). Blah blah blah blah blah blah blah blah
	-- blah blah blah.
	when (someTest f)
		. func1
		. func2
		$ func3 arg
	when (someTest2 f)
		. func1
		. func2
		. func3
		. func4
		$ arg
	return $ e + f
	where
	e a b = do
		...
		return x
	f c d = case convert c of
		Z n -> return n
		_ -> return y
		where
		-- Break up 80+ character lines into parts.
		convert c = hoolaaloolaa'
			. hoolaaloolaa''
			. hoolaaloolaa'''
			$ hoolaaloolaa'''' c
		-- This version is also acceptable, if it increases legibility:
		--
		-- convert c = hoolaaloolaa'
		--	. hoolaaloolaa''
		--	. hoolaaloolaa'''
		--	. hoolaaloolaa''''
		--  $ c

someLongList :: [(Int, String)]
someLongList =
	[ (1, "a")
	, (2, "b")
	, (3, "c")
	, (4, "d")
	, (5, "e")
	, (6, "f")
	]

listComprehension :: [Int]
listComprehension =
	[ (a, b)
	| a <- [1..]
	, b <- [2..]
	, a + b < 100
	]

baz :: Int -> Int -> Char
baz a b c
	| a == 0 = 'o'
	| b == 100 = if True
		then if True
			then 'p'
			else 's'
		else 'q'
	| c == 22 = 'z'
	| otherwise = f 'k'
	where
	f c = blah $ bleh c


longStringConcatenation :: String
longStringConcatenation = "The quick "
	++ "fox "
	++ "jumps "
	++ "over "
	++ "the "
	++ "lazy "
	++ "evaluation."

data Geometry = Geometry
	{ geoGeo :: Int
	, geoGeoGeo :: Double
	} deriving (Eq)

instance Show Geometry where
	show Geometry{..} = show geoGeo
		++ show geoGeoGeo
		++ "!"

recordConstruction :: Geometry
recordConstruction = Geometry
	{ geoGeo = 1
	, geoGeoGeo = 2.0
	}

data SomeEnum
	= SEA
	| SEB
	| SEC
	| SED
	| SEE
	deriving (Eq, Enum, Ord, Show)

quaff :: SomeEnum -> Int
quaff s = case s of
	SEA -> 1
	SEB -> 2
	SEC -> 3
	SED -> 4
	SEE -> 5

WordPress renders the indentation tabs as spaces — sorry about that. Anyway, my greatest “triumph” was when I discovered that you could write the “where” clause at the same level of indentation as all the functions below it (e.g., functionWithLongTypeSignature). This reduced a lot of my code’s indentation, because I prefer to use where clauses wherever possible.

Here are the rules:

  • Place LANGUAGE extensions at the beginning of the file, with one extension per line.
  • Import standard, “system” modules first on its own block. Add all of your own modules on their own block below it. List all modules in alphabetical order.
  • 80 character limit. Try very hard to keep lines within 80 characters. This rule is especially pertinent to type signatures. Speaking of which…
  • Write explicit type signatures for all top-level functions. You may also want to write type signatures for functions underneath “where” clauses (but I think you have to use the ScopedTypeVariables extension to do this).
  • Use the “leading comma” rule when writing long lists or data structure definitions (e.g., the “Geometry” example above).
  • Indent long string concatenations (++).
  • Use GHC with at least the -Wall and -Werror flags.
  • When writing a case expression for a type that is like a C enum (e.g., SomeEnum above), explicitly write all the possible enumeration symbols for that type, instead of writing (n – 1) of them and using the “_” catch-all syntax at the end.
  • Use spaces between arithmetic components (e.g., “(n + 1)” instead of “(n+1)”.
  • Avoid use of curly braces/semicolons for do-notation blocks.
  • Avoid use of let statements, and instead use where clauses. The use of a where clause forces you to write in a top-down manner, with the “overall picture” first, followed by smaller component functions. I’d rather be faced with a one-liner function followed by 100 helper functions under a where clause, rather than first seeing the 100 helper functions in a gigantic let expression, followed by the one-liner.
  • Try to avoid nesting where clauses as much as possible.
  • When confronted with a long line, break it up into smaller parts (by function, if you’re composing many of them, or by argument, if it’s just one function with many arguments) and indent the “tail” of these parts, each on their own line (see “convert c” in the example).
  • Prefer to use the dollar symbol operator ($) instead of using parentheses. E.g., (foo $ bar x) instead of (foo (bar x)).
  • Instead of (foo (bar (baz (quux x)))), write (foo . bar . baz $ quux x). I believe this is an “unstated” style rule among Haskellers, as it is. And so I formally state it here.
  • Catch-all rule: if the code is short enough, it is optional whether you write it in one line or multiple (indented) lines (e.g., list comprehensions, lists, etc.)

These are just a handful of rules, but they go a long way to make things look consistent. And, because the indentation is predictable, it makes the code easier to read.

Intro to QuickCheck and Hpc

And you thought my Haskell honeymoon was over.

So the other day, I finally got around to using the legendary QuickCheck testing library for a semi-serious project of mine. And I must say that I am very impressed with QuickCheck — it caught two subtle (but serious) errors in my core functions — in day one! I’m going to explain some basic concepts to help you get started with QuickCheck immediately; afterwards I’ll throw in a brief discussion about the Haskell Program Coverage (“Hpc”) tool, too, since it goes well with QuickCheck.

QuickCheck

The point of QuickCheck is to generate tons of random, junk values (the nastier-looking, the better) to feed into your functions. Well, they’re not really junk values, from a type system perspective, but the point is to generate lots of edge-case values that we forget to consider when writing the code. QuickCheck supports testing of pure functions as well as monadic ones (ST monad), but I’ll just focus on the pure functions because that’s what I have experience with.

So the next question is, how does QuickCheck generate random values? Well, if your function uses any of the basic types like Int, [Integer], Double, Word64, etc., QuickCheck knows how to generate the random values already. But if you use a custom type, or a unique combination of ordinary types (e.g., (Maybe Int, Maybe Int)), you have to instruct QuickCheck how to generate it. You do this by writing an instance for QuickCheck’s Arbitrary typeclass.

So let’s say my custom type is (Maybe Int, Maybe Int), and that it has a special requirement: if it is (Just a, Just b), then b must be greater than a. No problem.

newtype MPair = MPair (Maybe Int, Maybe Int)
    deriving (Show)

instance Arbitrary MPair where
    arbitrary = do
        -- First, generate a random Int.
        a <- arbitrary :: Gen Int
        -- Now generate another random Int, but such that it is greater than *a*.
        b <- suchThat (arbitrary :: Gen Int) (>a)
        -- Now choose between either (Just a), or Nothing.
        a' <- elements [Just a, Nothing]
        -- Repeat the same coin flip for *b*.
        b' <- elements [Just b, Nothing]
        -- Return the random result.
        return $ MPair (a', b')

There is a method to the madness! There are other useful combinators besides suchThat and elements, of course.

So now you’ve told QuickCheck how to generate the random values to stress-test your functions. The next step is to define the stress-tests. In the QuickCheck world, such testing functions are used to test very specific properties of your functions that must hold true no matter what you throw at it. By convention, these functions are named “prop_[function name]”.

So here’s an example, prop_isTouching, which, appropriately enough, tests the isTouching function..

prop_isTouching :: MPair -> MPair -> Bool
prop_isTouching x y =
    isTouching x' y' == isTouching y' x'
    where
    x' = fromMPair x
    y' = fromMPair y

isTouching :: (Ord a) => (Maybe a, Maybe a) -> (Maybe a, Maybe a) -> Bool
isTouching p@(pStart, pEnd) q@(qStart, qEnd)
    -- if no pair has both of its values defined, then just return False;
    -- technically this notion is incorrect, but the greater codebase behind this
    -- interprets it this way, so it's OK
    | not (isBoth p) || not (isBoth q) = False
    -- if one or both pairs are empty, then return False
    | isBlank p || isBlank q = False
    | isLeft q = inP qStart'
    | isRight q = inP qEnd'
    | isLeft p = inQ pStart'
    | isRight p = inQ pEnd'
    | otherwise = inP qStart'
        || inP qEnd'
        || inQ pStart'
        || inQ pEnd'
    where
    pStart' = fromJust pStart
    pEnd' = fromJust pEnd
    qStart' = fromJust qStart
    qEnd' = fromJust qEnd
    inP = isInside (pStart', pEnd')
    inQ = isInside (qStart', qEnd')

isLeft
    , isRight
    , isBoth
    , isBlank :: (Maybe a, Maybe a) -> Bool
isLeft (a, b) = isJust a && isNothing b
isRight (a, b) = isNothing a && isJust b
isBoth (a, b) = isJust a && isJust b
isBlank (a, b) = isNothing a && isNothing b

fromMPair :: MPair -> (Maybe Int, Maybe Int)
fromMPair (MPair a) = a

isInside :: (Ord a) => (a, a) -> a -> Bool
isInside (a, b) c = a <= c && c <= b

You might be wondering, “Hey, I thought you were going to test a function that takes (Maybe Int, Maybe Int), not (Maybe a, Maybe a)!” Well, believe me, it pays off a lot to write higher-order functions like this that work on multiple types. The fact that we can test it using Maybe Ints (a very simple type) is just one benefit. Notice how I’ve made sure to restrict isTouching‘s arguments to the Ord typeclass, since we expect the right hand value in the pair to be greater than the one on the left (if it exists). The fancy Arbitrary instance up above was not in vain.

Anyway, the isTouching function merely checks to see if one given pair “falls in” or “touches” the other pair of values. It’s a pretty mundane function, but such functions often form the backbone of the rest of your code, so it’s really important to get these 100% right. The various helper functions like isLeft, isRight, fromMPair, etc. may seem annoying, as if they get in the way of the example to test isTouching itself. But think about it: all of these auxiliary functions will, by virtue of their necessity, be tested by calling prop_isTouching! And if prop_isTouching keeps failing (and your custom type or function is too complex for a 10-minute debugging session), you can always add more prop_ functions to test these auxiliary functions in isolation. Haskell embraces small, cute functions, and so should you!

The prop_isTouching function itself is straightforward enough: it tests the “commutative” property of isTouching — that the order of the arguments does not matter.

So far so good. Now we just need to run prop_isTouching hundreds (or thousands) of times to see if it holds. QuickCheck defines some basic test running functions, for use in the IO monad. The simplest one is aptly named quickCheck. So, we can run the function above like so:

import Test.QuickCheck

import [Your testing module, where prop_isTouching resides]

main :: IO ()
main = quickCheck prop_isTouching

This will run prop_isTouching 100 times with the default testing settings. But default settings are usually not desirable (for me, I only caught errors when I “upped” the number of runs to 1000, among other things). Oh, and quickCheck will not tell you the name of the function it is testing. So here is a more useful version:

{-# LANGUAGE RecordWildCards #-}
import System.Exit
import Test.QuickCheck

import [Your testing module, where prop_isTouching resides]

-- Rigorous test arguments.
rigorous :: Args
rigorous = Args
    { replay = Nothing
    , maxSuccess = 1000 -- tests to run
    , maxDiscard = 1000 -- the number of tests that are thrown out and ignored b/c of "==>" conditions, before "giving up" and failing due to too many discarded tests
    , maxSize = 1000 -- if a prop_ function uses a list ([]) type, maxSize is the max length of the randomly generated list
    , chatty = True
    }

-- Quick test arguments.
quick :: Args
quick = Args
    { replay = Nothing
    , maxSuccess = 100
    , maxDiscard = 100
    , maxSize = 100
    , chatty = True
    }

runTests :: [String] -> IO ()
runTests as = case as of
    [] -> runTests' quick
    a -> case head a of
        "1" -> runTests' quick
        "2" -> runTests' rigorous
        _ -> runTests' quick
    where
    runTests' :: Args -> IO ()
    runTests' testArgs = do
        -- if all of your prop_ functions are of the same type, you can put
        -- them in a list and use mapM_ instead
        f prop_isTouching "prop_isTouching"
        f prop_someOtherFunc1 "someOtherFunc1"
        f prop_someOtherFunc2 "someOtherFunc2"
        f prop_someOtherFunc3 "someOtherFunc3"
        f prop_someOtherFunc4 "someOtherFunc4"
        where
        f prop str = do
            putStrLn str
            quitOnFail =<< quickCheckWithResult testArgs prop
        quitOnFail r = case r of
            -- pattern match with just two dots with RecordWildCards because I'm lazy
            Success{..} -> return ()
            _ -> exitFailure

main :: IO ()
main = getArgs >>= runTests

If you compile the above as “test”, then running “./test 2” will use the “rigorous” test settings. The key difference is that instead of quickCheck, we use quickCheckWithResult. With it, we can provide our choice of test settings (the Args type), and also get some feedback on what the test results were. For simplicity’s sake, we only check if the test was a complete success; if it’s anything else, we abort immediately.

Here’s a quick note about the term “shrink” that you might encounter: if QuickCheck spots a failure, it will first try to shrink the size of the random input repeatedly while maintaining the “failure” result (and QuickCheck will tell you about how many shrinks were performed). This is to help you work with reasonably small values (esp. useful if you have list arguments with hundreds of items each, like in prop_foo above).)

The careful reader would have wondered what the “==>” was about in the comments. Well, the “==>” function is called the “implication” function, and is used to throw out invalid values before running the prop_ function. It’s another way to customize the random value, sort of like how we defined a custom Arbitrary instance up above for the MPair type synonym. For example,

prop_foo :: [Int] -> [Int] -> Prop
prop_foo xs ys =
    not (null xs) ==> -- xs must not be null
    (length ys > 3) ==> -- ys must have at least 4 elements
    foo xs ys == foo ys xs

and it only gets to the “foo xs ys == bar xs ys” part if the two statements above are true. The only difference is that we have to end up with a Prop type instead of Bool, as was the case in prop_isTouching. No other part of your testing code needs to change. The advantage in using (==>) is its ease of use — you can trivially write any rare edge-case condition that you know does not conform to the spec, without bothering to write an Arbitrary instance. However, the disadvantage is that QuickCheck will waste time generating invalid inputs before it gets to the test the function in question. Because of this, you should first try to create an instance of Arbitrary before going with (==>).

Here is a version without using (==>) just for illustrative purposes:

import Data.List
import Test.QuickCheck

main = verboseCheck prop_foo

newtype Xs = Xs [Int]
    deriving (Show)
newtype Ys = Ys [Int]
    deriving (Show)

prop_foo :: Xs -> Ys -> Bool
prop_foo (Xs xs) (Ys ys) =
    foo xs ys == foo ys xs

foo :: [Int] -> [Int] -> [Int]
foo xs ys = sort $ xs ++ ys

instance Arbitrary Xs where
    arbitrary = do
    -- xs must not be null
    xs <- suchThat (arbitrary :: Gen [Int]) (not . null)
    return $ Xs xs

instance Arbitrary Ys where
    arbitrary = do
    -- ys must have at least 4 elements
    ys <- suchThat (arbitrary :: Gen [Int]) ((>3) . length)
    return $ Ys ys

That’s it! You now know enough to get started with QuickCheck today. No excuses! Make it a New Year’s resolution for 2012, if you have to!

QuickCheck has taught me to be skeptical of the “if it compiles, it is OK” Haskell attitude — it has really opened my eyes. I now want QuickCheck as a bare minimum testing setup for all of my Haskell code. No tests = poor design (Medieval II: Total War, I’m looking at you…).

QuickCheck can even be used to test C code with the Foreign Function Interface (FFI), so I’m forever tempted to go back to my other C projects and test them with QuickCheck as well (can you imagine the millions of bugs that lie hidden in C code everywhere — that could be exposed with just a little bit of Haskell knowledge and QuickCheck by the everyday developer? Suddenly, I am overcome with optimism… yet I fear that it is really pessimism underneath… hmm.)

Hpc

I said I would mention Hpc at the beginning. So, what does it do?

Simply put, Hpc tells you which codepaths were used (or not used) after running a binary executable. This is a perfect match for the binary you used for running QuickCheck tests, because during the course of its run QuickCheck will have fed tens of thousands of random values into your functions and have visited most, if not all, parts of your code. Thankfully, Hpc is included with GHC, so you already have it! Just compile your binary with the -fhpc flag. Here is the process:

  1. Compile your program with -fhpc. (ghc -fhpc –make foo.hs)
  2. Run your program. (./foo)
  3. Your program should have generated a *.tix metadata file, as well as a .hpc folder with *.mix files for hpc to analyze.
  4. Run hpc report foo for a quick plaintext report on your codepaths.
  5. Run hpc markup foo to generate prettified HTML files for a more comprehensive overview.
  6. Rinse and repeat. Remove the old *.tix files if you don’t want combined results.

One big caveat: make sure to delete any existing object files (*.o, *.hi) that you had earlier which were produced without the -fhpc flag! Otherwise, those portions of code will not generate hpc metadata! Alternatively, instead of deleting every *.o or *.hi file you have down your project’s folder tree, you could also pass the -fforce-recompile flag to ghc (this is the new name for the deprecated -no-recomp flag, which was used (sans explanation!) in Real World Haskell, Chapter 11).

Go forth, fellow Haskell newbies — embrace testing and code coverage analysis today!

Simple Password Generation with Haskell

So, I’ve been using a custom password generator for a while. It’s great because it has some useful settings like “only generate alphanumeric passwords” or “only generate letters” to deal with stupid, legacy-code websites that refuse to modernize their password handling code. I use this generator to create 50-character passwords, because hey, if it’s randomly generated, then you might as well generate really long passwords! I recently upgraded the pseudorandom number generator (PRNG) to use a cryptographically secure PRNG (CPRNG), just for fun.

Anyway, here is the program:

-- LICENSE: PUBLIC DOMAIN
module Main where

import Control.Monad.State
import Crypto.Random.AESCtr
import Data.Binary (decode)
import qualified Data.ByteString.Lazy as B
import Data.List (nub)
import IO
import System (getArgs)
import System.IO (hSetEcho)

keysChar, keysNum, keysPunc, keysCharNum, keysAll, keysHex :: String
keysChar = ['a'..'z'] ++ ['A'..'Z']
keysHex = ['a'..'f']
keysNum = ['0'..'9']
keysPunc = "`~!@#$%^&*()-_=+[{]}\\|;:'\",<.>/? "
keysCharNum = keysChar ++ keysNum
keysAll = keysChar ++ keysNum ++ keysPunc

giveKey ::  String -> Char -> Int -> Char
giveKey keysCustom c n = extractChar $ case c of
    'i'  -> (keysNum ++ keysHex)
    'j'  -> keysNum
    'k'  -> keysChar
    'l'  -> keysCharNum
    ';'  -> keysPunc
    'h'  -> (keysCharNum ++ keysCustom)
    '\n' -> ['\n']
    _    -> keysAll
    where
    extractChar xs = xs!!mod n (length xs)

showRandomKey :: String -> StateT AESRNG IO ()
showRandomKey keysCustom = handleKey =<< liftIO getChar
    where
    handleKey key = case key of
        '\n' -> liftIO (putChar '\n') >> showRandomKey keysCustom
        'q' -> (liftIO $ putStrLn "\nBye!") >> return ()
        _ -> mapM_ f [0..(49)::Int] >> (liftIO $ putStrLn []) >> showRandomKey keysCustom
        where
        f _ = liftIO
            . putChar
            . giveKey keysCustom key
            . (\n -> mod n (length (keysAll ++ keysCustom) - 1))
            =<< aesRandomInt

aesRandomInt :: StateT AESRNG IO Int
aesRandomInt = do
    aesState <- get
    let (bs, aesState') = genRandomBytes aesState 16
    put aesState'
    return (decode $ B.fromChunks [bs])

main :: IO ()
main = do
    hSetBuffering stdin NoBuffering -- disable buffering from STDIN
    hSetBuffering stdout NoBuffering -- disable buffering from STDOUT
    hSetEcho stdin False -- disable terminal echo
    as <- getArgs
    let as' = filter (\c -> elem c keysAll) . nub $ unwords as
    mapM_ putStrLn
        [ []
        , "poke: 'q'     quit"
        , "      'j'     number"
        , "      'k'     letter"
        , "      'l'     alphanumeric"
        , "      ';'     punctuation"
        , "      'h'     alphanumeric" ++ (if null as' then [] else " + " ++ as')
        , "      'i'     hexadecimal"
        , "      'ENTER' newline"
        , "      else    any"
        , []
        ]
    aesState <- makeSystem -- gather entropy from the system to use as the initial seed
    _ <- runStateT (showRandomKey as') aesState -- enter loop
    return ()

Yes, the CPRNG used is the Advanced Encryption Standard (AES) algorithm in counter mode. It sure beats using the default System.Random module in terms of… coolness (for the purposes of this application, the use of a CPRNG over a regular PRNG really gives no benefit). Anyway, here is a sample run:

$ poke \!@#$%
poke: 'q'     quit
      'j'     number
      'k'     letter
      'l'     alphanumeric
      ';'     punctuation
      'h'     alphanumeric + !@#$%
      'i'     hexadecimal
      'ENTER' newline
      else    any

%g@c@geisBaqK81ihYkEC5NyUrWXU2ndCKr3wHkklpjvCWF9I0
zaJ,!z{db.|~vR7,MvOnPU-5v7N;cCy'],bl/d;^s[hI}RM?j>
KfdJKVygPSOufgllMAZlEaLWHSHpDrHIcgmryETcEsx5uUWlQb
49388c0958e9c51f2f4dc06a8097eb8169f715b4fcfb3ca555

I named it “poke” because I couldn’t think of any other short, UNIX-y name.

The program takes a string as input and treats it as a special class of characters to consider when generating a password with the ‘h’ key. This way, you can fit into the requirements of legacy website code that say something like, “You may use alphanumeric characters and also ‘!’ ‘@’ ‘#’ ‘$’ and ‘%’ symbols”.

The output above is from pressing ‘h’, then SPACE, then ‘l’ and finally ‘i’. You could easily extend it to take more fine-tuned options, such as a shorter password length instead of the default 50. (Hint: use this post.)

The interesting side effect of using 50-character long, randomly generated passwords is that I myself do not know these passwords! The only thing I remember is the single password used for my GnuPG private key, used to decrypt the master password file.

In case you are curious, my complete setup regarding passwords is as follows: I store all of my passwords in plaintext in a “master” file, then encrypt it with GnuPG. I use a git repo to track this encrypted file, so that I can sanely remove/delete old passwords without worrying about how to get it back if I need it. Once in a while, I use a simple shell command, gpg2 -d mypasswords.gpg | less, to view the passwords for entry into some website (hooray for copy and paste!). If I need to update/add/delete passwords, I just decrypt the master file, then edit it, and re-encrypt it and commit the change into git (btw, I decrypt it into a RAM partition to avoid leaving any traces of the plaintext file).

The GnuPG private key used to encrypt the master file is itself encrypted with CAST5 (again using GnuPG) and tracked in git. The CAST5 encryption (with the “-c” flag) is a way to encrypt a file with symmetric encryption — i.e., you supply the same password for encryption and decryption, and plus you don’t need a GnuPG key to do it! (You wouldn’t want to encrypt it using another GnuPG key, because then you’d need to encrypt the key for that GnuPG key as well — recursion!) I constantly sync the git repo holding these important, encrypted files across all of my personal machines, thumb drives, etc., for reliability and convenience.

I could learn to use KeyPass or some other software to manage my passwords, but, I’m too lazy. I’m also too paranoid to trust anyone other than myself to handle my passwords. I also take care not to store anything *too* valuable into the passwords file, or anything I type into any file anywhere, just to play it safe.

EDIT: Hey, this is my 100th post! Congratulations to me!

Parsec Example Revisited (Again): Parsing Lazy ByteStrings

So in this post, I said that I would parse ByteStrings next time just to show how easy it is. Well, I’ve done just that. Since most of the code is identical, I’m going to post a diff after the full code.

-- LICENSE: PUBLIC DOMAIN
{-# LANGUAGE DeriveDataTypeable, RecordWildCards #-}
module Main where

import qualified Data.ByteString.Lazy as BL
import System.Console.CmdArgs.Implicit
import System.IO
import System.Environment
import System.Exit
import System.Process
import Text.Parsec.Char hiding (upper)
import Text.Parsec.Combinator
import Text.Parsec.Prim
import Text.Parsec.String
import qualified Text.Parsec.ByteString.Lazy as PB
import qualified Text.Parsec.Token as PT
import Text.Parsec.Language (emptyDef)
import Control.Monad.Identity

data Opts = Opts
    { all_devices :: Bool
    , unmount :: Bool
    , unmount_all :: Bool
    , discover :: Bool
    , no_color :: Bool
    } deriving (Data, Typeable, Show, Eq)

progOpts :: Opts
progOpts = Opts
    { all_devices = def &= help "mount all USB devices"
    , unmount = def &= help "choose a USB device to unmount"
    , unmount_all = def &= name "U" &= help "unmount all USB devices"
    , discover = def &= help "list all mounted/unmounted USB devices"
    , no_color = def &= help "disable colors"
    }
    &= details
        [ "Notes:"
        , ""
        , "The default behavior without any options is to try to mount a USB device."
            ++ " Here, `device' means a device under the /dev directory, and in our context, is actually a file system partition."
            ++ " Many USB drives have only a single partition, in which case the term `device' means both the USB drive and the single partition it has."
        , ""
        , "Also, allowing the $USER to execute the mount and umount commands with sudo privileges (sudo visudo) will make things less clunky."
        ]

getOpts :: IO Opts
getOpts = cmdArgs $ progOpts
    &= summary (_PROGRAM_INFO ++ ", " ++ _COPYRIGHT)
    &= program _PROGRAM_NAME
    &= help _PROGRAM_DESC
    &= helpArg [explicit, name "help", name "h"]
    &= versionArg [explicit, name "version", name "v", summary _PROGRAM_INFO]

_PROGRAM_NAME, _PROGRAM_VERSION, _PROGRAM_INFO, _PROGRAM_DESC, _COPYRIGHT :: String
_PROGRAM_NAME = "usbmnt"
_PROGRAM_VERSION = "0.1.0"
_PROGRAM_INFO = _PROGRAM_NAME ++ " version " ++ _PROGRAM_VERSION
_PROGRAM_DESC = "mount/unmount USB device(s)"
_COPYRIGHT = "(C) Linus Arver 2011"

data BlockDevice = BlockDevice
    { shortname :: String
    , uuid :: UUID
    , fsys :: String
    , mountPoint :: MountPoint
    } deriving (Eq)

data MountPoint
    = MPath { path :: FilePath }
    | Swap
    | Unmounted
    | UnknownBlkidVal
    deriving (Eq)

instance Show BlockDevice where
    show BlockDevice{..} = unwords
        [ shortname
        , fsys
        , uuid
        , show mountPoint
        ]

instance Show MountPoint where
    show (MPath path) = path
    show Swap = "Swap"
    show Unmounted = "Unmounted"
    show UnknownBlkidVal = "UnknownBlkidVal"

blockdeviceDefault :: BlockDevice
blockdeviceDefault = BlockDevice
    { shortname = ""
    , uuid = ""
    , fsys = ""
    , mountPoint = MPath {path = ""}
    }

data Config = Config
    { fsyss :: [(String, String)]
    , uuids :: [(UUID, String)]
    } deriving (Eq, Show)

_ALPHANUM :: String
_ALPHANUM = ['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9']

data Color
    = Red
    | Green
    | Yellow
    | Blue
    | CNone
    deriving (Show, Eq)

colorize :: Color -> String -> String
colorize c s = case c of
    Blue -> "\x1b[1;34m" ++ s ++ "\x1b[0m"
    Green -> "\x1b[1;32m" ++ s ++ "\x1b[0m"
    Red -> "\x1b[1;31m" ++ s ++ "\x1b[0m"
    Yellow -> "\x1b[1;33m" ++ s ++ "\x1b[0m"
    _ -> s

main :: IO ()
main = do
    hSetBuffering stdout NoBuffering
    hSetBuffering stderr NoBuffering
    opts <- getOpts
    homeDir <- getEnv "HOME"
    errNo <- argsCheck opts homeDir
    when (errNo > 0) $ exitWith $ ExitFailure errNo
    (devs, takenPaths) <- getDevices opts
    let configLoc = homeDir ++ "/.usbmnt"
    configSrc <- BL.readFile configLoc
    (confErrNo, config) <- parseConfig configSrc configLoc
    when (confErrNo > 0) $ exitWith $ ExitFailure confErrNo
    let mountablePaths = filter (\p -> not $ elem p takenPaths) $ map (\p -> "/mnt/u" ++ show p) [(0::Int)..]
        devsKV = zip (map show [(1::Int)..]) . zip devs $ mountablePaths
    prog opts config devsKV

argsCheck :: Opts -> String -> IO Int
argsCheck Opts{..} homeDir
    | null homeDir = e "could not get environment variable $HOME" 1
    | otherwise = return 0
    where
        e :: String -> Int -> IO Int
        e str num = errMsg str >> return num

prog :: Opts -> Config -> [(String, (BlockDevice, FilePath))] -> IO ()
prog opts@Opts{..} config devsKV
    | discover = do
        putStrLn "all devices:"
        mapM_ (\(_, (d, _)) -> putStrLn $ cshow d) devsKV
    | otherwise = do
        putStrLn (if (unmount || unmount_all)
            then "USB device(s) to unmount:"
            else "USB device(s) to mount:")
        mapM_ (\(n, (d, _)) -> putStrLn $ "    " ++ n ++ ") " ++ show' d) devsKV
        putStrLn ""
        mountMenu opts config devsKV
    where
        cshow :: BlockDevice -> String
        cshow b@BlockDevice{..}
            | no_color = show b
            | otherwise = case mountPoint of
                Unmounted -> colorize Green $ show b
                MPath _ -> if not $ null $ getUSBMountPath b
                    then colorize Blue $ show b
                    else show b
                _ -> show b
        show' :: BlockDevice -> String
        show' = if not (unmount || unmount_all)
            then show
            else unwords . init . words . show

mountMenu :: Opts -> Config -> [(String, (BlockDevice, FilePath))] -> IO ()
mountMenu Opts{..} config devsKV
    | unmount = if length devsKV == 1
        then do
            putStrLn "only 1 USB device to unmount"
            tryMount False config (snd . head $ devsKV) >>= exitWith
        else chooseDev prompt devsKV (tryMount False config)
    | unmount_all = do
        putStrLn "unmounting all USB devices..."
        mapM_ (tryMount False config) (map snd devsKV)
        return ()
    | all_devices = do
        putStrLn "mounting all USB devices..."
        mapM_ (tryMount True config) (map snd devsKV)
        return ()
    | length devsKV == 1 = do
        putStrLn "only 1 USB device to mount"
        tryMount True config (snd . head $ devsKV) >>= exitWith
    | otherwise = chooseDev prompt devsKV (tryMount True config)
    where
        prompt :: String
        prompt = if (unmount || unmount_all)
            then "choose USB device to unmount (q to exit)"
            else "choose USB device to mount (q to exit)"

chooseDev :: String -> [(String, (BlockDevice, FilePath))] -> ((BlockDevice, FilePath) -> IO ExitCode) -> IO ()
chooseDev prompt devsKV func = do
    putStrLn prompt
    key <- getLine
    case lookup key devsKV of
        Just dev -> func dev >>= exitWith
        _ -> case key of
            "q" -> return ()
            _ -> chooseDev prompt devsKV func

tryMount :: Bool -> Config -> (BlockDevice, FilePath) -> IO ExitCode
tryMount mount config@Config{..} (bd@BlockDevice{..}, mp)
    | (null margs) = do
        errMsg $ "UUID " ++ squote uuid ++ " was not found in config file"
        errMsg $ "filesystem " ++ squote fsys ++ " was also not found in config file"
        errMsg $ "supported file systems: " ++ (unwords $ map fst fsyss)
        exitWith (ExitFailure 1)
    | otherwise = do
    when mount $ do
        if (null $ mountArgsUUID config uuid)
            then putStrLn $ "filesystem " ++ squote fsys ++ " found in config file"
            else putStrLn $ "UUID " ++ squote uuid ++ " found in config file"
        putStrLn $ "using these arguments: " ++ squote margs
    putStr $ (if mount then "" else "un")
        ++ "mounting "
        ++ shortname
        ++ " (" ++ fsys ++ ") "
        ++ (if mount then "to " ++ mp else "from " ++ show mountPoint)
        ++ ".."
    (_, _, _, p) <- createProcess $ cmd margs shortname
    exitStatus <- waitForProcess p
    if (exitStatus == ExitSuccess)
        then do
            putStrLn "OK"
            return ExitSuccess
        else do
            putStr "FAILED\n"
            errMsg $ (if mount
                then "mount error (perhaps " ++ squote mp ++ " does not exist)"
                else "unmount error")
            return (ExitFailure 1)
    where
        margs = mountArgs config bd
        cmd arguments devPath = CreateProcess
            { cmdspec = ShellCommand (if mount
                then "sudo mount -t " ++ arguments ++ " " ++ devPath ++ " " ++ mp ++ " &>/dev/null"
                else "sudo umount " ++ show mountPoint)
            , cwd = Nothing
            , env = Nothing
            , std_in = CreatePipe
            , std_out = CreatePipe
            , std_err = Inherit
            , close_fds = False
            }

mountArgs :: Config -> BlockDevice -> String
mountArgs Config{..} BlockDevice{..} = case lookup uuid uuids of
    Just a -> a
    _ -> case lookup fsys fsyss of
        Just a -> a
        _ -> []

mountArgsUUID :: Config -> UUID -> String
mountArgsUUID Config{..} uuid' = case lookup uuid' uuids of
    Just a -> a
    _ -> []

getDevices :: Opts -> IO ([BlockDevice], [String])
getDevices Opts{..} = do
    (_, sout, _, p) <- createProcess cmdBlkid
    devs <- case sout of
        Just h -> hGetContents h
        Nothing -> return []
    _ <- waitForProcess p
    let devs' = (map (unwords . words)) . drop 2 . lines $ devs
    devs'' <- mapM parseBlkid devs'
    let toMount = filter (\BlockDevice{..} -> mountPoint == Unmounted) devs''
        toUnmount = filter (\dev -> not $ null $ getUSBMountPath dev) devs''
        takenPaths = filter (not . null) . map getUSBMountPath $ devs''
    when (not discover && null toMount && (not (unmount || unmount_all))) $ do
        errMsg $ "cannot find USB devices to mount"
        exitWith (ExitFailure 1)
    when (not discover && null toUnmount && (unmount || unmount_all)) $ do
        errMsg $ "cannot find USB devices to unmount"
        exitWith (ExitFailure 1)
    return $ formatDevs devs'' toMount toUnmount takenPaths
    where
        formatDevs :: [BlockDevice] -> [BlockDevice] -> [BlockDevice] -> [String] -> ([BlockDevice], [String])
        formatDevs ds m um takenPaths
            | discover = (ds, takenPaths)
            | unmount || unmount_all = (um, takenPaths)
            | otherwise = (m, takenPaths)
        cmdBlkid = CreateProcess
            { cmdspec = ShellCommand ("sudo blkid -o list")
            , cwd = Nothing
            , env = Nothing
            , std_in = CreatePipe
            , std_out = CreatePipe
            , std_err = Inherit
            , close_fds = False
            }

getUSBMountPath :: BlockDevice -> String
getUSBMountPath BlockDevice{..} = case mountPoint of
    MPath str -> if take 6 str == "/mnt/u" && (all (\c -> elem c ['0'..'9']) (drop 6 str))
        then str
        else ""
    _ -> ""

errMsg :: String -> IO ()
errMsg msg = hPutStrLn stderr $ "error: " ++ msg

squote :: String -> String
squote s = "`" ++ s ++ "'"

-- Parsing

-- for parsing the computer-generated output of `sudo blkid -o list'
parserIdentifier :: Parser String
parserIdentifier = many1 $ oneOf $ _ALPHANUM ++ "/-_"

parserWhitespace :: Parser String
parserWhitespace = many1 $ oneOf " \t\n\r"

parserMP :: Parser MountPoint
parserMP =
    try ( do
        a <- oneOf "<("
        b <- manyTill anyChar (lookAhead $ (oneOf ">)"))
        _ <- oneOf ">)"
        let mp = case a of
                '<' -> Swap
                '(' -> case b of
                    "not mounted" -> Unmounted
                    _ -> UnknownBlkidVal
                _ -> UnknownBlkidVal
        return mp
        )
    <|> (parserIdentifier >>= (\s -> return MPath {path = s}))
    <?> "blkid's mount point description"

blkidParser :: Parser BlockDevice
blkidParser =
    try ( do
        sname <- parserIdentifier
        _ <- parserWhitespace
        fs <- parserIdentifier
        _ <- parserWhitespace
        _ <- parserIdentifier -- leave out the "label" column, even if it exists
        _ <- parserWhitespace
        mp <- parserMP
        _ <- parserWhitespace
        uid <- parserIdentifier
        eof
        return BlockDevice
           { shortname = sname
           , uuid = uid
           , fsys = fs
           , mountPoint = mp
           }
        )
    <|>
    do  sname <- parserIdentifier
        _ <- parserWhitespace
        fs <- parserIdentifier
        _ <- parserWhitespace
        mp <- parserMP
        _ <- parserWhitespace
        uid <- parserIdentifier
        eof
        return BlockDevice
            { shortname = sname
            , uuid = uid
            , fsys = fs
            , mountPoint = mp
            }
    <?> "5 or 4 fields to parse"

parseBlkid :: String -> IO BlockDevice
parseBlkid src =
    case parse blkidParser "output of `sudo blkid -o list'" src of
        Left parseError -> errMsg (show parseError) >> return blockdeviceDefault
        Right result -> return result

-- we use a LanguageDef so that we can get whitespace/newline parsing for FREE
-- in our .usbmnt file
configDef :: PT.GenLanguageDef BL.ByteString () Identity
configDef = emptyDef
    { PT.commentStart   = ""
    , PT.commentEnd     = ""
    , PT.commentLine    = "#"
    , PT.nestedComments = False
    -- the identStart/identLetter define what a UUID will look like (a
    -- dash-separated hex number)
    , PT.identStart     = oneOf $ ['0'..'9'] ++ ['a'..'f'] ++ ['A'..'F']
    , PT.identLetter    = oneOf $ ['0'..'9'] ++ ['a'..'f'] ++ ['A'..'F'] ++ "-"
    , PT.opStart        = char '.'
    , PT.opLetter       = char '.'
    , PT.reservedOpNames= []
    , PT.reservedNames  = []
    , PT.caseSensitive  = True
    }

-- we call makeTokenParser def and pick out just those we need
lexer :: PT.GenTokenParser BL.ByteString () Identity
lexer = PT.makeTokenParser configDef

p_identifier :: ParsecT BL.ByteString () Identity String
p_identifier = PT.identifier lexer
p_stringLiteral :: ParsecT BL.ByteString () Identity String
p_stringLiteral = PT.stringLiteral lexer
p_whiteSpace :: ParsecT BL.ByteString () Identity ()
p_whiteSpace = PT.whiteSpace lexer
p_braces :: ParsecT BL.ByteString () Identity a -> ParsecT BL.ByteString () Identity a
p_braces = PT.braces lexer
p_commaSep :: ParsecT BL.ByteString () Identity a -> ParsecT BL.ByteString () Identity [a]
p_commaSep = PT.commaSep lexer
p_symbol :: String -> ParsecT BL.ByteString () Identity String
p_symbol = PT.symbol lexer

type UUID = String

assocParser :: PB.Parser String -> PB.Parser (UUID, String)
assocParser keyParser = do
    key <- keyParser
    _ <- many $ oneOf " \t"
    _ <- string "="
    _ <- many $ oneOf " \t"
    mountOpts <- p_stringLiteral
    return (key, mountOpts)
    <?> "a key-value association"

hashParser :: String -> PB.Parser String -> PB.Parser [(String, String)]
hashParser hashName keyParser = do
    _ <- p_symbol hashName
    _ <- p_symbol "="
    a <- p_braces (p_commaSep $ assocParser keyParser)
    return a
    <?> "a " ++ hashName ++ " curly brace block"

configParser :: PB.Parser Config
configParser = do
    p_whiteSpace -- take care of leading whitespace/comments as defined by configDef
    -- parse FSYS_HASH first
    fsyss' <- hashParser "FSYS_HASH" (many1 alphaNum)
    p_whiteSpace
    -- now parse UUID_HASH
    uuids' <- hashParser "UUID_HASH" (p_identifier)
    eof
    return $ Config {fsyss = fsyss', uuids = uuids'}
    <?> "config with FSYS_HASH and UUID_HASH blocks"

parseConfig :: BL.ByteString -> String -> IO (Int, Config)
parseConfig src loc =
    case parse configParser ("config file at " ++ squote loc) src of
        Left parseError -> errMsg (show parseError) >> return (1, Config [] [])
        Right result -> return (0, result)

And here is the diff:

--- usbmnt.hsold	2011-11-09 11:43:02.871554967 -0800
+++ usbmnt.hsnew	2011-11-09 11:43:09.761453413 -0800
@@ -2,6 +2,7 @@
 {-# LANGUAGE DeriveDataTypeable, RecordWildCards #-}
 module Main where
 
+import qualified Data.ByteString.Lazy as BL
 import System.Console.CmdArgs.Implicit
 import System.IO
 import System.Environment
@@ -11,6 +12,7 @@
 import Text.Parsec.Combinator
 import Text.Parsec.Prim
 import Text.Parsec.String
+import qualified Text.Parsec.ByteString.Lazy as PB
 import qualified Text.Parsec.Token as PT
 import Text.Parsec.Language (emptyDef)
 import Control.Monad.Identity
@@ -126,7 +128,7 @@
     when (errNo > 0) $ exitWith $ ExitFailure errNo
     (devs, takenPaths) <- getDevices opts
     let configLoc = homeDir ++ "/.usbmnt"
-    configSrc <- readFile configLoc
+    configSrc <- BL.readFile configLoc
     (confErrNo, config) <- parseConfig configSrc configLoc
     when (confErrNo > 0) $ exitWith $ ExitFailure confErrNo
     let mountablePaths = filter (\p -> not $ elem p takenPaths) $ map (\p -> "/mnt/u" ++ show p) [(0::Int)..]
@@ -379,7 +381,7 @@
 
 -- we use a LanguageDef so that we can get whitespace/newline parsing for FREE
 -- in our .usbmnt file
-configDef :: PT.LanguageDef st
+configDef :: PT.GenLanguageDef BL.ByteString () Identity
 configDef = emptyDef
     { PT.commentStart   = ""
     , PT.commentEnd     = ""
@@ -397,25 +399,25 @@
     }
 
 -- we call makeTokenParser def and pick out just those we need
-lexer :: PT.TokenParser ()
+lexer :: PT.GenTokenParser BL.ByteString () Identity
 lexer = PT.makeTokenParser configDef
 
-p_identifier :: ParsecT String () Identity String
+p_identifier :: ParsecT BL.ByteString () Identity String
 p_identifier = PT.identifier lexer
-p_stringLiteral :: ParsecT String () Identity String
+p_stringLiteral :: ParsecT BL.ByteString () Identity String
 p_stringLiteral = PT.stringLiteral lexer
-p_whiteSpace :: ParsecT String () Identity ()
+p_whiteSpace :: ParsecT BL.ByteString () Identity ()
 p_whiteSpace = PT.whiteSpace lexer
-p_braces :: ParsecT String () Identity a -> ParsecT String () Identity a
+p_braces :: ParsecT BL.ByteString () Identity a -> ParsecT BL.ByteString () Identity a
 p_braces = PT.braces lexer
-p_commaSep :: ParsecT String () Identity a -> ParsecT String () Identity [a]
+p_commaSep :: ParsecT BL.ByteString () Identity a -> ParsecT BL.ByteString () Identity [a]
 p_commaSep = PT.commaSep lexer
-p_symbol :: String -> ParsecT String () Identity String
+p_symbol :: String -> ParsecT BL.ByteString () Identity String
 p_symbol = PT.symbol lexer
 
 type UUID = String
 
-assocParser :: Parser String -> Parser (UUID, String)
+assocParser :: PB.Parser String -> PB.Parser (UUID, String)
 assocParser keyParser = do
     key <- keyParser
     _ <- many $ oneOf " \t"
@@ -425,7 +427,7 @@
     return (key, mountOpts)
     <?> "a key-value association"
 
-hashParser :: String -> Parser String -> Parser [(String, String)]
+hashParser :: String -> PB.Parser String -> PB.Parser [(String, String)]
 hashParser hashName keyParser = do
     _ <- p_symbol hashName
     _ <- p_symbol "="
@@ -433,7 +435,7 @@
     return a
     <?> "a " ++ hashName ++ " curly brace block"
 
-configParser :: Parser Config
+configParser :: PB.Parser Config
 configParser = do
     p_whiteSpace -- take care of leading whitespace/comments as defined by configDef
     -- parse FSYS_HASH first
@@ -445,7 +447,7 @@
     return $ Config {fsyss = fsyss', uuids = uuids'}
     <?> "config with FSYS_HASH and UUID_HASH blocks"
 
-parseConfig :: String -> String -> IO (Int, Config)
+parseConfig :: BL.ByteString -> String -> IO (Int, Config)
 parseConfig src loc =
     case parse configParser ("config file at " ++ squote loc) src of
         Left parseError -> errMsg (show parseError) >> return (1, Config [] [])

Like I said, pretty straightforward, right? The code actually only uses lazy bytestrings for parsing the configuration file. The output of the blkid command is still parsed with the old way (native Haskell strings). The only tricky part is to import the bytestring stuff (Data.ByteString.Lazy and Text.Parsec.ByteString.Lazy) as qualified imports to avoid namespace clashes. The configDef function’s type signature had to change a bit because Parsec 3.1.1 does not have a convenience type alias for LanguageDef that uses bytestrings. To be honest, I couldn’t figure out the type signatures for p_identifier, p_symbol, etc., so I had GHC do it for me. But I mean, it all took like 5-10 minutes. Easy.

So now you can parse bytestrings — lazily!