Skype on Linux: Startup Crash Fix

I’ve been using Skype recently and it kept crashing randomly a few seconds after startup. I managed to salvage the error message, which states as follows:

skype: pcm.c:2811: snd_pcm_areas_copy: Assertion `src_areas' failed.

Apparently, I’m not the first to notice this problem. I learned from that link that Microsoft is in no hurry to fix this longstanding, horrible bug that is a simple 1-liner fix. The fix is as follows:

mkdir ~/.Skype/Logs

Shame on the current Skype developers.

Update May 31, 2012: Apparently, Skype automatically creates huge logfiles in the ~/.Skype/Logs directory, and there is no way to turn off this useless feature (the logfiles are written in a proprietary Skype-only format, and is only designed to help Skype fix bugs if anything goes wrong; no personal data (chat history, etc.) is written in those logs according to online sources). I was bitten by this problem earlier this evening when I realized that my home partition (a meager 9GB) filled up with a 2GB Skype logfile, which prevented me from downloading a file online. The solution is simple: make the logfile a symlink into /dev/null — this way, you don’t have to periodically delete the logfiles — they are automatically “deleted” as they are being written! I knew I could take advantage of /dev/null someday and this is the perfect time for it! EDIT June 1, 2012: Actually, Skype did crash once mysteriously because of this, so just make a symlink to some partition with more space available, and then run a script upon boot up/power off to delete all files in that directory. Sorry about the bad advice!

There you go — happy Skyping!

Update June 3, 2012: OK, so apparently there is a way to create a “blackhole” directory, much in the same spirit of the special /dev/null file. This stackexchange page addresses the same concern, and led me to nullfs, which is a FUSE program that allows you to mount /dev/null-like filesystems from userspace (as regular programs, without root privileges). The code for these programs are just a few hundred lines long and compile very quickly. There are three flavors (nul1fs, nullfs, and nulnfs) and I’ve tested out nullfs to great success. You can create a “blackhole” like directory where files can be created into, but which does not take up any disk space. I just ran Skype again right now and the output of “lsof | grep skype” looks very normal (the log files in ~/.Skype/Logs are indeed created in Logs). So here’s how to do it: compile nullfs with

g++ -o nullfs -lfuse nullfs

and then create a regular directory anywhere with

mkdir blackhole

and then mount the nullfs filesystem on that directory with

path-to-nullfs-binary path-to-blackhole-directory 

and you’re good to go. For our use case, the blackhole directory would just be ~/.Skype/Logs. The nullfs binary backgrounds itself so you don’t need to worry about backgrounding/disowning the process. You can put the one-liner nullfs invocation into your startup script of choice, and you’re all set! I haven’t tried out the nul1fs or nulnfs flavors, because I didn’t need to. Hooray for FUSE!

The KISS PRNG (2011 version) in C and Haskell

Did you know that Dr. George Marsaglia (1924-2011), creator of the famed Diehard battery of randomness tests, devised a super-simple PRNG algorithm, just a month prior to his passing? Called the KISS PRNG (or Super-KISS, as there have been multiple KISS versions released previously by Marsaglia), this algorithm boasts a period in excess of 10^40million (10^40,000,000) — an astoundingly large number, many orders of magnitude larger than the famed Mersenne Twister‘s period (only 2^19,937 − 1, or only about 4.3 x 10^6,001 according to gcalctool). Plus, it’s so simple codewise, and also very fast.

Here’s the C implementation (adapted from Marsaglia’s own code):

/* AUTHOR: Shinobu (zuttobenkyou.wordpress.com) */
/* LICENSE: PUBLIC DOMAIN */
#include <stdint.h>

typedef uint64_t u64;

#define QSIZE 0x200000
#define CNG (cng = 6906969069ULL * cng + 13579)
#define XS (xs ^= (xs << 13), xs ^= (xs >> 17), xs ^= (xs << 43))
#define KISS (B64MWC() + CNG + XS)

static u64 QARY[QSIZE];
static int j;
static u64 carry;
static u64 xs;
static u64 cng;

void randk_reset(void)
{
	j = QSIZE - 1;
	carry = 0;
	xs = 362436069362436069ULL;
	cng = 123456789987654321ULL; /* use this as the seed */
}

u64 B64MWC(void)
{
	u64 t, x;
	j = (j + 1) & (QSIZE - 1);
	x = QARY[j];
	t = (x << 28) + carry;
	carry = (x >> 36) - (t < x);
	return (QARY[j] = t - x);
}

/* Initialize PRNG with default seed */
void randk_seed(void)
{
	u64 i;
	/* Seed QARY[] with CNG+XS: */
	for (i = 0; i < QSIZE; i++)
		QARY[i] = CNG + XS;
}

void randk_seed_manual(u64 seed)
{
	cng ^= seed;
	xs ^= cng;
	randk_seed();
}

void randk_warmup(int rounds)
{
	int i;
	/* Run through several rounds to warm up the state */
	for (i = 0; i < rounds; i++)
		randk();
}

/* Generate a pseudorandom 64-bit unsigned integer. */
u64 randk(void)
{
	return KISS;
}

Simple, eh? This algorithm is actually 3 PRNG’s in one: the 64-bit Multiply-With-Carry PRNG (B64MWC()), the XOR-Shift PRNG (XS), and the simple Linear Congruential PRNG (CNG). The exorbitant period comes from the fact that this algorithm relies on three different states of the three PRNGs to generate a random number.

Now, where does Haskell come into the picture? Well, I ported the code to Haskell because I wanted a simple PRNG that was of higher quality than the default System.Random RNG. Plus, if you look into the actual source of System.Random, here is an unnerving bit of code:

stdSplit            :: StdGen -> (StdGen, StdGen)
stdSplit std@(StdGen s1 s2)
                     = (left, right)
                       where
                        -- no statistical foundation for this!
                        left    = StdGen new_s1 t2
                        right   = StdGen t1 new_s2

                        new_s1 | s1 == 2147483562 = 1
                               | otherwise        = s1 + 1

                        new_s2 | s2 == 1          = 2147483398
                               | otherwise        = s2 - 1

                        StdGen t1 t2 = snd (next std)

See, the RandomGen type class requires the definition of next, split, and genRange functions (see this page). The split function’s purpose is to take one PRNG state, and give two distinct PRNG states, so that you can get multiple unique PRNG’s to work with (this comes up in functional programming in real practice — I speak from experience). The thing is, the statistical robustness of the split function for the StdGen PRNG that comes with Haskell, as can be seen in the source listing, is a bit… annoying/worrying.

Well, when I saw this, I thought: “Hey, why not use KISS? It has 3 PRNGs built into one, so when implementing split, it could just change the state of one of the PRNGs, and you’d get a *completely* different PRNG!” And so that’s exactly what I did:

-- AUTHOR: Shinobu (zuttobenkyou.wordpress.com)
-- LICENSE: PUBLIC DOMAIN
{-# LANGUAGE RecordWildCards #-}

module KISS where

import Data.Array.Unboxed
import Data.Bits
import Data.List
import Data.Word
import System.Random (RandomGen(..))

type U64 = Word64

-- | This is the last KISS-type RNG (2011) that Dr. George Marsaglia (1924-2011)
-- released to the internet before his death. The only difference with this
-- version is that the kissMWCArraySize is 0xfff (4096), instead of 0x200000
-- (2,097,152), for purposes of speed. The period of the original was
-- approximated by Marsaglia as 10^40million, which is practically infinity for
-- most, if not all, needs for everyday programs. The reduced state size for the
-- MWC algorithm from 0x200000 to 0xfff should shorten the period, but it should
-- still be excellent for general usage; because KISS combines not only MWC but
-- also CNG (Congruential) and XSHF (XOR-shift) generators, the period should
-- still be very large.
--
-- TODO: Determine period of this KISS rng.
data KISSRNG = KISSRNG
	{ kissMWCArray :: UArray Int U64
	, kissMWCArraySize :: U64
	, kissMWCIndex :: U64
	, kissMWCCarry :: U64
	, kissCNG :: U64
	, kissXSHF :: U64
	}

kissStateSize :: U64
kissStateSize = 0xfff

roundCNG :: U64 -> U64
roundCNG cng = 6906969069 * cng + 13579

roundXSHF :: U64 -> U64
roundXSHF = round3 . round2 . round1
	where
	round1 b = xor b (unsafeShiftL b 13)
 	round2 b = xor b (unsafeShiftR b 17)
	round3 b = xor b (unsafeShiftL b 43)

roundB64MWC :: KISSRNG -> KISSRNG
roundB64MWC kiss@KISSRNG{..} = kiss
	{ kissMWCArray = array'
	, kissMWCIndex = index'
	, kissMWCCarry = carry'
	}
	where
	index' = (kissMWCIndex + 1) .&. (kissMWCArraySize - 1)
	x = kissMWCArray ! (fromIntegral index')
	t = unsafeShiftL x 28 + kissMWCCarry
	carry' = unsafeShiftR x 36 - (if t < x then 1 else 0)
	array' = kissMWCArray // [(fromIntegral index', t - x)]

makeKISSRNG :: U64 -> KISSRNG
makeKISSRNG seed = KISSRNG
	{ kissMWCArray = array (0, (fromIntegral kissStateSize - 1)) $ zip [0..] kissArray
	, kissMWCArraySize = kissStateSize
	, kissMWCIndex = kissStateSize - 1
	, kissMWCCarry = 0
	, kissCNG = cngWarmed
	, kissXSHF = xshfWarmed
	}
	where
	-- seed the MWC array with the Congruential and XOR-Shift RNG's
	(kissArray, cngWarmed, xshfWarmed) = foldl' step ([], seedCNG, seedXSHF) [0..(kissStateSize - 1)]
	step (ary, cng, xshf) _ = ((cng' + xshf'):ary, cng', xshf')
		where
		cng' = roundCNG cng
		xshf' = roundXSHF xshf
	-- default Congruential RNG seed
	seedCNG = 123456789987654321
	-- default XOR-Shift RNG seed
	seedXSHF = 362436069362436069

randKISS :: KISSRNG -> (U64, KISSRNG)
randKISS k = (kissMWC + cng + xshf, k' {kissCNG = cng, kissXSHF = xshf})
	where
	k' = roundB64MWC k
	kissMWC = (kissMWCArray k') ! (fromIntegral $ kissMWCIndex k')
	cng = roundCNG $ kissCNG k
	xshf = roundXSHF $ kissXSHF k

instance Show KISSRNG where
	show KISSRNG{..} = "kissMWC: [kissMWC..]"
		++ "\nkissMWCIdx: " ++ show kissMWCIndex ++ "\n"
		++ "kissMWCArray!!Idx: " ++ show (kissMWCArray ! (fromIntegral kissMWCIndex)) ++ "\n"
		++ "kissMWCArray!!Head: " ++ show (kissMWCArray ! 0) ++ "\n"
		++ "kissMWCArray!!Last: " ++ show (kissMWCArray ! (fromIntegral $ kissStateSize - 1)) ++ "\n"
		++ "kissMWCCarry: " ++ show kissMWCCarry ++ "\n"
		++ "kissCNG: " ++ show kissCNG ++ "\n"
		++ "kissXSHF: " ++ show kissXSHF ++ "\n"

instance RandomGen KISSRNG where
	next rng = (fromIntegral n, rng')
		where
		(n, rng') = randKISS rng
	split rng = (rng1, rng2)
		where
		rng1 = warmupXSHF rng
		rng2 = warmupMWC rng
	genRange _ = (0, 0xffffffffffffffff)

warmupRNG :: RandomGen g => g -> Int -> g
warmupRNG g rounds = foldl' warmup g [1..rounds]
	where
	warmup g' _ = snd $ next g'

warmupMWC :: KISSRNG -> KISSRNG
warmupMWC rng = roundB64MWC rng

warmupXSHF :: KISSRNG -> KISSRNG
warmupXSHF rng = rng { kissXSHF = roundXSHF $ kissXSHF rng}

In the implementation of split, you can clearly see that we simply warm up one of PRNGs (move on to the next state in the period) to get a new PRNG. Again, since KISS depends on all three PRNGs, simply changing the state of one of the PRNGs will give you a completely different PRNG.

Oh, the only weakness of the Haskell version is that its QSIZE is only 0xfff, not 0x200000 as in the original, for performance reasons. I certainly hope someone could improve the performance of the code and release it on Hackage (my code is hereby released into the PUBLIC DOMAIN, so do what you like with it), restoring the state size to 0x200000 as in Marsaglia’s original. Also, I’m not sure how large the period is, but judging by how the XOR-Shift PRNG has a large period on its own, it should still be very, very large with a 0xfff state size for the MWC algorithm.

I would sincerely appreciate it if someone more familiar with combinatorics/compsci could tell me what the size of the period is with a 0xfff state size for the MWC.

I was also pleasantly surprised by the very good quality of KISS. I used my code to write some random bits into a file, and used the ent program to judge the entroy of it. Here are the results:

Entropy = 7.999829 bits per byte.

Optimum compression would reduce the size
of this 1048576 byte file by 0 percent.

Chi square distribution for 1048576 samples is 248.29, and randomly
would exceed this value 60.65 percent of the times.

Arithmetic mean value of data bytes is 127.5231 (127.5 = random).
Monte Carlo value for Pi is 3.141895835 (error 0.01 percent).
Serial correlation coefficient is 0.001437 (totally uncorrelated = 0.0).

The results show that the KISS RNG has excellent quality random numbers. These figures make it as good (randomness-wise) as, e.g., the one based on AES encryption (AES in counter mode), which has also been analyzed with ent, as stated on the github page:

Using ent, a randomness property maker on one 1Mb sample.

cprng-AES:

Entropy = 7.999837 bits per byte.
Optimum compression would reduce the size of this 1048576 byte file by 0 percent.
Chi square distribution for 1048576 samples is 237.02.
Arithmetic mean value of data bytes is 127.3422 (127.5 = random).
Monte Carlo value for Pi is 3.143589568 (error 0.06 percent).

The rather ugly code I used to generate this file (and believe me, it took forever to generate a 1MiB file because the code is horribly unoptimized…) is below:

-- AUTHOR: Shinobu (zuttobenkyou.wordpress.com)
-- LICENSE: PUBLIC DOMAIN
{-# LANGUAGE RecordWildCards #-}

module Main where

import Data.Array.Unboxed
import Data.Bits
import Data.List
import Data.Word
import System.Random (RandomGen(..))
import qualified Data.ByteString as BS

type U64 = Word64

data KISSRNG = KISSRNG
	{ kissMWCArray :: UArray Int U64
	, kissMWCArraySize :: U64
	, kissMWCIndex :: U64
	, kissMWCCarry :: U64
	, kissCNG :: U64
	, kissXSHF :: U64
	}

kissStateSize :: U64
kissStateSize = 0xfff

roundCNG :: U64 -> U64
roundCNG cng = 6906969069 * cng + 13579

roundXSHF :: U64 -> U64
roundXSHF = round3 . round2 . round1
	where
	round1 b = xor b (unsafeShiftL b 13)
 	round2 b = xor b (unsafeShiftR b 17)
	round3 b = xor b (unsafeShiftL b 43)

roundB64MWC :: KISSRNG -> KISSRNG
roundB64MWC kiss@KISSRNG{..} = kiss
	{ kissMWCArray = array'
	, kissMWCIndex = index'
	, kissMWCCarry = carry'
	}
	where
	index' = (kissMWCIndex + 1) .&. (kissMWCArraySize - 1)
	x = kissMWCArray ! (fromIntegral index')
	t = unsafeShiftL x 28 + kissMWCCarry
	carry' = unsafeShiftR x 36 - (if t < x then 1 else 0)
	array' = kissMWCArray // [(fromIntegral index', t - x)]

makeKISSRNG :: U64 -> KISSRNG
makeKISSRNG seed = KISSRNG
	{ kissMWCArray = array (0, (fromIntegral kissStateSize - 1)) $ zip [0..] kissArray
	, kissMWCArraySize = kissStateSize
	, kissMWCIndex = kissStateSize - 1
	, kissMWCCarry = 0
	, kissCNG = xor cngWarmed seed
	, kissXSHF = xor xshfWarmed seed
	}
	where
	-- seed the MWC array with the Congruential and XOR-Shift RNG's
	(kissArray, cngWarmed, xshfWarmed) = foldl' step ([], seedCNG, seedXSHF) [0..(kissStateSize - 1)]
	step (ary, cng, xshf) _ = ((cng' + xshf'):ary, cng', xshf')
		where
		cng' = roundCNG cng
		xshf' = roundXSHF xshf
	-- default Congruential RNG seed
	seedCNG = 123456789987654321
	-- default XOR-Shift RNG seed
	seedXSHF = 362436069362436069

randKISS :: KISSRNG -> (U64, KISSRNG)
randKISS k = (kissMWC + cng + xshf, k' {kissCNG = cng, kissXSHF = xshf})
	where
	k' = roundB64MWC k
	kissMWC = (kissMWCArray k') ! (fromIntegral $ kissMWCIndex k')
	cng = roundCNG $ kissCNG k
	xshf = roundXSHF $ kissXSHF k

instance Show KISSRNG where
	show KISSRNG{..} = "kissMWC: [kissMWC..]"
		++ "\nkissMWCIdx: " ++ show kissMWCIndex ++ "\n"
		++ "kissMWCArray!!Idx: " ++ show (kissMWCArray ! (fromIntegral kissMWCIndex)) ++ "\n"
		++ "kissMWCArray!!Head: " ++ show (kissMWCArray ! 0) ++ "\n"
		++ "kissMWCArray!!Last: " ++ show (kissMWCArray ! (fromIntegral $ kissStateSize - 1)) ++ "\n"
		++ "kissMWCCarry: " ++ show kissMWCCarry ++ "\n"
		++ "kissCNG: " ++ show kissCNG ++ "\n"
		++ "kissXSHF: " ++ show kissXSHF ++ "\n"

instance RandomGen KISSRNG where
	next rng = (fromIntegral n, rng')
		where
		(n, rng') = randKISS rng
	split rng = (rng1, rng2)
		where
		rng1 = warmupXSHF rng
		rng2 = warmupMWC rng
	genRange _ = (0, 0xffffffffffffffff)

warmupRNG :: RandomGen g => g -> Int -> g
warmupRNG g rounds = foldl' warmup g [1..rounds]
	where
	warmup g' _ = snd $ next g'

warmupMWC :: KISSRNG -> KISSRNG
warmupMWC rng = roundB64MWC rng

warmupXSHF :: KISSRNG -> KISSRNG
warmupXSHF rng = rng { kissXSHF = roundXSHF $ kissXSHF rng}

main :: IO ()
main = do
	let
		rng = makeKISSRNG 0
		(bytes1MiB, _) = genBytesKISS 0x100000 rng
	BS.writeFile "data" bytes1MiB

genBytesKISS :: U64 -> KISSRNG -> (BS.ByteString, KISSRNG)
genBytesKISS len kissrng = foldl' step (BS.empty, kissrng) [1..(div len 8)] -- divide by 8, b/c e.g., to generate 8 bytes, we only need 1 U64
	where
	step (bs, rng) _ = (foldl' BS.snoc bs $ octets u64, rng')
		where
		(u64, rng') = randKISS rng

smallerChunks :: [U64] -> [Word8]
smallerChunks = concatMap octets

-- | Get a number and split it up into 8 8-bit parts (64 bits total).
octets :: (Bits a, Integral a) => a -> [Word8]
octets w = map (\n -> fromIntegral $ shiftR w n) . reverse $ take 8 [0,8..]

Here are the C and Haskell standalone versions that prove that the Haskell port behaves in the same way as the C version, given the right starting seed values and state size (both 0xfff for the MWC PRNG):

C standalone version (compile with gcc -o ckiss kiss.c):

/* AUTHOR: Shinobu (zuttobenkyou.wordpress.com) */
/* LICENSE: PUBLIC DOMAIN */
#include <stdio.h>
#include <inttypes.h>
#include <stdint.h>

typedef uint64_t u64;

#define QSIZE 0xfff
#define CNG (cng = 6906969069ULL * cng + 13579)
#define XS (xs ^= (xs << 13), xs ^= (xs >> 17), xs ^= (xs << 43))
#define KISS (B64MWC() + CNG + XS)

static u64 QARY[QSIZE];
static int j;
static u64 carry;
static u64 xs;
static u64 cng;

u64 B64MWC(void)
{
	u64 t, x;
	j = (j + 1) & (QSIZE - 1);
	x = QARY[j];
	t = (x << 28) + carry;
	carry = (x >> 36) - (t < x);
	return (QARY[j] = t - x);
}

/* Initialize PRNG with default seed */
void randk_seed(void)
{
	u64 i;
	j = QSIZE - 1;
	carry = 0;
	xs = 362436069362436069ULL;
	cng = 123456789987654321ULL;
	/* Seed QARY[] with CNG+XS: */
	for (i = 0; i < QSIZE; i++)
		QARY[i] = CNG + XS;
}

/* Generate a pseudorandom 64-bit unsigned integer. */
u64 randk(void)
{
	return KISS;
}

int main(void)
{
	randk_seed();
	printf("randk_seed called!\n");
	printf("KISS idx: %"PRIu64"\n", j);
	printf("qary[idx] is: %"PRIu64"\n", QARY[j]);
	printf("qary[0] is: %"PRIu64"\n", QARY[0]);
	printf("qary[QSIZE - 1] is: %"PRIu64"\n", QARY[QSIZE - 1]);
	printf("carry: %"PRIu64"\n", carry);
	printf("cng: %"PRIu64"\n", cng);
	printf("xs: %"PRIu64"\n", xs);
	u64 x = KISS;
	printf("\nKISS called! KISS num is: %"PRIu64"\n", x);
	printf("\nKISS idx: %"PRIu64"\n", j);
	printf("qary[idx] is: %"PRIu64"\n", QARY[j]);
	printf("qary[0] is: %"PRIu64"\n", QARY[0]);
	printf("qary[QSIZE - 1] is: %"PRIu64"\n", QARY[QSIZE - 1]);
	printf("carry: %"PRIu64"\n", carry);
	printf("cng: %"PRIu64"\n", cng);
	printf("xs: %"PRIu64"\n", xs);

	printf("x + 18334599312639636657 is: %"PRIu64"\n", x + 18334599312639636657ULL);
}

Haskell standalone version (run with runhaskell kiss.hs):

-- AUTHOR: Shinobu (zuttobenkyou.wordpress.com)
-- LICENSE: PUBLIC DOMAIN
{-# LANGUAGE RecordWildCards #-}

module KISS where

import Data.Array.Unboxed
import Data.Bits
import Data.List
import Data.Word
import System.Random (RandomGen(..))

type U64 = Word64

data KISSRNG = KISSRNG
	{ kissMWCArray :: UArray Int U64
	, kissMWCArraySize :: U64
	, kissMWCIndex :: U64
	, kissMWCCarry :: U64
	, kissCNG :: U64
	, kissXSHF :: U64
	}

kissStateSize :: U64
kissStateSize = 0xfff

roundCNG :: U64 -> U64
roundCNG cng = 6906969069 * cng + 13579

roundXSHF :: U64 -> U64
roundXSHF = round3 . round2 . round1
	where
	round1 b = xor b (unsafeShiftL b 13)
 	round2 b = xor b (unsafeShiftR b 17)
	round3 b = xor b (unsafeShiftL b 43)

roundB64MWC :: KISSRNG -> KISSRNG
roundB64MWC kiss@KISSRNG{..} = kiss
	{ kissMWCArray = array'
	, kissMWCIndex = index'
	, kissMWCCarry = carry'
	}
	where
	index' = (kissMWCIndex + 1) .&. (kissMWCArraySize - 1)
	x = kissMWCArray ! (fromIntegral index')
	t = unsafeShiftL x 28 + kissMWCCarry
	carry' = unsafeShiftR x 36 - (if t < x then 1 else 0)
	array' = kissMWCArray // [(fromIntegral index', t - x)]

makeKISSRNG :: U64 -> KISSRNG
makeKISSRNG seed = KISSRNG
	{ kissMWCArray = array (0, (fromIntegral kissStateSize - 1)) $ zip [0..] kissArray
	, kissMWCArraySize = kissStateSize
	, kissMWCIndex = kissStateSize - 1
	, kissMWCCarry = 0
	, kissCNG = cngWarmed
	, kissXSHF = xshfWarmed
	}
	where
	-- seed the MWC array with the Congruential and XOR-Shift RNG's
	(kissArray, cngWarmed, xshfWarmed) = foldl' step ([], seedCNG, seedXSHF) [0..(kissStateSize - 1)]
	step (ary, cng, xshf) _ = ((cng' + xshf'):ary, cng', xshf')
		where
		cng' = roundCNG cng
		xshf' = roundXSHF xshf
	-- default Congruential RNG seed
	seedCNG = 123456789987654321
	-- default XOR-Shift RNG seed
	seedXSHF = 362436069362436069

randKISS :: KISSRNG -> (U64, KISSRNG)
randKISS k = (kissMWC + cng + xshf, k' {kissCNG = cng, kissXSHF = xshf})
	where
	k' = roundB64MWC k
	kissMWC = (kissMWCArray k') ! (fromIntegral $ kissMWCIndex k')
	cng = roundCNG $ kissCNG k
	xshf = roundXSHF $ kissXSHF k

instance Show KISSRNG where
	show KISSRNG{..} = "kissMWC: [kissMWC..]"
		++ "\nkissMWCIdx: " ++ show kissMWCIndex ++ "\n"
		++ "kissMWCArray!!Idx: " ++ show (kissMWCArray ! (fromIntegral kissMWCIndex)) ++ "\n"
		++ "kissMWCArray!!Head: " ++ show (kissMWCArray ! 0) ++ "\n"
		++ "kissMWCArray!!Last: " ++ show (kissMWCArray ! (fromIntegral $ kissStateSize - 1)) ++ "\n"
		++ "kissMWCCarry: " ++ show kissMWCCarry ++ "\n"
		++ "kissCNG: " ++ show kissCNG ++ "\n"
		++ "kissXSHF: " ++ show kissXSHF ++ "\n"

instance RandomGen KISSRNG where
	next rng = (fromIntegral n, rng')
		where
		(n, rng') = randKISS rng
	split rng = (rng1, rng2)
		where
		rng1 = warmupXSHF rng
		rng2 = warmupMWC rng
	genRange _ = (0, 0xffffffffffffffff)

warmupRNG :: RandomGen g => g -> Int -> g
warmupRNG g rounds = foldl' warmup g [1..rounds]
	where
	warmup g' _ = snd $ next g'

warmupMWC :: KISSRNG -> KISSRNG
warmupMWC rng = roundB64MWC rng

warmupXSHF :: KISSRNG -> KISSRNG
warmupXSHF rng = rng { kissXSHF = roundXSHF $ kissXSHF rng}

main :: IO ()
main = do
	let rng = makeKISSRNG 0
	putStrLn $ show rng

EDIT May 3, 2012: Sorry about the somewhat redundant-looking code listings, but I’m too lazy/busy to clean them up.
EDIT May 4, 2012: Alternate period number for the Mersenne Twister, for easier comparison of period size with KISS.

A Better Way to Prevent Cheating for Online Games

Forrest Smith has written an article about the problem of rampant cheating (http://altdevblogaday.com/2012/04/02/extravagant-cheating-via-direct-x/) on PC games. I’ve wanted to address this topic because it online cheating has affected my gaming experience in the past (although not any longer because programming has taken up most, if not all, of my gaming time). Cheating on PC games can be seen as a classic Black Hat vs. White Hat battle that never ends. Forrest mentions data analysis as a tool to measure the likelihood of a player using one or more cheats to gain an unfair advantage (which should be noticeable in terms of raw game data). I’d like to add a few things.

I agree that data analysis is probably the best way to detect cheaters. This is because it is possible to detect players who “toggle” subtle cheats that are difficult to notice (e.g., realistic aimbot, or wallhack) — these players use these cheats for 2 minutes then turn them off, and then on for another 2 minutes, then off again, and so on. While it would be difficult for humans to detect such players, computers, with enough data points, could easily see consistent, significant fluctuations in game data and raise a red flag for further analysis by a human.

Though data analysis is excellent, I think that the amount of computation needed (and also the amount of raw data needed) to automatically (and intelligently) analyze every player, in real time, is beyond the current state of technology, or at least economic feasibility. Even if trusted supercomputers could crunch player data, no game company is going to throw in thousands, if not millions, of dollars just to check for cheaters — especially if the game is over two or three years old.

The best way to prevent cheating is to use a network of trust. This is the magic phrase that comes up over and over again for those who are engaged in real security work. Public key cryptography is an excellent example. You could easily have a server that uses the current state-of-the-art public-key cryptography to ensure that, for example, only players who share at least 2 other players in their trusted keyring can log in to play. The effect of this is that you are simulating real human behavior — people tend to avoid taking risks (in this case, the possibility of a ruined gaming experience because of cheating) with complete strangers. You could even enforce draconian measures, such as, “if a trusted person has been confirmed as a cheater, all other associated players will have their trust level reduced.” Such a rule would severely dampen the spread of new cheats, as the banned player will be prevented from playing on trusted servers, and all of his real-life friends who added him to their keyring will have their “trustworthiness” level reduced. In short, real-life peer pressure would dramatically affect the game.

Sadly, the only problem with the network of trust is that it prevents millions of players from playing against each other immediately, because, let’s face it, the vast majority of online players one encounters are complete strangers. Still, I think that this method is a viable option; I am 99.999999% sure that there are thousands, if not tens of thousands, of players out there who would be willing to sacrifice maxed-out (full player) servers in order to enjoy hours of trusted gaming that guarantees a system that no one in the world, not even world governments, is able to hack.

UPDATE April 21, 2012: Fix grammar/typos.

Easy Solutions to Hard Problems?

I’ve been trudging along knee-deep in tree manipulation algorithms recently, all in Haskell. I admit, calling them “algorithms” makes me sound much smarter than I really am… what I actually worked on was converting one type of tree into another type of tree (functionA) and then back again (functionB), such that I would get back the original tree — all in a purely functional, stateless, recursive way. My brain hurt a lot. I hit roadblock after roadblock; still, I managed to go through each hurdle by sheer effort (I never studied computer science in school).

See, I have a habit of solving typical Haskell coding problems in a very iterative fashion. Here are the steps I usually follow:

  1. Research the problem domain (Wikipedia, StackOverflow, etc.)
  2. Take lots of loose notes on paper (if drawing is required), or on the computer (using emacs’ Org-mode).
  3. Write temporary prototype functions to get the job done.
  4. Test said function with GHCi.
  5. Repeat Steps 1 through 4 until I am satisfied with the implementation.

The steps usually work after one or two iterations. But for hard problems, I would end up going through many more (failed) iterations. Over and over again, hundreds of lines of Haskell would get written, tested, then ultimately abandoned because their flawed design would dawn on me halfway through Step 3. Then I would get burned out, and spend the rest of the day away from the computer screen, doing something completely different. But on the following day, I would cook up a solution from scratch in an hour.

It’s such a strange feeling. You try really hard for hours, days, weeks even, failing again and again. Then, after a brief break, you just figure it out, with less mental effort than all the effort you put in previously.

What can explain this phenomenon? The biggest factor is obviously the learning process itself — that is, it takes lots of failures to familiarize yourself intimately with the problem at hand. But the way in which the solution comes to me only after a deliberate pause, a complete separation from the problem domain, fascinates me. I call it the “Pause Effect” (PE), because I’m too lazy to dig up the right psychological term for this that probably exists already.

So, here’s my new guideline for solving really hard problems:

  1. Try to solve problem in a “brute-force” manner. Don’t stop until you burn out. I call this the “Feynman Step”, after a quote from the televised “Take the World from Another Point of View” (1973) interview, where, toward the end, he describes a good conversation partner — someone who has gone as far as he could go in his field of study. Every time I’m in this step, I think of how animated Feynman gets in his interview, and it fires me up again.
  2. Rest one or two days, then come back to it. This is the “PE Step”.

The best part is when you ultimately find the solution — it feels incredible. You get this almost intoxicating feeling of empowerment in yourself and in your own abilities.

Shinobu’s Haskell Coding Style (SHCS)

I’ve been coding in Haskell for a couple years. Now, Haskell does not really have an established coding style, like C (e.g., the excellent Linux Kernel Coding Style (LKCS)). This is partly due to the extremely flexible nature of the indentation syntax alone — Haskell enjoys the unique honor of having significant whitespace, but at the same time allowing the use of optional curly braces and semicolons for “imperative-styled” do-notation (do-notation is quite common in Haskell). I present to you my own coding style, aptly named “Shinobu’s Haskell Coding Style” — it stresses, above all, consistency and predictable indentation. Since I love the LKCS so much, I decided to use tabs — how you set the actual length of the tab is up to you and your editor-of-choice, but I personally use 4-space-wide tabs.

A picture is worth a thousand words, so here is a “picture” of SHCS in action, with ample pseudocode for your reading pleasure:

{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE OverloadedStrings #-}
module Foo.Bar where

import Some.Standard.System.Library
import Some.Standard.System.Library2 (func)

import Foo
import Quux

-- | A haddock-compatible comment.
funcWithLongTypeSignature
	:: Eq a
	=> Int
	-> Int
	-> Int
	-> Int
	-> IO Int
funcWithLongTypeSignature a b c d = do
	let a' = foo a
	e <- blah a' b
	let
		c' = foo c
		d' = foo d
	f <- bleh c' d'
	-- A comment about (someTest f). Blah blah blah blah blah blah blah blah
	-- blah blah blah.
	when (someTest f)
		. func1
		. func2
		$ func3 arg
	when (someTest2 f)
		. func1
		. func2
		. func3
		. func4
		$ arg
	return $ e + f
	where
	e a b = do
		...
		return x
	f c d = case convert c of
		Z n -> return n
		_ -> return y
		where
		-- Break up 80+ character lines into parts.
		convert c = hoolaaloolaa'
			. hoolaaloolaa''
			. hoolaaloolaa'''
			$ hoolaaloolaa'''' c
		-- This version is also acceptable, if it increases legibility:
		--
		-- convert c = hoolaaloolaa'
		--	. hoolaaloolaa''
		--	. hoolaaloolaa'''
		--	. hoolaaloolaa''''
		--  $ c

someLongList :: [(Int, String)]
someLongList =
	[ (1, "a")
	, (2, "b")
	, (3, "c")
	, (4, "d")
	, (5, "e")
	, (6, "f")
	]

listComprehension :: [Int]
listComprehension =
	[ (a, b)
	| a <- [1..]
	, b <- [2..]
	, a + b < 100
	]

baz :: Int -> Int -> Char
baz a b c
	| a == 0 = 'o'
	| b == 100 = if True
		then if True
			then 'p'
			else 's'
		else 'q'
	| c == 22 = 'z'
	| otherwise = f 'k'
	where
	f c = blah $ bleh c


longStringConcatenation :: String
longStringConcatenation = "The quick "
	++ "fox "
	++ "jumps "
	++ "over "
	++ "the "
	++ "lazy "
	++ "evaluation."

data Geometry = Geometry
	{ geoGeo :: Int
	, geoGeoGeo :: Double
	} deriving (Eq)

instance Show Geometry where
	show Geometry{..} = show geoGeo
		++ show geoGeoGeo
		++ "!"

recordConstruction :: Geometry
recordConstruction = Geometry
	{ geoGeo = 1
	, geoGeoGeo = 2.0
	}

data SomeEnum
	= SEA
	| SEB
	| SEC
	| SED
	| SEE
	deriving (Eq, Enum, Ord, Show)

quaff :: SomeEnum -> Int
quaff s = case s of
	SEA -> 1
	SEB -> 2
	SEC -> 3
	SED -> 4
	SEE -> 5

WordPress renders the indentation tabs as spaces — sorry about that. Anyway, my greatest “triumph” was when I discovered that you could write the “where” clause at the same level of indentation as all the functions below it (e.g., functionWithLongTypeSignature). This reduced a lot of my code’s indentation, because I prefer to use where clauses wherever possible.

Here are the rules:

  • Place LANGUAGE extensions at the beginning of the file, with one extension per line.
  • Import standard, “system” modules first on its own block. Add all of your own modules on their own block below it. List all modules in alphabetical order.
  • 80 character limit. Try very hard to keep lines within 80 characters. This rule is especially pertinent to type signatures. Speaking of which…
  • Write explicit type signatures for all top-level functions. You may also want to write type signatures for functions underneath “where” clauses (but I think you have to use the ScopedTypeVariables extension to do this).
  • Use the “leading comma” rule when writing long lists or data structure definitions (e.g., the “Geometry” example above).
  • Indent long string concatenations (++).
  • Use GHC with at least the -Wall and -Werror flags.
  • When writing a case expression for a type that is like a C enum (e.g., SomeEnum above), explicitly write all the possible enumeration symbols for that type, instead of writing (n – 1) of them and using the “_” catch-all syntax at the end.
  • Use spaces between arithmetic components (e.g., “(n + 1)” instead of “(n+1)”.
  • Avoid use of curly braces/semicolons for do-notation blocks.
  • Avoid use of let statements, and instead use where clauses. The use of a where clause forces you to write in a top-down manner, with the “overall picture” first, followed by smaller component functions. I’d rather be faced with a one-liner function followed by 100 helper functions under a where clause, rather than first seeing the 100 helper functions in a gigantic let expression, followed by the one-liner.
  • Try to avoid nesting where clauses as much as possible.
  • When confronted with a long line, break it up into smaller parts (by function, if you’re composing many of them, or by argument, if it’s just one function with many arguments) and indent the “tail” of these parts, each on their own line (see “convert c” in the example).
  • Prefer to use the dollar symbol operator ($) instead of using parentheses. E.g., (foo $ bar x) instead of (foo (bar x)).
  • Instead of (foo (bar (baz (quux x)))), write (foo . bar . baz $ quux x). I believe this is an “unstated” style rule among Haskellers, as it is. And so I formally state it here.
  • Catch-all rule: if the code is short enough, it is optional whether you write it in one line or multiple (indented) lines (e.g., list comprehensions, lists, etc.)

These are just a handful of rules, but they go a long way to make things look consistent. And, because the indentation is predictable, it makes the code easier to read.

On Help Vampires

A help vampire (HV) is someone who asks for help to resolve a problem, especially on a forum or blog post, but:

  • does not read the forum posts/blog post,
  • provides little, if any, information about his/her particular situation (even if it’s drastically different than the one presented in the forum topic/blog post), and
  • provides no evidence of having spent even one ounce of energy to solve the problem, or to even narrow it down.

On top of this, a healthy HV usually:

  • has poor spelling and grammar,
  • (if in an e-mail thread) always top-posts,
  • communicates in terse statements, forcing others to assume 9 out of 10 variables to be a certain way in order for their statements, taken as a whole, to make sense.

I’ve been around in the Linux/FOSS world for several years, and I’ve seen many HVs. You can usually tell that you’re dealing with a HV because you’ll need to communicate with them with about 5 or 6 posts to actually get them to pose the right question — to just get to square 1. And by then, you’re so into the “yay, I’m helping someone!” feeling that you fail to realize the vampire fangs already dug deep inside your neck, sucking out every drop of your blood…

I see myself as a pretty nice person, because, I’ve helped out even the worst HVs, with the faint hope that one day, they would see the light and turn back into sane humans again. But, I think helping HVs is basically the same as feeding a troll — they will never learn.

Enough is enough. I refuse to waste time with people who cannot form coherent ideas in their heads!

There is so much information out there on the internet — with some simple double-quoting skills on google, you can find the solution to most things very quickly. For 99% of people, the answers are already out there, 99% of the time.

I declare, I will never feed the help vampires, on this blog or anywhere else!

P.S. I’m reminded of a reverse-HV situation, where the original poster of a forum thread, posts a very well thought-out question, but is met with answers from a mob of semi-illiterate people (some with very high post counts) who have either (1) not read the entirety of the OP’s post, or (2) not taken their medication. I’m glad that I’ve never had to face such mobs in my own threads…

Problems with the Portable Game Notation (PGN) Standard

I was trying to write a toy program to parse chess PGN (Portable Game Notation) files, to automate chess analysis using the free chess engine Stockfish. It turns out that I really, really hate PGN.

Parsing PGN Requires a Legal Chess Move Generator

This is the biggest problem with PGN. It uses SAN (Standard Algebraic Notation, or just “algebraic notation”) to encode chess moves in a game. However, for almost inexplicable reasons, it decided to NEVER use disambiguation files/ranks whenever possible. For example, if there are two knights that can move to e1, but one of them is pinned by an enemy piece (moving this knight would expose the king to a check and is thus an illegal move), PGN states that the move must be encoded as “Ne1”.

Thanks to this, it is impossible to know what “Ne1” means any time you see it in PGN, unless you have a legal chess move generator to disambiguate it. The irony is that the disambiguation characters, though mentioned in the PGN standard (“8.2.3.4 Disambiguation”), do not help you disambiguate at all! If the standard just said “use Smith Notation“, the requirement of any chess knowledge would not be necessary; each recorded move would be the only possible legal move anyway, and hence, would not require the use of a full-fledged move generator, and you’d only need to keep track of where the pieces are as you move along.

Other Problems

I’ve said 99% of what I wanted to say… But! There are still other problems with PGN that annoy me and probably every single other person out there who have tried to write a program to parse PGN data:

  • Two different formats: There are two PGN formats, with different stylistic requirements: “PGN Export” and “PGN Import” formats. I cannot find any reason why there should be two different formats. Just have 1 format, and call it “The PGN Standard, 2.0”!
  • Stateful parsing: This is the result of the choice of SAN and disambiguation characters I mentioned above. You have to keep track of game state (where all the pieces are) in order to parse a sequence of moves. Technically speaking, even the use of Smith Notation requires that you keep track of where the pieces are, and hence, is stateful, so there is room for improvement here.
  • It’s set in stone: The standard was last revised in 1994, nearly 20 years ago. Some parts of the standard remain undefined to this day.
  • ASCII encoding: The world uses Unicode, and there ARE players whose names use non-English characters, such as European players, Russian players, Chinese players, etc.
  • Confusion of human-friendliness and computer-friendliness: The use of SAN for the Movetext section was probably designed for easy legibility for humans. But there are other designs that make it extremely human-unfriendly, notably NAGs (Numeric Annotation Glyphs), which require that you use symbols like “$3” for the traditional “!!” move comment.
  • 80-character limit: The widespread “PGN Export” format requires you to limit every line to 80 characters. This is again due to confusion between human-friendliness and computer-friendliness. Computers would much rather have lines of text that are broken up semantically (e.g., pretty-printed XML or YAML). The case for human-friendliness is weak, because any better-than-Notepad text editor has sensible soft line-breaking. And no one encodes PGN by hand with a text editor anyway (everyone uses one of the many free PGN editing programs out there).
  • 255-character limit: The “PGN Import” format requires that a line must be less than 255 characters long.
  • Messy use of newlines: This character is used to break up each line to 80 or 255 characters, and is hence can be ignored when parsing. Right? Wrong. It is also used to separate two PGN games from each other. Ugh!
  • Confusion between content and style: Section “3.2.1: Byte equivalence” states, “For a given PGN data file, export format representations generated by different PGN programs on the same computing system should be exactly equivalent, byte for byte.” So, this makes even whitespace significant (an extra space character will violate the standard)… even though it doesn’t play a huge role (most of the time; see “Messy use of newlines” above).
  • Seven Tag Roster: There are 7 tag pairs, or key-value pairs, that are mandatory for a game for “archival storage”. Here they are:
    1. Event (the name of the tournament or match event)
    2. Site (the location of the event)
    3. Date (the starting date of the game)
    4. Round (the playing round ordinal of the game)
    5. White (the player of the white pieces)
    6. Black (the player of the black pieces)
    7. Result (the result of the game)

    And, these tags must appear in the above order (again, confusion between content and style). Technically, these tags can have empty values, so from a parsing viewpoint, having these tags with empty values is the same thing as not having these tags at all. And some of these tags are very awkward for some situations (e.g., the “Site” tag, which is geographic in nature, doesn’t really apply for the case of correspondence chess.)

  • Conflict between tag pairs and Movetext: The “PlyCount” tag, for example, is redundant and does nothing but introduce errors in a PGN file. (What would you, as programmer, trust — the PlyCount tag, or the actual Movetext section? So why have the “PlyCount” tag in the first place?) The “Result” tag is also redundant, because it is supposed to be present as a “Game Termination Marker” after the Movetext section. Yet another tag that you have to check for validity…
  • Clumsy date tag: The “Date” tag uses a “YYYY.MM.DD” pattern to record the date. There is no way to disambiguate the order of games with the same opponent for those 5-minute blitz games that everyone plays on the internet.

A New, Sane Standard…

If it were me, I’d do away with PGN entirely… there are just too many problems. Here are some ideas for a better chess game recording standard:

  • Only require a list of moves. Do not require certain “tag pairs” such as the “Seven Tag Roster”. Make all additional data in addition to the list of actual game moves optional. Because, really, a game of chess is, essentially, a sequence of moves.
  • Use of “Full Notation” for recording moves. I hereby declare my support for a new notation, called “Full Notation”: each move records what piece is moving, what piece is captured (if any), what piece a pawn was promoted to (if any), whether it is a normal move, a castling move, or an en passant capture, and the starting and destination squares of the moving piece. For castling moves, it records the starting and destination squares for both the King and Rook (with maybe aliases “O-O” and “O-O-O” for traditional orthodox castling moves), for dead-simple clarity. This removes the need for a legal move generator when parsing the moves, and also gets rid of the need for disambiguation characters entirely. It also removes any need to keep track of where the pieces are. Simplicity always wins.
  • State which variant of chess we are playing, since there are many popular ones now, such as Chess960.
  • Design every feature to be computer-friendly, not human-friendly. Nobody writes raw PGN by hand from scratch, so there’s no concern for “alienating” any existing PGN users.
  • Maybe separate the actual game moves from the variations/commentary, to allow for easier basic parsing. The recursive structure used by PGN for defining move variations has been wildly successful (perhaps PGN’s only redeeming feature), and it makes sense to adopt this property. But I’m not sure what is the simplest way to represent the actual moves played vs. the variations/commentary.
  • Be very conservative against stylistic features (e.g., PGN’s Numeric Annotation Glyphs are an excellent example of what NOT to do).
  • Use XML or YAML (probably YAML). This would make it 10x easier to parse the game info, regardless of your programming-language-of-choice. This also automatically makes the standard computer-friendly.
  • Use Unicode.

The above ideas would certainly make the format more “verbose” and require more disk space. But in the age of hard drives in the hundreds-of-gigabytes range, I think it makes a lot of sense to sacrifice the extra kilobytes per game to achieve simplicity. Besides, few people keep more than, say, 10,000 files in a plaintext format such as PGN. I would be delighted to see Scid, PyChess or any other free program adopt these ideas to create a new chess game recording standard… The most important feature I’d like to see would be the lack of the need for a legal move generator when parsing the moves. This alone would make the parsing 100x easier.

UPDATES:

  • January 27, 2012: Don’t suggest the use of long algebraic notation. Instead, support a new notation called “Full Notation”. Also clarify some points about keeping track of game state.