Problems with the Portable Game Notation (PGN) Standard

I was trying to write a toy program to parse chess PGN (Portable Game Notation) files, to automate chess analysis using the free chess engine Stockfish. It turns out that I really, really hate PGN.

Parsing PGN Requires a Legal Chess Move Generator

This is the biggest problem with PGN. It uses SAN (Standard Algebraic Notation, or just “algebraic notation”) to encode chess moves in a game. However, for almost inexplicable reasons, it omits the disambiguating file/rank characters whenever it possibly can. For example, if there are two knights that can move to e1, but one of them is pinned by an enemy piece (moving this knight would expose the king to a check and is thus an illegal move), PGN states that the move must be encoded as plain “Ne1”, with no disambiguation.

Thanks to this, it is impossible to know what “Ne1” means any time you see it in PGN, unless you have a legal chess move generator to disambiguate it. The irony is that the disambiguation characters, though mentioned in the PGN standard (“8.2.3.4 Disambiguation”), do not help you disambiguate at all! If the standard had just said “use Smith Notation”, no chess knowledge would be necessary: each recorded move would spell out its own starting and destination squares, so you would not need a full-fledged move generator, and you’d only need to keep track of where the pieces are as you move along.
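
To make the “Ne1” problem concrete, here is a minimal Python sketch. The position (white knights on c2 and f3) and the helper names are made up for illustration. It only tracks where the knights are, which is exactly the amount of state a parser would like to get away with:

```python
# Squares are (file, rank) pairs with 0-based indices: e1 -> (4, 0).
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def square(name):
    """Convert a name like 'e1' into a (file, rank) pair."""
    return (ord(name[0]) - ord("a"), int(name[1]) - 1)


def knight_candidates(knights, target):
    """Return the knights that could jump to target, ignoring pins."""
    target_file, target_rank = square(target)
    found = []
    for name in knights:
        knight_file, knight_rank = square(name)
        if (target_file - knight_file, target_rank - knight_rank) in KNIGHT_JUMPS:
            found.append(name)
    return found


# Hypothetical position: white knights on c2 and f3, movetext says "Ne1".
candidates = knight_candidates(["c2", "f3"], "e1")
print(candidates)  # both knights qualify
```

Piece tracking alone leaves two candidates. Picking the right one means checking which knight is pinned, and that check is the legal move generator I was hoping to avoid.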

Other Problems

I’ve said 99% of what I wanted to say… But! There are still other problems with PGN that annoy me and probably every single other person out there who has tried to write a program to parse PGN data:

  • Two different formats: There are two PGN formats, with different stylistic requirements: “PGN Export” and “PGN Import” formats. I cannot find any reason why there should be two different formats. Just have 1 format, and call it “The PGN Standard, 2.0”!
  • Stateful parsing: This is the result of the choice of SAN and disambiguation characters I mentioned above. You have to keep track of game state (where all the pieces are) in order to parse a sequence of moves. Technically speaking, even the use of Smith Notation requires that you keep track of where the pieces are, and hence, is stateful, so there is room for improvement here.
  • It’s set in stone: The standard was last revised in 1994, nearly 20 years ago. Some parts of the standard remain undefined to this day.
  • ASCII encoding: The world uses Unicode, and there ARE players whose names use non-English characters, such as European players, Russian players, Chinese players, etc.
  • Confusion of human-friendliness and computer-friendliness: The use of SAN for the Movetext section was probably designed for easy legibility for humans. But there are other designs that make it extremely human-unfriendly, notably NAGs (Numeric Annotation Glyphs), which require that you use symbols like “$3” for the traditional “!!” move comment.
  • 80-character limit: The widespread “PGN Export” format requires you to limit every line to 80 characters. This is again due to confusion between human-friendliness and computer-friendliness. Computers would much rather have lines of text that are broken up semantically (e.g., pretty-printed XML or YAML). The case for human-friendliness is weak, because any better-than-Notepad text editor has sensible soft line-breaking. And no one encodes PGN by hand with a text editor anyway (everyone uses one of the many free PGN editing programs out there).
  • 255-character limit: The “PGN Import” format requires that a line must be less than 255 characters long.
  • Messy use of newlines: This character is used to break up each line to 80 or 255 characters, and hence can be ignored when parsing. Right? Wrong. It is also used to separate two PGN games from each other. Ugh!
  • Confusion between content and style: Section “3.2.1: Byte equivalence” states, “For a given PGN data file, export format representations generated by different PGN programs on the same computing system should be exactly equivalent, byte for byte.” So, this makes even whitespace significant (an extra space character will violate the standard)… even though it doesn’t play a huge role (most of the time; see “Messy use of newlines” above).
  • Seven Tag Roster: There are 7 tag pairs, or key-value pairs, that are mandatory for a game for “archival storage”. Here they are:
    1. Event (the name of the tournament or match event)
    2. Site (the location of the event)
    3. Date (the starting date of the game)
    4. Round (the playing round ordinal of the game)
    5. White (the player of the white pieces)
    6. Black (the player of the black pieces)
    7. Result (the result of the game)

    And, these tags must appear in the above order (again, confusion between content and style). Technically, these tags can have empty values, so from a parsing viewpoint, having these tags with empty values is the same thing as not having these tags at all. And some of these tags are very awkward for some situations (e.g., the “Site” tag, which is geographic in nature, doesn’t really apply for the case of correspondence chess.)

  • Conflict between tag pairs and Movetext: The “PlyCount” tag, for example, is redundant and does nothing but introduce errors in a PGN file. (What would you, as a programmer, trust: the PlyCount tag, or the actual Movetext section? So why have the “PlyCount” tag in the first place?) The “Result” tag is also redundant, because the result is supposed to be present as a “Game Termination Marker” after the Movetext section. Yet another tag that you have to check for validity…
  • Clumsy date tag: The “Date” tag uses a “YYYY.MM.DD” pattern to record the date. There is no way to disambiguate the order of games with the same opponent for those 5-minute blitz games that everyone plays on the internet.
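
Here is a rough Python sketch of the bookkeeping that the Seven Tag Roster and the redundant “Result” tag force on a parser. The function names, the sample game, and the simplified regular expression (it ignores escaped quotes inside tag values) are my own assumptions, not anything prescribed by the standard:

```python
import re

SEVEN_TAG_ROSTER = ["Event", "Site", "Date", "Round", "White", "Black", "Result"]

# Matches one tag pair line such as: [White "Alice"]
# Simplification: escaped quotes inside the value are not handled.
TAG_PAIR = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def parse_tags(lines):
    """Return (name, value) pairs in file order from tag pair lines."""
    tags = []
    for line in lines:
        match = TAG_PAIR.match(line.strip())
        if match:
            tags.append((match.group(1), match.group(2)))
    return tags


def check_game(tag_lines, movetext):
    """Collect problems a parser has to look for before trusting the tags."""
    tags = parse_tags(tag_lines)
    names = [name for name, _ in tags]
    values = dict(tags)
    problems = []
    if names[:7] != SEVEN_TAG_ROSTER:
        problems.append("Seven Tag Roster missing or out of order")
    # The last token of the movetext is the game termination marker.
    tokens = movetext.split()
    marker = tokens[-1] if tokens else None
    if "Result" in values and values["Result"] != marker:
        problems.append("Result tag disagrees with termination marker")
    return problems


# Made-up game whose Result tag contradicts the termination marker.
tag_lines = [
    '[Event "Casual game"]',
    '[Site "?"]',
    '[Date "2012.01.27"]',
    '[Round "-"]',
    '[White "Alice"]',
    '[Black "Bob"]',
    '[Result "1-0"]',
]
print(check_game(tag_lines, "1. e4 e5 2. Nf3 1/2-1/2"))
```

None of these checks tell you anything about the game itself. They exist only because the format stores the same fact in two places and cares about tag order.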

A New, Sane Standard…

If it were me, I’d do away with PGN entirely… there are just too many problems. Here are some ideas for a better chess game recording standard:

  • Only require a list of moves. Do not require certain “tag pairs” such as the “Seven Tag Roster”. Make all data beyond the list of actual game moves optional. Because, really, a game of chess is, essentially, a sequence of moves.
  • Use of “Full Notation” for recording moves. I hereby declare my support for a new notation, called “Full Notation”: each move records what piece is moving, what piece is captured (if any), what piece a pawn was promoted to (if any), whether it is a normal move, a castling move, or an en passant capture, and the starting and destination squares of the moving piece. For castling moves, it records the starting and destination squares for both the King and Rook (with maybe aliases “O-O” and “O-O-O” for traditional orthodox castling moves), for dead-simple clarity. This removes the need for a legal move generator when parsing the moves, and also gets rid of the need for disambiguation characters entirely. It also removes any need to keep track of where the pieces are. Simplicity always wins.
  • State which variant of chess we are playing, since there are many popular ones now, such as Chess960.
  • Design every feature to be computer-friendly, not human-friendly. Nobody writes raw PGN by hand from scratch, so there’s no concern for “alienating” any existing PGN users.
  • Maybe separate the actual game moves from the variations/commentary, to allow for easier basic parsing. The recursive structure used by PGN for defining move variations has been wildly successful (perhaps PGN’s only redeeming feature), and it makes sense to adopt this property. But I’m not sure what the simplest way is to represent the actual moves played vs. the variations/commentary.
  • Be very conservative against stylistic features (e.g., PGN’s Numeric Annotation Glyphs are an excellent example of what NOT to do).
  • Use XML or YAML (probably YAML). This would make it 10x easier to parse the game info, regardless of your programming-language-of-choice. This also automatically makes the standard computer-friendly.
  • Use Unicode.
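
As a rough illustration of the “Full Notation” idea, here is a Python sketch of what one move record could look like. Every field name and the `captured_material` helper are hypothetical choices of mine, just one possible shape for the data:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FullMove:
    """One move record; every field name here is a hypothetical choice."""
    piece: str                      # moving piece: "K", "Q", "R", "B", "N" or "P"
    start: str                      # starting square, e.g. "e2"
    dest: str                       # destination square, e.g. "e4"
    kind: str = "normal"            # "normal", "castling" or "en_passant"
    captured: Optional[str] = None  # captured piece, if any
    promoted_to: Optional[str] = None  # promotion piece, if any
    rook_squares: Optional[Tuple[str, str]] = None  # rook (start, dest) when castling


PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def captured_material(moves):
    """Sum conventional values of captured pieces, one total per side.

    Moves alternate White, Black, White, ... so even indices are White's.
    No board is needed, because each record names the captured piece.
    """
    totals = {"white": 0, "black": 0}
    for index, move in enumerate(moves):
        if move.captured is not None:
            side = "white" if index % 2 == 0 else "black"
            totals[side] += PIECE_VALUES[move.captured]
    return totals


# 1. e4 d5 2. exd5 Qxd5 written out as full records.
game = [
    FullMove("P", "e2", "e4"),
    FullMove("P", "d7", "d5"),
    FullMove("P", "e4", "d5", captured="P"),
    FullMove("Q", "d8", "d5", captured="P"),
]
print(captured_material(game))
```

The helper never builds a board or generates a move. With SAN, answering even this simple question means replaying the whole game with a legal move generator.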

The above ideas would certainly make the format more “verbose” and require more disk space. But in the age of hard drives in the hundreds-of-gigabytes range, I think it makes a lot of sense to sacrifice the extra kilobytes per game to achieve simplicity. Besides, few people keep more than, say, 10,000 files in a plaintext format such as PGN. I would be delighted to see Scid, PyChess or any other free program adopt these ideas to create a new chess game recording standard… The most important feature I’d like to see would be the lack of the need for a legal move generator when parsing the moves. This alone would make the parsing 100x easier.

UPDATES:

  • January 27, 2012: Don’t suggest the use of long algebraic notation. Instead, support a new notation called “Full Notation”. Also clarify some points about keeping track of game state.

What do aliens look like?

Excuse the semi-random title, but this question has been bugging me for a while. Ever since I was a kid and saw movies about space and aliens, I asked the question, “would real aliens really look like that?” Let’s face it, mainstream film and art culture tends to portray aliens as humanoid life forms. How much of a humanoid form can be drawn from reasonable, educated guesses? And how much of it is just plain fantasy?

NOTE: I did not do any real research before writing this post, and am making educated guesses off the top of my head. If you are an expert at any of this stuff (astrobiology in particular), please critique this post in the comments!

Before we examine what intelligent aliens could look like, let’s first look at the most basic conditions of where life can arise, if at all. There are some very intelligent guesses as to the cosmic ingredients of life. Let us examine each ingredient in turn.

The habitable zone

All life, as far as we know, has one function: changing one form of energy into another. There must be a steady supply of energy that can be consumed. For us, it’s the Sun, mostly, although there is also life at the ocean depths that draws energy from thermal vents (and even these vents draw their energy from the heat of the Earth’s molten core, which comes in part from radioactive decay). I think it’s safe to say that there must be a good source of steady radiation for there to be life. Now, the best way to get a steady source of radiant energy is by orbiting a star. Stars, if of the right size, have very long lifespans (on the order of billions of years!). But orbiting a star also has another tremendous benefit: you get to stay at a stable location in space. If there were no Sun to orbit, the Earth would be hurtling across space in some random direction (and it would freeze over rather quickly).

So, there needs to be a sun-like star. The next thing necessary for life is probably a planet-sized blob of rock with some water and an atmosphere. The planet-scale size, water, and atmosphere really go hand-in-hand and can’t exist meaningfully without each other. Let us start with water. Water is necessary for life because it is the best chemically neutral “solvent” where many different chemical reactions can take place freely. Water also has the special property of being less dense as a solid than as a liquid, which keeps it from freezing over too easily; e.g., the icebergs of the Arctic float around and melt when they reach warmer waters. If the ice were to simply sink the moment it froze, such melting would never occur. The rocky composition of the planet is required because it provides the only way to keep the water in a stable place (what we call oceans).

Now, water itself is a very precious substance: if you are designing a planet, you’d need a way to stop water from evaporating away into space. This is because the radiation from the nearby sun could slowly “boil” the water away, one molecule at a time. So, first, you’d need some sort of protective shield around the planet to keep the water safe. The Earth uses a magnetic field to do this, and thankfully the Earth is large enough to generate a strong enough magnetic field. This is the reason why our alien world would need to be planet-sized. Also, a planet can, by virtue of its size, retain most of the water with its gravitational field; if water molecules turn into separate hydrogen and oxygen gas molecules, their escape into outer space could be slowed down significantly, so that such evaporation would take many millions (or billions?) of years.

So we’ve established that you need water, and a planet to keep it. What about atmosphere? Well, the atmosphere plays a very important role on Earth: it shields us from too much radiation. Yes, the Earth’s magnetic field protects us from harmful radiation, but not all of it. You still get a lot of radiation from the sun itself. For us on Earth, the ozone layer plays a big role in protecting us from direct radiation. And so it is with our hypothetical alien world: it, too, would need an atmosphere of some sort. Plus, if you have large bodies of water (oceans) together with an atmosphere, you could get a water cycle, so that the water gets spread across dry land, allowing for land-dwelling organisms.

Land dweller

Sun, rocky planet, water, and atmosphere. OK. Now comes what I think is probably the most controversial point: the intelligent alien would be a land-based animal. To support this hypothesis, I will first get rid of the other two alternatives: flying (winged) animals and marine (ocean-dwelling) animals. First, why can’t winged animals become intelligent? My guess is that intelligence evolved from the ability to manipulate nature in accurate, reproducible ways; i.e., we could create tools with our hands, and this separated the stupid and the less-stupid. Flying animals, by the laws of physics that govern our universe, cannot be too large. In fact, the smaller the better. This is why most flying organisms are insects. Also, because nature strives for brutal simplicity whenever it can, chances are that if you find a winged creature, it will only have wings, not wings and arms. And if you only have wings, then you won’t be able to grab things in the accurate manner required to create tools.

Well, what about marine animals? The problem is basically the same as that of winged animals: you will have fins, not arms or hands, because you need to swim. Dolphins of the future could have IQs that are 5000x higher than ours, but without a means to manipulate nature around them with precision instruments (e.g., hands, opposable thumbs, etc.), they will be forever doomed to the same patterns of behavior as their 50 IQ ancestors. If you think about it, marine animals are just like winged animals, except that their “wind” is the water. And as for marine animals that dwell on the sea floor, I also think that this category cannot produce any intelligent life, for the following reason: the presence of water shortens the distance between all potential predators and prey. You need as big a distance as possible between you and the predator to survive as an intelligent life form, because this is the only way you can show Nature your intelligent decision-making skills and collect enough evolution points. If sharks within a 2-mile radius can sense your presence, and there are no trees to “climb up” to avoid them, there is very, very little time to make any decisions. This is probably why the dominant evolutionary design of sea-floor dwellers (other than fish) is that of crustaceans, with their natural armored shells. This short predator-prey distance, or PPD, would also explain why dolphins cannot truly sleep like we do.

Sense organs and appearance

So that leaves us with land-dwelling animals. We can take all the clues on our own planet to make some very good guesses in this final category. My only lament is that these guesses make the hypothetical intelligent alien very, very boring (and very humanoid). First, our alien must have limbs. All land-dwelling creatures on Earth have limbs, because limbs provide the best way to move about across a hard surface inside a low-density medium (the atmosphere). Let’s throw in hands here as well, because hands, as stated above, are the best natural tools to manipulate Nature in an accurate way. Since the two-hand, two-feet design is probably the best (and simplest) way to have hands and limbs for mobility, we’ll just adopt the “two hands, two feet” design. Next, the alien must have eyes. Eyes are one of the most primitive and basic organs (even some single-celled organisms have light-sensitive eyespots), and provide the best bang-for-the-buck in terms of the information they can gather. Light is the fastest medium of information, and also the most prevalent (the alien Sun would provide a constant stream of light, as would any moons orbiting the planet). Light allowed our own human ancestors to maximize their PPD, because they could spot predators a mile away by just their color (this is impossible to do in the ocean), and it also allowed early human hunters to communicate with each other through abstract symbols in a manner that only they, not the prey, could understand (what we now call hand signals), to maximize their intelligent decision-making skills.

Hands, feet, and eyes. OK. What about a mouth and nose? Well, the mouth is certainly mandatory: how else can you eat food? There must be at least one opening for the nutrients to enter, and so there must be at least one mouth. As for the nose, for us humans it allows us to keep a well-salivated mouth, because we can breathe through our nose (which is dry by default); if we had to always breathe through our mouths, we would have a much harder time keeping our saliva flowing and ready to eat a meal when we chanced upon it. Our alien’s mouth would also have to be salivated with some chemicals to help it chew things and swallow the fine pieces as a single whole (imagine swallowing dry leaves with zero saliva… not pleasant). The nose also functions as a primitive poison detector (e.g., for rotting food); if a species had to use its mouth to test something out every time it sat down for a meal, it would die out pretty fast.

The alien will also have ears, probably. This is because they allow the organism to sense things when the eyes can’t (e.g., at night). Awareness of the environment is 99.9% of the evolution game, so anything to maximize information-gathering senses helps to keep the species from going extinct.

Eyes, mouth, nose, and ears. Amazing! Our alien probably has all of these things. Now comes an interesting question: does the alien have a face like we do? Chances are, I think, yes. First, the mouth has to go at the very bottom, because this is the only way of keeping the eyes above it. Why do the eyes need to be placed above the mouth? It’s because of gravity: the alien, like us, will pick up food from the ground (a freshly killed prey animal, any collection of food from multiple sources, etc.) more than from trees or other high places. Also, water will almost always be at ground level. Having the eyes above the mouth allows us to look out for potential threats or opportunities while we eat. The alien nose will also probably find a spot between the eyes and mouth, because it can’t be below the mouth or above the eyes. If below the mouth, then it would simply get in the way when eating. The nose can’t be above the eyes because of gravity again: anything expelled from the nose would hinder the eyes. So the nose goes between the eyes and mouth. As for the eyes themselves, there would be at least two of them, because having two eyes provides highly accurate depth perception, and also acts as a primitive form of “insurance”: you can lose one eye but still go on your way. Lastly, the ears are probably located on the sides, on opposite poles, because that’s the best way to capture as many sound waves as possible from different directions. Since our alien has a “face”, the ears would have to go on a different axis; the simplest would be to have the ears on the left and right, not the front (face) and back.

Our hypothetical alien probably has a face and ears like ours. But does it also have a brain behind the face, on a “head”? Could it instead have a face and brain in the “torso”, if it has one? Well, I think the alien would also have a brain inside a “head” with a face, much like ours. Why? First, keeping the bulk of the sensory organs (eyes, nose, mouth, ears) in one place is great because it allows us to protect them from harmful threats in the most efficient way. If we had our sensory organs spread apart everywhere, it would be very difficult to protect them all at once. If we just cover our heads with our hands and arms, we can protect almost all of our important sensory organs quite effectively. By covering your face alone with one hand, you protect your eyes, nose, and mouth. Impressive, don’t you think? Thus, the alien will also have a head for the face. Having a head provides another benefit: it allows maximum use of the other limbs and torso. You can put them underwater, inside mud, or whatever, and it has no detrimental effect on your main senses (eyes, ears, etc.).

Now, what about the brain? Is it also inside the head of the alien? Most likely, yes. This is because the center “torso” area of the alien will already be filled up with older, more primitive organs such as a heart, digestive organs, etc. And, chances are that the brain of its more stupid ancestors would have been very close to its major sensory organs such as its eyes, for reasons of simplicity (why have extra-long neurons that go from the eyes to some far-flung brain when you can have short neurons with a brain close by?).

Conclusion

My conclusion is that intelligent aliens will look very humanoid in form. And chances are, their Earth will look very similar to ours, with an atmosphere and oceans. I don’t think this prospect is boring at all. Just imagine — they too will have a word for “water”, and “Sun” and “Earth”, as well as “face”, “eyes”, etc. The more you think about it, the more fascinating it gets.

The only unpleasant part about it is that, if we ever do meet such aliens, they will freak us out with their warped humanlike appearance. But hey, at least the freakout will go both ways, right?