On Help Vampires

A help vampire (HV) is someone who asks for help to resolve a problem, especially on a forum or blog post, but:

  • does not read the forum posts/blog post,
  • provides little, if any, information about his/her particular situation (even if it’s drastically different from the one presented in the forum topic/blog post), and
  • provides no evidence of having spent even one ounce of energy to solve the problem, or to even narrow it down.

On top of this, a healthy HV usually:

  • has poor spelling and grammar,
  • (if in an e-mail thread) always top-posts,
  • communicates in terse statements, forcing others to assume 9 out of 10 variables to be a certain way in order for their statements, taken as a whole, to make sense.

I’ve been around in the Linux/FOSS world for several years, and I’ve seen many HVs. You can usually tell that you’re dealing with an HV because you’ll need to exchange about 5 or 6 posts with them just to get them to pose the right question, that is, just to get to square 1. And by then, you’re so into the “yay, I’m helping someone!” feeling that you fail to realize the vampire fangs have already dug deep into your neck, sucking out every drop of your blood…

I see myself as a pretty nice person, because I’ve helped out even the worst HVs, with the faint hope that one day they would see the light and turn back into sane humans again. But I think helping HVs is basically the same as feeding a troll: they will never learn.

Enough is enough. I refuse to waste time with people who cannot form coherent ideas in their heads!

There is so much information out there on the internet. With some simple double-quoting skills on Google, you can find the solution to most things very quickly. For 99% of people, the answers are already out there, 99% of the time.

I declare, I will never feed the help vampires, on this blog or anywhere else!

P.S. I’m reminded of a reverse-HV situation, where the original poster of a forum thread posts a very well thought-out question, but is met with answers from a mob of semi-illiterate people (some with very high post counts) who have either (1) not read the entirety of the OP’s post, or (2) not taken their medication. I’m glad that I’ve never had to face such mobs in my own threads…

Problems with the Portable Game Notation (PGN) Standard

I was trying to write a toy program to parse chess PGN (Portable Game Notation) files, to automate chess analysis using the free chess engine Stockfish. It turns out that I really, really hate PGN.

Parsing PGN Requires a Legal Chess Move Generator

This is the biggest problem with PGN. It uses SAN (Standard Algebraic Notation, or just “algebraic notation”) to encode chess moves in a game. However, for almost inexplicable reasons, it decided to omit the disambiguating file/rank whenever possible: they appear only when two or more pieces of the same type can legally move to the same square. For example, if there are two knights that can move to e1, but one of them is pinned by an enemy piece (moving this knight would expose the king to a check and is thus an illegal move), PGN states that the move must be encoded as “Ne1”.

Thanks to this, it is impossible to know what “Ne1” means any time you see it in PGN, unless you have a legal chess move generator to disambiguate it. The irony is that the disambiguation characters, though mentioned in the PGN standard (“8.2.3.4 Disambiguation”), do not help you disambiguate at all! If the standard just said “use Smith Notation”, no chess knowledge would be required: each recorded move names its starting and destination squares, so it can only mean one thing. You would not need a full-fledged move generator; you’d only need to keep track of where the pieces are as you move along.
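
To make the “Ne1” problem concrete, here is a minimal Python sketch. The board representation (a dict from square names to piece letters) and the function names are my own assumptions for illustration, not anything from the PGN standard. It looks for knights using geometry alone, which is all a parser without a move generator can do:

```python
# Assumption: a board is a dict from square names to piece letters
# (uppercase = White, lowercase = Black). This is a toy representation.
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def square_to_xy(square):
    """Convert a square name like 'e1' to zero-based (file, rank)."""
    return ord(square[0]) - ord("a"), int(square[1]) - 1


def xy_to_square(x, y):
    """Convert zero-based (file, rank) back to a square name."""
    return chr(ord("a") + x) + str(y + 1)


def knight_candidates(board, target, piece="N"):
    """Return every square holding `piece` that is a knight's jump from `target`.

    Geometry only: this knows nothing about pins or checks.
    """
    tx, ty = square_to_xy(target)
    found = []
    for dx, dy in KNIGHT_JUMPS:
        x, y = tx + dx, ty + dy
        if 0 <= x < 8 and 0 <= y < 8:
            square = xy_to_square(x, y)
            if board.get(square) == piece:
                found.append(square)
    return sorted(found)


# White: king h1, knights d3 and f3. Black: king g8, bishop d5.
# The bishop on d5 pins the f3 knight against the king on h1.
board = {"h1": "K", "d3": "N", "f3": "N", "g8": "k", "d5": "b"}

print(knight_candidates(board, "e1"))  # ['d3', 'f3']
```

Both knights come back as candidates, yet PGN would record this move as plain “Ne1”, because only the d3 knight can legally move. Picking the right one requires pin detection, which is to say, a legal move generator.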

Other Problems

I’ve said 99% of what I wanted to say… But! There are still other problems with PGN that annoy me and probably every single other person out there who has tried to write a program to parse PGN data:

  • Two different formats: There are two PGN formats, with different stylistic requirements: “PGN Export” and “PGN Import” formats. I cannot find any reason why there should be two different formats. Just have 1 format, and call it “The PGN Standard, 2.0”!
  • Stateful parsing: This is the result of the choice of SAN and disambiguation characters I mentioned above. You have to keep track of game state (where all the pieces are) in order to parse a sequence of moves. Technically speaking, even the use of Smith Notation requires that you keep track of where the pieces are, and hence, is stateful, so there is room for improvement here.
  • It’s set in stone: The standard was last revised in 1994, nearly 20 years ago. Some parts of the standard remain undefined to this day.
  • ASCII encoding: The world uses Unicode, and there ARE players whose names use non-English characters, such as European players, Russian players, Chinese players, etc.
  • Confusion of human-friendliness and computer-friendliness: The use of SAN for the Movetext section was probably designed for easy legibility for humans. But there are other designs that make it extremely human-unfriendly, notably NAGs (Numeric Annotation Glyphs), which require that you use symbols like “$3” for the traditional “!!” move comment.
  • 80-character limit: The widespread “PGN Export” format requires you to limit every line to 80 characters. This is again due to confusion between human-friendliness and computer-friendliness. Computers would much rather have lines of text that are broken up semantically (e.g., pretty-printed XML or YAML). The case for human-friendliness is weak, because any better-than-Notepad text editor has sensible soft line-breaking. And no one encodes PGN by hand with a text editor anyway (everyone uses one of the many free PGN editing programs out there).
  • 255-character limit: The “PGN Import” format requires that a line must be less than 255 characters long.
  • Messy use of newlines: This character is used to break up each line to 80 or 255 characters, and hence can be ignored when parsing. Right? Wrong. It is also used to separate two PGN games from each other. Ugh!
  • Confusion between content and style: Section “3.2.1: Byte equivalence” states, “For a given PGN data file, export format representations generated by different PGN programs on the same computing system should be exactly equivalent, byte for byte.” So, this makes even whitespace significant (an extra space character will violate the standard)… even though it doesn’t play a huge role (most of the time; see “Messy use of newlines” above).
  • Seven Tag Roster: There are 7 tag pairs, or key-value pairs, that are mandatory for a game for “archival storage”. Here they are:
    1. Event (the name of the tournament or match event)
    2. Site (the location of the event)
    3. Date (the starting date of the game)
    4. Round (the playing round ordinal of the game)
    5. White (the player of the white pieces)
    6. Black (the player of the black pieces)
    7. Result (the result of the game)

    And, these tags must appear in the above order (again, confusion between content and style). Technically, these tags can have empty values, so from a parsing viewpoint, having these tags with empty values is the same thing as not having these tags at all. And some of these tags are very awkward for some situations (e.g., the “Site” tag, which is geographic in nature, doesn’t really apply for the case of correspondence chess.)

  • Conflict between tag pairs and Movetext: The “PlyCount” tag, for example, is redundant and does nothing but introduce errors in a PGN file. (What would you, as a programmer, trust: the PlyCount tag, or the actual Movetext section? So why have the “PlyCount” tag in the first place?) The “Result” tag is also redundant, because the result is supposed to be present as a “Game Termination Marker” after the Movetext section. Yet another tag that you have to check for validity…
  • Clumsy date tag: The “Date” tag uses a “YYYY.MM.DD” pattern to record the date. There is no way to disambiguate the order of games with the same opponent for those 5-minute blitz games that everyone plays on the internet.
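
The “Conflict between tag pairs and Movetext” item above is the kind of thing every parser ends up checking by hand. Here is a rough Python sketch of such a check. It is deliberately simplified and the function names are mine: it assumes a single game, ignores comments and variations in the Movetext, and does not handle escaped quotes inside tag values.

```python
import re

# Matches a tag pair line such as: [Result "1-0"]
TAG_PAIR = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')
TERMINATION_MARKERS = ("1-0", "0-1", "1/2-1/2", "*")


def parse_tags(pgn_text):
    """Collect the tag pairs found in the header lines of a single game."""
    tags = {}
    for line in pgn_text.splitlines():
        match = TAG_PAIR.match(line.strip())
        if match:
            tags[match.group(1)] = match.group(2)
    return tags


def termination_marker(pgn_text):
    """Return the final Movetext token if it is a termination marker, else None."""
    movetext_lines = [
        line for line in pgn_text.splitlines()
        if line.strip() and not line.lstrip().startswith("[")
    ]
    tokens = " ".join(movetext_lines).split()
    if tokens and tokens[-1] in TERMINATION_MARKERS:
        return tokens[-1]
    return None


def result_is_consistent(pgn_text):
    """True if the Result tag agrees with the Game Termination Marker."""
    return parse_tags(pgn_text).get("Result") == termination_marker(pgn_text)


# A made-up game whose Result tag contradicts its termination marker.
GAME = """[Event "Casual blitz"]
[Result "1-0"]

1. e4 e5 2. Qh5 Ke7 3. Qxe5# 0-1
"""

print(result_is_consistent(GAME))  # False
```

If the result lived in only one place, none of this code would need to exist.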

A New, Sane Standard…

If it were me, I’d do away with PGN entirely… there are just too many problems. Here are some ideas for a better chess game recording standard:

  • Only require a list of moves. Do not require certain “tag pairs” such as the “Seven Tag Roster”. Make all data beyond the list of actual game moves optional. Because, really, a game of chess is, essentially, a sequence of moves.
  • Use of “Full Notation” for recording moves. I hereby declare my support for a new notation, called “Full Notation”: each move records what piece is moving, what piece is captured (if any), what piece a pawn was promoted to (if any), whether it is a normal move, a castling move, or an en passant capture, and the starting and destination squares of the moving piece. For castling moves, it records the starting and destination squares for both the King and Rook (with maybe aliases “O-O” and “O-O-O” for traditional orthodox castling moves), for dead-simple clarity. This removes the need for a legal move generator when parsing the moves, and also gets rid of the need for disambiguation characters entirely. It also removes any need to keep track of where the pieces are. Simplicity always wins.
  • State which variant of chess we are playing, since there are many popular ones now, such as Chess960.
  • Design every feature to be computer-friendly, not human-friendly. Nobody writes raw PGN by hand from scratch, so there’s no concern for “alienating” any existing PGN users.
  • Maybe separate the actual game moves from the variations/commentary, to allow for easier basic parsing. The recursive structure used by PGN for defining move variations has been wildly successful (perhaps PGN’s only redeeming feature), and it makes sense to adopt this property. But I’m not sure what the simplest way is to represent the actual moves played vs. the variations/commentary.
  • Be very conservative against stylistic features (e.g., PGN’s Numeric Annotation Glyphs are an excellent example of what NOT to do).
  • Use XML or YAML (probably YAML). This would make it 10x easier to parse the game info, regardless of your programming-language-of-choice. This also automatically makes the standard computer-friendly.
  • Use Unicode.
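
To give a feel for what a “Full Notation” move record might look like, here is a sketch in Python. I have not pinned down a concrete format, so every field name below is a placeholder assumption. The point is only that each record describes itself completely, and that a program wanting a board can update one by pure bookkeeping, with no chess rules involved:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FullMove:
    piece: str                         # moving piece, e.g. "N" (uppercase = White)
    start: str                         # starting square, e.g. "d3"
    dest: str                          # destination square, e.g. "e1"
    kind: str = "normal"               # "normal", "castle", or "en_passant"
    captured: Optional[str] = None     # captured piece, if any
    promotion: Optional[str] = None    # piece a pawn is promoted to, if any
    rook_start: Optional[str] = None   # castling only
    rook_dest: Optional[str] = None    # castling only


def apply_move(board, move):
    """Return a new board dict with `move` applied. No legality checks at all."""
    new_board = dict(board)
    if move.kind == "castle":
        # Lift both pieces before placing them, so overlapping squares
        # (possible in Chess960) cannot clobber each other.
        rook = new_board.pop(move.rook_start)
        new_board.pop(move.start)
        new_board[move.dest] = move.piece
        new_board[move.rook_dest] = rook
        return new_board
    new_board.pop(move.start)
    if move.kind == "en_passant":
        # The captured pawn stands on the destination file, starting rank.
        new_board.pop(move.dest[0] + move.start[1])
    new_board[move.dest] = move.promotion or move.piece
    return new_board


castle = FullMove("K", "e1", "g1", kind="castle", rook_start="h1", rook_dest="f1")
print(apply_move({"e1": "K", "h1": "R"}, castle))  # {'g1': 'K', 'f1': 'R'}
```

Compare this with the knight example from the SAN discussion: `FullMove("N", "d3", "e1")` can only ever mean one thing.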

The above ideas would certainly make the format more “verbose” and require more disk space. But in the age of hard drives in the hundreds-of-gigabytes range, I think it makes a lot of sense to sacrifice the extra kilobytes per game to achieve simplicity. Besides, few people keep more than, say, 10,000 files in a plaintext format such as PGN. I would be delighted to see Scid, PyChess or any other free program adopt these ideas to create a new chess game recording standard… The most important feature I’d like to see would be the lack of the need for a legal move generator when parsing the moves. This alone would make the parsing 100x easier.

UPDATES:

  • January 27, 2012: Don’t suggest the use of long algebraic notation. Instead, support a new notation called “Full Notation”. Also clarify some points about keeping track of game state.

Website Passwords: A Big Mess

TL;DR: You will inevitably be screwed when you try to change your website passwords.

So a few months ago, I changed all of my website passwords. I used a simple pseudorandom ASCII-only character generator to ensure the uniqueness of each one. In the process, I discovered that many websites have horrible, broken password interfaces.

This post is mainly a rant. Setting and changing passwords should never be difficult, and should be 100% transparent. We end users have probably collectively wasted millions of hours with broken password interfaces, and will waste millions more until the issues below are addressed each time someone deploys a new website.

Special Characters

Many websites give you a list of special characters that are not allowed in passwords. Sadly, this list is often incomplete. Worse still, some only accept alphanumeric passwords but are silent as to this restriction, and to top it off, they don’t even bother to tell you why your chosen password is invalid! The gall.

It appears that the restriction against special characters is largely a matter of legacy vs. modern platforms. Newer websites like Wikipedia allow you to choose any character from a US ASCII keyboard. Many older institutions (Bank of America, for example) have very strange special character restrictions, which almost seem arbitrary (did you know that Bank of America calls passwords “passcodes”?).

What needs to be done: At a minimum, allow input of ALL characters from a US ASCII keyboard: [a-zA-Z0-9], all punctuation characters, and spaces. (Tabs are impossible to type into a text field in some browsers, so they can be excused.)
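
For what it’s worth, the rule above takes only a few lines to implement. Here is a sketch in Python, with names of my own choosing. It returns the offending characters, so the site can tell the user exactly what is wrong instead of a bare “Invalid Password”:

```python
import string

# Letters, digits, punctuation, and the space character:
# the 95 printable ASCII characters. Tab is deliberately left out.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " ")


def disallowed_characters(password):
    """Return the distinct characters in `password` that are not allowed, sorted."""
    return sorted(set(password) - ALLOWED)


print(disallowed_characters("correct horse! {battery} ~staple"))  # []
print(disallowed_characters("tab\there"))                         # ['\t']
```

An empty list means the password passes the character check.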

Password Length

This is the biggest problem. Roughly half of the websites I have passwords for impose a maximum character limit. Some even enforce a 12-character limit (socalgas.com is one example). Some enforce a 16-character limit (bugs.freedesktop.org, login.live.com). Barnesandnoble.com has a 15-character limit (no spaces allowed, alphanumeric only).

But the best part is this: many of these sites do not tell you about this limit. So, you can spend 5 or 10 minutes thinking out a great mnemonic device for a fantastic password, and you’ll get hit with some “Invalid Password” error. Yet another well-meaning user slapped in the face.

Many sites are fixated on only preventing 3- or 4-character passwords by implementing an interactive “password strength” meter that rejects short passwords. But they still fail to tell you that your password is too long.

EDIT: Bela pointed out in the comments another common bug: the site will happily accept your chosen password, but will truncate it to a shorter length (without giving you any warning about it, of course).

What needs to be done: Explicitly tell the user exactly how many characters they may use, and if the password is too long, tell them about it.
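
Here is roughly what I mean, as a Python sketch. The limits of 8 and 64 are made-up policy values and the function name is hypothetical. What matters is that the message states both the actual length and the limit, and that a too-long password is rejected out loud rather than silently truncated:

```python
MIN_LENGTH = 8    # hypothetical policy values, chosen only for illustration
MAX_LENGTH = 64


def check_length(password, min_length=MIN_LENGTH, max_length=MAX_LENGTH):
    """Return (ok, message), where the message spells out the exact limits."""
    n = len(password)
    if n < min_length:
        return False, f"Password is {n} characters; the minimum is {min_length}."
    if n > max_length:
        return False, f"Password is {n} characters; the maximum is {max_length}."
    return True, f"Password length is fine ({min_length} to {max_length} characters allowed)."


# A 50-character password against a 30-character limit:
print(check_length("x" * 50, max_length=30))
# (False, 'Password is 50 characters; the maximum is 30.')
```

The same limits should of course be applied by every form that accepts the password, the log-in form included.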

Stupidity Award: access.enom.com

If you change your password at this site, be extremely careful: DO NOT choose a password that is more than 30 characters long. When I changed my password to a 50-character password, it happily accepted it. Unfortunately, the actual log-in interface only lets you type in 30 characters! Since access.enom.com has no contact information, you’ll have to call someone somewhere somehow to sort out this mess.

Realistic Outlook

Legacy systems are really, really hard to migrate out of. My prediction is that the stupid, broken web interfaces will continue to thrive for at least 20 years. Why? It’s because people in 2031 will still be using passwords that are around 10 characters long with mostly alphanumeric symbols. Sure, web standards will have evolved by that time, but human brains will still be the same. The steady flow of 10-character passwords by the overwhelming majority of users will ensure that legacy systems remain competitive, at least when it comes to dealing with passwords.

Hopefully, by 2111, we’ll have sane password interfaces for all websites. Perhaps it will become a web standard by then, enforced by an international e-court, or some such.

On Sleep

The recent wave of interest in so-called polyphasic sleep has become very pervasive. I have to admit that even I took a keen interest in it around the time I started this blog. I’ve read countless articles on sleep, and now, I’ve come to the following conclusion: the problem is not figuring out the bare minimum of hours of sleep you need each day; rather, the real problem is figuring out how you can be more active/productive while awake. If you cannot confidently say “I spend most of my awake-time doing productive things”, then there is no point in “hacking” your day so that you sleep fewer hours; what’s the point of buying more books if you cannot finish reading the ones you have already?

I think most people initially accept the proposition that if you sleep less, you’ll get more done. The common one-liner you’ll read in most articles goes something along the lines of “even if you shave just 1 hour off of your usual sleeping cycle, this meager saving will net you X extra years of life!”

This kind of thinking is plain wrong. Unless you are narcoleptic, sleeping too much is NOT the problem. The real problem is figuring out what you are going to do while you are awake. You can sleep 6 hours, 4 hours, 2 hours even, in a single day. Nobody cares. The only thing you should care about is how you spend your time.

Now, I do understand that for some people, spending more hours out of bed will guarantee more productivity. But this is only true if the pending task is very simple and does not require much decision-making. Unfortunately (or fortunately, rather), very few people in the modern world today share the same working career as child laborers did at the dawn of the Industrial Age. Unless a supervisor directs all of your attention to predefined goals from the minute you wake up to the minute you go to bed, so that you are reduced to a drone working for the Hive, sleeping less does NOT mean increased productivity.

Besides, there are far more important problems to tackle than “hacking” your brain to get less sleep:

  • How can I spend more quality time with my family?
  • How can I stay healthy?
  • How can I improve my relationship with my partner?
  • How can I improve my mind?
  • How can I conquer my fears?
  • How can I stop procrastinating?
  • …etc.

None of these important questions involve sleep as a material part of the solution. Sleep, my friend, is irrelevant. All of the questions above instead have the goal of increasing the quality of life. So don’t fool yourself into thinking that sleeping less is an integral part of self-improvement. Sleep, like everything else in life that can be objectively quantified, must be taken in moderation.

FYI: Text Files from Linux to Windows XP Notepad

Sometimes when you open a text file in Windows XP that was created by someone using Linux (e.g., an open source project developer), you’ll get a really ugly block of solid text, with the lines all wrapped around each other. For a long time, I used to get ticked off at whoever had created this file. Well, now that I’ve gotten into programming (Ruby a while back, and now, C++), I’ve realized that it’s really Windows’ fault, not Linux’s.

In Linux, if you press ENTER to type text on a new line, the program inserts an invisible character called a newline character. This character is represented as ‘\n’ by the program. When you open up the text file later, the text editor will look at all the newline characters, and create a new line each time it finds them.

In Windows XP, Notepad (and other simple applications) expects two characters to denote a new line: the carriage return character, ‘\r’, followed by the regular ‘\n’. A file with bare ‘\n’ characters therefore shows up as one solid block. Isn’t this redundant and unnecessary? If we’re editing a text file, the only thing we need to denote the beginning of the next line of text is a newline character, not a carriage return AND a newline.
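
If you are stuck with Notepad, converting a file is easy enough. Here is a small Python sketch (the function names are mine) that turns Linux-style line endings into the ‘\r\n’ pairs Notepad wants, and back again:

```python
def to_windows_newlines(text):
    """Convert any mix of LF and CRLF line endings to CRLF."""
    # Normalize to LF first so existing CRLF pairs are not doubled up.
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


def to_unix_newlines(text):
    """Convert CRLF line endings to LF."""
    return text.replace("\r\n", "\n")


linux_text = "first line\nsecond line\n"
print(repr(to_windows_newlines(linux_text)))
# 'first line\r\nsecond line\r\n'
```

When reading or writing the actual file from Python, pass newline="" to open() so that Python does not translate the line endings a second time.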

So take your frustration to Windows. Linux wins once again.

Evolutionary, Not Revolutionary

Take a look at this article. And this one. And this one. And this one. And this one.

The common thread is this: the authors are all from the venerable http://www.anandtech.com tech news/reviews website, and they all express themselves with the play on the words “evolutionary” and “revolutionary.”

Oh, kill me now.

When I first encountered this combination of words some years ago (from Anandtech), I thought, “Ha, that’s cute — but is it really necessary to use two six-syllable words to convey that idea? Can’t they just say that the product is or is not very, oh I don’t know — innovative?” It just seems a little bit stupid to work the reader through the hurdles of e-vo-lu-tion-ar-y and re-vo-lu-tion-ar-y. Certainly, it doesn’t look, or sound, very intelligent either as far as the author(s) are concerned.

Besides, it’s not a case where one word out of the pair is confused for the other. Stylistically, that’s when such combinations really shine. For example — “apposite” vs “opposite”. The two are completely different in meaning (almost antonyms, really), and are sometimes confused to mean the other because of their 1-vowel difference. Another such pair is “accept” and “except.” And maybe even “accrete” and “accrue.” But evolutionary and revolutionary, when used together in the sense of “hey, reader, check this out,” is just annoying. And it’s moronic if you keep repeating it over and over again, like the folks at Anandtech. Lastly, it gets a little bit retarded if you keep mentioning the same pair to say the same message every time — that it’s evolutionary, not revolutionary (and never the other way around).

UPDATE August 27, 2008: OK, I just realized now that Anandtech isn’t the only source of the horrible wordplay. Google evolutionary “not revolutionary” and see for yourself.

“Leon” (1994) and “Most Wanted” (1997) – Plagiarism!

Summary: Most Wanted (1997) copied many, many parts out of Leon (1994) — action sequence for action sequence, camera angle to camera angle.

I watched a 5-minute clip of Most Wanted on TV a couple weeks ago, and it bothered me so much that I have to let it out here in this post.

The scene I watched had the main character escape from some cops in an upstairs apartment room. This action sequence copied so many things from Leon that I was left with disgust. So here are all the things that were copied from Leon’s last action sequence:

  • when the main character gets almost shot by sniper fire from a nearby building, through the windows (and just before this happens, the main character sees the sunlight reflect off of the lens of a sniper rifle’s scope — which is exactly what the main character Leon describes to Matilda in Leon when he is teaching her about sniping); in Leon, the main character gets almost shot by sniper fire when he tries to save his potted plant
  • after some policemen raid the apartment room, the main character shoots all of them, and in response, an outside policeman says something like “man down!”; in Leon, after Leon takes out all of the first wave of policemen, a policeman outside the room says to his radio transmitter, “man down, man down,” and Stansfield (the villain) responds “I told ya”
  • the main character dresses up into a dead (or unconscious?) policeman’s uniform (SWAT uniform), and escapes; in Leon, Leon switches clothes to escape the hundreds of SWAT policemen lined up outside his apartment room

Seriously, the director of Most Wanted, David Hogan, copied all of this out of Luc Besson’s Leon. Shame on him. Yes Mr. Hogan, you have an artistic license as a director, or réalisateur as they say in French, to draw inspiration from primary source material. But to copy everything down to a line in a 5-minute sequence is just too much. It’s unacceptable. That’s why I flipped channels instead of deciding to sit down and finish the movie, which seemed interesting at first.

(By the way, Leon is probably my all-time favorite movie for its blend of comedy, tragedy, action, suspense, and even a touch of romance. There’s no movie quite like it.)