Adding Replay Gain in Linux Automatically

UPDATE February 1, 2010: This post is now somewhat obsolete, as mp3gain supports writing to ID3v2 tags instead of APE tags. Read the warning below.

UPDATE July 17, 2010: WARNING: mp3gain version 1.5.1 (latest stable command line version) seems to corrupt ID3v2 tag information with the “-s i” option (write info into ID3v2 tags instead of default APE tags). For me, it seems to corrupt the JPEG image data inside the ID3v2 tag for some reason. If you care about the existing ID3v2 tags in your mp3 files, do NOT use mp3gain on them with the “-s i” option (the “-s a” option, which is the default and only writes APE tags, does no harm if you only care about APE tags). A good tool for checking your tag info (ID3v2, APE, etc.) is MP3 Diags (if you’re on Arch Linux, get it here). So for now, the method below is still my preferred way of doing things (mp3gain with APE and then ape2id3.py).

Replay gain tags in mp3, flac, and ogg files let the player adjust the volume so that a song sounds neither too loud nor too soft. The Music Player Daemon (aka MPD), my favorite music player, recognizes replay gain tags if present. However, for mp3 files, the popular APE format for replay gain tags is not supported; MPD can only read replay gain values from ID3 tags in mp3s. Luckily, an mp3 file can have both APE and ID3 tags. This means that we can use mp3gain (a cute, simple command-line app available in pretty much every Linux distro) to add APE tags into our mp3s with:

mp3gain [file(s)]

This will add APE replay gain tags into the mp3 file(s) chosen. Then, we can use a simple script to read these APE replay gain tags, convert them into ID3 tags, and put these ID3 tags into the same mp3s. Such a script, thankfully, already exists here. Now, the mp3 file will have both APE and ID3 tags for replay gain values! And MPD will happily use the ID3 values.

If you’re in a hurry, you can automate this process for entire directories, recursively. That’s what this most excellent page on replay gain in Linux suggests. (You should REALLY bookmark that page, as it has hands down the best information about replay gain in Linux.) I’ve modified the metaflac scripts there to make them work with mp3s instead. Here are the two scripts:

Script A:

#!/bin/bash
# Define error codes
ARGUMENT_NOT_DIRECTORY=10
FILE_NOT_FOUND=11
# Check that the argument passed to this script is a directory.
# If it's not, then exit with an error code.
if [ ! -d "$1" ]
then
    echo -e "\033[1;37;44m Arg $1 is NOT a directory!\033[0m"
    exit $ARGUMENT_NOT_DIRECTORY
fi
echo -e "\033[1;37m********************************************************\033[0m"
echo -e "\033[1;37mCalling tag-mp3-with-rg.sh on each directory in:\033[0m"
echo -e "\033[1;36m$1\033[0m"
echo ""
find "$1" -type d -exec ~/syscfg/shellscripts/replaygain/mp3/tag-mp3-with-rg.sh '{}' \;

Script B (the ‘tag-mp3-with-rg.sh’ script referenced above in Script A):

#!/bin/bash
# Error codes
ARGUMENT_NOT_DIRECTORY=10
FILE_NOT_FOUND=11
# Check that the argument passed to this script is a directory.
# If it's not, then exit with an error code.
if [ ! -d "$1" ]
then
    echo -e "\033[1;37mArg $1 is NOT a directory!\033[0m"
    exit $ARGUMENT_NOT_DIRECTORY
fi
# Count the number of mp3 files in this directory.
# Only names ending in .mp3 are counted, matching the *.mp3 glob used below.
mp3num=$(ls "$1" | grep -c '\.mp3$')
# If no mp3 files are found in this directory,
# then exit without error.
if [ "$mp3num" -lt 1 ]
then
    echo -e "\033[1;33m$1 \033[1;37m--> (No mp3 files found)\033[0m"
    exit 0
else
    echo -e "\033[1;36m$1 \033[1;37m--> (\033[1;32m$mp3num\033[1;37m mp3 files)\033[0m"
fi
# Run mp3gain on the mp3 files in this directory.
echo -e ""
echo -e "\033[1;37mForcing (re)calculation of Replay Gain values for mp3 files and adding them as APE2 tags into the mp3 file...\033[0m"
echo -e ""
# first delete any APE replay gain tags in the files
mp3gain -s d "$1"/*.mp3
# add fresh APE tags back into the files
mp3gain "$1"/*.mp3
echo -e ""
echo -e "\033[1;37mDone.\033[0m"
echo -e ""
echo -e "\033[1;37mAdding ID3 tags with the same calculated info from above...\033[0m"
echo -e ""
# the -d is for debug messages if there are any errors, and the -f is for overwriting any existing ID3 replay gain tags
~/syscfg/shellscripts/replaygain/mp3/ape2id3.py -df "$1"/*.mp3
echo -e ""
echo -e "\033[1;37mDone.\033[0m"
echo -e ""
echo -e "\033[1;37mReplay gain tags (both APE and ID3) successfully added to this directory.\033[0m"
echo -e ""

And here is the APE to ID3 conversion script from the link above (the ‘ape2id3.py’ script called from Script B):

#! /usr/bin/env python

import sys
from optparse import OptionParser

import mutagen
from mutagen.apev2 import APEv2
from mutagen.id3 import ID3, TXXX

def convert_gain(gain):
   if gain[-3:] == " dB":
       gain = gain[:-3]
   try:
       gain = float(gain)
   except ValueError:
       raise ValueError("invalid gain value")
   return "%.2f dB" % gain

def convert_peak(peak):
   try:
       peak = float(peak)
   except ValueError:
       raise ValueError("invalid peak value")
   return "%.6f" % peak

REPLAYGAIN_TAGS = (
   ("mp3gain_album_minmax", None),
   ("mp3gain_minmax", None),
   ("replaygain_album_gain", convert_gain),
   ("replaygain_album_peak", convert_peak),
   ("replaygain_track_gain", convert_gain),
   ("replaygain_track_peak", convert_peak),
)

class Logger(object):
   def __init__(self, log_level, prog_name):
       self.log_level = log_level
       self.prog_name = prog_name
       self.filename = None

   def prefix(self, msg):
       if self.filename is None:
           return msg
       return "%s: %s" % (self.filename, msg)

   def debug(self, msg):
       if self.log_level >= 4:
           print(self.prefix(msg))

   def info(self, msg):
       if self.log_level >= 3:
           print(self.prefix(msg))

   def warning(self, msg):
       if self.log_level >= 2:
           print(self.prefix("WARNING: %s" % msg))

   def error(self, msg):
       if self.log_level >= 1:
           sys.stderr.write("%s: %s\n" % (self.prog_name, msg))

   def critical(self, msg, retval=1):
       self.error(msg)
       sys.exit(retval)

class Ape2Id3(object):
   def __init__(self, logger, force=False):
       self.log = logger
       self.force = force

   def convert_tag(self, name, value):
       pass

   def copy_replaygain_tag(self, apev2, id3, name, converter=None):
       self.log.debug("processing '%s' tag" % name)

       if name not in apev2:
           self.log.info("no APEv2 '%s' tag found, skipping tag" % name)
           return False
       if not self.force and ("TXXX:%s" % name) in id3:
           self.log.info("ID3 '%s' tag already exists, skipping tag" % name)
           return False

       value = str(apev2[name])
       if callable(converter):
           self.log.debug("converting APEv2 '%s' tag from '%s'" %
                          (name, value))
           try:
               value = converter(value)
           except ValueError:
               self.log.warning("invalid value for APEv2 '%s' tag" % name)
               return False
           self.log.debug("converted APEv2 '%s' tag to '%s'" % (name, value))

       id3.add(TXXX(encoding=1, desc=name, text=value))
       self.log.info("added ID3 '%s' tag with value '%s'" % (name, value))
       return True

   def copy_replaygain_tags(self, filename):
       self.log.filename = filename
       self.log.debug("begin processing file")

       try:
           apev2 = APEv2(filename)
       except mutagen.apev2.error:
           self.log.info("no APEv2 tag found, skipping file")
           return
       except IOError:
           e = sys.exc_info()
           self.log.error("%s" % e[1])
           return

       try:
           id3 = ID3(filename)
       except mutagen.id3.error:
           self.log.info("no ID3 tag found, creating one")
           id3 = ID3()

       modified = False
       for name, converter in REPLAYGAIN_TAGS:
           copied = self.copy_replaygain_tag(apev2, id3, name, converter)
           if copied:
               modified = True
       if modified:
           self.log.debug("saving modified ID3 tag")
           id3.save(filename)

       self.log.debug("done processing file")
       self.log.filename = None

def main(prog_name, options, args):
   logger = Logger(options.log_level, prog_name)
   ape2id3 = Ape2Id3(logger, force=options.force)
   for filename in args:
       ape2id3.copy_replaygain_tags(filename)

if __name__ == "__main__":
   parser = OptionParser(version="0.1", usage="%prog [OPTION]... FILE...",
                         description="Copy APEv2 ReplayGain tags on "
                                     "FILE(s) to ID3v2.")
   parser.add_option("-q", "--quiet", dest="log_level",
                     action="store_const", const=0, default=1,
                     help="do not output error messages")
   parser.add_option("-v", "--verbose", dest="log_level",
                     action="store_const", const=3,
                     help="output warnings and informational messages")
   parser.add_option("-d", "--debug", dest="log_level",
                     action="store_const", const=4,
                     help="output debug messages")
   parser.add_option("-f", "--force", dest="force",
                     action="store_true", default=False,
                     help="force overwriting of existing ID3v2 "
                          "ReplayGain tags")
   prog_name = parser.get_prog_name()
   options, args = parser.parse_args()

   if len(args) < 1:
       parser.error("no files specified")

   try:
       main(prog_name, options, args)
   except KeyboardInterrupt:
       pass

# vim: set expandtab shiftwidth=4 softtabstop=4 textwidth=79:
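
To illustrate what the two converter functions in the script do, here they are on their own, applied to a few values. The sample strings are made up for this illustration and are not captured mp3gain output.

```python
def convert_gain(gain):
    """Normalise a gain string to two decimals followed by ' dB'."""
    if gain[-3:] == " dB":
        gain = gain[:-3]
    try:
        gain = float(gain)
    except ValueError:
        raise ValueError("invalid gain value")
    return "%.2f dB" % gain


def convert_peak(peak):
    """Normalise a peak string to six decimals."""
    try:
        peak = float(peak)
    except ValueError:
        raise ValueError("invalid peak value")
    return "%.6f" % peak


if __name__ == "__main__":
    # Hypothetical APE tag values, chosen only to show the formatting.
    print(convert_gain("-7.340000 dB"))  # -7.34 dB
    print(convert_gain("+1.5 dB"))       # 1.50 dB (the plus sign is dropped)
    print(convert_peak("0.98765432"))    # 0.987654
    try:
        convert_gain("loud")
    except ValueError as exc:
        print("rejected:", exc)          # rejected: invalid gain value
```

In the full script, a ValueError from a converter makes copy_replaygain_tag log a warning and skip that one tag instead of aborting the whole file.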

It’s pretty simple. Script A just calls Script B recursively on every directory found inside the designated directory. Script B finds mp3 files, and first tags them with APE replay gain tags with mp3gain. Then, it calls the APE to ID3 conversion script above to add equivalent ID3 tags into those same mp3s. I’ve modified Script B so that it first deletes any APE replay gain tags already present in the mp3 file before doing the replay gain calculations — but this is optional. I also added a bunch of ANSI color escape codes to Script A and B so that they look prettier. These three scripts work beautifully well together with mp3 files inside directories. However, the directories MUST be album directories, as all mp3 files found in a directory are treated as having come from the same album for album replay gain tags (track replay gain tags are always independent on a per-file basis).
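
For anyone curious how the ANSI color codes in the echo lines work: each one is the escape character followed by "[", a semicolon-separated list of attribute numbers, and "m", and the sequence with the single number 0 resets everything. Here is a small Python sketch of the same idea. The colorize helper and its enabled parameter are names I made up for illustration, and the check for a terminal is an extra that the bash scripts do not do.

```python
import sys

RESET = "\033[0m"

# Standard attribute numbers used by the scripts.
BOLD = 1
GREEN, YELLOW, CYAN, WHITE = 32, 33, 36, 37
BLUE_BG = 44


def colorize(text, *codes, enabled=None):
    """Wrap text in an ANSI color sequence.

    If enabled is None, colors are only emitted when stdout is a terminal,
    so redirected output stays free of escape codes.
    """
    if enabled is None:
        enabled = sys.stdout.isatty()
    if not enabled or not codes:
        return text
    prefix = "\033[" + ";".join(str(code) for code in codes) + "m"
    return prefix + text + RESET


if __name__ == "__main__":
    print(colorize(" Arg is NOT a directory!", BOLD, WHITE, BLUE_BG))
    print(colorize("/music/some-album", BOLD, CYAN))
```

So the prefix 1;37;44 from Script A means bold, white text, blue background. If the colors show up as raw characters instead, the terminal is probably not interpreting the escape sequences.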

I should probably rewrite Script A and B in Python to make it easier to maintain — but everything is pretty simple as it is. If you have 1 big folder full of mp3’s from different artists/albums, then you could change the

mp3gain "$1"/*.mp3

in Script B into something like

for file in "$1"/*.mp3
do
    mp3gain "$file"
done

This way, mp3gain is called separately for each mp3 file (instead of being called once for all mp3 files in the folder). Now you don’t have to worry about the mp3’s in that folder being treated as having come from 1 album for those album gain tags. To top things off, you should edit your shell’s config (e.g., .zshrc), and alias Script A to something easy, like rgmp3, so that you can just do

rgmp3 [directory]

to get this whole thing to work. Now run this command on your master mp3 folder, take a nap, and come back. All your mp3’s will now have both APE and ID3 replay gain tags!
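
As a rough idea of what a Python version of Script A and B might look like, here is a dry-run sketch that walks a music folder and only builds the command lines, either one mp3gain call per directory (album mode) or one per file. The function names, the per_file switch, and the bare "ape2id3.py" command name are assumptions for this illustration; the mp3gain and ape2id3.py options are the same ones Script B uses.

```python
import os
import sys


def album_dirs(root):
    """Yield (directory, sorted mp3 paths) for each directory with mp3s."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        mp3s = sorted(
            os.path.join(dirpath, name)
            for name in filenames
            if name.endswith(".mp3")
        )
        if mp3s:
            yield dirpath, mp3s


def build_commands(root, per_file=False, converter="ape2id3.py"):
    """Return the command lines as lists of arguments, without running them.

    per_file=False treats each directory as one album (one mp3gain call);
    per_file=True calls mp3gain once per file instead.
    """
    commands = []
    for _dirpath, mp3s in album_dirs(root):
        groups = [[path] for path in mp3s] if per_file else [mp3s]
        for group in groups:
            # First delete existing APE replay gain tags, then add fresh ones.
            commands.append(["mp3gain", "-s", "d"] + group)
            commands.append(["mp3gain"] + group)
        # Copy the APE values into ID3 tags (-d debug, -f force overwrite).
        commands.append([converter, "-df"] + mp3s)
    return commands


if __name__ == "__main__" and len(sys.argv) == 2 and os.path.isdir(sys.argv[1]):
    for command in build_commands(sys.argv[1]):
        print(" ".join(command))
```

Passing each list to subprocess.run would turn the dry run into the real thing, but printing first makes it easy to check what would happen to a big collection.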

I hope people find this useful. I’ve googled and googled countless times about replay gain in the past, and until I discovered the excellent link mentioned above a couple days ago, I could never really get replay gain tags working for my mp3’s.

14 thoughts on “Adding Replay Gain in Linux Automatically”

  1. Pingback: Fire & Forget bash script for Ogg and MP3 ReplayGain tagging « duesenklipper

  2. The use of metadata and this metadata conversion process is not strictly necessary. Reading the mp3gain man page offers this:

    “mp3gain optionally writes gain adjustments directly into the encoded data. In this case, the adjustment works with all mp3 players, i.e. no support for a special tag is required. This mode is activated by any of the options -r, -a, -g, or -l.”

    -r or -a are the commonly used methods:

    -r being apply Track gain automatically
    -a being apply Album gain automatically

    The tag information is written anyway because this allows the gain adjustment to be undone, despite the changes having been made directly into the encoded data.

    By default mp3gain warns when applying gain may induce clipping, but it seems to be conservative so a warning of possible clipping doesn’t necessarily mean there will be any. In my limited experience I’ve found it safe to ignore the warnings, and now have the warning suppressed by default. If in the future I do encounter clipping I can always undo the changes with ‘mp3gain -u ‘

    The big advantage of ‘hardcoding’ the gain and using the tags only as a means for undoing if required, is that it works even with software players and hardware devices which don’t have any support for ReplayGain. So MPD and its clients work just fine, and so does your pocket mp3 player. As far as I know the only mp3 players (hardware) which properly support Replay Gain are Sansa Fuze, Sansa Clip and anything which runs Rockbox. I have a Fuze and it fully supports ReplayGain in MP3, Ogg Vorbis and Flac, and so do my old iRiver players which run Rockbox. But my Nokia tablet doesn’t, so on that I use only mp3 so that the hardcoded mp3gain gives me the same benefit.

  3. Thanks Takla for your valuable input. I investigated some more, and this is what wikipedia says at the time of this writing:

    MP3Gain can modify files in two ways:

    * write metadata;
    * modify MP3 data, writing undo information as metadata.

    In both cases, it first computes the desired gain (volume adjustment), either per track or per album, using the Replay Gain algorithm.

    In the first instance, it simply writes this as a tag (in APEv2 format), which can be read by other applications that implement Replay Gain.

    In the second instance, it modifies the overall volume (scale factor) in each MP3 frame, and writes undo information as a tag. Note that this is completely reversible. By subtracting the modification that was done and removing the tag, it does not introduce any digital generation loss because it does not decode and re-encode the file.

    So I guess what you are referring to is the second instance. I was not aware of this — thanks for pointing it out.

    I’ve also noticed that mp3gain now supports writing ID3v2 instead of APE tags with the -s i option. I will rewrite the scripts to reflect these new ideas and post an update when I find the time.

  4. I think some people will always prefer the metadata method (touching the raw audio is heresy!) but in practice there’s no difference to the listener and it’s all reversible. The other benefit is that the scripts remain very simple, so are more easily understood by a novice (as I am with scripting) and easily adapted.

    I’ve been using scripts based on the same ones you linked to and the second script (the one called after the directory listing is obtained by find) becomes as simple as:

    #!/bin/bash

    # Script created by Bobulous, October 2008.
    # See http://www.bobulous.org.uk/misc/Replay-Gain-in-Linux.html
    #
    # This script takes as an argument a directory name,
    # then uses mp3gain to add Replay Gain tags (for album and
    # track gain) to each mp3 file in that directory.
    # Then metaflac is used to display the Replay Gain values for
    # each track.
    #
    # Use find (with -exec) to call this script on
    # a directory structure containing mp3 files, so that this
    # script is run on each directory in that structure. E.g.
    # find ./music/mp3 -type d -exec 'command' '{}' \;
    #
    # See http://www.bobulous.org.uk/misc/Replay-Gain-in-Linux.html
    #

    # Error codes
    ARGUMENT_NOT_DIRECTORY=10
    FILE_NOT_FOUND=11

    # Check that the argument passed to this script is a directory.
    # If it’s not, then exit with an error code.
    if [ ! -d "$1" ]
    then
        echo "Arg "$1" is NOT a directory!"
        exit $ARGUMENT_NOT_DIRECTORY
    fi

    # Count the number of mp3 files in this directory.
    mp3num=`ls "$1" | grep -c \\.mp3`

    # If no mp3 files are found in this directory,
    # then exit without error.
    if [ $mp3num -lt 1 ]
    then
        echo $1" (No mp3 files, moving on)"
        exit 0
    else
        echo $1" ("$mp3num" mp3 files)"
    fi

    # Run mp3gain on the mp3 files in this directory.
    echo "Calculating Replay Gain values for mp3 files."
    mp3gain -a -k "$1"/*.mp3 # using -c instead of -k introduces possible clipping (ignores clipping warning when applying gain)

    echo ""

    It works fine, though cautious users would probably omit the -c option. I did my whole collection recently which is about 13000 Ogg Vorbis but also about 6000 mp3 and a handful of m4a and mpc. You can see that the same basic script is fine for vorbisgain, mpcgain, aacgain with only minimal adjustment (as easy as a quick find+replace in a text editor in some cases).

  5. I apologize for the late comment approval — I was away from the computer for some time. Yes, your script is indeed much simpler because you don’t have to deal with metadata info.
    Thanks again, Takla.

  6. I’ve since found some nasty clipping using ‘mp3gain -a -c’ so I undid the gain using ‘mp3gain -u’ and instead use the script with

    ‘mp3gain -a -k’

    This is much better, applies track and album gain but doesn’t allow clipping.

  7. Thanks – I’ll try building the binary (I’m on Ubuntu, and 1.5.1 doesn’t seem to have made it into the repo yet.)

  8. To me it looks like mp3gain’s id3 support is still *incomplete*:
    as far as I can see no replaygain_* id3 tags get added, but only some MP3GAIN_* ones … which are ignored by many software and portable players.

    So, those additional steps are still needed 😦

    I don’t understand why it is so hard for the mp3gain developers to include support for the replaygain_* id3 tags:
    are they willing to push their standard?

  9. sorry, I was too fast:
    mp3gain writes the replaygain information to the ID3v2’s RVA2 frame … which is supported now by many players …

  10. Thx a lot for those scripts, although the color commands didn’t work here, which makes the CLI output kind of … weird. 🙂

  11. @hennes: If the colors don’t work properly, then it’s probably a terminal/terminfo issue (as you’ve probably noted yourself). Looking at these scripts again (the bash ones), I can see that they are quite ugly (I’ve improved a lot in shell scripting since I’ve written them). Maybe I’ll do a re-write, with everything in zsh — maybe even converting the python one into a shell script while I’m at it… (but first I’ll have to re-rip all of my CD’s into FLAC files, since mp3’s are so 1998.) Give me a few years… :p

  12. Nevermind about the coding, the scripts are working fine here. 😉

    I’m thinking about wrapping up all these scripts for handling FLAC/MP3/OGG into a single program. Actually I already did this in a bash-script, but it’s kind of ugly and lacking some features like timestamp recognition (which would come in very handy when adding files to a previously processed big collection). Maybe I’ll do this in python someday…
