Monthly Archives: November 2013

Pretty impressed with GitHub

I’ve been hearing about GitHub for quite some time and have stolen numerous bits of open-source code from there. I didn’t get what all the hubbub was about though until I decided to sign up and throw a small project up there last night.

Wow. Everything is pretty dang slick – the standalone desktop utility for committing new code (about a hundred times easier than installing Git from scratch), the nice-looking code diffs, history maps, and tidy interface. It truly ENCOURAGES you to jump into some open source project and start contributing. Kudos for lowering the barrier to entry. It’s akin to how Mindstorms made robotics fun and easy to dabble in. I’m optimistic about uploading some of my old projects and sprucing them up a bit.

Git itself I could take or leave. SVN is just fine for many things. But the Hub is really a fine force for good on the web.

When to bother with truly random numbers

My experience in computer programming has taught me that truly random numbers are often unnecessary, in the sense that convenient approximations to randomness usually turn out to be good enough for practical purposes.

-Donald Knuth, Things a Computer Scientist Rarely Talks About, p.41

Some physicists will argue that, with enough knowledge about the generator and the universe, truly random numbers don’t exist at all – we simply perceive them as random for all practical purposes.

Cryptographers often stress the proper generation of random numbers, especially in compiled code. If the generator and the underlying processor architecture are understood well enough, the next number can be guessed and key generation undermined. Oh no!

The frequency of computer security incidents and news in general during the past ten years has put the cryptographic side of random numbers perpetually on the map in programmers’ minds. However, we need to remember that the more sophisticated methods are rarely actually needed. You need not worry so much that your dice are technically a little bit loaded. Primitive “randomness” is completely fine in most cases, so don’t worry about using those quick and easy built-in RAND() functions for your statistical sampling or your video game engine. They are random ENOUGH where it counts – in our own heads.
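To make the distinction concrete, here is a minimal C# sketch, assuming .NET's System.Random for the everyday case and System.Security.Cryptography.RandomNumberGenerator for key material. The names RandomnessDemo, RollDice and NewKey are made up for illustration.

```csharp
using System;
using System.Security.Cryptography;

public static class RandomnessDemo
{
    // Good enough for games and sampling: fast and seedable,
    // but predictable to anyone who knows the seed.
    public static int[] RollDice(int seed, int count)
    {
        var rng = new Random(seed);
        var rolls = new int[count];
        for (int i = 0; i < count; i++)
            rolls[i] = rng.Next(1, 7); // Upper bound is exclusive, so this yields 1..6
        return rolls;
    }

    // For keys and tokens: ask the platform's cryptographic generator instead.
    public static byte[] NewKey(int lengthInBytes)
    {
        var key = new byte[lengthInBytes];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(key);
        }
        return key;
    }
}
```

Calling RollDice twice with the same seed gives the same rolls, which is handy for replaying a game and exactly what you don't want for a key.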

The four pillars of object-oriented programming – In four short sentences

Just about every textbook out there on C++ or Java or (fill in the blank) modern OO language includes a page or two on the three (sometimes four) pillars of object-oriented programming. The problem is that the definitions are so abstract and filled with jargon that, in my opinion, they are largely useless, especially at the introductory level.

I’m going to try and restate them as simply as possible:

1. Abstraction

Stuff is broken into nice pieces.

2. Encapsulation

You don’t have to see what’s inside other stuff.

3. Inheritance

Some stuff is like other stuff.

4. Polymorphism

Because some stuff is like other stuff, they can share some ideas and behaviors.
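A tiny C# sketch can pin each sentence to a line of code. The Instrument, Guitar, Drum and Band names are invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Abstraction: an "instrument" is one nice piece with a small surface.
public abstract class Instrument
{
    // Encapsulation: callers never see or touch this field directly.
    private int timesPlayed;

    public int TimesPlayed { get { return timesPlayed; } }

    public string Play()
    {
        timesPlayed++;
        return Sound();
    }

    protected abstract string Sound();
}

// Inheritance: a guitar is like an instrument.
public class Guitar : Instrument
{
    protected override string Sound() { return "strum"; }
}

public class Drum : Instrument
{
    protected override string Sound() { return "boom"; }
}

public static class Band
{
    // Polymorphism: the same call works on anything that is an Instrument.
    public static List<string> PlayAll(IEnumerable<Instrument> instruments)
    {
        var sounds = new List<string>();
        foreach (var instrument in instruments)
            sounds.Add(instrument.Play());
        return sounds;
    }
}
```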

I’m sure this can be improved upon.

Resist the urge to write a paragraph or two! Can one sentence do it?

A batch file for extracting audio from an MP4 and then muxing it back in

This is a continuation of my quest to process some 8,000 MP4 files in an on-demand library and perform some cleanup on their audio.

I wrote the cleanup program as a command-line app in C# the day before. Now I needed a way to point it at an existing file, extract the audio as PCM WAV, perform my modifications, and then create a new MP4 using the original video and the new audio. Along the way, I also needed to chop some extra filler material off the front of the video, shortening it by several seconds.

I could have written another C# app to do this or – better yet – used PowerShell, but I decided to go with an old-school DOS batch file. Getting the paths and quotes and variable handling all sorted out took some trial and error, but in the end it turned out fine. Using Swiss File Knife, I put this to work on the entire library and it’s already munched through about a thousand hours of material so far.

This depends on FFMPEG (any build from the past year or two should work), as well as the Nero AAC encoder, and my tool – discussed earlier – “IntroToneRemover”.

@echo off

echo Stripping and trimming intro tone from "%~f1"

echo Extracting wav from MP4
"c:\utils\ffmpeg" -i "%~f1" "%~f1.wav"

echo Detecting silence point and selectively muting intro
"c:\utils\introtoneremover" "%~f1.wav"

echo Read detected silence point into variable
( set /p silencepoint=
)<"%~f1.wav.silencepoint"

echo Silencepoint is %silencepoint%

echo Subtract 10 from the silence point as that is how many seconds of intro we want to keep
rem Expand the variable with %...%; a bare name would be compared as a literal string
if %silencepoint% gtr 9 (
    set /a trimpoint=%silencepoint%-10
) else (
    set /a trimpoint=0
)

echo Will trim the first %trimpoint% seconds

echo Encoding audio back to AAC
"c:\utils\neroaacenc" -if "%~f1.wav" -q 0.56 -ignorelength -of "%~f1.newaudio.mp4"

echo Combining original video back with new audio
"c:\utils\ffmpeg" -i "%~f1" -i "%~f1.newaudio.mp4" -ss %trimpoint% -avoid_negative_ts 1 -c copy -map 0:v:0 -map 1:a:0 -shortest "%~f1.complete.mp4"

echo Cleaning up temp files and renaming final output file
del "%~f1"
del "%~f1.wav"
del "%~f1.wav.silencepoint"
del "%~f1.newaudio.mp4"
move "%~f1.complete.mp4" "%~f1"

echo Operation complete!

Actually, there is something in here that doesn’t work. In ffmpeg, you can’t reliably use -ss (seconds) to trim material from the front of the file when using the “copy” codec. Why? A stream copy can only truly cut on a keyframe, and you don’t know where the keyframes are. They certainly aren’t right at your cut point unless you get extremely lucky. You have to re-encode the video to make the cut land where you want it. The batch file above resulted in out-of-sync audio. The video source I had was made with a lengthy two-step encode and looked pretty good, so I wanted to keep it as-is. It would also have taken ages to re-encode the entire library without dividing the work between multiple boxes or VM instances. In the end, I took the time-trimming part out for now. Still, what is above could easily be adapted if you don’t mind re-encoding the video.

Intro cleanup – selectively writing silence to portions of digital audio

This is a continuation of my post from yesterday about programmatically measuring amplitude in a wav file.

Here, I’m looking to remove an obnoxious reference tone that occurs at the beginning of thousands of encoded videos in our library. The length of the tone is not consistent, though it does end somewhere between the 10 and 30 second mark. I’ll need to find when it stops and then write silence (zeros) from the beginning until that point. I also save the value to a file where it can be referenced later by FFMPEG to trim the length of the video portion during muxing.

Before:

trimremover1

After:

trimremover3

trimremover2

I ran into a few pitfalls that had to be overcome when writing this. First, this is not safe:

Math.Abs(short.MinValue); // short.MinValue is -32768

This fails because the range of a 16-bit integer, used in the wav file, extends one value lower into the negatives than it does into the positives. So if you encounter a sample with that value (and you WILL if you run across even a moment of digital clipping in your source), the program will barf on you with an overflow exception, since +32768 is not possible, only +32767. I had to write my own absolute value function to account for this.
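The pitfall is easy to reproduce in isolation. This is a small sketch with hypothetical names (AbsDemo, AbsOverflows, WideAbs). It also shows an alternative to clamping: widen the sample to int before taking the absolute value.

```csharp
using System;

public static class AbsDemo
{
    // Returns true if Math.Abs throws for the given 16-bit sample.
    public static bool AbsOverflows(short sample)
    {
        try
        {
            Math.Abs(sample); // The short overload must return a short
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    // Widening to int first leaves room for +32768.
    public static int WideAbs(short sample)
    {
        return Math.Abs((int)sample);
    }
}
```

AbsOverflows(short.MinValue) returns true, while WideAbs(short.MinValue) returns 32768. The widened value no longer fits in a short, so a buffer of shorts still needs the clamping approach.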

I also switched to reading in only the first 30 seconds’ worth of the wav file and then doing a selective overwrite. This cut the run time from about 4 seconds to under 100 ms. Finally, you'll see I added some limits to the "silence mark" that is detected. If it falls outside the acceptable range, the program assumes something is wrong (perhaps the file is a rare anomaly with no tone or a late start) and leaves the file untouched. Tomorrow I'll be running this against about 3 TB of MP4s to clean up the beginnings.

using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    const int sampleRate = 48000;
    const int channels = 2; // Stereo
    const int bytesPerSingleChannelSample = 2;
    const int firstDataByte = 46; // The first 46 bytes have header info
    const int dataLengthToScan = (endScanPosition * sampleRate * bytesPerSingleChannelSample * channels);

    const int startScanPosition = 2; // Number of seconds into the audio before beginning scan. Sometimes the first second of audio can have garbage.
    const int endScanPosition = 30; // Longest amount of time to scan for silence.
    const int minSilencePoint = 6; // Lowest acceptable number for the silence mark. Anything lower than this would mean something is wrong.
    const int maxSilencePoint = 25; // Anything higher than this probably also means something is wrong.
    const int amplitudeSilenceThreshold = 100; // Average amplitude to be considered "silence";

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Write("Usage: IntroToneRemover audiofile.wav");
            return;
        }

        byte[] data = new byte[dataLengthToScan];
        FileStream fileStream = new FileStream(args[0], FileMode.Open, FileAccess.Read);
        fileStream.Read(data, 0, dataLengthToScan);
        fileStream.Close();

        int silencePoint = FindSilencePoint(data);
        Console.WriteLine("SilencePoint=" + silencePoint.ToString());

        if (silencePoint >= minSilencePoint && silencePoint <= maxSilencePoint)
        {
            int head = firstDataByte;
            Console.WriteLine("Overwriting bytes " + firstDataByte + " to " + (silencePoint * sampleRate * bytesPerSingleChannelSample * channels).ToString());
            while (head < (silencePoint * sampleRate * bytesPerSingleChannelSample * channels))
            {
                data[head] = 0; // Overwrite with silence
                head++;
            }

            // Dispose the stream so the overwritten bytes are flushed and the file handle is released
            using (FileStream writer = new FileStream(args[0], FileMode.Open, FileAccess.Write))
            {
                writer.Write(data, 0, dataLengthToScan);
            }
            File.WriteAllText(args[0] + ".silencepoint", silencePoint.ToString()); // Write a file containing the value of the silence point. Can be picked up later by another app.
            Console.WriteLine("Tone removal complete!");
        }
        else
        {
            Console.WriteLine("SilencePoint was out of range. File left untouched.");
        }
    }

    public static int FindSilencePoint(byte[] data)
    {
        int head = firstDataByte; 
        int sampleCount = 1;
        List<short> sampleBuffer = new List<short>();
        List<Probe> probes = new List<Probe>();

        while (head <= (data.Length - bytesPerSingleChannelSample)) // Stop while a full 2-byte sample still fits
        {
            short sampleAmplitude = BitConverter.ToInt16(data, head);
            sampleAmplitude = SafeAbs(sampleAmplitude);
            sampleBuffer.Add(sampleAmplitude);

            // After enough samples are collected in the buffer, print out their average amplitude and then clear the buffer
            if (sampleBuffer.Count >= (sampleRate))
            {
                probes.Add(new Probe() { Seconds = SamplesToSeconds(sampleCount), Amplitude = GetMedian(sampleBuffer) });
                Console.WriteLine(SamplesToSeconds(sampleCount).ToString("0") + ": " + GetMedian(sampleBuffer).ToString());
                sampleBuffer.Clear();
            }

            sampleCount++;
            // Advance the reading head to the next sample, skipping the second channel if it exists.
            // We only need to check the left channel of the stereo to simplify things
            head = head + (bytesPerSingleChannelSample * channels);
        }

        // Now look through all the probes we took and find the first substantially silent one
        int silencePoint = 0;
        foreach (var probe in probes)
        {
            if (probe.Seconds > startScanPosition && probe.Amplitude < amplitudeSilenceThreshold)
            {
                silencePoint = Convert.ToInt32(Math.Floor(probe.Seconds));
                break;
            }
        }

        return silencePoint;
    }

    public class Probe
    {
        public float Seconds { get; set; }
        public short Amplitude { get; set; }
    }

    public static short SafeAbs(short value)
    {
        if (value >= 0) return value;
        if (value == short.MinValue)
            return short.MaxValue;
        return Convert.ToInt16(-value);
    }

    public static short GetMedian(List<short> values)
    {
        short[] temp = values.ToArray();
        Array.Sort(temp);
        int middleIndex = temp.Length / 2; // Integer division already rounds down
        return temp[middleIndex];
    }

    public static float SamplesToSeconds(int samples)
    {
        return (float)samples / sampleRate; // Cast first so this is not integer division
    }
}

Programmatically trimming intro material from a video by detecting amplitude changes in the audio

The company I work for has thousands of lectures available in a video-on-demand library. All of these videos begin with a title card that is displayed for 10-20 seconds before the hour-long lecture begins. During this time, a high-pitched tone is played back. Way back in the 1980s, tones like this were used to help set audio levels in the studio and for broadcast, but they really aren’t of any use now on the web. In fact, they can be downright annoying! I decided to write a bit of software to clean them up and make the user experience a bit more enjoyable.

Each video starts with 10-20 seconds of tone, followed by about 5 seconds of silence, and then the beginning of the video. As these scene changes are done live by hand during recording, they are a bit different in every case so there is no hard rule to follow. I needed a way to detect when the title card segment was complete. I could have analyzed the video frames themselves, but these were not always consistent. I decided to simply analyze the audio to find my breaks. First, I took several videos from the library and used ffmpeg to simply extract the audio as a PCM wav. Fortunately, the command-line defaults do this without any additional switches.

ffmpeg -i video.mp4 output.wav

The beginning of a typical audio track looked like this (screenshot from Audacity).

wavampprobe1

Fortunately, while pitch is complicated to calculate in digital audio, amplitude is very easy. Each 16-bit (2 byte) sample holds a value between -32768 and +32767. Zero is complete silence and 32K is blow-your-ears-out loud. So all we have to do is read through the bytes of the wav file and keep track of how loud stuff is, and we should be able to easily tell when the intro tone disappears and a few seconds of silence occur.

I first tried simply probing at intervals (say, every 1000 samples), but that led to occasional anomalies. Then I switched to finding the mean of an entire second’s worth of audio. Finally, I switched to finding the median amplitude, as this gave me an even more accurate reading.
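To see why the median holds up better, consider a quiet stretch with a few loud clicks in it. The sketch below uses made-up sample values and hypothetical names (AmplitudeStats, QuietSecondWithClicks). It is not the measurement code itself, just the comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AmplitudeStats
{
    public static int Mean(IList<int> values)
    {
        return (int)values.Average();
    }

    public static int Median(IList<int> values)
    {
        int[] sorted = values.ToArray();
        Array.Sort(sorted);
        return sorted[sorted.Length / 2];
    }

    // Made-up data: 95 near-silent samples plus 5 loud clicks.
    public static List<int> QuietSecondWithClicks()
    {
        var samples = Enumerable.Repeat(20, 95).ToList();
        samples.AddRange(Enumerable.Repeat(30000, 5));
        return samples;
    }
}
```

For this data Mean returns 1519 while Median returns 20, so a handful of clicks makes the mean look loud while the median still reads as silence.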

using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    const int sampleRate = 48000;
    const int channels = 2;
    const int bytesPerSingleChannelSample = 2;

    static void Main(string[] args)
    {
        byte[] data = File.ReadAllBytes(args[0]);

        int head = 44; // The first 44 bytes have header info
        int sampleCount = 1;
        List<int> sampleBuffer = new List<int>();

        while (head <= (data.Length - 1) && head < 5000000) // Stop after reading 5 Megs of data - that is plenty
        {
            // Widen to int before taking the absolute value: Math.Abs on a short throws for -32768
            int sampleAmplitude = Math.Abs((int)BitConverter.ToInt16(data, head));
            sampleBuffer.Add(sampleAmplitude);

            // After enough samples are collected in the buffer, print out their average amplitude and then clear the buffer
            if (sampleBuffer.Count >= (sampleRate))
            {
                Console.WriteLine(SamplesToSeconds(sampleCount).ToString("0") + ": " + GetMedian(sampleBuffer).ToString());
                sampleBuffer.Clear();
            }

            sampleCount++;
            // Advance the reading head to the next sample, skipping the second channel if it exists.
            // We only need to check the left channel of the stereo to simplify things
            head = head + (bytesPerSingleChannelSample * channels);
        }
    }

    public static int GetMedian(List<int> ints)
    {
        int[] temp = ints.ToArray();
        Array.Sort(temp);
        int middleIndex = temp.Length / 2; // Integer division already rounds down
        return temp[middleIndex];
    }

    public static float SamplesToSeconds(int samples)
    {
        return (float)samples / sampleRate; // Cast first so this is not integer division
    }
}

The output of the program for the same wav file is shown below.

wavampprobe2

You can easily see the cut to silence as the median amplitude drops from 6000+ to only 22 (practically dead silence) at the 17-second mark. A few seconds later, the video begins, some intro music fades in, and the values go back up.

Even though I’m wasting memory by reading in the entire large (500 meg) raw audio file, the application still only takes a couple of seconds to run. At this point, I can write zeros (silence) back over everything from beginning of the file to the location of the first second of silence. Additionally, when muxing the file back into the MP4 with ffmpeg, I can trim extra material from the front of the recording to make the length of the title card consistent. Scripting all of that is a job for tomorrow though.

For anyone looking to understand the PCM wav format, this is probably the easiest walkthrough to read.

Hacking an ancient braille printer – Part 2, translation and formatting

In this previous post I explained how I was able to send data to an ancient 1980s-era heavy-metal braille embosser. After experimenting with several different forms of text, I quickly discovered that any kind of page metadata (margins, etc.) would cause it to wig out. Also, just plain carriage returns (regardless of convention) would cause it to skip an entire line, wasting a lot of space.

I needed to come up with a way to remove line breaks and page breaks from documents and convert them to padded fixed-width 42-character lines, with a form feed every 25 lines for the tractor feed. I experimented with this by hand and the results seemed successful, so I threw together a simple text file formatting console application. This isn’t the most beautiful or optimized piece of code, but it worked after the first draft and build so I’m happy with it. I’ve included it at the bottom of this post.

So usage goes like this:

embosser1

Write or paste the text of the document you want to emboss into OpenOffice Writer. Then use the very nice odt2braille plugin to convert it to contracted grade 2 braille in ASCII (.brl) format.

embosser2

The file you get would work fine with a modern printer/embosser, but now we need to beat it into the proper shape for our iron-age equipment.

embosser0

After which we get something like this:

embosser3

Now we can use rawprint to send the data straight to the parallel port with no metadata. The final bug I had to overcome was occasional random line breaks and page feeds. I discovered that simply power-cycling the embosser between jobs made this problem go away. Some sort of buffer isn’t getting cleared otherwise. All in all, the whole process only takes about one minute from start to finish. Then the embosser itself takes about a minute to loudly spit out a page. I was pretty happy with the results, as was my wife. This will get a lot of use in the future, without having to drop several thousand for a modern machine.

embosser5

static void Main(string[] args)
{
    if (args.Length != 3)
    {
        Console.WriteLine("Usage: EmbosserFormat input.txt output.txt singlespace|doublespace");
        return;
    }

    string inputFile = args[0];
    string outputFile = args[1];
    string flag = args[2];

    bool doubleSpace = false;
    if (flag == "doublespace")
        doubleSpace = true;
    
    string contents = File.ReadAllText(inputFile);

    List<string> lines = new List<string>(contents.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None));
    List<string> outputLines = new List<string>();

    int lineCount = 1;
    int pageCount = 1;
    foreach (var line in lines)
    {
        string line2 = line.Trim().Replace("\f", "");
        if(line2.Length > 42)
            Console.WriteLine("Warning: Line " + lineCount.ToString() + " was more than 42 characters long. It will be broken in the output.");

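        outputLines.Add(line2.PadRight(42));

        // If we just added line 25 of a page, then we need a page break to advance the printer feed forward to the next page
        if (lineCount % 25 == 0)
        {
            outputLines.Add("\f");
            pageCount++;
        }

        lineCount++;

        if (doubleSpace)
        {
            // The blank spacer line counts toward the page too, so it gets the same page-break check.
            // Checking only once per pair of lines would never land on line 25 and would break every 50 lines instead.
            outputLines.Add("".PadRight(42));
            if (lineCount % 25 == 0)
            {
                outputLines.Add("\f");
                pageCount++;
            }

            lineCount++;
        }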
    }

    // Join the fixed-width lines with no separators; the embosser wraps on line width alone
    string outputContents = string.Concat(outputLines);

    File.WriteAllText(outputFile, outputContents);

    Console.WriteLine("File " + outputFile + " written.");
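    // lineCount starts at 1 and is bumped after every line written, so it ends one past the total
    Console.WriteLine((lineCount - 1).ToString() + " lines on " + pageCount.ToString() + " pages.");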
}

Simple Share from Admin WordPress Plugin

I decided to write a very simple plugin for WordPress to help me selectively share posts on Twitter and Facebook. There are lots of plugins like this out there already though, so why write another one? The answer is that none of them met my requirements for how I wanted to share links. The other ones out there are all either too automatic or too sophisticated, resulting in spammy posts and unwanted analytics. I wanted something more manual. Why?

  • Diversity: I have three different blogs: a pretty active personal blog, a moderately active programming blog, and a pretty slow coffee blog. I want to be able to easily post links from any of them.
  • Selectivity: I don’t want to share every post. I share most posts on Twitter, but certain kinds of scrapbook or research posts I don’t wish to share at all. Posts that might be a bit on the boring side (or inversely, on the controversial side!) I definitely don’t want to draw attention to on Facebook. I didn’t want something that just read my RSS feed and looked for new data.
  • Timing: Most of the time I want to share a post right after it’s finished. But sometimes, I want to wait a few hours or even days to share it. This is especially the case if I write 3-4 posts in one sitting. I don’t want to spam my friends, but maybe dole them out over the following week. Occasionally, I will want to share a post from a year or two ago if it suddenly becomes relevant due to some current event or commentary.

To accomplish this, I was frequently cutting and pasting titles and links between windows, occasionally messing up keystrokes and annoying myself. I tired of that after a year or so. I should have done something about it much earlier!

simple-share

Nothing quite fit the bill though, so I decided to write my own. The plugin is simply called Simple Share from Admin. In the WordPress administration back-end, it adds a single column to the listing of posts page. Click on “All Posts” on the menu to see this table. There you will find a “Tweet this Post” and a “Share on Facebook” link. Click either one of them and a new tab will open up asking you to confirm the action. Twitter will apply its own URL shortener.

That’s it. Your browser must have an active, logged-in session with Twitter and/or Facebook for this to work. The back-end API for either service is not used – only simple URLs and query strings. Multiple accounts are of course not supported. If you need something fancier, then this plugin is not for you – go try one of the other ones.
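For a sense of what “simple URLs and query strings” can look like, here is a rough sketch in C# rather than the plugin's own PHP. The two endpoints (Twitter's web intent and Facebook's sharer page) and the ShareLinks name are assumptions for illustration; the links the plugin actually generates may differ.

```csharp
using System;

public static class ShareLinks
{
    // Assumed endpoint: Twitter's web intent page for composing a tweet.
    public static string Tweet(string title, string postUrl)
    {
        return "https://twitter.com/intent/tweet?text=" + Uri.EscapeDataString(title)
             + "&url=" + Uri.EscapeDataString(postUrl);
    }

    // Assumed endpoint: Facebook's sharer page.
    public static string Facebook(string postUrl)
    {
        return "https://www.facebook.com/sharer/sharer.php?u=" + Uri.EscapeDataString(postUrl);
    }
}
```

Both values are escaped so that spaces, question marks and ampersands in the title or link don't break the query string. Opening either URL in a browser tab that already has a logged-in session brings up the confirmation page.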

You can download it from the WordPress plugin directory here.