Thu, 08 Jan 2015

One way to use a simple USB foot pedal to control VLC Media Player for audio transcription, featuring NodeJS for Unix socket manipulation

(being an extended note-to-self so I don't have to scour the Internet for my sources after another three months).

I need to do some audio transcription, but foot pedals meant for this purpose are expensive. So is transcription software. Both ease and speed up the transcription process, which is very welcome since transcribing is slow work: for me, fifteen minutes of interview recording takes an hour to transcribe near-verbatim. A foot pedal lets you pause and resume the recording without taking your fingers off the keyboard or switching windows. Specialized software might provide useful hotkeys to drop in timestamps, or automatically enter conversation partners' names on a new line when you press Enter.

Actually, for this project there was potentially some budget for expensive equipment (unlike my last transcription project, my zero-budget master's thesis; back then, f4 was still free). But as I was pondering all this, the Internet was quick to point out to me that all of this could be done on a shoestring budget. Take a cheap USB foot pedal, said the Internet, in the range of 10-30 Euros (instead of 60 to 160 or more for 'transcription' pedals), and augment its powers with a few keymaps and quick scripts in your language of choice.

As someone in whose house a 30 Euro router is currently performing the functions of a 150 Euro router thanks to OpenWrt, I obviously found this appealing.

This was all several months ago, and I got it working fairly quickly. Now that I actually need to use it, I went digging for the scripts I had found on the Internet or pieced together from snippets on stackoverflow.com. But of course, I couldn't get away with such a cheap setup without paying for it some other way. I won't mention exactly how many hours of my time went into getting things working again. Finding all the sources I had used was one thing, but on top of that my OS has not been standing still in the meantime, and I found I had a different mechanism to deal with to capture and convert the signal coming from the foot pedal. All the code I had found or written was clear enough, but it took a good deal of time to figure out how registering the keypress needed to be done with udev.

What follows now seems fishily roundabout, but it worked, so once I had finally cleared up the other bother I left it as it is. The solution I had figured out was to use Node's net module to create a Unix socket to listen on, flipping a flag at each signal from the foot pedal, and either pausing or resuming VLC accordingly. I remember wanting to try out Node since I was getting interested in it at the time, and I was impressed with the modules available for operating system interaction.

What basically happens, on my Debian system, is this:

The USB pedal sends a scancode, which the system translates to a keycode, which higher-level processes can then convert into keysyms (the symbols that end up as glyphs rendered on screen). Unhelpfully, by default pressing the pedal prints '1'. So we need to intercept the scancode and map it to some other keycode that we find more interesting. First, we find out what the scancode is with evtest:

$ sudo evtest

My device is no. 21 in the list, so I select that, then press the foot pedal, and evtest tells me (among many other things) what its scancode is: 0x7001e.
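
As an aside, the scancode itself hints at why the pedal types '1'. For USB HID devices, the value evtest shows packs the HID usage page into the upper 16 bits and the usage ID into the lower 16 bits. The sketch below (the function name is made up for illustration) splits 0x7001e accordingly: page 0x07 is the Keyboard/Keypad page, and usage 0x1e on that page is the '1' key.

```javascript
// Hypothetical helper: split a USB HID scancode, as reported by evtest,
// into its usage page and usage ID.
function decodeHidScancode(scancode) {
    return {
        usagePage: scancode >>> 16,   // upper 16 bits: HID usage page
        usageId: scancode & 0xffff    // lower 16 bits: usage ID within that page
    };
}

var decoded = decodeHidScancode(0x7001e);
console.log('usage page 0x' + decoded.usagePage.toString(16) +
            ', usage id 0x' + decoded.usageId.toString(16));
// prints "usage page 0x7, usage id 0x1e"
```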

To remap that scancode, we register the input device in udev's hardware database, identifying the device by its vendor and product ID and specifying which scancodes should map to which keycodes. In /lib/udev/hwdb.d/90-custom-keyboard.hwdb (files under /etc/udev/hwdb.d/ are read as well, and that directory is the usual home for local additions) I put:

keyboard:usb:v0426p3011* #identify hardware by vendor id and product id
 KEYBOARD_KEY_7001E=f12 #scancode (0x)7001e should map to f12

Then rebuild the hardware database and draw udev's attention to the change:

$ sudo udevadm hwdb --update
$ sudo udevadm trigger

Now when we press the foot pedal, it produces F12. Next, we need to do something with this key. I wrote a little script in Bash to send a signal to a socket, and I used xbindkeys to run this script whenever F12 is pressed. (You could also just use the shell's bind command, but that's less sustainable: it requires that terminal window to stay open, you have to mess with the escape sequences for the key in question, and it's not persistent.) The script is called signal.sh and looks like this:

#!/usr/bin/env bash

echo 'signal' | nc.openbsd -U /home/hans/keypress;

We're getting there. Here comes the JavaScript. The following script, called listen.js, creates the socket to which the above script sends the signal, and calls the final script as a result.

#!/usr/bin/env node

//a script to listen for a certain keypress and do something with it.
//intended to be used with a footpedal for speeding up transcription.

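var net = require('net');
var fs = require('fs');
var exec = require('child_process').exec;

//0 = playing (assumed at startup), 1 = paused; flipped on every signal
var stat = 0;

//callback for exec to log result to stdout
//(console.log stands in for sys.puts, which has since been removed from Node)
function puts(error, stdout, stderr) {
    if (error) { console.error(stderr); }
    console.log(stdout);
}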

//define server and callback
var unixServer = net.createServer(function(client){
    if (stat === 0) {
        console.log('pause');
        stat = 1;
        exec("/home/hans/vlccontrol.py pause", puts);
    } else if (stat === 1) {
        console.log('play');
        stat = 0;
        exec("/home/hans/vlccontrol.py jogbackward && /home/hans/vlccontrol.py pause", puts);
    }
});



//recover server if already in use etc.
unixServer.on('error', function (e) {
    if (e.code == 'EADDRINUSE') {
        var clientSocket = new net.Socket();
        clientSocket.on('error', function(e) { // handle error trying to talk to server
            if (e.code == 'ECONNREFUSED') {  // No other server listening
                fs.unlinkSync('/home/hans/keypress');
                unixServer.listen('/home/hans/keypress', function() { //'listening' listener
                    console.log('server recovered');
                });
            }
        });
        clientSocket.connect({path: '/home/hans/keypress'}, function() { 
            console.log('Server running, giving up...');
            process.exit();
        });
    }
});

//start listening on the socket
unixServer.listen('/home/hans/keypress');

The first block pulls in the modules we're going to use. The second block is just an if/else that calls vlccontrol.py with pause the first time the key is pressed, and with jogbackward and (un)pause the next time. Pretty basic. The next block is copied pretty much straight from stackoverflow and does some error recovery in case the socket file already exists (for instance, left behind by a previous run). Finally we start listening on the socket, so as long as this script is running in a terminal window, every press of the foot pedal will get picked up and handled.
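
To try the listener without the pedal (or without nc installed), a small Node client can send the same signal that signal.sh does. This is only a sketch, not something from my original setup: sendSignal is a made-up name, and it assumes the listener only cares that a connection arrives, which is how listen.js above behaves.

```javascript
var net = require('net');

// Hypothetical stand-in for signal.sh: connect to the Unix socket,
// write one line, close the connection, then report back via done(err).
function sendSignal(socketPath, done) {
    var reported = false;
    function report(err) {
        // make sure done() is only ever called once
        if (!reported) { reported = true; done(err || null); }
    }
    var client = net.connect({ path: socketPath }, function () {
        // end() writes the payload and then closes our side of the connection
        client.end('signal\n', function () {
            client.destroy();
            report(null);
        });
    });
    // e.g. ENOENT or ECONNREFUSED when listen.js is not running
    client.on('error', report);
}
```

With listen.js running, calling sendSignal('/home/hans/keypress', function (err) { if (err) console.error(err.message); }) should toggle between 'pause' and 'play' in the listener's terminal, just like a press of the pedal.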

This bit of NodeJS glue code then calls the real stuff controlling VLC: a Python script, vlccontrol.py. I got it from VLC's wiki in this guide, which also explains how to configure the additional sockets the script uses to talk to VLC: to send it commands and to read status such as the time elapsed in the file you're playing. I modified it by moving the timestamp stuff into its own function, because I also wanted to add a function that puts interview participants' names in the transcript with a shortcut key, and occasionally a timestamp to go along with them. I used xbindkeys again to map those shortcut keys, but for these I can call vlccontrol.py speakerswitch 1 directly (or 0 or 2 for the other pre-defined speakers).
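
To give an idea of what the timestamp and speaker functions boil down to, here is a rough sketch in JavaScript rather than the Python of the actual script. Everything in it is illustrative: the function names, the speaker names and the output format are placeholders, and it assumes the elapsed time arrives as a whole number of seconds.

```javascript
// Hypothetical helpers; the real logic lives in vlccontrol.py.

// Turn a number of elapsed seconds into hh:mm:ss.
function formatTimestamp(totalSeconds) {
    function pad(n) { return (n < 10 ? '0' : '') + n; }
    var hours = Math.floor(totalSeconds / 3600);
    var minutes = Math.floor((totalSeconds % 3600) / 60);
    var seconds = totalSeconds % 60;
    return pad(hours) + ':' + pad(minutes) + ':' + pad(seconds);
}

// Placeholder names, indexed like "speakerswitch 0|1|2".
var SPEAKERS = ['Interviewer', 'Participant A', 'Participant B'];

// Build the text to type into the editor: a new line with the speaker's
// name, plus a timestamp if an elapsed time is given.
function speakerLine(index, elapsedSeconds) {
    var stamp = elapsedSeconds === undefined
        ? ''
        : ' [' + formatTimestamp(elapsedSeconds) + ']';
    return '\n' + SPEAKERS[index] + stamp + ': ';
}

console.log(JSON.stringify(speakerLine(1, 754)));
// prints "\nParticipant A [00:12:34]: "
```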

As I said before, there is a suspicious amount of indirection going on here with all these socket connections and glue scripts. The Python script makes yet more calls to xdotool to send text to the text editor (that's the final satisfaction: instead of a specialized program combining a rich-text editor with an audio player, I'm just transcribing in plain text in Geany or Gedit or such). But I guess this collection of small independent tools is pretty Unixy, and it works, so once again we suppress the faint nagging desire to build a grand monolithic edifice that does all of this in one bundle, and declare it an elegant solution under the circumstances. Plus, I've wasted enough time figuring this out, so if it works then that's good enough.

PS: the original resource on this topic for me was this one, and the one which finally got me on the right track concerning the changes to udev since that was written is here. You can follow the links from there ...