Thu, 08 Jan 2015

One way to use a simple usb foot pedal to control VLC Media Player for audio transcription, featuring NodeJS for Unix socket manipulation

(being an extended note-to-self so I don't have to scour the Internet for my sources after another three months).

I need to do some audio transcription, but foot pedals meant for this purpose are expensive. So is transcription software. Both of these ease and speed up the transcription process, which is very welcome since transcribing is slow work. For me, fifteen minutes of interview recording takes an hour to transcribe near-verbatim. A foot pedal allows you to pause and resume the recording without taking your fingers off the keyboard or having to switch windows or anything. Specialized software might provide useful hotkeys to drop in timestamps or automatically enter conversation partners' names on a new line when you press Enter.

Actually, for this project there was potentially some budget for expensive equipment (unlike my last transcription project, my zero-budget master's thesis. Back then, f4 was still free). But as I was pondering all this, the Internet was quick to point out to me that all of this could be done on a shoestring budget. Take a cheap usb foot pedal, said the Internet, in the range of 10-30 Euros (instead of 60 to 160 or more for 'transcription' pedals), and augment its powers with a few keymaps and quick scripts in your language of choice.

Me being someone in whose house a 30 Euro router is currently performing the functions of a 150 Euro router thanks to openwrt, this obviously appealed to me.

This was all several months ago, and I got it working fairly quickly. Now that I actually need to use it, I went digging for the scripts I had found on the Internet or pieced together from pieces found there. But of course, I couldn't get away with such a cheap setup without paying a replacement price. I won't mention exactly how many hours of my time went into getting things working again. Finding all the sources I had used was one thing, but on top of that my OS has not been standing still in the meantime, and I found I had a different mechanism to deal with to capture and convert the signal coming from the foot pedal. All the code I had found or written was clear enough, but it took a good deal of time figuring out how registering the keypress needed to be done with udev.

What follows now seems fishily roundabout, but it worked, so once I had finally cleared up the other bother I left it as it was. The solution I had figured out was to use Node's net module to create a Unix socket to listen on, toggling a counter on or off at each signal from the foot pedal, and either pausing or resuming VLC accordingly. I remember wanting to try out Node since I was getting interested in it at the time, and I was impressed with the modules available for operating system interaction.

What basically happens, on my Debian system, is this:

The USB pedal sends a scancode, which the system translates to a keycode, which higher-level processes can then map to keysyms (the symbols that end up as characters on screen). Unhelpfully, by default pressing the pedal just types '1'. So we need to intercept the scancode and map it to some other keycode that we find more interesting. First, we find out what the scancode is with evtest:

$ sudo evtest

My device is no. 21 in the list, so I select that, then press the foot pedal and evtest tells me (among many other things) what its scancode is: 0x7001e.

Next, to remap it, we register the input device in udev's hardware database by identifying the device with its vendor and product id, and specifying what scancodes we want to map to what keycodes. In /lib/udev/hwdb.d/90-custom-keyboard.hwdb I put:

# identify the hardware by vendor id and product id,
# and map scancode (0x)7001e to the keycode f12
keyboard:usb:v0426p3011*
 KEYBOARD_KEY_7001e=f12

(Note that hwdb files want comments on their own lines starting with #, not trailing a match or property line, and the property line must be indented.)

Then rebuild the hardware database and draw udev's attention to the change:

sudo udevadm hwdb --update
sudo udevadm trigger

Now when we press the foot pedal, it produces F12. Next, we need to do something with this key. I wrote a little script in Bash to send a signal to a socket, and I used xbindkeys to run this script whenever F12 is pressed. (You could also just use the bind command, but that's less sustainable: it requires that terminal window to be open, you have to mess with the escape sequences for the key in question, and it isn't persistent.) The script looks like this:

#!/usr/bin/env bash

echo 'signal' | nc.openbsd -U /home/hans/keypress;
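
To hook this up to the key, xbindkeys just needs an entry in ~/.xbindkeysrc mapping F12 to the script. A minimal sketch (the script filename here is a placeholder of my own, since the real name isn't reproduced above):

```
# ~/.xbindkeysrc -- run the socket-signalling script when F12 is pressed
# (script path is hypothetical; substitute wherever you saved the Bash script)
"$HOME/pedal-signal.sh"
    F12
```

Start xbindkeys (or restart it after editing the file) and the binding is active for the whole X session, no terminal window required.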

We're getting there. Here comes the Javascript. The following script called listen.js creates the socket to which the above script sends the signal, and calls the final script as a result.

#!/usr/bin/env node

//a script to listen for a certain keypress and do something with it.
//intended to be used with a footpedal for speeding up transcription.

var net = require('net');
var fs = require('fs');
var exec = require('child_process').exec;

var stat = 0;

//callback for exec to log result to stdout
function puts(error, stdout, stderr) { console.log(stdout); }

//define server and callback
var unixServer = net.createServer(function (client) {
    if (stat == 0) {
        stat = 1;
        exec("/home/hans/ pause", puts);
    } else if (stat == 1) {
        stat = 0;
        exec("/home/hans/ jogbackward && /home/hans/ pause", puts);
    }
});

//recover server if the socket file is already in use (e.g. after a crash)
unixServer.on('error', function (e) {
    if (e.code == 'EADDRINUSE') {
        var clientSocket = new net.Socket();
        clientSocket.on('error', function (e) { //handle error trying to talk to server
            if (e.code == 'ECONNREFUSED') { //no other server listening: remove stale socket
                fs.unlinkSync('/home/hans/keypress');
                unixServer.listen('/home/hans/keypress', function () { //'listening' listener
                    console.log('server recovered');
                });
            }
        });
        clientSocket.connect({path: '/home/hans/keypress'}, function () {
            console.log('Server running, giving up...');
            process.exit();
        });
    }
});

//listen to server
unixServer.listen('/home/hans/keypress');
The first block pulls in the modules we're going to use. The second block is just an if/else that calls the VLC control script with pause the first time the key is pressed, and with jogbackward and (un)pause the next time. Pretty basic. The next block is copied pretty much straight from Stack Overflow and does some error recovery for when the filehandle for the socket already exists. Finally we start listening on the socket, so as long as this script is running in a terminal window, every press of the foot pedal will get picked up and handled.
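
For clarity, the toggle in that second block amounts to a tiny two-state machine. A hypothetical sketch in Python (the real version shells out to the VLC control script instead of returning command names):

```python
# Toggle state machine mirroring the Node script's if/else:
# the first press pauses; the second press jogs back a little and resumes.
def handle_press(stat):
    """Given the current state (0 = playing, 1 = paused),
    return the new state and the command(s) to send to VLC."""
    if stat == 0:
        return 1, ["pause"]
    else:
        return 0, ["jogbackward", "pause"]

# one full pedal cycle: pause, then jog back and resume
state = 0
state, cmds = handle_press(state)  # pedal press while playing
state, cmds = handle_press(state)  # pedal press while paused
```

The jog backward before resuming is the nice bit: you get a couple of seconds of replay so you can pick up the sentence where you left off.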

This bit of NodeJS glue code then calls the real stuff controlling VLC. I got it from VLC's wiki in this guide, which also explains how to configure some more sockets which this script uses to talk to VLC: sending it commands and reading status like the current time elapsed in the file you're playing. I modified it by encapsulating the timestamp stuff in its own function, because I also wanted to add a function to put interview participants' names in the transcript with a shortcut key, and occasionally get a timestamp to go along with those. I used xbindkeys again to map those shortcut keys, but for this I can directly call speakerswitch 1 (or 0 or 2 for other pre-defined speakers).
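
The gist of talking to VLC this way: start VLC with its rc (remote control) interface bound to a Unix socket, e.g. `vlc -I rc --rc-unix /tmp/vlc.sock`, then write plain-text commands like `pause` and `get_time` to that socket. A rough Python sketch of the idea (the socket path and helper names are my own assumptions, not the wiki script's):

```python
import socket

VLC_SOCKET = "/tmp/vlc.sock"  # hypothetical path; must match vlc's --rc-unix option

def vlc_command(cmd):
    """Send one rc-interface command (e.g. 'pause', 'get_time') to VLC
    and return its raw text reply."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(VLC_SOCKET)
    s.sendall((cmd + "\n").encode())
    reply = s.recv(4096).decode()
    s.close()
    return reply

def format_timestamp(seconds):
    """Turn VLC's get_time reply (elapsed seconds) into an [hh:mm:ss] stamp
    suitable for dropping into the transcript."""
    h, rest = divmod(int(seconds), 3600)
    m, s = divmod(rest, 60)
    return "[%02d:%02d:%02d]" % (h, m, s)
```

With something along these lines, a hotkey handler can ask VLC for the current position and paste a formatted stamp into the transcript (via xdotool, in my setup).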

As I said before, there is suspiciously much redirection going on here with all these socket connections and glue scripts. The Python script makes yet more calls to xdotool to send text to the text editor (that's the final satisfaction: instead of a specialized program combining a rich-text editor with an audio player, I'm just transcribing in plain text in Geany or Gedit or such). But I guess this collection of small independent tools is pretty Unixy, and it works, so once again we suppress the faint nagging desire to build a grand monolithic edifice that does all of this in one bundle, and declare it an elegant solution under the circumstances. Plus, I've wasted enough time figuring this out, so if it works then that's good enough.

PS: the original resource on this topic for me was this one, and the one which finally got me on the right track concerning the changes to udev since that was written is here. You can follow the links from there ...

Tue, 26 Nov 2013

Computer Literature Queue

The Ode Community Book Club has just started up its second edition: we're reading Dive into HTML5 and Beginning HTML and CSS. After this we'll probably read something on Git, so I've got my reading lined up for the next few months, but nonetheless I've also been adding some more to the list.

Read the rest of this post

Fri, 14 Jun 2013

Inventory-management: trying out some lightweight photo managers

The idea

I recently came across a fairly old article in which Chad Files explains how to use the f-spot photo manager to create an inventory of possessions. The idea is quite simple: you use the tagging/organizing functionality of a photo management application to organize photographs of all the things you have, making a searchable/sortable inventory. Indeed, this is by no means tied to f-spot: there are quite a few similar programs with the same functionality, allowing tagging, categorizing, and organizing in collections.

This is interesting for my personal itch of wanting to create a digital representation of things I have in storage and in my document archive. I've been using tellico -- an excellent app that does the job, but with two major shortcomings for my purposes.

  1. Folders, boxes and other objects which contain other objects cannot be identified as such. In other words: to record that an object (say, a tax notice) is in a particular folder (say Folder A), you have to add the folder as a property of the object, and you cannot create an object 'Folder A' of type folder that knows that it contains that letter.

  2. Entry of objects is slow and tedious and duplicate prone (partially as a result of point 1), and bound to a pre-determined set of object properties.

Read the rest of this post

Converging on the short stack, clearing out mental cruft

I give myself 300 words for this post (including the title and this notice).

I'm cleaning the house and getting some important correspondence done, today on Friday, my free day. Working four days a week -- having a three-day weekend -- is almost like leading a double life! The trouble seems to be that I do all of the same stuff in the other half of my life: trying to work with computers, just beyond my capabilities.

I've borrowed some important insights from Mike Levin today. His thesis of the need for a 'short stack' resonates with what I've felt for a while. I think he's expressing it well: Unix is a masterful base of flexibility, and knowing how to use Unix at the level of muscle memory is going to be the key to survival in a fast-changing technological landscape. I didn't say that very well; go read his site, and in particular his Levinux project.

That brings me to well over half my wordcount. The point is this. I'm an information addict. My use of computers is conditioned to be one of consumption. I keep dropping into 'hang mode', hanging in front of the computer consuming more and more information. Even when I start out to take some concrete steps to automate or reliable-ize or secure some of my computer usage, I end up spending most of my time looking for the perfect tool, or brainstorming the perfect tool which is several steps beyond my current practical coding abilities.

So in forty words: my life is not in computers. Computers are part of where the many lines from which my life emerges take place. But I can do and organize things outside computers too. Words used up; more later.

Sat, 25 May 2013

Talking with computers

When trying to make sense of how my use of computers works, there's a guiding metaphor which seems to help me. It's the idea of streams of conversation.

What I mean by this, essentially, is that all my activities with the computer, and all the computer's activities in support of my activities making use of it, are ongoing streams of events: conversations, logs, checking states and taking action based on that.

Read the rest of this post

Wed, 22 May 2013

Blog publishing with a private 'cloud'

git-annex for archiving

I'm planning on trying out git-annex very soon. Git-annex is a system built on top of the git source control management system which makes it possible to store large files in git repositories and use the system as a file synchronizer and archiver. Git by itself can't handle large files very well, but git-annex makes this better and also sort of automates the synchronization process.

According to its website, git-annex "is not a backup system [but] it may be a useful component of an archival system, or a way to deliver files to a backup system".

My backup system is horribly out of date. Since git-annex also has some more Dropbox-like features that I have wanted for myself for some time, I plan on trying it out very soon as a way to synchronize files and 'deliver files to a backup system'.

blogging on a filesystem-based publishing platform with git-annex (or just rsync)

I also realized yesterday that git-annex and similar systems might have a bonus advantage for people like me who use a plain-text blogging platform. Since, unlike Dropbox, with git-annex (I think?) I have fine-grained control over which directories sync with which directories, I can use it to effectively synchronize my blog post drafts, and publish new posts, from any machine without having to log in and/or mount my remote drives.

Note that rsync could be just as effective for this. Using git-annex means using git which means everything is in version control, though. (Of course, I could sync with rsync while keeping everything in version control beside that).

Plus, apparently git-annex now has an android application. Type short, tweet-like blog posts on my phone on the go, and queue them up to sync to my hosting account over ssh as soon as I get a connection? I like the idea.

If this works, I think it would solve a problem that has been keeping me from posting more often. I've complained about it before. It's a very small thing, but somehow it does affect my propensity to post. It's that while it's great to be able to start a blog post anytime, anywhere, in a plain text file, it's an additional step to mount my remote filesystem and transfer the post. Or to remember my login details if I want to post from somewhere else. Of course, with a web-based interface (e.g. Wordpress) the hurdle simply comes one step earlier: you have to log in before you can start posting. So it isn't really a disadvantage. _But_, with something like git-annex (or just rsync), I get the advantage of being able to start blog posts cheaply (just open a text editor), and (from all my computers anyway) the computer will take care of copying it to the server (and to all my other work and backup locations).

I'm looking forward to trying it.

Wed, 10 Apr 2013

A tutorial on building an ncurses application with Perl that I mostly understand

Just need to reread the part of Learning Perl on installing modules.

here it is on CPAN.

Fri, 15 Mar 2013

Dealing with a backlog in exercises

I've got into the position where I have two (consecutive) sets of end-of-chapter exercises to do for Learning Perl.

Instead of trying to do everything at once to catch up, I'm going to sit down and do 20 minutes. Using a recently-reminded-of piece of wisdom: read (work) for understanding, not for memory. So I'm going to give my full attention to these 20 minutes, to make sure the work I do helps me understand the material, NOT to get the exercises done.

Wed, 13 Mar 2013

Hyperlinks as a kind of data organization method?

(note: this is braindump quality writing)

I'm interested in lowest-common-denominator ways of structuring and ordering data on computers. This has recently been sparked by realizing that computers simply follow instructions, and understanding computers really is about understanding their languages (at all levels: from logic gates and bits through programming languages to storage of human languages in text).

Anyways, I'm thinking about how much computer information can actually be stored in plain text. And I was wondering about hyperlinks. Could hyperlinks function as a simple way of structuring information and organizing relationships between pieces of information?

Read the rest of this post

Tue, 05 Mar 2013

Update to Learning Perl Book Reading Club Progress Chart

I think it's time for an update to my previous progress chart for my Learning Perl reading. It's nice to see this visually.

Remember, the total number of chunks in this chart is 76, including 59 days of approximately 5 pages a day, plus one day per set of chapter exercises. My current progress after the first day of reading in Chapter 11: 48 days, or 63% of the total.

That feels a lot slower than at my previous report when it was 32%. But it's really good progress, and I can definitely feel how far I've come not just in the number of pages but in my overall feel for Perl and programming.

(Click on the image to see it full-size).