Shell printf tricks

Do you know the printf trick in shell?

Shell printf repeats its format pattern if you give it more arguments than the pattern has.
You can use this for all kinds of tricks:

$ printf "/some/path/%s " file1 file2 file3 file4
/some/path/file1 /some/path/file2 /some/path/file3 /some/path/file4 
$ printf "%s=%s\n" a 1 b 2 c 3 d 4
a=1
b=2
c=3
d=4

It’s is really nice when you combine it with some other output:

$ printf -- "-p %s " $(ps aux | grep fcgiwr[a]p | f 2)
-p 10613 -p 10615 -p 10616 -p 10617 -p 10619 -p 10620 -p 10621 -p 10622 -p 10623

Say I wanted to pass all the pids of a program that forks a lot to strace. I could do:

strace $(printf -- "-p %s " $(ps aux | grep fcgiwr[a]p | f 2))

An aside, you may have noticed that non-standard f command in the ps pipeline. I got that from here a long time ago (2008 according to the file’s timestamp) and it’s really fantastic—perfect for all sorts of unixy things that output tables (ps, ls -l, etc).

I’m so dumb

I was banging my head against a wall trying to get help for go build with go build help (and for some reason go doesn’t support go build --help, but I kept getting this error:

package help is not in GOROOT (/opt/homebrew/Cellar/go/1.18.3/libexec/src/help)

Does brew not come with docs by default? How else am I supposed to interpret that?

No, I’m just dumb. It’s not go build help, it’s

go help build

Stuck on the final Maquette level

Last night I was playing Maquette and I got stuck on the final level (called “The Exchange”) for a really long time. I was at a place with some towers with crystals above them and I had opened the doors to the first tower; Inside was a switch. Flipping the switch made a bunch of arches appear, leading to another building.

But aside from that nothing happened. I wandered around for about an hour before getting ashamed that I couldn’t solve the puzzle and looking up the answer on the internet. But it turns out, I had encountered a bug! Flipping the switch is supposed to open a door! But instead I just got the switch sound and nothing else happened. I tried loading my last autosave but the same thing happened.

Finally I tried selecting the “Restart” option from the pause menu. I was nervous (restart what exactly? The Level, the Area, the Game?) so I hard saved my game first. Turns out it just reset back to the beginning of “The Exchange”. Within 1 minute I was back at the switch in question and this time I could hear (and see) it open a door when I hit it!

So, if you’re stuck in that section of the game and it seems like you’re just not getting it, it’s not you, the game is just bugged.

Failed Emacs builds, hanging kernels, abort(), oh my

My nightly Emacs builds stopped about a month and a half ago. A couple days after I noticed it was failing I tried to debug the issue and found that building openssl was hanging—I found that Jenkins was timing out after an hour or so. I should mention that it’s dying on a Mac OS X 10.10 (Yosemite) VM, which is currently the oldest macOS version I build Emacs for. I tried building manually in a terminal—the next day it was still sitting there, not finished. I decided it was going to be annoying and so I avoided looking deeper into it for another month and a half (sorry!). Today I tracked it down and “fixed” it—here is my tale…

I (ab)use homebrew to build openssl. Brew configures openssl then runs make and then make test. make test was hanging. Looking at the process list, I could see 01-test_abort.t was the hanging test. It was also literally the first test. Weird. I checked out the code:

#include <openssl/crypto.h>

int main(int argc, char **argv)
{
    OPENSSL_die("Voluntary abort", __FILE__, __LINE__);
    return 0;
}

Well, that seems straightforward enough. Why would it hang? I tryed to kill off the test process to see if it would continue. There was a lib wrapper, a test harness and the actual binary from the source shown above—they all died nicely except for the actual aborttest executable. I couldn’t even kill -9 that one—that usually means there’s some sort of kernel issue going on—everything should be kill -9able.

Next I ran it by hand (./util/shlib_wrap.sh test/aborttest) and confirmed that the test just hung and couldn’t be killed. I built it on a different machine and it worked just fine there. So I dug into the openssl code. What does OPENSSL_die() do, anyway?

Not much:

// Win32 #ifdefs removed for readability:
void OPENSSL_die(const char *message, const char *file, int line)
{
    OPENSSL_showfatal("%s:%d: OpenSSL internal error: %s\n",
                      file, line, message);
    abort();
}

Ok, that’s nothing. What about OPENSSL_showfatal()? Also not much:

{
#ifndef OPENSSL_NO_STDIO
    va_list ap;

    va_start(ap, fmta);
    vfprintf(stderr, fmta, ap);
    va_end(ap);
#endif
}

That’s just a print, nothing exciting. Hmmm. So I wrote a test program:

#include <stdlib.h>

int main() {
  abort();
}

I compiled it up and… it hung, too! What?? Ok. I tried it as root (hung). Tried it with dtruss:

...lots of dtruss nonsense snipped...
37772/0xcaf1:  sigprocmask(0x3, 0x7FFF5DD71C74, 0x0)         = 0x0 0
37772/0xcaf1:  __pthread_sigmask(0x3, 0x7FFF5DD71C80, 0x0)       = 0 0
37772/0xcaf1:  __pthread_kill(0x603, 0x6, 0x0)       = 0 0

So it got to the kernel with pthread_kill() and hung after that. So I tried another sanity check: In one terminal I ran sleep 100. In another I found the process id and did kill -ABRT $pid. The kill returned, but the sleep was now hung and not able to be killed by kill -9, like everything else. Now I was very confused. This can’t be a real bug, everyone would be seeing this! Maybe it’s a VM emulation issue caused by my version of VMWare? I can’t upgrade my VMWare because the next version after mine requires Mac OS 10.14 but this Mac Mini of mine only supports 10.13. Sigh. Also, the Emacs builds were working just fine and then they suddenly stopped and I hadn’t updated the OS or the host OS or VMWare. Nothing was adding up!

As I sanity check, I decided to reinstall the OS on the VM (right over top of the existing one, nothing clean or anything). There was a two hour long sidetrack here with deleting VM snapshots, resizing the VM disk (which required booting into recovery mode), downloading the OS installer and finally letting the install run. But that’s not important. The important part is that I opened up terminal immediately after the OS installed and ran my abort() test:

$ ./test
Abort trap: 6
$ 

It worked! How about OpenSSL?

$ ./util/shlib_wrap.sh test/aborttest
test/aborttest.c:14: OpenSSL internal error: Voluntary abort
Abort trap: 6
$ 

Yay! But why?? I don’t actually know. Was it a corrupt kernel? A bad driver that got installed? (What drivers would get installed on this Jenkins builder?) I don’t feel very satisfied here. I’m quite skeptical, in fact! But it’s working. Emacs builds should start coming out again. And I can ignore everything again until the next fire starts! 🙂

Fixing the blower motor in my central air system

On Wednesday as I was going to bed I noticed it was quite hot in my house. I checked my central air blower unit and there was frost on the coils and the blower wasn’t moving. It kept trying to start but not being able to start. The 7 segment led was showing “b5” which I was able to find in the manual:

So the next day I removed the blower assembly and tried to extract the motor. The motor shaft extended quite far past the end of the fan and it was rusty so it didn’t want to come off. I attempted to get it off using gravity and a hammer:

I failed. I was only able to get it this far:

So I took it to my dad’s house because he has a better tools than I do. We ended up drilling the end of the shaft out to get the motor removed. We tested on the bench and read a lot of troubleshooting manuals and determined that the motor was shorted. It was hard to turn and there were very obvious magnetic “detents” we could feel when turning it by hard. We took the motor apart and looked around:

We measured the resistance across all the pairs of pins on the 3 pin motor connector: 2 pairs measured 3 ohms and one pair was at 0.9 ohms. We kept the meter plugged into the shorted pair and moved a bunch of wires around. At some point we noticed it had changed up to 3 ohms but we weren’t sure which part we had messed with to make it that way. All attempts to short it out again to identify the bad wire area failed. From this point on it was never shorted.

We put it all back together, fixed the drilled out shaft by cutting it off with a hacksaw and then sanding off all the sharp edges and rust. I took it home installed it and it spun up. It ran for about two minutes and then died. Same symptoms. I kind of expected it since we really didn’t fix anything, just shuffled things around, but it was still disappointing.

The next day I ordered a used motor off ebay and suffered through a hot July Friday.

Saturday I decided to have one last ditch effort to fix it since the weekend was supposed to be upwards of 90ºF. I got it apart and found this:

That looks obviously bad and you’d wonder how we could miss it, but that’s only because it’s blown up so big. That wire is one of the skinny winding wires around the motor. In fact I couldn’t actually see the issue at all until I used my phone camera as a magnifying glass. I pulled out some slack on the wire around that break to see if there were any more issues. it looked like this:

To me it looked clearly exposed. I had the meter plugged in this whole time and I could tell as I freed this bad area the short cleared. To fix, I wrapped it in electrical tape and then put it back down where it was:

Even pressing it down as much as I could I couldn’t get it to show a short on the meter, so I believed I had it fixed. I put it all back together:

I mounted it up and tested it. It worked! And more importantly, didn’t die after just a couple minutes. Eveything’s been running all day now and my house is finally back down into the sub 80ºF range. I didn’t have central air until a couple years ago, and it’s amazing how fast I’ve transitioned to thinking that 85ºF is unlivably hot.

Now I just have to decide what to do with the replacement I bought off ebay. I just know that if I cancel the order then my motor will die just a couple days later. But if I don’t cancel, then my fix will work for the next 50 years…

Screw Shareholder Value

I was digging around and I found the original SSV announcement posted to Dominion (the (in)famous Sisters Of Mercy Mailing list).

What is SSV? Just read the letter—it explains everything. More info can be found here, here, or here, or here.

Here is the email, reproduced with headers for posterity:

Received: from maekong.ohm.york.ac.uk by rewind.indigita.com (NTMail Server 2.11.25 - ntmail@net-shopper.co.uk) id David Mon, 27 Oct 97 10:50:32 +0000 (PST)
Received: by maekong.ohm.york.ac.uk
                  id m0xPtmn-00009TC
                  (/\oo/\ Smail3.1.29.1 #29.7); Mon, 27 Oct 97 18:21 GMT
Sender: <dominion-request@maekong.ohm.york.ac.uk>
Received: from emout29.mail.aol.com ([ 198.81.11.134]) by maekong.ohm.york.ac.uk
                   with smtp id m0xPtmf-00009PC
                   (/\oo/\ Smail3.1.29.1 #29.7); Mon, 27 Oct 97 18:21 GMT
Received: (from root@localhost)
                    by emout29.mail.aol.com (8.7.6/8.7.3/AOL-2.0.0)
                          id NAA28825 for dominion@maekong.ohm.york.ac.uk;
                            Mon, 27 Oct 1997 13:20:53 -0500 (EST)
Date: Mon, 27 Oct 1997 13:20:53 -0500 (EST)
From: Baxville@aol.com
Message-ID: <971027131941_697440177@emout09.mail.aol.com>
To: dominion@maekong.ohm.york.ac.uk
Subject: SSV - Go Figure
Comments: Sisters Of Mercy mailing list
Sender: dominion-request@maekong.ohm.york.ac.uk
X-Info: indigita mail server
Mime-Version: 1.0

SSV  "Go Figure"

After years of stalemate, Mr Andrew Eldritch has managed to get East West to
accept (in lieu of two remaining Sisters albums) a record bearing no
resemblance whatsoever to The Sisters Of Mercy. This album will be released -
if at all - under a completely different name, which is just as well, as it's
got practically nothing to do with the Sisters. Furthermore, East West agreed
not to hear the album in advance....  so it bears no resemblance to *any*
quality product, let alone the Sisters.

....in return for which, Mr Eldritch agrees to let East West keep 75000
pounds which they owe him, and which they were refusing to pay in any event.
Unsurprisingly in the circumstances, Mr Eldritch made sure that East West got
the record they deserve, but made sure that they then paid a lot more for it
than he deserves.... ....especially for one afternoon recording the
occasional mumble on the reject material of some amateur acquaintances from
Hamburg. Go figure.

It is rumoured, indeed, that the whole album only took two days from start to
finish (somewhat less than the usual nine months), and that the "rather bad
sub-techno" music was under-average and boring even before the drums were
mysteriously removed. The "lyrics" dwell almost exclusively on the
glorification of shooting people and selling drugs to schoolchildren. It is
rumoured that the full name of the band (SSV-NSMABAAOTWMODAACOTIATW) stands
for "Screw shareholder value - not so much a band as another opportunity to
waste money on drugs and ammunition courtesy of the idiots at Time Warner".
Go figure.

How desperate must the corporation be? Desperate enough to try and force an
artist to record with threats of massive litigation after a seven-year
impasse, expecting him to make a good record while witholding a fortune in
back-royalties, and then desperate enough to pay "a very large amount" for a
record which the artist neither wrote, nor composed, nor arranged, nor
produced, whose content is merely a puerile attempt to be as offensive as
possible? Go figure.

Finally, rumour has it that the record company are planning to release it,
which would, you might think, be a conscious insult to the general public if
East West were smart enough to know a pile of crap when they hear it. Either
way, go figure.

Anyway, Mr Eldritch is now free to resume his recording career with The
Sisters Of Mercy, who have been waiting very patiently for him at a chemist's
round the corner, and very sensibly having nothing to do with the SSV album
 - because they didn't have to.

The Sisters will be celebrating the liberation with a small tour of Europe
and America in January/February, and the release of a stonking new (Sisters)
single on the day after Mr Eldritch's contract officially expires, which will
be a couple of months later. Work has started on the next (Sisters) album.
Normal service has been resumed.

Oh, and in case you haven't got the picture, we do NOT recommend the purchase
of an SSV album, should East West actually try to sell one to you. We
recommend that you wait for the magic of the Net to mysteriously provide you
with a free one  - just this once.

BaxCorp

DKIM and Exim4 on Debian

I wanted to get DKIM working on an Debian box I have that runs Exim. The first thing to do is to create the keys:

$ openssl genrsa -out diamonds.key 4096
$ openssl rsa -in diamonds.key -pubout > diamonds.pub

I was following these instructions and noticed that Exim supports ed25519 DKIM signatures. Neat! I decided I may as well create those keys, too:

$ openssl genpkey -algorithm ed25519 -out hearts.key
$ openssl pkey -outform DER -pubout -in hearts.key | tail -c +13 | base64 > hearts.pub

From there I stuck the public keys in DNS:

diamonds._domainkey.example.com. 3600 IN TXT  "v=DKIM1; k=rsa; t=s; p=MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAxDMS3KRFCU4PEtygOUdALBt7jmz5IIX2+KHoV4fd0CLjXRvOqA5H8rU3e+y1lNese9yjPLksPqiOh5vtx8Tysjv2MSTXB1Kgr0tl+1IlJL4ihdpUgR1veKB5X4wK3Ppkr5Oy42H7xNHf/yj6aC1E+alZ8TdssuHY3ReqO6YvGa72UqTMmL1gBl9SXBUl" "vD+FqvfFtkQFFMU9QSTtrIuzcup6NC6z3a4I4UGz4YOZSxeUARKzySGFzPd7vwmrKEZVhlA0HzmJm9eGrjq6IiLVdgTJhSZ8Ecn9h65x9EjhNYYhsufTbcPDljlZYpA4b+dkTEs35a4KjOM2wY7gUdY9ydOqOCfz2BpzJ25Mn3K8nTV8a7fInWCnKg0sm6Fuiwe0DrQjrTe7xGC3Y03CU8eziynOukyWnfsCAnpWcUGa15bp1/O0Le+ZYsKOWxA" "CL5cKlYPw1VJrqz7ZQ1i+s+twOLgEKWm8gwKMsDysgpM1WvE+IhlJkkZLkWavF9pAKeSD6akkHcbkB3QsDKgNugDC4EEm6XV/+hPcTS9Gmd2PYPswxg8nlEdUDjxLul6UbKzWwkYihzKxhMSqCEXTUkt6eHjT+KAIHXVm86elFEmOcuadUWwr+74fgnTpv1XbWIs5qqqh/zROhvUUR8EXZbjOchFEX3YjLO8NDPqHdW4zHt0CAwEAAQ=="
hearts._domainkey.example.com. 3600 IN TXT    "v=DKIM1; t=s; k=ed25519; p=MTGVeSXmIzviF/B+ANc/bLqP2zEWhO/rw6o8HxIl5+8="

Ed25519 is quite compact! t=s:y is found in the DKIM RFC (section 3.6.1). t is for various flags. s is for strict (I’m just guessing the mnemonic)—it means all the domain names have to match. Apparently you don’t want this if you use subdomains in your email addresses (I don’t). y means “This domain is testing DKIM”—ie, don’t worry if it fails. It seemed reasonable to set that while I was playing around.

Next, I had to set up Exim in Debian. This was kind of a pain because there’s the Exim config, then the Debian wrapper around that config. This is made more complicated by the fact that Debian has a debconf option named dc_use_split_config. You can see which way yours is set in /etc/exim4/update-exim4.conf.conf (the double .conf is not a typo!). If it’s false then when you update /etc/exim4/conf.d you first have run /usr/sbin/update-exim4.conf.template which cats everything in the conf.d dir into /etc/exim4/exim4.conf.template. Then you have to run /usr/sbin/update-exim4.conf which combines /etc/exim4/exim4.conf.localmacros and /etc/exim4/exim4.conf.template and puts the resulting final config file in /var/lib/exim4/config.autogenerated. Fwew.

The basic DKIM config is in /etc/exim4/exim4.conf.localmacros. I added these lines:

DKIM_CANON = relaxed
DKIM_SELECTOR = diamonds : hearts
DKIM_DOMAIN = example.com
DKIM_PRIVATE_KEY = /etc/exim4/dkim/$dkim_selector.key

For my setup this wasn’t enough. The DKIM_* macros are only used by the “remote_smtp” transport (found in /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp). I was using a “satellite” configuration with a smarthost. This means it uses the “remote_smtp_smarthost” transport (found in /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp_smarthost). You can tell what transport is being used by looking for T= in /var/log/exim4/mainlog.

I copied all the DKIM related stuff from /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp into /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp_smarthost, namely these lines:

# 2019-08-26: David added these:
.ifdef DKIM_DOMAIN
dkim_domain = DKIM_DOMAIN
.endif
.ifdef DKIM_SELECTOR
dkim_selector = DKIM_SELECTOR
.endif
.ifdef DKIM_PRIVATE_KEY
dkim_private_key = DKIM_PRIVATE_KEY
.endif
.ifdef DKIM_CANON
dkim_canon = DKIM_CANON
.endif
.ifdef DKIM_STRICT
dkim_strict = DKIM_STRICT
.endif
.ifdef DKIM_SIGN_HEADERS
dkim_sign_headers = DKIM_SIGN_HEADERS
.endif

Then I ran update-exim4.conf.template and update-exim4.conf and finally systemctl restart exim4.

At this point I could send emails through and the DKIM headers were added.

Next I removed the y flag from the t flags in the DNS since everything appeared correct. I also added the ADSP DNS record:

_adsp._domainkey.example.com. 3600 IN  TXT     "dkim=all"

Then I wrote this post and called it a night!

AT&T causes mDNS on Linux To Fail

Today my Jenkins builds were not working because all of the build slaves were offline. Digging around in the logs showed that the couldn’t connect because of name resolution failures. I use mDNS on my network (the slaves are Mac OS X VMs running on a Mac Mini), and so they were named something like xxxx-1.local and xxxx-2.local I tried just pinging the machines and that refused to resolve the name, too.

I verified that Avahi was running, and then used avahi-resolve --name xxxx-1.local to check the mDNS name resolution. It worked just great.

So why would mDNS be working fine network-wise, but no programs were resolving correctly? It struck me that I didn’t know (or couldn’t remember!) how mDNS tied in to the system. Who hooks in to the name resolution and knows to check for *.local using mdns?

It turns out it’s good old /etc/nsswitch.conf (I should have remembered that)! There’s a line in there:

hosts:          files mdns4_minimal [NOTFOUND=return] dns

That tells the libc resolver (that everything uses) that when it’s looking for a hostname, it should first look in the /etc/hosts file, then it should check mDNS, then if mDNS didn’t handle it, check regular DNS. Wait, so mDNS is built right in to libc??

Nope! On my Debian system there’s a package called libnss-mdns that has a few files in it:

/lib/x86_64-linux-gnu/libnss_mdns.so.2
/lib/x86_64-linux-gnu/libnss_mdns4.so.2
/lib/x86_64-linux-gnu/libnss_mdns4_minimal.so.2
/lib/x86_64-linux-gnu/libnss_mdns6.so.2
/lib/x86_64-linux-gnu/libnss_mdns6_minimal.so.2
/lib/x86_64-linux-gnu/libnss_mdns_minimal.so.2

Those are plugins to the libc name resolver so that random stuff like mDNS doesn’t have to be compiled into libc all the time. In fact, there’s a whole bunch of other libnss plugins in Debian that I don’t even have installed.

So my guess was that this nss-mdns plugin was causing the problem. There are no man pages in the package, but there are a couple README files. I poked around trying random things and reading and re-reading the READMEs many times before this snippet finally caught my eye:

If, during a request, the system-configured unicast DNS (specified in /etc/resolv.conf) reports an SOA record for the top-level local name, the request is rejected. Example: host -t SOA local returns something other than Host local not found: 3(NXDOMAIN)This is the unicast SOA heuristic.

Ok. I doubted that was happening but I decided to try their test anyway:

$ host -t SOA local
local. has SOA record ns1-etm.att.net. nomail.etm.att.net. 1 604800 3600 2419200 900

Crap.

Those bastards at AT&T set their DNS server up to hijack unknown domains! They happily give out an SOA for the non-existant .local TLD. So AT&T’s crappy DNS is killing my Jenkins jobs??? Grrrrr…

The worst part is that I tried to use Cloudflare’s 1.0.0.1 DNS. My router was configured for it. But two things happened: 1. I got a new modem after having connection issues recently, 2. I enabled IPv6 on my router for fun. The new modem seems to have killed 1.0.0.1. I can no longer connect to it at all. Enabling IPv6 gave me an AT&T DNS server through DHCP (or whatever the IPv6 equivalent is).

So, straightening out my DNS (I had to revert back to Google’s 8.8.8.8) caused NXDOMAIN responses to the .local SOA, and that caused mDNS resolution to immediately work, and my Jenkins slaves came back online. Fwew.

Firefox imgur failure

My imgur had been acting up for a while now. In my Firefox it wouldn’t render the actual image but rendered most of the rest of the page just fine:

If I went to i.imgur.com (adding the .png or .jpg to the url) then the image loaded just fine. So it wasn’t getting blocked by anything in my network stack (ad-blocker and the like). I opened up the console to take a look and I saw this error:

10:57:53.646 SecurityError: The operation is insecure.                             analytics.js:34:17
             consoleDebug    jafo.js:838:8
             directFire      jafo.js:272:12
             _sessionStart   jafo.js:563:16
             init/<          jafo.js:134:12
             b               lodash.min.js:10:215
             d               raven-3.7.0.js:379:23
             j               https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js:2:26855
             fireWith        https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js:2:27673
             ready           https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js:2:29465
             I               https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js:2:29656

The line in question was doing localstorage.Get("something-or-other"). So Firefox was mysteriously throwing a SecurityError when accessing local storage. After testing a million different things (disabling ad-blockers, privacy extensions, etc.) and reading countless Stack Overflow and forum posts, I discovered that this was because I had "https://imgur.com" listed in my cookie exceptions as "Block".

This was actually surprisingly hard to find because the Firefox preference UI for cookie exceptions is pretty bad—it doesn't sort nicely and it doesn't have a search. It's very confusing because the "Manage Cookies and Site Data" screen looks almost identical but sorts correctly (all domains and subdomains grouped together) and has a handy search bar. If you are used to that screen and then try to use the cookie exceptions screen I can almost guarantee you'll type the cookie into the text box at the top and then scratch your head for a few seconds trying to figure out why the list isn't reacting to your search, only to realize that the text box is for adding exceptions and not for searching! Sigh. At least you can click the "Website" column header to get a plain radix sort instead of the default sort of "Status" (which apparently doesn't do a secondary sort on the Website so everything is just random!). When sorting by "Website" take care to thoroughly search—"http" and "https" versions of the site are separate (this is what cost me the most time).

Once I removed the block, imgur started working again:

Yay. Kinda sucks that imgur's code isn't defensive enough to deal with localstorage exceptions, which I thought were widely known to happen (I know on sites I run I see them happening for crazy errors like "Disk Full" and "IO Error").

Debian OpenSSL 1.1.0f-4 and macOS 10.11 (El Capitan)

Some people were reporting that an IMAP server wasn’t working on their Mac. It was working from linux machines, and from Thunderbird on all OSes. From Macs I was getting this testing from the command line:

$ openssl s_client -connect <my-imap-server>:993
CONNECTED(00000003)
39458:error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version:/BuildRoot/Library/Caches/com.apple.xbs/Sources/OpenSSL098/OpenSSL098-64.50.6/src/ssl/s23_clnt.c:593:

This led me to a recent libssl package upgrade on my server (to version 1.1.0f-4). Checking the changelogs I found this:

  * Disable TLS 1.0 and 1.1, leaving 1.2 as the only supported SSL/TLS
    version. This will likely break things, but the hope is that by
    the release of Buster everything will speak at least TLS 1.2. This will be
    reconsidered before the Buster release.

Ah-ha! To quickly get back up and running I grabbed the old version from http://ftp.us.debian.org/debian/pool/main/o/openssl/libssl1.1_1.1.0f-3_amd64.deb and installed it (and then held the package so it wouldn’t auto-upgrade).

I do hope Debian reconsiders this change, at least in the short term, since I can’t easily force OS upgrades to everyone that uses this server. Ideally Apple would update their old OSes to support TLS 1.2, but I’m not holding my breath.

Fedora libvirt/qemu error on upgrade

Today we upgraded a server that ran a bunch of VMs to Fedora 25 and all the VMs failed to come back online after rebooting.

Checking the logs I found this:

2017-04-14T08:39:35.304547Z qemu-system-x86_64: Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 0x10000 in != 0x20000: Invalid argument
2017-04-14T08:39:35.304579Z qemu-system-x86_64: error while loading state for instance 0x0 of device 'ram'
2017-04-14T08:39:35.304759Z qemu-system-x86_64: load of migration failed: Invalid argument
2017-04-14 08:39:35.305+0000: shutting down

After searching fruitlessly for all of those error messages in Google, I finally discovered the problem. The upgrade had shut libvirt down, which saved the state of all the machines into /var/lib/libvirt/qemu/save/. Then it started it back up and the save files were no longer compatible with the new qemu/libvirt binaries.

The solution was to just delete the save files in /var/lib/libvirt/qemu/save/ (which is a little sad, because it’s like yanking the power cord out of a running box). The VMs all started fine after that.

Horizon Zero Dawn Xi Cauldron Is Confusing

After you override the core in the Xi Cauldron (which unlike the others I’ve played through so far, happens pretty early), Aloy says “now I can override more machines”. But annoyingly, you don’t actually get the override until you finish the quest, which happens when you exit the cauldron. Along the way back you’ll encounter a few of the machines that require the Xi override and I found it frustrating when it wasn’t working. I was even thinking maybe it was a bug and I didn’t trigger something correctly, especially because Aloy mutters to herself, “I can’t override this machine yet—I’ll have to explore more cauldrons”. The game acts like you have the override, but you don’t get it until later. I googled but didn’t find anything, so I wrote this. If this happens to you, don’t be confused.

Rust nightly + Homebrew

I used to use this Homebrew “Tap” to install Rust nightly versions, but it stopped working at some point. I messed around with it a lot and determined it had something to do with newer Rusts not liking the install_name_tool calls that homebrew forces on libraries during installation.

So I looked into rustup (the nice shell version, not the crazy rust binary version which sadly seems to be on track to replace it) and came up with the following shell function.

rust_new_nightly_to_brew() {
    curl -sSf https://sh.rustup.rs -o /tmp/rustup.sh &&
    bash /tmp/rustup.sh --disable-sudo --channel=nightly \
       --prefix=$(brew --prefix)/Cellar/rustup-nightly/$(date '+%Y-%m-%d') &&
    brew switch rustup-nightly $(date '+%Y-%m-%d')
}

Stick that in .bashrc and then run rust_new_nightly_to_brew to install a new nightly into Homebrew’s world. It’s nice to live in this world because if you don’t like a nightly for some reason it’s trivial to revert to your last good install with brew switch.

Perl Module XS configuration is hard

I wrote a simple little Perl Module recently, and it reminded me how frustrating it is to get it all working.

I’m not even talking about the .xs preprocessor (xsubpp)—that’s weird, but it’s fairly straightforward. Most contingencies are accounted for and you can make it do whatever you want, and in my case, the results were pretty minimal and beautiful in their own way. No, I’m talking about actually building it and making it compile in a cross platform way.

If you’re used to standard unix C source releases, you know you can just run ./configure and then make and it’ll (probably) just work. Behind the scenes configure is madness, but the principle it works on (feature detection by actually compiling things) is sound. My module is an interface to a small third party library. I can’t count on the library being installed on the machine already, so I want to statically link against it. It includes a configure script for unix machines and some windows source that isn’t covered by ./configure.

In Perl, there are two different ways to build your module: ExtUtils::MakeMaker and Module::Build. For pure Perl modules, I will only use Module::Build, as ExtUtils::MakeMaker seems too hairy. But for an XS module, I’m not sure. ExtUtils::MakeMaker looked fairly configurable so I try that first. It works very well for compiling XS Unix stuff, but because it builds a Makefile and lets you just drop your own rules in, it encouraged me to just call my library’s ./configure and then make in its directory:

use ExtUtils::MakeMaker;

WriteMakefile(
    # ... boilerplate stuff stripped for brevity...

    INC               => '-I./monotonic_clock/include',
    MYEXTLIB          => 'monotonic_clock/.libs/libmonotonic_clock.a'
);

sub MY::postamble {
'
$(MYEXTLIB): monotonic_clock/configure
    cd monotonic_clock && CFLAGS="$(CCFLAGS)" ./configure && $(MAKE) all
monotonic_clock/configure: monotonic_clock/configure.ac
    cd monotonic_clock && ./bootstrap.sh
'
}

This of course worked just fine on my Mac.

But it utterly failed everywhere else. Ugh, there’s hardly any green there! Ok, so I focused on the Unix failures first—I know this, I should be able to get stuff working. Turns out Linux wants -fPIC on the library’s code because I’m statically linking against it and it will end up in my modules shared library “.so” file. Ok, so I can just unilaterally add -fPIC:

sub MY::postamble {
'
$(MYEXTLIB): monotonic_clock/configure Makefile.PL
    cd monotonic_clock && CFLAGS="$(CCFLAGS) -fPIC" ./configure && $(MAKE) all
monotonic_clock/configure: monotonic_clock/configure.ac
    cd monotonic_clock && ./bootstrap.sh
'
}

That works on clang on the Mac and gcc on Linux. I actually tested on my Debian machine. Everything should be great now!

Nope. Half the Linuxes are still failing, some of the Macs too, and I haven’t even addresses Windows yet. I discover that -fPIC happens to be defined by ExtUtils::MakeMaker on the appropriate platforms in the CCCDLFLAGS make variable, and switch to it thinking this should solve the remaining Unix problems.

Then I start thinking about Windows. I boot up my Windows 8 VM where I have Strawberry Perl installed and try out my module. It fails utterly. I forgot—you can’t just call ./configure in windows! Plus my library doesn’t even handle Windows in its configure script. I start thinking about writing my own Makefile rules to build it so I can drop the configure script completely. But I don’t like it. I’m going to need to detect which back-end the library should be using. I basically have to recreate what configure does, but in a Makefile. And I don’t know how much shell I can use in my rules and still work on Windows, meaning the detection is going to be a huge issue.

So I decide to drop ExtUtils::MakeMaker and use Module::Build instead. It’s actually pretty straightforward:

use strict;
use warnings FATAL => 'all';
use Module::Build;

my $builder = Module::Build->new(
    # ... boilerplate stuff stripped for brevity...

    extra_compiler_flags => '-DHAVE_GETTIMEOFDAY', # We're going to assume everyone is at least that modern
    include_dirs => 'monotonic_clock/include',
    c_source     => ['monotonic_clock/src/monotonic_common.c'],
);

# Add the appropriate platform-specific backend.
#
# This isn't as good as the configure script that comes with
# monotonic_clock, since it actually tests for the feature instead of
# assuming that non-darwin unixes support POSIX clock_gettime. On the other
# hand, this handles windows.
push(@{$builder->c_source},
     $^O                 eq 'darwin'  ? 'monotonic_clock/src/monotonic_mach.c' :
     $builder->os_type() eq 'Windows' ? 'monotonic_clock/src/monotonic_win32.c' :
     $builder->os_type() eq 'Unix'    ? 'monotonic_clock/src/monotonic_clock.c' :
                                        'monotonic_clock/src/monotonic_generic.c');

$builder->create_build_script();

I figure out that I can abuse the c_source configuration option. It’s supposed to be a directory where the source code lives and they search it recursively for “*.c” files. But it turns out if I pass a C file to it instead of a directory, the recursive search finds that one C file! So now I can add specific C files for the particular platforms. I now have something that compiles on my Mac, my Debian machine, and my Windows VM. Hooray! That covers everything. Finally!

Sigh. That’s worse than my last Makefile.PL based version! What’s going on? Ok, my c_source hack is biting me. I noticed that it was helpfully adding -I options for the sources directory to the compiler, but since I was passing in actual files to c_source I was getting compiler lines like -Imonotonic_clock/src/monotonic_clock.c. This was a warning on my Debian gcc and on Windows gcc, but my Mac’s clang just ignored it with no message at all. So I blew it off. Well, it turns out other compilers are more strict.

So I start poking around the Module::Build source code. I discover that the c_source is adding to an internal include_dirs list. So I override the compile function to strip .c files out of the include_dirs list.:

# This hacks around the fact that we are using c_source to store files, when Module::Build expects directories.
my $custom = Module::Build->subclass(
    class => 'My::Builder',
    code  => <<'CUSTOM_CODE');
sub compile_c {
  my ($self, $file, %args) = @_;
  # Adding to c_source adds to include_dirs, too. Since we're adding files, remove them.
  @{$self->include_dirs} = grep { !/.c$/ } @{$self->include_dirs};
  $self->SUPER::compile_c($file, %args);
}
CUSTOM_CODE

It seems really hacky (I hate having to write Perl code in a string) and a bit fragile (I sure hope they don’t change the API in a new version) but it actually works! I publish that and wait to see my wall of green.

I don’t get to see it yet. This is baffling to me. The worst part is the test reports themselves. They aren’t showing me the build process, and they don’t try to identify the Linux distro, leaving me to intuit it from the versions of various things. It looks like one of the failing distros is Debian Wheezy (aka the current “stable”). So I as a last ditch effort use “debootstrap” to build a minimal install that I can chroot into. I try compiling and I can actually reproduce the error. This is phenomenal because I don’t have to guess any more.

After poking around for a while I discover that in older glibcs, the clock_gettime function that my library uses requires librt and therefore a -lrt option to the linker. Ok. But I don’t want to add that unilaterally—I got bit by that earlier in this process. I’m also frustrated that my backend detection code is just hardwired to Module::Builds os_type() function. Do I really know that old FreeBSDs support clock_gettime? No, I don’t, and I don’t want to figure it out. I think back wistfully to ./configure—it just detects stuff by compiling it and seeing if it works. That’s really what I want—it’s the only way to know for sure without researching and testing on every. single. platform.

And then it hits me, I guess I could do that. Module::Build uses ExtUtils::CBuilder internally, and so I can too. It turns out to actually not be that bad. The longest part of the code is redirecting stdout and stderr so you don’t see a bunch of compilation errors while it’s figuring things out:

# autoconf style feature tester. Can't believe someone hasn't written this yet...
use ExtUtils::CBuilder;
my $cb = ExtUtils::CBuilder->new(quiet=>1);

sub test_function_lib {
    my ($function, $lib) = @_;
    my $source = 'conf_test.c';
    open my $conf_test, '>', $source or return;
    print $conf_test <<"C_CODE";
int main() {
    int $function();
    return $function();
}
C_CODE
    close $conf_test;

    my $conf_log='conf_test.log';
    my @saved_fhs = eval {
        open(my $oldout, ">&", *STDOUT) or return;
        open(my $olderr, ">&", *STDERR) or return;
        open(STDOUT, '>>', $conf_log) or return;
        open(STDERR, ">>", $conf_log) or return;
        ($oldout, $olderr)
    };

    my $worked = eval {
        my $obj = $cb->compile(source=>$source);
        my @junk = $cb->link_executable(objects => $obj, extra_linker_flags=>$lib);
        unlink $_ for (@junk, $obj, $source, $conf_log);
        return 1;
    };

    if (@saved_fhs) {
        open(STDOUT, ">&", $saved_fhs[0]) or return;
        open(STDERR, ">&", $saved_fhs[1]) or return;
        close($_) for (@saved_fhs);
    }

    $worked
}

Using it is pretty easy:

my $have_gettimeofday = test_function_lib("gettimeofday", "");
my $need_librt = test_function_lib("clock_gettime", "-lrt");

The one annoyance is that the feature detection doesn’t completely work on Windows. The technique I used (stolen unabashedly from autoconf) just declares the function with the wrong arguments and return value and tries to link with it. This works in general because C doesn’t encode the arguments or return values into the symbol name. Except Windows apparently does with its API functions: my feature detector detects gettimeofday() just fine, but not QueryPerformanceCounter(). If I declare QueryPerformanceCounter() properly then Windows links with it, otherwise I get an undefined symbol error. I don’t care enough to properly research Windows’s ABI, so I decided to just check $^O, assuming that all Windows will work. A quick check of the Windows documentation reveals that it’s has been supported since Windows 2000 and that’s good enough for me.

I upload to CPAN and finally see that green I’ve been looking for.

But it makes me wonder, why did I have to write this? It’s 2015, has nobody ever needed to build their Perl XS module differently for different platforms? I know I can’t be the first, but I had a really hard time finding any documentation or discussion about it. I didn’t see anything on CPAN that looked like it might solve my problems. Should this be a feature in Module::Build? I’ve only written two XS modules in my life, but both times I needed different sources on different platforms. Does anyone want to point me to something that does this well, or failing that, build off this idea and make something neat? I would love it, certainly.

Mac OS X codesigning woes

I just discovered this wonderful bug. Apparently “hdiutil makehybrid” is stripping code signatures in some cases.

I first verify the code signature on an App (a build of Emacs, in this case)—there are no errors:

$ codesign --verify _dmg-build/Emacs.app/
$

I then use “hdiutil makehybrid” to create a disk image out of the directory.

$ hdiutil makehybrid -hfs -hfs-volume-name "Emacs Test" -hfs-openfolder _dmg-build _dmg-build -o /tmp/tmp.dmg

I then mount the created image and run try to verify the signature again—but this time it fails!

$ open /tmp/tmp.dmg
$ codesign --verify /Volumes/Emacs\ Test/Emacs.app/
/Volumes/Emacs Test/Emacs.app/: code object is not signed at all
In subcomponent: /Volumes/Emacs Test/Emacs.app/Contents/MacOS/bin-i386-10_5/grep-changelog

Investigating further, I use “xattr” to list the extended attributes on the “grep-changelog” file. First, the good file:

$ xattr _dmg-build/Emacs.app/Contents/MacOS/bin-i386-10_5/grep-changelog
com.apple.cs.CodeDirectory
com.apple.cs.CodeRequirements
com.apple.cs.CodeSignature

And now the bad file:

$ xattr /Volumes/Test\ Emacs/Emacs.app/Contents/MacOS/bin-i386-10_5/grep-changelog
com.apple.FinderInfo

Yup, all the code signature stuff is completely gone! (The “FinderInfo” stuff is OK, it’s just there as a side effect of mounting the disk image).

I’m not exactly sure how to fix this. Apple recently changed code signing requirements so that 10.9.5 now requires deep signatures (way to change something fundamental in a point release, guys). Also the only thing that correctly makes the deep signatures is Xcode 6 which was released only about 1 week before 10.9.5 was released (way to give advanced warning, guys).

2014-10-03 Update:

I filed a bug with Apple and they suggested I use “hdiutil create -srcfolder” instead of “makehybrid“. This does copy the extended attributes correctly. I had originally not used “create” for two reasons: It didn’t have the “-hfs-openfolder” option and the man page claims that only “makehybrid” makes optimally small filesystems. Turns out that “create -srcfolder” automatically does the same thing as “makehybrid -hfs-openfolder” (though it is not documented in the man page) and in practice the resulting .dmgs are just as small or smaller. Problem solved!

Playstation 4 NW-31250-1 Error

All my Playstation 4 downloads were failing today with “DNS Error” and “NW-31250-1”.

I ran a tcpdump on my router and found this:

15:07:10.389761 00:ee:ff:aa:bb:cc (oui Unknown) > 11:22:33:44:55:66 (oui Unknown), ethertype IPv4 (0x0800), length 109: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 95)
    red-death-router.porkrind.org.domain > 10.0.0.113.49218: [bad udp cksum 0x15cb -> 0xfee5!] 21068 ServFail q: A? ic.8e0b5f0d.0c2f0f.gs2.sonycoment.loris.llnwd.net. 0/0/0 (67)

That looks promising, a DNS SERVFAIL response to a query for “ic.8e0b5f0d.0c2f0f.gs2.sonycoment.loris.llnwd.net” (whatever that is).

My router is set up to use Google’s DNS: 8.8.8.8. So a quick test from my computer showed:

$ dig @8.8.8.8 ic.8e0b5f0d.0c2f0f.gs2.sonycoment.loris.llnwd.net

; <<>> DiG 9.8.3-P1 <<>> @8.8.8.8 ic.8e0b5f0d.0c2f0f.gs2.sonycoment.loris.llnwd.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 8788
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;ic.8e0b5f0d.0c2f0f.gs2.sonycoment.loris.llnwd.net. IN A

;; Query time: 58 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Jun  4 15:09:39 2014
;; MSG SIZE  rcvd: 67

“status: SERVFAIL”! But when using Level 3’s venerable 4.2.2.2:

$ dig @4.2.2.2 ic.8e0b5f0d.0c2f0f.gs2.sonycoment.loris.llnwd.net

; <<>> DiG 9.8.3-P1 <<>> @4.2.2.2 ic.8e0b5f0d.0c2f0f.gs2.sonycoment.loris.llnwd.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14680
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;ic.8e0b5f0d.0c2f0f.gs2.sonycoment.loris.llnwd.net. IN A

;; ANSWER SECTION:
ic.8e0b5f0d.0c2f0f.gs2.sonycoment.loris.llnwd.net. 300 IN A 208.111.144.58

;; Query time: 26 msec
;; SERVER: 4.2.2.2#53(4.2.2.2)
;; WHEN: Wed Jun  4 15:09:26 2014
;; MSG SIZE  rcvd: 83

Aha, it works! So Google’s DNS is broken!

To fix this I set up my PS4’s networking manually and added 4.2.2.2 as a secondary DNS server. Then all my downloads started working again. I didn’t have to delete them and start over, either—just clicked retry and they continued from wherever they were.

Mac OS X 10.9 (Mavericks) and SSH pkcs8 keys

After upgrading to Mavericks (Mac OS X 10.9) I found that ssh-add wasn’t working. After investigating I discovered that the SSH shipped with Mavericks has a regression and doesn’t support pkcs8 keys. Mac OS X 10.8’s SSH supported these keys just fine.

Earlier in the year I had read an article about using pkcs8 formatted keys to encrypt your SSH private keys more strongly. I went ahead and did this because 10.8 (and my Linux machines) supported it just fine. 10.9, however ships with a different SSH. “ssh -V” outputs:

OpenSSH_6.2p2, OSSLShim 0.9.8r 8 Dec 2011

The previous version did not have “OSSLShim”, but rather used OpenSSL. My guess is that Apple replaced OpenSSL with some sort of API shim to another (Apple built?) library that doesn’t have support for pkcs8. Weak.

Anyway, the workaround is to use the openssl command line program to decrypt the key like this:

openssl pkcs8 -in ~/.ssh/id_rsa | ssh-add -

I put that in a file called “~/mavericks_sucks” so that I can just do:

. mavericks_sucks

in the terminal after I boot my computer and then everything works after that.

I’ve submitted a bug to Apple’s bug reporter, but it was marked as a duplicate of bug 14776937 but of course I can’t read bug 14776937 or get status on it because Apple’s whole bug reporting system is a piece of crap. Oh well. Hopefully their stupid shim will support all the features of normal OpenSSL (before 10.10).

Why is there no process viewer in Firefox?

As I sit here right now my laptop is uncomfortably hot and its fans are making a really loud high pitched whine. Something on my computer is spinning the CPU and it’s making me literally hot. I look at the processes and FireFox is taking between 40% and 80% of the CPU! I’m basically sitting idle too. Well, right now I’m typing this message in Firefox, but 5 minutes ago the computer was sitting unattended and it was still going nuts.

Ok, so Firefox is wigging. But why? I can’t find any way to do some introspection. I suspect it’s one or two errant pages sitting in a tight javascript or flash loop. But how can I find out which pages? I have 26 windows open and most windows have multiple tabs open each (one has 30 tabs). Each of them is open for a reason–some unfinished business I have at each page, so I can’t just close them randomly. Besides, that would admit defeat.

What I need is a plug-in that monitors all the javascript and plugins that are running and keeps track of how long each runs. Then I want a unix top-like view of all the open pages and how many resources each is consuming (how much memory, how much cpu time their javascript is taking, etc). Why doesn’t this already exist? Sadly I just don’t have time right now to write such a thing, considering how much I’d have to first learn about the internal workings of Mozilla threads (I assume they are all green threads, because something is running so much my typing has a significant delay every few seconds).

Please, please, please (let me get what I want) won’t somebody write something like this? How do you Mozilla developers work? Is there a Javascript profiler available already? I’ve searched the add-ons before but I can’t find anything like this.

All I know is my lap is getting hot and I can’t do anything about it because I need Firefox open right now and it’s beginning to really annoy me.

Selling out: How a Free Software advocate ended up releasing a shareware program

I write computer programs for a living and quite often I end up writing code for myself at home. Occasionally I get the software into a releasable state and stick it up on my web site. When I had a steady salary I would always just GPL everything I released. Mostly since I didn’t want to support it–I felt that if I tried to hoard the source code I’d be on the spot if someone needed it fixed for them, and I’d feel bad if I couldn’t do it. Giving away the source cleared my conscience since I figured, if you don’t like it, change it yourself.

When I quit my job and started consulting I lost the security that comes from a steady paycheck. I’m making more money now, but it always feels temporary and fleeting. So when I needed to write a piece of software to scratch my own itch I started thinking about selling it instead of releasing it as free software. I know, that seems antithetical to the Free Software movement and, well, it is. My dad laughed out loud when I told him I was releasing a shareware app since I’ve been know to rail about proprietary software before. It’s amazing how a little financial insecurity can change your outlook. Don’t worry, I still donate to the FSF and believe in Free Software.

Mostly, I was very curious about how much money I would make form a piece of shareware. My program was a pretty simple idea–I wanted to have multiple libraries in iTunes for different versions of my music. I have my music all ripped to a central server that transcodes on the fly so it can serve out different file formats at different encoding rates to my various clients. When I’m away from home I stream .mp3’s to my iTunes and .oggs to my xmms. At home I stream .wavs to my iTunes and .flacs to my xmms. If I’m somewhere with not a lot of bandwidth I can stream at a lower bitrate too.

So I wanted to set up my iTunes with a library full of .mp3 streams and separate library with the same songs as .wav streams. I looked around at the existing freeware and shareware solutions and found them all lacking. They all were separate applications that you had to run which I thought was too cumbersome. At this point Apple hadn’t added the option key and iTunes boot trick (which I still think is more cumbersome than my final solution).

What I really wanted was something that fit inside of iTunes so I could switch without launching a separate program. Something that would just be smooth. Since I started programming on the Mac back in the System 7 days I was familiar with the low level carbon calls and I figured I could abuse iTunes plug-in interface to do what I wanted. So I hacked on it for a weekend and got it working. MultiTunes was born.

When I was done I thought MultiTunes was smooth enough to warrant releasing it. So I downloaded a copy of the GPL and was prepping a release when I thought, “wait, these other guys sell theirs. Why can’t I sell mine? How much would I make anyway? Could I make a living selling dorky little shareware apps?” I looked around for people doing shareware but didn’t really find anyone that would divulge their numbers and so I couldn’t really get a handle on how much it would really pay. So in the spirit of experimentation, I decided to release it as shareware and see what happened.

Releasing a program is hard. It always takes 10 times longer than I expect to get everything right. I think I spent more days on the details of the release than I spent getting the program working in the first place. I had to make some artwork for the disk image, a script to create the disk image so I can just to “make release” and not have any manual steps–the more manual steps you have to make, the more errors will occur in the process causing more delays and more overall time spent resulting in less payoff per hour of work. More importantly, the harder it is to release, the more I dread it and refuse to work on the program. So it’s important for me to have the release process be really smooth. I want to edit a version number in exactly one place, edit the changelog, tag the repo and “make release”.

Releasing my first shareware program was doubly hard. How do I get someone to pay me? I looked up merchant accounts and credit card services and realized they were way too much work. Maybe if I was doing this full time it would make sense, but no, I don’t want that. I looked at some shareware services but ultimately decided to just use Paypal and some custom server software to give out registration codes on receipt of payment. This actually didn’t take too long to write–maybe a day or two–and I made sure I wrote it generically enough that I could easily add more programs to sell when I inevitably came up with more brilliant ideas. It’s been 2 years since then and I’ve come up with approximately zero. But the point is, the infrastructure for selling wasn’t too hard, provided I was willing to give up a percentage of my profits to Paypal (which I was).

The next step was deciding on a price. Too high and no one is going to buy it. Too low and I’m essentially throwing away money that people would be willing to give me. I looked around at MultiTunes‘ competitors, as well as other shareware iTunes plug-ins. I thought my solution was smoother than my competitors’ so I thought the price could stand to be higher. I didn’t think I could charge as much as Octiv’s VolumeLogic, which before it was defunct cost $19.99. I asked myself what I was willing to pay for MultiTunes had I not written it and I decided I was willing to pay $10. But I know I’m a cheapskate when it comes to shareware and so I added 50% to that. I also decided to use the cheesy and cliche $0.99 perception trick and so the price I chose was $15.99.

I thought it might be too high, but then again, it’s easier to lower the price if it’s too high but it’s annoying to try to raise the price if you feel it’s too low. In the end I think the price I chose was right, but it’s more of a feeling than a fact backed by hard evidence. The only real way to tell would be to offer it to different people at a different prices and see what happened, but that seemed unfair and hard to set up.

So, I have a nifty program, a infrastructure for selling it, and a price. How much money am I going to make? Well, I’ll write a separate post about that soon, but in the meantime I’ll answer it this way: Not enough to live off of. Certainly not as much as the Delicious Library guys, but then again my program isn’t nearly as cool, sophisticated or good looking. Then again, I wrote it by myself in a weekend or two! And even factoring the time I’ve spent improving it and the email support I provide (and a random telephone support call that I got while visiting my parents for Christmas from a guy who looked up my phone number (!!!)) I think the profit per hour ratio is not too bad. Sadly I didn’t really keep track of the time I worked on it so it’s hard to calculate that number exactly.

Anyway, that’s all for now. I’ll post something soon with specific numbers–profits, piracy rates, etc. The kind of stuff I wished I’d had an idea about when I first started.

Polling is always wrong

I read this article on reddit and while I thought the author was correct that RSS is really a bad design, I think he missed the real underlying reason why. That reason is that polling is always wrong. Always. Really!

RSS ends up fundamentally being a polling technology. Everyone hits your site saying, “hey, you got something new?”

“How about now?”

“Now?”

“Still nothing, huh”

“Ok, Now?”

Over and over it asks, never taking a break. What is the appropriate speed at which it asks? 1 hour? 1 second? Who knows.

It’s not just internet protocols that poll. Polling as a technique is pervasive, hitting every level from the highest to the lowest. Does your OS go out and hit your CD-ROM drive every so often looking for a disc to finally be inserted? Why isn’t there a SCSI command that makes the drive send a command to the computer when a disc is inserted? If there is, why isn’t your OS using it?

What about something simple like the volume slider in your system tray? My friend Jim was using Intel’s PowerTool and discovered the Gnome volume control kept taking the processor out of C3 sleep. Turns out the volume control was waking up constantly and polling ALSA to see if another program had changed the volume behind its back. ALSA even has an API for being notified of the volume change!

Even at the low level of embedded programming there are the same choices. Does your mouse’s microcontroller poll a GPIO to see if you’ve pressed the mouse button, or does the button trigger an interrupt?

Why do people poll?

There’s only one reason. It’s easy. It’s trivial to wrap your head around—You just loop forever, checking for whatever you’re polling for and then sleeping for a little bit. But to do it the right way is more complicated—you need to have to have some infrastructure. You need a way to register for the event you want, which implies you also need some way of getting called back when your event happens. In some cases (like the mouse button), it’s almost as easy to do it the right way as it is to poll. But others, like the RSS case, it’s much harder.

To do RSS properly you’d need some way of getting events pushed to you. One way would be with email, as that infrastructure already exists. But given the amount of disparate email servers and RSS readers, it might be hard to hook your RSS reader to your email. Or you could simulate a callback by having the server hold the connection open until something happens. But given that the connection will eventually time out, that’s really just a degenerate way of polling.

Ah, forget it. Let’s just poll!

Last Modified on: Dec 31, 2014 18:59pm