Bizarre Error while compiling a Rust program

I was trying to compile a third party Rust crate today on a Debian box and I was getting a weird error I’d never seen:

  = note: cc: fatal error: cannot read spec file './specs': Is a directory
          compilation terminated.

I’ve done a bunch of embedded C stuff in the past so I knew what a specs file was—but why couldn’t gcc open its specs file? The first clue was that this particular crate had a directory at its top level called “specs”. I moved it away and the error went away. Ok. But why is gcc looking for a specs file in my cwd??

Rust printed out the linker command it was using to invoke gcc with so I could look through that but there was nothing referencing a spec file there.

I figured it might be something set up incorrectly on my local machine but there’s practically no documentation about where gcc looks for specs files. I ended up downloading the gcc source (thank heavens for apt source!) and grepping it for “cannot read spec file”, which lead me to the load_specs() function in gcc/gcc.c. Ok. So I poked around that file, following the thread of function calls around and just sort of looking for anything. I eventually noticed startfile_prefixes which appeared to be where they searched for specs. Searching around the file led me to a section where it looped over LIBRARY_PATH and added each component to startfile_prefixes. This stuck out to be because I had LIBRARY_PATH set in my env.

I looked at it more closely and noticed it was


Hmm. Why does it have a stray : on the end? Maybe that’s it. I tested it with:

LIBRARY_PATH=/usr/local/lib cargo build

And everything built just fine! Yay!

So it was my fault after all… Sigh. I checked my .bashrc and found a bug in one of my path construction functions which was putting the extra : on the end. Then I spent some time relearning a bunch of excruciating shell details and getting everything fixed. 🙂


I can’t remember who wrote this first, but my friend and I have been using some version of this shell snippet for years:

ssh_find() {
    for agent in "$SSH_AUTH_SOCK" /tmp/ssh-*/agent.* /tmp/*/Listeners* /tmp/keyring-*/ssh; do
        SSH_AUTH_SOCK="$agent" ssh-add -l 2>/dev/null
        case "$?" in # 0 some keys, 1 no keys, 2 no agent
            0) export SSH_AUTH_SOCK="$agent"; echo "Found Keys." ; break ;;
            1) export SSH_AUTH_SOCK="$agent"; echo "Found Agent.";  ;; # Keep looking

This function looks through /tmp for ssh-agent sockets (using various macOS, Linux, openssh naming schemes) and then finds the first one that (a) it has permissions for and (b) have keys unlocked in them (giving precedence to any currently active ones). If it can find some valid agents but none of them have keys then it will use the last one it found.

Later in .bashrc I have a section for interactive shells and I call it from there:

# Interactive only stuff:
if [[ "$-" =~ i ]]; then

    # ssh related stuff

    if [ x$SSH_AUTH_SOCK == x ]; then
        exec ssh-agent bash


That looks for an agent and starts one up if there were none running1 2.

How is this actually useful?

Basically it makes it so ever tab I open in my terminal (or in a tmux) latches on to an existing ssh-agent so that ssh just works from anywhere without ever thinking about it.

It’s also useful for cron jobs. I have some cron jobs that look like this:

@hourly . $HOME/.bashrc && ssh_find > /dev/null && rsync -a something:/something/blah /blah/blah

I also have similar ones that do some git operation that needs an ssh key.

It is generally convenient for things that need ssh keys to run, but where you don’t necessarily want the keys sitting around on the disk in unencrypted form. This lets them sit unencrypted in RAM, which is a little better (but at the expense of needing to ssh-add them manually every reboot).

  1. This is kinda gross, since exec throws out the entire bash progress and starts things over but with a new running ssh-agent (more or less doubling the startup time). This only happens on interactive logins so it’s not as bad as you might expect. I do it this way instead of eval $(ssh-agent) so that if the agent was started by ssh-ing into the machine then it gets killed when the ssh process dies (ie, on logout).
  2. This is also racy—picture the case where a gui terminal program starts up and remembers that you had 7 tabs open, so it opens those 7 tabs and starts shells in them in parallel. On my Debian box I have a super complicated and impossible to read fix for this that involves the flock(2) command. I hate it and am embarrassed by the whole thing so fixing the race is left as an exercise to the reader. 🙂

Automator Is Pretty Sweet

I played around with Apple’s Automator app when it was first released, but I could never really find a good use for it (I wrote some actions but I would never reliably use them for whatever reason). But I did eventual write one that I use constantly.

It’s a “Quick Action”, which means it shows up in Finder when you right click:

It extremely simple on the Automator side:

Note: My real script replaces <my-server> with the actual server name.

The key things are to set “Workflow receives current” to “files or folders”, “in” to “” and in the “Run Shell Script” section, “Pass input” to “as arguments”. I set the Image to “Send” because it seemed reasonable.

The script it self is trivial:

. ~/.bashrc
for f in "$@"
    echo "$f"
    rsync -a "$f" <my-server>:/tmp

The trickiest part here is getting SSH working without asking for your password (since this script effectively runs headless). I set up my ~/.bashrc to search through ssh-agent socket files to find one that it has permission to read and that has keys. It then sets up environment variables so that ssh uses that agent, giving me password-less ssh nearly 100% of the time.

I actually have several variants of this Automator action that rsync to different places (some of which are watched by inotify and kick other things off when files appear there). It’s really handy for them to just show up in the Finder right-click menu like that, and I find myself constantly using them to move stuff I’ve downloaded through my local web browser over to my server. I find it way more reliable than trying to keep the remote server mounted through smb or nfs.

Fixing Element Search

My MacOS Element Matrix client (Electron based) started getting errors when I searched. My Windows Matrix client worked just fine, so I was pretty sure it wasn’t a server issue. When I typed anything into the message search I’d get this error right underneath:

“Message search initialisation failed, check your settings for more information”

And when I clicked the “settings” link I’d see this:

Error opening the database: DatabaseUnlockError(“Invalid passphrase”)

Clicking on the “Reset” button brought a scary warning dialog (“you probably don’t need to do this”—yeah, I kinda think I do) but clicking through it didn’t actually seem to do anything.

Searching for those terms got me to a github issue. Nothing seemed resolved but I found a message in the middle that helped me fix the problem.

Here’s what I did:

  1. Close the app
  2. cd ~/Library/Application Support/Riot. I had to poke around till I found that. There was also a ~/Library/Application Support/Element but it seemed older and out of date, oddly.
  3. tar cjf EventStore-bad-2023-06-18.tar.bzr EventStore. This made a backup copy of the EventStore directory just in case I needed it (I didn’t).
  4. rm -rf EventStore. Kill it with fire.
  5. Restart the app

Viola! Working search again!

Shell printf tricks

Do you know the printf trick in shell?

Shell printf repeats its format pattern if you give it more arguments than the pattern has.
You can use this for all kinds of tricks:

$ printf "/some/path/%s " file1 file2 file3 file4
/some/path/file1 /some/path/file2 /some/path/file3 /some/path/file4 
$ printf "%s=%s\n" a 1 b 2 c 3 d 4

It’s is really nice when you combine it with some other output:

$ printf -- "-p %s " $(ps aux | grep fcgiwr[a]p | f 2)
-p 10613 -p 10615 -p 10616 -p 10617 -p 10619 -p 10620 -p 10621 -p 10622 -p 10623

Say I wanted to pass all the pids of a program that forks a lot to strace. I could do:

strace $(printf -- "-p %s " $(ps aux | grep fcgiwr[a]p | f 2))

An aside, you may have noticed that non-standard f command in the ps pipeline. I got that from here a long time ago (2008 according to the file’s timestamp) and it’s really fantastic—perfect for all sorts of unixy things that output tables (ps, ls -l, etc).

I’m so dumb

I was banging my head against a wall trying to get help for go build with go build help (and for some reason go doesn’t support go build --help, but I kept getting this error:

package help is not in GOROOT (/opt/homebrew/Cellar/go/1.18.3/libexec/src/help)

Does brew not come with docs by default? How else am I supposed to interpret that?

No, I’m just dumb. It’s not go build help, it’s

go help build

Stuck on the final Maquette level

Last night I was playing Maquette and I got stuck on the final level (called “The Exchange”) for a really long time. I was at a place with some towers with crystals above them and I had opened the doors to the first tower; Inside was a switch. Flipping the switch made a bunch of arches appear, leading to another building.

But aside from that nothing happened. I wandered around for about an hour before getting ashamed that I couldn’t solve the puzzle and looking up the answer on the internet. But it turns out, I had encountered a bug! Flipping the switch is supposed to open a door! But instead I just got the switch sound and nothing else happened. I tried loading my last autosave but the same thing happened.

Finally I tried selecting the “Restart” option from the pause menu. I was nervous (restart what exactly? The Level, the Area, the Game?) so I hard saved my game first. Turns out it just reset back to the beginning of “The Exchange”. Within 1 minute I was back at the switch in question and this time I could hear (and see) it open a door when I hit it!

So, if you’re stuck in that section of the game and it seems like you’re just not getting it, it’s not you, the game is just bugged.

Failed Emacs builds, hanging kernels, abort(), oh my

My nightly Emacs builds stopped about a month and a half ago. A couple days after I noticed it was failing I tried to debug the issue and found that building openssl was hanging—I found that Jenkins was timing out after an hour or so. I should mention that it’s dying on a Mac OS X 10.10 (Yosemite) VM, which is currently the oldest macOS version I build Emacs for. I tried building manually in a terminal—the next day it was still sitting there, not finished. I decided it was going to be annoying and so I avoided looking deeper into it for another month and a half (sorry!). Today I tracked it down and “fixed” it—here is my tale…

I (ab)use homebrew to build openssl. Brew configures openssl then runs make and then make test. make test was hanging. Looking at the process list, I could see 01-test_abort.t was the hanging test. It was also literally the first test. Weird. I checked out the code:

#include <openssl/crypto.h>

int main(int argc, char **argv)
    OPENSSL_die("Voluntary abort", __FILE__, __LINE__);
    return 0;

Well, that seems straightforward enough. Why would it hang? I tryed to kill off the test process to see if it would continue. There was a lib wrapper, a test harness and the actual binary from the source shown above—they all died nicely except for the actual aborttest executable. I couldn’t even kill -9 that one—that usually means there’s some sort of kernel issue going on—everything should be kill -9able.

Next I ran it by hand (./util/ test/aborttest) and confirmed that the test just hung and couldn’t be killed. I built it on a different machine and it worked just fine there. So I dug into the openssl code. What does OPENSSL_die() do, anyway?

Not much:

// Win32 #ifdefs removed for readability:
void OPENSSL_die(const char *message, const char *file, int line)
    OPENSSL_showfatal("%s:%d: OpenSSL internal error: %s\n",
                      file, line, message);

Ok, that’s nothing. What about OPENSSL_showfatal()? Also not much:

    va_list ap;

    va_start(ap, fmta);
    vfprintf(stderr, fmta, ap);

That’s just a print, nothing exciting. Hmmm. So I wrote a test program:

#include <stdlib.h>

int main() {

I compiled it up and… it hung, too! What?? Ok. I tried it as root (hung). Tried it with dtruss:

...lots of dtruss nonsense snipped...
37772/0xcaf1:  sigprocmask(0x3, 0x7FFF5DD71C74, 0x0)         = 0x0 0
37772/0xcaf1:  __pthread_sigmask(0x3, 0x7FFF5DD71C80, 0x0)       = 0 0
37772/0xcaf1:  __pthread_kill(0x603, 0x6, 0x0)       = 0 0

So it got to the kernel with pthread_kill() and hung after that. So I tried another sanity check: In one terminal I ran sleep 100. In another I found the process id and did kill -ABRT $pid. The kill returned, but the sleep was now hung and not able to be killed by kill -9, like everything else. Now I was very confused. This can’t be a real bug, everyone would be seeing this! Maybe it’s a VM emulation issue caused by my version of VMWare? I can’t upgrade my VMWare because the next version after mine requires Mac OS 10.14 but this Mac Mini of mine only supports 10.13. Sigh. Also, the Emacs builds were working just fine and then they suddenly stopped and I hadn’t updated the OS or the host OS or VMWare. Nothing was adding up!

As I sanity check, I decided to reinstall the OS on the VM (right over top of the existing one, nothing clean or anything). There was a two hour long sidetrack here with deleting VM snapshots, resizing the VM disk (which required booting into recovery mode), downloading the OS installer and finally letting the install run. But that’s not important. The important part is that I opened up terminal immediately after the OS installed and ran my abort() test:

$ ./test
Abort trap: 6

It worked! How about OpenSSL?

$ ./util/ test/aborttest
test/aborttest.c:14: OpenSSL internal error: Voluntary abort
Abort trap: 6

Yay! But why?? I don’t actually know. Was it a corrupt kernel? A bad driver that got installed? (What drivers would get installed on this Jenkins builder?) I don’t feel very satisfied here. I’m quite skeptical, in fact! But it’s working. Emacs builds should start coming out again. And I can ignore everything again until the next fire starts! 🙂

Fixing the blower motor in my central air system

On Wednesday as I was going to bed I noticed it was quite hot in my house. I checked my central air blower unit and there was frost on the coils and the blower wasn’t moving. It kept trying to start but not being able to start. The 7 segment led was showing “b5” which I was able to find in the manual:

So the next day I removed the blower assembly and tried to extract the motor. The motor shaft extended quite far past the end of the fan and it was rusty so it didn’t want to come off. I attempted to get it off using gravity and a hammer:

I failed. I was only able to get it this far:

So I took it to my dad’s house because he has a better tools than I do. We ended up drilling the end of the shaft out to get the motor removed. We tested on the bench and read a lot of troubleshooting manuals and determined that the motor was shorted. It was hard to turn and there were very obvious magnetic “detents” we could feel when turning it by hard. We took the motor apart and looked around:

We measured the resistance across all the pairs of pins on the 3 pin motor connector: 2 pairs measured 3 ohms and one pair was at 0.9 ohms. We kept the meter plugged into the shorted pair and moved a bunch of wires around. At some point we noticed it had changed up to 3 ohms but we weren’t sure which part we had messed with to make it that way. All attempts to short it out again to identify the bad wire area failed. From this point on it was never shorted.

We put it all back together, fixed the drilled out shaft by cutting it off with a hacksaw and then sanding off all the sharp edges and rust. I took it home installed it and it spun up. It ran for about two minutes and then died. Same symptoms. I kind of expected it since we really didn’t fix anything, just shuffled things around, but it was still disappointing.

The next day I ordered a used motor off ebay and suffered through a hot July Friday.

Saturday I decided to have one last ditch effort to fix it since the weekend was supposed to be upwards of 90ºF. I got it apart and found this:

That looks obviously bad and you’d wonder how we could miss it, but that’s only because it’s blown up so big. That wire is one of the skinny winding wires around the motor. In fact I couldn’t actually see the issue at all until I used my phone camera as a magnifying glass. I pulled out some slack on the wire around that break to see if there were any more issues. it looked like this:

To me it looked clearly exposed. I had the meter plugged in this whole time and I could tell as I freed this bad area the short cleared. To fix, I wrapped it in electrical tape and then put it back down where it was:

Even pressing it down as much as I could I couldn’t get it to show a short on the meter, so I believed I had it fixed. I put it all back together:

I mounted it up and tested it. It worked! And more importantly, didn’t die after just a couple minutes. Eveything’s been running all day now and my house is finally back down into the sub 80ºF range. I didn’t have central air until a couple years ago, and it’s amazing how fast I’ve transitioned to thinking that 85ºF is unlivably hot.

Now I just have to decide what to do with the replacement I bought off ebay. I just know that if I cancel the order then my motor will die just a couple days later. But if I don’t cancel, then my fix will work for the next 50 years…

Screw Shareholder Value

I was digging around and I found the original SSV announcement posted to Dominion (the (in)famous Sisters Of Mercy Mailing list).

What is SSV? Just read the letter—it explains everything. More info can be found here, here, or here, or here.

Here is the email, reproduced with headers for posterity:

Received: from by (NTMail Server 2.11.25 - id David Mon, 27 Oct 97 10:50:32 +0000 (PST)
Received: by
                  id m0xPtmn-00009TC
                  (/\oo/\ Smail3.1.29.1 #29.7); Mon, 27 Oct 97 18:21 GMT
Sender: <>
Received: from ([]) by
                   with smtp id m0xPtmf-00009PC
                   (/\oo/\ Smail3.1.29.1 #29.7); Mon, 27 Oct 97 18:21 GMT
Received: (from root@localhost)
                    by (8.7.6/8.7.3/AOL-2.0.0)
                          id NAA28825 for;
                            Mon, 27 Oct 1997 13:20:53 -0500 (EST)
Date: Mon, 27 Oct 1997 13:20:53 -0500 (EST)
Message-ID: <>
Subject: SSV - Go Figure
Comments: Sisters Of Mercy mailing list
X-Info: indigita mail server
Mime-Version: 1.0

SSV  "Go Figure"

After years of stalemate, Mr Andrew Eldritch has managed to get East West to
accept (in lieu of two remaining Sisters albums) a record bearing no
resemblance whatsoever to The Sisters Of Mercy. This album will be released -
if at all - under a completely different name, which is just as well, as it's
got practically nothing to do with the Sisters. Furthermore, East West agreed
not to hear the album in advance....  so it bears no resemblance to *any*
quality product, let alone the Sisters. return for which, Mr Eldritch agrees to let East West keep 75000
pounds which they owe him, and which they were refusing to pay in any event.
Unsurprisingly in the circumstances, Mr Eldritch made sure that East West got
the record they deserve, but made sure that they then paid a lot more for it
than he deserves.... ....especially for one afternoon recording the
occasional mumble on the reject material of some amateur acquaintances from
Hamburg. Go figure.

It is rumoured, indeed, that the whole album only took two days from start to
finish (somewhat less than the usual nine months), and that the "rather bad
sub-techno" music was under-average and boring even before the drums were
mysteriously removed. The "lyrics" dwell almost exclusively on the
glorification of shooting people and selling drugs to schoolchildren. It is
rumoured that the full name of the band (SSV-NSMABAAOTWMODAACOTIATW) stands
for "Screw shareholder value - not so much a band as another opportunity to
waste money on drugs and ammunition courtesy of the idiots at Time Warner".
Go figure.

How desperate must the corporation be? Desperate enough to try and force an
artist to record with threats of massive litigation after a seven-year
impasse, expecting him to make a good record while witholding a fortune in
back-royalties, and then desperate enough to pay "a very large amount" for a
record which the artist neither wrote, nor composed, nor arranged, nor
produced, whose content is merely a puerile attempt to be as offensive as
possible? Go figure.

Finally, rumour has it that the record company are planning to release it,
which would, you might think, be a conscious insult to the general public if
East West were smart enough to know a pile of crap when they hear it. Either
way, go figure.

Anyway, Mr Eldritch is now free to resume his recording career with The
Sisters Of Mercy, who have been waiting very patiently for him at a chemist's
round the corner, and very sensibly having nothing to do with the SSV album
 - because they didn't have to.

The Sisters will be celebrating the liberation with a small tour of Europe
and America in January/February, and the release of a stonking new (Sisters)
single on the day after Mr Eldritch's contract officially expires, which will
be a couple of months later. Work has started on the next (Sisters) album.
Normal service has been resumed.

Oh, and in case you haven't got the picture, we do NOT recommend the purchase
of an SSV album, should East West actually try to sell one to you. We
recommend that you wait for the magic of the Net to mysteriously provide you
with a free one  - just this once.


DKIM and Exim4 on Debian

I wanted to get DKIM working on an Debian box I have that runs Exim. The first thing to do is to create the keys:

$ openssl genrsa -out diamonds.key 4096
$ openssl rsa -in diamonds.key -pubout >

I was following these instructions and noticed that Exim supports ed25519 DKIM signatures. Neat! I decided I may as well create those keys, too:

$ openssl genpkey -algorithm ed25519 -out hearts.key
$ openssl pkey -outform DER -pubout -in hearts.key | tail -c +13 | base64 >

From there I stuck the public keys in DNS: 3600 IN TXT  "v=DKIM1; k=rsa; t=s; p=MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAxDMS3KRFCU4PEtygOUdALBt7jmz5IIX2+KHoV4fd0CLjXRvOqA5H8rU3e+y1lNese9yjPLksPqiOh5vtx8Tysjv2MSTXB1Kgr0tl+1IlJL4ihdpUgR1veKB5X4wK3Ppkr5Oy42H7xNHf/yj6aC1E+alZ8TdssuHY3ReqO6YvGa72UqTMmL1gBl9SXBUl" "vD+FqvfFtkQFFMU9QSTtrIuzcup6NC6z3a4I4UGz4YOZSxeUARKzySGFzPd7vwmrKEZVhlA0HzmJm9eGrjq6IiLVdgTJhSZ8Ecn9h65x9EjhNYYhsufTbcPDljlZYpA4b+dkTEs35a4KjOM2wY7gUdY9ydOqOCfz2BpzJ25Mn3K8nTV8a7fInWCnKg0sm6Fuiwe0DrQjrTe7xGC3Y03CU8eziynOukyWnfsCAnpWcUGa15bp1/O0Le+ZYsKOWxA" "CL5cKlYPw1VJrqz7ZQ1i+s+twOLgEKWm8gwKMsDysgpM1WvE+IhlJkkZLkWavF9pAKeSD6akkHcbkB3QsDKgNugDC4EEm6XV/+hPcTS9Gmd2PYPswxg8nlEdUDjxLul6UbKzWwkYihzKxhMSqCEXTUkt6eHjT+KAIHXVm86elFEmOcuadUWwr+74fgnTpv1XbWIs5qqqh/zROhvUUR8EXZbjOchFEX3YjLO8NDPqHdW4zHt0CAwEAAQ==" 3600 IN TXT    "v=DKIM1; t=s; k=ed25519; p=MTGVeSXmIzviF/B+ANc/bLqP2zEWhO/rw6o8HxIl5+8="

Ed25519 is quite compact! t=s:y is found in the DKIM RFC (section 3.6.1). t is for various flags. s is for strict (I’m just guessing the mnemonic)—it means all the domain names have to match. Apparently you don’t want this if you use subdomains in your email addresses (I don’t). y means “This domain is testing DKIM”—ie, don’t worry if it fails. It seemed reasonable to set that while I was playing around.

Next, I had to set up Exim in Debian. This was kind of a pain because there’s the Exim config, then the Debian wrapper around that config. This is made more complicated by the fact that Debian has a debconf option named dc_use_split_config. You can see which way yours is set in /etc/exim4/update-exim4.conf.conf (the double .conf is not a typo!). If it’s false then when you update /etc/exim4/conf.d you first have run /usr/sbin/update-exim4.conf.template which cats everything in the conf.d dir into /etc/exim4/exim4.conf.template. Then you have to run /usr/sbin/update-exim4.conf which combines /etc/exim4/exim4.conf.localmacros and /etc/exim4/exim4.conf.template and puts the resulting final config file in /var/lib/exim4/config.autogenerated. Fwew.

The basic DKIM config is in /etc/exim4/exim4.conf.localmacros. I added these lines:

DKIM_CANON = relaxed
DKIM_SELECTOR = diamonds : hearts
DKIM_PRIVATE_KEY = /etc/exim4/dkim/$dkim_selector.key

For my setup this wasn’t enough. The DKIM_* macros are only used by the “remote_smtp” transport (found in /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp). I was using a “satellite” configuration with a smarthost. This means it uses the “remote_smtp_smarthost” transport (found in /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp_smarthost). You can tell what transport is being used by looking for T= in /var/log/exim4/mainlog.

I copied all the DKIM related stuff from /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp into /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp_smarthost, namely these lines:

# 2019-08-26: David added these:
dkim_domain = DKIM_DOMAIN
dkim_selector = DKIM_SELECTOR
dkim_private_key = DKIM_PRIVATE_KEY
dkim_canon = DKIM_CANON
dkim_strict = DKIM_STRICT
dkim_sign_headers = DKIM_SIGN_HEADERS

Then I ran update-exim4.conf.template and update-exim4.conf and finally systemctl restart exim4.

At this point I could send emails through and the DKIM headers were added.

Next I removed the y flag from the t flags in the DNS since everything appeared correct. I also added the ADSP DNS record: 3600 IN  TXT     "dkim=all"

Then I wrote this post and called it a night!

AT&T causes mDNS on Linux To Fail

Today my Jenkins builds were not working because all of the build slaves were offline. Digging around in the logs showed that the couldn’t connect because of name resolution failures. I use mDNS on my network (the slaves are Mac OS X VMs running on a Mac Mini), and so they were named something like xxxx-1.local and xxxx-2.local I tried just pinging the machines and that refused to resolve the name, too.

I verified that Avahi was running, and then used avahi-resolve --name xxxx-1.local to check the mDNS name resolution. It worked just great.

So why would mDNS be working fine network-wise, but no programs were resolving correctly? It struck me that I didn’t know (or couldn’t remember!) how mDNS tied in to the system. Who hooks in to the name resolution and knows to check for *.local using mdns?

It turns out it’s good old /etc/nsswitch.conf (I should have remembered that)! There’s a line in there:

hosts:          files mdns4_minimal [NOTFOUND=return] dns

That tells the libc resolver (that everything uses) that when it’s looking for a hostname, it should first look in the /etc/hosts file, then it should check mDNS, then if mDNS didn’t handle it, check regular DNS. Wait, so mDNS is built right in to libc??

Nope! On my Debian system there’s a package called libnss-mdns that has a few files in it:


Those are plugins to the libc name resolver so that random stuff like mDNS doesn’t have to be compiled into libc all the time. In fact, there’s a whole bunch of other libnss plugins in Debian that I don’t even have installed.

So my guess was that this nss-mdns plugin was causing the problem. There are no man pages in the package, but there are a couple README files. I poked around trying random things and reading and re-reading the READMEs many times before this snippet finally caught my eye:

If, during a request, the system-configured unicast DNS (specified in /etc/resolv.conf) reports an SOA record for the top-level local name, the request is rejected. Example: host -t SOA local returns something other than Host local not found: 3(NXDOMAIN)This is the unicast SOA heuristic.

Ok. I doubted that was happening but I decided to try their test anyway:

$ host -t SOA local
local. has SOA record 1 604800 3600 2419200 900


Those bastards at AT&T set their DNS server up to hijack unknown domains! They happily give out an SOA for the non-existant .local TLD. So AT&T’s crappy DNS is killing my Jenkins jobs??? Grrrrr…

The worst part is that I tried to use Cloudflare’s DNS. My router was configured for it. But two things happened: 1. I got a new modem after having connection issues recently, 2. I enabled IPv6 on my router for fun. The new modem seems to have killed I can no longer connect to it at all. Enabling IPv6 gave me an AT&T DNS server through DHCP (or whatever the IPv6 equivalent is).

So, straightening out my DNS (I had to revert back to Google’s caused NXDOMAIN responses to the .local SOA, and that caused mDNS resolution to immediately work, and my Jenkins slaves came back online. Fwew.

Firefox imgur failure

My imgur had been acting up for a while now. In my Firefox it wouldn’t render the actual image but rendered most of the rest of the page just fine:

If I went to (adding the .png or .jpg to the url) then the image loaded just fine. So it wasn’t getting blocked by anything in my network stack (ad-blocker and the like). I opened up the console to take a look and I saw this error:

10:57:53.646 SecurityError: The operation is insecure.                             analytics.js:34:17
             consoleDebug    jafo.js:838:8
             directFire      jafo.js:272:12
             _sessionStart   jafo.js:563:16
             init/<          jafo.js:134:12
             b               lodash.min.js:10:215
             d               raven-3.7.0.js:379:23

The line in question was doing localstorage.Get("something-or-other"). So Firefox was mysteriously throwing a SecurityError when accessing local storage. After testing a million different things (disabling ad-blockers, privacy extensions, etc.) and reading countless Stack Overflow and forum posts, I discovered that this was because I had "" listed in my cookie exceptions as "Block".

This was actually surprisingly hard to find because the Firefox preference UI for cookie exceptions is pretty bad—it doesn't sort nicely and it doesn't have a search. It's very confusing because the "Manage Cookies and Site Data" screen looks almost identical but sorts correctly (all domains and subdomains grouped together) and has a handy search bar. If you are used to that screen and then try to use the cookie exceptions screen I can almost guarantee you'll type the cookie into the text box at the top and then scratch your head for a few seconds trying to figure out why the list isn't reacting to your search, only to realize that the text box is for adding exceptions and not for searching! Sigh. At least you can click the "Website" column header to get a plain radix sort instead of the default sort of "Status" (which apparently doesn't do a secondary sort on the Website so everything is just random!). When sorting by "Website" take care to thoroughly search—"http" and "https" versions of the site are separate (this is what cost me the most time).

Once I removed the block, imgur started working again:

Yay. Kinda sucks that imgur's code isn't defensive enough to deal with localstorage exceptions, which I thought were widely known to happen (I know on sites I run I see them happening for crazy errors like "Disk Full" and "IO Error").

Debian OpenSSL 1.1.0f-4 and macOS 10.11 (El Capitan)

Some people were reporting that an IMAP server wasn’t working on their Mac. It was working from linux machines, and from Thunderbird on all OSes. From Macs I was getting this testing from the command line:

$ openssl s_client -connect <my-imap-server>:993
39458:error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version:/BuildRoot/Library/Caches/

This led me to a recent libssl package upgrade on my server (to version 1.1.0f-4). Checking the changelogs I found this:

  * Disable TLS 1.0 and 1.1, leaving 1.2 as the only supported SSL/TLS
    version. This will likely break things, but the hope is that by
    the release of Buster everything will speak at least TLS 1.2. This will be
    reconsidered before the Buster release.

Ah-ha! To quickly get back up and running I grabbed the old version from and installed it (and then held the package so it wouldn’t auto-upgrade).

I do hope Debian reconsiders this change, at least in the short term, since I can’t easily force OS upgrades to everyone that uses this server. Ideally Apple would update their old OSes to support TLS 1.2, but I’m not holding my breath.

Fedora libvirt/qemu error on upgrade

Today we upgraded a server that ran a bunch of VMs to Fedora 25 and all the VMs failed to come back online after rebooting.

Checking the logs I found this:

2017-04-14T08:39:35.304547Z qemu-system-x86_64: Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 0x10000 in != 0x20000: Invalid argument
2017-04-14T08:39:35.304579Z qemu-system-x86_64: error while loading state for instance 0x0 of device 'ram'
2017-04-14T08:39:35.304759Z qemu-system-x86_64: load of migration failed: Invalid argument
2017-04-14 08:39:35.305+0000: shutting down

After searching fruitlessly for all of those error messages in Google, I finally discovered the problem. The upgrade had shut libvirt down, which saved the state of all the machines into /var/lib/libvirt/qemu/save/. Then it started it back up and the save files were no longer compatible with the new qemu/libvirt binaries.

The solution was to just delete the save files in /var/lib/libvirt/qemu/save/ (which is a little sad, because it’s like yanking the power cord out of a running box). The VMs all started fine after that.

Horizon Zero Dawn Xi Cauldron Is Confusing

After you override the core in the Xi Cauldron (which unlike the others I’ve played through so far, happens pretty early), Aloy says “now I can override more machines”. But annoyingly, you don’t actually get the override until you finish the quest, which happens when you exit the cauldron. Along the way back you’ll encounter a few of the machines that require the Xi override and I found it frustrating when it wasn’t working. I was even thinking maybe it was a bug and I didn’t trigger something correctly, especially because Aloy mutters to herself, “I can’t override this machine yet—I’ll have to explore more cauldrons”. The game acts like you have the override, but you don’t get it until later. I googled but didn’t find anything, so I wrote this. If this happens to you, don’t be confused.

Rust nightly + Homebrew

I used to use this Homebrew “Tap” to install Rust nightly versions, but it stopped working at some point. I messed around with it a lot and determined it had something to do with newer Rusts not liking the install_name_tool calls that homebrew forces on libraries during installation.

So I looked into rustup (the nice shell version, not the crazy rust binary version which sadly seems to be on track to replace it) and came up with the following shell function.

rust_new_nightly_to_brew() {
    curl -sSf -o /tmp/ &&
    bash /tmp/ --disable-sudo --channel=nightly \
       --prefix=$(brew --prefix)/Cellar/rustup-nightly/$(date '+%Y-%m-%d') &&
    brew switch rustup-nightly $(date '+%Y-%m-%d')

Stick that in .bashrc and then run rust_new_nightly_to_brew to install a new nightly into Homebrew’s world. It’s nice to live in this world because if you don’t like a nightly for some reason it’s trivial to revert to your last good install with brew switch.

Perl Module XS configuration is hard

I wrote a simple little Perl Module recently, and it reminded me how frustrating it is to get it all working.

I’m not even talking about the .xs preprocessor (xsubpp)—that’s weird, but it’s fairly straightforward. Most contingencies are accounted for and you can make it do whatever you want, and in my case, the results were pretty minimal and beautiful in their own way. No, I’m talking about actually building it and making it compile in a cross platform way.

If you’re used to standard unix C source releases, you know you can just run ./configure and then make and it’ll (probably) just work. Behind the scenes configure is madness, but the principle it works on (feature detection by actually compiling things) is sound. My module is an interface to a small third party library. I can’t count on the library being installed on the machine already, so I want to statically link against it. It includes a configure script for unix machines and some windows source that isn’t covered by ./configure.

In Perl, there are two different ways to build your module: ExtUtils::MakeMaker and Module::Build. For pure Perl modules, I will only use Module::Build, as ExtUtils::MakeMaker seems too hairy. But for an XS module, I’m not sure. ExtUtils::MakeMaker looked fairly configurable so I try that first. It works very well for compiling XS Unix stuff, but because it builds a Makefile and lets you just drop your own rules in, it encouraged me to just call my library’s ./configure and then make in its directory:

use ExtUtils::MakeMaker;

    # ... boilerplate stuff stripped for brevity...

    INC               => '-I./monotonic_clock/include',
    MYEXTLIB          => 'monotonic_clock/.libs/libmonotonic_clock.a'

sub MY::postamble {
$(MYEXTLIB): monotonic_clock/configure
    cd monotonic_clock && CFLAGS="$(CCFLAGS)" ./configure && $(MAKE) all
monotonic_clock/configure: monotonic_clock/
    cd monotonic_clock && ./

This of course worked just fine on my Mac.

But it utterly failed everywhere else. Ugh, there’s hardly any green there! Ok, so I focused on the Unix failures first—I know this, I should be able to get stuff working. Turns out Linux wants -fPIC on the library’s code because I’m statically linking against it and it will end up in my modules shared library “.so” file. Ok, so I can just unilaterally add -fPIC:

sub MY::postamble {
$(MYEXTLIB): monotonic_clock/configure Makefile.PL
    cd monotonic_clock && CFLAGS="$(CCFLAGS) -fPIC" ./configure && $(MAKE) all
monotonic_clock/configure: monotonic_clock/
    cd monotonic_clock && ./

That works on clang on the Mac and gcc on Linux. I actually tested on my Debian machine. Everything should be great now!

Nope. Half the Linuxes are still failing, some of the Macs too, and I haven’t even addresses Windows yet. I discover that -fPIC happens to be defined by ExtUtils::MakeMaker on the appropriate platforms in the CCCDLFLAGS make variable, and switch to it thinking this should solve the remaining Unix problems.

Then I start thinking about Windows. I boot up my Windows 8 VM where I have Strawberry Perl installed and try out my module. It fails utterly. I forgot—you can’t just call ./configure in windows! Plus my library doesn’t even handle Windows in its configure script. I start thinking about writing my own Makefile rules to build it so I can drop the configure script completely. But I don’t like it. I’m going to need to detect which back-end the library should be using. I basically have to recreate what configure does, but in a Makefile. And I don’t know how much shell I can use in my rules and still work on Windows, meaning the detection is going to be a huge issue.

So I decide to drop ExtUtils::MakeMaker and use Module::Build instead. It’s actually pretty straightforward:

use strict;
use warnings FATAL => 'all';
use Module::Build;

my $builder = Module::Build->new(
    # ... boilerplate stuff stripped for brevity...

    extra_compiler_flags => '-DHAVE_GETTIMEOFDAY', # We're going to assume everyone is at least that modern
    include_dirs => 'monotonic_clock/include',
    c_source     => ['monotonic_clock/src/monotonic_common.c'],

# Add the appropriate platform-specific backend.
# This isn't as good as the configure script that comes with
# monotonic_clock, since it actually tests for the feature instead of
# assuming that non-darwin unixes support POSIX clock_gettime. On the other
# hand, this handles windows.
     $^O                 eq 'darwin'  ? 'monotonic_clock/src/monotonic_mach.c' :
     $builder->os_type() eq 'Windows' ? 'monotonic_clock/src/monotonic_win32.c' :
     $builder->os_type() eq 'Unix'    ? 'monotonic_clock/src/monotonic_clock.c' :


I figure out that I can abuse the c_source configuration option. It’s supposed to be a directory where the source code lives and they search it recursively for “*.c” files. But it turns out if I pass a C file to it instead of a directory, the recursive search finds that one C file! So now I can add specific C files for the particular platforms. I now have something that compiles on my Mac, my Debian machine, and my Windows VM. Hooray! That covers everything. Finally!

Sigh. That’s worse than my last Makefile.PL based version! What’s going on? Ok, my c_source hack is biting me. I noticed that it was helpfully adding -I options for the sources directory to the compiler, but since I was passing in actual files to c_source I was getting compiler lines like -Imonotonic_clock/src/monotonic_clock.c. This was a warning on my Debian gcc and on Windows gcc, but my Mac’s clang just ignored it with no message at all. So I blew it off. Well, it turns out other compilers are more strict.

So I start poking around the Module::Build source code. I discover that the c_source is adding to an internal include_dirs list. So I override the compile function to strip .c files out of the include_dirs list.:

# This hacks around the fact that we are using c_source to store files, when Module::Build expects directories.
my $custom = Module::Build->subclass(
    class => 'My::Builder',
    code  => <<'CUSTOM_CODE');
sub compile_c {
  my ($self, $file, %args) = @_;
  # Adding to c_source adds to include_dirs, too. Since we're adding files, remove them.
  @{$self->include_dirs} = grep { !/.c$/ } @{$self->include_dirs};
  $self->SUPER::compile_c($file, %args);

It seems really hacky (I hate having to write Perl code in a string) and a bit fragile (I sure hope they don’t change the API in a new version) but it actually works! I publish that and wait to see my wall of green.

I don’t get to see it yet. This is baffling to me. The worst part is the test reports themselves. They aren’t showing me the build process, and they don’t try to identify the Linux distro, leaving me to intuit it from the versions of various things. It looks like one of the failing distros is Debian Wheezy (aka the current “stable”). So I as a last ditch effort use “debootstrap” to build a minimal install that I can chroot into. I try compiling and I can actually reproduce the error. This is phenomenal because I don’t have to guess any more.

After poking around for a while I discover that in older glibcs, the clock_gettime function that my library uses requires librt and therefore a -lrt option to the linker. Ok. But I don’t want to add that unilaterally—I got bit by that earlier in this process. I’m also frustrated that my backend detection code is just hardwired to Module::Builds os_type() function. Do I really know that old FreeBSDs support clock_gettime? No, I don’t, and I don’t want to figure it out. I think back wistfully to ./configure—it just detects stuff by compiling it and seeing if it works. That’s really what I want—it’s the only way to know for sure without researching and testing on every. single. platform.

And then it hits me, I guess I could do that. Module::Build uses ExtUtils::CBuilder internally, and so I can too. It turns out to actually not be that bad. The longest part of the code is redirecting stdout and stderr so you don’t see a bunch of compilation errors while it’s figuring things out:

# autoconf style feature tester. Can't believe someone hasn't written this yet...
use ExtUtils::CBuilder;
my $cb = ExtUtils::CBuilder->new(quiet=>1);

sub test_function_lib {
    my ($function, $lib) = @_;
    my $source = 'conf_test.c';
    open my $conf_test, '>', $source or return;
    print $conf_test <<"C_CODE";
int main() {
    int $function();
    return $function();
    close $conf_test;

    my $conf_log='conf_test.log';
    my @saved_fhs = eval {
        open(my $oldout, ">&", *STDOUT) or return;
        open(my $olderr, ">&", *STDERR) or return;
        open(STDOUT, '>>', $conf_log) or return;
        open(STDERR, ">>", $conf_log) or return;
        ($oldout, $olderr)

    my $worked = eval {
        my $obj = $cb->compile(source=>$source);
        my @junk = $cb->link_executable(objects => $obj, extra_linker_flags=>$lib);
        unlink $_ for (@junk, $obj, $source, $conf_log);
        return 1;

    if (@saved_fhs) {
        open(STDOUT, ">&", $saved_fhs[0]) or return;
        open(STDERR, ">&", $saved_fhs[1]) or return;
        close($_) for (@saved_fhs);


Using it is pretty easy:

my $have_gettimeofday = test_function_lib("gettimeofday", "");
my $need_librt = test_function_lib("clock_gettime", "-lrt");

The one annoyance is that the feature detection doesn’t completely work on Windows. The technique I used (stolen unabashedly from autoconf) just declares the function with the wrong arguments and return value and tries to link with it. This works in general because C doesn’t encode the arguments or return values into the symbol name. Except Windows apparently does with its API functions: my feature detector detects gettimeofday() just fine, but not QueryPerformanceCounter(). If I declare QueryPerformanceCounter() properly then Windows links with it, otherwise I get an undefined symbol error. I don’t care enough to properly research Windows’s ABI, so I decided to just check $^O, assuming that all Windows will work. A quick check of the Windows documentation reveals that it’s has been supported since Windows 2000 and that’s good enough for me.

I upload to CPAN and finally see that green I’ve been looking for.

But it makes me wonder, why did I have to write this? It’s 2015, has nobody ever needed to build their Perl XS module differently for different platforms? I know I can’t be the first, but I had a really hard time finding any documentation or discussion about it. I didn’t see anything on CPAN that looked like it might solve my problems. Should this be a feature in Module::Build? I’ve only written two XS modules in my life, but both times I needed different sources on different platforms. Does anyone want to point me to something that does this well, or failing that, build off this idea and make something neat? I would love it, certainly.

Mac OS X codesigning woes

I just discovered this wonderful bug. Apparently “hdiutil makehybrid” is stripping code signatures in some cases.

I first verify the code signature on an App (a build of Emacs, in this case)—there are no errors:

$ codesign --verify _dmg-build/

I then use “hdiutil makehybrid” to create a disk image out of the directory.

$ hdiutil makehybrid -hfs -hfs-volume-name "Emacs Test" -hfs-openfolder _dmg-build _dmg-build -o /tmp/tmp.dmg

I then mount the created image and run try to verify the signature again—but this time it fails!

$ open /tmp/tmp.dmg
$ codesign --verify /Volumes/Emacs\ Test/
/Volumes/Emacs Test/ code object is not signed at all
In subcomponent: /Volumes/Emacs Test/

Investigating further, I use “xattr” to list the extended attributes on the “grep-changelog” file. First, the good file:

$ xattr _dmg-build/

And now the bad file:

$ xattr /Volumes/Test\ Emacs/

Yup, all the code signature stuff is completely gone! (The “FinderInfo” stuff is OK, it’s just there as a side effect of mounting the disk image).

I’m not exactly sure how to fix this. Apple recently changed code signing requirements so that 10.9.5 now requires deep signatures (way to change something fundamental in a point release, guys). Also the only thing that correctly makes the deep signatures is Xcode 6 which was released only about 1 week before 10.9.5 was released (way to give advanced warning, guys).

2014-10-03 Update:

I filed a bug with Apple and they suggested I use “hdiutil create -srcfolder” instead of “makehybrid“. This does copy the extended attributes correctly. I had originally not used “create” for two reasons: It didn’t have the “-hfs-openfolder” option and the man page claims that only “makehybrid” makes optimally small filesystems. Turns out that “create -srcfolder” automatically does the same thing as “makehybrid -hfs-openfolder” (though it is not documented in the man page) and in practice the resulting .dmgs are just as small or smaller. Problem solved!

Playstation 4 NW-31250-1 Error

All my Playstation 4 downloads were failing today with “DNS Error” and “NW-31250-1”.

I ran a tcpdump on my router and found this:

15:07:10.389761 00:ee:ff:aa:bb:cc (oui Unknown) > 11:22:33:44:55:66 (oui Unknown), ethertype IPv4 (0x0800), length 109: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 95) > [bad udp cksum 0x15cb -> 0xfee5!] 21068 ServFail q: A? 0/0/0 (67)

That looks promising, a DNS SERVFAIL response to a query for “” (whatever that is).

My router is set up to use Google’s DNS: So a quick test from my computer showed:

$ dig @

; <<>> DiG 9.8.3-P1 <<>> @
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 8788
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

; IN A

;; Query time: 58 msec
;; WHEN: Wed Jun  4 15:09:39 2014
;; MSG SIZE  rcvd: 67

“status: SERVFAIL”! But when using Level 3’s venerable

$ dig @

; <<>> DiG 9.8.3-P1 <<>> @
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14680
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

; IN A


;; Query time: 26 msec
;; WHEN: Wed Jun  4 15:09:26 2014
;; MSG SIZE  rcvd: 83

Aha, it works! So Google’s DNS is broken!

To fix this I set up my PS4’s networking manually and added as a secondary DNS server. Then all my downloads started working again. I didn’t have to delete them and start over, either—just clicked retry and they continued from wherever they were.

Last Modified on: Dec 31, 2014 18:59pm