Uncategorized | Missives

How To List Firefox’s Current Quarantined Domains

Firefox describes its quarantined domains this way:

Firefox version 115 introduced Quarantined Domains to protect user privacy and security when we discover significant security issues presented by malicious actors. This feature allows us to prevent attacks by malicious actors targeting specific domains when we have reason to believe there may be malicious add-ons we have not yet discovered.

However they don’t go on to explain how to get a list of them! Turns out it’s pretty easy, they’re in the about:config:

Open about:config and then search for extensions.quarantinedDomains.list. Copy and paste the link because Firefox doesn’t like linking to about:config (nor does it allow url filters, grrrrrr 😡).

Building Slint on macOS with the skia backend

I got a weird error while trying to compile a slint program with the skia backend on my macOS machine:

   Compiling skia-bindings v0.75.0
error: failed to run custom build command for `skia-bindings v0.75.0`

[…a huge section of output snipped…]

  clang -MD -MF obj/third_party/externals/libpng/libpng.png.o.d -DPNG_SET_OPTION_SUPPORTED -DNDEBUG  -w -Wno-attributes -ffp-contract=off -fstrict-aliasing -fPIC -fvisibility=hidden -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.5.sdk -target arm64-apple-macos11 -O3 -O0 -isystem /Users/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/skia-bindings-0.75.0/skia/third_party/libpng -isystem /Users/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/skia-bindings-0.75.0/skia/third_party/externals/libpng -isystem /Users/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/skia-bindings-0.75.0/skia/third_party/externals/zlib  -c ../../../../../../../../.cargo/registry/src/index.crates.io-6f17d22bba15001f/skia-bindings-0.75.0/skia/third_party/externals/libpng/png.c -o obj/third_party/externals/libpng/libpng.png.o
  In file included from ../../../../../../../../.cargo/registry/src/index.crates.io-6f17d22bba15001f/skia-bindings-0.75.0/skia/third_party/externals/libpng/png.c:14:
  /Users/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/skia-bindings-0.75.0/skia/third_party/externals/libpng/pngpriv.h:945:4: error: ZLIB_VERNUM != PNG_ZLIB_VERNUM       "-I (include path) error: see the notes in pngpriv.h"
  #  error ZLIB_VERNUM != PNG_ZLIB_VERNUM \
     ^
  1 error generated.
  [767/827] compile ../../../../../../../../.cargo/registry/src/index.crates.io-6f17d22bba15001f/skia-bindings-0.75.0/skia/third_party/externals/libpng/pngerror.c
  FAILED: obj/third_party/externals/libpng/libpng.pngerror.o

There wasn’t a lot of info online (Google returned nothing of interest for "ZLIB_VERNUM != PNG_ZLIB_VERNUM") but DuckDuckGo eventually led me to this. I checked and noticed I had this environment variable set:

CPATH=/Users/me/.nix-profile/include:/opt/homebrew/include:

Turns out that was causing the issue. I did a quick unset CPATH and then cargo build worked!

Docker Spam in dmesg

I’ve been annoyed recently with a bunch of junk in my dmesg output:

[Thu Jul 11 10:29:27 2024] docker0: port 5(veth9509698) entered blocking state
[Thu Jul 11 10:29:27 2024] docker0: port 5(veth9509698) entered disabled state
[Thu Jul 11 10:29:27 2024] veth9509698: entered allmulticast mode
[Thu Jul 11 10:29:27 2024] veth9509698: entered promiscuous mode
[Thu Jul 11 10:29:27 2024] docker0: port 5(veth9509698) entered disabled state
[Thu Jul 11 10:29:27 2024] veth9509698 (unregistering): left allmulticast mode
[Thu Jul 11 10:29:27 2024] veth9509698 (unregistering): left promiscuous mode
[Thu Jul 11 10:29:27 2024] docker0: port 5(veth9509698) entered disabled state

This repeats every couple seconds ad infinitum and pollutes the log making it hard when it’s time to debug other things.

I finally decided to track it down and found a few different solutions but none of them worked. I finally found a mention in a bug report that it was normal. Surely it’s not normal to spew messages constantly, though… right?

After pondering it a bit and re-reading some context it seemed like they were saying it was normal to print that every time a container started, not just printing it constantly. That led me to thinking, what if I’ve got a container that’s constantly starting and stopping? What would cause that and how do I find it?

Poking around in the docker command I found docker events, which sounded promising. Sure enough, after running it I saw that a container was constantly starting and stopping. It turns out this container was a failed attempt at something and it had a systemd service that was constantly trying to get it to run. When I stopped working on it I forgot remove the .service file from /etc/systemd/system. Once I stopped the service and deleted the service file, the spewing stopped. Ahhhhhhh.

Bizarre Error while compiling a Rust program

I was trying to compile a third party Rust crate today on a Debian box and I was getting a weird error I’d never seen:

  = note: cc: fatal error: cannot read spec file './specs': Is a directory
          compilation terminated.

I’ve done a bunch of embedded C stuff in the past so I knew what a specs file was—but why couldn’t gcc open its specs file? The first clue was that this particular crate had a directory at its top level called “specs”. I moved it away and the error went away. Ok. But why is gcc looking for a specs file in my cwd??

Rust printed out the linker command it was using to invoke gcc with so I could look through that but there was nothing referencing a spec file there.

I figured it might be something set up incorrectly on my local machine but there’s practically no documentation about where gcc looks for specs files. I ended up downloading the gcc source (thank heavens for apt source!) and grepping it for “cannot read spec file”, which lead me to the load_specs() function in gcc/gcc.c. Ok. So I poked around that file, following the thread of function calls around and just sort of looking for anything. I eventually noticed startfile_prefixes which appeared to be where they searched for specs. Searching around the file led me to a section where it looped over LIBRARY_PATH and added each component to startfile_prefixes. This stuck out to me because I had LIBRARY_PATH set in my env.

I looked at it more closely and noticed it was

LIBRARY_PATH=/usr/local/lib:

Hmm. Why does it have a stray : on the end? Maybe that’s it. I tested it with:

LIBRARY_PATH=/usr/local/lib cargo build

And everything built just fine! Yay!

So it was my fault after all… Sigh. I checked my .bashrc and found a bug in one of my path construction functions which was putting the extra : on the end. Then I spent some time relearning a bunch of excruciating shell details and getting everything fixed. 🙂

ssh_find

I can’t remember who wrote this first, but my friend and I have been using some version of this shell snippet for years:

ssh_find() {
    for agent in "$SSH_AUTH_SOCK" /tmp/ssh-*/agent.* /tmp/com.apple.launchd.*/Listeners* /tmp/keyring-*/ssh; do
        SSH_AUTH_SOCK="$agent" ssh-add -l 2>/dev/null
        case "$?" in # 0 some keys, 1 no keys, 2 no agent
            0) export SSH_AUTH_SOCK="$agent"; echo "Found Keys." ; break ;;
            1) export SSH_AUTH_SOCK="$agent"; echo "Found Agent.";  ;; # Keep looking
        esac
    done
}

This function looks through /tmp for ssh-agent sockets (using various macOS, Linux, openssh naming schemes) and then finds the first one that (a) it has permissions for and (b) have keys unlocked in them (giving precedence to any currently active ones). If it can find some valid agents but none of them have keys then it will use the last one it found.

Later in .bashrc I have a section for interactive shells and I call it from there:

#
# Interactive only stuff:
#
if [[ "$-" =~ i ]]; then
    ...snipped...

    # ssh related stuff
    ssh_find

    if [ x$SSH_AUTH_SOCK == x ]; then
        exec ssh-agent bash
    fi

    ...snipped...
fi

That looks for an agent and starts one up if there were none running¹ ².

How is this actually useful?

Basically it makes it so ever tab I open in my terminal (or in a tmux) latches on to an existing ssh-agent so that ssh just works from anywhere without ever thinking about it.

It’s also useful for cron jobs. I have some cron jobs that look like this:

@hourly . $HOME/.bashrc && ssh_find > /dev/null && rsync -a something:/something/blah /blah/blah

I also have similar ones that do some git operation that needs an ssh key.

It is generally convenient for things that need ssh keys to run, but where you don’t necessarily want the keys sitting around on the disk in unencrypted form. This lets them sit unencrypted in RAM, which is a little better (but at the expense of needing to ssh-add them manually every reboot).

This is kinda gross, since exec throws out the entire bash progress and starts things over but with a new running ssh-agent (more or less doubling the startup time). This only happens on interactive logins so it’s not as bad as you might expect. I do it this way instead of eval $(ssh-agent) so that if the agent was started by ssh-ing into the machine then it gets killed when the ssh process dies (ie, on logout).
This is also racy—picture the case where a gui terminal program starts up and remembers that you had 7 tabs open, so it opens those 7 tabs and starts shells in them in parallel. On my Debian box I have a super complicated and impossible to read fix for this that involves the flock(2) command. I hate it and am embarrassed by the whole thing so fixing the race is left as an exercise to the reader. 🙂

Automator Is Pretty Sweet

I played around with Apple’s Automator app when it was first released, but I could never really find a good use for it (I wrote some actions but I would never reliably use them for whatever reason). But I did eventual write one that I use constantly.

It’s a “Quick Action”, which means it shows up in Finder when you right click:

It extremely simple on the Automator side:

Note: My real script replaces <my-server> with the actual server name.

The key things are to set “Workflow receives current” to “files or folders”, “in” to “Finder.app” and in the “Run Shell Script” section, “Pass input” to “as arguments”. I set the Image to “Send” because it seemed reasonable.

The script it self is trivial:

. ~/.bashrc
for f in "$@"
do
    echo "$f"
    rsync -a "$f" <my-server>:/tmp
done

The trickiest part here is getting SSH working without asking for your password (since this script effectively runs headless). I set up my ~/.bashrc to search through ssh-agent socket files to find one that it has permission to read and that has keys. It then sets up environment variables so that ssh uses that agent, giving me password-less ssh nearly 100% of the time.

I actually have several variants of this Automator action that rsync to different places (some of which are watched by inotify and kick other things off when files appear there). It’s really handy for them to just show up in the Finder right-click menu like that, and I find myself constantly using them to move stuff I’ve downloaded through my local web browser over to my server. I find it way more reliable than trying to keep the remote server mounted through smb or nfs.

Fixing Element Search

My MacOS Element Matrix client (Electron based) started getting errors when I searched. My Windows Matrix client worked just fine, so I was pretty sure it wasn’t a server issue. When I typed anything into the message search I’d get this error right underneath:

“Message search initialisation failed, check your settings for more information”

And when I clicked the “settings” link I’d see this:

Error opening the database: DatabaseUnlockError(“Invalid passphrase”)

Clicking on the “Reset” button brought a scary warning dialog (“you probably don’t need to do this”—yeah, I kinda think I do) but clicking through it didn’t actually seem to do anything.

Searching for those terms got me to a github issue. Nothing seemed resolved but I found a message in the middle that helped me fix the problem.

Here’s what I did:

Close the app
cd ~/Library/Application Support/Riot. I had to poke around till I found that. There was also a ~/Library/Application Support/Element but it seemed older and out of date, oddly.
tar cjf EventStore-bad-2023-06-18.tar.bzr EventStore. This made a backup copy of the EventStore directory just in case I needed it (I didn’t).
rm -rf EventStore. Kill it with fire.
Restart the app

Viola! Working search again!

Shell printf tricks

Do you know the printf trick in shell?

Shell printf repeats its format pattern if you give it more arguments than the pattern has.
You can use this for all kinds of tricks:

$ printf "/some/path/%s " file1 file2 file3 file4
/some/path/file1 /some/path/file2 /some/path/file3 /some/path/file4 
$ printf "%s=%s\n" a 1 b 2 c 3 d 4
a=1
b=2
c=3
d=4

It’s is really nice when you combine it with some other output:

$ printf -- "-p %s " $(ps aux | grep fcgiwr[a]p | f 2)
-p 10613 -p 10615 -p 10616 -p 10617 -p 10619 -p 10620 -p 10621 -p 10622 -p 10623

Say I wanted to pass all the pids of a program that forks a lot to strace. I could do:

strace $(printf -- "-p %s " $(ps aux | grep fcgiwr[a]p | f 2))

An aside, you may have noticed that non-standard f command in the ps pipeline. I got that from here a long time ago (2008 according to the file’s timestamp) and it’s really fantastic—perfect for all sorts of unixy things that output tables (ps, ls -l, etc).

I’m so dumb

I was banging my head against a wall trying to get help for go build with go build help (and for some reason go doesn’t support go build --help, but I kept getting this error:

package help is not in GOROOT (/opt/homebrew/Cellar/go/1.18.3/libexec/src/help)

Does brew not come with docs by default? How else am I supposed to interpret that?

No, I’m just dumb. It’s not go build help, it’s

go help build

Stuck on the final Maquette level

Last night I was playing Maquette and I got stuck on the final level (called “The Exchange”) for a really long time. I was at a place with some towers with crystals above them and I had opened the doors to the first tower; Inside was a switch. Flipping the switch made a bunch of arches appear, leading to another building.

But aside from that nothing happened. I wandered around for about an hour before getting ashamed that I couldn’t solve the puzzle and looking up the answer on the internet. But it turns out, I had encountered a bug! Flipping the switch is supposed to open a door! But instead I just got the switch sound and nothing else happened. I tried loading my last autosave but the same thing happened.

Finally I tried selecting the “Restart” option from the pause menu. I was nervous (restart what exactly? The Level, the Area, the Game?) so I hard saved my game first. Turns out it just reset back to the beginning of “The Exchange”. Within 1 minute I was back at the switch in question and this time I could hear (and see) it open a door when I hit it!

So, if you’re stuck in that section of the game and it seems like you’re just not getting it, it’s not you, the game is just bugged.

Failed Emacs builds, hanging kernels, abort(), oh my

My nightly Emacs builds stopped about a month and a half ago. A couple days after I noticed it was failing I tried to debug the issue and found that building openssl was hanging—I found that Jenkins was timing out after an hour or so. I should mention that it’s dying on a Mac OS X 10.10 (Yosemite) VM, which is currently the oldest macOS version I build Emacs for. I tried building manually in a terminal—the next day it was still sitting there, not finished. I decided it was going to be annoying and so I avoided looking deeper into it for another month and a half (sorry!). Today I tracked it down and “fixed” it—here is my tale…

I (ab)use homebrew to build openssl. Brew configures openssl then runs make and then make test. make test was hanging. Looking at the process list, I could see 01-test_abort.t was the hanging test. It was also literally the first test. Weird. I checked out the code:

#include <openssl/crypto.h>

int main(int argc, char **argv)
{
    OPENSSL_die("Voluntary abort", __FILE__, __LINE__);
    return 0;
}

Well, that seems straightforward enough. Why would it hang? I tryed to kill off the test process to see if it would continue. There was a lib wrapper, a test harness and the actual binary from the source shown above—they all died nicely except for the actual aborttest executable. I couldn’t even kill -9 that one—that usually means there’s some sort of kernel issue going on—everything should be kill -9able.

Next I ran it by hand (./util/shlib_wrap.sh test/aborttest) and confirmed that the test just hung and couldn’t be killed. I built it on a different machine and it worked just fine there. So I dug into the openssl code. What does OPENSSL_die() do, anyway?

Not much:

// Win32 #ifdefs removed for readability:
void OPENSSL_die(const char *message, const char *file, int line)
{
    OPENSSL_showfatal("%s:%d: OpenSSL internal error: %s\n",
                      file, line, message);
    abort();
}

Ok, that’s nothing. What about OPENSSL_showfatal()? Also not much:

{
#ifndef OPENSSL_NO_STDIO
    va_list ap;

    va_start(ap, fmta);
    vfprintf(stderr, fmta, ap);
    va_end(ap);
#endif
}

That’s just a print, nothing exciting. Hmmm. So I wrote a test program:

#include <stdlib.h>

int main() {
  abort();
}

I compiled it up and… it hung, too! What?? Ok. I tried it as root (hung). Tried it with dtruss:

...lots of dtruss nonsense snipped...
37772/0xcaf1:  sigprocmask(0x3, 0x7FFF5DD71C74, 0x0)         = 0x0 0
37772/0xcaf1:  __pthread_sigmask(0x3, 0x7FFF5DD71C80, 0x0)       = 0 0
37772/0xcaf1:  __pthread_kill(0x603, 0x6, 0x0)       = 0 0

So it got to the kernel with pthread_kill() and hung after that. So I tried another sanity check: In one terminal I ran sleep 100. In another I found the process id and did kill -ABRT $pid. The kill returned, but the sleep was now hung and not able to be killed by kill -9, like everything else. Now I was very confused. This can’t be a real bug, everyone would be seeing this! Maybe it’s a VM emulation issue caused by my version of VMWare? I can’t upgrade my VMWare because the next version after mine requires Mac OS 10.14 but this Mac Mini of mine only supports 10.13. Sigh. Also, the Emacs builds were working just fine and then they suddenly stopped and I hadn’t updated the OS or the host OS or VMWare. Nothing was adding up!

As I sanity check, I decided to reinstall the OS on the VM (right over top of the existing one, nothing clean or anything). There was a two hour long sidetrack here with deleting VM snapshots, resizing the VM disk (which required booting into recovery mode), downloading the OS installer and finally letting the install run. But that’s not important. The important part is that I opened up terminal immediately after the OS installed and ran my abort() test:

$ ./test
Abort trap: 6
$

It worked! How about OpenSSL?

$ ./util/shlib_wrap.sh test/aborttest
test/aborttest.c:14: OpenSSL internal error: Voluntary abort
Abort trap: 6
$

Yay! But why?? I don’t actually know. Was it a corrupt kernel? A bad driver that got installed? (What drivers would get installed on this Jenkins builder?) I don’t feel very satisfied here. I’m quite skeptical, in fact! But it’s working. Emacs builds should start coming out again. And I can ignore everything again until the next fire starts! 🙂

Fixing the blower motor in my central air system

On Wednesday as I was going to bed I noticed it was quite hot in my house. I checked my central air blower unit and there was frost on the coils and the blower wasn’t moving. It kept trying to start but not being able to start. The 7 segment led was showing “b5” which I was able to find in the manual:

So the next day I removed the blower assembly and tried to extract the motor. The motor shaft extended quite far past the end of the fan and it was rusty so it didn’t want to come off. I attempted to get it off using gravity and a hammer:

I failed. I was only able to get it this far:

So I took it to my dad’s house because he has a better tools than I do. We ended up drilling the end of the shaft out to get the motor removed. We tested on the bench and read a lot of troubleshooting manuals and determined that the motor was shorted. It was hard to turn and there were very obvious magnetic “detents” we could feel when turning it by hard. We took the motor apart and looked around:

We measured the resistance across all the pairs of pins on the 3 pin motor connector: 2 pairs measured 3 ohms and one pair was at 0.9 ohms. We kept the meter plugged into the shorted pair and moved a bunch of wires around. At some point we noticed it had changed up to 3 ohms but we weren’t sure which part we had messed with to make it that way. All attempts to short it out again to identify the bad wire area failed. From this point on it was never shorted.

We put it all back together, fixed the drilled out shaft by cutting it off with a hacksaw and then sanding off all the sharp edges and rust. I took it home installed it and it spun up. It ran for about two minutes and then died. Same symptoms. I kind of expected it since we really didn’t fix anything, just shuffled things around, but it was still disappointing.

The next day I ordered a used motor off ebay and suffered through a hot July Friday.

Saturday I decided to have one last ditch effort to fix it since the weekend was supposed to be upwards of 90ºF. I got it apart and found this:

That looks obviously bad and you’d wonder how we could miss it, but that’s only because it’s blown up so big. That wire is one of the skinny winding wires around the motor. In fact I couldn’t actually see the issue at all until I used my phone camera as a magnifying glass. I pulled out some slack on the wire around that break to see if there were any more issues. it looked like this:

To me it looked clearly exposed. I had the meter plugged in this whole time and I could tell as I freed this bad area the short cleared. To fix, I wrapped it in electrical tape and then put it back down where it was:

Even pressing it down as much as I could I couldn’t get it to show a short on the meter, so I believed I had it fixed. I put it all back together:

I mounted it up and tested it. It worked! And more importantly, didn’t die after just a couple minutes. Eveything’s been running all day now and my house is finally back down into the sub 80ºF range. I didn’t have central air until a couple years ago, and it’s amazing how fast I’ve transitioned to thinking that 85ºF is unlivably hot.

Now I just have to decide what to do with the replacement I bought off ebay. I just know that if I cancel the order then my motor will die just a couple days later. But if I don’t cancel, then my fix will work for the next 50 years…

Screw Shareholder Value

I was digging around and I found the original SSV announcement posted to Dominion (the (in)famous Sisters Of Mercy Mailing list).

What is SSV? Just read the letter—it explains everything. More info can be found here, here, or here, or here.

Here is the email, reproduced with headers for posterity:

Received: from maekong.ohm.york.ac.uk by rewind.indigita.com (NTMail Server 2.11.25 - ntmail@net-shopper.co.uk) id David Mon, 27 Oct 97 10:50:32 +0000 (PST)
Received: by maekong.ohm.york.ac.uk
                  id m0xPtmn-00009TC
                  (/\oo/\ Smail3.1.29.1 #29.7); Mon, 27 Oct 97 18:21 GMT
Sender: <dominion-request@maekong.ohm.york.ac.uk>
Received: from emout29.mail.aol.com ([ 198.81.11.134]) by maekong.ohm.york.ac.uk
                   with smtp id m0xPtmf-00009PC
                   (/\oo/\ Smail3.1.29.1 #29.7); Mon, 27 Oct 97 18:21 GMT
Received: (from root@localhost)
                    by emout29.mail.aol.com (8.7.6/8.7.3/AOL-2.0.0)
                          id NAA28825 for dominion@maekong.ohm.york.ac.uk;
                            Mon, 27 Oct 1997 13:20:53 -0500 (EST)
Date: Mon, 27 Oct 1997 13:20:53 -0500 (EST)
From: Baxville@aol.com
Message-ID: <971027131941_697440177@emout09.mail.aol.com>
To: dominion@maekong.ohm.york.ac.uk
Subject: SSV - Go Figure
Comments: Sisters Of Mercy mailing list
Sender: dominion-request@maekong.ohm.york.ac.uk
X-Info: indigita mail server
Mime-Version: 1.0

SSV  "Go Figure"

After years of stalemate, Mr Andrew Eldritch has managed to get East West to
accept (in lieu of two remaining Sisters albums) a record bearing no
resemblance whatsoever to The Sisters Of Mercy. This album will be released -
if at all - under a completely different name, which is just as well, as it's
got practically nothing to do with the Sisters. Furthermore, East West agreed
not to hear the album in advance....  so it bears no resemblance to *any*
quality product, let alone the Sisters.

....in return for which, Mr Eldritch agrees to let East West keep 75000
pounds which they owe him, and which they were refusing to pay in any event.
Unsurprisingly in the circumstances, Mr Eldritch made sure that East West got
the record they deserve, but made sure that they then paid a lot more for it
than he deserves.... ....especially for one afternoon recording the
occasional mumble on the reject material of some amateur acquaintances from
Hamburg. Go figure.

It is rumoured, indeed, that the whole album only took two days from start to
finish (somewhat less than the usual nine months), and that the "rather bad
sub-techno" music was under-average and boring even before the drums were
mysteriously removed. The "lyrics" dwell almost exclusively on the
glorification of shooting people and selling drugs to schoolchildren. It is
rumoured that the full name of the band (SSV-NSMABAAOTWMODAACOTIATW) stands
for "Screw shareholder value - not so much a band as another opportunity to
waste money on drugs and ammunition courtesy of the idiots at Time Warner".
Go figure.

How desperate must the corporation be? Desperate enough to try and force an
artist to record with threats of massive litigation after a seven-year
impasse, expecting him to make a good record while witholding a fortune in
back-royalties, and then desperate enough to pay "a very large amount" for a
record which the artist neither wrote, nor composed, nor arranged, nor
produced, whose content is merely a puerile attempt to be as offensive as
possible? Go figure.

Finally, rumour has it that the record company are planning to release it,
which would, you might think, be a conscious insult to the general public if
East West were smart enough to know a pile of crap when they hear it. Either
way, go figure.

Anyway, Mr Eldritch is now free to resume his recording career with The
Sisters Of Mercy, who have been waiting very patiently for him at a chemist's
round the corner, and very sensibly having nothing to do with the SSV album
 - because they didn't have to.

The Sisters will be celebrating the liberation with a small tour of Europe
and America in January/February, and the release of a stonking new (Sisters)
single on the day after Mr Eldritch's contract officially expires, which will
be a couple of months later. Work has started on the next (Sisters) album.
Normal service has been resumed.

Oh, and in case you haven't got the picture, we do NOT recommend the purchase
of an SSV album, should East West actually try to sell one to you. We
recommend that you wait for the magic of the Net to mysteriously provide you
with a free one  - just this once.

BaxCorp

DKIM and Exim4 on Debian

I wanted to get DKIM working on an Debian box I have that runs Exim. The first thing to do is to create the keys:

$ openssl genrsa -out diamonds.key 4096
$ openssl rsa -in diamonds.key -pubout > diamonds.pub

I was following these instructions and noticed that Exim supports ed25519 DKIM signatures. Neat! I decided I may as well create those keys, too:

$ openssl genpkey -algorithm ed25519 -out hearts.key
$ openssl pkey -outform DER -pubout -in hearts.key | tail -c +13 | base64 > hearts.pub

From there I stuck the public keys in DNS:

diamonds._domainkey.example.com. 3600 IN TXT  "v=DKIM1; k=rsa; t=s; p=MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAxDMS3KRFCU4PEtygOUdALBt7jmz5IIX2+KHoV4fd0CLjXRvOqA5H8rU3e+y1lNese9yjPLksPqiOh5vtx8Tysjv2MSTXB1Kgr0tl+1IlJL4ihdpUgR1veKB5X4wK3Ppkr5Oy42H7xNHf/yj6aC1E+alZ8TdssuHY3ReqO6YvGa72UqTMmL1gBl9SXBUl" "vD+FqvfFtkQFFMU9QSTtrIuzcup6NC6z3a4I4UGz4YOZSxeUARKzySGFzPd7vwmrKEZVhlA0HzmJm9eGrjq6IiLVdgTJhSZ8Ecn9h65x9EjhNYYhsufTbcPDljlZYpA4b+dkTEs35a4KjOM2wY7gUdY9ydOqOCfz2BpzJ25Mn3K8nTV8a7fInWCnKg0sm6Fuiwe0DrQjrTe7xGC3Y03CU8eziynOukyWnfsCAnpWcUGa15bp1/O0Le+ZYsKOWxA" "CL5cKlYPw1VJrqz7ZQ1i+s+twOLgEKWm8gwKMsDysgpM1WvE+IhlJkkZLkWavF9pAKeSD6akkHcbkB3QsDKgNugDC4EEm6XV/+hPcTS9Gmd2PYPswxg8nlEdUDjxLul6UbKzWwkYihzKxhMSqCEXTUkt6eHjT+KAIHXVm86elFEmOcuadUWwr+74fgnTpv1XbWIs5qqqh/zROhvUUR8EXZbjOchFEX3YjLO8NDPqHdW4zHt0CAwEAAQ=="
hearts._domainkey.example.com. 3600 IN TXT    "v=DKIM1; t=s; k=ed25519; p=MTGVeSXmIzviF/B+ANc/bLqP2zEWhO/rw6o8HxIl5+8="

Ed25519 is quite compact! t=s:y is found in the DKIM RFC (section 3.6.1). t is for various flags. s is for strict (I’m just guessing the mnemonic)—it means all the domain names have to match. Apparently you don’t want this if you use subdomains in your email addresses (I don’t). y means “This domain is testing DKIM”—ie, don’t worry if it fails. It seemed reasonable to set that while I was playing around.

Next, I had to set up Exim in Debian. This was kind of a pain because there’s the Exim config, then the Debian wrapper around that config. This is made more complicated by the fact that Debian has a debconf option named dc_use_split_config. You can see which way yours is set in /etc/exim4/update-exim4.conf.conf (the double .conf is not a typo!). If it’s false then when you update /etc/exim4/conf.d you first have run /usr/sbin/update-exim4.conf.template which cats everything in the conf.d dir into /etc/exim4/exim4.conf.template. Then you have to run /usr/sbin/update-exim4.conf which combines /etc/exim4/exim4.conf.localmacros and /etc/exim4/exim4.conf.template and puts the resulting final config file in /var/lib/exim4/config.autogenerated. Fwew.

The basic DKIM config is in /etc/exim4/exim4.conf.localmacros. I added these lines:

DKIM_CANON = relaxed
DKIM_SELECTOR = diamonds : hearts
DKIM_DOMAIN = example.com
DKIM_PRIVATE_KEY = /etc/exim4/dkim/$dkim_selector.key

For my setup this wasn’t enough. The DKIM_* macros are only used by the “remote_smtp” transport (found in /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp). I was using a “satellite” configuration with a smarthost. This means it uses the “remote_smtp_smarthost” transport (found in /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp_smarthost). You can tell what transport is being used by looking for T= in /var/log/exim4/mainlog.

I copied all the DKIM related stuff from /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp into /etc/exim4/conf.d/transport/30_exim4-config_remote_smtp_smarthost, namely these lines:

# 2019-08-26: David added these:
.ifdef DKIM_DOMAIN
dkim_domain = DKIM_DOMAIN
.endif
.ifdef DKIM_SELECTOR
dkim_selector = DKIM_SELECTOR
.endif
.ifdef DKIM_PRIVATE_KEY
dkim_private_key = DKIM_PRIVATE_KEY
.endif
.ifdef DKIM_CANON
dkim_canon = DKIM_CANON
.endif
.ifdef DKIM_STRICT
dkim_strict = DKIM_STRICT
.endif
.ifdef DKIM_SIGN_HEADERS
dkim_sign_headers = DKIM_SIGN_HEADERS
.endif

Then I ran update-exim4.conf.template and update-exim4.conf and finally systemctl restart exim4.

At this point I could send emails through and the DKIM headers were added.

Next I removed the y flag from the t flags in the DNS since everything appeared correct. I also added the ADSP DNS record:

_adsp._domainkey.example.com. 3600 IN  TXT     "dkim=all"

Then I wrote this post and called it a night!

AT&T causes mDNS on Linux To Fail

Today my Jenkins builds were not working because all of the build slaves were offline. Digging around in the logs showed that the couldn’t connect because of name resolution failures. I use mDNS on my network (the slaves are Mac OS X VMs running on a Mac Mini), and so they were named something like xxxx-1.local and xxxx-2.local I tried just pinging the machines and that refused to resolve the name, too.

I verified that Avahi was running, and then used avahi-resolve --name xxxx-1.local to check the mDNS name resolution. It worked just great.

So why would mDNS be working fine network-wise, but no programs were resolving correctly? It struck me that I didn’t know (or couldn’t remember!) how mDNS tied in to the system. Who hooks in to the name resolution and knows to check for *.local using mdns?

It turns out it’s good old /etc/nsswitch.conf (I should have remembered that)! There’s a line in there:

hosts:          files mdns4_minimal [NOTFOUND=return] dns

That tells the libc resolver (that everything uses) that when it’s looking for a hostname, it should first look in the /etc/hosts file, then it should check mDNS, then if mDNS didn’t handle it, check regular DNS. Wait, so mDNS is built right in to libc??

Nope! On my Debian system there’s a package called libnss-mdns that has a few files in it:

/lib/x86_64-linux-gnu/libnss_mdns.so.2
/lib/x86_64-linux-gnu/libnss_mdns4.so.2
/lib/x86_64-linux-gnu/libnss_mdns4_minimal.so.2
/lib/x86_64-linux-gnu/libnss_mdns6.so.2
/lib/x86_64-linux-gnu/libnss_mdns6_minimal.so.2
/lib/x86_64-linux-gnu/libnss_mdns_minimal.so.2

Those are plugins to the libc name resolver so that random stuff like mDNS doesn’t have to be compiled into libc all the time. In fact, there’s a whole bunch of other libnss plugins in Debian that I don’t even have installed.

So my guess was that this nss-mdns plugin was causing the problem. There are no man pages in the package, but there are a couple README files. I poked around trying random things and reading and re-reading the READMEs many times before this snippet finally caught my eye:

If, during a request, the system-configured unicast DNS (specified in /etc/resolv.conf) reports an SOA record for the top-level local name, the request is rejected. Example: host -t SOA local returns something other than Host local not found: 3(NXDOMAIN). This is the unicast SOA heuristic.

Ok. I doubted that was happening but I decided to try their test anyway:

$ host -t SOA local
local. has SOA record ns1-etm.att.net. nomail.etm.att.net. 1 604800 3600 2419200 900

Crap.

Those bastards at AT&T set their DNS server up to hijack unknown domains! They happily give out an SOA for the non-existant .local TLD. So AT&T’s crappy DNS is killing my Jenkins jobs??? Grrrrr…

The worst part is that I tried to use Cloudflare’s 1.0.0.1 DNS. My router was configured for it. But two things happened: 1. I got a new modem after having connection issues recently, 2. I enabled IPv6 on my router for fun. The new modem seems to have killed 1.0.0.1. I can no longer connect to it at all. Enabling IPv6 gave me an AT&T DNS server through DHCP (or whatever the IPv6 equivalent is).

So, straightening out my DNS (I had to revert back to Google’s 8.8.8.8) caused NXDOMAIN responses to the .local SOA, and that caused mDNS resolution to immediately work, and my Jenkins slaves came back online. Fwew.

Firefox imgur failure

My imgur had been acting up for a while now. In my Firefox it wouldn’t render the actual image but rendered most of the rest of the page just fine:

If I went to i.imgur.com (adding the .png or .jpg to the url) then the image loaded just fine. So it wasn’t getting blocked by anything in my network stack (ad-blocker and the like). I opened up the console to take a look and I saw this error:

10:57:53.646 SecurityError: The operation is insecure.                             analytics.js:34:17
             consoleDebug    jafo.js:838:8
             directFire      jafo.js:272:12
             _sessionStart   jafo.js:563:16
             init/<          jafo.js:134:12
             b               lodash.min.js:10:215
             d               raven-3.7.0.js:379:23
             j               https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js:2:26855
             fireWith        https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js:2:27673
             ready           https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js:2:29465
             I               https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js:2:29656

The line in question was doing localstorage.Get("something-or-other"). So Firefox was mysteriously throwing a SecurityError when accessing local storage. After testing a million different things (disabling ad-blockers, privacy extensions, etc.) and reading countless Stack Overflow and forum posts, I discovered that this was because I had "https://imgur.com" listed in my cookie exceptions as "Block".

This was actually surprisingly hard to find because the Firefox preference UI for cookie exceptions is pretty bad—it doesn't sort nicely and it doesn't have a search. It's very confusing because the "Manage Cookies and Site Data" screen looks almost identical but sorts correctly (all domains and subdomains grouped together) and has a handy search bar. If you are used to that screen and then try to use the cookie exceptions screen I can almost guarantee you'll type the cookie into the text box at the top and then scratch your head for a few seconds trying to figure out why the list isn't reacting to your search, only to realize that the text box is for adding exceptions and not for searching! Sigh. At least you can click the "Website" column header to get a plain radix sort instead of the default sort of "Status" (which apparently doesn't do a secondary sort on the Website so everything is just random!). When sorting by "Website" take care to thoroughly search—"http" and "https" versions of the site are separate (this is what cost me the most time).

Once I removed the block, imgur started working again:

Yay. Kinda sucks that imgur's code isn't defensive enough to deal with localstorage exceptions, which I thought were widely known to happen (I know on sites I run I see them happening for crazy errors like "Disk Full" and "IO Error").

Debian OpenSSL 1.1.0f-4 and macOS 10.11 (El Capitan)

Some people were reporting that an IMAP server wasn’t working on their Mac. It was working from linux machines, and from Thunderbird on all OSes. From Macs I was getting this testing from the command line:

$ openssl s_client -connect <my-imap-server>:993
CONNECTED(00000003)
39458:error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version:/BuildRoot/Library/Caches/com.apple.xbs/Sources/OpenSSL098/OpenSSL098-64.50.6/src/ssl/s23_clnt.c:593:

This led me to a recent libssl package upgrade on my server (to version 1.1.0f-4). Checking the changelogs I found this:

  * Disable TLS 1.0 and 1.1, leaving 1.2 as the only supported SSL/TLS
    version. This will likely break things, but the hope is that by
    the release of Buster everything will speak at least TLS 1.2. This will be
    reconsidered before the Buster release.

Ah-ha! To quickly get back up and running I grabbed the old version from http://ftp.us.debian.org/debian/pool/main/o/openssl/libssl1.1_1.1.0f-3_amd64.deb and installed it (and then held the package so it wouldn’t auto-upgrade).

I do hope Debian reconsiders this change, at least in the short term, since I can’t easily force OS upgrades to everyone that uses this server. Ideally Apple would update their old OSes to support TLS 1.2, but I’m not holding my breath.

Fedora libvirt/qemu error on upgrade

Today we upgraded a server that ran a bunch of VMs to Fedora 25 and all the VMs failed to come back online after rebooting.

Checking the logs I found this:

2017-04-14T08:39:35.304547Z qemu-system-x86_64: Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 0x10000 in != 0x20000: Invalid argument
2017-04-14T08:39:35.304579Z qemu-system-x86_64: error while loading state for instance 0x0 of device 'ram'
2017-04-14T08:39:35.304759Z qemu-system-x86_64: load of migration failed: Invalid argument
2017-04-14 08:39:35.305+0000: shutting down

After searching fruitlessly for all of those error messages in Google, I finally discovered the problem. The upgrade had shut libvirt down, which saved the state of all the machines into /var/lib/libvirt/qemu/save/. Then it started it back up and the save files were no longer compatible with the new qemu/libvirt binaries.

The solution was to just delete the save files in /var/lib/libvirt/qemu/save/ (which is a little sad, because it’s like yanking the power cord out of a running box). The VMs all started fine after that.

Horizon Zero Dawn Xi Cauldron Is Confusing

After you override the core in the Xi Cauldron (which unlike the others I’ve played through so far, happens pretty early), Aloy says “now I can override more machines”. But annoyingly, you don’t actually get the override until you finish the quest, which happens when you exit the cauldron. Along the way back you’ll encounter a few of the machines that require the Xi override and I found it frustrating when it wasn’t working. I was even thinking maybe it was a bug and I didn’t trigger something correctly, especially because Aloy mutters to herself, “I can’t override this machine yet—I’ll have to explore more cauldrons”. The game acts like you have the override, but you don’t get it until later. I googled but didn’t find anything, so I wrote this. If this happens to you, don’t be confused.

Rust nightly + Homebrew

I used to use this Homebrew “Tap” to install Rust nightly versions, but it stopped working at some point. I messed around with it a lot and determined it had something to do with newer Rusts not liking the install_name_tool calls that homebrew forces on libraries during installation.

So I looked into rustup (the nice shell version, not the crazy rust binary version which sadly seems to be on track to replace it) and came up with the following shell function.

rust_new_nightly_to_brew() {
    curl -sSf https://sh.rustup.rs -o /tmp/rustup.sh &&
    bash /tmp/rustup.sh --disable-sudo --channel=nightly \
       --prefix=$(brew --prefix)/Cellar/rustup-nightly/$(date '+%Y-%m-%d') &&
    brew switch rustup-nightly $(date '+%Y-%m-%d')
}

Stick that in .bashrc and then run rust_new_nightly_to_brew to install a new nightly into Homebrew’s world. It’s nice to live in this world because if you don’t like a nightly for some reason it’s trivial to revert to your last good install with brew switch.