Andreas Fuchs' Journal

I hack on things for people.

Elixir: First Impressions

For the longest time now, I’ve admired Erlang from afar. It always seemed to be a bit daunting to take on. For one, there was the slightly weird and inconsistent Prolog-inspired syntax (I was always scratching my head over why this place needs a period and that place doesn’t), and then there was just plain weird stuff like one-based indexes.

While you don’t end up needing indexes very often, a nice syntax on top of Erlang is something I always kind of wanted, but nothing really could deliver. Then I saw Jose Valim demoing Elixir at Strange Loop 2012. It has a ruby-inspired (but more regular) syntax, it can do macros(!), it has protocols(!!!), and it has a very enthusiastic developer community behind it (see expm for an example of the packages that people have written/ported over to Elixir). That its data structures use zero-based index access certainly helps, too (-:

On top of all these nice things, it also lets you use any Erlang library (with only minimally less nice syntax by default). I think I’m sold.

What is all that hair on the floor?

As an initial just-for-fun project, I tried porting over the progress I’d made on a node.js-based gmail->localhost IMAP backup tool that I’d optimistically named gmail-syncer.1 So far, this has required a ton of yak shaving, but I’m enjoying the hell out of every single step down the fractal yak ranch.

  • First, there is no suitable IMAP client library. The thing that comes closest is erlmail. It is somewhat abandoned, and its IMAP client isn’t very usable for my purposes (doesn’t implement capabilities the way I need them, doesn’t really follow the one relatively sane guide to writing an IMAP client). So I’ll have to write my own IMAP interaction code.

  • To write my own IMAP code, I need to parse server responses; this requires parsing the highly weird IMAP protocol, with its somewhat lisp-inspired (but definitely not lispy) ideas of how to represent things. For example, the way a UID FETCH response looks (see the example after this list) makes it pretty impractical to tokenize & parse the response using a parser generator - unless you enjoy concatenating potentially dozens of megabytes of text that would do better to remain as an opaque binary buffer.

  • Hence, to parse server responses in a smarter way, I have to have a smarter parser. While that can use a pretty nice heuristic (despite its lispy nature, the IMAP server responses are specified to terminate in newlines at certain points), I still need it to cooperate well with something that manages buffers received from the network somewhat smartly. Aaaand that’s where I am right now.
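
To illustrate, here’s roughly what a UID FETCH reply looks like (tag, sequence number and size made up). The {342} is an IMAP “literal”: it announces that the next 342 bytes are raw, opaque data. Once those literals are message bodies measured in megabytes, a token-stream-oriented parser is exactly the wrong tool:

    * 12 FETCH (UID 877 BODY[] {342}
    ...342 bytes of raw message data...
    )
    a1 OK UID FETCH completed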

Introducing gmail_synchronize, the tool that doesn’t do very much right now other than fill a buffer and let you read lines or N-byte-long binaries from it. But I’m sure there will be more stuff eventually (-:
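
To give you an idea of the intended shape of that interface, here’s a hypothetical usage sketch - these names are made up for illustration, not the tool’s actual API:

    {line, buffer} = Buffer.read_line(buffer)         # one CRLF-terminated line
    {bytes, buffer} = Buffer.read_bytes(buffer, 512)  # exactly 512 opaque bytes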

To come this far, I’ve written some kilobytes of code (on various levels of the aforementioned yak stack) and thrown them away. The results in the git repo are the best I’ve come up with so far. This isn’t much, so you should take the following opinions with a mine of salt.

My impression of Elixir so far

Here’s a brain dump of what stood out to me about the language:

So far, I really like Elixir (and, by extension, Erlang). There’s a lot to be said about its pattern matching (which is as powerful as Erlang’s), but I don’t think I fully understand it yet. There’s a bit of terminology I still have to learn, but even at this level of (non-)proficiency, it’s making my job way easier.

There’s a very helpful channel on freenode, #elixir-lang. It has the creator of the language in it, and a bunch of very enthusiastic, knowledgeable and helpful people (hi, yrashk and cmn!). This has been invaluable in my learning to use the language.

I still don’t quite get why some of the decisions in it were made the way they were. For example, it would seem natural to me to have a way to pattern-match binary buffers to test whether some bytes appear next to each other somewhere in the middle of the buffer, but there isn’t. I guess this may have to do with being able to unambiguously resolve the pattern, but it’s still a bit unsatisfactory. I’m sure this will pass as I learn more of its vocabulary and integrate it into mine.
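
To make that concrete (in current Elixir syntax): matching a fixed prefix of a binary is easy, but a “these bytes occur somewhere inside” pattern isn’t expressible, because only the last segment of a binary pattern may have an unknown size:

    # Matching a fixed prefix works fine:
    <<"* OK ", rest::binary>> = "* OK IMAP4rev1 ready"
    # rest is now "IMAP4rev1 ready"

    # ...but this is rejected at compile time, since a variable-size
    # segment like _::binary may only appear at the end of the pattern:
    # <<_::binary, "\r\n", _::binary>> = buffer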

Testing in Elixir is very cool. Instead of mocking or stubbing things like I would in, say, Ruby, I factor things such that tests can implement a protocol that the part being tested uses, and I’m set. I love protocols, and I think Elixir lets you use them in a very nice way. See here for how the tests interact with a library that follows a protocol. Note the re_buffered variable - in Ruby, I’d be using a method call expectation instead - this is way more satisfying.
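
Here’s a hypothetical sketch of that pattern - the names are invented for illustration, not taken from the real gmail_synchronize code. The code under test only ever talks to the protocol, so a test can substitute any data type that implements it:

    # A made-up protocol, just to show the shape of the approach:
    defprotocol Source do
      @doc "Return the next chunk of data plus the remaining source."
      def read(source)
    end

    # In tests, a plain list of canned binaries stands in for the network:
    defimpl Source, for: List do
      def read([chunk | rest]), do: {chunk, rest}
    end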

Non-modifiable data structures are way less of a pain than I’d imagined (they are in fact pretty pleasing). The pattern matching makes things much easier to follow, and the way updates (which return a new object) work is also pretty cool: you can write stuff like

    some_record.buffer("foo").number(20)
…and this returns a record that is like some_record, except its buffer and number components are replaced by the values passed in the function argument list. Pretty pleasing.

I would not have been able to write code so relatively painlessly if it weren’t for the emacs mode that I’ve painfully adjusted to automatically indent Elixir code correctly. Emacs’s smie is really pretty cool, and I wish more emacs modes used it (-:

That’s all so far. I urge you to check out Elixir, and hope you have as much fun with it as I do!


  1. Why write a new tool instead of using offlineimap? Offlineimap is a huge pain - when used with gmail, it’ll sometimes run into UIDVALIDITY mismatches (which require re-downloading potentially huge mailboxes, taking days), it’s slow, and its thread-based design is so horrible that it manages to mess up its own UI even when using a single worker thread, and then it can’t even exit cleanly on anything other than a SIGKILL. Arrrrgh.

Write Gmail Filters in a Nice Ruby DSL: Gmail-britta

I’ve just finished (mostly) documenting and writing tests for my latest little library, gmail-britta, so I thought I should release it to the world as a sort of holiday gift.

Gmail-britta is a library that lets you write Gmail filters (hah, Britta, get it?) in a way that doesn’t drive you insane - you write them as a ruby program, run that program, and out comes XML that you can import into Gmail’s filter settings.
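
To give a flavor, a filter definition looks roughly like this (quoted from memory - treat the details as approximate, and see the README for the authoritative version):

    require 'gmail-britta'

    fs = GmailBritta.filterset do
      filter {
        has %w{from:notifications@github.com}
        label 'github'
        archive_unless_directed
      }
    end
    puts fs.generate  # prints the XML to import into Gmail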

It does a bunch of other nice things, but I guess it’s better to let the README explain.

So far, I (and a few colleagues of mine) have been successfully using this for the past few months to generate filters for work email. Just yesterday I took the step and ported my 156 filters over to a gmail-britta program (yep, those are my filters, with sensitive email addresses stubbed out), resulting in 34 filters that are easier to maintain and more accurate.

If you’re interested, please give it a try. Also, please let me know in the issues if you find anything that it doesn’t do, or if you’re feeling super generous, please open a pull request and send me improvements!

Some More Updates

So I’ve been moving stuff off my six-year-old server to a machine hosted in Germany lately. I hope to bring back Boinkmarks on it some day soon. (Not in the way I brought back the git repos, though - no outsourcing for benchmarks!) (-:

There are a couple of state changes in my projects that would not warrant a blog post on their own, but I think as a whole they are still something to write home about:

  • I’m retiring the Jofr.li web site - it pretty much got obsoleted at birth by Twitter’s URL shortening thing, and it was just a finger exercise anyway. You can still peek at the source if you want to get a feel for how I think a lisp redis-backed hunchentoot app could be structured.

  • The CXML-RPC library’s server part should now work in the newest Hunchentoot. In the process, I think I found a bug in CXML’s Klacks parser under CCL - it fails to decode &# entities.

  • In not so very lisp-related news, I am learning that keeping your server’s configuration in puppet is a really great thing. I’d never done this for my own machines up until this point, but it definitely helps to have all the state re-bootstrappable in one repository. Makes it way easier to reason about system configuration (“Now where are these vhosts’ www root directories again? What, they were behind a bind mount? What was I thinking?!” - these moments are severely reduced when you can just look at git log output).

And lastly, Dan Weinreb died. I wish I had had more opportunities to work with him, chat with him and learn from him.

Git Lives Again - Somewhere Else

I’ve revived the git repos affected by this outage - the cvs->git conversion is now alive again, and the repos there are now kept on github.com.

Turns out there are only two more CVS repos left that I was converting to git: McCLIM and SLIME. So, they’re online again, and I hope you still find them useful.

If you are missing any repos that I forgot to move, please send me a note. I should have extensive backups of everything, so restoring anything that’s missing probably isn’t a problem. (Famous last words, hah!)

git.boinkor.net Outage

I’m currently moving some of boinkor.net’s services off the creaky old machine that used to host it, over to another machine. This affects git.boinkor.net - it’s not going to be available for the next 2 days. (With a bit of luck, it may be back up a little sooner, though.)

This probably affects you if you follow the slime and mcclim git repos hosted there.

This was caused by a case of really bad planning on my part. Sorry for the inconvenience.

IDNA Now Supports Punycode Decoding

My IDNA library now supports decoding IDNA strings via the to-unicode function:

    (to-unicode "xn--mller-kva.example.com")
    ;; =>  "müller.example.com"

That’s in addition to the regular encoding for unicode domain names:

    (to-ascii "müller.example.com")
    ;; => "xn--mller-kva.example.com"

Sadly, I haven’t managed to get the logic for case-sensitive punycode encoding to work yet. But fortunately, IDNA domain name encoding doesn’t require that! Anyone looking for some low-hanging fruit-shaped lisp projects is welcome to add that! (-:

Accessing the Stripe API From Lisp

Stripe is a new payment processor on the Web, and they seem to be a lot less insane than Paypal. On a whim, I made a little (almost completely untested, toy) CL library for accessing their HTTP API from Lisp. Maybe you’ll find it useful: cl-stripe.

This was pretty great fun! Thanks to their nice HTTP API, drakma, and alexandria, I was able to write this with a minimum of horribly hacky code, in just 5 or 6 hours of on-and-off work this Saturday afternoon.

If it still looks like fun, I think I may add some clucumber tests to it tomorrow. Stay tuned.

A Weird Problem With ASDF on NFS (and a Workaround)

Recently, we at Franz have been seeing weird failures when building a certain ASDF system on NFS: We were intermittently getting redefinition warnings when loading a system - not all the time, but more often when we compiled during certain time slots.

This was a weird one, and I think it would be nice to record what it was, and how we figured out what was going on (and how we arrived at our workaround).

Update 2011-10-11: Faré informs me that this problem is fixed (all the way, no need for a workaround) in ASDF 2.017.9!

The Symptom

We have a system and a specialized operation (concat-op; it generates a single loadable .fasl we can ship) that depends on asdf:load-op. In our build process, we first load the system, then generate some data with the loaded system, and then perform the custom operation on the system.

When performing that operation with a source tree that was checked out on NFS, the load-op that it depends on sometimes got performed a second time: The lisp loaded all the .fasls again, and for some constructs in some .fasls, signaled a WARNING, which made the build break.

Oddly enough, the failure happened only during certain time slots - we would see the build work between 3pm and 4:30pm, and starting at 4:30 it failed consistently until it was time to go home. Huh.

Aside: How ASDF Operates

Not everyone might be familiar with how ASDF works (if you are, feel free to skip to the next section, or stay and nitpick (-:), so here’s a small primer on what happens when you type (asdf:load-system :some-system):

  1. ASDF runs the generic function traverse with the system and the operation as parameters.

  2. traverse walks the dependencies of the system and the contents of the system itself, and determines which operations are not yet done.

    For a load-op on a CL source file, traverse will try to generate a load-op for the input-file of that load-op (the .fasl file), check if that .fasl file exists, and if it doesn’t, then it will also generate a compile-op for the corresponding .lisp file.

  3. As a result, traverse returns a list of operations that must be performed on each component (or module, or system). For a clean source tree, that list looks something like: ((compile-op . source-1) (load-op . source-1) (compile-op . source-2) (load-op . source-2) …)

  4. operate takes that list and just performs each operation on its component in order.

All this means that ASDF takes a two-step approach: It first determines what needs to be done, then does it. All the smarts in ASDF are in that traverse operation and the underlying mechanisms. The rest is just a dolist.
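
In pseudo-lisp, the overall shape is something like this (a sketch of the structure only, not ASDF’s actual source):

    (defun toy-operate (operation system)
      (let ((plan (traverse operation system)))  ; step 1: compute what needs doing
        (loop for (op . component) in plan       ; step 2: just do it, in order
              do (perform op component))))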

OK, with that out of the way:

The Hunt

I’d gotten this error before, but that was when I was running on a source tree checked out on an NFS-mounted file system on Windows. I didn’t pay it much mind, because, hey, it’s the NFS client on Windows.

But then this exact same problem started happening to a client using two Linux machines as the client and the server. We had a problem.

At first, we suspected that there was an issue with build order (that result list of traverse). This was a blind alley: The files were loaded in exactly the same order in the failing and working scenarios. No luck.

The next thing was to instrument operation-done-p before performing the operation, and there we saw what happened: operation-done-p reported that load-op had not been performed on a file. But that file had been loaded into this very same image just minutes before! Huh?

operation-done-p is a generic function and has a method that attempts to handle the most common cases of operations on ASDF components: the method specialized on (operation component), which does the following in the branch that applies to load-op:

(defmethod operation-done-p ((o operation) (c component))
  (let ((out-files (output-files o c))
        (in-files (input-files o c))
        (op-time (component-operation-time o c)))
    (flet ((latest-in ()
             (reduce #'max (mapcar #'safe-file-write-date in-files))))
      (cond
        ;; ...[cut some branches]

        ((not out-files)
         ;; an operation without output-files is probably meant
         ;; for its side-effects in the current image,
         ;; assumed to be idem-potent,
         ;; e.g. LOAD-OP or LOAD-SOURCE-OP of some CL-SOURCE-FILE.
         (and op-time (>= op-time (latest-in))))

        ;; ...[some more branches here]
        ))))

This consults a registry of times when an operation was performed on a component: component-operation-time returns a universal-time (that is, a number of seconds), which gets compared to the file-write-date - also a universal-time - of the input file (the .fasl). After some tracing, we determined that for some reason, the .fasl file was one second younger than the time at which ASDF thought the load-op had been performed on it. In other words, the compiler had written the file AFTER load had had a chance to read it: ASDF was reading a file from the future.
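
The failing comparison boils down to something like this (made-up timestamps; both values are universal-times, i.e. seconds):

    (let ((op-time   3526300000)  ; when ASDF recorded the load-op as performed
          (fasl-date 3526300001)) ; the .fasl's write date, one second "later"
      (>= op-time fasl-date))     ; => NIL, so operation-done-p says "not done"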

This was the time when we started scratching our heads.

First, we wrote a little test program to verify we weren’t crazy:

wtf.c
#include <stdio.h>
#include <sys/fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/stat.h>

int main(int argc, char **argv) {
  int fd;
  struct timeval tv1, tv2;
  struct stat sb1, sb2;

  char buf[1024];

  while (1) {
    gettimeofday(&tv1, NULL);
    printf("%u.%u\n", tv1.tv_sec, tv1.tv_usec);

    fd=open(argv[1], O_WRONLY|O_CREAT|O_TRUNC, 0660);

    write(fd, buf, sizeof(buf));
    write(fd, buf, sizeof(buf));
    write(fd, buf, sizeof(buf));

    gettimeofday(&tv1, NULL);
    stat(argv[1], &sb1);
    close(fd);
    gettimeofday(&tv2, NULL);
    stat(argv[1], &sb2);

    /* never seems to be triggered */
    if (sb2.st_mtime != sb1.st_mtime) {
      printf("mtime changed between last write (%d) and close (%u)\n",
	     sb1.st_mtime, sb2.st_mtime);
      exit(1);
    }
    if (sb1.st_mtime > tv1.tv_sec) {
      printf("mtime after last write has a future timestamp (%u > %u)\n",
	     sb1.st_mtime, tv1.tv_sec);
      exit(1);
    }
    if (sb2.st_mtime > tv2.tv_sec) {
      printf("mtime after close has a future timestamp (%u > %u)\n",
	     sb2.st_mtime, tv2.tv_sec);
      exit(1);
    }
  }

  return 0;
}

When we ran it, after a while, at timestamps very close to the boundary to the next second, we’d get “mtime after close has a future timestamp”. What. The.

We checked that all machines were synchronized with NTP. They were, to the same machine on the local network. What is going on?

Luckily, my colleague Ahmon has a lot of experience with NFS. His expertise and ample use of tcpdump provided the final puzzle piece: NFS protocol version 3 on Linux has a feature called “weak cache consistency”: servers can attach their own take on file attributes (such as mtime) to the reply of most NFS calls (e.g., WRITE). So if the clock on the server is just a tiny bit ahead of the client’s, the server will report that the file the client just wrote is from the client’s future.

When one apparently time-traveling file appears in the source tree, the traverse method will consider the system to not have been loaded, and will reload the .fasl files starting at the time-traveling file. Anything after that file in the build order could (and did!) mess up the lisp image. In the best case, it would just slow down the build a lot by re-loading a ton of .fasl files. Argh.

Fixing this Mess (aka, the Workaround)

Since ASDF consults a registry of times that a file was loaded, we decided it would be easiest to alter the method that records this timestamp: Instead of the current time, it should record whichever is later: the current time or the timestamp of the file that it loaded.

ASDF workaround for NFS files created in the future
(defmethod perform :after ((operation load-op) (c component))
  (setf (gethash (type-of operation) (component-operation-times c))
    (reduce #'max (cons (get-universal-time)
                        (mapcar #'safe-file-write-date (input-files operation c))))))

And that’s it - with this method in place, asdf can now accurately build our system repeatedly, on NFS, even if wtf.c triggers.

Lessons Learned

That was a pretty fun afternoon spent debugging our build process. As a result, we got a working build, and a few shiny new ideas in our heads:

One, a program should never rely on the system time and some file’s creation time being comparable. This just doesn’t work anymore in a distributed system, especially if you’re using full seconds to represent time.

Two, ASDF is pretty flexible (almost to the point of being too flexible). To diagnose ASDF’s internal state, all we had to do was trace some functions it defines, and we managed to put this workaround in without having to deeply modify any of its sources: All it takes is an additional :after method. Sweet.

And three, the Allegro CL fasl loader is very fast (at least it feels so to me, coming from SBCL): In that tiny window (less than 0.07 seconds of real time) it would load a pretty substantial .fasl file and asdf would register it as loaded. That’s pretty impressive! (-: