Some little-known reasons why D makes day-to-day development easier

When discussing programming languages, most people focus on the big, sexy features: native compilation; functional programming; concurrency. And it makes sense: for the most part, these are the features that distinguish programming languages from one another. But the small, little-known features can make regular day-to-day coding a lot easier.

My favourite language is D, as I’ve mentioned before. But languages aren’t like football teams; I’m not a D fan because my dad is, or anything like that. I like writing code in the language because I feel I’m more productive. There are several reasons for that, but today I want to talk about the unsexy little things that help me crank out working code faster.

writeln: yes, it’s basically printf. But it’s actually so much more. In most languages, printing out built-in types is easy: “printf(“%d”, intvar)” in C or “cout << intvar” in C++. But if you want to actually print one of your own types… you have to write code. Lots of it. Not so in D: the defaults just work. The only real exception is classes, for which you need to override toString. But idiomatic D code doesn’t use many of those, preferring structs. The other good thing about types being easy to print is that you can paste the output into another D source file and compile it. I’ve had to do that a few times.

enums: other languages have enums, they don’t even look very different in D:

enum MyEnum {
    foo,
    bar,
    baz,
}

So how do they help me more than in other languages? First of all, I’ll refer you back to my writeln point: when you print them out you get their string representation, not a number. Internally they’re really just a number like in C, but you never really have to care or worry.

Secondly, “final switch” makes enums a lot more useful by making sure you deal with every single enumeration. Add another value? Your “final switch” code will break, as it should. For those unfamiliar with the feature: in D, a final switch on an enum value fails to compile unless every enum value has a case statement.

getopt: Defined in std.getopt, it does the very unsexy task of parsing command-line options, but it does it so well. This is the kind of thing that templates allow you to do. In this code:

 int intopt;
 double doubleopt;
 MyEnum enumopt;
 string[] strings;
 auto optInfo = getopt(
     args,
     "i|int", "Int option", &intopt,
     "d|double", "Double option", &doubleopt,
     "e|enum", "Enum option", &enumopt,
     "s|strings", "string list option", &strings,
     );
 if(optInfo.helpWanted) {
     defaultGetoptPrinter("myapp", optInfo.options);
 }

Will do what you expect it to do, and maybe more. It automatically converts the strings given to the program at run-time to the types of the variables that store them. Even better, look at the -s option: if the program is called with “-s string1 -s string2”, the strings array will hold those two values. It really is as easy as it looks.

struct constructors: or the lack thereof. Basically, if you write this:

struct Foo {
    int i;
    string s;
    void func() {} //just so it isn't a POD
}

auto foo = Foo(4, "foo");

It’ll compile and work. You can define struct constructors if you want, but if the bog-standard one does the job, you don’t have to.

I’m sure there are other lesser-known features in D that make real-life programming easier, but these are the ones I can think of right now.

Valgrind-driven development

At work I’m part of a team that’s responsible for maintaining a C codebase we didn’t write. This is not an uncommon occurrence in real-life software development, and it means that we don’t really know “our” codebase all that well. To make matters worse, there were no unit tests before we took over the project, so knowing what any function is supposed to do is… challenging.

So what’s a coder to do when confronted with a bug? I’ve come to use a technique I call valgrind-driven development. For those not in the know (i.e. not C or C++ programmers), valgrind is a tool that, amongst other things, lets you precisely find out where in the code memory is leaking, conditional jumps are done based on uninitialised values, etc., etc. The usefulness of valgrind and the address sanitizers in clang and gcc cannot be overstated – but what does that have to do with the problem at hand? Say you have this C function:

int do_stuff(struct Foo* foo, int i, struct Bar* bar);

I have no idea how this is supposed to work. The first thing I do? This (using Google Test and writing tests in C++14):

EXPECT_EQ(0, do_stuff(nullptr, 0, nullptr));

99.9% of the time this won’t work. But the reasons why aren’t documented anywhere. Usually passing null pointers is assumed to not happen. So the program blows up, and I add this to the function under test:

int do_stuff(struct Foo* foo, int i, struct Bar* bar) {
    assert(foo != NULL);
    assert(bar != NULL);
    //...

At least now the assumption is documented as assertions, and the next developer who comes along will know that it’s a violation of the contract to pass in NULLs. And, of course, that this function is only used internally. Functions at library boundaries have to check their arguments and fail nicely.

After the assertions have been added, the unit test will look like this:

Foo foo{};
Bar bar{};
EXPECT_EQ(0, do_stuff(&foo, 0, &bar));

This will usually blow up spectacularly as well, but now I have valgrind to tell me why. I carry on like this until there are no more valgrind errors, at which point I can actually start testing the functionality at hand. Along the way I’ve built up all the expected dependencies of the function under test, making explicit what was once implicit in the code. More often than not, I find other bugs lurking in the code just by trying to document, via unit tests, what the current behaviour is.

If you have to maintain C/C++ code you didn’t write, give valgrind-driven development a try.

Is there such a thing as too many types?

I’m writing a build system at the moment. As it turns out, even the most mundane builds end up having to specify a lot of options. Imagine I want to make life easy for the end-user (and of course I do) and instead of specifying all the files to be built, we just specify directories instead. In pseudo-code:

myApp = build(["path/to/srcs", "other/path/srcs"])

But, for reasons unbeknownst to me as well as a few good ones, it’s often the case that there are files in those directories that aren’t supposed to be built. Or extra files lying around that aren’t in those directories but that do need to be built. And then there are compiler flags and include directories, so a C++ build might look like this:

myApp = buildC++("name_of_binary",
                 ["path/to/srcs", "other/path/srcs"],
                 ["extrafile.cpp", "otherextra.cpp"],
                 "-g -O0",
                 ["include_from_here", "other_include_dir", "yet_another"],
                 ["badfile.cpp", "otherbadfile.cpp"])

If you’re anything like me you’ll find this confusing. All the parameters (and there are several) are either strings or lists of strings. The compiler flags were passed in as a string so they stand out a bit, but the API could have chosen yet another list and it’s hard to say what is what. This didn’t seem like an API I wanted to use so I definitely wouldn’t expect anyone else to want to use it either.

Then the mental warping caused by me learning Haskell kicked in. What if… I add types to everything? They won’t even really do anything, they’re just there to tag what’s what and cause a compilation error if the wrong one is used. So I added a bunch of wrapper types instead, and in actual D code it started to look like this:

alias cObjs = cObjects!(SrcDirs(["etc/c/zlib"]),
                        Flags("-m64 -fPIC -O3"),
                        IncludePaths(["dir1", "dir2"]),
                        SrcFiles("extra_src.c"),
                        ExcludeFiles(["badfile.c", "otherbadfile.c"]));

Better? I think so! You can’t confuse what’s what, and even better, neither can the compiler. After I exposed this to other people, it was pointed out to me that there are many ways to select files, and that the API as it was would have to grow even more parameters to satisfy all needs. It was big enough as it was, so I changed it to something that looks like this now:

alias objs = targetsFromSources!(Sources!(Dirs(["src"]),
                                          Files(["first.d", "second.d"]),
                                          Filter!(a => a != "badfile.d")),
                                 Flags("-g -debug -cov -unittest"),
                                 ImportPaths(["dir1", "dir2"]),
                                 StringImportPaths(["dir1", "dir2"]));

Better? I think so. What do you think?

The Loopers and the Algies

In my opinion there are two broad categories of programmers: the Loopers and the Algies.

The Algies hate duplication. Maybe above all else. Boilerplate is anathema to them, common patterns should be refactored so the parts that look almost-the-same-but-not-quite end up calling the same code. When Algies write C++, they use <algorithm>. They use itertools and functools in Python. They tend to like programming languages with first-class functions, that let you abstract, that let you collapse code. They loathe writing the same code over and over again.

Loopers don’t mind repetition as much. Their name comes from the way they write code: for, while, do/while loops and their equivalents everywhere. They end up writing the same loops over and over again, but they don’t mind. It’s not that they’re bad programmers – it’s just that all the things the Algies like? They think it makes their code more complex. Harder to understand. I actually saw one of them say during code review that this:

stuff = []
for x in xs:
    if condition(x):
        stuff.append(x)

Was simpler than this:

stuff = [x for x in xs if condition(x)]

That’s how their brain works. A loop is… simpler. If it’s not evident by now, I’m an Algie through and through. Somebody asked me one day why I didn’t like C and the first thing that came to my mind was “C makes me repeat myself”.

It’s really hard to get people to agree on what good code actually is. I think most of us agree that we want our software to be maintainable. Readable. But we all have different ideas of what each of those words means. Take complexity, for instance: we want less of it, right? Well, that’s why the Loopers like for loops; in their opinion, that reduces complexity. map, filter and reduce are complicated to them, whilst to me they’re my bread and butter. Not only that, it’s my honest opinion that using algorithms reduces overall complexity. Fewer moving parts. Fewer things to reason about. Fewer bugs.

I think many programming language wars are mostly about people from very different philosophies arguing about things that are important to them that the other side just doesn’t understand. Go, for example, is quite clearly a language for Loopers. They don’t need generics, that would complicate the language. The moment I realised Go wasn’t for me was when I realised that without generics, there could be no Go equivalent of <algorithm> and that I’d have to write for loops. And I really don’t like writing for loops. You shouldn’t either. Unless you’re a Looper of course, and to each his own.

DConf 2015 has come and gone

It was a blast, just like last year was. It really is gratifying to spend 3 days discussing one’s favourite programming language with several people who really know their craft. Of course, attending the presentations and being able to ask questions live isn’t a bad deal either.

I think the highlight for me was Andrei Alexandrescu’s provocatively titled “Generic Programming Must Go”, in which he describes a new way of designing code based on compile-time reflection. He apparently came up with it while designing the new allocators for the D standard library, and I highly recommend watching it (and the other videos!) when they’re online.

I’m actually getting to write D code back at work as well, so that’s always good. In my spare time I’m furiously working on a meta build system, which will definitely feature in a future blog post.

Haskell actually does change the way you think

Last year I started trying to learn Haskell. There have been many ups and downs, but my only Haskell project so far is on hold while I work on other things. I’m not sure yet if I’d choose to use Haskell in production. The problems I had (and the time it’s taken so far) writing a simple server make me think twice, but that’s a story for another blog post.

The thing is, the whole reason I decided to learn Haskell was the many reports that it makes you think differently. As much as I like D, learning it was easy, and essentially I’m using it as a better C++. There are things I routinely do in D that I wouldn’t have thought of, or bothered with, in C++ because they’re easier. But it hasn’t really changed my brain.

I didn’t think Haskell had either, until I started thinking of solutions to problems I was having in D in Haskell ways. I’m currently working on a build system, and since the configuration language is D, it has to be compiled. So I have interesting problems to solve with regards to what runs when: compile-time or run-time. Next thing I know, I’m thinking of lazy evaluation, thunks, and the IO monad. Some things can’t be evaluated at compile-time in D, so I replaced a value with a function that, when run (i.e. at run-time), would produce that value. And (modulo current CTFE limitations)… it works! I’m even thinking of making a wrapper type that composes nicely… (sound familiar?)

So, thanks Haskell. You made my head hurt more than anything I’ve tried learning since Physics, but apparently you’ve made me a better programmer.


The craziest code I ever wrote

A few years ago at work my buddy Jeff was as usual trying to do something in Go. I can’t remember why, but he wanted to arrange text strings in memory so that they were all contiguous. I said something about C++ and he remarked that the only thing C++11 could do that Go couldn’t would be perhaps to do this work at compile-time. I hadn’t learned D yet (which would have made the task trivial), so I spent the rest of the day writing the monstrosity below for “teh lulz”. It ended up causing my first ever question on Stackoverflow. “Enjoy” the code:

//Arrange strings contiguously in memory at compile-time from string literals.
//All free functions prefixed with "my" to facilitate grepping the symbol table
//(none of them should show up).

#include <iostream>

using std::size_t;

//wrapper for const char* to "allocate" space for it at compile-time
template<size_t N>
struct String {
    //C arrays can only be initialised with a comma-delimited list
    //of values in curly braces. Good thing the compiler expands
    //parameter packs into comma-delimited lists. Now we just have
    //to get a parameter pack of char into the constructor.
    template<typename... Args>
    constexpr String(Args... args):_str{ args... } { }
    const char _str[N];
};

//takes variadic number of chars, creates String object from it.
//i.e. myMakeStringFromChars('f', 'o', 'o', '\0') -> String<4>::_str = "foo"
template<typename... Args>
constexpr auto myMakeStringFromChars(Args... args) -> String<sizeof...(Args)> {
    return String<sizeof...(args)>(args...);
}

//This struct is here just because the iteration is going up instead of
//down. The solution was to mix traditional template metaprogramming
//with constexpr to be able to terminate the recursion since the template
//parameter N is needed in order to return the right-sized String<N>.
//This class exists only to dispatch on the recursion being finished or not.
//The default below continues recursion.
template<bool TERMINATE>
struct RecurseOrStop {
    template<size_t N, size_t I, typename... Args>
    static constexpr String<N> recurseOrStop(const char* str, Args... args);
};

//Specialisation to terminate recursion when all characters have been
//stripped from the string and converted to a variadic template parameter pack.
template<>
struct RecurseOrStop<true> {
    template<size_t N, size_t I, typename... Args>
    static constexpr String<N> recurseOrStop(const char* str, Args... args);
};

//Actual function to recurse over the string and turn it into a variadic
//parameter list of characters.
//Named differently to avoid infinite recursion.
template<size_t N, size_t I = 0, typename... Args>
constexpr String<N> myRecurseOrStop(const char* str, Args... args) {
    //template needed after :: since the compiler needs to distinguish
    //between recurseOrStop being a function template with 2 parameters
    //or an enum being compared to N (recurseOrStop < N)
    return RecurseOrStop<I == N>::template recurseOrStop<N, I>(str, args...);
}

//implementation of the declaration above
//add a character to the end of the parameter pack and recurse to next character.
template<bool TERMINATE>
template<size_t N, size_t I, typename... Args>
constexpr String<N> RecurseOrStop<TERMINATE>::recurseOrStop(const char* str,
                                                            Args... args) {
    return myRecurseOrStop<N, I + 1>(str, args..., str[I]);
}

//implementation of the declaration above
//terminate recursion and construct string from full list of characters.
template<size_t N, size_t I, typename... Args>
constexpr String<N> RecurseOrStop<true>::recurseOrStop(const char* str,
                                                       Args... args) {
    return myMakeStringFromChars(args...);
}

//takes a compile-time static string literal and returns String<N> from it
//this happens by transforming the string literal into a variadic parameter
//pack of char.
//i.e. myMakeString("foo") -> calls myMakeStringFromChars('f', 'o', 'o', '\0');
template<size_t N>
constexpr String<N> myMakeString(const char (&str)[N]) {
    return myRecurseOrStop<N>(str);
}

//Simple tuple implementation. The only reason std::tuple isn't being used
//is because its only constexpr constructor is the default constructor.
//We need a constexpr constructor to be able to do compile-time shenanigans,
//and it's easier to roll our own tuple than to edit the standard library code.

//use MyTupleLeaf to construct MyTuple and make sure the order in memory
//is the same as the order of the variadic parameter pack passed to MyTuple.
template<typename T>
struct MyTupleLeaf {
    constexpr MyTupleLeaf(T value):_value(value) { }
    T _value;
};

//Use MyTupleLeaf implementation to define MyTuple.
//Won't work if used with 2 String<> objects of the same size but this
//is just a toy implementation anyway. Multiple inheritance guarantees
//data in the same order in memory as the variadic parameters.
template<typename... Args>
struct MyTuple: public MyTupleLeaf<Args>... {
    constexpr MyTuple(Args... args):MyTupleLeaf<Args>(args)... { }
};

//Helper function akin to std::make_tuple. Needed since functions can deduce
//types from parameter values, but classes can't.
template<typename... Args>
constexpr MyTuple<Args...> myMakeTuple(Args... args) {
    return MyTuple<Args...>(args...);
}

//Takes a variadic list of string literals and returns a tuple of String<> objects.
//These will be contiguous in memory. Trailing '\0' adds 1 to the size of each string.
//i.e. ("foo", "foobar") -> (const char (&arg1)[4], const char (&arg2)[7]) params ->
//                       ->  MyTuple<String<4>, String<7>> return value
template<size_t... Sizes>
constexpr auto myMakeStrings(const char (&...args)[Sizes]) -> MyTuple<String<Sizes>...> {
    //expands into myMakeTuple(myMakeString(arg1), myMakeString(arg2), ...)
    return myMakeTuple(myMakeString(args)...);
}

//Prints tuple of strings
template<typename T> //just to avoid typing the tuple type of the strings param
void printStrings(const T& strings) {
    //No std::get or any other helpers for MyTuple, so instead just cast it to
    //const char* to explore its layout in memory. We could add iterators to
    //myTuple and do "for(auto data: strings)" for ease of use, but the whole
    //point of this exercise is the memory layout and nothing makes that clearer
    //than the ugly cast below.
    const char* const chars = reinterpret_cast<const char*>(&strings);
    std::cout << "Printing strings of total size " << sizeof(strings);
    std::cout << " bytes:\n";
    std::cout << "-------------------------------\n";

    for(size_t i = 0; i < sizeof(strings); ++i) {
        chars[i] == '\0' ? std::cout << "\n" : std::cout << chars[i];
    }

    std::cout << "-------------------------------\n";
    std::cout << "\n\n";
}

int main() {
    {
        constexpr auto strings = myMakeStrings("foo", "foobar",
                                               "strings at compile time");
        printStrings(strings);
    }

    {
        constexpr auto strings = myMakeStrings("Some more strings",
                                               "just to show Jeff to not try",
                                               "to challenge C++11 again :P",
                                               "with more",
                                               "to show this is variadic");
        printStrings(strings);
    }

    std::cout << "Running 'objdump -t |grep my' should show that none of the\n";
    std::cout << "functions defined in this file (except printStrings()) are in\n";
    std::cout << "the executable. All computations are done by the compiler at\n";
    std::cout << "compile-time. printStrings() executes at run-time.\n";
}

Emacs as a Python 3 IDE: at last!

Emacs is my editor of choice; I use it to write nearly everything. I have to write Python at work, which I’m OK with since I’m generally a fan of the language. For that I use ropemacs (along with jedi, flycheck and flake8), which makes it possible to use the rope refactoring library from Emacs. Mostly I use it for “go to definition”, but its refactoring features are obviously also super useful.

Now, I like new and shiny tech. I run Arch Linux for a reason. I jumped on the C++11 and C++14 bandwagons as soon as I heard about them. So in the Python version debate, if I have a choice I’d go with Python 3 which has been the default on Arch for a while now anyway.

Imagine my dismay when nothing worked anymore in Python 3. My awesome editing environment gone. Rope has a Python 3 version, but ropemacs and ropemode (which ropemacs depends on) don’t. Sadness.

But wait, open source to the rescue! I forked both ropemode and ropemacs and after some porting and much debugging got to something that works: ropemacs_py3k. Enjoy! I know I will.

Type-safety and time intervals in D and Go

My favourite language is now, by far, D. It’s not even close. I’ve also been known
to make my opinion of Go publicly known as “I really don’t like it”. My work buddy Jeff
is nuts about Go though, and I try not to hold it against him. We keep arguing for “our”
language and disparaging the other guy’s, and it’s all in good fun.

As part of that banter, he sent me a blog post link on Google Plus about flaky tests and what they
tell you about your code. It’s a good read, and exemplifies some of the real-world
engineering problems that happen when developing software. As I read it, though, my eyes
rolled a bit when I encountered this bit of code:

var Timeout time.Duration
Timeout = time.Duration(conf("timeout", 100)) * time.Millisecond
cache := memcache.New(Servers...)
cache.Timeout = Timeout * time.Millisecond

The first thing I didn’t like about it is that the multiplication by
time.Millisecond looks like C. It’s not that different from
multiplying by a preprocessor macro, with all that entails. Don’t we
know better now? std::chrono from C++ is ugly, but it’s still better
than this.

Related to the C-ness of it, and much more importantly, the second time the code
multiplies by time.Millisecond is the cause of the bug the blog post is about.
And immediately I think: that shouldn’t compile. Maybe it’s my physicist background,
but multiplying time by a time unit shouldn’t work. Or at least you shouldn’t be
able to pass that value to a function expecting a time unit (instead of time squared).

I immediately wrote some D code to make sure I didn’t embarrass myself by stating
that in D that would be a compilation error, and to my joy the following code
didn’t compile:

import std.datetime;
void main() {
    auto time = 2.seconds;
    auto oops = time * 3.seconds;
}
foo.d(5): Error: 'time' is not of arithmetic type, it is a Duration
foo.d(5): Error: 'dur(3L)' is not of arithmetic type, it is a Duration

Multiplying time by a scalar works as expected, though, which is what should happen. I showed him the code and he seemed really interested in it, and also wondered
which design decisions led to the current state of affairs. According to him the Go
team likes type-safety, so it seems odd. He also asked me if it would be a compilation
error in Haskell, to which the answer was obviously, in Barney Stinson style, “please!”.

I have to admit letting out a rather childish “sucks to be you” at Jeff today in
the office and basking in my elevated sense of self-worth and computer language choice.

Go D! Pun half-intended.

Haskell monads for C++ programmers

I’m not going to get into the monad tutorial fallacy. Also, I think this blog post about another monad fallacy sums it up nicely: the problem isn’t understanding what monads are, but rather understanding how they can be used. Understanding the monad laws isn’t hard. Understanding how to use the Maybe monad isn’t hard either. But things get tricky pretty fast, and there’s a kind of monad, several of them similar to each other, that took me a while to understand how to use. That is, until I recognised what they actually were: C++ template metaprogramming. I guess it’s the opposite of the realisation that Bartosz Milewski had.

The analogy is only valid for a few monads. The ones I’ve seen that this applies to are IO, State, and Get from Data.Binary. These are the monads that are referred to as computations, which sounds really abstract, but really it means that functions returning these monads return mini-programs. These mini-programs don’t immediately do anything; they need to be executed first. In IO’s case that’s done by the runtime system; for State, runState does that for you (I’m stretching here – only IO really does anything, even runState is pure).

It’s similar to template metaprogramming in C++: at compile-time the programmer has access to a functional language with no side-effects that returns a function that at runtime (i.e. when executed) actually does something. After that realisation I got a lot better at understanding how and why to use them.

The monad issue doesn’t end there, unfortunately. There are many other monads that aren’t like C++ templates at all. But the ones that are – well, at least you’ll be able to recognise them now.
