Category Archives: Uncategorized

Emacs as a C++ IDE: headers

So, headers. Because of backward compatibility and the hardware limitations of when C was created, it’s 2016 and we’re still stuck with them. I doubt that modules will make it into C++17, but even if they do headers aren’t going away any time soon. For one, C++ might still need to call C code and that’s one of the languages killer features: no C bindings needed.

If, like me, you’ve created a package to make Emacs a better C++ environment, they present a challenge. My cmake-ide package actually just organises data to pass to the packages that do heavy lifting, it’s just glue code really. And the data to pass are the compiler flags used for any given file. That way, using libclang it’s possible to find and jump to definitions, get autocomplete information and all that jazz. CMake is kind enought to output a JSON compilation database with every file in the project and the exact command-line used. So it’s a question of parsing the JSON and setting the appropriate variables. Easy peasy.

But… headers. They don’t show up in the compilation database. They shouldn’t – they’re usually not directly compiled, only as a result of being included elsewhere. But where? Unlike Python, Java, or D, there’s no way to know where the source files that include a particular header are in the filesystem. They might be in the same directory. They might be nowhere near. To complicate matters further, the same header file might be compiled with different flags in different translation units. Fun.

What’s a package maintainer to do? In the beginning I punted and took the set of unique compiler flags taken from every flag in the project. The reasoning is that most of the time the compiler flags are the same everywhere anyway. For simple projects that’s true, but I quickly ran into limitations of this approach at work.

A quick and easy fix is to check if there’s an “other” file in Emacs parlance. Essentially this means a Foo.cpp file for a Foo.hpp header. If there is, use its compiler flags. This works, but leaves out the other header files that don’t have a corresponding source file out in the cold. There’s also a runtime cost to pay – if no other file is found it takes several seconds to make sure of that by scouring the file system.

I then looked at all source files in the project sorted by levenshtein distance of their directories to the directory the header file is in. If any of them directly includes the header, use its flags. Unfortunately, this only works for direct includes. In many cases a header is included by another header, which includes another header which…

In the end, I realised the only sure way to go about it is to use compiler-computed dependencies. Unfortunately for me, ninja deletes the .d dependency files when it runs. Fortunately for me, you can ask ninja for all the dependencies in the project.

I haven’t written the code for the CMake Makefile generator yet, but I should soon. ninja already works. I’m going to test it myself in “real life” for a week then release it to the world.

Rust impressions from a C++/D programmer, part 2

Following up from my first post on Rust, I thought that after a week running a profiler and trying to optimise my code would have given me new insights into working with the language. That’s why I called it “part 1”, I was looking forward to uncovering more warts.

The thing is… I didn’t. It turns out that the 1st version I cranked out was already optimised. It’s not because I’m a leet coder: I’ve implemented an MQTT broker before and looked at profiler output. I know where the bottlenecks will be for the two benchmarks I use. So my first version was fast enough.

How anti-climatic. The only news-worthy thing that came out of benchmarking it is that the Rust/mio combo is really fast. You’ll have to wait for the official benchmarks comparing all the implementations to know how much though. I’m currently rewriting my problematic C++ version. I have to if I want to measure it: either I give it my best shot of making it fast or the reddit comments will be… interesting to say the least. It got nasty last time.

Tagged ,

Rust impressions from a C++/D programmer, part 1

Discussion on programming reddit

Discussion on Rust reddit

C++ and D aren’t the only languages I know, I labeled myself that way in the title because as far as learning Rust is concerned, I figured they would be the most relevant in terms of the audience knowing where I’m coming from.

Since two years ago, my go-to task for learning a new programming language is to implement an MQTT broker in it. It was actually my 3rd project in D, but my first in Haskell and now that I have some time on my hands, it’s what I’m using to learn Rust. I started last week and have worked on it for about 3 days. As expected, writing an MQTT broker is a great source of insight into how a language really is. You know, the post-lovey-dovey phase. It’s like moving in together straight away instead of the first-date-like “here’s how you write a Scheme interpreter”.

I haven’t finished the project yet, I’m probably somewhere around the 75% mark, which isn’t too shabby for 3 days of work. Here are my impressions so far:

The good

The borrow checker. Not surprising since this is basically the whole point of the language. It’s interesting how much insight it gave me in how broken the code I’m writing elsewhere might be.  This will be something I can use when I write in other systems languages, like how learning Haskell makes you wary of doing IO.

Cargo. Setting up, getting started, using someone’s code and unit testing what you write as you go along is painless and just works. Tests in parallel by default? Yes, please. I wonder where I’ve seen that before…

Traits. Is there another language other than D and Rust that make it this easy to use compile-time polymorphism? If there is, please let me know which one. Rust has an advantage here: as in Dylan (or so I read), the same trait can be used for runtime polymorphism.

Warnings. On by default, and I only had to install flycheck-rust in Emacs for syntax highlighting to just work. Good stuff.

Productivity. This was surprising, given the borrow checker’s infamy. It _does_ take a while to get my code to compile, but overall I’ve been able to get a good amound done with not that much time, given these are the first lines of Rust I’ve ever written.

Algebraic types and pattern matching. Even though I didn’t use the former.

Slices. Non-allocating views into data? Yes, please. Made the D programmer in me feel right at home.

Immutable by default. Need I say more?

Debugging. rust-gdb makes printing out values easy. I couldn’t figure out how to break on certain functions though, so I had to use the source file and line number instead.

No need to close a socket due to RAII. This was nice and even caught a bug for me. The reason being that I expected my socket to close because it was dropped, but my test failed. When I looked into it, the reference count was larger than 1 because I’d forgotten to remove the client’s subscriptions. The ref count was 0, the socket was dropped and closed, and the test passed. Nice.

No parens for match, if, for, …

The bad

The syntax. How many times can one write an ampersand in one’s source code? You’ll break new records. Speaking of which…

Explicit borrows. I really dislike the fact that I have to tell the compiler that I’m the function I’m calling is borrowing a parameter when the function signature itself only takes borrows. It won’t compile otherwise (which is good), but… since I can’t get it wrong what’s the point of having to express intent? In C++:

void fun(Widget& w);
auto w = Widget();
fun(w); //NOT fun(&w) as in Rust

In Rust:

fn fun(w: &mut Widget);
let w = Widget::new();
fun(&mut w); //fun(w) doesn't compile but I still need to spell out &mut. Sigh.

Display vs Debug. Printing out integers and strings with {} is fine, but try and do that with a Vec or HashMap and you have to use the weird {:?}. I kept getting the order of the two symbols wrong as well. It’s silly. Even the documentation for HashMap loops over each entry and prints them out individually. Ugh.

Having to rethink my code. More than once I had to find a different way to do the thing I wanted to do. 100% of the time it was because of the borrow checker. Maybe I couldn’t figure out the magical incantation that would get my code to compile, but in one case I went from “return a reference to an internal object, then call methods on it” to “find object and call method here right now”. Why? So I wouldn’t have to borrow it mutably twice. Because the compiler won’t let me. My code isn’t any safer and it was just annoying.

Rc<RefCell<T>> and Arc<Mutex<T>>. Besides the obvious “‘Nuff said”, why do I have to explicitly call .clone on Rc? It’s harder to use than std::shared_ptr.

Slices. Writing functions that slices and passing them vectors works well enough. I got tired of writing &var[..] though. Maybe I’m doing something wrong. Coming from D I wanted to avoid vectors and just slice arrays instead. Maybe that’s not Rusty. What about appending together some values to pass into a test? No Add impl for Vecs, so it’s massive pain. Sigh.

Statements vs Expressions. I haven’t yet made the mistake of forgetting/adding a semicolon, but I can see it happening.

No function overloading.

Serialization. There’s no way to do it well without reflection, and Rust is lacking here. I just did everything by hand, which was incredibly annoying. I’m spoiled though, in D I wrote what I think is a really good serialization library. Good in the lazy sense, I pretty much never have to write custom serialization code.

The ugly

Hashmaps. The language has operator overloading, but HashMap doesn’t use it. So it’s a very Java-like map.insert(key, value). If you want to create a HashMap with a literal… you can’t. There’s no equivalent macro to vec. You could write your own, but come on, this is a basic type from the standard library that will get used a lot. Even C++ does better!

Networking / concurrent IO. So I took a look at what my options were, and as far as my googling took me, it was to use native threads or a library called mio. mio’s API was… not the easiest to use so I punted and did what is the Rust standard library way of writing a server and used threads instead. I was sure I’d have performance problems down the road but it was something to worry about later. I went on writing my code, TDDed an implementation of a broker that wasn’t connected to the outside world and everything. At one point I realised that holding on to a mutable reference for subscribers wasn’t going to work so I used Rc<RefCell<Subscriber>> instead. It compiled, my tests passed, and all was good in the world. Then I tried actually using the broker from my threaded server. Since it’s not safe to use Rc<RefCell<>> in threads, this failed to compile. “Good!”, I thought, I changed Rc to Arc and RefCell to Mutex. Compile, run, …. deadlock. Oops. I had to learn mio after all. It wasn’t as bad as boost::asio but it wasn’t too far away either.

Comparing objects for identity. I just wanted to compare pointers. It was not fun. I had to write this:

fn is_same<T>(lhs: &T, rhs: &T) -> bool {
    lhs as *const T == rhs as *const T;
}
fn is_same_subscriber<T: Subscriber>(lhs: Rc<RefCell<T>>, rhs: Rc<RefCell<T>>) -> bool {
    is_same(&*lhs.borrow, &*rhs.borrow());
}

Yuck.

Summary

I thought I’d like Rust more than I actually do at this point. I’m glad I’m taking the time to learn it, but I’m not sure how likely I’ll choose to use it for any future project. Currently the only real advantage it has for me over D is that it has no runtime and could more easily be used on bare metal projects. But I’m unlikely to do any of those anytime soon.

I never thought I’d say this a few years back but…I like being able to fall back on a mark-and-sweep GC. I don’t have to use it in D either, so if it ever becomes a performance or latency problem I know how to deal with it. It seems to me to be far easier than getting the borrow checker to agree with me or having to change how I want to write my code.

We’ll see, I guess. Optimising the Rust implementation to be competitive with the D and Java ones is likely to be interesting.

Tagged , , , ,

My first D Improvement Proposal

After over a year of thinking about this (I remember bringing this up at DConf 2014), I finally wrote a DIP about introducing what I call “static inheritance” to D.

The principle is similar to C++ concepts: D already has a way of requiring a certain compile-time interface for templates to be instantiated, the most common being isInputRange. What is lacking right now in my opinion is helpful compiler error messages for when a type was intended to be, e.g. an input range but isn’t due to programmer error.

I tried a library solution first, assuming this would be easier to get accepted than a language change. Since there was nearly 0 interest, I had to write a DIP. Here’s hoping it’s more succesful.

Tagged ,

Abstract Data Types and OOP

I think most of us agree that Abstract Data Types are a good thing. We see the description, we hear about what problems they solve and we nod on in agreement. At least I do. And yet I keep forgetting to actually use them.

Recently I’ve had to or forced myself to write code in imperative and functional styles, in more than one language. As usual, some changes were needed but what was not so usual is that my refactoring was harder than it needed to be. The reason? I didn’t hide the details of how the data were represented. In one example I was parsing JSON in Emacs Lisp to enable IDE-like features in Emacs for C and C++ development. When I tried using it on a largish project at work, it was too slow to be usable. I was naively using assoc lists for JSON objects, and mostly only because it’s the default for the library I was using. A quick fix was to instead use hash tables, but it took me far longer to make that fix than it should have. I’d hard-coded the knowledge of the data being in assoc lists all over the place and even refactoring the unit tests was a pain.

I made a similar mistake again in Haskell. I was writing a networking protocol server and decided that I’d represent replies to requests as a list of (handle, reply) tuples. I haven’t made the refactoring I want to yet, and it won’t be as painful as the Elisp version, but I still hated myself a little bit for encoding that knowledge in the algorithms that were manipulating replies. What if (as I need to for performance) want to change it to a tree?

When I use OOP, or simply the fact that I’m attaching methods to structs and classes, this never seems to happen. There was nothing stopping me from doing the right thing when I made the above mistakes. It just doesn’t seem to happen when I’m writing methods for a class. That boxing up of code seems to make my code better. It just doesn’t seem to happen to me in C++ or D, but just dropping back to C causes me to write absurdities I wouldn’t have otherwise.

I think (and hope) I’ll be able to notice if I ever make this mistake again in other languages now that I’m aware of it. It’s still surprising to me how, after all these years, it’s still possible, and apparently even likely, for me to make mistakes that I’d call basic.

The danger of (over)mocking

I’ve yet to see a mocking framework I actually want to use. Maybe I’ve just not seen enough use cases for one yet, since right now the number of times I would have found one useful is exactly one. Maybe I’m just not a mockist.

The fundamental problem, for me, of how I’ve seen mocking frameworks used in the wild is that it commits one of the gravest sins of software development: it increases coupling. Our tests shouldn’t care how a function/class/method/etc does its job; it need only care that it does in a verifiable manner. After all, one of the reasons to have tests in the first place is to be able to confidently refactor, and we can’t do that if the tests break even though the code’s behaviour hasn’t changed. Let me give an example: last year I saw someone write a test like this (in Python for clarity, the original was in C++):

class Class(object):
    def inc(self): pass
    def dec(self): pass
    def mul(self, x): pass
    def stuff(self, x):
        self.inc()
        self.inc()
        self.mul(x)
        self.dec()

def test_stuff():
    obj = Class()
    mock = MagicMock()
    obj.inc = obj.dec = obj.mul = mock
    obj.stuff()
    assert mock.mock_calls() == [ call.inc(), call.inc(), call.mul(7), call.dec()]

The methods don’t do anything to keep things simple; in real life they were complicated functions. I consider the test above and anything like it to be a complete and utter waste of time and space. This isn’t just an example of increased coupling; it’s literally rewriting the production code in the test. What information hiding? What interface? I really consider asserting that certain calls were made to be an anti-pattern.

So what’s a developer to do? In the case above (and assuming inc, dec and mul do what we expect to self.value), I’d write this intead:

def test_stuff():
    obj = Class(3)
    assert obj.value == 3
    obj.stuff(4)
    assert obj.value == 19

I’m well aware that in real life things are rarely this clean-cut and legacy codebases sometimes have no way of testing for what it does, but that doesn’t excuse introducing coupling into the codebase. It’s hard to present a realistic example in a blog post, but I’ve written many a test for tangled legacy networking code without resorting to the type of testing in the first example. It’s not easy, but it’s definitely doable.

I’m not even interested in mocking code at all except if it’s slow or isn’t absolutely deterministic. The usual culprits are doing anything with the file system, networking or talking to databases. And that’s for unit testing only. So Class.stuff calls some complicated function; that’s its business and I as a unit tester have no right to go poking in its internals. All I care about is the public API and that the behaviour is right. The question should always be: what’s the contract? If you’re asking questions about how... you’re doing it wrong. A good tester is like a mobster’s wife: you don’t want to know.

Now back to that one example of when I wanted to use a mock: In DConf 2013 there was a mocking presentation that expected the logger to be called with an error message if a method was called a certain way. That appealed to me because months before I’d fixed a bug that was “no error message logged when …”. I fixed it but without an accompanying test, which left a bad taste in my mouth. I just didn’t know how to write the test I needed. A mocking framework would have let me do it easily.

C++ can’t get that static analyser from Microsoft soon enough

One of the main announcements at CppCon 2015 was a tool being developed by Microsoft that will help catch bugs at “compile-time”, the idea being that backwards compatibility won’t be touched but we get help from a tool that understands the code. It can’t come soon enough.

I just started writing C++ again after quite a bit of a hiatus and pretty much immediately suffered from bullet-in-the-foot syndrome. In retrospect (as always) I was doing something stupid but the fact is my bug wouldn’t have happened in a garbage collected language or, I guess, Rust. C++ was more than happy to compile, run, and crash. I wrote something like this (not production code, it was a mock implementation so I could more easily test):

struct Outer;
struct Inner { Outer& outer; };
struct Outer {
    Outer():_inner{&this} {}
    //...
    Inner _inner;
};
void func(const Outer& outer);
int main() {
    Outer outer;
    setThings(outer);
    func(outer);
}

All that was fine. But the setup I needed to do on the outer struct kept growing, a weekend came in between, and I even completely forgot I had a reference to Outer in there (the commented-out ellipsis was 10-20 lines of code). I thought it’d be much better if instead I returned one of these objects from a function, by value, and pass it to func:

Outer createOuter() {
    Outer outer;
    //several lines of setup on outer
    return outer;
}
int main() {
    func(createOuter); //moves happen in here
}

Oh look, it crashed. Not only that, but I’d forgotten to use std::move several times along the way. Once that was fixed, the program was still crashing, but (of course) in a different way. A better way, mind, it was dereferencing an null pointer. But crashing anyway.

If you haven’t seen the stupid mistake I made by now, I’ll tell you what took me 30min-1h to find out: the compiler-generated move constructor (the copy constructor was similarly flawed) copied the Outer pointer from the moved-to object and set the new inner to point to an object that was about to be destroyed… the solution is to write a boilerplate-heavy move contructor to make the new inner point to the new outer. Fun! The reason it worked before is that the object I was passing in was still alive and would continue to stay that way until far after func had returned.

When Microsoft’s tool comes out, I really want to try it on the code above and see what happens. Sure, it worked fine for std::unique_ptr in the demo, but let’s see what happens in real life. In the meanwhile, I wish I was writing in another language.

cppcon day 5 and wrap-up

There was a talk about the games industry and its low-latency needs. I like the fact that instead of assuming that RTTI and exception handling was unacceptable for games, that they’re working on measuring the impact and seeing how it goes in modern compiler implementations.

The biggest thing about the last day was Eric Niebler’s template talk. He acknowledged that his example was taken from the D wiki and it was quite interesting to see his C++ take on it. It looks really good and gets C++ a tad closer to how pleasant it is to write D code.

All in all, I had a blast and I hope to make it  back next year. Modern C++: now much more like D and Rust.

CppCon 2015 – Day 4

Titus Winters talked at length about how to write sustainable code. He preaching to the choir as far as I’m concerned, the issue is how to convince coworkers and managers that testing and fast builds are fundamental.

Chandler Carruth showed how to microbenchmark C++ code with perf, which is a tool I already use. A must-see for anyone who’s concerned with performance.

Andrei Alexandrescu did a C++ version of the work he’s done for the D allocators (using my laptop, no less), which was still interesting for me. I might actually get what they’re for now.

My favourite part of the day was the talk about tools for template programming in C++ by Abel Sinkovics. Apparently we now have a C++ compile-time debugger! Ever wondered what templates get instantiated and it what order? Now we can easily find out. I wonder if I can do something similar for D… (as if I wasn’t busy enough already).

CppCon 2015: Day two

So it turns out Herb Sutter’s talk about writing good and correct C++14 by default was an extension of Bjarne’s from the first day. We finally got to see the static analysis tool in real life, and as far the examples they chose go, it seems to work. Colour me impressed.

The really surprising thing was that there will be annotations for establishing lifetimes, just like Rust. Is this the Rustification of C++? Will it eat Rust’s lunch? Time will tell, I guess.

The other main talk was Gabriel dos Reis and the proposed module system for C++. As always, due to backwards compatibility things are… more complicated than I expected. Modules would be a huge deal for C++, but the way things are looking now it’s going to mean all sorts of new complications. I’ll have to play with it but that’ll mean using Visual Studio, which I’m not too keen on.

My Emacs talk was well received as well, which isn’t too bad.