Tag Archives: libclang

Want to call C from Python? Use D!

In my last blog post I wrote about the power of D’s compile-time reflection and string mixins, showing how they could be used to call D from Python so easily it might as well be magic. As amazing as that may be for those of us who have D codebases we want to expose to Python users, this doesn’t help the vastly more numerous programmers who want to call pre-existing C code instead. If C had D’s metaprogramming abilities, imagine seamlessly calling into nanomsg with as much ease as I showed in my previous blog post. Well… about that.

D can easily interoperate with C, with the only requirement being that the function and data structure declarations be translated into D syntax. But once the translation is done, those declarations are now D code that can be reflected on, fed to autowrap, and automagically wrapped for Python consumption. That would be a pretty powerful combo if not for the boring work of translating all needed declarations, macros included. It’s still a lot easier than talking to the Python C API itself of course, but maybe not quite killer feature material.

However, I wrote a little project called dpp because I’m lazy and don’t want to hand-translate C to D. Envious of C++’s and Objective C’s credible claim to be the only languages that can seamlessly interoperate with C (due to header inclusion and compatible syntax), I tried to replicate the experience in the D world. Using dpp, one can #include C headers in what would otherwise be D code and use it as one would in C++, even going to the point of supporting preprocessor macros. I wrote about the project in a different blog post.

Given this .dpp file:

// nanomsg.dpp
#include "nanomsg/nn.h"
#include "nanomsg/pipeline.h"

And this .d file:

import autowrap;
mixin(wrapDlang!(
        LibraryName("nanomsg"), // name of the .so
        Modules(Yes.alwaysExport, "nanomsg") // name of the D module
));

When we build both of those files above into nanomsg.so, we get to write this Python code that actually sends packets:

from nanomsg import (nn_socket, nn_close, nn_bind, nn_connect,
                     nn_send, nn_recv, AF_SP, NN_PUSH, NN_PULL)
import time

uri = "inproc://test"

pull = nn_socket(AF_SP, NN_PULL)
nn_bind(pull, uri)
time.sleep(0.05)  # give it time to set up (awful I know, but meh)

push = nn_socket(AF_SP, NN_PUSH)
nn_connect(push, uri)
msg = b'abc'
nn_send(push, msg, len(msg), 0)

Python, welcome to C, via D, and without even having to write any code to do it. Did I mention that AF_SP, NN_PUSH, and NN_PULL are all C macros? And yet, look at Python importing and using them like a boss.

Want to try it yourself? It’s on github.

If you want to call C from Python, use D.


Unit Testing? Do As I Say, Don’t Do As I Do

I’m a firm believer in unit testing. I’ve done more tech talks on the subject than I’d care to count, and always tell audiences the same thing: prefer unit tests, here’s a picture of the testing pyramid, keep unit tests pure (no side-effects), avoid end-to-end tests (they’re flaky, people will stop paying attention to red builds since all builds will be red). I tell them about adapters, ports and hexagonal architecture. But when it comes to using libclang to parse and translate C and C++ headers, I end up punting and writing a lot of integration tests instead. Hmm.

I know why people write tests with side-effects, and why they end up writing integration and end-to-end ones instead of the nice pure unit test happy place I advocate. It’s easier. There’s less thinking involved. A lot less. However, taking the easy path has always come back to bite me. Those kinds of tests take longer. The higher up the test pyramid you go, the flakier they get. TCP ports stay open longer than a tester would like, for instance. The network goes down. All sorts of things.

I understand why I wrote integration tests instead of unit tests when interfacing with libclang too. Like it is for everyone else, it was just easier. I failed to come up with a plan to unit test what I was doing. It didn’t help that I’d never used libclang and had no idea what the API looked like or what it allowed me to do. It also doesn’t help that libclang doesn’t have an option to take a string containing the code to parse and instead takes a file name, but I can work around that.
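One possible shape for that workaround, sketched in Python rather than D: write the source string to a temporary file, hand the file name to whatever parses it, and clean up afterwards. The names `source_file`, `parse_string` and `count_lines` are made up for illustration, and `count_lines` is only a stand-in for a real parser that insists on a file name.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def source_file(code, suffix=".h"):
    """Write `code` to a temporary file, yield its path, then delete it."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(code)
        yield path
    finally:
        os.remove(path)


def parse_string(code, parse_file):
    """Adapt a parser that only accepts a file name so it accepts a string.

    `parse_file` is a hypothetical callable taking a path.
    """
    with source_file(code) as path:
        return parse_file(path)


def count_lines(path):
    """Stand-in for a real parser: it just counts the lines in the file."""
    with open(path) as handle:
        return len(handle.readlines())


print(parse_string("struct Foo { int i; };\n", count_lines))
```

The test still touches the file system, so it isn’t pure, but the file name stops leaking into every test case.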

Because of this, the dpp codebase currently suffers from a lack of separation of concerns. Code that translates C/C++ to D is now intimately tied to libclang and its quirks. If I ever try to use something other than libclang, I won’t be able to. All of the bad things I caution everybody else about? I guaranteed they happened in one of my newest projects.

Before the code collapses under its own complexity, I’ve decided to do what I should’ve done all along and am rewriting dpp so it uses layers to get away from the libclang mess. I’m still figuring it all out, but the main idea is to have a transformation layer between libclang and my code that takes its data types and converts them to a new set of AST types that are my own. From then on it should be trivial to unit test the translation of those AST types that represent C or C++ code into D. Funnily enough, the fact that I wrote so many integration tests will keep me honest since all of those old tests will still have to pass. I’m not sure how I feel about that.
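To make the layering idea concrete, here is a minimal sketch in Python (not dpp’s actual code, which is D). The `Field` and `Struct` types, the `C_TO_D_TYPES` table and the function names are all hypothetical. The point is only that once the translator takes plain data types instead of libclang cursors, testing it needs no parser and no files.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, libclang-free AST types; the names are illustrative only.
@dataclass
class Field:
    c_type: str
    name: str


@dataclass
class Struct:
    name: str
    fields: List[Field] = field(default_factory=list)


# A deliberately tiny C-to-D type table; a real one would be much larger.
C_TO_D_TYPES = {
    "unsigned int": "uint",
    "unsigned char": "ubyte",
    "long long": "long",
}


def translate_type(c_type: str) -> str:
    """Map a C type spelling to a D one, leaving unknown spellings alone."""
    return C_TO_D_TYPES.get(c_type, c_type)


def translate_struct(struct: Struct) -> str:
    """Pure function: AST node in, D source text out."""
    lines = [f"struct {struct.name} {{"]
    for member in struct.fields:
        lines.append(f"    {translate_type(member.c_type)} {member.name};")
    lines.append("}")
    return "\n".join(lines)


d_code = translate_struct(
    Struct("Point", [Field("int", "x"), Field("unsigned int", "flags")])
)
print(d_code)
```

The only code that would need libclang is a separate layer that builds `Struct` values from cursors, and that layer is where the existing integration tests would keep earning their keep.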

I might do another blog post covering how I ended up porting a codebase with pretty much only integration tests to the unit variety. It might be of interest to anyone maintaining a legacy codebase (i.e. all of us).


libclang: not as great as I thought

I’ve been hearing about the delights of libclang for a while now. A compiler as a library, what a thought! Never-get-it-wrong again parsing/completion/whathaveyou! Amazing.

Then I tried using it.

If you’re parsing C (and maybe Objective C, but I wouldn’t know), then it’s great. It does what it says on the tin and then some, and all the information is at your fingertips. C++? Not so much.

libclang is the C API of the clang frontend. The “real” code is written in C++, but it’s unstable in the sense that there are no API guarantees. The C API however is stable. It’s also the only option if you want to use the compiler as a library from a different language.

As I’ve found out, the only C++ entities that are exposed by libclang are the ones that the authors have needed, which leaves a lot to be desired. Do you want to get a list of a struct’s template parameters? You can get the number of them, and you can get a type template argument at a particular index after that. That sounds great, until you realise that some template arguments are values, and you can’t get those. At all. You can’t even tell if they’re values or not. I had to come up with a heuristic where I’d call clang_Type_getTemplateArgumentAsType and then use the type kind to determine if it’s a value or not (`CXType_Invalid` means a value, `CXType_Unexposed` means a non-specialised type argument and anything else is a specialised template type argument. Obviously.). Extracting the value? That involves going through the tokens of the struct and finding the ith token in the angle brackets. Sigh.
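For the curious, the heuristic and the token hunt look roughly like this when sketched in Python. Nothing here calls libclang: `Kind` is a stand-in enum for the three cases that matter, and `nth_template_argument` works on a plain list of token spellings. It assumes the arguments contain no `<`, `>` or `>>` operators of their own, and it returns the whole argument rather than a single token.

```python
from enum import Enum, auto
from typing import List, Optional


class Kind(Enum):
    """Stand-in for the libclang type kinds the heuristic cares about."""
    INVALID = auto()    # corresponds to CXType_Invalid
    UNEXPOSED = auto()  # corresponds to CXType_Unexposed
    OTHER = auto()      # any other type kind


def classify_template_argument(kind: Kind) -> str:
    """Guess what a template argument is from the kind of its 'type'."""
    if kind is Kind.INVALID:
        return "value"
    if kind is Kind.UNEXPOSED:
        return "non-specialised type"
    return "specialised type"


def nth_template_argument(tokens: List[str], index: int) -> Optional[str]:
    """Return the index-th argument of the first <...> list in `tokens`."""
    # Treat ">>" as two closing brackets (simplification: no shift operators).
    flat = []
    for tok in tokens:
        flat.extend([">", ">"] if tok == ">>" else [tok])

    depth = 0
    args, current = [], []
    for tok in flat:
        if tok == "<":
            depth += 1
            if depth == 1:
                continue  # opening bracket of the list we care about
        elif tok == ">" and depth > 0:
            depth -= 1
            if depth == 0:
                if current:
                    args.append(" ".join(current))
                break
        elif tok == "," and depth == 1:
            args.append(" ".join(current))
            current = []
            continue
        if depth >= 1:
            current.append(tok)
    return args[index] if 0 <= index < len(args) else None


tokens = ["Foo", "<", "int", ",", "42", ",", "Bar", "<", "char", ">>"]
print(classify_template_argument(Kind.INVALID), nth_template_argument(tokens, 1))
```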

And this is because it’s a templated struct/class. Template functions don’t need any of this and are better supported because reasons.

Then there are bugs. Enum constants in template structs show up as 0 for no reason at all, and template argument naming is inconsistent:

template <typename> struct Struct;
template <typename T> struct Struct {
    Struct(const Struct& other) {}
};

See that copy constructor above? Its only parameter is, technically, of const Struct<T>& type. So you’d think libclang would tell you that the type’s spelling is T, but no, it’s type-parameter-0-0. Remove the first line with the struct declaration? Then the type template argument’s spelling is, as a normal person would have guessed, T. If the declaration names the type as `T` it also works as expected. I assume again it’s because reasons.

It’s bad enough that I’m not the only one to have encountered issues. There’s at least one library I know of written to get around libclang’s problems, but I can’t use it for my project because it’s written in C++.

I’m going to eventually have to submit patches to libclang myself, but I have no idea what the approval process over there is like.
