Tag Archives: parsing

libclang: not as great as I thought

I’ve been hearing about the delights of libclang for a while now. A compiler as a library, what a thought! Never-get-it-wrong again parsing/completion/whathaveyou! Amazing.

Then I tried using it.

If you’re parsing C (and maybe Objective C, but I wouldn’t know), then it’s great. It does what it says on the tin and then some, and all the information is at your fingertips. C++? Not so much.

libclang is the C API of the clang frontend. The “real” code is written in C++, but it’s unstable in the sense that there’s no API guarantees. The C API however is stable. It’s also the only option if you want to use the compiler as a library from a different language.

As I’ve found out, the only C++ entities that are exposed by libclang are the ones that the authors have needed, which leaves a lot to be desired. Do you want to get a list of a struct’s template parameters? You can get the number of them, and you can get a type template argument at a particular index after that. That sounds great, until you realise that some template arguments are values, and you can’t get those. At all. You can’t even tell if they’re values or not. I had to come up with a heuristic where I’d call clang_Type_getTemplateArgumentAsType and then use the type kind to determine if it’s a value or not (`CXType_Invalid` means a value, `CXType_Unexposed` means a non-specialised type argument and anything else is a specialised template type argument. Obviously.). Extracting the value? That involves going through the tokens of the struct and finding the ith token in the angle brackets. Sigh.

And this is because it’s a templated struct/class. Template functions don’t need any of this and are better supported because reasons.

Then there are bugs. Enum constants in template structs show up as 0 for no reason at all, and template argument naming is inconsistent:

template <typename> struct Struct;
template <typename T> struct Struct {
    Struct(const Struct& other) {}
};

See that copy constructor above? Its only parameter is, technically, of const Struct<T>& type. So you’d think libclang would tell you that the type’s spelling is T, but no, it’s type-parameter-0-0. Remove the first line with the struct declaration? Then the type template argument’s spelling is, as a normal person would have guessed, T. If the declaration names the type as `T` it also works as expected. I assume again it’s because reasons.

It’s bad enough that I’m not the only one to have encountered issues. There’s at least one library I know of written to get around libclang’s problems, but I can’t use it for my project because it’s written in C++.

I’m going to eventually have to submit patches to libclang myself, but I have no idea how the approval process over there is like.

Advertisement
Tagged , , ,