Python | Átila on Code

2020/02/26

Seriously, just use D to call C from Python

Last week I suggested that if you want to call C code from Python that you should use D. I still think that 4 lines of code (two of which are header includes) and a build system is hard to beat if that indeed is your goal. The internet, of course, had different ideas.

I think most of the comments I got can be summarised as:

Why not just ctypes?
Why not just cffi?
I’ve used Cython / pybind11 / boost::python

In all cases, the general sentiment was that these were easy options to get the task done. Either they didn’t read the blog post, or they’re nowhere near as lazy as I am.

I described how writing basically no code gave you access to nanomsg from Python, including macros. No glue code, no struct definitions, nothing. “I can haz C headers in Python?” and some boring build system things were the long and short of it. 4 lines of code that scale with O(1) is hard to beat.

Let’s look at the suggestions then, starting with ctypes. It can call symbols in loaded shared libraries, which is great if you’re calling int foo(int, int) but not so great for real code (like, you know, nanomsg). No enums, no macros, and you have to declare C structs yourself. I shudder to think of what I’d have to write to get anything done. Next.

Some acknowledged that ctypes wasn’t all that, and that cffi should be used instead. I just tried using it today, thinking that it’d be amazing if I could just give Python some C headers and magic happened. But I can’t. I can pass in C declarations as a Python string, which I can manually copy from a header. But not just tell it what header I want (or worse, two in the case of my nanomsg example). I tried reading the headers myself, concatenating the strings and giving that to cdef, but that doesn’t work, because:

Real headers have pesky things like #include directives in themcausing ParseErrors to be raised. Oops. So much for valid C syntax.
Struct definitions usually have members that are structs defined in another header, which have members that are structs defined in another header, which…
Macros

I soon realised how much work calling nanomsg with cffi would be and gave up. I wanted to show how much more code it takes, but as previously mentioned I’m lazy so no. Just… no. I figured someone would have written a generator for cffi, and I was right. It requires installing something called Irkit. Turns out it’s a parser generator, so cffi-gen doesn’t use a industrial grade C compiler. Good luck compiling C code in the wild with that.

I’ve used Cython before and it’s even more work than cffi. Cython is the reason that I wrote autowrap in the first place.

Both boost::python and pybind11 are to C++ what pyd is to D. And pyd is another reason I wrote autowrap, because it has the gall to require users to specify one by one all of the things (functions, classes, etc.) they want to make available. With D’s compile-time reflection, I find that requirement quite insulting.

Last week I presented a way of calling C from Python which I find incredibly easy. I got told there were viable alternatives, and since I didn’t know any better at the time I stroked my beard and thought “iiiinteresting”.

This week I can confidently state my belief that the easiest way to call C code from Python is D. Disagree? Show me how you can call nanomsg from Python your way. Github or it didn’t happen.

Tagged c++, c++11, cffi, ctypes, cython, D, ffi, Python

2020/02/19

17 Comments

Programming

Want to call C from Python? Use D!

In my last blog post I wrote about the power of D’s compile-time reflection and string mixins, showing how they could be used to call D from Python so easily it might as well be magic. As amazing as that may be for those of us who have D codebases we want to expose to Python users, this doesn’t help the vastly more numerous programmers who want to call pre-existing C code instead. If C had D’s metaprogramming abilities, imagine seamlessly calling into nanomsg with as much as ease as I showed in my previous blog post. Well… about that.

D can easily interoperate with C, with the only requirement being that the function and data structure declarations be translated into D syntax. But once the translation is done, those declarations are now D code that can be reflected on, fed to autowrap, and automagically wrapped for Python consumption. That would be a pretty powerful combo if not for the boring work of translating all needed declarations, macros included. It’s still a lot easier than talking to the Python C API itself of course, but maybe not quite killer feature material.

However, I wrote a little project called dpp because I’m lazy and don’t want to hand-translate C to D. Envious of C++’s and Objective C’s credible claim to be the only languages that can seamlessly interoperate with C (due to header inclusion and compatible syntax), I tried to replicate the experience in the D world. Using dpp, one can #include C headers in what would otherwise be D code and use it as one would in C++, even going to the point of supporting preprocessor macros. I wrote about the project in a different blog post.

Given this .dpp file:

// nanomsg.dpp
#include "nanomsg/nn.h"
#include "nanomsg/pipeline.h"

And this .d file:

import autowrap;
mixin(
    wrapDlang!(
        LibraryName("nanomsg"), // name of the .so
        Modules(Yes.alwaysExport, "nanomsg") // name of the D module
    )
);

When we build both of those files above into nanomsg.so, we get to write this Python code that actually sends packets:

from nanomsg import (nn_socket, nn_close, nn_bind, nn_connect,
                     nn_send, nn_recv, AF_SP, NN_PUSH, NN_PULL)
import time

uri = "inproc://test"

pull = nn_socket(AF_SP, NN_PULL)
nn_bind(pull, uri)
time.sleep(0.05)  # give it time to set up (awful I know, but meh)

push = nn_socket(AF_SP, NN_PUSH)
nn_connect(push, uri)
msg = b'abc'
nn_send(push, msg, len(msg), 0)

Python, welcome to C, via D, and without even having to write any code to do it. Did I mention that AF_SP, NN_PUSH, and NN_PULL are all C macros? And yet, look at Python importing and using them like a boss.

Want to try it yourself? It’s on github.

If you want to call C from Python, use D.

Tagged c++, clang, D, gcc, libclang, Python, reflection

2020/01/22

2 Comments

Programming

The power of reflection

When I was at CppCon 2016 I overheard someone ask “Everyone keeps talking about reflection, but why do we actually need it?”. A few years before that, I also would have had difficulty understanding why it would be useful. After years of writing D, it’s hard to imagine life without it.

Serialisation is an obvious use-case and almost always the first one anyone comes up with if pressed for an example. But there’s a lot more, like Design by Instrospection. It allows one to write a mocking framework. You start seeing applications everywhere. My favourite way to use it is to make the compiler write code for me.

Let’s say one wants to write a Python extension in native code. Each top-level Python function must have a corresponding C function (well, C ABI at least) that looks something like this:

PyObject* myfunc(PyObject* self, PyObject* args, PyObject* kwargs) {
    // ...
    return result;
}

There are a lot of details to take care of. There’s error handling, managing ref counts, and in all likelihood conversion from and to Python types since in most cases one is usually interested in calling existing pre-written code and make it available to Python. It’s tedious, and I haven’t even shown all the boilerplate to initialise the Python module and register the functions. The code for two simple functions ends up looking like this. Just thinking of clicking that link makes me sigh. Imagine what making calls into a real codebase would look like. We can do better:

import autowrap;
mixin(
    wrapDlang!(
        LibraryName("mylib"),
        Modules(
            Module("mymodule"),
            Module("myothermodule"),
        )
    )
);

The code above, when compiled, will generate a Python extension (shared library) that exposes every D function marked as “export” in the modules “mymodule” and “myothermodule” as Python functions. It’ll even convert their names from camelCase to snake_case. Any D exceptions thrown will become Python exceptions. D structs and classes become Python classses. If the original D functions take a D string, you’ll be able to pass Python strings to them in user code. Modulo bugs, this… works! The code shown above is the only code that needs to be written. Setting up the build system takes more work!

“Only” two D features are used here: the ability to do reflection at compile-time (and therefore to know which functions are in those modules and what types they take and return), and being able to mix in strings at compile-time. All the boilerplate is written for the user and inserted inline as if written by hand, but it’s the compiler that’s doing the heavy lifting.

Imagine now that your boss, pleased with these results, now wants you to also make the same D code avaiable to Excel users. The code changes not one bit, those lines above also work for Excel (the trick is telling the build system to depend on the autowrap:excel dub package instead of autowrap:python). Instead of snake_case functions, one now gets PascalCase as per Excel convention.

Same API, same functionality, different implementation. And no code to write for the user. The curious can see how it’s done on github.

Tagged automagically, c++, coding, Python, reflection

2019/04/24

4 Comments

Programming

Type inference debate: a C++ culture phenomenon?

I read two C++ subreddit threads today on using the auto keyword. They’re both questions: the first one asks why certain people seem to dislike using type inference, while the second asks about what commonly taught guidelines should be considered bad practice. A few replies there mention auto.

This confuses me for more than one reason. The first is that I’m a big proponent of type inference. Having to work to tell a computer what it already knows is one of my pet peeves. I also believe that wanting to know the exact type of a variable is a, for lack of a better term, “development smell”, especially in typed languages with generics. I think that the possible operations on a type are what matters, and figuring out the exact type if needed is a tooling problem that is solved almost everywhere. If you don’t at least have “jump to definition” in your development environment, you should.

The other main source of confusion for me is why this debate seems restricted to the C++ community, at least according to my anectodal evidence. As far as I’m aware of, type inference is taken to be a good thing almost universally in Go, D, Rust, Haskell, et al. I don’t have statistics on this at all, but from what I’ve seen so far it seems to be the case. Why is that?

One reason might be change. As I learned reading Peopleware, humans aren’t too fond of it. C++ lacked type inference until C++11, and whoever learned it then might not like using auto now; it’s “weird”, so it must be bad. I’ve heard C programmers who “like” looping over a collection using an integer index and didn’t understand when I commented on their Python code asking them to please not do that. “Like” here means “this is what I’m used to”. Berating people for disliking change is definitely not the way forward, however. One must take their feelings into account if one wants to effect change. Another lesson learned from Peopleware: the problem is hardly ever technical.

Otherwise, I’m not sure. I’m constantly surprised by what other people find to be more or less readable when it comes to code. There are vocal opponents to operator overloading, for instance. I have no idea why.

What do you think? Where do you stand on auto?

Tagged c++, c++11, D, dlang, Go, golang, haskell, Python, rust, type inference

2017/10/24

1 Comment

Programming

Operator overloading is a good thing (TM)

Brains are weird things. I used to be a private maths tutor, and I always found it amazing how a little change in notation could sometimes manage to completely confuse a student. Notation itself seems to me to be a major impediment for the majority of people to like or be good at maths. I had fun sometimes replacing the x in an equation with a drawing of an apple to try and get the point across that the actual name (or shape!) of a variable didn’t matter, that it was just standing in for something else.

Programmers are more often than not mathematically inclined, and yet a similar phenomenon seems to occur with the “shape” of certain functions, i.e. operators. For reasons that make us much sense to me as x confusing maths students, the fact that a function has a name that has non-alphanumeric characters in them make them particularly weird. So weird that programmers shouldn’t be allowed to defined functions with those names, only the language designers. That’s always a problem for me – languages that don’t give you the same power as the designers are Blub as far as I’m concerned. But every now and again I see a blost post touting the advantages of some language or other, listing the lack of operator overloading as a bonus.

I don’t even understand the common arguments against operator overloading. One is that somehow “a + b” is now confusing, because it’s not clear what the code does. How is that different from having to read the documentation/implementation of “a.add(b)”? If it’s C++ and “a + b” shows up, anyone who doesn’t read it as “a.operator+(b)” or “operator+(a, b)” with built-in implementations of operator+ for integers and floating point numbers needs to brush up on their C++. And then there’s the fact that that particular operator is overloaded anyway, even in C – the compiler emits different instructions for floats and integers, and its behaviour even depends on the signedness of ints.

Then there’s the complaint that one could make operator+ do something stupid like subtract. Because, you know, this is totally impossible:

int add(int i, int j) {
    return i - j;}

Some would say that operator overloading is limited in applicability since only numerical objects and matrices really need them. But used with care, it might just make sense:

auto path = "foo" / "bar" / "baz";

Or in the C++ ranges by Eric Niebler:

using namespace ranges;
int sum = accumulate(view::ints(1)
                   | view::transform([](int i){return i*i;})
                   | view::take(10), 0);

I’d say both of those previous examples are not only readable, but more readable due to use of operator overloading. As I’ve learned however, readability is in the eye of the beholder.

All in all, it confuses me when I hear/read that lacking operator overloading makes a language simpler. It’s just allowing functions to have “special” names and special syntax to call them (or in Haskell, not even that). Why would the names of functions make code so hard to read for some people? I guess you’d have to ask my old maths students.

Tagged c++, coding, Go, Programming, Python

2017/08/08

4 Comments

Programming

API clarity with types

API design is hard. Really hard. It’s one of the reasons I like TDD – it forces you to use the API as a regular client and it usually comes out all the better for it. At a previous job we’d design APIs as C headers, review them without implementation and call it done. Not one of those didn’t have to change as soon as we tried implementing them.

The Win32 API is rife with examples of what not to do: functions with 12 parameters aren’t uncommon. Another API no-no is several parameters of the same type – which means which? This is ok:

auto p = Point(2, 3);

It’s obvious that 2 is the x coordinate and 3 is y. But:

foo("foo", "bar", "baz", "quux", true);

Sure, the actual strings passed don’t help – but what does true mean in this context? Languages like Python get around this by naming arguments at the call site, but that’s not a feature of most curly brace/semicolon languages.

I semi-recently forked and extended the D wrapper for nanomsg. The original C API copies the Berkely sockets API, for reasons I don’t quite understand. That means that a socket must be created, then bound or connect to another socket. In an OOP-ish language we’d like to just have a contructor deal with that for us. Unfortunately, there’s no way to disambiguate if we want to connect to an address or bind to it – in both cases a string is passed. My first attempt was to follow in Java’s footsteps and use static methods for creation (simplified for the blog post):

struct NanoSocket {
    static NanoSocket createBound(string uri) { /* ... */ }
    static NanoSocket createConnected(string uri) { /* ... */ }
    private this() { /* ... */ } // constructor
}

I never did feel comfortable: object creation shouldn’t look *weird*. But I think Haskell has forever changed by brain, so types to the rescue:

struct NanoSocket {
    this(ConnectTo connectTo) { /* ... */ }
    this(BindTo bindTo) { /* ... */ }
}

struct ConnectTo {
    string uri;
}

struct BindTo {
    string uri;
}

I encountered something similar when I implemented a method on NanoSocket called trySend. It takes two durations: a total time to try for, and an interval to wait to try again. Most people would write it like so:

void trySend(ubyte[] data, 
             Duration totalDuration, 
             Duration retryDuration);

At the call site clients might get confused about which order the durations are in. I think this is much better, since there’s no way to get it wrong:

void trySend(ubyte[] data, 
             TotalDuration totalDuration, 
             RetryDuration retryDuration);

struct TotalDuration {
    Duration duration;
}

struct RetryDuration {
    Duration duration;
}

What do you think?

Tagged api, coding, design, dlang, Java, nanomsg, networking, Programming, Python

2014/10/16

4 Comments

Programming, Uncategorized

Computer languages: ordering my favourites

This isn’t even remotely supposed to be based on facts, evidence, benchmarks or anything like that. You could even disagree with what are “scripting languages” or not. All of the below just reflect my personal preferences. In any case, here’s my list of favourite computer languages, divided into two categories: scripting and, err… I guess “not scripting”.

My favourite scripting languages, in order:

Python
Ruby
Emacs Lisp
Lua
Powershell
Perl
bash/zsh
m4
Microsoft batch files
Tcl

I haven’t written enough Ruby yet to really know. I suspect I’d like it more than Python but at the moment I just don’t have enough experience with it to know its warts. Even I’m surprised there’s something below Perl here but Tcl really is that bad. If you’re wondering where PHP is, well I don’t know because I’ve never written any but from what I’ve seen and heard I’d expect it to be (in my opinion of course) better than Tcl and worse than Perl. I’m surprised how high Perl is given my extreme dislike for it. When I started thinking about it I realised there’s far far worse.

My favourite non-scripting languages, in order:

D
C++
Haskell
Common Lisp
Rust
Java
Go
Objective C
C
Pascal
Fortran
Basic / Visual Basic

I’ve never used Scheme, if that explains where Common Lisp is. I’m still learning Haskell so not too sure there. As for Rust, I’ve never written a line of code in it and yet I think I can confidently place it in the list, especially with respect to Go. It might place higher than C++ but I don’t know yet.

Tagged c++, D, dlang, Fortran, golang, haskell, Java, Lisp, Pascal, Perl, Python, Ruby, rust, rustlang

	Tabitha Levine on The main function should be…
	Project Highlight: D… on The joys of translating C++…
	ricardo on Seriously, just use D to call…
	Daniel Stracaboško on Type inference debate: a C++ c…
	Martyn S on Boden, Flutter, React Nat…

Átila on Code

Code, testing and systems programming

Tag Archives: Python

Seriously, just use D to call C from Python

Want to call C from Python? Use D!

The power of reflection

Type inference debate: a C++ culture phenomenon?

Operator overloading is a good thing (TM)

API clarity with types

Computer languages: ordering my favourites