Pholyglot is a PHP-to-PHP+C transpiler. The output is C code that’s also runnable by PHP, so-called polyglot code.

Pholly is the PHP dialect that’s supported by Pholyglot (mostly a subset + some required annotations).

This blog post describes the features needed inside the Pholyglot compiler to complete the nbody benchmark from benchmarksgame.

Topics:

  • Polymorphic arrays
  • Class/struct base with “methods”
  • Loops
  • Some kind of generics for the array_slice function

Arrays

The array is a very basic struct like so:

struct array {
    uintptr_t* thing;
    size_t length;
};

All it contains is a pointer to the actual array data and its length.

The struct makes it very easy to create a count macro that mirrors the PHP function:

#define count(x) x.length
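As a minimal sketch of how the struct and the count macro fit together in plain C: the helper array_wrap below is hypothetical (it is not part of Pholyglot), and the macro is written with extra parentheses so that it stays safe inside larger expressions such as count(a) - 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the struct described in the text. */
struct array {
    uintptr_t* thing;
    size_t length;
};

/* Parenthesised variant of the count macro from the text. */
#define count(x) ((x).length)

/* Hypothetical helper: wraps an existing buffer without copying it. */
static struct array array_wrap(uintptr_t* data, size_t n)
{
    struct array a = { .thing = data, .length = n };
    return a;
}
```

Since the length travels with the pointer, callers never need to pass a separate size argument.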

It is possible to write a C macro that initializes an array much like PHP does, but the allocation would then only be on the stack:

#define array(...) {__VA_ARGS__}

Maybe in some cases this can be used, but in general I’ll use a helper macro/function instead:

#define array_make(type, i, ...) {.thing = (type[]) array(__VA_ARGS__), .length = i}

(Note: This is still a stack allocation; it will have to be rewritten to use malloc or similar.)
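One way the heap-based rewrite could look is sketched here. Both array_from and array_make_heap are hypothetical names, not Pholyglot API. The values are first written into a temporary compound literal and then copied into malloc'd memory, so the array outlives the enclosing scope.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct array {
    uintptr_t* thing;
    size_t length;
};

/* Hypothetical helper: copies n elements of elem_size bytes to the heap.
 * Returns an empty array (thing == NULL, length == 0) if malloc fails. */
static struct array array_from(const void* src, size_t elem_size, size_t n)
{
    struct array a = { .thing = NULL, .length = 0 };
    void* mem = malloc(elem_size * n);
    if (mem != NULL) {
        memcpy(mem, src, elem_size * n);
        a.thing = mem;   /* void* converts implicitly in C */
        a.length = n;
    }
    return a;
}

/* Assumption: n must equal the number of values passed, as in the text. */
#define array_make_heap(type, n, ...) \
    array_from((type[]){__VA_ARGS__}, sizeof(type), (n))
```

The caller owns the memory and has to free(a.thing) at some point, which is exactly the part a GC or allocation strategy would take over later.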

The corresponding PHP code just discards the first two arguments:

function array_make($type, $length, ...$values) { return $values; }

This makes array init a bit more awkward but still C+PHP compatible:

#if __PHP__
define("int", "int");
#endif
#__C__ array
$arr = array_make(int, 3, 1, 2, 3);

Arrays in PHP have value semantics. To avoid implementing this in C, I require all arrays to be passed by reference in Pholly.
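To illustrate why reference semantics come almost for free on the C side, here is a small sketch. The function set_first is a hypothetical helper; the point is only that copying the struct copies the pointer and the length, not the elements.

```c
#include <stddef.h>
#include <stdint.h>

struct array {
    uintptr_t* thing;
    size_t length;
};

/* Hypothetical helper: arr is a copy of the struct, but arr.thing still
 * points at the caller's element buffer, so the write is visible outside. */
static void set_first(struct array arr, uintptr_t value)
{
    if (arr.length > 0) {
        arr.thing[0] = value;
    }
}
```

Giving C arrays PHP-style value semantics would instead require copying the buffer on every pass or assignment.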

I’m limiting myself to fixed-size arrays here. Linked lists and hash tables will be a fun exercise for the future (hellooo SplDoublyLinkedList).

To get proper type-casting in C, and because I’m using a struct for arrays, I’m using a helper macro for array access instead of the built-in syntax:

#define array_get(type, arr, i) ((type*) arr.thing)[i]

Obviously this means a performance overhead when run as PHP, with a function call at each array access.
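On the C side the macro is only a cast plus an index, so it can be used for both reading and writing. A minimal sketch, where sum_doubles is a hypothetical helper and the macro has extra parentheses added for safety:

```c
#include <stddef.h>
#include <stdint.h>

struct array {
    uintptr_t* thing;
    size_t length;
};

/* Parenthesised variant of the array_get macro from the text. */
#define array_get(type, arr, i) (((type*) (arr).thing)[i])

/* Hypothetical helper: sums an array of doubles through array_get. */
static double sum_doubles(struct array arr)
{
    double total = 0.0;
    for (size_t i = 0; i < arr.length; i++) {
        total += array_get(double, arr, i);
    }
    return total;
}
```

Because the macro expands to an lvalue, array_get(double, arr, 1) = 10.0; also works as an assignment in C.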

Class/struct

If you don’t do inheritance, a class is basically a bucket of data with some function pointers using “this” or “self” as first implicit argument. That’s what I go with here.

First of all, just macro the “class” keyword:

#define class struct

We’ll make all properties public and redefine the “public” keyword at each property to its proper type:

class Body {
#define public float
#define __prop_vx $__prop_vx
public $__prop_vx;
#undef public
};

The clunky __prop_vx is needed since PHP implies the dollar sign at property access and C of course does not.

Methods are not really needed for the nbody benchmark, but for completeness:

#__C__ void (*offsetMomentum) (Body $__self, float $px, float $py, float $pz); 

The function pointer struct members exist only in C.

$__self is passed around explicitly, since C has no concept of this.

The important part is that the method body is the same in both C and PHP. The function signature is duplicated, though.

#__C__ void Body__offsetMomentum (Body $__self, float $px, float $py, float $pz)
#if __PHP__
public function offsetMomentum(Body $__self, float $px, float $py, float $pz): void
#endif
{
    #__C__ float
    $pi = 3.1415926535897931;
    #__C__ float
    $solarmass = 4. * $pi * $pi;
    $__self->__prop_vx = (0. - $px) / $solarmass;
    $__self->__prop_vy = (0. - $py) / $solarmass;
    $__self->__prop_vz = (0. - $pz) / $solarmass;
}

Method calling is then polyglot, like so:

$b->offsetMomentum($b, $px, $py, $pz);
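Stripped of the polyglot parts, the C side of this pattern could be sketched as follows. This version uses plain C names without dollar signs and an explicit Body* for the receiver, which are assumptions made for readability rather than what the transpiler emits.

```c
#include <stddef.h>

typedef struct Body Body;

struct Body {
    double vx, vy, vz;
    /* "Method": a function pointer that takes the receiver explicitly. */
    void (*offsetMomentum)(Body* self, double px, double py, double pz);
};

/* Same body as the method in the text, with self passed by pointer. */
static void Body__offsetMomentum(Body* self, double px, double py, double pz)
{
    const double pi = 3.141592653589793;
    const double solarmass = 4.0 * pi * pi;
    self->vx = (0.0 - px) / solarmass;
    self->vy = (0.0 - py) / solarmass;
    self->vz = (0.0 - pz) / solarmass;
}
```

A call then reads b.offsetMomentum(&b, px, py, pz) for a stack value, or b->offsetMomentum(b, px, py, pz) for a pointer, which matches the polyglot call shape.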

Fun fact: the new keyword in PHP can also be called with parentheses and a string, which we’ll abuse for a new C macro:

#define new(x) x ## __constructor(malloc(sizeof(struct x)))

This assumes that a constructor function will exist, e.g. Body__constructor used to init function pointers.
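A sketch of what such a constructor might look like in C. The reset method and the NULL check are illustrative assumptions, not taken from the Pholyglot sources.

```c
#include <stdlib.h>

typedef struct Body Body;

struct Body {
    double vx;
    void (*reset)(Body* self);   /* hypothetical method */
};

static void Body__reset(Body* self)
{
    self->vx = 0.0;
}

/* Hypothetical constructor: initializes fields and wires up the function
 * pointers. It passes a failed allocation (NULL) straight through. */
static Body* Body__constructor(Body* self)
{
    if (self != NULL) {
        self->vx = 0.0;
        self->reset = Body__reset;
    }
    return self;
}

/* The macro from the text: token-pastes the class name onto __constructor. */
#define new(x) x ## __constructor(malloc(sizeof(struct x)))
```

With this in place, Body* b = new(Body); expands to Body__constructor(malloc(sizeof(struct Body))), and the caller is responsible for calling free(b).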

Thanks to these solutions, we finally get code like:

#if __PHP__
define("Body", "Body");
#endif
#__C__ array
$bodies = array_make(Body, 2, new(Body), new(Body));

which is pretty readable, I’d say.

Looping

The PHP foreach loop can simply transpile down to a classic for-loop that runs in both PHP and C. Same goes for do-while.

foreach ($bodies as $body) { ... }

will transpile to:

#__C__ int
$i = 0;
for (; $i < count($bodies); $i = $i + 1) {
    #__C__ Body
    $body = array_get(Body, $bodies, $i);
}
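Seen purely as C, the transpiled loop is roughly equivalent to the sketch below. It assumes the array stores Body pointers and uses a hypothetical mass field and total_mass function to give the loop something to do.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Body {
    double mass;   /* hypothetical field for this example */
} Body;

struct array {
    uintptr_t* thing;
    size_t length;
};

#define count(x) ((x).length)
#define array_get(type, arr, i) (((type*) (arr).thing)[i])

/* Hypothetical function: the foreach from the text as a classic for-loop. */
static double total_mass(struct array bodies)
{
    double total = 0.0;
    size_t i = 0;
    for (; i < count(bodies); i = i + 1) {
        Body* body = array_get(Body*, bodies, i);
        total += body->mass;
    }
    return total;
}
```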

Generics

I didn’t really add support for generics, just the needed internal parts to tell the compiler that array_slice expects the same type out as it gets in. Future development would adapt the @template T notation from Psalm and other tools.
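On the C side, one way to get "same type out as in" without real generics is to pass the element type as a macro argument, in the same style as array_get. The sketch below is an assumption about how that could look, not the actual implementation. The names array_slice_impl and the four-argument array_slice macro are hypothetical, and unlike PHP's array_slice it does not handle negative offsets.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct array {
    uintptr_t* thing;
    size_t length;
};

/* Hypothetical helper: copies up to `length` elements starting at `offset`
 * into a new heap buffer. Out-of-range requests are clamped, and an empty
 * array is returned if nothing can be copied or malloc fails. */
static struct array array_slice_impl(struct array src, size_t elem_size,
                                     size_t offset, size_t length)
{
    struct array out = { .thing = NULL, .length = 0 };
    if (offset >= src.length) {
        return out;
    }
    if (length > src.length - offset) {
        length = src.length - offset;
    }
    if (length == 0) {
        return out;
    }
    void* mem = malloc(elem_size * length);
    if (mem == NULL) {
        return out;
    }
    memcpy(mem, (const char*) src.thing + offset * elem_size,
           elem_size * length);
    out.thing = mem;
    out.length = length;
    return out;
}

/* The element type only decides the element size; the result is read back
 * with the same type through array_get-style casts. */
#define array_slice(type, arr, offset, length) \
    array_slice_impl((arr), sizeof(type), (offset), (length))
```

The compiler-side knowledge that the output type equals the input type is what lets the transpiler emit the right type argument here.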

Performance

Well, obviously compiled C will be faster than PHP in numerical calculations; that’s trivially true. Even more so when the polyglot PHP code has a couple of slowdowns, like the array_get access function. More interesting benchmarks would involve proper database and file IO, etc.

Code

Full code listing of the Pholly code

Full code listing of the transpiled PHP+C code

Future milestones

I’d like to do one of the following next:

  • A-star algorithm, testing dynamic memory allocation strategies
    • Especially interested in whether per-variable memory allocation is feasible, like $body = /** @alloc stack */ new Body();, allowing programmers to opt out of the default GC when needed. Odin has something similar.
  • A simple REST API call, using MySQL, reading a config file, perhaps curl

Stay tuned for the next version: pholyglot-0.0.-2-betachicken.