Pholyglot is a transpiler that compiles a subset of PHP into code that is valid as both PHP and C, so-called polyglot code.

This blog post describes the new features added in version 0.2 of the Pholyglot transpiler:

  • Two different memory allocation strategies
  • A memory-polymorphic linked list

Memory allocation

One of the reasons I started this project was to experiment with an opt-out-of-GC kind of system. In Pholyglot, the Boehm GC is used by default, but you can now also choose to use an arena. Mixing these two memory systems in the same program is not safe yet; the idea is to add alias control and escape analysis to enforce a clear separation.
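To make the arena idea concrete, here is a minimal sketch of a bump arena in C. It is not Pholyglot's actual implementation; struct arena, arena_init, arena_alloc, arena_free and arena_demo are hypothetical names, and the 16-byte alignment is an assumption that happens to be enough for the types used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed alignment; enough for the int and double used below. */
#define ARENA_ALIGN 16

/* Hypothetical bump arena (illustrative names, not Pholyglot's). */
struct arena {
    unsigned char *base;  /* start of the backing block */
    size_t used;          /* bytes handed out so far */
    size_t cap;           /* total size of the backing block */
};

/* Reserve one backing block up front. Returns 0 on success. */
int arena_init(struct arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->used = 0;
    a->cap = a->base ? cap : 0;
    return a->base ? 0 : -1;
}

/* Hand out the next `size` bytes, rounded up to ARENA_ALIGN.
 * Returns NULL when the arena is full or was never initialised. */
void *arena_alloc(struct arena *a, size_t size)
{
    size_t rounded = (size + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
    if (a->base == NULL || rounded < size || rounded > a->cap - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Release every object in the arena with a single free. */
void arena_free(struct arena *a)
{
    free(a->base);
    a->base = NULL;
    a->used = 0;
    a->cap = 0;
}

/* Usage: allocate several objects, then drop them all together. */
int arena_demo(void)
{
    struct arena a;
    if (arena_init(&a, 1024) != 0) return -1;

    int *xs = arena_alloc(&a, 10 * sizeof *xs);
    double *d = arena_alloc(&a, sizeof *d);
    assert(xs != NULL && d != NULL);

    int sum = 0;
    for (int i = 0; i < 10; i++) { xs[i] = i; sum += xs[i]; }
    *d = 1.5;

    arena_free(&a);  /* no per-object free needed */
    return sum;
}
```

The trade-off compared to a GC is that nothing is reclaimed until the whole arena is released, which is why a pointer into the arena must not outlive it.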

Memory-polymorphism

I'm not sure if this is an established term, but what it means in Pholyglot is that you can tell an object to use the same allocation strategy as another object, without knowing exactly which strategy that was.

This example adds a new Point to a list of points with the same memory strategy as the list, via a new annotation, @alloc:

/**
 * @param SplDoublyLinkedList<Point> $list
 */
function addPointToList(SplDoublyLinkedList $list): void
{
    // Use same memory allocation strategy as $list
    $p = /** @alloc $list */ new Point();
    $list->push($p);
}

Obviously, at a later stage, $list->push($p) must be type-checked so that two different memory strategies aren’t being used in the same collection.

The above snippet compiles to this [1] (and yes, this is valid vanilla PHP):

#define function void
function addPointToList(SplDoublyLinkedList $list)
#undef function
{
    #__C__ Point
    $p = new(Point
        #__C__, $list->mem
    );
    $list->push(
        #__C__ $list,
        $p
    );
}

where new is a macro taking two arguments: the class name and a memory allocation strategy struct:

#define new(x, m) x ## __constructor((x) m.alloc(m.arena, sizeof(struct x)), m)

The type of m is defined as:

struct mem {
    uintptr_t* (*alloc) (void* a, size_t size);
    void* arena;
};

Meaning, it contains a pointer to an allocation function (currently to either the Boehm GC alloc or arena alloc), and a pointer to the arena (not used for Boehm).
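As a rough illustration of how the pieces above fit together on the C side, the sketch below wires struct mem and the new macro to two strategies. Several parts are assumptions made only for this example: heap_alloc uses plain malloc as a stand-in for the Boehm allocator (so the sketch has no external dependency), and struct bump_arena, bump_alloc, struct List, the Point typedef, the body of Point__constructor and mem_demo are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same layout as the struct mem shown above. */
struct mem {
    uintptr_t* (*alloc) (void* a, size_t size);
    void* arena;
};

/* Stand-in for the GC strategy: plain malloc, arena argument ignored.
 * Assumption: the real strategy would call the Boehm allocator. */
uintptr_t* heap_alloc(void* a, size_t size)
{
    (void) a;
    return malloc(size);
}

/* Hypothetical bump arena, measured in uintptr_t-sized words. */
struct bump_arena {
    uintptr_t* base;  /* malloc'ed backing block */
    size_t used;      /* words handed out so far */
    size_t cap;       /* total words available */
};

uintptr_t* bump_alloc(void* a, size_t size)
{
    struct bump_arena* ar = a;
    size_t words = (size + sizeof(uintptr_t) - 1) / sizeof(uintptr_t);
    if (words > ar->cap - ar->used) return NULL;
    uintptr_t* p = ar->base + ar->used;
    ar->used += words;
    return p;
}

/* Assumed object layout: each object remembers its own strategy. */
typedef struct Point* Point;
struct Point {
    long x;
    long y;
    struct mem mem;
};

Point Point__constructor(Point self, struct mem m)
{
    if (self == NULL) return NULL;
    self->x = 0;
    self->y = 0;
    self->mem = m;
    return self;
}

/* The macro from the post, unchanged. */
#define new(x, m) x ## __constructor((x) m.alloc(m.arena, sizeof(struct x)), m)

/* Hypothetical fixed-capacity list that carries a strategy. */
#define LIST_CAP 8
struct List {
    struct mem mem;
    Point items[LIST_CAP];
    size_t count;
};

/* Mirrors the compiled addPointToList: allocate like the list does. */
void addPointToList(struct List* list)
{
    if (list->count >= LIST_CAP) return;
    Point p = new(Point, list->mem);
    if (p != NULL) list->items[list->count++] = p;
}

/* Returns 1 if each point landed in its own list's memory. */
int mem_demo(void)
{
    struct bump_arena ar = { malloc(64 * sizeof(uintptr_t)), 0, 64 };
    if (ar.base == NULL) return -1;

    struct List heap_list  = { { heap_alloc, NULL }, { NULL }, 0 };
    struct List arena_list = { { bump_alloc, &ar }, { NULL }, 0 };

    addPointToList(&heap_list);
    addPointToList(&arena_list);

    int ok = heap_list.count == 1 && arena_list.count == 1
          && heap_list.items[0]->mem.alloc == heap_alloc
          && (uintptr_t*) arena_list.items[0] == ar.base;

    free(heap_list.items[0]);  /* heap strategy: freed individually here */
    free(ar.base);             /* arena strategy: one free for everything */
    return ok;
}
```

Note that addPointToList never names a strategy; it only forwards whatever the list carries, which is the point of the polymorphism.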

I hope this makes sense. :)

Other possible memory strategies include: unsafe, which just mallocs and never frees (possibly useful for global variables); malloc, which mallocs and is not allowed to escape its scope (because it's freed at the end of the scope); and stack, which allocates on the stack instead of the heap and is also not allowed to escape. I've written more about my thoughts here.

A full example with points and lists can be found in this gist.

Notes

  1. The #__C__ marker is stripped with sed before the code is compiled with gcc.