The why

Writing mocks, stubs, and spies is a total PITA, and I’m looking for new ways to avoid it. This is one possible concept, with a couple of other benefits (and some drawbacks).

(Obviously I do not recommend 100% test coverage; this is just to prove a point. Your test coverage should be defined by your risk analysis.)

Intro

A pipeline1 is a certain design pattern to deal with processes where each output becomes the input for the next process step, like:

input -> f -> g -> h -> output

Many, many things are implicit pipelines in web development, so you’d think it’d be a more established pattern.
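As a minimal sketch of the idea, using only plain functions: the helper name runSteps is hypothetical, and the three built-in string functions just stand in for f, g and h.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: feeds $input through each step, left to right.
function runSteps(mixed $input, callable ...$steps): mixed
{
    foreach ($steps as $step) {
        // The output of one step becomes the input of the next.
        $input = $step($input);
    }
    return $input;
}

// input -> trim -> strtoupper -> strrev -> output
$output = runSteps(
    '  hello world  ',
    trim(...),
    strtoupper(...),
    strrev(...)
);
```

The first-class callable syntax (trim(...)) needs PHP 8.1 or later.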

The middleware pattern2 is a pipeline of a sort, but its “big” design limits its applicability, especially when it comes to eradicating mock code in tests.

Also note that the pipeline design pattern is not the same thing as the pipe operator: |>3. The pipe operator is type-safe but cannot be configured the same way a pipe object can, as I will demonstrate below.

A pipeline class

All IO is put into invokable Effect classes. Effects are categorized as Read or Write, and possibly Rand or even Exception. A database query would implement the Read or Write interface depending on whether it’s a select or an update/insert. Plugin events could be effects too, opening up for customization (and spaghetti code…).
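One way such effect classes could look, as a sketch only: the interface names follow the text above, but their exact shape, and the FilePutContents class, are my assumptions here and not taken from any real library.

```php
<?php
declare(strict_types=1);

// Assumed shape: every effect is an invokable object taking one payload.
interface Effect
{
    public function __invoke(mixed $arg): mixed;
}

// Marker interfaces so a pipeline can tell reads from writes.
interface Read extends Effect {}
interface Write extends Effect {}

// Wraps file_get_contents() in an invokable read effect.
final class FileGetContents implements Read
{
    public function __invoke(mixed $arg): mixed
    {
        $content = file_get_contents((string) $arg);
        if ($content === false) {
            throw new RuntimeException("Could not read $arg");
        }
        return $content;
    }
}

// Hypothetical write effect: stores the payload and passes it on unchanged.
final class FilePutContents implements Write
{
    public function __construct(private string $path) {}

    public function __invoke(mixed $arg): mixed
    {
        if (file_put_contents($this->path, (string) $arg) === false) {
            throw new RuntimeException("Could not write {$this->path}");
        }
        return $arg;
    }
}
```

Because each effect is an object with its own class, a pipeline can inspect the class or interface of a step before deciding whether to actually run it.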

Adapting

Example use-case

As an example, fetch data from a URL and process it:

URL -> fetch -> html2markdown -> first_paragraph

Or in PHP:

$result = pipe(             // pipe() is a helper function that returns a pipeline object
    new FileGetContents(),  // file_get_contents is wrapped in an invokable class
    htmlToMarkdown(...),    // Using the League\HTMLToMarkdown library
    firstText(...)          // Just a substring call
)
    ->from($url)            // ->from() defines the start value of the pipe
    ->run();                // Runs the pipeline

To test this piece of code, we need to mock out FileGetContents to return different test values instead. But, since replacing IO effects is supported by the pipeline class already, it’s enough for us to do:

$result = pipe(
    new FileGetContents(),
    htmlToMarkdown(...),
    firstText(...)
)
    // The magic part: all side-effects can be lifted out with a simple method call.
    ->replaceEffect('Query\Effects\FileGetContents', $dummyContent)
    ->from($dummyUrl)
    ->run();
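For the curious, here is a rough sketch of how replaceEffect could work internally. It is not the implementation from the linked repository: the Pipeline class below, its properties and the matching on class name are all assumptions, and strip_tags and trim stand in for htmlToMarkdown and firstText so the example runs without any library.

```php
<?php
declare(strict_types=1);

final class FileGetContents
{
    public function __invoke(string $url): string
    {
        $content = file_get_contents($url);
        if ($content === false) {
            throw new RuntimeException("Could not read $url");
        }
        return $content;
    }
}

// Hypothetical minimal pipeline class.
final class Pipeline
{
    private array $steps;
    private array $replacements = [];   // canned results keyed by effect class name
    private mixed $start = null;

    public function __construct(callable ...$steps)
    {
        $this->steps = $steps;
    }

    public function from(mixed $start): self
    {
        $this->start = $start;
        return $this;
    }

    public function replaceEffect(string $class, mixed $result): self
    {
        $this->replacements[$class] = $result;
        return $this;
    }

    public function run(): mixed
    {
        $payload = $this->start;
        foreach ($this->steps as $step) {
            $class = is_object($step) ? get_class($step) : null;
            if ($class !== null && array_key_exists($class, $this->replacements)) {
                // Skip the real effect and hand the canned result to the next step.
                $payload = $this->replacements[$class];
            } else {
                $payload = $step($payload);
            }
        }
        return $payload;
    }
}

function pipe(callable ...$steps): Pipeline
{
    return new Pipeline(...$steps);
}

$result = pipe(
    new FileGetContents(),
    strip_tags(...),
    trim(...)
)
    ->replaceEffect(FileGetContents::class, '<p> Hello </p>')
    ->from('https://example.invalid/')   // never fetched, the effect is replaced
    ->run();
```

The point of the design is that the test never touches the network and no mocking library is involved; the pipeline simply refuses to invoke the step it was told to replace.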

Since functions return pipelines without running them, the effects are deferred until the calling code runs the pipeline.

function getUrl(string $url): Pipeline  // Only thing missing here is the generic notation Pipeline<string>
{
    return pipe(
        new FileGetContents(),
        htmlToMarkdown(...),
        firstText(...)
    )->from($url);
}
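To make the deferral visible, here is a small sketch with a counting stand-in effect. Pipeline, CountingRead and getContent are hypothetical names for this illustration, and the effect only builds a string instead of doing real IO.

```php
<?php
declare(strict_types=1);

// Hypothetical minimal pipeline: stores the steps, runs nothing until run().
final class Pipeline
{
    private array $steps;
    private mixed $start = null;

    public function __construct(callable ...$steps)
    {
        $this->steps = $steps;
    }

    public function from(mixed $start): self
    {
        $this->start = $start;
        return $this;
    }

    public function run(): mixed
    {
        return array_reduce(
            $this->steps,
            fn (mixed $payload, callable $step): mixed => $step($payload),
            $this->start
        );
    }
}

// Stand-in effect that records how often it was invoked.
final class CountingRead
{
    public int $calls = 0;

    public function __invoke(string $key): string
    {
        $this->calls++;
        return "<p>content for $key</p>";
    }
}

// Builds the pipeline but does not run it.
function getContent(CountingRead $read, string $key): Pipeline
{
    return (new Pipeline($read, strip_tags(...)))->from($key);
}

$read = new CountingRead();
$pipeline = getContent($read, 'abc');
$callsBeforeRun = $read->calls;   // still 0: nothing has executed yet
$result = $pipeline->run();
$callsAfterRun = $read->calls;    // the effect ran exactly once
```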

Other benefits

Some natural benefits occur when structuring your code as pipelines:

  1. You can easily cache side-effects using a Cache effect class
  2. You can easily fork when your input is a bigger array

The following code forks into two processes4 and also caches the result from FileGetContents:

$result = pipe(
    // The file read effect is wrapped in a cache effect
    new Cache(new FileGetContents()),
    htmlToMarkdown(...),
    firstText(...)
)
    // At-your-fingertips parallelism
    ->fork(2)
    // The Cache effect class uses the injected cache
    ->setCache($cache)
    // Using map() here; foreach() or fold() are other possible iterations
    ->map($urls);

The same technique can be used to replace the cache effect as above, using replaceEffect.
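A Cache effect along these lines could be sketched as a wrapper around another invokable. Everything here is an assumption for illustration: the ArrayStore class stands in for a real cache backend, SlowLookup stands in for FileGetContents, and setCache is called directly on the effect rather than through the pipeline.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory store standing in for a real cache backend.
final class ArrayStore
{
    private array $items = [];

    public function has(string $key): bool
    {
        return array_key_exists($key, $this->items);
    }

    public function get(string $key): mixed
    {
        return $this->items[$key];
    }

    public function set(string $key, mixed $value): void
    {
        $this->items[$key] = $value;
    }
}

// Wraps any invokable effect and memoizes its result per argument.
final class Cache
{
    private ?ArrayStore $store = null;
    /** @var callable */
    private $inner;

    public function __construct(callable $inner)
    {
        $this->inner = $inner;
    }

    public function setCache(ArrayStore $store): void
    {
        $this->store = $store;
    }

    public function __invoke(mixed $arg): mixed
    {
        if ($this->store === null) {
            // No cache injected: behave like the wrapped effect.
            return ($this->inner)($arg);
        }
        $key = get_debug_type($this->inner) . ':' . md5(serialize($arg));
        if (!$this->store->has($key)) {
            $this->store->set($key, ($this->inner)($arg));
        }
        return $this->store->get($key);
    }
}

// Counting effect to show that the second call is served from the cache.
final class SlowLookup
{
    public int $calls = 0;

    public function __invoke(string $key): string
    {
        $this->calls++;
        return strtoupper($key);
    }
}

$lookup = new SlowLookup();
$cached = new Cache($lookup);
$cached->setCache(new ArrayStore());

$first = $cached('abc');
$second = $cached('abc');   // same argument, so the wrapped effect is not called again
```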

The pipeline can recursively run pipelines returned by any of the processing steps. In this way, a computer program is structured like a tree of read-process-write pipelines5, and nothing gets executed until the top layer calls run; you separate - at least to a higher degree - what to do from how to do it6.
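The recursive part can be sketched in a few lines. Again, this Pipeline class is a hypothetical minimal version, and shout is just an example step that returns a pipeline instead of a value.

```php
<?php
declare(strict_types=1);

final class Pipeline
{
    private array $steps;
    private mixed $start = null;

    public function __construct(callable ...$steps)
    {
        $this->steps = $steps;
    }

    public function from(mixed $start): self
    {
        $this->start = $start;
        return $this;
    }

    public function run(): mixed
    {
        $payload = $this->start;
        foreach ($this->steps as $step) {
            $payload = $step($payload);
            // A step may return another pipeline instead of a value; run it in place.
            while ($payload instanceof self) {
                $payload = $payload->run();
            }
        }
        return $payload;
    }
}

function pipe(callable ...$steps): Pipeline
{
    return new Pipeline(...$steps);
}

// Returns a pipeline; nothing is executed here.
function shout(string $text): Pipeline
{
    return pipe(trim(...), strtoupper(...))->from($text);
}

$result = pipe(
    shout(...),                          // inner pipeline, run by the outer one
    fn (string $s): string => $s . '!'
)
    ->from('  hi  ')
    ->run();
```

Only the outermost run() call triggers any work, which is what makes the tree-of-pipelines structure possible.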

Drawbacks

There are some drawbacks with this approach of course.

  • Performance might take a hit if you replace normal function calls with invokable classes. A compiled language might deal with this better than PHP.
  • Type-safety is reduced. Instead of compile-time errors for passing the wrong argument around, the pipeline will throw runtime exceptions.
  • The lack of generic notation in PHP obfuscates the return values of functions. string becomes Pipeline as a return type, but what we want is Pipeline<string>.
  • Implicit glue also means the pipeline payload is implicit, which can obfuscate the code a bit. Compare with Forth and its implicit stacks.
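The type-safety drawback in the list above is easy to reproduce. In this sketch (runSteps is a hypothetical helper, not part of any library) the second step expects a string but receives an array, and nothing complains until the steps are actually run.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: feeds $input through each step, left to right.
function runSteps(mixed $input, callable ...$steps): mixed
{
    foreach ($steps as $step) {
        $input = $step($input);
    }
    return $input;
}

$error = null;
try {
    runSteps(
        'a,b,c',
        fn (string $s): array => explode(',', $s),   // returns an array...
        strtoupper(...)                              // ...but this step expects a string
    );
} catch (TypeError $e) {
    // Only discovered at run time, when the mismatched step is invoked.
    $error = $e->getMessage();
}
```

A static analyser might catch some of these mismatches, but the glue between the steps itself carries no type information.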

Going forward

It would be nice to have a programming language with native support for pipelines and effects. I was considering whether Forth could be transpiled to PHP, or perhaps used as a meta-language to generate PHP code, assuming certain design principles. A Phorth, if you will.

Other resources

I made a simple application to query Google from the command line that uses this pipeline-oriented design: https://github.com/olleharstedt/query/blob/main/query.php

Footnotes