PlanOut: A language for online experiments

December 12, 2014 by Eytan Bakshy


Earlier this year, we released a reference implementation of PlanOut to provide a detailed blueprint for how we design and deploy complex online experiments at Facebook and make it easy for first-time experimenters to get up and running. This framework was designed from scratch so that any PlanOut experiment could be written entirely in Python.
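To give a sense of what that looks like, here is a minimal sketch of an experiment written against the reference implementation's SimpleExperiment class. The class and operator names come from the open-source Python package; ButtonExperiment and its parameters are a toy example:

from planout.experiment import SimpleExperiment
from planout.ops.random import UniformChoice, BernoulliTrial

class ButtonExperiment(SimpleExperiment):
    def assign(self, params, userid):
        # Assignment is deterministic: the same userid always
        # hashes to the same parameter values.
        params.button_text = UniformChoice(choices=['Purchase', 'Buy'], unit=userid)
        params.has_discount = BernoulliTrial(p=0.3, unit=userid)

exp = ButtonExperiment(userid=42)
print(exp.get('button_text'))  # e.g., 'Buy'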

When we first started developing PlanOut, we intended for it to be a programming language optimized for designing experiments. Instead of writing error-prone, difficult-to-test hashing procedures in PHP, we could instead express our designs in terms of a language that is natural enough to work with on a whiteboard or include in an academic paper. This language would then be compiled down into a serialized format that could later be executed by an interpreter.

Today, we are officially announcing the release of PlanOut 0.5, which includes a React-based PlanOut language editor and brings the interpreter to feature parity with the latest version of PlanOut we use internally at Facebook. This includes tighter integration with the Experiment classes, a safer runtime environment, improved support for testing experiments, the ability to programmatically disable logging, and a rewrite of the interpreter that makes it easier to port to other languages, including strongly typed languages.

Why a new language?

There are many advantages to having a language and interpreter for online experiments. These include:

Bridging the gap between ideation and implementation. By having our own language, we can have a sort of lingua franca for describing experiments that is accessible not only to engineers, but also to researchers, data scientists, and managers. In fact, this language is so parsimonious that many of us on the Facebook Data Science team often whiteboard out experimental designs in PlanOut code during brainstorming sessions and meetings.

Safe execution. Representing experiments in terms of a language with a limited number of operations makes it much less likely that one makes a mistake. PlanOut has a very primitive syntax and no computationally expensive built-in functions, and it does not allow developers to define their own functions (at least, within the script itself), use loops, or make arbitrary calls to functions that could behave erratically. In this way, PlanOut scripts are safe, and when they do have problems, there are only a limited number of ways things could have gone wrong. And because PlanOut is so basic, the interpreter is easy to port to other languages.

Serializable. The real power of the PlanOut language is that it can be compiled into a JSON representation that can be read by a simple, portable interpreter. This is key to scaling experimentation.

For example, the following PlanOut language experiment,

button_text = uniformChoice(choices=['Purchase', 'Buy'], unit=userid);
has_discount = bernoulliTrial(p=0.3, unit=userid);

can be compiled down to the following JSON blob:

{
  "op": "seq",
  "seq": [
    {
      "op": "set",
      "var": "button_text",
      "value": {
        "choices": {
          "op": "array",
          "values": [
            "Purchase",
            "Buy"
          ]
        },
        "unit": {
          "op": "get",
          "var": "userid"
        },
        "op": "uniformChoice"
      }
    },
    {
      "op": "set",
      "var": "has_discount",
      "value": {
        "p": 0.3,
        "unit": {
          "op": "get",
          "var": "userid"
        },
        "op": "bernoulliTrial"
      }
    }
  ]
}

This code can then be stored in a centralized system that manages simultaneous and follow-on experiments (e.g., via namespaces), and executed on multiple platforms.
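As a sketch of what execution looks like, the serialized code above can be fed to the Interpreter class from the open-source Python package; the file name, salt, and userid below are made up for this example:

import json
from planout.interpreter import Interpreter

# 'button_experiment.json' holds the JSON blob shown above (hypothetical file)
with open('button_experiment.json') as f:
    serialized_code = json.load(f)

# arguments: serialized code, experiment salt, and a dict of input units
interpreter = Interpreter(serialized_code, 'button_experiment_salt', {'userid': 42})
print(interpreter.get_params())  # e.g., {'button_text': 'Buy', 'has_discount': 0}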

Developer friendly. Serialized PlanOut code is easy to parse, making it possible to infer which variables are inputs and which parameters rely on external services. This makes it possible to construct highly interactive editors that understand how you'd like to test your code. And because PlanOut runs under an interpreter, it is possible to "freeze" (override) certain parameter values to do testing. This allows you to test complex experimental designs without having to enter random userids until you find the right combination that triggers the designed logic.
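In the Python implementation, freezing works through overrides on the experiment object. Continuing the hypothetical ButtonExperiment sketch from earlier, and using the set_overrides method from the open-source package:

exp = ButtonExperiment(userid=42)
exp.set_overrides({'has_discount': 1})
print(exp.get('has_discount'))  # always 1 while the override is in place
print(exp.get('button_text'))   # still randomized as usual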

PlanOut Editor

PlanOut 0.5 includes a PlanOut editor, built on React and Flux. The PlanOut editor lets you interactively write PlanOut scripts and immediately see what your experiment is doing. As you type, you will find that undefined variables appear as inputs in the tester panel on the right-hand side. Developers can modify the values of these input variables within a Playground, and immediately see how these units get mapped to parameters. Playgrounds also give you the ability to test your code through the use of overrides, which "freeze" parameters so that they do not change during the execution of your script.

[Screenshot: the PlanOut editor]

The screenshot above shows a hypothetical experiment that assigns each sourceid a random number between 0.0 and 1.0 and stores that value in the parameter prob_collapse. The next line reads this parameter and performs a Bernoulli trial, setting collapse to 1 with probability prob_collapse, and 0 otherwise. By setting prob_collapse in the override to a high value like 0.9, the developer can see for herself that inputs are more likely to map to a collapse value of 1.
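Expressed through the Python API, the experiment in the screenshot would look roughly like the sketch below; CollapseExperiment is a hypothetical name, and RandomFloat and BernoulliTrial are operators from the open-source package:

from planout.experiment import SimpleExperiment
from planout.ops.random import RandomFloat, BernoulliTrial

class CollapseExperiment(SimpleExperiment):
    def assign(self, params, sourceid):
        # each sourceid deterministically maps to a float in [0.0, 1.0]
        params.prob_collapse = RandomFloat(min=0.0, max=1.0, unit=sourceid)
        # collapse is 1 with probability prob_collapse, 0 otherwise
        params.collapse = BernoulliTrial(p=params.prob_collapse, unit=sourceid)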

Finally, because PlanOut scripts can be used to launch live experiments, we have built in the ability to add unit tests. These work by letting users enter inputs and overrides, and list assertions about which values parameters are expected to be assigned. This ensures that the experiment executes as expected, and that any changes to an experiment's definition do not break the expected randomization. Because overrides maintain their value throughout the execution, they can act as a sort of mock in cases where your experiment depends on a custom operator that interfaces with an external service.
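Outside the editor, the same idea can be expressed as an ordinary Python unit test. A sketch, reusing the hypothetical CollapseExperiment class from the snippet above:

import unittest

class TestCollapseExperiment(unittest.TestCase):
    def test_override_freezes_prob_collapse(self):
        exp = CollapseExperiment(sourceid=7)
        # freeze prob_collapse; the override acts like a mock
        exp.set_overrides({'prob_collapse': 0.9})
        self.assertEqual(exp.get('prob_collapse'), 0.9)
        self.assertIn(exp.get('collapse'), (0, 1))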

The PlanOut editor is available as part of the PlanOut GitHub repository, and we encourage developers to fork it or contribute changes. Ports of PlanOut to Go, PHP, and Hack will be available in early 2015. If you are interested in porting PlanOut to other languages, please feel free to reach out to us.

Big experiments: Big data’s friend for making decisions

April 3, 2014 by Eytan Bakshy, Dean Eckles, Michael S. Bernstein


When people think about the tools of data science, they often focus on machine learning, statistics, and data manipulation. Modeling massive datasets is indispensable for making predictions – like predicting which set of News Feed stories or search results are most relevant to people. But such models are limited in their ability to uncover the cause-and-effect relationships that lead to building better products and to advancing the behavioral sciences.

On the Data Science Team, part of our job is to inform strategic and product decisions. Does a new feature we are testing improve communication? Does having more friends on Facebook increase the value people get out of the service? While a correlation between variables may suggest a particular causal relationship, it is hard to use such data to credibly answer many of these questions because of difficult-to-adjust-for confounding factors. Furthermore, when you change the rules of the game – like launching a completely new feature – it's often impossible to have any data at all to anticipate the effects of a future change.

Because of this, data scientists, engineers, and managers turn to randomized experiments, which are commonly referred to as "A/B tests". Typically, A/B tests are used as a kind of "bake-off" between proposed alternatives. But experiments can also go beyond bake-offs and be used to develop generalizable knowledge that is valuable throughout the design process.

Data science needs better tools for running experiments. Despite the abundance of experimental practices in the Internet industry, there are few tools or standard practices for running online field experiments. And existing tools tend to focus on rolling out new features, or automatically optimizing some outcome of interest.

At Facebook, we run over a thousand experiments each day. While many of these experiments are designed to optimize specific outcomes, others aim to inform long-term design decisions. And because we run so many experiments, we need reliable ways of routinizing experimentation. As Ronald Fisher, a pioneer in statistics and experimental design, said, "To consult the statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination. He can perhaps say what the experiment died of." Many online experiments are implemented by engineers who are not trained statisticians. While experiments are often simple to analyze when done correctly, it can be surprisingly easy to make mistakes in their design, implementation, logging, and analysis. One way to consult a statistician in advance is to have their advice built into tools for running experiments.

PlanOut: a framework for running online field experiments

Good tools not only enable good practices, they encourage them. That’s why we created PlanOut, a set of tools for running online field experiments, and are sharing an open source version of it as part of the Data Science Team’s first software release.

Importantly, PlanOut gives engineers and scientists a language for defining random assignment procedures. Experiments, ranging from simple A/B tests, to factorial designs that decompose large interface changes, to more complex within-subjects designs, can be expressed with only a few lines of code. In this way, PlanOut encourages running experiments that are more akin to the kind you see in the behavioral sciences.
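As a sketch of what a small factorial design looks like in the Python implementation, the hypothetical experiment below decomposes an interface change into two independently randomized factors (the class name, parameters, and choices are invented for illustration):

from planout.experiment import SimpleExperiment
from planout.ops.random import UniformChoice

class PromotionExperiment(SimpleExperiment):
    def assign(self, params, userid):
        # A 2x2 factorial design: each factor is randomized independently,
        # so users are assigned to all four combinations of color and text.
        params.button_color = UniformChoice(choices=['blue', 'green'], unit=userid)
        params.button_text = UniformChoice(choices=['Sign up', 'Join now'], unit=userid)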

Logging can also be a pain point for many experiments. Logging is often considered separate from the randomization process, which can make it difficult to keep track of, or even define, exactly which units (e.g., user accounts) are in your experiments. This is especially problematic if engineers change experiments by adding or removing treatments midway through the experiment. PlanOut helps reduce these kinds of errors by automatically logging exposures, and by providing a way to tie outcomes to experiments. Finally, this kind of exposure logging increases the precision of experiments, which lowers the risk of false negatives ("Type II errors").
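In the reference implementation, exposure logging happens automatically on first parameter access, and can be turned off programmatically. A sketch, reusing the hypothetical ButtonExperiment class from the earlier post (set_auto_exposure_logging is a method from the open-source package):

exp = ButtonExperiment(userid=42)
text = exp.get('button_text')  # first access automatically logs an exposure

exp = ButtonExperiment(userid=43)
exp.set_auto_exposure_logging(False)  # e.g., for a dry run
text = exp.get('button_text')         # no exposure is logged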

A single experiment is rarely definitive. Instead, follow-on experiments might need to be run as development of a new product takes place, or as decision-makers evaluate the effects of a major product rollout. While it is fairly straightforward to run a single experiment, there are few established protocols for how follow-on experiments should be run, and how data for these experiments should be logged.  PlanOut includes a management system that organizes experiments, and the parameters they manipulate, into namespaces. This allows distributed teams to work on related features, and launch follow-on experiments in a way that minimizes threats to validity.
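Here is a rough sketch of how such a namespace might be set up with the Python implementation's SimpleNamespace class; the namespace and experiment names are invented, and segment counts control what fraction of the primary unit each experiment receives:

from planout.namespace import SimpleNamespace
from planout.experiment import DefaultExperiment

class DefaultButtonExperiment(DefaultExperiment):
    # parameter values served to users not allocated to any experiment
    def get_default_params(self):
        return {'button_text': 'Purchase'}

class ButtonNamespace(SimpleNamespace):
    def setup(self):
        self.name = 'button_namespace'   # hypothetical namespace name
        self.primary_unit = 'userid'
        self.num_segments = 100
        self.default_experiment_class = DefaultButtonExperiment

    def setup_experiments(self):
        # allocate 30 of the 100 segments to a follow-on experiment,
        # reusing the hypothetical ButtonExperiment class from above
        self.add_experiment('button_test_v2', ButtonExperiment, 30)

ns = ButtonNamespace(userid=42)
print(ns.get('button_text'))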

Open sourcing the code

The code for PlanOut is on GitHub. We welcome developers and researchers in the open source community who are interested in contributing directly to the original repository, or by forking it.

You can read more about PlanOut in the paper we will be presenting at this year's ACM International Conference on the World Wide Web (WWW 2014): Designing and Deploying Online Field Experiments. Eytan Bakshy, Dean Eckles, Michael Bernstein. Proceedings of WWW 2014.