Quasilinear Research is pioneering a new way of programming.

By writing code for a machine that inherently knows how to make intelligent choices, programmers gain access to powerful artificial intelligence from everyday programming languages.

We provide a probabilistic extension to the RISC-V Instruction Set Architecture that offers new machine instructions for expressing nondeterministic/probabilistic choices under constraints/observations. Software programs exploiting these new instructions can express complex search and inference problems without committing to a particular search algorithm. The probabilistic semantics are available to high-level programming languages such as C/C++ and Python.

Generate exercises for adaptive digital education that are guaranteed to practice certain skills. Generate puzzles, maps, and scenarios for games that are guaranteed to exercise key game mechanics or drive narrative progression. Recover the most likely solutions and enumerate misconceptions. Use statistical user models in simulation to sample behaviors, enabling analytics-driven design prior to deployment.

The radically new machine instructions are implemented, not in expensive physical hardware, but in software simulators that apply powerful combinatorial search and optimization algorithms behind the scenes. These systems implement advanced techniques from formal verification and probabilistic inference to sample feasible executions with strong assurances.

Suppose we want to generate simple linear equations to be included in
a student's mathematics worksheet. We'll generate equations of the form
*a x + b = c* where `a` and `x` are small positive
integers and `b` and `c` are positive and possibly larger.

Let's use the C interface to implement our equation generator. The
abstraction it provides is a random number generator with perfect foresight —
it will never return values that lead to trouble later on.
We'll ask it to uniformly sample values for `a`,
`x`, `b`, and `c` filtered by the condition that the
equation actually holds. Simply sampling random integer coefficients without
constraint doesn't work because the equation is very likely to have a fractional solution
(e.g., `2*x + 1 == 2` implies `x == 0.5`), a situation that might
not be appropriate for students at their current level of knowledge.

```c
#include <stdio.h>

/* The oracle_* intrinsics below are provided by the probabilistic
 * ISA's C interface. */

int main() {
  if (oracle_acquire()) {                // Fork alternative executions here.
    int a = oracle_uniform_int(1, 10);   // Draw integer values from the
    int x = oracle_uniform_int(1, 10);   // bounded uniform distribution.
    int b = oracle_uniform_int(1, 100);
    int c = oracle_uniform_int(1, 100);
    oracle_ensure(a*x + b == c);  // Prevent executions where condition does not hold.
    oracle_release();             // Mark this execution as successful.
    printf("[a=%d]*[x=%d] + [b=%d] == [c=%d]\n", a, x, b, c);
  } else {
    printf("no solutions!\n");
  }
}
```

Here's a look at how the C compiler encodes the program above. Because our probabilistic extension follows the RISC-V extension guidelines, existing utilities such as compilers, debuggers, profilers, and disassemblers can read and write probabilistic executables without any modifications. The probabilistic instructions appear as `custom0` encodings, annotated with the intrinsics they implement. At this level, it starts to feel like we really have an Infinite Improbability Drive integrated as a physical co-processor.

```
0001a434 <main>:
   1a434: 0000778b  custom0 a5,zero,zero,0    # oracle_acquire
   1a438: 00079863  bnez    a5,1a448 <main+0x14>
   1a43c: 0001b537  lui     a0,0x1b
   1a440: a4450513  addi    a0,a0,-1468       # 1aa44 <__clz_tab+0x124>
   1a444: f9df506f  j       103e0 <puts>
   1a448: 00100793  li      a5,1
   1a44c: 00a00613  li      a2,10
   1a450: 02c7f58b  custom0 a1,a5,a2,1        # oracle_uniform_int
   1a454: 02c7f60b  custom0 a2,a5,a2,1        # oracle_uniform_int
   1a458: 06400713  li      a4,100
   1a45c: 02e7f68b  custom0 a3,a5,a4,1        # oracle_uniform_int
   1a460: 02e7f70b  custom0 a4,a5,a4,1        # oracle_uniform_int
   1a464: 02c587b3  mul     a5,a1,a2
   1a468: 00f687b3  add     a5,a3,a5
   1a46c: 40f707b3  sub     a5,a4,a5
   1a470: 0017b793  seqz    a5,a5
   1a474: 0407f00b  custom0 zero,a5,zero,2    # oracle_ensure
   1a478: 0600700b  custom0 zero,zero,zero,3  # oracle_release
   1a47c: 0001b537  lui     a0,0x1b
   1a480: a2050513  addi    a0,a0,-1504       # 1aa20 <__clz_tab+0x100>
   1a484: e55f506f  j       102d8 <printf>
```

The compiled program executes normally until the first special instruction is reached.
At this point, backend systems are allowed to examine the processor state and the contents of memory. Different backends work
in different ways, but all produce sequences of data to be returned
through calls like `oracle_uniform_int` that witness a safe execution path —
one that reaches `oracle_release` without failing any `oracle_ensure` condition.
Once such witness values have been found, execution continues normally with full
programmer observability.

From the end-user's (programmer's) perspective, the program simply outputs valid results, drawing a new sample each time:

```
$ riscv64-unknown-elf-gcc -O3 example.c -o example
$ file example
example: ELF 64-bit LSB executable, version 1 (SYSV), statically linked, not stripped
$ spike $INFERENCE_CONFIG pk ./example
[a=2]*[x=6] + [b=61] == [c=73]
$ spike $INFERENCE_CONFIG pk ./example
[a=9]*[x=6] + [b=14] == [c=68]
$ spike $INFERENCE_CONFIG pk ./example
[a=4]*[x=3] + [b=77] == [c=89]
```

Let's consider two backend systems that are up to the task of sampling
feasible executions. The first, based on Particle Filtering,
is easier to understand because its Rejection Sampling
approach mimics how we, as humans, might try to draw samples: by simply making
many guesses. Upon reaching
`oracle_acquire`, we'll ask the operating system to `fork` a copy of the
currently running process and set a different random seed. When the forked process
needs to execute `oracle_uniform_int`, it simply uses existing random
number generators. If the `oracle_ensure` check fails, the forked process
exits without yielding a sample. If `oracle_release` is successfully encountered, the
forked process relays the sequence of random values it used to the master
process and then exits as well. Eventually, the master process will
get the data it needs to safely
continue execution where the programmer might be watching it in a debugger or
waiting to capture its output in a file, and samples will have been drawn with
the correct probability.

A refined variant of this approach (Particle Cascade) powers tools like Probabilistic
C. A backend system based on this approach will be very fast to start up and
will quickly return solutions if the search problem is *easy*: when a significant
fraction of the potential executions are feasible (`oracle_ensure`
checks pass). For *hard* problems, where
the chance of hitting a feasible execution via forward sampling is
astronomically low, an approach that can exploit the deep structure of the search problem is necessary.

In the second approach, we'll use formal verification techniques to seek out those extremely rare solutions. Software model checkers, such as CBMC, work by transforming the program into a constraint satisfaction problem. High-performance solvers for these problems then either find a satisfying solution (the ingredients of a feasible execution trace) or formally prove that no solutions exist. In exchange for longer start-up times, these approaches provide the best available performance on hard search problems. Care is needed to ensure that the solutions resulting from this process arise with the correct distribution; however, recent research on hashing-based approximate probabilistic inference provides a solid theoretical foundation here. For resource-constrained environments that cannot host the solving process, this computation can be securely delegated to an external service in a new kind of cloud computing market.

Because our users will be expressing both easy and hard search problems
— what matters is that the problems are easy for *human programmers* to express in
high-level languages — Quasilinear Research provides several backend
inference systems that make different design trade-offs. Newly developed
backends will be compatible with existing compiled software that targets the
fixed Probabilistic ISA. Defining search problems at the level of machine
language allows third-party inference providers to exploit a white-box
specification of the search problem without requiring users to reveal their private
source code. Users may program in the high-level language of their choice, even
if that language is internal to their organization (presuming it can be compiled
for or interpreted by our probabilistic machine abstraction).

Quasilinear Research is developing a *probabilistic processor model*, defined
in the Chisel hardware
description language, to capture the execution of user-level software in
terms of the gate-level logic primitives understood by existing search and
inference backends. Depending on the type of work performed in the user's
probabilistic program, different probabilistic processor models will be
appropriate (e.g., including or excluding hardware support for complex math
operators, and varying the effective number of parallel hardware threads or
*harts*).