Can my code be accelerated by an RPU? Part 3: how to leverage the RPU

April 17, 2023

PART 3: Deciding how to leverage the RPU

In our previous post, we introduced Quside’s Randomness Processing Unit (RPU) and outlined how you can use it to accelerate your stochastic workloads, whatever type they may be. In this post, we look more closely at which kinds of workloads a device like the RPU accelerates most efficiently and how to make the most of it in each case. This way, you can take full advantage of the device’s computational capabilities, maximizing its impact on your run times, your energy footprint, or the accuracy of your results.

Out-of-the-box optimized generation of commonly-used probability distributions

By providing optimized generation of commonly used probability distributions (uniform, independent and coupled Gaussian, binomial) and of random walks, the RPU accelerates both the production and the required post-processing of high-quality random numbers for simulations and other applications, reducing the time needed to simulate and enabling you to explore a broader range of scenarios.
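As a rough point of reference, the sketch below shows what drawing these distributions looks like on the host with NumPy; the NumPy calls are a stand-in, not the RPU API. With an RPU, the generation and its post-processing would happen on the device, and the host would simply receive the finished samples.

```python
# CPU-side baseline using NumPy as a stand-in; on an RPU these draws and
# their post-processing would be produced on the device itself.
import numpy as np

rng = np.random.default_rng()
n = 1_000_000

uniform  = rng.random(n)                            # U(0, 1)
gaussian = rng.normal(loc=0.0, scale=1.0, size=n)   # independent N(0, 1)
binomial = rng.binomial(n=100, p=0.3, size=n)       # Binomial(100, 0.3)

# Coupled Gaussians: two correlated N(0, 1) streams via a 2x2 covariance.
cov = [[1.0, 0.8], [0.8, 1.0]]
coupled = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

# A simple 1D random walk: cumulative sum of +/-1 steps.
random_walk = np.cumsum(rng.choice([-1, 1], size=n))
```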

Custom hardware design

For particular types of stochastic and randomized algorithms, it may be possible to design custom hardware optimized for that specific algorithm. This algorithm- and workload-level optimization is a more expensive and time-consuming approach, but it can deliver a significant speedup over a generic hardware accelerator.
RPUs are primarily aimed at these kinds of codes: by offloading both the randomness generation and the surrounding logic, and by leveraging the platform’s reprogrammability, the RPU can deliver the best effectiveness, efficiency, and economy achievable for a given stochastic workload.
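To make “offloading the surrounding logic” concrete, here is a hedged host-side sketch of the kind of loop a custom design would absorb: a rejection sampler for a truncated Gaussian, written here in plain NumPy for illustration only. In a custom design, both the draws and the accept/reject logic would live on the device, so rejected samples never cross the bus.

```python
# Host-side baseline of the "surrounding logic" a custom design would absorb:
# a rejection sampler that both draws randomness and decides what to keep.
import numpy as np

def truncated_normal(n, lower, upper, rng):
    """Sample N(0, 1) restricted to [lower, upper] by rejection."""
    out = np.empty(0)
    while out.size < n:
        draws = rng.normal(size=2 * n)                     # randomness generation
        kept = draws[(draws >= lower) & (draws <= upper)]  # surrounding logic
        out = np.concatenate([out, kept])
    return out[:n]

rng = np.random.default_rng()
samples = truncated_normal(100_000, -1.0, 1.0, rng)
print(samples.mean(), samples.std())
```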

Specialized architectures

The RPU architecture and ecosystem are specially tailored to stochastic workloads: local access to a high-quality, high-speed quantum entropy source allows the deployment of specialized architectures that make the most of every bit of entropy.

Approximation techniques & low-precision arithmetic

Many stochastic and randomized algorithms involve calculations that are computationally expensive or difficult to perform precisely. Approximation techniques can speed up these calculations.

For example, you could use the RPU to perform Monte Carlo simulations using low-precision floating-point arithmetic, or even fixed-point arithmetic, which can be orders of magnitude faster than high-precision arithmetic while preserving similar accuracy in the result.
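As an illustration of the idea (on a CPU with NumPy, not on the RPU itself), the snippet below runs the same Monte Carlo estimate of pi in float64 and float32: the statistical error of the estimator, which scales as 1/sqrt(N), dominates long before the reduced precision does.

```python
# Same Monte Carlo estimate of pi in double and single precision.
# The Monte Carlo error (~1/sqrt(N)) dominates, so the low-precision
# result is essentially as accurate while being cheaper to compute,
# especially in hardware or with fixed-point arithmetic.
import numpy as np

def estimate_pi(n, dtype):
    rng = np.random.default_rng(seed=0)
    x = rng.random(n, dtype=dtype)
    y = rng.random(n, dtype=dtype)
    return 4.0 * np.mean(x * x + y * y < 1.0)

n = 10_000_000
print("float64:", estimate_pi(n, np.float64))
print("float32:", estimate_pi(n, np.float32))
```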

Pipelining

If your algorithm supports pipelining, i.e., overlapping the execution of its different stages, offloading those stages to an RPU can yield higher throughput and lower latency than running them on a device without pipelining support.
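Here is a hedged host-side sketch of the pattern, using Python threads and a queue rather than the RPU toolchain: one stage generates random batches while the next stage consumes them, so the two stages overlap instead of alternating. On the RPU, the stages would be hardware pipeline stages fed directly by the entropy source.

```python
# Two-stage pipeline: generation and processing overlap via a bounded queue.
import queue
import threading
import numpy as np

def generate(q, n_batches, batch_size):
    rng = np.random.default_rng()
    for _ in range(n_batches):
        q.put(rng.normal(size=batch_size))   # stage 1: generate a batch
    q.put(None)                              # sentinel: no more batches

def process(q, results):
    while True:
        batch = q.get()
        if batch is None:
            break
        results.append(batch.mean())         # stage 2: process the batch

q = queue.Queue(maxsize=4)                   # bounded queue keeps stages in step
results = []
producer = threading.Thread(target=generate, args=(q, 100, 100_000))
producer.start()
process(q, results)
producer.join()
print("mean of batch means:", sum(results) / len(results))
```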

Multi-level parallelism

Some stochastic and randomized algorithms can be optimized for multi-level parallelism, which involves using both coarse-grained parallelism (e.g., multiple cores) and fine-grained parallelism (e.g., vectorization or pipelining). This can lead to significant speedup compared to using just one type of parallelism. RPUs support both types: you can instantiate multiple cores within the RPU, each running a vectorized, pipelined version of your algorithm.
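A minimal CPU analogue of the two levels (again NumPy and multiprocessing as stand-ins, not the RPU API): processes provide the coarse-grained parallelism, and vectorized array operations provide the fine-grained parallelism inside each worker.

```python
# Coarse-grained parallelism: one process per chunk.
# Fine-grained parallelism: vectorized NumPy inside each worker.
import numpy as np
from multiprocessing import Pool

def hits_in_chunk(args):
    seed, n = args
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    y = rng.random(n)
    return int(np.count_nonzero(x * x + y * y < 1.0))  # vectorized inner loop

if __name__ == "__main__":
    n_workers, n_per_worker = 4, 2_500_000
    with Pool(n_workers) as pool:
        hits = pool.map(hits_in_chunk,
                        [(s, n_per_worker) for s in range(n_workers)])
    print("pi ~", 4.0 * sum(hits) / (n_workers * n_per_worker))
```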

Bit-level optimizations

The RPU makes the most of the available entropy, extracting every last bit it provides. Combined with the platform’s reprogrammability, this allows bit-level optimizations that produce the shortest and most efficient circuits for running your stochastic workloads.
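As a small host-side illustration of the bit-level point of view (in NumPy, not on the RPU): each 64-bit random word already contains 64 independent fair coin flips, so unpacking the bits uses all of the entropy in the word instead of spending a whole floating-point draw per Bernoulli(0.5) sample.

```python
# Each 64-bit random word yields 64 fair coin flips: unpack the bits
# instead of drawing one float per Bernoulli(0.5) sample.
import numpy as np

rng = np.random.default_rng()
n_words = 1_000

words = rng.integers(0, np.iinfo(np.uint64).max, size=n_words,
                     dtype=np.uint64, endpoint=True)
flips = np.unpackbits(words.view(np.uint8))   # 64 * n_words bits in total
print("flips:", flips.size, "fraction of ones:", flips.mean())
```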

In summary, an RPU is specially tailored to support many different kinds of optimizations. Knowing which parts of your algorithms should be improved, and by how much, is critical for successfully integrating the RPU into your production environment. We will cover this topic in a future post. Meanwhile, why not start by exploring the different Use Cases in which the RPU has already been tested?