Taraṅga

तरंग — wave

The GPU that
computes on
a wave.

Taranga builds StochastiCore: a complete, open, peer-revised graphics pipeline whose arithmetic rides probability instead of place-value. A number becomes the density of ones in a pulse train — and multiplication collapses to a single AND gate.

Honest by construction: stochastic computing is not a throughput or energy win per operation — and this site prints those numbers at full size, beside the ones where it shines.

THE VALUE — A WAVEITS STOCHASTIC ENCODING — DENSITY OF ONESRECOVERED BY COUNTING — WITH HONEST NOISE
Fig. 1 · live — computed with the RTL's LFSR semanticsmultiply two of these streams = one AND gate
1
logic gate per multiply
a single AND — synthesized
$80–150
complete working GPU
off-the-shelf parts, open tools
36.7 dB
render quality at L=1024
bit-accurate LFSR model
0.5%
fault crossover BER
SC degrades gracefully above it

The idea

Arithmetic, priced in single gates.

Binary arithmetic pays for precision with silicon: an 8-bit multiplier is ~320 gates, and accelerators are mostly oceans of them. Stochastic computing pays with time instead — encode values as random bitstreams and the laws of probability do the arithmetic: P(A∧B) = P(A)·P(B), so an AND gate is a multiplier.

That trade — gates for cycles — is wrong for a gaming GPU and exactly right for a different world: minimum-silicon displays, radiation-tolerant compute, voltage-overscaled edge devices, and a GPU cheap enough that a student can build one with their own hands.

OperationStochastiCoreConventional binary
Multiply1 gate
a single AND
~320 gates
8-bit array multiplier
Scaled add1 gate
a single MUX
~40 gates
8-bit adder
4-lane MAC7 cells
4 AND + 3 MUX
1,912 cells
synthesized binary MAC-4
tanh52 cells
5-state FSM
LUT/CORDIC, hundreds
typical implementations

Synthesized counts from the released Yosys flow — reproducible from the public RTL. The fair fine print: an SC result needs L cycles to read out (L = 256–65,536), which is the entire trade. How it works →

The honest line

Where the wave wins — and where it loses.

Wins — peer-reviewed
  • Area density: a full compute unit — 4 multipliers, MAC, tanh, lerp — in fewer cells than ONE binary MAC-4 (1,709 vs 1,912).
  • Marginal multiply = 1 gate: parallelism priced like wire, not silicon.
  • Graceful failure: above ~0.5% bit-error rate, SC compute out-degrades binary — a flipped bit is 1/L of an answer, never the MSB.
  • Free anti-banding: SC's noise is provably white — low-precision gradients render smooth without dithering hardware.
Losses — printed at full size
  • Not a throughput win: useful quality needs 29×–116× longer per operation; the binary baseline renders ~8× more frames.
  • Not an energy win: gate-level measurement says 71 fJ/op binary vs 17.6k–69k fJ/op stochastic. Our own earlier model was wrong; the revision says so.
  • Not a gaming GPU: the niche is minimum-silicon, reliability-critical, display-class compute — and we won't pretend otherwise.

The first product

A real GPU, for the price of a textbook.

The Taranga Dev Kit is the build guide made physical: a complete stochastic GPU on a $25–55 FPGA, driving a real VGA monitor, commanded over SPI by any microcontroller — assembled by you, understood by you, owned by you.

Finished-device specification
Display
320×240 @ 60 Hz (VGA, pixel-doubled to 640×480)
Color
RGB332 — 256 colors via an R-2R resistor DAC
Compute
4 parallel SC compute units (SIMD)
Precision
runtime knob: L = 256 (fast, noisy) → 65,536 (slow, near-exact)
Host
any microcontroller over SPI (Arduino, STM32)
Footprint
~2,400 LUTs — ~30% of a $25–55 iCE40 HX8K board
$80150complete bill of materials, everything from FPGA to VGA cable

Where it serves

Honest use cases.

All six, with their grounding →
Learning

Education & research kits

A GPU whose entire pipeline a student can read, synthesize, probe, and rebuild — multiplication is literally one AND gate you can point at. From Verilog to VGA glow in a weekend, on a fully open toolchain.

Space / industrial

Radiation & harsh-environment compute

Above ~0.5% bit-error rate, stochastic arithmetic degrades gracefully where binary corrupts catastrophically — a single flipped bit is 1/L of an answer, not the MSB. A candidate fabric for CubeSats, high-altitude, and high-EMI industrial settings.

Embedded

Minimum-silicon instrument displays

Gauges, medical monitors, and panel displays need smooth gradients, not teraflops. The SC fabric draws banding-free shading at 3-bit color — the dithering is free, by physics — in a corner of the cheapest FPGAs.

The Signal Lab

Multiply two numbers with one gate. Right now.

The Lab runs the same LFSR semantics as the RTL, live in your tab: watch streams converge, watch banding dissolve, and watch binary shatter under bit-flips while the wave bends.

Open the Signal Lab