It’s been a while since I’ve struggled with Xilinx tools, but I have to imagine there are still hardware limitations these days. Does this run on a Spartan-6, or do you need the latest UltraScale for it?
This fits and runs on a DE10-Nano without too much difficulty; it uses around 70% of the fabric. I've been working on timing closure and just got it to 50 MHz.
Note that I also implemented cache components not present in the original Voodoo, to be more flexible about what memory can be used. So it could be quite a bit smaller, maybe 50% of the fabric, if you got rid of that.
That's quite impressive. 70% is obviously way too big for a MiSTer core, but I wonder if one day we will have an affordable FPGA board able to simulate a late '90s PC...
FPGA simulations are a naive attempt to guess at metastability problems by finding a "steady state" latency after a certain amount of simulation time. Clock-domain-crossing mitigation only gets you so far, and state-propagation issues often get worse on larger, faster chips.
Note that there are oversized hobby Voodoo cards that max out the original ASIC count and memory limits. There are also emulators like 86Box that emulate the hardware just fine for old games.
Each gpu_* call emits SPIR-V and dispatches via Vulkan compute. Data stays resident in VRAM between calls; there are no round-trips to the CPU unless you need the result.
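One way to picture the "stays resident until you ask for it" behavior is a handle type that counts readbacks. This is a plain-Python sketch of the idea, not OctoFlow's real API; DeviceBuffer, to_cpu, and this gpu_add are all hypothetical names:

```python
# Sketch: results live on the "device" until the caller explicitly
# reads them back. All names here are hypothetical, not OctoFlow's API.
class DeviceBuffer:
    def __init__(self, data):
        self._data = list(data)   # stands in for a VRAM allocation
        self.readbacks = 0        # counts CPU round-trips

    def to_cpu(self):
        self.readbacks += 1       # the only point data leaves "VRAM"
        return list(self._data)

def gpu_add(a, b):
    # The dispatch happens device-side; no readback occurs here.
    return DeviceBuffer(x + y for x, y in zip(a._data, b._data))

a, b = DeviceBuffer([1, 2]), DeviceBuffer([3, 4])
c = gpu_add(gpu_add(a, b), b)     # two chained calls, zero readbacks
print(c.readbacks)                # 0 until to_cpu() is called
print(c.to_cpu())                 # [7, 10]
```

The point of the handle is that chaining calls composes entirely on the device; only the final to_cpu() pays the transfer cost.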
No thread_id is exposed. The runtime handles thread indexing internally: gpu_add(a, b) means "one thread per element, each does a[i] + b[i]." Workgroup sizing and dispatch dimensions are automatic.
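The implicit-indexing model can be mocked in a few lines of plain Python. This is only a sketch of the semantics described above (one conceptual thread per element, no visible index); the loop stands in for the GPU threads, and this gpu_add is a mock, not OctoFlow's implementation:

```python
# Sketch of implicit per-element indexing: the user never writes a
# thread index; the runtime conceptually runs one thread per element.
# (Mock implementation, not OctoFlow's actual runtime.)
def gpu_add(a, b):
    assert len(a) == len(b), "elementwise op requires equal lengths"
    # Each iteration stands in for one GPU thread computing a[i] + b[i].
    return [a[i] + b[i] for i in range(len(a))]

print(gpu_add([1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
```

The shape of the inputs fully determines the dispatch, which is why workgroup sizing can be automatic.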
The tradeoff: you can't write custom kernels with shared memory or warp-level ops. OctoFlow targets the 80% of GPU work that's embarrassingly parallel; for the other 20% you still want CUDA or Vulkan directly.
Or does this only run in simulation anyway?