Data movement, particularly access to main memory, has been the bottleneck of most computing problems, and ray tracing is no exception. We propose an unconventional solution that combines a ray ordering scheme that minimizes access to the scene data with a large on-chip buffer, spread over multiple chips, that acts as near-compute storage. We demonstrate the effectiveness of our approach by introducing Mach-RT (Many chip - Ray Tracing), a new hardware architecture for accelerating ray tracing. Extending the concept of dual streaming, we optimize main memory accesses to a level that allows the same memory system to service multiple processor chips at the same time. Although a multiple-chip solution might seem to imply increased energy consumption, the reduced memory traffic allows us to demonstrate performance increases while maintaining reasonable energy usage compared to academic and commercial architectures. This paper extends our previous work [1] with a design space exploration of the L3 cache size, a more detailed evaluation of energy and memory performance, a discussion of the energy-delay product, and a brief exploration of boards with 16 chips. We also introduce new treelet enqueueing logic for the predictive scheduler.
@article{Vasiou:2020:MachRT,
author = {Elena Vasiou and Konstantin Shkurko and Erik Brunvand and Cem Yuksel},
title = {Mach-RT: A Many Chip Architecture for High-Performance Ray Tracing},
journal = {IEEE Transactions on Visualization and Computer Graphics},
year = {2020},
numpages = {12},
doi = {10.1109/TVCG.2020.3021048},
issn = {1077-2626},
}
This material is based upon work supported by the National Science Foundation under grant no. 1409129. Scene data: Fairy Forest: U. Utah, Crytek Sponza: F. Meinl at Crytek and M. Dabrovic, Dragon: Stanford CG Lab., Vegetation: S. Laine, and San Miguel: G. Leal Laguno.