DTC-SpMM: Bridging the Gap in Accelerating General Sparse Matrix Multiplication with Tensor Cores

Published in Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024

We present DTC-SpMM, a general sparse matrix multiplication framework that effectively utilizes GPU Tensor Cores. DTC-SpMM bridges the gap between irregular sparsity patterns and dense tensor-core-friendly computation via novel data layouts and kernel designs, delivering substantial speedups over existing SpMM implementations across diverse sparse workloads.
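
For readers unfamiliar with the operation, SpMM computes C = A × B where A is sparse and B is dense. The sketch below is a plain scalar reference in Python, assuming A is stored in CSR form (`row_ptr`, `col_idx`, `values`); the function name `spmm_csr` and the example matrices are illustrative only. It is not DTC-SpMM's data layout or kernel, and it uses no Tensor Cores. It only shows the irregular, row-by-row access pattern that makes SpMM hard to map onto dense matrix units.

```python
def spmm_csr(row_ptr, col_idx, values, dense, n_cols):
    """Reference SpMM: returns A @ B, with A in CSR form and B a dense list of rows.

    Illustrative baseline only (hypothetical helper, not from the paper).
    """
    n_rows = len(row_ptr) - 1
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]   # column of the nonzero selects row j of B
            a = values[k]
            for c in range(n_cols):
                out[i][c] += a * dense[j][c]
    return out


# A = [[1, 0, 2],
#      [0, 0, 3]]
row_ptr = [0, 2, 3]
col_idx = [0, 2, 2]
values = [1.0, 2.0, 3.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

C = spmm_csr(row_ptr, col_idx, values, B, 2)
print(C)  # [[11.0, 14.0], [15.0, 18.0]]
```

Each row of A touches a different, data-dependent set of rows of B, so the work per row and the memory accesses are irregular. That irregularity is the gap between sparse inputs and the fixed-size dense tiles that Tensor Cores expect.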

Recommended citation: **Ruibo Fan**, Wei Wang, and Xiaowen Chu, "DTC-SpMM: Bridging the Gap in Accelerating General Sparse Matrix Multiplication with Tensor Cores," in *Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)*, 2024.
Download Paper | Code | Download Bibtex
