DTC-SpMM: Bridging the Gap in Accelerating General Sparse Matrix Multiplication with Tensor Cores
Published in Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
We present DTC-SpMM, a general sparse matrix-matrix multiplication (SpMM) framework that makes effective use of GPU Tensor Cores. DTC-SpMM bridges the gap between irregular sparsity patterns and dense, Tensor-Core-friendly computation through novel data layouts and kernel designs, delivering substantial speedups over existing SpMM implementations across diverse sparse workloads.
Recommended citation: **Ruibo Fan**, Wei Wang, and Xiaowen Chu, "DTC-SpMM: Bridging the Gap in Accelerating General Sparse Matrix Multiplication with Tensor Cores," in *Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)*, 2024.
Download Paper | Code | Download BibTeX
