ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression
Published in Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2026
R. Fan et al., “ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression,” in ASPLOS 2026. (CCF-A)