Accelerating Sparse Ternary GEMM for Quantized LLM inference on Apple Silicon – arXiv

We present a Sparse Ternary GEMM kernel optimized specifically for Apple's M-series processors. We propose a set of architecture-aware optimizations, …
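For readers new to the term, a sparse ternary GEMM multiplies activations by a weight matrix whose entries are restricted to -1, 0, and +1, so each output reduces to a sum of some inputs minus a sum of others, with no multiplications. The following is a minimal reference sketch of that idea only. It is not the kernel described in the paper, and it includes none of the Apple Silicon optimizations; the index-list storage layout and the function names `compress_ternary` and `sparse_ternary_gemm` are assumptions made for illustration.

```python
def compress_ternary(W):
    """Split a dense ternary matrix W (K x N, entries in {-1, 0, +1}) into
    per-column lists of the row indices that hold +1 and -1.
    This layout is an illustrative assumption, not the paper's format."""
    K, N = len(W), len(W[0])
    pos = [[k for k in range(K) if W[k][n] == 1] for n in range(N)]
    neg = [[k for k in range(K) if W[k][n] == -1] for n in range(N)]
    return pos, neg


def sparse_ternary_gemm(X, pos, neg):
    """Compute Y = X @ W for X of shape M x K, using only additions and
    subtractions over the nonzero weights; zero weights are never visited."""
    M, N = len(X), len(pos)
    Y = [[0.0] * N for _ in range(M)]
    for m in range(M):
        row = X[m]
        for n in range(N):
            acc = 0.0
            for k in pos[n]:   # weights equal to +1: add the activation
                acc += row[k]
            for k in neg[n]:   # weights equal to -1: subtract the activation
                acc -= row[k]
            Y[m][n] = acc
    return Y


# Small worked example: K = 3 input features, N = 2 output features.
W = [[1, 0],
     [0, -1],
     [-1, 1]]
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
pos, neg = compress_ternary(W)
Y = sparse_ternary_gemm(X, pos, neg)
```

In this toy layout the work per output scales with the number of nonzero weights in a column rather than with K, which is the property a sparse kernel exploits; an optimized implementation would additionally vectorize and block these loops.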