Beating NumPy matrix multiplication in 150 lines of C

salykova.github.io

bot@lemmy.smeargle.fans to Hacker News@lemmy.smeargle.fans · 4 months ago

Beating NumPy’s matrix multiplication in 150 lines of C code

TL;DR The code from the tutorial is available at matmul.c. This blog post is the result of my attempt to implement high-performance matrix multiplication on CPU while keeping the code simple, portable and scalable. The implementation follows the BLIS design, works for arbitrary matrix sizes, and, when fine-tuned for an AMD Ryzen 7700 (8 cores), outperforms NumPy (=OpenBLAS), achieving over 1 TFLOPS of peak performance across a wide range of matrix sizes.
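
For readers who have not seen the idea before, here is a minimal sketch of cache blocking (loop tiling), the basic idea that BLIS-style designs build on. It is not the code from matmul.c: the function names, the `tile` parameter, and the row-major layout are assumptions made for illustration, and a real BLIS-style kernel additionally packs the operands and uses a SIMD micro-kernel plus multithreading.

```c
#include <stddef.h>

/* Reference triple loop: C = A * B, all row-major.
   A is m x k, B is k x n, C is m x n. */
void matmul_naive(const float *A, const float *B, float *C,
                  size_t m, size_t n, size_t k) {
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
    }
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Same product, computed tile by tile so that the working set of the
   inner loops is small enough to stay in cache. `tile` is a hypothetical
   tuning parameter (must be > 0); sizes need not be multiples of it. */
void matmul_blocked(const float *A, const float *B, float *C,
                    size_t m, size_t n, size_t k, size_t tile) {
    for (size_t i = 0; i < m * n; i++)
        C[i] = 0.0f;

    for (size_t i0 = 0; i0 < m; i0 += tile) {
        size_t i1 = min_sz(i0 + tile, m);
        for (size_t p0 = 0; p0 < k; p0 += tile) {
            size_t p1 = min_sz(p0 + tile, k);
            for (size_t j0 = 0; j0 < n; j0 += tile) {
                size_t j1 = min_sz(j0 + tile, n);
                /* Multiply one tile of A by one tile of B and
                   accumulate into the matching tile of C. */
                for (size_t i = i0; i < i1; i++) {
                    for (size_t p = p0; p < p1; p++) {
                        float a = A[i * k + p];
                        for (size_t j = j0; j < j1; j++)
                            C[i * n + j] += a * B[p * n + j];
                    }
                }
            }
        }
    }
}

/* Operation count used for FLOPS figures: one multiply and one add
   for every (i, j, p) triple, i.e. 2*m*n*k. */
double matmul_flops(size_t m, size_t n, size_t k) {
    return 2.0 * (double)m * (double)n * (double)k;
}
```

Dividing `matmul_flops(m, n, k)` by the measured wall-clock time in seconds gives the FLOPS number that benchmarks of this kind report. Tiling alone will not get near the figure quoted above; it only illustrates why the loop order and block sizes matter.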

HN Discussion
