Measuring Relative Processing Speeds

I am building deep learning models on my Mac and have noticed a considerable timing gap between code run by others on a GPU and what I experience on my Mac, which has an M1 processor. PyTorch began supporting the M1 GPU about a month ago.
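As a quick sanity check (a minimal sketch, not part of my original timing code), the new support is exposed through PyTorch's "mps" device, which you can probe before committing to it:

```python
import torch

# The M1 GPU is exposed through the "mps" backend (PyTorch 1.12+).
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

x = torch.rand(1000, 10, device=device)
print(x.device)  # "mps:0" on an M1 Mac with support, "cpu" otherwise
```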

My experiment multiplies two random matrices (1000x10 and 10x50) and repeats the multiplication 10,000 times. I compared three processing configurations across two environments, my Mac and Colab; the latter let me observe the gain from a GPU. The three configurations: NumPy on the CPU, Torch on the CPU, and Torch on the M1 / GPU.
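For reference, here is a minimal sketch of that kind of benchmark (my actual timing code differed in details; the matrix sizes and iteration count match the experiment, and which device branch runs depends on the machine):

```python
import time

import numpy as np
import torch


def bench_numpy(n_iters=10_000):
    a = np.random.rand(1000, 10)
    b = np.random.rand(10, 50)
    start = time.perf_counter()
    for _ in range(n_iters):
        a @ b
    return (time.perf_counter() - start) * 1000  # ms


def bench_torch(device, n_iters=10_000):
    a = torch.rand(1000, 10, device=device)
    b = torch.rand(10, 50, device=device)
    start = time.perf_counter()
    for _ in range(n_iters):
        c = a @ b
    # Pull the result back to the CPU so any queued GPU work
    # finishes before the clock stops.
    c.sum().item()
    return (time.perf_counter() - start) * 1000  # ms


print(f"NumPy       : {bench_numpy():.0f} ms")
print(f"Torch - CPU : {bench_torch('cpu'):.0f} ms")
if torch.backends.mps.is_available():    # M1 Mac
    print(f"Torch - M1  : {bench_torch('mps'):.0f} ms")
if torch.cuda.is_available():            # Colab GPU
    print(f"Torch - GPU : {bench_torch('cuda'):.0f} ms")
```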

Results:

               Mac (ms)   Colab (ms)
NumPy              7190         1500
Torch - CPU         456          140
Torch - M1          458            x
Torch - GPU           x          115

Conclusion

It is mysterious that there is no gain from moving from the CPU to the M1. One possibility is that matrices this small leave the computation dominated by per-operation overhead rather than arithmetic, but I may be thinking about this wrong and will have to revisit it!
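A possible follow-up (a sketch only; I have not run this, and the sizes are arbitrary picks) would be to scale the matrices up and see whether the M1 GPU ever pulls ahead once there is more arithmetic per call:

```python
import time

import torch


def time_matmul(a, b, n_iters=100):
    start = time.perf_counter()
    for _ in range(n_iters):
        c = a @ b
    c.sum().item()  # force any queued GPU work to finish
    return (time.perf_counter() - start) * 1000  # ms


# Hypothetical follow-up: larger square matmuls should reveal whether
# per-call overhead was hiding any M1 GPU advantage.
for n in (100, 1000, 4000):
    a = torch.rand(n, n)
    b = torch.rand(n, n)
    cpu_ms = time_matmul(a, b)
    if torch.backends.mps.is_available():
        mps_ms = time_matmul(a.to("mps"), b.to("mps"))
    else:
        mps_ms = float("nan")
    print(f"n={n:5d}  cpu: {cpu_ms:8.1f} ms  mps: {mps_ms:8.1f} ms")
```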

This great blog post runs a more robust test: while the author observes a gain when training a full model and performing inference, he concludes that the M1 is a hard no in terms of replacing GPUs for training models.