The 1B-parameter version of Llama 3.2 was even slower, at 0.0093 tokens per second, based on a partial run with the model data stored on disk.
I mean, cool? They got a C inference library to compile under an older C standard, and the 1B model predictably runs like trash. At that rate it would take hours to generate anything meaningful.
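For scale, here's the back-of-envelope math. The 0.0093 tokens/sec figure is from the run above; the reply lengths are just illustrative guesses, not anything from the original post:

```c
/* Quick sanity check: how long does a reply take at 0.0093 tok/s? */
#include <stdio.h>

int main(void) {
    const double tokens_per_sec = 0.0093;  /* reported 1B rate */
    const int lengths[] = {50, 100, 500};  /* hypothetical reply sizes */

    for (int i = 0; i < 3; i++) {
        double secs = lengths[i] / tokens_per_sec;
        printf("%4d tokens -> %6.0f s (~%.1f h)\n",
               lengths[i], secs, secs / 3600.0);
    }
    return 0;
}
```

Even a modest 100-token answer works out to roughly 10,750 seconds, about three hours, and a 500-token one is closer to fifteen. "Hours" is, if anything, generous.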