Run AI Models Locally | Optimize with Expert Tips

How to run a random model on your own computer independently?

You can take an example from Dan Woods, who decided to run Qwen3.5-397B on his M3 Max with 48 GB of RAM (he mentioned this in a tweet).

He chose Claude Code, accessed the autoresearch repository by Andrej Karpathy, read Apple’s article “LLM in a Flash,” and thought: why not do the same?

After about five hours, the system was up and running, producing roughly one token per second. Another three hours of work and optimizations increased the speed almost fivefold — up to 4.74 tokens per second, while using around 6 GB of RAM. It’s worth noting that many improvements have not even been implemented yet.

Now, it can be said — a huge thanks to those who have manually created such optimizations in the past and made them accessible to all of us.

All of this is possible thanks to the efforts of those who have been working on it for a long time.

Created with n8n:
https://cutt.ly/n8n

Created with syllaby:
https://cutt.ly/syllaby

Page view 19.03 14:13 Page view 19.03 14:08 Moderation End 19.03 14:04 Moderation End 19.03 14:04 TG, WP, Linkedin Finish 19.03 14:04 WP Posting Start 19.03 14:03 WP Posting Finish 19.03 14:03 Post published: Israeli Airstrike Near Lebanon | Tense Footage of Near Miss 19.03 14:03 TG, WP, Linkedin Start 19.03 14:03 Page view 19.03 14:03