Frontier AI at Home — Alex Cheema, EXO Labs

https://video.ut0pia.org/videos/watch/f4fec7c6-3b79-4a55-a546-dc6ef5ed35fe

Running GLM 5.1, a trillion-parameter model released the day before this workshop, across four Mac Studios costs around $40,000 in hardware and tops out at roughly 20 tokens per second. Alex Cheema from EXO Labs thinks both numbers have about 100x left in them.

The workshop covers what that 100x looks like across the stack: kernel fusion that recovered 30% performance on Qwen 3.5 from inefficiencies nobody had noticed, RDMA integration that cut node-to-node latency from 300 microseconds to single-digit microseconds and made tensor parallelism actually scale, and the case for splitting prefill onto compute-dense hardware and decode onto high-bandwidth hardware.

The live demo runs GLM 5.1 across four Mac Studios connected by Thunderbolt 5 and cuts inference time on large prompts roughly in half by offloading prefill to an RTX Spark.

Speaker info: https://www.linkedin.com/in/alex-cheema, https://github.com/alexcheema

Wed, 27 May 2026 17:34:32 GMT
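
To make the RDMA latency claim concrete, here is a rough back-of-envelope sketch of why node-to-node latency can cap tensor-parallel decode speed. It is not taken from the workshop. It assumes the common tensor-parallel layout with two all-reduces per transformer layer (one after attention, one after the MLP), treats each all-reduce as costing at least one link latency, and uses a hypothetical 60-layer model. The 5 microsecond figure is just one example of a "single digit" value. The function names are illustrative only.

```python
# Back-of-envelope sketch. The layer count, the all-reduce count per layer,
# and the 5 us example latency are assumptions, not workshop measurements.


def latency_floor_per_token_s(
    num_layers: int,
    link_latency_s: float,
    allreduces_per_layer: int = 2,
) -> float:
    """Lower bound on per-token decode time spent on link latency alone.

    Ignores compute, memory bandwidth, and payload transfer time, and counts
    each all-reduce as a single link latency, so real costs are higher.
    """
    return num_layers * allreduces_per_layer * link_latency_s


def latency_bound_tokens_per_s(
    num_layers: int,
    link_latency_s: float,
    allreduces_per_layer: int = 2,
) -> float:
    """Upper bound on decode tokens/s implied by the latency floor."""
    return 1.0 / latency_floor_per_token_s(
        num_layers, link_latency_s, allreduces_per_layer
    )


if __name__ == "__main__":
    HYPOTHETICAL_LAYERS = 60  # assumed for illustration only
    for label, latency_s in [("300 us", 300e-6), ("5 us", 5e-6)]:
        floor_ms = latency_floor_per_token_s(HYPOTHETICAL_LAYERS, latency_s) * 1e3
        ceiling = latency_bound_tokens_per_s(HYPOTHETICAL_LAYERS, latency_s)
        print(f"{label}: {floor_ms:.2f} ms/token floor, at most {ceiling:.0f} tokens/s")
```

Under these assumptions, 300 microseconds per hop puts a floor of about 36 ms on every generated token before any compute happens, while 5 microseconds drops that floor to well under a millisecond. That gap is one plausible reading of why lower latency lets tensor parallelism scale. The real numbers depend on the model and the collective implementation.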