<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Frontier AI at Home — Alex Cheema, EXO Labs</title>
        <link>https://video.ut0pia.org/videos/watch/f4fec7c6-3b79-4a55-a546-dc6ef5ed35fe</link>
        <description>Running GLM 5.1, a trillion-parameter model released the day before this workshop, across four Mac Studios costs around $40,000 in hardware and tops out at roughly 20 tokens per second. Alex Cheema from EXO Labs thinks both numbers have about 100x left in them. The workshop covers what that 100x looks like across the stack: kernel fusion that recovered 30% performance on Qwen 3.5 from inefficiencies nobody had noticed, RDMA integration that cut node-to-node latency from 300 microseconds to single digits and made tensor parallelism actually scale, and the case for splitting prefill onto compute-dense hardware and decode onto high-bandwidth hardware. The live demo runs GLM 5.1 across four Mac Studios connected by Thunderbolt 5 and cuts large-prompt inference time roughly in half by offloading prefill to an RTX Spark. Speaker info: https://www.linkedin.com/in/alex-cheema, https://github.com/alexcheema</description>
        <lastBuildDate>Wed, 27 May 2026 17:34:32 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>PeerTube - https://clip.place</generator>
        <image>
            <title>Frontier AI at Home — Alex Cheema, EXO Labs</title>
            <url>https://clip.place/lazy-static/avatars/e63ed084-e5be-423e-a66e-dae98f14f4b6.png</url>
            <link>https://video.ut0pia.org/videos/watch/f4fec7c6-3b79-4a55-a546-dc6ef5ed35fe</link>
        </image>
        <copyright>All rights reserved, unless otherwise specified in the terms specified at https://clip.place/about and potential licenses granted by each content's rightholder.</copyright>
        <atom:link href="https://clip.place/feeds/video-comments.xml?videoId=f4fec7c6-3b79-4a55-a546-dc6ef5ed35fe" rel="self" type="application/rss+xml"/>
    </channel>
</rss>