LLMs on a budget

Alex Ziskind

1m 35s · 272 words · ~2 min read
YouTube auto captions
This transcript was extracted from YouTube's auto-generated caption track. The transcript below is server-rendered so it can be read, searched, cited, and shared without opening the original YouTube player.

[0:00] This is the M1 MacBook Air running GPT-OSS 20 billion. That's right. I've got a web application architecture prompt in here: "Design a scalable web application architecture for an e-commerce platform." It's doing it on an M1 MacBook Air, which means you can actually run this on budget laptops. And I'm going to be doing that one, the M2 MacBook Air, the M3 MacBook Air, and the M4 MacBook Air.

What's the point? I know what you're gonna say: small models suck! No, no they don't. They don't suck anymore. In fact, Nvidia just published this paper. This is a brand new paper about small language models in the future of agentic AI. And this paper goes further into detailing how more efficient agentic systems could actually be designed.

And speaking of this small model, I actually have a pipeline to process videos, and I use that small model to detect when there's a person that appears on screen and when there's something else that I'm showing in the video, and to automatically insert certain sound effects.

And this is the 8 billion parameter model, which is 4.62 gigabytes on disk. So it'll have no problem running on the 16 gigabyte machine, even if we bring up the context to, say, 50,000. And it's thinking. It's actually working on the 8 gigabyte machine. We are pretty much up against that limit.

It does actually look a little bit faster. That could just be me. But look at the memory usage. We are in the green, not even close to orange or red. Lots of memory headroom there, so that's good.
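
Editor's note: the memory claim above (an 8 billion parameter model at 4.62 gigabytes on disk, with a context of about 50,000 tokens) can be sanity-checked with some rough arithmetic. The Python sketch below is only an estimate under stated assumptions. The transcript gives the parameter count, the file size, and the context length; the layer count, KV head count, head dimension, and 16-bit cache precision are hypothetical placeholders, not values from the video, and real runtimes add overhead that this ignores.

```python
def bits_per_weight(size_gb: float, params_billion: float) -> float:
    """Average bits stored per parameter, treating 1 GB as 10**9 bytes."""
    return size_gb * 1e9 * 8 / (params_billion * 1e9)


def kv_cache_gb(tokens: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_value: int = 2) -> float:
    """Rough key/value cache size in GB for a transformer decoder.

    Each token stores one key and one value vector per layer,
    hence the leading factor of 2.
    """
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token_bytes * tokens / 1e9


# Figures quoted in the transcript.
MODEL_SIZE_GB = 4.62
PARAMS_BILLION = 8
CONTEXT_TOKENS = 50_000

# Hypothetical architecture values, chosen only for illustration.
LAYERS = 36
KV_HEADS = 8
HEAD_DIM = 128

if __name__ == "__main__":
    bpw = bits_per_weight(MODEL_SIZE_GB, PARAMS_BILLION)
    cache = kv_cache_gb(CONTEXT_TOKENS, LAYERS, KV_HEADS, HEAD_DIM)
    print(f"average bits per weight: {bpw:.2f}")
    print(f"estimated KV cache:      {cache:.2f} GB")
    print(f"weights + KV cache:      {MODEL_SIZE_GB + cache:.2f} GB")
```

The first function shows that 4.62 gigabytes for 8 billion parameters works out to a little under 5 bits per weight on average, which is consistent with a quantized model rather than a 16-bit one. The second function shows why context length matters for the memory budget: the cache grows linearly with the number of tokens, so the same model can fit comfortably at a short context and be tight at a long one.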
