• 1 Post
  • 54 Comments
Joined 11 months ago
Cake day: March 22nd, 2024

  • Qwen 2.5 is already amazing for a 14B, so I don’t see how Deepseek can improve on it that much with a new base model, even if they continue training it.

    Perhaps we need to meet in the middle: have quad-channel APUs like Strix Halo become more common, and release 40-80GB MoE models to match. Perhaps bitnet ones?

    Or design them for asynchronous inference.

    I just don’t see how 20B-ish models can perform like ones an order of magnitude bigger without a paradigm shift.
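
    A back-of-envelope sketch of why that middle ground could work: on a bandwidth-bound APU, decode speed is roughly memory bandwidth divided by the bytes of active weights read per token, which is where MoE helps. All numbers below (a hypothetical 60B-total / 10B-active MoE at 4-bit, ~256 GB/s of quad-channel bandwidth) are illustrative assumptions, not Strix Halo specs.

```python
# Back-of-envelope decode-speed estimate for a bandwidth-bound APU.
# All figures are illustrative assumptions, not measured hardware specs.

def decode_tok_per_s(active_params_b: float, bytes_per_param: float,
                     mem_bw_gb_s: float) -> float:
    """Rough decode speed: each generated token streams the active weights once."""
    active_bytes_gb = active_params_b * bytes_per_param
    return mem_bw_gb_s / active_bytes_gb

# Hypothetical quad-channel APU at ~256 GB/s, 4-bit weights (~0.5 bytes/param).
print(f"dense 60B:           {decode_tok_per_s(60, 0.5, 256):.1f} tok/s")
print(f"60B MoE, 10B active: {decode_tok_per_s(10, 0.5, 256):.1f} tok/s")
```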







  • “Running the model can be no more taxing than playing a modern video game, except the load is not constant.”

    This is not true; Deepseek R1 is huge. There’s a lot of confusion between the smaller distillations based on Qwen 2.5 (some of which can run on consumer GPUs) and the “full” Deepseek R1 based on Deepseek V3.

    Your point mostly stands, but the “full” model is hundreds of gigabytes, and the paper mentioned something like a bank of 370 GPUs being optimal for hosting it. It’s very efficient because it’s only ~30B active, which is bonkers, but still.
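
    Rough footprint math for that gap, using the commonly cited parameter counts for the full model and a 14B distill; treat the counts and quantization choices as approximations on my part:

```python
# Approximate weight-storage math; parameter counts and precisions are assumptions.

def weights_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB for a given parameter count."""
    return params_b * bytes_per_param

full_total_b, full_active_b = 671, 37   # "full" R1: huge total, small active slice
distill_b = 14                          # e.g. the Qwen 2.5 14B distillation

print(f"full R1 weights @ FP8:    ~{weights_gb(full_total_b, 1.0):.0f} GB")
print(f"active weights per token: ~{weights_gb(full_active_b, 1.0):.0f} GB")
print(f"14B distill @ 4-bit:      ~{weights_gb(distill_b, 0.5):.0f} GB (consumer GPU territory)")
```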






  • brucethemoose@lemmy.world to Microblog Memes@lemmy.world · “OpenAI hard work got stolen...”

    Deepseek R1 runs with open source code from an American company, specifically Hugging Face (see the sketch at the end of this comment).

    They have their own secret-sauce inference code, sure, but they also documented it at a high level in the paper, so a US company could recreate it if they wanted.

    There’s nothing they can do, short of a Hitler-esque “all open models are banned, you must use these select American APIs by law.” That would be like telling the US “everyone must use Bing and the Bing API for all search queries; anything else is illegal.”
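
    To make the “runs with Hugging Face code” point concrete, here’s a minimal sketch of loading one of the R1 distills with stock transformers. The model ID and generation settings are my assumptions; the full model needs a proper multi-GPU serving stack instead.

```python
# Minimal sketch: an R1 distill loaded with stock Hugging Face transformers.
# Model ID and settings are assumptions; adjust for your hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"  # assumed distill checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto",
                                             torch_dtype="auto")

messages = [{"role": "user", "content": "Why is the sky blue?"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```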


  • Everyone in the open LLM community knew this was coming.

    We didn’t know the exact timing, but OpenAI is completely stagnant, and it was coming this year or the next.

    I don’t think the world understands yet how screwed OpenAI is. It isn’t just that their moat is gone, it’s that, even with all that money, their models (for the size/investment) are objectively bad.


  • The OpenAI “don’t train on our output” clause is a meme in the open LLM research community.

    EVERYONE does it, implicitly or sometimes openly, with ChatML formatting and OpenAI-specific slop leaking into base models (a crude probe for this is sketched at the end of this comment). They’ve been doing it forever, and the consensus seems to be that it’s not enforceable.

    OpenAI probably does it too, but incredibly, they’re so obsessively closed and opaque that it’s hard to tell.

    So as usual, OpenAI is full of shit here, and don’t believe a word that comes out of Altman’s mouth. Not one.
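
    For anyone curious what “ChatML leaking into base models” looks like in practice, here’s the crude kind of probe people run: feed a base (non-chat) checkpoint the ChatML prefix and see whether it happily completes in that format. The model ID below is a placeholder, not a real checkpoint.

```python
# Crude probe for ChatML / GPT-output leakage in a *base* model.
# The model ID is a placeholder assumption; swap in whatever base model you test.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/some-base-model"  # placeholder, not a real checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# ChatML, the turn format popularized by OpenAI and copied nearly everywhere:
prompt = "<|im_start|>user\nWho trained you?<|im_end|>\n<|im_start|>assistant\n"
ids = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=64)
completion = tok.decode(out[0][ids["input_ids"].shape[-1]:])

# A base model never trained on ChatML-formatted (often GPT-generated) data has
# little reason to emit a clean <|im_end|> or "As an AI language model..." here.
print(completion)
```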







  • Europe is having its issues too. People are already mad about taxes and aren’t keen on the prospect of increasing them for military spending; they already ran huge deficits for COVID.

    Many other powers are either not interested in this particular fight or can’t afford to be.

    I hate to sound so cynical, but I think Zelensky is smart to “work with” Trump (aka manipulate him) instead of denouncing him and kicking him to the curb like his country has every right to, because the drive to fight only goes so far against a truly genocidal adversary.