• 1 Post
  • 54 Comments
Joined 11 months ago
Cake day: March 22nd, 2024

  • Qwen 2.5 is already amazing for a 14B, so I don’t see how Deepseek can improve on it that much with a new base model, even if they continue training it.

    Perhaps we need to meet in the middle: have quad-channel APUs like Strix Halo become more common, and release 40-80GB MoE models to match. Perhaps bitnet ones?

    Or design them for asynchronous inference.

    I just don’t see how 20B-ish models can perform like ones an order of magnitude bigger without a paradigm shift.
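
    A back-of-envelope sketch of why that middle ground could work: on a bandwidth-bound APU, decode speed is roughly memory bandwidth divided by the bytes of active weights read per token, which is where MoE helps. All numbers below (a hypothetical 60B-total / 10B-active MoE at 4-bit, ~256 GB/s of quad-channel bandwidth) are illustrative assumptions, not Strix Halo specs.

```python
# Back-of-envelope decode-speed estimate for a bandwidth-bound APU.
# All figures are illustrative assumptions, not measured hardware specs.

def decode_tok_per_s(active_params_b: float, bytes_per_param: float,
                     mem_bw_gb_s: float) -> float:
    """Rough decode speed: each generated token streams the active weights once."""
    active_bytes_gb = active_params_b * bytes_per_param
    return mem_bw_gb_s / active_bytes_gb

# Hypothetical quad-channel APU at ~256 GB/s, 4-bit weights (~0.5 bytes/param).
print(f"dense 60B:           {decode_tok_per_s(60, 0.5, 256):.1f} tok/s")
print(f"60B MoE, 10B active: {decode_tok_per_s(10, 0.5, 256):.1f} tok/s")
```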







  • “Running the model can be no more taxing than playing a modern video game, except the load is not constant.”

    This is not true; Deepseek R1 is huge. There’s a lot of confusion between the smaller distillations based on Qwen 2.5 (some of which can run on consumer GPUs) and the “full” Deepseek R1 based on Deepseek V3.

    Your point mostly stands, but the “full” model is hundreds of gigabytes, and the paper mentioned something like a bank of 370 GPUs being optimal for hosting it. It’s very efficient because it’s only ~30B active, which is bonkers, but still.
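
    Rough footprint math for that gap, using the commonly cited parameter counts for the full model and a 14B distill; treat the counts and quantization choices as approximations on my part:

```python
# Approximate weight-storage math; parameter counts and precisions are assumptions.

def weights_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB for a given parameter count."""
    return params_b * bytes_per_param

full_total_b, full_active_b = 671, 37   # "full" R1: huge total, small active slice
distill_b = 14                          # e.g. the Qwen 2.5 14B distillation

print(f"full R1 weights @ FP8:    ~{weights_gb(full_total_b, 1.0):.0f} GB")
print(f"active weights per token: ~{weights_gb(full_active_b, 1.0):.0f} GB")
print(f"14B distill @ 4-bit:      ~{weights_gb(distill_b, 0.5):.0f} GB (consumer GPU territory)")
```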






  • brucethemoose@lemmy.world to Microblog Memes@lemmy.world · “OpenAI hard work got stolen...”

    Deepseek R1 runs with open source code from an American company, specifically Hugging Face (see the sketch at the end of this comment).

    They have their own secret-sauce inference code, sure, but they also documented it at a high level in the paper, so a US company could recreate it if they wanted.

    There’s nothing they can do, short of a Hitler-esque “all open models are banned, you must use these select American APIs by law.” That would be like telling the US “everyone must use Bing and the Bing API for all search queries; anything else is illegal.”
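
    To make the “runs with Hugging Face code” point concrete, here’s a minimal sketch of loading one of the R1 distills with stock transformers. The model ID and generation settings are my assumptions; the full model needs a proper multi-GPU serving stack instead.

```python
# Minimal sketch: an R1 distill loaded with stock Hugging Face transformers.
# Model ID and settings are assumptions; adjust for your hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"  # assumed distill checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto",
                                             torch_dtype="auto")

messages = [{"role": "user", "content": "Why is the sky blue?"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```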


  • Everyone in the open LLM community knew this was coming.

    We didn’t know the exact timing, but OpenAI is completely stagnant, and it was coming this year or the next.

    I don’t think the world understands yet how screwed OpenAI is. It isn’t just that their moat is gone, it’s that, even with all that money, their models (for the size/investment) are objectively bad.


  • The OpenAI “don’t train on our output” clause is a meme in the open LLM research community.

    EVERYONE does it, implicitly or sometimes openly, with ChatML formatting and OpenAI-specific slop leaking into base models (a crude probe for this is sketched at the end of this comment). They’ve been doing it forever, and the consensus seems to be that it’s not enforceable.

    OpenAI probably does it too, but incredibly, they’re so obsessively closed and opaque that it’s hard to tell.

    So as usual, OpenAI is full of shit here, and don’t believe a word that comes out of Altman’s mouth. Not one.
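
    For anyone curious what “ChatML leaking into base models” looks like in practice, here’s the crude kind of probe people run: feed a base (non-chat) checkpoint the ChatML prefix and see whether it happily completes in that format. The model ID below is a placeholder, not a real checkpoint.

```python
# Crude probe for ChatML / GPT-output leakage in a *base* model.
# The model ID is a placeholder assumption; swap in whatever base model you test.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/some-base-model"  # placeholder, not a real checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# ChatML, the turn format popularized by OpenAI and copied nearly everywhere:
prompt = "<|im_start|>user\nWho trained you?<|im_end|>\n<|im_start|>assistant\n"
ids = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=64)
completion = tok.decode(out[0][ids["input_ids"].shape[-1]:])

# A base model never trained on ChatML-formatted (often GPT-generated) data has
# little reason to emit a clean <|im_end|> or "As an AI language model..." here.
print(completion)
```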







  • Europe is having its issues too. People are already mad about taxes and aren’t keen on the prospect of increasing them for military spending; they already ran huge deficits for COVID.

    Many other powers are either not interested in this particular fight or can’t afford to be.

    I hate to sound so cynical, but I think Zelensky is smart to “work with” Trump (aka manipulate him) instead of denouncing him and kicking him to the curb like his country has every right to, because the drive to fight only goes so far against a truly genocidal adversary.