Surprising self-hosted LLM performance with underwhelming results

The Tuesday meeting came at exactly the right time; my managers want to tick off the AI box, so I've been trying to get informed, to keep any disastrous outcome from squashing my network automation efforts.

I was thrilled to find out just how easy it is to self-host these models, and that it's feasible to do it on a mini PC like my Beelink SER6 (a small, NUC-like mobile Ryzen platform). I downloaded the model last night, and to my surprise, the performance was more than adequate. The model was able to reason & give seemingly good results, despite my box being at the very low end of "AI performance".

The bad news is that all this hype about "Agentic AI" and the other cool "tool models" isn't really available yet in a format that can be hosted well on my box -- let alone on a box I can afford. To make matters worse, it also seems that the distilled models aren't really 1:1 DeepSeek R1. Rather, they're more the same old Meta (Llama) base models, trained on R1's reasoning and results. So it's not always apples to apples when people discuss their results.

When I dug deeper into what's going on in "Agentic AI", especially in the network automation space, the results are currently more trivial than I could have imagined. They also currently rely on cloud API access, since you need the big hosted models -- something I would never condone in my environment, no matter how much the Google whitepaper assures me that "extensions" and "function calling" address the concern.

Anyway, it's neat that I can have this little chat bot or coding assistant hosted on a little cube on my desk; however, it seems we're still quite far from the point where I can have little AI interns generating insightful signals for my automation workflows.

Some additional reading I found interesting:

https://arxiv.org/abs/2308.15030
https://arxiv.org/abs/2402.04249
https://angiejones.tech/system-access-for-ai-agents/
https://antirez.com/news/146

Thanks, everyone. See you at the next meeting!

Warm regards,

-- Mark Prosser // E: mark@zealnetworks.ca // W: https://zealnetworks.ca
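For anyone curious what the self-hosting part looks like in practice, here is a minimal sketch of one way to do it. It assumes the model is served locally with Ollama on its default port (11434); the model tag and prompt are illustrative, not a record of the exact setup described above.

# Minimal sketch: query a locally hosted model through Ollama's HTTP API.
# Assumes `ollama pull deepseek-r1:7b` has already been run and the Ollama
# daemon is listening on its default port (11434). Everything here stays on
# the local box; nothing is sent to a cloud API.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint
MODEL = "deepseek-r1:7b"  # illustrative tag; use whatever distill/quant fits your RAM

def ask(prompt: str) -> str:
    """Send one prompt to the local model and return the full response text."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=300,  # small boxes can take a while to generate
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(ask("Summarize the risks of automating BGP config changes."))

On a small Ryzen box the same call works, just slowly; the point is that nothing in the loop ever leaves the machine.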

On Thu, Feb 13, 2025 at 10:32 PM Mark Prosser via talk <talk@gtalug.org> wrote:
I was thrilled to find out just how easy it was to self-host these models, and that it was feasible to do it on a mini PC, like my Beelink SER6 (small NUC-like mobile ryzen platform). I downloaded the model last night, and to my surprise, the performance was more than adequate. The model was able to reason & give seemingly good results, despite my box being at the very low end of "AI performance".
The bad news is that, all this hype about "Agentic AI" and all the other cool
"tool models" isn't really fully available yet, in a format that can be hosted well on my box -- let alone a box I can afford.
Most NUC-sized systems rely on the on-board GPU found on many Intel and some Ryzen CPUs. They have neither the onboard memory nor the horsepower to host an R1 model; the requirements page I use <https://apxml.com/posts/gpu-requirements-deepseek-r1> says that an RTX 3060 with 12GB of GPU RAM is the minimum for even the smallest R1 model.

Having said that, there are NUCs and there are *NUCs*. Asus has been able to do some interesting things with the form factor ever since taking over Intel's NUC business, especially on its ROG gamer side. Consider the 2025 ROG NUC <https://rog.asus.com/desktops/mini-pc/rog-nuc-2025/>, which can sport an RTX 5080.

Obviously there are tradeoffs, in both cost -- maybe 5x the price of your SER6 -- and how many watts can be pumped out of a NUC-factor power supply. A small desktop-sized PC might cool and power such a rig better, and may even be cheaper, though you might want to hold off on the current rev of the 5080/5090 <https://www.tweaktown.com/news/103255/its-not-just-the-geforce-rtx-5090-weve-now-got-melting-connectors-on-5080/index.html>.
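For a rough sense of where memory requirements like that come from, here is a back-of-the-envelope sketch of the usual weights-only estimate (parameter count times bytes per parameter, plus overhead for KV cache and runtime). The parameter counts and quantization levels below are illustrative assumptions, not measurements of any particular build.

# Back-of-the-envelope VRAM estimate for hosting a model's weights.
# Rule of thumb: bytes ~= parameters * bytes_per_parameter, plus overhead
# for KV cache, activations and the runtime itself. Illustrative only.

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the weights, in GB."""
    return params_billions * bytes_per_param

for params_b, label in [(7, "7B distill"), (14, "14B distill"), (32, "32B distill")]:
    fp16 = weights_gb(params_b, 2.0)   # 16-bit weights
    q4 = weights_gb(params_b, 0.5)     # ~4-bit quantization
    print(f"{label}: ~{fp16:.0f} GB at FP16, ~{q4:.1f} GB at 4-bit "
          "(plus KV-cache/runtime overhead)")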
Anyway, it's neat that I can have this little chat bot or coding assistant hosted on a little cube on my desk; however, it seems that it's still quite far away that I can have little AI interns working on generating insightful signals for my automation workflows.
Consider the expectations being set. Nobody would react if you complained your Beelink wasn't very good at running *God of War*. Yes, a reasonable entry-level AI rig for serious work will cost upwards of $4K, but that's not far from the cost of a decent gaming or video-production setup, or a MacBook M4. And ... especially ... think of what was just announced at CES <https://arstechnica.com/ai/2025/01/nvidias-first-desktop-pc-can-run-local-ai-models-for-3000/>.

- Evan

On 2025-02-14 00:14, Evan Leibovitch wrote:
Most NUC-sized systems rely on the on-board GPU found on many Intel and some Ryzen CPUs. They're going to have neither the onboard memory nor the horsepower to host an R1 model, the requirements page I use <https://apxml.com/posts/gpu-requirements-deepseek-r1> says that an RTX3060 with 12GB of GPU RAM is the minimum for even the smallest R1 model.
For sure, those are the recommended specs. But it runs just fine on my SER6 system, for what the model can actually do.
Obviously there are tradeoffs, in both cost -- maybe 5x the price of your SER6 -- and how many watts can be pumped out of a NUC-factor power supply. A small desktop-sized PC might better cool and power such a rig and may even be cheaper, though you might want to hold off on the current rev of the 5080/5090 <https://www.tweaktown.com/news/103255/its-not-just-the-geforce-rtx-5090-weve-now-got-melting-connectors-on-5080/index.html>.
I'd rather hold off until the local models can safely do what the hosted models can -- that is, agentic AI. Right now you seemingly need to rely on hosted services, and I'm just not okay with that. https://angiejones.tech/system-access-for-ai-agents/
Consider the expectations being set. Nobody would react if you complained your Beelink wasn't very good at running /God of War/.
It runs the model and God of War about the same -- thanks to WINE & GabeN (Valve) -- adequately, with reasonable expectations. It's just that there is so much hype around reasoning, especially in how it adds to agentic AI, but the open-source models don't seem to be there yet. I tried Cohere 7B today as well... it's still really only useful for analysis generation.
Yes, a reasonable entry-level AI rig for serious work will cost about upwards of $4K, but that's not far from the cost of a good-level gaming, video-production setup, or a Macbook M4. And ... especially ... think of what was just announced at CES <https://arstechnica.com/ai/2025/01/nvidias-first-desktop-pc-can-run-local-ai-models-for-3000/>.
For serious work, no doubt about that. Renting an H100 VPS from DigitalOcean in TOR1 costs $30k/yr, so a $4K box is quite a bargain by comparison. I think my $dayjob would be happy to pay $4k+ as well. However, I'm more interested to see what comes of the continuing advancements, such as the paper I found in the Level1Techs drop from yesterday: https://arxiv.org/abs/2308.15030
- Evan
Thanks, Evan! -- Mark Prosser // E: mark@zealnetworks.ca // W: https://zealnetworks.ca

Mark Prosser via talk said on Thu, 13 Feb 2025 22:32:02 -0500
Anyway, it's neat that I can have this little chat bot or coding assistant hosted on a little cube on my desk;
What's the benefit of self-hosted over just going to ChatGPT.Com? Thanks, SteveT Steve Litt http://444domains.com

On 2/15/25 8:11 AM, Steve Litt via talk wrote:
Mark Prosser via talk said on Thu, 13 Feb 2025 22:32:02 -0500
Anyway, it's neat that I can have this little chat bot or coding assistant hosted on a little cube on my desk; What's the benefit of self-hosted over just going to ChatGPT.Com?
Self-hosting does not require you to have an internet feed. Although an internet feed seems like it's available all the time and everywhere, that is not the case.

Anything that you feed into ChatGPT becomes their property, which they can monetize in any way they see fit. There is also the possibility that the information they scrape and store could be used by others in ways that ChatGPT does not intend. So imagine that you ask ChatGPT to write some code for you using your company's internal configuration. That information could then be used by others to attack you, or to make business decisions that could be detrimental to your company.

More likely, they would use your questions to build a profile on you that they could sell to other companies to leverage your preferences or prey on your fears.

-- Alvin Starr || land: (647)478-6285 Netvel Inc. || Cell: (416)806-0133 alvin@netvel.net ||

On Sat, Feb 15, 2025 at 8:56 AM Alvin Starr via talk <talk@gtalug.org> wrote:
More likely they would use your questions to build a profile on you that they could sell to other companies to leverage your preferences or pray on your fears.
Or... almost as likely, given current business models ... determine the advertising sent your way. - Evan

On 2/15/25 11:37 AM, Evan Leibovitch wrote:
On Sat, Feb 15, 2025 at 8:56 AM Alvin Starr via talk <talk@gtalug.org> wrote:
More likely they would use your questions to build a profile on you that they could sell to other companies to leverage your preferences or prey on your fears.
Or... almost as likely, given current business models ... determine the advertising sent your way.
Cambridge Analytica. Need I say more? -- Alvin Starr || land: (647)478-6285 Netvel Inc. || Cell: (416)806-0133 alvin@netvel.net ||

Alvin Starr via talk said on Sat, 15 Feb 2025 08:56:09 -0500
On 2/15/25 8:11 AM, Steve Litt via talk wrote:
Mark Prosser via talk said on Thu, 13 Feb 2025 22:32:02 -0500
Anyway, it's neat that I can have this little chat bot or coding assistant hosted on a little cube on my desk; What's the benefit of self-hosted over just going to ChatGPT.Com?
Self hosting does not require you to have an internet feed. Although an internet feed seems like its available all the time and everywhere, that is not the case.
Anything that you feed into ChatGPT becomes their property that they can use to monetize in any way they see fit. There is also the possibility that the information that they scrape and store could be used by others in ways that ChatGPT does not intend.
So imagine that you ask ChatGPT to write some code for you using your companies internal configuration. That information could then be used by others to attack or make business decisions that could be detrimental to your company.
More likely they would use your questions to build a profile on you that they could sell to other companies to leverage your preferences or pray on your fears.
OK, this is a very compelling reason. Just one more question... In order to come up with answers anywhere near as good as ChatGPT's answers, wouldn't I need to have the equivalent of all of ChatGPT's web-acquired information, which, even if in digested form, would probably require a trillion TB disk? And what kind of monster CPU would be required to access and logic out such a plethora of info? SteveT Steve Litt http://444domains.com

On 2/16/25 3:58 AM, Steve Litt via talk wrote:
OK, this is a very compelling reason. Just one more question...
In order to come up with answers anywhere near as good as ChatGPT's answers, wouldn't I need to have the equivalent of all of ChatGPT's web-acquired information, which, even if in digested form, would probably require a trillion TB disk? And what kind of monster CPU would be required to access and logic out such a plethora of info?
That's not quite right, but it is one way to think of the process.

The thing about LLMs is that they take all those exabytes of data and build relationships between the various bits of it. That dataset of relationships is much smaller than the original data. Then there are various optimizations applied to make things smaller still.

Think of all the stuff you have ever seen or read or done. You don't need to keep it all around to remember it. The downside is that you don't always remember it in totality or accurately. Human and AI memory is a lossy compression technique.

-- Alvin Starr || land: (647)478-6285 Netvel Inc. || Cell: (416)806-0133 alvin@netvel.net ||
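To put rough numbers on that intuition: a model's weights end up orders of magnitude smaller than the text they were trained on. The figures below are assumptions chosen for scale (roughly what has been reported for recent open models), not measurements of any particular model.

# Rough illustration of why the model is so much smaller than its training data.
# All numbers are assumptions for scale, not exact figures for any one model.

training_tokens = 15e12      # ~15 trillion tokens, roughly the scale of recent open models
bytes_per_token = 4          # a token averages a few bytes of raw text
params = 70e9                # a 70B-parameter model
bytes_per_param_fp16 = 2     # 16-bit weights

corpus_tb = training_tokens * bytes_per_token / 1e12
model_gb = params * bytes_per_param_fp16 / 1e9

print(f"Training text:  ~{corpus_tb:.0f} TB")            # ~60 TB of raw text
print(f"Model weights:  ~{model_gb:.0f} GB at FP16")     # ~140 GB
print(f"Ratio: ~{corpus_tb * 1000 / model_gb:.0f}:1 (and the 'compression' is lossy)")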

On Sat, Feb 15, 2025 at 8:12 AM Steve Litt via talk <talk@gtalug.org> wrote:
Mark Prosser via talk said on Thu, 13 Feb 2025 22:32:02 -0500
What's the benefit of self-hosted over just going to ChatGPT.Com?
Over and above what Alvin said (all accurate), there's also the issue of guardrails. While the obvious ones are well-known (don't ask Deepseek.com about Tiananmen Square and don't ask Google Gemini who Joe Biden is), you have no idea what else is being silently withheld from you because of the political, business or other restraints imposed to protect you from yourself. It might include withholding information about how to commit suicide or how to join ISIS or how to 3D print a gun, but what if it went further? What if it started subtly mucking with abortion advice or the unsavoury parts of religious holy books? Where does each of them cross the line from offensive to hate-inciting?

The problem is, the cloud systems are beholden to their politics, shareholders and defence lawyers. You, in installing your own system, are not. In Deepseek, there is a license clause saying you are not allowed to use the model to do anything illegal in your country or for military purposes. The limitations are clearly stated in the license, Attachment A <https://github.com/deepseek-ai/deepseek-coder/blob/main/LICENSE-MODEL> -- where they should be -- rather than buried opaquely in the model such that you don't know what you don't know. That may not be reason enough for most people to self-host, but it should matter to anyone who cares about freedom of expression and the freedom to learn things that other people don't want you to learn.

It surprised me to learn that there is actually an AI self-censorship benchmark out there called Harmbench <https://arxiv.org/abs/2402.04249>. Cisco recently ran Harmbench on some models and published a breathless account of how *Deepseek allows jailbreaking!!! <https://blogs.cisco.com/security/evaluating-security-risk-in-deepseek-and-other-frontier-reasoning-models>* They treat Deepseek's lack of guardrails (also true of the open-source Llama model) as a vulnerability and security risk, oblivious to the reality that in open source there is no jail to break. As if bad actors prevented from doing bad things by American or European hosted AI models can't get access to any of that "bad information" elsewhere. This is nothing less than security through obscurity, this time in the context of LLMs.

- Evan

From: Evan Leibovitch via talk <talk@gtalug.org>
On Sat, Feb 15, 2025 at 8:12 AM Steve Litt via talk <talk@gtalug.org> wrote:
What's the benefit of self-hosted over just going to ChatGPT.Com?
Over and above what Alvin said (all accurate), there's also the issue of guardrails.
How does one remove guardrails? How were they installed? I would have thought that this was part of the training process, so it would not be easy to remove guardrails from a trained model. I certainly don't know this.

On 2025-02-15 17:41, D. Hugh Redelmeier via talk wrote:
What's the benefit of self-hosted over just going to ChatGPT.Com?
Over and above what Alvin said (all accurate), there's also the issue of guardrails.
How does one remove guardrails? How were they installed?
I would thought that this was part of the training process so it would not be easy to remove guardrails from a trained model. I certainly don't know this.
Yeah, I'm unsure about that and am hesitant to trust it. If it's possible to protect against this in distillation -- since we're training existing models off of a compromised model, and the models being trained might already have the better answer included -- I'm unaware of it.

Anyway, I just don't want to connect even my lab environment to a hosted service, especially not one in China, let alone my production environment. It's definitely a non-starter for this sudden managerial itch to introduce "AIOps" at work.

On the HarmBench notion... it's interesting that DeepSeek failed 100% of the tests, but other "trusted" LLMs failed at 70-80%+:

https://www.pcmag.com/news/deepseek-fails-every-safety-test-thrown-at-it-by-...

They only tested around 50 scenarios across the test domains, though, so it's not as if it covered 100% of all possible scenarios.

Great responses, everyone.

Warm regards,

--- Mark Prosser // E: mark@zealnetworks.ca // W: https://zealnetworks.ca

On Sat, Feb 15, 2025 at 6:38 PM Mark Prosser via talk <talk@gtalug.org> wrote:
On 2025-02-15 17:41, D. Hugh Redelmeier via talk wrote:
How does one remove guardrails? How were they installed?
I would thought that this was part of the training process so it would not be easy to remove guardrails from a trained model. I certainly don't know this.
Yeah, I'm unsure about that and am hesitant to trust it. If it's possible to protect against this in distillation -- since we're training existing models off of a compromised models, and those models being trained might have the better answer already included -- I'm unaware of this.
I'm not absolutely sure, but the Deepseek situation may offer some clues.

Infamously, its Chinese-hosted version has significant guardrails in place, to remain compliant with Chinese laws which demand domestic censorship of certain issues. These guardrails don't exist in the models that are downloaded or hosted outside China; indeed, according to the Cisco researchers they have no guardrails at all.

But notice that the download is in two parts -- the model itself and some code that appears to provide the API and human interface to the model. They even have different licenses; the model has a new license that restricts certain kinds of use, while the code has a pure BSD/MIT license. The Harmbench paper <https://arxiv.org/pdf/2402.04249>, an interesting read, indicated ways that guardrails can be implemented both within the model and "at a system level".

My theory is that the Deepseek guardrails are implemented at the system level, and that the model itself isn't touched. So when the model is downloaded, the Chinese restrictions don't travel with it. Of course, since the closed models aren't downloadable, we have no way of knowing whether a "pure" ChatGPT model has the guardrails embedded inside the model or in the code that connects OpenAI (and Anthropic, etc.) to the outside world.
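To make the "system level" idea concrete, here is a minimal sketch of a guardrail that lives entirely in the serving code wrapped around a model rather than in the weights. The blocked-topics list and the local-model call are placeholders for illustration (the model call assumes an Ollama-style local API); nothing here claims to reflect how Deepseek's hosted service is actually implemented.

# Sketch of a "system-level" guardrail: the filtering happens in the code that
# sits between the user and the model, so downloading the bare weights would
# not carry the restriction with it. Topic list and model call are illustrative.
import requests

BLOCKED_TOPICS = ["some politically sensitive topic"]  # placeholder policy list

def passes_policy(text: str) -> bool:
    """Crude keyword check standing in for a real policy/classifier layer."""
    lowered = text.lower()
    return not any(topic in lowered for topic in BLOCKED_TOPICS)

def query_local_model(prompt: str) -> str:
    """Call a locally hosted model (here via Ollama's default API, as an example)."""
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "deepseek-r1:7b", "prompt": prompt, "stream": False},
        timeout=300,
    )
    r.raise_for_status()
    return r.json()["response"]

def guarded_query(prompt: str) -> str:
    # Filter the request before it ever reaches the model...
    if not passes_policy(prompt):
        return "I can't help with that."
    answer = query_local_model(prompt)
    # ...and filter the answer on the way back out.
    return answer if passes_policy(answer) else "I can't help with that."

Strip the wrapper (or download only the weights) and the policy goes with it, which is consistent with the "no jail to break" observation about the downloadable model.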
Anyway, I just don't want to connect even my lab environment to a hosted service, especially not one in China, let alone my production environment. It's definitely a non-starter for this sudden managerial itch to introduce "AIOps" at work.
On the HarmBench notion... it's interesting that DeepSeek failed 100% of the tests. But other "trusted" LLMs failed at 70-80% +
https://www.pcmag.com/news/deepseek-fails-every-safety-test-thrown-at-it-by-...
Here is the original Cisco blog <https://blogs.cisco.com/security/evaluating-security-risk-in-deepseek-and-other-frontier-reasoning-models> on the Harmbench testing; I linked to it in my original mail. It offers a little more detail than PCMag did, including an interesting breakdown of the different categories of harm being tested. One of the categories is "general harm", a grab-bag of subjectively-determined harms. And another category is "illegal", and of course what constitutes illegal behaviour can vary widely between jurisdictions. - Evan

Here is a little write-up explaining how to use time to bypass some of the safeguards built into ChatGPT:

https://www.bleepingcomputer.com/news/security/time-bandit-chatgpt-jailbreak...

On Sun, Feb 16, 2025 at 8:53 PM Evan Leibovitch via talk <talk@gtalug.org> wrote:
On Sat, Feb 15, 2025 at 6:38 PM Mark Prosser via talk <talk@gtalug.org> wrote:
On 2025-02-15 17:41, D. Hugh Redelmeier via talk wrote:
How does one remove guardrails? How were they installed?
I would thought that this was part of the training process so it would not be easy to remove guardrails from a trained model. I certainly don't know this.
Yeah, I'm unsure about that and am hesitant to trust it. If it's possible to protect against this in distillation -- since we're training existing models off of a compromised models, and those models being trained might have the better answer already included -- I'm unaware of this.
I'm not absolutely sure, but the Deepseek situation may offer some clues.
Infamously, its Chinese-hosted version has significant guardrails in place, to remain compliant with Chinese laws which demand domestic censorship of certain issues. These guardrails don't exist in the models that are downloaded or hosted outside China, indeed according to the Cisco researchers they have no guardrails at all.
But notice that the download is in two parts -- the model itself and some code that appears to act as the way the API and human interface interact with the model. They even have different licenses; the model has a new license that restricts certain kinds of use but the code has a pure BSD/MIT license. The Harmbench paper, an interesting read, indicated ways that guardrails can be implemented both within the model and "at a system level".
My theory is that the Deepseek guardrails are implemented at the system level, and that the model itself isn't touched. So when the model is downloaded the Chinese restrictions don't travel with it. Of course, since the closed models aren't downloadable, we have no way of knowing if a "pure" ChatGPT model has the guardrails installed embedded inside the model or in the code that connects OpenAI (and Anthropic, etc) to the outside world.
Anyway, I just don't want to connect even my lab environment to a hosted service, especially not one in China, let alone my production environment. It's definitely a non-starter for this sudden managerial itch to introduce "AIOps" at work.
On the HarmBench notion... it's interesting that DeepSeek failed 100% of the tests. But other "trusted" LLMs failed at 70-80% +
https://www.pcmag.com/news/deepseek-fails-every-safety-test-thrown-at-it-by-...
Here is the original Cisco blog on the Harmbench testing; I linked to it in my original mail. It offers a little more detail than PCMag did, including an interesting breakdown of the different categories of harm being tested. One of the categories is "general harm", a grab-bag of subjectively-determined harms. And another category is "illegal", and of course what constitutes illegal behaviour can vary widely between jurisdictions.
- Evan
Participants (6):
- Alvin Starr
- Colin McGregor
- D. Hugh Redelmeier
- Evan Leibovitch
- Mark Prosser
- Steve Litt