MCP in LM Studio - Readit News

Just ordered a $12k mac studio w/ 512GB of integrated RAM.

Can't wait for it to arrive and crank up LM Studio. It's literally the first install. I'm going to download it with safari.

LM Studio is newish, and it's not a perfect interface yet, but it's fantastic at what it does which is bring local LLMs to the masses w/o them having to know much.

There is another project that people should be aware of: https://github.com/exo-explore/exo

Exo is this radically cool tool that automatically clusters all hosts on your network running Exo and uses their combined GPUs for increased throughput.

Like HPC environments, you are going to need ultra fast interconnects, but it's just IP based.

zackify · 9 months ago

I love LM studio but I’d never waste 12k like that. The memory bandwidth is too low trust me.

Get the RTX Pro 6000 for 8.5k with double the bandwidth. It will be way better

tymscar · 8 months ago

Why would they pay 2/3 of the price for something with 1/5 of ram?

The whole point of spending that much money for them is to run massive models, like the full R1, which the Pro 6000 cant

marci · 8 months ago

You can't run deepseek-v3/r1 on the RTX Pro 6000, not to mention the upcomming 1 million context qwen models, or the current qwen3-235b.

smcleod · 8 months ago

RTX is nice, but it's memory limited and requires to have a full desktop machine to run it in. I'd take slower inference (as long as it's not less than 15tk/s) for more memory any day!

t1amat · 8 months ago

(Replying to both siblings questioning this)

If the primary use case is input heavy, which is true of agentic tools, there’s a world where partial GPU offload with many channels of DDR5 system RAM leads to an overall better experience. A good GPU will process input many times faster, and with good RAM you might end up with decent output speed still. Seems like that would come in close to $12k?

And there would be no competition for models that do fit entirely inside that VRAM, for example Qwen3 32B.

storus · 8 months ago

RTX Pro 6000 can't do DeepSeek R1 671B Q4, you'd need 5-6 of them, which makes it way more expensive. Moreover, MacStudio will do it at 150W whereas Pro 6000 would start at 1500W.

chisleu · 8 months ago

Only on HN can buying a $12k badass computer be a waste of money

dchest · 9 months ago

I'm using it on MacBook Air M1 / 8 GB RAM with Qwen3-4B to generate summaries and tags for my vibe-coded Bloomberg Terminal-style RSS reader :-) It works fine (the laptop gets hot and slow, but fine).

Probably should just use llama.cpp server/ollama and not waste a gig of memory on Electron, but I like GUIs.

minimaxir · 9 months ago

8 GB of RAM with local LLMs in general is iffy: a 8-bit quantized Qwen3-4B is 4.2GB on disk and likely more in memory. 16 GB is usually the minimum to be able to run decent models without compromising on heavy quantization.

imranq · 9 months ago

I'd love to host my own LLMs but I keep getting held back from the quality and affordability of Cloud LLMs. Why go local unless there's private data involved?

diggan · 8 months ago

There are some use cases I use LLMs for where I don't care a lot about the data being private (although that's a plus) but I don't want to pay XXX€ for classifying some data and I particularly don't want to worry about having to pay that again if I want to redo it with some changes.

Using local LLMs for this I don't worry about the price at all, I can leave it doing three tries per "task" without tripling the cost if I wanted to.

It's true that there is an upfront cost but way easier to get over that hump than on-demand/per-token costs, at least for me.

PeterStuer · 8 months ago

Same. For 'sovereignty ' reasons I eventually will move to local processing, but for now in development/prototyping the gap with hosted LLM's seems too wide.

mycall · 8 months ago

Offline is another use case.

noman-land · 9 months ago

I love LM Studio. It's a great tool. I'm waiting for another generation of Macbook Pros to do as you did :).

incognito124 · 9 months ago

> I'm going to download it with Safari

Oof you were NOT joking

noman-land · 9 months ago

Safari to download LM Studio. LM Studio to download models. Models to download Firefox.

whatevsmate · 8 months ago

I did this a month ago and don't regret it one bit. I had a long laundry list of ML "stuff" I wanted to play with or questions to answer. There's no world in which I'm paying by the request, or token, or whatever, for hacking on fun projects. Keeping an eye on the meter is the opposite of having fun and I have absolutely nowhere I can put a loud, hot GPU (that probably has "gamer" lighting no less) in my fam's small apartment.

chisleu · 8 months ago

Right on. I also have a laundry list of ML things I want to do starting with fine tuning models.

I don't mind paying for models to do things like code. I like to move really fast when I'm coding. But for other things, I just didn't want to spend a week or two coming up on the hardware needed to build a GPU system. You can just order a big GPU box, but it's going to cost you astronomically right now. Building a system with 4-5 PCIE 5.0 x16 slots, enough power, enough pcie lanes... It's a lot to learn. You can't go on PC part picker and just hunt a motherboard with 6 double slots.

This is a machine to let me do some things with local models. My first goal is to run some quantized version of the new V3 model and try to use it for coding tasks.

I expect it will be slow for sure, but I just want to know what it's capable of.

datpuz · 8 months ago

I genuinely cannot wrap my head around spending this much money on hardware that is dramatically inferior to hardware that costs half the price. MacOS is not even great anymore, they stopped improving their UX like a decade ago.

chisleu · 8 months ago

How can you say something so brave, and so wrong?

storus · 8 months ago

If the rumors about splitting CPU/GPU in new Macs are true, your MacStudio will be the last one capable of running DeepSeek R1 671B Q4. It looks like Apple had an accidental winner that will go away with the end of unified RAM.

phren0logy · 8 months ago

I have not heard this rumor. Source?

prettyblocks · 9 months ago

I've been using openwebui and am pretty happy with it. Why do you like lm studio more?

prophesi · 9 months ago

Not OP, but with LM Studio I get a chat interface out-of-the-box for local models, while with openwebui I'd need to configure it to point to an OpenAI API-compatible server (like LM Studio). It can also help determine which models will work well with your hardware.

LM Studio isn't FOSS though.

I did enjoy hooking up OpenWebUI to Firefox's experimental AI Chatbot. (browser.ml.chat.hideLocalhost to false, browser.ml.chat.provider to localhost:${openwebui-port})

truemotive · 9 months ago

Open WebUI can leverage the built in web server in LM Studio, just FYI in case you thought it was primarily a chat interface.

s1mplicissimus · 8 months ago

i recently tried openwebui but it was so painful to get it to run with local model. that "first run experience" of lm studio is pretty fire in comparison. can't really talk about actually working with it though, still waiting for the 8GB download

Deleted Comment

karmakaze · 9 months ago

Nice. Ironically well suited for non-Apple Intelligence.

teaearlgraycold · 9 months ago

What are you going to do with the LLMs you run?

chisleu · 9 months ago

Currently I'm using gemini 2.5 and claude 3.7 sonnet for coding tasks.

I'm interested in using models for code generation, but I'm not expecting much in that regard.

I'm planning to attempt fine tuning open source models on certain tool sets, especially MCP tools.

sneak · 9 months ago

I already got one of these. I’m spoiled by Claude 4 Opus; local LLMs are slower and lower quality.

I haven’t been using it much. All it has on it is LM Studio, Ollama, and Stats.app.

> Can't wait for it to arrive and crank up LM Studio. It's literally the first install. I'm going to download it with safari.

lol, yup. same.

chisleu · 9 months ago

Yup, I'm spoiled by Claude 3.7 Sonnet right now. I had to stop using opus for plan mode in my Agent because it is just so expensive. I'm using Gemini 2.5 pro for that now.

I'm considering ordering one of these today: https://www.newegg.com/p/N82E16816139451?Item=N82E1681613945...

It looks like it will hold 5 GPUs with a single slot open for infiniband

Then local models might be lower quality, but it won't be slow! :)

Dead Comment

MCP terminology is already super confusing, but this seems to just introduce "MCP Host" randomly in a way that makes no sense to me at all.

> "MCP Host": applications (like LM Studio or Claude Desktop) that can connect to MCP servers, and make their resources available to models.

I think everyone else is calling this an "MCP Client", so I'm not sure why they would want to call themselves a host - makes it sound like they are hosting MCP servers (definitely something that people are doing, even though often the server is run on the same machine as the client), when in fact they are just a client? Or am I confused?

guywhocodes · 8 months ago

MCP Host is terminology from the spec. It's the software that makes llm calls, build prompts, interprets tool call requests and performs them etc.

sixhobbits · 8 months ago

So it is, I stand corrected. I googled mcp host and the lmstudio link was the first result.

Some more discussion on the confusion here https://github.com/modelcontextprotocol/modelcontextprotocol... where they acknowledge that most people call it a client and that that's ok unless the distinction is important.

I think host is a bad term for it though as it makes more intuitive sense for the host to host the server and the client to connect to it, especially for remote MCP servers which are probably going to become the default way of using them.

qntty · 8 months ago

It's confusing but you just have to read the official docs

https://modelcontextprotocol.io/specification/2025-03-26/arc...