What are the steps required to get this running in VS Code?
If they had linked to the instructions in their post (or better yet, linked to a one-click install of a VS Code extension), it would help a lot with adoption.
(BTW, I consider it malpractice that they are at the top of Hacker News with a model of great interest to a large portion of the users, yet they do not have a monetizable call to action featured on the page.)
If you can run this using ollama, then you should be able to use https://www.continue.dev/ with both IntelliJ and VSCode. Haven’t tried this model yet - but overall this plugin works well.
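For what it's worth, pointing continue.dev at a local Ollama model is mostly a config entry. A rough sketch of `~/.continue/config.json` below; the model tag `codestral-mamba` is a placeholder, check what it actually lands in Ollama as:

```json
{
  "models": [
    {
      "title": "Codestral Mamba (local)",
      "provider": "ollama",
      "model": "codestral-mamba"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Codestral Mamba autocomplete",
    "provider": "ollama",
    "model": "codestral-mamba"
  }
}
```

The same entry works in the IntelliJ plugin, as far as I can tell.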
Unrelated, but this page freezes all my devices: desktop Firefox and Chrome, mobile Firefox and Brave.
Is this the best alternative for AI coding assistants in VS Code, besides GitHub Copilot and Google Gemini?
"All you need is users" doesn't seem optimal IMHO, Stability.ai providing an object lesson in that.
They just released weights, and being a for-profit, they need to optimize for making money, not eyeballs. It seems wise to guide people to the API offering.
On top of Hacker News (the target demographic for coders) without an effective monetizable call to action? What a missed opportunity.
GitHub Copilot makes $100M+/year, if not way, way more.
Having a VS Code extension for Mistral would be a revenue stream if it were one-click and better or cheaper than GitHub Copilot. It is malpractice in my mind not to be doing this if you are investing in creating coding models.
But they also signal competence in the space, which means M&A. Or big nation-states might in future hire them to produce country models once the space matures, as was Emad's vision.
I feel like local models could be an amazing coding experience because you could disconnect from the internet. Usually I need to open ChatGPT or Google every so often to solve some issue or generate some function, but that also introduces so many distractions. Imagine being able to turn off the internet completely and only have a chat assistant that runs locally. I fear, though, that it will just be a bit too slow at generating tokens on CPU not to be annoying.
I don't have a gut feel for how much difference the Mamba arch makes to inference speed, nor how much quantisation is likely to ruin things, but as a rough comparison Mistral-7B at 4 bits per param is very usable on CPU.
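Some napkin math on why 4-bit CPU inference is usable at all (the bandwidth figure is an assumption for a typical desktop, not a measurement):

```python
# Token generation on CPU is roughly memory-bandwidth bound: each new
# token requires streaming approximately all weights through the CPU once,
# so tokens/s is bounded by bandwidth / model size.

params = 7e9           # Mistral-7B parameter count
bits_per_param = 4     # 4-bit quantization (e.g. a Q4 GGUF)
weights_gb = params * bits_per_param / 8 / 1e9

mem_bandwidth_gbs = 50  # assumed dual-channel desktop DDR5, rough figure

tokens_per_sec = mem_bandwidth_gbs / weights_gb
print(f"{weights_gb:.1f} GB of weights, ~{tokens_per_sec:.0f} tokens/s upper bound")
```

Real throughput lands below that bound, but it explains why a 4-bit 7B is interactive on CPU while an unquantized one isn't. How the Mamba arch shifts this, I don't know.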
The issue with using any local models for code generation comes up with doing so in a professional context: you lose any infrastructure the provider might have for avoiding regurgitation of copyright code, so there's a legal risk there. That might not be a barrier in your context, but in my day-to-day it certainly is.
I signed up when Codestral was first available and put my payment details in. I've been using it daily since then with continue.dev, but my usage dashboard shows 0 tokens, and so far I have not been billed for anything... It's definitely not clear anywhere, but it seems to be free for now? Or there's some sort of free limit that I am not hitting.
The website codegpt.co also has a plugin for both VS Code and IntelliJ. When the model becomes available in Ollama, you can connect the plugin in VS Code to a local Ollama instance.
I kinda just want something that can keep up with the original version of Copilot. It was so much better than the crap they're pumping out now (it keeps messing up syntax and only completes a few characters at a time).
Supposedly they were training on feedback provided by the plugin itself, but that approach doesn't make sense to me because:
- I don't remember the shortcuts most of the time.
- When I run completions, I do a double take and realise they're wrong.
- I am not a good source of data.
All this information is being fed back into the model as positive feedback, so that's perhaps a reason for it to have gone downhill.
I recall it being amazing at coding back in the day; now I can't trust it.
Of course, this is anecdotal, which is problematic in itself, but I have definitely noticed it fail and stop autocompleting, or provide completely irrelevant code.
It could also be that back in the day they were training on a bit more code than they should have been (e.g. private repos), and now that the lawyers are more involved the training set is smaller/more sanitized.
I tried it; it uses GPT-4o. The $10 sign-up credit disappeared in a few hours of intense coding, and I'm not paying $500/mo for a fancy autocomplete. Manual instruct-style chat about code with Claude-Sonnet-3.5 is the best price/perf I've tried so far. Through poe.com I use around 30k credits per day of coding out of the 1M monthly allotment; I think it was $200/yr. It's not available directly in my country. I've tried a bunch of local models too, but Claude is just next level, and inference is very cheap.
Does anyone have a favorite FIM-capable model? I've been using codellama-13b through Ollama with a Vim extension I wrote, and it's okay but not amazing. I definitely get better code most of the time out of Gemma-27b, but it has no FIM (and for some reason codellama-34b has broken inference for me).
IIRC the Codestral FIM tokens aren't properly implemented in llama.cpp/Ollama; what backend are you using to run them? I'd probably have to drop down to iq2_xxs or something for the full-fat DeepSeek, but I'll definitely look into Codestral. I'm a big fan of Mixtral; hopefully a MoE code model with FIM comes along soon.
No, but it's a super, super janky and simple hodgepodge of Stack Overflow and gemma:27b-generated code. I'll just put it in the comment here; you just need curl on your PATH and a Vim that's compiled with some specific flag.
" Collect up to a:n lines of context before and after the cursor.
function! GetSurroundingLines(n)
  let l:current_line = line('.')
  let l:start_line = max([1, l:current_line - a:n])
  let l:end_line = min([line('$'), l:current_line + a:n])
  let l:lines_before = getline(l:start_line, l:current_line - 1)
  let l:lines_after = getline(l:current_line + 1, l:end_line)
  return [l:lines_before, l:lines_after]
endfunction

" Ask a local Ollama instance for a fill-in-the-middle completion and
" insert it at the cursor.
function! AIComplete()
  let l:n = 256
  let [l:lines_before, l:lines_after] = GetSurroundingLines(l:n)
  " CodeLlama FIM prompt format: '<PRE> {prefix} <SUF>{suffix} <MID>'
  " (note the space after <PRE>).
  let l:prompt = '<PRE> ' . join(l:lines_before, "\n") . ' <SUF>' . join(l:lines_after, "\n") . ' <MID>'
  let l:json_data = json_encode({
        \ 'model': 'codellama:13b-code-q6_K',
        \ 'keep_alive': '30m',
        \ 'stream': v:false,
        \ 'prompt': l:prompt
        \ })
  let l:response = system('curl -s -X POST -H "Content-Type: application/json" -d ' . shellescape(l:json_data) . ' http://localhost:11434/api/generate')
  " get() avoids a hard error if the server returns an error object
  " instead of a completion.
  let l:completion = get(json_decode(l:response), 'response', '')
  " Temporarily enable 'paste' so indent settings and mappings don't
  " mangle the inserted text, then restore the previous value.
  let l:paste_mode = &paste
  set paste
  execute "normal! a" . l:completion
  let &paste = l:paste_mode
endfunction

nnoremap <leader>c :call AIComplete()<CR>
That should be corrected, but the interesting aspect of this release is the architecture. Staying competitive while only needing linear inference time and supporting 256k context is pretty neat.
A very good primer on state-space models (on which Mamba is based) is The Annotated S4 [1]. If you want to dive into the code, I wrote a minimal single-file implementation of Mamba-2 here [2].
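The gist of the linear-time claim fits in a few lines. This is a toy scalar recurrence, not Mamba itself (real Mamba uses input-dependent, "selective" parameters and vector-valued states), but it shows why decoding is O(n) time with O(1) memory:

```python
# An SSM compresses the entire history into a fixed-size state h,
# updated once per token: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
# Contrast with a transformer, whose KV cache grows with sequence length.

def ssm_decode(xs, a=0.9, b=0.5, c=1.0):
    h = 0.0                # fixed-size state, regardless of history length
    ys = []
    for x in xs:           # O(1) work per token -> O(n) total
        h = a * h + b * x  # state update
        ys.append(c * h)   # readout
    return ys

print(ssm_decode([1.0, 0.0, 0.0]))  # impulse response decays geometrically
```

That fixed-size state is also why 256k context is cheap at inference time; whether it *retains* enough from 256k tokens is the empirical question.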
Also, it doesn't seem to have a freemium tier... you need to start paying even before trying it out?
"Our API is currently available through La Plateforme. You need to activate payments on your account to enable your API keys."
Thank you for sharing, this is almost exactly what I've been looking for, for ages!
All of those are FIM-capable, but deepseek-v2-lite especially is very picky about its prompt template, so make sure you use it correctly...
Depending on your hardware, codestral-22B might be fast enough for everything, but for me it's a bit too slow...
If you can run it, deepseek-v2 non-lite is amazing, but it requires loads of VRAM.
EDIT: nvm, my mistake; looks like it works fine: https://github.com/ollama/ollama/issues/5403
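Since the template pickiness bites people, here's roughly the shape of the DeepSeek FIM prompt. The sentinel spellings below are from memory and should be checked against the model's tokenizer config before trusting them; a slightly wrong sentinel silently degrades completions:

```python
# Hypothetical helper showing the fill-in-the-middle prompt shape used by
# DeepSeek coder models. Note the unusual characters in the sentinels:
# full-width vertical bars (U+FF5C) and a "lower one eighth block"
# (U+2581) rather than plain '|' and '_'. Verify against tokenizer_config.

DS_BEGIN = "<｜fim▁begin｜>"
DS_HOLE = "<｜fim▁hole｜>"
DS_END = "<｜fim▁end｜>"

def deepseek_fim_prompt(prefix: str, suffix: str) -> str:
    # prefix = code before the cursor, suffix = code after it;
    # the model generates the text that belongs in the hole.
    return f"{DS_BEGIN}{prefix}{DS_HOLE}{suffix}{DS_END}"
```

Compare with the CodeLlama format (`<PRE> ... <SUF>... <MID>`) in the Vim snippet above; each model family spells these differently, which is exactly why mixing up templates produces garbage.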
DeepSeek uses a 4k sliding window, compared to Codestral Mamba's 256k+ tokens.
Has anyone tried this? And if so, is it fast(er)?
[1]: https://srush.github.io/annotated-s4/
[2]: https://github.com/tommyip/mamba2-minimal
The paper author has a blog series, but I don't think it's for the general public: https://tridao.me/blog/2024/mamba2-part1-model/
This is what introduced me to them. May be a bit outdated at this point.