What are the steps required to get this running in VS Code?
If they had linked to the instructions in their post (or better yet, linked to a one-click install of a VS Code extension), it would help a lot with adoption.
(BTW, I consider it malpractice that they are at the top of Hacker News with a model of great interest to a large portion of the users, yet they do not have a monetizable call to action featured on the page.)
If you can run this using ollama, then you should be able to use https://www.continue.dev/ with both IntelliJ and VSCode. Haven’t tried this model yet - but overall this plugin works well.
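For what it's worth, pointing continue.dev at a local Ollama model is mostly a config entry. A rough sketch of `~/.continue/config.json` below; the model tag `codestral-mamba` is a placeholder, check what it actually lands in Ollama as:

```json
{
  "models": [
    {
      "title": "Codestral Mamba (local)",
      "provider": "ollama",
      "model": "codestral-mamba"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Codestral Mamba autocomplete",
    "provider": "ollama",
    "model": "codestral-mamba"
  }
}
```

The same entry works in the IntelliJ plugin, as far as I can tell.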
Unrelated, but this page freezes all my devices: desktop Firefox and Chrome, mobile Firefox and Brave.
Is this the best alternative for AI coding assistants in VS Code, besides GitHub Copilot and Google Gemini?
"All you need is users" doesn't seem optimal IMHO, Stability.ai providing an object lesson in that.
They just released weights, and being a for-profit, they need to optimize for making money, not eyeballs. It seems wise to guide people to the API offering.
On top of Hacker News (the target demographic for coders) without an effective monetizable call to action? What a missed opportunity.
GitHub Copilot makes $100M+/year, if not way, way more.
Having a VS Code extension for Mistral would be a revenue stream if it were one-click and better or cheaper than GitHub Copilot. It is malpractice in my mind not to be doing this if you are investing in creating coding models.
But they also signal competence in the space, which means M&A. Or big nation-states might in future hire them to produce country models once the space matures, as was Emad's vision.
I feel like local models could be an amazing coding experience because you could disconnect from the internet. Usually I need to open ChatGPT or Google every so often to solve some issue or generate some function, but that also introduces so many distractions. Imagine being able to turn off the internet completely and only have a chat assistant that runs locally. I fear, though, that it will just be a bit too slow at generating tokens on CPU not to be annoying.
I don't have a gut feel for how much difference the Mamba arch makes to inference speed, nor how much quantisation is likely to ruin things, but as a rough comparison Mistral-7B at 4 bits per param is very usable on CPU.
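Some napkin math on why 4-bit CPU inference is usable at all (the bandwidth figure is an assumption for a typical desktop, not a measurement):

```python
# Token generation on CPU is roughly memory-bandwidth bound: each new
# token requires streaming approximately all weights through the CPU once,
# so tokens/s is bounded by bandwidth / model size.

params = 7e9           # Mistral-7B parameter count
bits_per_param = 4     # 4-bit quantization (e.g. a Q4 GGUF)
weights_gb = params * bits_per_param / 8 / 1e9

mem_bandwidth_gbs = 50  # assumed dual-channel desktop DDR5, rough figure

tokens_per_sec = mem_bandwidth_gbs / weights_gb
print(f"{weights_gb:.1f} GB of weights, ~{tokens_per_sec:.0f} tokens/s upper bound")
```

Real throughput lands below that bound, but it explains why a 4-bit 7B is interactive on CPU while an unquantized one isn't. How the Mamba arch shifts this, I don't know.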
The issue with using any local models for code generation comes up with doing so in a professional context: you lose any infrastructure the provider might have for avoiding regurgitation of copyright code, so there's a legal risk there. That might not be a barrier in your context, but in my day-to-day it certainly is.
I signed up when Codestral was first available and put my payment details in. I've been using it daily since then with continue.dev, but my usage dashboard shows 0 tokens, and so far I have not been billed for anything... It's definitely not clear anywhere, but it seems to be free for now? Or there's some sort of free limit that I am not hitting.
The website codegpt.co also has a plugin for both VS Code and IntelliJ. When the model becomes available in Ollama, you can connect the plugin in VS Code to a local Ollama instance.
I kinda just want something that can keep up with the original version of Copilot. It was so much better than the crap they're pumping out now (it keeps messing up syntax and only completes a few characters at a time).
Supposedly they were training on feedback provided by the plugin itself, but that approach doesn't make sense to me because:
- I don't remember the shortcuts most of the time.
- When I run completions, I do a double take and realise they're wrong.
- I am not a good source of data.
All this information is being fed back into the model as positive feedback, so that's perhaps a reason for it to have gone downhill.
I recall it being amazing at coding back in the day; now I can't trust it.
Of course, this is anecdotal, which is problematic in itself, but I have definitely noticed it fail and stop autocompleting, or provide completely irrelevant code.
It could also be that back in the day they were training on a bit more code than they should have been (e.g. private repos), and now that the lawyers are more involved the training set is smaller/more sanitized.
I tried it; it uses GPT-4o. The $10 sign-up credit disappeared in a few hours of intense coding, and I'm not paying $500/mo for a fancy autocomplete. Manual instruct-style chat about code with Claude-Sonnet-3.5 is the best price/perf I've tried so far. Through poe.com I use around 30k credits per day of coding out of the 1M monthly allotment; I think it was $200/yr. It's not available directly in my country. I've tried a bunch of local models too, but Claude is just next level, and inference is very cheap.
Does anyone have a favorite FIM-capable model? I've been using codellama-13b through Ollama with a Vim extension I wrote, and it's okay but not amazing. I definitely get better code most of the time out of Gemma-27b, but it has no FIM (and for some reason codellama-34b has broken inference for me).
IIRC the Codestral FIM tokens aren't properly implemented in llama.cpp/Ollama; what backend are you using to run them? I'd probably have to drop down to iq2_xxs or something for the full-fat DeepSeek, but I'll definitely look into Codestral. I'm a big fan of Mixtral; hopefully a MoE code model with FIM comes along soon.
No, but it's a super, super janky and simple hodgepodge of Stack Overflow and gemma:27b-generated code. I'll just put it in the comment here; you just need curl on your PATH and a Vim that's compiled with some specific flag.
" Collect up to a:n lines of context before and after the cursor.
function! GetSurroundingLines(n)
  let l:current_line = line('.')
  let l:start_line = max([1, l:current_line - a:n])
  let l:end_line = min([line('$'), l:current_line + a:n])
  let l:lines_before = getline(l:start_line, l:current_line - 1)
  let l:lines_after = getline(l:current_line + 1, l:end_line)
  return [l:lines_before, l:lines_after]
endfunction

" Ask a local Ollama instance for a fill-in-the-middle completion and
" insert it at the cursor.
function! AIComplete()
  let l:n = 256
  let [l:lines_before, l:lines_after] = GetSurroundingLines(l:n)
  " CodeLlama FIM prompt format: '<PRE> {prefix} <SUF>{suffix} <MID>'
  " (note the space after <PRE>).
  let l:prompt = '<PRE> ' . join(l:lines_before, "\n") . ' <SUF>' . join(l:lines_after, "\n") . ' <MID>'
  let l:json_data = json_encode({
        \ 'model': 'codellama:13b-code-q6_K',
        \ 'keep_alive': '30m',
        \ 'stream': v:false,
        \ 'prompt': l:prompt
        \ })
  let l:response = system('curl -s -X POST -H "Content-Type: application/json" -d ' . shellescape(l:json_data) . ' http://localhost:11434/api/generate')
  " get() avoids a hard error if the server returns an error object
  " instead of a completion.
  let l:completion = get(json_decode(l:response), 'response', '')
  " Temporarily enable 'paste' so indent settings and mappings don't
  " mangle the inserted text, then restore the previous value.
  let l:paste_mode = &paste
  set paste
  execute "normal! a" . l:completion
  let &paste = l:paste_mode
endfunction

nnoremap <leader>c :call AIComplete()<CR>
That should be corrected, but the interesting aspect of this release is the architecture. Staying competitive while only needing linear inference time and supporting 256k context is pretty neat.
A very good primer on state-space models (on which Mamba is based) is The Annotated S4 [1]. If you want to dive into the code, I wrote a minimal single-file implementation of Mamba-2 here [2].
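The gist of the linear-time claim fits in a few lines. This is a toy scalar recurrence, not Mamba itself (real Mamba uses input-dependent, "selective" parameters and vector-valued states), but it shows why decoding is O(n) time with O(1) memory:

```python
# An SSM compresses the entire history into a fixed-size state h,
# updated once per token: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
# Contrast with a transformer, whose KV cache grows with sequence length.

def ssm_decode(xs, a=0.9, b=0.5, c=1.0):
    h = 0.0                # fixed-size state, regardless of history length
    ys = []
    for x in xs:           # O(1) work per token -> O(n) total
        h = a * h + b * x  # state update
        ys.append(c * h)   # readout
    return ys

print(ssm_decode([1.0, 0.0, 0.0]))  # impulse response decays geometrically
```

That fixed-size state is also why 256k context is cheap at inference time; whether it *retains* enough from 256k tokens is the empirical question.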
Also, it doesn't seem to have a freemium tier... you need to start paying even before trying it out?
"Our API is currently available through La Plateforme. You need to activate payments on your account to enable your API keys."
Thank you for sharing, this is almost exactly what I've been looking for, for ages!
All of those are FIM-capable, but deepseek-v2-lite especially is very picky about its prompt template, so make sure you use it correctly...
Depending on your hardware, codestral-22B might be fast enough for everything, but for me it's a bit too slow...
If you can run it, deepseek-v2 non-lite is amazing, but it requires loads of VRAM.
EDIT: nvm, my mistake; looks like it works fine: https://github.com/ollama/ollama/issues/5403
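Since the template pickiness bites people, here's roughly the shape of the DeepSeek FIM prompt. The sentinel spellings below are from memory and should be checked against the model's tokenizer config before trusting them; a slightly wrong sentinel silently degrades completions:

```python
# Hypothetical helper showing the fill-in-the-middle prompt shape used by
# DeepSeek coder models. Note the unusual characters in the sentinels:
# full-width vertical bars (U+FF5C) and a "lower one eighth block"
# (U+2581) rather than plain '|' and '_'. Verify against tokenizer_config.

DS_BEGIN = "<｜fim▁begin｜>"
DS_HOLE = "<｜fim▁hole｜>"
DS_END = "<｜fim▁end｜>"

def deepseek_fim_prompt(prefix: str, suffix: str) -> str:
    # prefix = code before the cursor, suffix = code after it;
    # the model generates the text that belongs in the hole.
    return f"{DS_BEGIN}{prefix}{DS_HOLE}{suffix}{DS_END}"
```

Compare with the CodeLlama format (`<PRE> ... <SUF>... <MID>`) in the Vim snippet above; each model family spells these differently, which is exactly why mixing up templates produces garbage.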
DeepSeek uses a 4k sliding window, compared to Codestral Mamba's 256k+ tokens.
Has anyone tried this? And if so, is it fast(er)?
[1]: https://srush.github.io/annotated-s4/
[2]: https://github.com/tommyip/mamba2-minimal
The paper author has a blog series, but I don't think it's for the general public: https://tridao.me/blog/2024/mamba2-part1-model/
This is what introduced me to them. May be a bit outdated at this point.