I don't trust the code quality evaluation. The other day at work I wanted to split my string by ; but only if it's not within single quotes (think splitting many SQL statements). I explicitly asked for a stdlib Python solution, preferably avoiding counting quotes since that's a bit verbose.
GPT4 gave me a regex found on https://stackoverflow.com/a/2787979 (without "), explained it to me and then it successfully added all the necessary unit tests and they passed - I committed all of that to the repo and moved on.
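For anyone curious, the lookahead trick from that Stack Overflow answer, adapted to semicolons and single quotes, looks roughly like this. It's a sketch, not the exact output GPT4 produced, and it assumes quotes are balanced and not escaped:

```python
import re

# Split on ';' only when an even number of single quotes follows it,
# i.e. when the ';' is not inside a quoted string. Assumes balanced,
# unescaped quotes (adapted from the Stack Overflow answer above).
SPLIT_RE = re.compile(r";(?=(?:[^']*'[^']*')*[^']*$)")

statements = SPLIT_RE.split("SELECT 'a;b' FROM t; DELETE FROM t WHERE x = ';'")
# statements == ["SELECT 'a;b' FROM t", " DELETE FROM t WHERE x = ';'"]
```

The lookahead re-scans the rest of the string at every candidate semicolon, so it's quadratic in the worst case; fine for splitting a script, not for huge inputs.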
I couldn't get 70B to answer this question even with multiple nudges.
Every time I try something non-GPT-4 I always go back - it feels like a waste of time otherwise. A bit sad that LLMs follow the typical winner-takes-all tech curve. However, if you could ask the smartest guy in the room your question every time, why wouldn't you?
Thanks for the feedback, could you please post the cached Phind link so we can take a look?
It might also be helpful to try Phind Chat mode in cases like this.
EDIT: Phind-70B seems capable of getting the right regex nearly every time when Chat mode is used or search results are disabled. The search results appear to be polluting the answer for this example; we'll look into how to fix it.
I tried it with a question that requires deeper expertise – "What is a good technique for device authentication in the context of IoT?" – and Search mode is also worse than Chat mode:
The search was heavily diluted by authentication methods that don't make any sense for machine-to-machine authentication, like multi-factor or biometric authentication, as well as the advice to combine several methods. It also falls into the admittedly common trap of assuming that certificate-based authentication is more difficult to implement than symmetric-key (i.e. pre-shared key) authentication.
The chat answer is not perfect, but the signal-to-noise ratio is much better. The multi-factor authentication advice is again present, but it's the only major error, and it also adds relevant side topics that point in the right direction (secure credential storage, secure boot, logging of auth attempts). The Python example is cute but completely useless, though: Python for embedded devices is rare; in any case you wouldn't want a raw TLS socket, but would use TLS within an MQTTS / HTTPS / CoAP+DTLS stack; and last but not least, it provides a server instead of a client, even though IoT devices mostly communicate outbound.
I didn't take a look at the code, but to me it sounds quite dangerous to take an implementation AND the unit tests straight from an LLM, commit and move on.
It's very powerful, I can enter implementations for any algorithm by typing 5 words and clicking tab. If I want the AI to use a hashmap to solve my problem in O(n), I just say that. If I need to rewrite a bunch of poorly written code to get rid of dead code, add constants, etc., I do that. If I need to convert files between languages or formats, I do that. I have to do a lot more code review than before, and a lot less writing. It saves a huge amount of time, and it's pretty easy to measure. Personally, the order of consultation is GitHub Copilot -> GPT-4 -> Grimoire -> me. If it gets to me, there is a high probability that I'm trying to do too many things at once in an over-complicated function. That, or I'm using a relatively niche library and the AI doesn't know the methods.
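As an illustration of the "use a hashmap for O(n)" kind of request (my example, not the commenter's): the classic pattern is replacing a nested loop with a dict lookup, e.g. two-sum:

```python
def two_sum(nums, target):
    """Return indices of two numbers summing to target, or None.

    A dict of seen values makes this O(n) instead of the O(n^2)
    nested-loop version.
    """
    seen = {}  # value -> index where it appeared
    for i, x in enumerate(nums):
        if target - x in seen:
            return seen[target - x], i
        seen[x] = i
    return None

two_sum([3, 7, 1, 9], 10)  # (0, 1)
```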
Hopefully not, I feel it's a waste of time. The time spent on stupid minor GitHub Copilot mistakes I didn't catch probably outweighs the time I would've spent typing on my own. (I only use that stuff for fancy code completion, nothing more. Every LLM is absolutely moronic. Yesterday I asked ChatGPT to convert gohtml to templ, to no avail ...)
Agreed, though I'm _really_ interested in trying 1M-token Gemini. The idea of uploading my full codebase for code assist stuff sounds really interesting. If I can ever get access to the damn thing...
Gemini is much better than the free version of GPT 3.5 though. At least in my experience.
Microsoft’s enterprise Copilot is also fairly decent. It’s really good at helping with Microsoft-related issues or helping you find the right parts of their ridiculously massive documentation site. Which probably isn’t too surprising, considering.
In my experience, Bing's image search is way better than Google's. Also, I'm not going to use a search engine that I have to log in to or solve a captcha for.
The time complexity of matching a string against any fixed regular expression is O(length of the string).
If you want to talk about constant factors, we need to leave our comfortable armchairs and actually benchmark.
[Just to be clear, I am talking about real regular expressions, not Franken-xpressions with back-references etc here. But what the original commenter described is well within the realm of what you can do with regular expressions.]
You are right about escaped quotes etc. That's part of why parsing with regular expressions is hard.
"Can you give me an approach for a pathfinding algorithm on a 2D grid that will try to get me from point A to point B while staying under a maximum COST argument, and avoid going into tiles that are on fire, except if no other path is available under the maximum cost?"
I've never found an AI that could solve this, because there's a lot of literature online about A* and tiles with cost, and solving this requires a different approach
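One way to sketch the "different approach" being hinted at (my sketch, not any model's output; the grid dimensions, cost function, and fire predicate are made-up placeholders): run a cost-bounded Dijkstra that forbids fire tiles, and only if that fails, rerun it with fire allowed.

```python
import heapq

def find_path(rows, cols, start, goal, max_cost, tile_cost, on_fire):
    """Cost-bounded Dijkstra; uses fire tiles only if no fire-free path fits."""
    def search(allow_fire):
        dist = {start: 0}
        prev = {}
        pq = [(0, start)]
        while pq:
            d, node = heapq.heappop(pq)
            if node == goal:  # reconstruct path back to start
                path = [node]
                while path[-1] in prev:
                    path.append(prev[path[-1]])
                return path[::-1]
            if d > dist[node]:
                continue  # stale queue entry
            r, c = node
            for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                nr, nc = nxt
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if not allow_fire and on_fire(nxt):
                    continue
                nd = d + tile_cost(nxt)
                if nd <= max_cost and nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    prev[nxt] = node
                    heapq.heappush(pq, (nd, nxt))
        return None
    # Prefer a fire-free path; fall back to fire tiles only when necessary.
    return search(allow_fire=False) or search(allow_fire=True)
```

With a 3x3 grid, fire at (0,1) and (1,1), and unit tile costs, a budget of 10 yields the fire-free detour through the bottom row, while a budget of 3 forces the direct path through the fire. (Note this two-pass scheme satisfies the stated spec but doesn't minimize fire exposure; that refinement needs an extra state dimension.)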
I don't care much for benchmarks, many models seems to be contaminated just to approach proprietary models in coding benchmarks.
I had never tried Phind before, but gave Phind-70B a spin today and so far found it to be really good for code writing and understanding, maybe even GPT-4 level. Hard to tell for sure since I only tested it on a single problem: writing some web3 code in TypeScript. This is what I did:
- Gave it some specifications for a React hook that subscribes to a smart contract event and fetches historical events starting from a block number. It completed successfully.
- Took this code and gave it to GPT-4 to explain what it did, as well as to find potential issues. GPT-4 gave a list of potential issues and how to address them.
- Then I went back to Phind and asked it to find potential issues in the code it had just written, and it found more or less the same issues GPT-4 had found.
- Went back to GPT-4 and asked it to write a different version of the hook.
- Took the GPT-4-written code and asked Phind to explain it, which it did successfully (though I think the explanation was less detailed than GPT-4's explanation of the code written by Phind).
I will be testing this more over the next few days. If this proves to be in the GPT-4 ballpark and the 70B weights are released, I will definitely replace my ChatGPT Plus subscription with Phind Pro.
Not an expert at all. But just wanted to let the creators know: I've been using Phind almost daily for some months now and it's been awesome. Whenever I accidentally do a web search instead, I recognize what a game changer this is. (ChatGPT probably is as well, but I've never used it.) Last week I was under pressure at work and I used it for stuff like "How can I capture output from a command and print it line by line to the console with Rust", and I must say that kind of time and energy saving is very significant.
Just wait for people to stop using SO, at which point the LLMs won't have a high quality training set for new questions, so you won't get good answers from the LLMs anymore...
I don't use LLMs a lot, maybe once a week or so. But I always pick Phind as my first choice because it's not behind a login and I can use it without giving my phone number. Hopefully you'll keep it that way!
Phindational models, phintech, Phinterest, phinder… it might be the best startup name of all time. Hell, start up a password manager and call it Phinders’ Keeper.
Very nice. I've been working with GPT-4 since it was released, and I tried some of my coding tasks from today with Phind-70B. The speed, conciseness, and accuracy are very impressive. Subjectively, the answers it gives just feel better than GPT-4's; I'm definitely gonna give Pro a try this month.
I prefer Phind's web search with an LLM to both Google Search and GPT-4. I have switched my default search engine; I only use Google for finding sites now, not for finding information.
GPT-4 might be a better LLM, but its search capability is worse; it sometimes sends really stupid search keywords that are clearly not good enough.
I tried asking "What is the size of Phind-70B's context window?" and it couldn't answer the question. Strangely, it immediately found the page with the answer (https://www.phind.com/blog/introducing-phind-70b) but refused to acknowledge that the answer was there. I tried asking in several ways. It even quoted the exact answer in the displayed snippet, but still said there was no answer!
Since you're here: have you considered moving to other, better generalist base models in the future? Particularly DeepSeek or the Mixtrals. A natural language foundation is important for reasoning. CodeLlama is very much a compromise; it has lost some NLP abilities from continued pretraining on code.
I tried a question about Snobol4 and was impressed with what it said (it couldn't provide an exact example due to paucity of examples). When testing more mainstream languages I have found it very helpful.
Hello Michael, lovely to see this, congrats. Do you already have an API? I could not see it on the site. If not, then do you know around when we can expect it? I am building a desktop BI app with hosted and local LLMs (need schema inference and text to SQL). Would be nice to have Phind as an option for users. Thanks
I'd suggest logging in in that case -- you will still get your free uses. The Phind-70B counter for non-logged in users has carried over from when we offered GPT-4 uses without a login. If you've already consumed those uses, you'll need to log in to use Phind-70B.
I have been using Phind almost daily for the past 3-4 weeks, and the code it produces is pretty good and runnable on the first try more often than ChatGPT's. Most of the time the answer is somewhat accurate and points me in the right direction.
ChatGPT (with GPT-4) has been slow af for me for the past 2+ months, but I like studying a topic using ChatGPT; it is more verbose and explanatory when explaining things to you.
Maybe a purpose-built, dedicated AI model is the right path. A model that does well at fixing bugs, writing feature code, and producing accurate code will not be a good tool for conversational studying. And vice versa.
Also, I don't like that Phind doesn't handle follow-up questions that well when there are multiple kinds of questions within the same thread. ChatGPT is good at this.
I haven't actually because Phind is working for me so far whenever I have code-related questions or when I need to refactor my code. TIL that I can customize the answer style preference, will give it a try!
I'm impressed with the speed, really impressed, but not so much with the quality of the responses. This is a prompt I usually try with new LLMs:
> Acting as an expert Go developer, write a RoundTripper that retries failed HTTP requests, both GET and POST ones.
GPT-4 takes a few tries but usually takes the POST part into account, saving the body for retries and whatnot. Phind, on the other hand, in the two or three times I tried, ignores the POST part and focuses on GET only.
Maybe that problem is just too hard for LLMs? Or the prompt sucks? I'll see how it handles other things since I still have a few tries left.
Thanks, can you send the cached link please? I'd also suggest trying Chat mode for questions like this, which are unlikely to benefit from an internet search.
Just tried your query now and it seemed to work well -- what are your thoughts?
A fun little challenge I like to give LLMs is to ask some basic logic puzzles, e.g. how can I measure 2 liters using a 3-liter and a 5-liter container? Usually if it's possible, they seem to do OK. When it's not possible, they produce a variety of wacky results. Phind-34B is rather amusing, and seems to get stuck in a loop: https://www.phind.com/agent?cache=clsxpravk0001la081cc9dl45
1. Phind was by far the best - gave me the solution in just 2 steps
2. Grok was second best - it did arrive at the solution, but with an additional nonsense step. The solution itself was correct.
3. To my surprise, GPT-4 could not solve the prompt and in fact gave a wrong answer in 4 steps - "Now you should have exactly 4 liters in the 5-liter container." - which is not what I asked
4. As expected, Gemini Pro was the worst. It asks me to pour the completely filled 3L container into the 5L one and says you will then be left with 2L in the 3L container.. LOL, that doesn't even make sense.
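For what it's worth, the solvable cases of this puzzle are trivial to brute-force, which makes the answers above easy to check: a BFS over (a, b) jug states finds the 2-step answer for 2 liters (fill the 5L, pour it into the 3L, leaving 2L). This is my own sketch, not any model's output:

```python
from collections import deque

def measure(target, cap_a=3, cap_b=5):
    """BFS over (a, b) jug states; returns the shortest move list, or None."""
    moves = {(0, 0): []}
    queue = deque([(0, 0)])
    while queue:
        a, b = queue.popleft()
        if a == target or b == target:
            return moves[(a, b)]
        pour_ab = min(a, cap_b - b)   # how much A can pour into B
        pour_ba = min(b, cap_a - a)   # how much B can pour into A
        for desc, state in [
            ("fill A", (cap_a, b)),
            ("fill B", (a, cap_b)),
            ("empty A", (0, b)),
            ("empty B", (a, 0)),
            ("pour A into B", (a - pour_ab, b + pour_ab)),
            ("pour B into A", (a + pour_ba, b - pour_ba)),
        ]:
            if state not in moves:
                moves[state] = moves[(a, b)] + [desc]
                queue.append(state)
    return None  # target is unreachable

measure(2)  # ['fill B', 'pour B into A']
```

For the impossible cases (e.g. measuring 1L with 2L and 4L jugs, where every reachable amount is even), the BFS exhausts the finite state space and returns None, which is exactly the answer the wacky LLM responses fail to give.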
These are interesting tests. I wonder how far away we are from AIs solving these (the ones that have no solution) without any special programming to teach them how.
Edit: USE CODE MODE and it'll actually solve it.
- Search: https://www.phind.com/search?cache=s4e576jlnp1mpw73n9iy4sqc
- Chat: https://www.phind.com/agent?cache=clsyev95o0006le08b5pjrs14
Is this the new normal now?
Blindly copying code from any source and running it or committing it to your main branch without even the slightest critical glance is foolish.
But if there's non-trivial logic in the code of the tests, I agree this is probably a risky approach.
I see that the future is brighter than ever for the information security industry.
Then you're not using AI, you're using your search engine. wink wink
And are there plans to release any more weights? Perhaps one or two revisions behind your latest ones?
Here are a couple screenshots:
https://imgur.com/a/u7iKOyw
https://imgur.com/a/aHAto5H
And here's the link to the whole conversation:
https://www.phind.com/search?cache=zlaksmzkm0h5cpx8l95n62tl
Why is this happening? Does it generally have difficulty with reading web pages, or is there something strange about this particular question?
I'm not sure if it's really using the 34B model or if the UI is wrong about which one it used
0 Phind-70B uses left
And I've never made any selection there.
You can tell it to be more explanatory for certain topics.
https://www.phind.com/search?cache=tvyrul1spovzcpwtd8phgegj
https://www.phind.com/search?cache=k56i132ekpg43zdc7j5z1h1x
I'll give chat mode a try. Didn't see that it existed until now.
EDIT
Chat mode didn't do much better:
https://www.phind.com/agent?cache=clsxpl4t80002l008v3vjqw5j
For the record, this is the interface I asked it to implement:
https://pkg.go.dev/net/http#RoundTripper
Phind still forgot about POST, but at least now it got the interface right.
https://www.phind.com/search?cache=ipu8z1tb3bnn7nfgfibcix38