I don't trust the code quality evaluation. The other day at work I wanted to split my string by ; but only if it's not within single quotes (think splitting many SQL statements). I explicitly asked for a stdlib Python solution, preferably avoiding counting quotes since that's a bit verbose.
GPT4 gave me a regex found on https://stackoverflow.com/a/2787979 (without "), explained it to me and then it successfully added all the necessary unit tests and they passed - I committed all of that to the repo and moved on.
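For anyone curious, the lookahead trick from that Stack Overflow answer, adapted to semicolons and single quotes, looks roughly like this. It's a sketch, not the exact output GPT4 produced, and it assumes quotes are balanced and not escaped:

```python
import re

# Split on ';' only when an even number of single quotes follows it,
# i.e. when the ';' is not inside a quoted string. Assumes balanced,
# unescaped quotes (adapted from the Stack Overflow answer above).
SPLIT_RE = re.compile(r";(?=(?:[^']*'[^']*')*[^']*$)")

statements = SPLIT_RE.split("SELECT 'a;b' FROM t; DELETE FROM t WHERE x = ';'")
# statements == ["SELECT 'a;b' FROM t", " DELETE FROM t WHERE x = ';'"]
```

The lookahead re-scans the rest of the string at every candidate semicolon, so it's quadratic in the worst case; fine for splitting a script, not for huge inputs.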
I couldn't get 70B to answer this question even with multiple nudges.
Every time I try something non-GPT-4 I always go back - it feels like a waste of time otherwise. A bit sad that LLMs follow the typical winner-takes-all tech curve. However, if you could ask the smartest guy in the room your question every time, why wouldn't you?
Thanks for the feedback, could you please post the cached Phind link so we can take a look?
It might also be helpful to try Phind Chat mode in cases like this.
EDIT: Phind-70B seems capable of getting the right regex nearly every time when Chat mode is used or search results are disabled. The search results appear to be polluting the answer for this example; we'll look into how to fix it.
I tried it with a question that requires deeper expertise – "What is a good technique for device authentication in the context of IoT?" – and Search mode is also worse than Chat mode:
The search was heavily diluted by authentication methods that don't make any sense for machine-to-machine authentication, like multi-factor or biometric authentication, as well as the advice to combine several methods. It also falls into the admittedly common trap of assuming that certificate-based authentication is more difficult to implement than symmetric-key (i.e. pre-shared key) authentication.
The chat answer is not perfect, but the signal-to-noise ratio is much better. The multi-factor authentication advice is again present, but it's the only major error, and it also adds relevant side topics that point in the right direction (secure credential storage, secure boot, logging of auth attempts). The Python example is cute but completely useless, though: Python for embedded devices is rare; in any case you wouldn't want a raw TLS socket, but would use TLS within an MQTTS / HTTPS / CoAP+DTLS stack; and last but not least, it provides a server instead of a client, even though IoT devices mostly communicate outbound.
I didn't take a look at the code, but to me it sounds quite dangerous to take an implementation AND the unit tests straight from an LLM, commit and move on.
It's very powerful, I can enter implementations for any algorithm by typing 5 words and clicking tab. If I want the AI to use a hashmap to solve my problem in O(n), I just say that. If I need to rewrite a bunch of poorly written code to get rid of dead code, add constants, etc., I do that. If I need to convert files between languages or formats, I do that. I have to do a lot more code review than before, and a lot less writing. It saves a huge amount of time, and it's pretty easy to measure. Personally, the order of consultation is GitHub Copilot -> GPT-4 -> Grimoire -> me. If it gets to me, there is a high probability that I'm trying to do too many things at once in an over-complicated function. That, or I'm using a relatively niche library and the AI doesn't know the methods.
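As an illustration of the "use a hashmap for O(n)" kind of request (my example, not the commenter's): the classic pattern is replacing a nested loop with a dict lookup, e.g. two-sum:

```python
def two_sum(nums, target):
    """Return indices of two numbers summing to target, or None.

    A dict of seen values makes this O(n) instead of the O(n^2)
    nested-loop version.
    """
    seen = {}  # value -> index where it appeared
    for i, x in enumerate(nums):
        if target - x in seen:
            return seen[target - x], i
        seen[x] = i
    return None

two_sum([3, 7, 1, 9], 10)  # (0, 1)
```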
Hopefully not, I feel it's a waste of time. The time spent on stupid minor GitHub Copilot mistakes I didn't catch probably outweighs the time I would've spent typing on my own. (I only use that stuff for fancy code completion, nothing more. Every LLM is absolutely moronic. Yesterday I asked ChatGPT to convert gohtml to templ, to no avail ...)
Agreed, though I'm _really_ interested in trying 1M-token Gemini. The idea of uploading my full codebase for code assist stuff sounds really interesting. If I can ever get access to the damn thing...
Gemini is much better than the free version of GPT 3.5 though. At least in my experience.
Microsoft’s enterprise Copilot is also fairly decent. It’s really good at helping with Microsoft-related issues or helping you find the right parts of their ridiculously massive documentation site. Which probably isn’t too surprising, considering.
In my experience, Bing's image search is way better than Google's. Also, I'm not going to use a search engine that I have to log in to or solve a captcha for.
The time complexity of matching a string against any fixed regular expression is O(length of the string).
If you want to talk about constant factors, we need to leave our comfortable armchairs and actually benchmark.
[Just to be clear, I am talking about real regular expressions, not Franken-xpressions with back-references etc here. But what the original commenter described is well within the realm of what you can do with regular expressions.]
You are right about escaped quotes etc. That's part of why parsing with regular expressions is hard.
"Can you give me an approach for a pathfinding algorithm on a 2D grid that will try to get me from point A to point B while staying under a maximum COST argument, and avoid going into tiles that are on fire, except if no other path is available under the maximum cost?"
I've never found an AI that could solve this, because there's a lot of literature online about A* and tiles with cost, and solving this requires a different approach
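One way to sketch the "different approach" being hinted at (my sketch, not any model's output; the grid dimensions, cost function, and fire predicate are made-up placeholders): run a cost-bounded Dijkstra that forbids fire tiles, and only if that fails, rerun it with fire allowed.

```python
import heapq

def find_path(rows, cols, start, goal, max_cost, tile_cost, on_fire):
    """Cost-bounded Dijkstra; uses fire tiles only if no fire-free path fits."""
    def search(allow_fire):
        dist = {start: 0}
        prev = {}
        pq = [(0, start)]
        while pq:
            d, node = heapq.heappop(pq)
            if node == goal:  # reconstruct path back to start
                path = [node]
                while path[-1] in prev:
                    path.append(prev[path[-1]])
                return path[::-1]
            if d > dist[node]:
                continue  # stale queue entry
            r, c = node
            for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                nr, nc = nxt
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if not allow_fire and on_fire(nxt):
                    continue
                nd = d + tile_cost(nxt)
                if nd <= max_cost and nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    prev[nxt] = node
                    heapq.heappush(pq, (nd, nxt))
        return None
    # Prefer a fire-free path; fall back to fire tiles only when necessary.
    return search(allow_fire=False) or search(allow_fire=True)
```

With a 3x3 grid, fire at (0,1) and (1,1), and unit tile costs, a budget of 10 yields the fire-free detour through the bottom row, while a budget of 3 forces the direct path through the fire. (Note this two-pass scheme satisfies the stated spec but doesn't minimize fire exposure; that refinement needs an extra state dimension.)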
I don't care much for benchmarks, many models seems to be contaminated just to approach proprietary models in coding benchmarks.
I had never tried Phind before, but gave Phind-70B a spin today and so far found it to be really good for code writing and understanding, maybe even GPT-4 level. Hard to tell for sure since I only tested it on a single problem: writing some web3 code in TypeScript. This is what I did:
- Gave it some specifications for a React hook that subscribes to a smart contract event and fetches historical events starting from a block number. It completed successfully.
- Took this code and gave it to GPT-4 to explain what it did, as well as to find potential issues. GPT-4 gave a list of potential issues and how to address them.
- Then I went back to Phind and asked it to find potential issues in the code it had just written, and it found more or less the same issues GPT-4 had found.
- Went back to GPT-4 and asked it to write a different version of the hook.
- Took the GPT-4-written code and asked Phind to explain it, which it did successfully (though I think the explanation was less detailed than GPT-4's explanation of the code written by Phind).
I will be testing this more over the next few days. If this proves to be in the GPT-4 ballpark and the 70B weights are released, I will definitely replace my ChatGPT Plus subscription with Phind Pro.
Not an expert at all. But just wanted to let the creators know: I've been using Phind almost daily for some months now and it's been awesome. Whenever I accidentally do a web search instead, I recognize what a game changer this is. (ChatGPT probably is as well, but I've never used it.) Last week I was under pressure at work and I used it for stuff like "How can I capture output from a command and print it line by line to the console with Rust", and I must say that kind of time and energy saving is very significant.
Just wait for people to stop using SO, at which point the LLMs won't have a high quality training set for new questions, so you won't get good answers from the LLMs anymore...
I don't use LLMs a lot, maybe once a week or so. But I always pick Phind as my first choice because it's not behind a login and I can use it without giving my phone number. Hopefully you'll keep it that way!
Phindational models, phintech, Phinterest, phinder… it might be the best startup name of all time. Hell, start up a password manager and call it Phinders’ Keeper.
Very nice. I've been working with GPT-4 since it was released, and I tried some of my coding tasks from today with Phind-70B. The speed, conciseness, and accuracy are very impressive. Subjectively, the answers it gives just feel better than GPT-4's; I'm definitely gonna give Pro a try this month.
I prefer Phind's web search with an LLM to both Google Search and GPT-4. I have switched my default search engine; I only use Google for finding sites now, not for finding information.
GPT-4 might be a better LLM, but its search capability is worse; it sometimes sends really stupid search keywords that are clearly not good enough.
I tried asking "What is the size of Phind-70B's context window?" and it couldn't answer the question. Strangely, it immediately found the page with the answer (https://www.phind.com/blog/introducing-phind-70b) but refused to acknowledge that the answer was there. I tried asking in several ways. It even quoted the exact answer in the displayed snippet, but still said there was no answer!
Since you're here: have you considered moving to other, better generalist base models in the future? Particularly DeepSeek or the Mixtrals. A natural language foundation is important for reasoning. CodeLlama is very much a compromise; it has lost some NLP abilities from continued pretraining on code.
I tried a question about Snobol4 and was impressed with what it said (it couldn't provide an exact example due to paucity of examples). When testing more mainstream languages I have found it very helpful.
Hello Michael, lovely to see this, congrats. Do you already have an API? I could not see it on the site. If not, then do you know around when we can expect it? I am building a desktop BI app with hosted and local LLMs (need schema inference and text to SQL). Would be nice to have Phind as an option for users. Thanks
I'd suggest logging in in that case -- you will still get your free uses. The Phind-70B counter for non-logged in users has carried over from when we offered GPT-4 uses without a login. If you've already consumed those uses, you'll need to log in to use Phind-70B.
I have been using Phind almost daily for the past 3-4 weeks, and the code it produces is pretty good and runnable on the first try more often than ChatGPT's. Most of the time the answer is somewhat accurate and points me in the right direction.
ChatGPT (with GPT-4) has been slow af for me for the past 2+ months, but I like studying a topic using ChatGPT; it is more verbose and explanatory when explaining things to you.
Maybe a purpose-built, dedicated AI model is the right path. A model that does well at fixing bugs, writing feature code, and producing accurate code will not be a good tool for conversational studying. And vice versa.
Also, I don't like that Phind doesn't handle follow-up questions that well when there are multiple kinds of questions within the same thread. ChatGPT is good at this.
I haven't actually because Phind is working for me so far whenever I have code-related questions or when I need to refactor my code. TIL that I can customize the answer style preference, will give it a try!
I'm impressed with the speed, really impressed, but not so much with the quality of the responses. This is a prompt I usually try with new LLMs:
> Acting as an expert Go developer, write a RoundTripper that retries failed HTTP requests, both GET and POST ones.
GPT-4 takes a few tries but usually takes the POST part into account, saving the body for retries and whatnot. Phind, on the other hand, in the two or three times I tried, ignores the POST part and focuses on GET only.
Maybe that problem is just too hard for LLMs? Or the prompt sucks? I'll see how it handles other things since I still have a few tries left.
Thanks, can you send the cached link please? I'd also suggest trying Chat mode for questions like this, which are unlikely to benefit from an internet search.
Just tried your query now and it seemed to work well -- what are your thoughts?
A fun little challenge I like to give LLMs is to ask some basic logic puzzles, e.g. how can I measure 2 liters using a 3-liter and a 5-liter container? Usually if it's possible, they seem to do OK. When it's not possible, they produce a variety of wacky results. Phind-34B is rather amusing, and seems to get stuck in a loop: https://www.phind.com/agent?cache=clsxpravk0001la081cc9dl45
1. Phind was by far the best - gave me the solution in just 2 steps
2. Grok was second best - it did arrive at the solution, but with an additional nonsense step. The solution itself was correct.
3. To my surprise, GPT-4 could not solve the prompt and in fact gave a wrong answer in 4 steps - "Now you should have exactly 4 liters in the 5-liter container." - which is not what I asked
4. As expected, Gemini Pro was the worst. It asks me to pour the completely filled 3L container into the 5L one and says you will then be left with 2L in the 3L container.. LOL, that doesn't even make sense.
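For what it's worth, the solvable cases of this puzzle are trivial to brute-force, which makes the answers above easy to check: a BFS over (a, b) jug states finds the 2-step answer for 2 liters (fill the 5L, pour it into the 3L, leaving 2L). This is my own sketch, not any model's output:

```python
from collections import deque

def measure(target, cap_a=3, cap_b=5):
    """BFS over (a, b) jug states; returns the shortest move list, or None."""
    moves = {(0, 0): []}
    queue = deque([(0, 0)])
    while queue:
        a, b = queue.popleft()
        if a == target or b == target:
            return moves[(a, b)]
        pour_ab = min(a, cap_b - b)   # how much A can pour into B
        pour_ba = min(b, cap_a - a)   # how much B can pour into A
        for desc, state in [
            ("fill A", (cap_a, b)),
            ("fill B", (a, cap_b)),
            ("empty A", (0, b)),
            ("empty B", (a, 0)),
            ("pour A into B", (a - pour_ab, b + pour_ab)),
            ("pour B into A", (a + pour_ba, b - pour_ba)),
        ]:
            if state not in moves:
                moves[state] = moves[(a, b)] + [desc]
                queue.append(state)
    return None  # target is unreachable

measure(2)  # ['fill B', 'pour B into A']
```

For the impossible cases (e.g. measuring 1L with 2L and 4L jugs, where every reachable amount is even), the BFS exhausts the finite state space and returns None, which is exactly the answer the wacky LLM responses fail to give.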
These are interesting tests. I wonder how far away we are from AIs solving these (the ones that have no solution) without any special programming to teach them how.
Edit: USE CODE MODE and it'll actually solve it.
- Search: https://www.phind.com/search?cache=s4e576jlnp1mpw73n9iy4sqc
- Chat: https://www.phind.com/agent?cache=clsyev95o0006le08b5pjrs14
Is this the new normal now?
Blindly copying code from any source and running it or committing it to your main branch without even the slightest critical glance is foolish.
But if there's non-trivial logic in the code of the tests, I agree this is probably a risky approach.
I see that the future is brighter than ever for the information security industry.
Then you're not using AI, you're using your search engine. wink wink
And are there plans to release any more weights? Perhaps one or two revisions behind your latest ones?
Here are a couple screenshots:
https://imgur.com/a/u7iKOyw
https://imgur.com/a/aHAto5H
And here's the link to the whole conversation:
https://www.phind.com/search?cache=zlaksmzkm0h5cpx8l95n62tl
Why is this happening? Does it generally have difficulty with reading web pages, or is there something strange about this particular question?
I'm not sure if it's really using the 34B model or if the UI is wrong about which one it used
0 Phind-70B uses left
And I've never made any selection there.
You can tell it to be more explanatory for certain topics.
https://www.phind.com/search?cache=tvyrul1spovzcpwtd8phgegj
https://www.phind.com/search?cache=k56i132ekpg43zdc7j5z1h1x
I'll give chat mode a try. Didn't see that it existed until now.
EDIT
Chat mode didn't do much better:
https://www.phind.com/agent?cache=clsxpl4t80002l008v3vjqw5j
For the record, this is the interface I asked it to implement:
https://pkg.go.dev/net/http#RoundTripper
Phind still forgot about POST, but at least now it got the interface right.
https://www.phind.com/search?cache=ipu8z1tb3bnn7nfgfibcix38