We’ve found that most technical searches fall into a few categories: ad-hoc how-tos, understanding an API, recalling forgotten details, research, or troubleshooting. Google is too broad and shallow a tool to be good at these. Even after sifting through the deluge of spammy, SEO-stuffed sites, you still have to dig through discussion boards or documentation to find your answer yourself. Their “featured snippet” approach works for simple factoid queries but quickly falls apart when a question requires reasoning about information across multiple webpages.
Our approach is narrow and deep — to retrieve detailed information for topics relevant to developers. When you submit a query, we pull raw site data from Bing, rerank it, and extract understanding and code snippets with our proprietary large language models. We then use sequence-to-sequence transformer models to generate a final explanation from all of this input.
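In outline, the pipeline is retrieve → rerank → generate. Here is a minimal, hypothetical sketch of that flow — the function names and the toy word-overlap reranker are my illustrations, not Hello's actual models or API:

```python
from typing import Callable

def rerank(query: str, pages: list[str]) -> list[str]:
    """Toy stand-in for the reranking step: order pages by how many
    query words they share (a real system uses a learned relevance model)."""
    terms = set(query.lower().split())
    return sorted(pages,
                  key=lambda p: len(terms & set(p.lower().split())),
                  reverse=True)

def answer_query(query: str,
                 fetch: Callable[[str], list[str]],
                 generate: Callable[[str, str], str],
                 top_k: int = 5) -> str:
    """Retrieve raw pages, rerank them, and hand the best ones to a
    generative model as grounding context."""
    pages = fetch(query)                            # raw result pages (Bing, in Hello's case)
    context = "\n\n".join(rerank(query, pages)[:top_k])
    return generate(query, context)                 # seq-to-seq model writes the explanation
```

The interesting engineering is, of course, inside `rerank` and `generate`; the point here is only the shape of the composition.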
For our honors theses at UT Austin, we researched prototypes of large generative language models that can answer complex questions by combining information from multiple sources. We found that GPT-3, GPT-Neo/J/X, and similar autoregressive language models that predict text from left to right are prone to “hallucinating” — generating text inconsistent with the “ground truth” document. Training a sequence-to-sequence language model (a T5 derivative) on our custom dataset designed for factual generation yielded much better results with less hallucination.
After creating this prototype, we started actively developing Hello with the idea that searching should be just like talking to a smart friend. We want to build an engine that explains complex topics clearly and concisely, and lets users ask follow-up questions using the context of their previous searches.
For example, when asked “what type of semaphore can function as a mutex?”, Hello pulls in the raw text from all five search results linked on the search page to generate: “A binary semaphore can be used as a mutex. Mutexes and semaphores are two different types of synchronization mechanisms. A mutex is a lock that prevents two threads from accessing the same resource at the same time. A semaphore is used to signal that a resource has become available.” We're biased, of course, but we think that the ability to reason abstractly about information from multiple web pages is a cool thing in a search engine!
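The generated claim is easy to verify in code. Here's a minimal Python sketch (my example, not Hello's output) using a binary semaphore — `threading.Semaphore(1)` — as a mutex to protect a shared counter:

```python
import threading

lock = threading.Semaphore(1)   # binary semaphore: count of 1 acts like a mutex
counter = 0

def increment(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:              # acquire()/release(): one thread in here at a time
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000 -- without the semaphore, increments could be lost to races
```

The distinction the answer draws also holds: a mutex is conceptually owned by the locking thread, while a semaphore can be released by a different thread to signal availability.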
We use BERT-based models to extract and rank code snippets when they're relevant to the query. Our search engine currently does well at answering concrete how-to questions such as “Sort a list of tuples by the second element”, “Set a response cookie in FastAPI”, “Get value of input in React”, and “How to implement Dijkstra's algorithm.” Exclusively using our own models has also freed us from dependence on OpenAI.
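For reference, the first of those example queries has a one-line answer in Python, which is roughly the kind of snippet a search like this should surface:

```python
from operator import itemgetter

pairs = [("b", 3), ("a", 1), ("c", 2)]

# Sort by the second element of each tuple via a key function.
by_second = sorted(pairs, key=lambda t: t[1])
print(by_second)  # [('a', 1), ('c', 2), ('b', 3)]

# operator.itemgetter is an equivalent, slightly faster alternative.
assert sorted(pairs, key=itemgetter(1)) == by_second
```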
Hello is and will always be free for individual devs. We haven’t rolled out any paid plans yet, but we’re planning to charge teams per user/month to use on internal data scattered around in wikis, documentation, slack, and emails.
We started Hello Cognition to scratch our own itch, but now we hope to improve the state of information retrieval for the greater developer community. If you'd like to be part of our product feedback and iteration process, we'd love to have you—please contact us at founders@sayhello.so.
We're looking forward to hearing your ideas, feedback, comments, and what would be helpful for you when navigating technical problems!
I searched for the following on sayhello.so:
"Service worker fails on request for audio file"
I got back a couple of results related to general service worker use, but none that came close to discussing the core problem that led to the solution.
The same query in Google returned several results that together pointed me to the solution (it turned out to involve range headers in requests for media data types).
This is just one example though. I think the problem you are trying to fix is worth the effort. I just wonder if this is where humans are still stronger than computers - gathering unstructured data to use in problem solving.
Then again maybe that's just me.
My co-founder and I were building the same product as you are some time ago [1]. We managed to scale it to around 5k WAU before we decided to pivot for various reasons.
If you think there might be any useful information and experience we could share with you, please shoot me an email - vasek@usedevbook.com. I'd love to help however I can to see you guys succeed.
[1] https://www.producthunt.com/products/devbook
I've played around just a bit and clicked some of the preset examples and like what I'm seeing so far. I bookmarked it and will try it out more as I code over the next few days.
Main initial feedback: I'd really like to see version/last-updated-at info accompanying all results. One of the biggest problems with Google for code stuff is finding outdated examples and docs. Even better would be a dropdown that lets me see results depending on the version of the language/framework/tools I'm using.
How do you see navigating this space, given that this could be considered a nice-to-have rather than a strict need?
1. In my work (also at UT, actually - Hook 'em), we've found that the hallucination problem is, in part, lessened by over-parametrizing the model. Places that have the budget to do this have noticed that the performance of ml4code transformers increases linearly with every 1e3 increase in the number of parameters (with no drop-off in sight). I'd love to hear your thoughts on this.
2. I'm concerned that finding code snippets from a short-form query underspecifies the problem too much and may not be the best user-interaction model. Let's compare your system to something like GitHub Copilot. I pass a query:
> how to normalize the rows of a tensor pytorch
With GitHub Copilot, I can demonstrate intent in the development environment itself with an I/O example, a comment, or both, and interact more efficiently. If I see errors in the synthesized snippet, I can change the query in a second, and so on. This is hard with a search-engine-style interactive environment. For this query, I had to navigate to the website, type in the query, check the results (which were wrong for me, by the way - y'all might need to check the correctness of the snippets), copy back the result, maybe go to the relevant thread and parse it more closely, and so on. A good question to keep in mind here is how to make this process more interactive.
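For what it's worth, that query has a compact answer. Here's a sketch in NumPy (in PyTorch, `torch.nn.functional.normalize(t, p=2, dim=1)` does the same thing) — my own illustration, not what the engine returned:

```python
import numpy as np

def normalize_rows(t: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """L2-normalize each row to unit length; eps keeps zero rows from
    dividing by zero (they stay zero)."""
    norms = np.linalg.norm(t, axis=1, keepdims=True)
    return t / np.maximum(norms, eps)

t = np.array([[3.0, 4.0],
              [0.0, 2.0]])
print(normalize_rows(t))  # [[0.6 0.8]
                          #  [0.  1. ]]
```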
3. Finally, I just want to say that the website is phenomenal, even on mobile. Kudos on the frontend/backend/architecture side of things.
Also, don't let my or anyone else's comments take away from the awesome work y'all have done!!! I pulled that example from a paper I read recently called TF-Coder. They have a dataset of these examples as part of their supplementary material. All the best!
One feature request at first glance: please default to the system font stack for code snippets. I see you're currently using Consolas, a Microsoft typeface, which is not pleasant to see as a Mac user.
You can use this to default to the system font on every platform:
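For reference, a commonly used cross-platform monospace stack looks like this (the exact list is a matter of taste, and the `code, pre` selectors are just an assumption about the markup; `ui-monospace` resolves to SF Mono on macOS):

```css
code, pre {
  font-family: ui-monospace, SFMono-Regular, Menlo, Monaco,
    Consolas, "Liberation Mono", "Courier New", monospace;
}
```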
Let's say I'm searching for front-end frameworks. Each article has the word "best" in the title, yet doesn't link to resources like the State of JS survey, the Stack Overflow Developer Survey, or other similar sites. So, in this context, "best" is subjective. I can't be bothered with subjective results when I'm trying to find out what is actually considered "best" - or, in this case, popular.
It would be amazing if this could be used for internal documentation, though. We have so much documentation on our wiki, and it's just disorganised.
Also, Stack Overflow's search has always sucked. The way to find stuff on Stack Overflow has mostly been to use Google.