I'm at Sourcegraph (mentioned in the blog post). We obviously have to deal with massive scale, but for anyone starting out adding code search to their product, I'd recommend not starting with an index and just doing on-the-fly searching until that stops scaling. It will actually scale well for longer than you think if you just need to find the first N matches (because that result buffer can be filled without searching everything exhaustively). Happy to chat with anyone who's building this kind of thing, including with folks at Val Town, which is awesome.
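To make the "first N matches" point concrete, here is a minimal sketch of index-free search that stops as soon as the result buffer is full. It is not how Sourcegraph does it; the function names (`search_tree`, `first_n_matches`), the plain substring match, and the choice to skip hidden directories are all assumptions for illustration.

```python
import itertools
import os
from typing import Iterator, List, Tuple

Match = Tuple[str, int, str]  # (path, line number, line text)


def search_tree(root: str, needle: str) -> Iterator[Match]:
    """Lazily yield every line under root that contains needle."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Illustrative choice: skip hidden directories such as .git.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as f:
                    for lineno, line in enumerate(f, start=1):
                        if needle in line:
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it


def first_n_matches(root: str, needle: str, n: int) -> List[Match]:
    # islice stops pulling from the generator after n results, so files
    # that come after the n-th match are never opened.
    return list(itertools.islice(search_tree(root, needle), n))
```

Calling `first_n_matches(".", "TODO", 20)` only reads as much of the tree as it takes to find 20 hits, which is why common queries stay fast without an index. Rare or absent strings still force a full scan, and that is roughly where an index starts to pay off.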
It indeed is hard, and a good code search platform makes life so much easier. If I ever leave Google, the internal code search is for sure going to be the thing I miss the most. It's so well integrated into how everything else works (blaze target finding, guice bindings etc), I can't imagine my life without it.
I'm reminded to appreciate it even more every time I use GitHub's search. Not that it's bad; it's just inherently so much harder to build a generalized code search platform.
Basic code searching skills seem like something new developers are never explicitly taught, even though searching code is an absolutely crucial skill to build early on.
I guess the knowledge progression I would recommend would look something like this:
- Learning about Ctrl+F, which works basically everywhere.
- Transitioning to ripgrep https://github.com/BurntSushi/ripgrep - I wouldn't even call this optional, it's truly an incredible and very discoverable tool. Requires keeping a terminal open, but that's a good thing for a newbie!
- Optional, but highly recommended: Learning one of the powerhouse command line editors. Teenage me recommended Emacs; current me recommends vanilla vim, purely because some flavor of it is installed almost everywhere. This is so that you can grep around and edit in the same window.
- In the same vein, moving back from ripgrep and learning about good old-fashioned grep, with a few flags to get closer to how rg behaves out of the box (rg searches recursively by default): `grep -r` for recursive search, `grep -ri` for case insensitive recursive search, and `grep -ril` for case insensitive recursive "just show me which files this string is found in" search. Some others too, season to taste.
- Finally hitting the wall with what ripgrep can do for you and switching to an actual indexed, dedicated code search tool.
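For anyone unsure what those three grep invocations actually do, here is a rough Python analogue. It is a sketch only: the `grep` function and its `ignore_case` / `files_only` parameters are hypothetical names standing in for `-i` and `-l`, and it uses Python's `re` syntax, which differs from grep's default basic regular expressions.

```python
import os
import re
from typing import Iterator


def grep(pattern: str, root: str, ignore_case: bool = False,
         files_only: bool = False) -> Iterator[str]:
    """Walk root recursively (like -r) and yield matching lines or files."""
    flags = re.IGNORECASE if ignore_case else 0  # roughly -i
    regex = re.compile(pattern, flags)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    for line in f:
                        if regex.search(line):
                            if files_only:  # roughly -l
                                yield path
                                break  # one entry per file, stop reading it
                            yield f"{path}:{line.rstrip()}"
            except OSError:
                continue  # unreadable file: skip it
```

So `grep("todo", ".")` behaves like `grep -r`, adding `ignore_case=True` gets you `grep -ri`, and adding `files_only=True` on top gets you `grep -ril`.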
I've been using Wikimedia's instance ( https://codesearch.wmcloud.org/search/ ) and have generally been pretty happy with what it provides.