Grepping for symbols like function names and class names feels so anemic compared to using a tool that has a syntactic understanding of the code. Just "go to definition" and "find usages" alone reduce the need for text search enormously.
For the past decade-plus I have mostly only searched for user-facing strings. Those have the advantage of being longer, so they are more easily searched.
Honestly, posts like this sound like the author needs to invest some time in learning about better tools for his language. A good IDE alone will save you so much time.
Scenarios where an IDE with full syntactic understanding is better:
- It's your day to day project and you expect to be working in it for a long time.
Scenarios where grepping is more useful:
- Your language has #ifdef or equivalent syntax that does conditional compilation, making syntactic tools incomplete.
- You just opened the project for the first time.
- It's in a language you don't daily drive (you write backend but have to delve into frontend code; it's a 3rd-party library, it's configuration files, it's random JSON/XML files or data).
- You're editing or searching through documentation.
- You haven't even downloaded the project and are checking things out on GitHub (or some similar site for your project).
- You're providing remote assistance to someone and you are not at your main development machine.
- You're remoting via SSH and have access to the code there (say it's a Python server).
Yes, an IDE will save you time daily driving. But there's no reason to sabotage all the other use cases.
Further important (to me) scenarios that also argue for greppability:
- greppability does not preclude IDE or language server tooling; there are often special cases where only certain, e.g. context-dependent, usages matter, and sometimes grep is the easiest way to find those.
- projects that include multiple languages, such as the fairly common setup of HTML, JS, CSS, SQL, and some server-side language.
- performance in scenarios with huge amounts of code, or where you're searching very often (e.g. in each git commit for some amount of history)
- ease of use across repositories (e.g. a client app, a spec, and a server app in separate repos).
I treat greppability as an almost universal default. I'd much rather have code in a "weird" naming style in some language but with consistent identifiers across languages, than have style-guide-default identifiers in each language but differing identifiers across languages. If code "looks weird", that's often actually a _benefit_ in such cases, not a downside: most serialization libraries I use for this kind of stuff do a lot of automagic mapping that can break in ways that are hard to detect at compile time if somebody renames something, or sometimes even just changes a casing or a type. Having a hint of this fragility visible at a glance, even in dynamically typed languages, is a nice side effect. Very speculatively, I wouldn't be surprised if AI coding tools deal with consistent names better than context-dependent ones too; greppability is likely not merely about the tool grep.
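To make the tradeoff concrete, a minimal sketch (type and field names invented): the DTO deliberately keeps the wire-format snake_case key even though camelCase is idiomatic in TypeScript, so a single grep hits the SQL, the JSON payload, and this code.

```
// The "weird" naming is the point: this name matches the DB column and
// the JSON key exactly, so grepping "shipping_address" finds every layer.
interface OrderDto {
  shipping_address: string;
}
```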
And the best part is that there's almost no downside; it's not like you need to pick either a language server, IDE or grep - just use whatever is most convenient for each task.
Grep is also useful when IDE indexing isn't feasible for the entire project. At past employers I worked in monorepos where the sheer size of the index caused multiple seconds of delay in intellisense and UI stuttering; our devex team's preferred approach was to better integrate our IDE experience with the build system such that only symbols in scope of the module you were working on would be loaded. This was usually fine, and it works especially well for product teams, but it's a headache when you're doing cross-cutting work (e.g. for infrastructure projects/overhauls).
We also had a livegrep instance that we could use to grep any corporate repo, regardless of where it was hosted. That was extremely useful for investigating failures in build scripts that spanned multiple repositories (e.g. building a Go sidecar that relies on a service config in the Java monorepo).
> It's your day to day project and you expect to be working in it for a long time.
I don't think we need to restrict the benefits quite that much—if it's a project that isn't my day-to-day but is in a language I already have set up in my IDE, I'd much prefer to open it up in my IDE and use jump-to-definition and friends than to try to grep and hope that the developers made it greppable.
Going further, I'd equally rather have plugins ready to go for every language my company works in and use them for exploring a foreign codebase. The navigation tools all work more or less the same, so it's not like I need to invest effort learning a new tool in order to benefit from navigation.
> Yes, an IDE will save you time daily driving. But there's no reason to sabotage all the other usecases.
Certainly don't sabotage, but some of these suggestions are bad for other reasons that aren't about grep.
For example: breaking the naming conventions of your language in order to avoid remapping is questionable at best. Operating like that binds your business logic way too tightly to the database representation, and while "just return the db object" sounds like a good optimization in theory, I've never not regretted having frontend code that assumes it's operating directly on database objects.
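For illustration, a minimal sketch of the remapping boundary this argues for, with invented names: the two naming conventions meet in exactly one greppable function, and the frontend never sees raw rows.

```
// Database row, named after the schema.
interface UserRow {
  first_name: string;
}

// Domain object, named after the application's conventions.
interface User {
  firstName: string;
}

// The single place where the two representations are coupled.
function userFromRow(row: UserRow): User {
  return { firstName: row.first_name };
}
```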
> It's your day to day project and you expect to be working in it for a long time.
Bold of everyone here to assume that everyone has a day-to-day project. If you're a consultant, or you're otherwise switching projects on a month-to-month basis, greppability is probably the most important metric after unit-test coverage.
> - Your language has #ifdef or equivalent syntax which does conditional compilation making syntactic tools incomplete.
LSP-based tools are generally fine with this (as long as compile_commands.json or an equivalent is available). A purely syntactic understanding is an incomplete solution; I suspect GP meant LSP.
Many of those other caveats are non-issues once LSPs are widespread. Even GitHub has LSP-like go-to-def/go-to-ref, though it's not perfect.
> Your language has #ifdef or equivalent syntax which does conditional compilation making syntactic tools incomplete.
Your other points make sense, but in this case, at least for C/C++, you can generate a compile_commands.json that will let clangd interpret your code accurately.
If building with make just do `bear -- make` instead of `make`. If building with cmake pass `-DCMAKE_EXPORT_COMPILE_COMMANDS=1`.
I abandoned VSCode and went back to vim + ctags + ripgrep after a year with the most popular IDE. I miss some features but it didn’t give me a 10x or even 1.5x improvement in my own work along any dimension.
I attribute that mostly to my several decades of experience with vi(m) and command line tools, not to anything inherently bad about VSCode.
What counts as “better” tools has a lot of subjectivity and circumstances implied. No one set of tools works for everyone. I very often have to work over ssh on servers that don’t allow installing anything, much less Node and npm for VSCode, so I invest my time in the tools that always work everywhere, for the work I do.
The main project I've worked on for the last few years has a little less than 500,000 lines of code. VSCode's language server fairly often takes a few seconds to update its indexes. Running ctags over the same code takes about a second, and I can control when that happens. vim has no delays at all, and ripgrep can search all of the files in a second or two.
I have similar feelings... I still use IntelliJ IDEA for JVM languages, but for C, Rust, Go, Python, etc., I've been using vim for years (decades?), and that's just how I prefer to write code in those languages. I do have LSP plugins installed in vim for the languages I work in, and do have a key sequence mapped for jump-to-definition... but I still find myself (rip)grepping through the source at least as often as I j-t-d, maybe more often.
Did you consider Neovim? You get the benefit of vim while also being able to mix in as much LSP tooling as you like. The tradeoff is that it takes some time to set up, although that is getting easier.
That won’t make LSP go any faster though. There’s still something interesting in the fact that a ripgrep of every line in the codebase can still be faster than a dedicated tool.
VSCode is not an IDE, it's an extensible text editor. IDEs are integrated (it's in the name) and get developed as a whole. I'm 99% certain that if you were forced to spend a couple of months in a real IDE (like IDEA or Rider), you would not want to go back to vim, or any other text editor. Speaking as a long time user of both.
A good IDE can be so much better iff it understands the code. However, this requires the IDE to understand the project structure, dependencies, etc., which can take considerable effort. In a codebase with many projects employing several different languages, it becomes hard to reach, and then maintain, the state where the IDE understands everything.
And an IDE would also fail to find references in most of the cases described in the article: name composition/manipulation, naming consistency across language barriers, and flat namespaces in serialization. And file/folder naming seems to be irrelevant to the smart-IDE argument. "Naming things is hard"
And especially in large monorepos anything that understands the code can become quite sluggish. While ripgrep remains fast.
A kind of in-between I've found for some search and replace action is comby (https://comby.dev/). Having a matching braces feature is a godsend for doing some kind of replacements properly.
I think the author's first sentence counters your comment.
What you described works best in a familiar codebase where the organizing principles have been maintained well and are familiar to the reader, and where the tools are just an extension of those organizing principles. Even then, a deviation from those rules might produce gaps in understanding of what the codebase does.
And grep cuts right through that in a pretty universal way. What the post describes are just ways to not work against grep to optimize for something ephemeral.
Go to definition and find usages only work one symbol at a time. I use both, but I still use global find/replace for groups of symbols sharing the same concept.
For example if I want to rename all “Dog” (DogModel, DogView, DogController) symbols to “Wolf”, find/replace is much better at that because it will tell me about symbols I had forgotten about.
For that use case I think you can use treesitter[1]: you can find Dog.* but only where it's a variable name, for example, avoiding replacements inside of, say, string literals.
Not everything you need to look for is a language identifier. I often grep for configuration option names in the code to see what an option actually does. Sometimes it's easy, sometimes there are too many matches, and sometimes the name can't be found at all because it's composed in the code from separate parts, each too common to search for on its own. It's not hard to make config options greppable, but some coders just don't care about this property.
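A sketch of the difference, with an invented option name: in the first form the full option name never appears in the source, so grepping for the string you saw in the config file finds nothing.

```
declare const config: Record<string, string>;

// Ungreppable: "server_max_connections" is assembled at runtime.
const section = "server";
const a = config[`${section}_max_connections`];

// Greppable: the exact literal from the config file appears in the code.
const b = config["server_max_connections"];
```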
Strongly disagree here. This works if:
- your IDE/language server is performant
- all the tools are fully set up
- you know how to query the specific semantic entity you're looking for (remembering shortcuts)
- you are only interested in a single specific semantic entity - mixing entities is rarely supported
I don't map out projects in terms of semantics, I map out projects in files and code. That makes querying intuitive, and I can easily compose queries that match the specificity of what I care about (e.g. I might want to find a `Server` but show classes, interfaces and abstract classes alike).
For the specific toolchain I'm using - TypeScript - the symbol search is also unusable once a project hits a certain size; it's just way too slow to be part of my core workflow.
Only thing I can recommend is using C# (obviously not always possible). Never had an issue with these functions in Visual Studio proper no matter how big the project.
On the flipside, IDEs can turn you into lazy, inefficient programmers by doing all the hand-holding for you.
If your feelings are anemic when tasked with doing a grep, it's because you have lost a very valuable skill by delegating it to a computer. There are some things the IDE is never going to be able to find - lest it become the development environment - so keeping your grep fu sharpened is wise beyond the decades.
(Disclaimer: 40 years of software development, and vim+cscope+grep/silversearcher are all I really need, next to my compiler..)
Since when was that a bad thing? Since time immemorial, it has been hailed as a universal good for programmers to be lazy. I'm pretty sure Larry Wall has lots of jokes about this on Usenet.
Also, I can clearly remember switching from vim/emacs to Microsoft Visual Studio (please, don't throw your tomatoes just yet!). I was blown away by IntelliSense. Suddenly, I was focusing more on writing business logic, and less time searching for APIs.
I count the IDE and stuff like LSP as natural extensions of the compiler. For sure I grep (or equivalent) for stuff, but I highly prefer statically typed languages/ecosystems.
At the end of the day, I'm here to solve problems, and there's no end to them -- might as well get a head start.
I'm not feeling anemic. The tool is anemic, as in, underpowered. It returns crap you don't want, and doesn't return stuff you do want.
My grep-fu is fine. It's a perfectly good tool if you have nothing better. But usually you do have something better.
Using the wrong tool to make yourself feel cool is stupid. Using the wrong tool because a good tool could make you lazy shows a lack of respect for the end result.
Huh? I have an old hand-powered drill from my Grandpa in my workshop. I used it once for fun. For all other tasks I use a powered drill.
Same for IDEs.
They help you refactor and reason about code - both properties I value.
Sure, I could print it out and use a highlighter, but I'm not Grandpa.
The basis of this article (and its forebear "Too DRY - The Grep Test"[1]) is that grep is fragile. It's just fragile in a way that's different from the way that IDEs are fragile.
Even with IDEs, I find that I grep through source trees fairly often.
Sometimes it's because I don't completely trust the IDE to find everything I'm interested in (justifiably; sometimes it doesn't). Sometimes it's because I'm not looking to dive into the code and do serious work on it; I'm just doing a quick drive-by check/lookup for something. Sometimes it's because I'm ssh'd into another machine and I don't have the ability to easily open the sources in an IDE.
I've come to really like language servers for big personal and work projects where I already have my tools configured and tuned for working with them efficiently.
But being able to grep is really nice when trying to figure something out about a source tree that I don't yet have set up to compile, and am not a developer of. E.g., I've downloaded the source for a tool I've been using pre-built binaries of and am now trying to trace why I might be getting a particular error.
posts like this sound like the author routinely solves harder problems than you do, because the solutions you suggest don't work in the cases the post is about. we've had 'go to definition' since 01978 and 'find usages' since 01980, and you should definitely use them for the cases where they work
- Dynamically built identifiers: 100% correct, never do this. It breaks both text search and symbol search and results in complete garbage code. I had to deal with bugs in early versions of docker-compose because of this.
- Same name for things across the stack? Shouldn't matter, just use Find Usages on `getAddressById`. It's also an easy way to bait yourself, because database fields aren't 1:1 with front-end fields in anything but the simplest of CRUD webshit.
- Translation example: the fundamental problem is using strings as keys when they should be symbols (see the sketch after this list). Flat vs. nested is irrelevant here because you should be using neither.
- React component example: as I mentioned in another comment, trivially managed with Find Usages.
Nothing in here strikes me as "routinely solves harder problems," it's just standard web dev.
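For the translation point, a minimal sketch of keys-as-symbols in TypeScript (all names invented): the key space becomes a type, so a typo is a compile error rather than a silent missing translation, and every key stays greppable as a single identifier.

```
const messages = {
  authLoginTitle: "Log in",
  authLoginButton: "Submit",
} as const;

// A typo in a key is now a compile error, and every usage of
// "authLoginTitle" is findable by both grep and Find Usages.
type MessageKey = keyof typeof messages;

const t = (key: MessageKey): string => messages[key];

t("authLoginTitle");   // OK
// t("authLognTitle"); // rejected by the compiler
```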
with all due respect, it sounds like you have the privilege of working in some relatively tidy codebases (and I'm jealous!)
with a legacy codebase, or a fork of a dependency that had to be patched which uses an incompatible buildsystem, or any C/C++/obj-c/etc that heavily uses the preprocessor or nonstandard build practices, or codebases that mix lots of different languages over awkward FFI boundaries and so on and so forth -- there are so many situations where sometimes an IDE just can't get you 100% of the way there and you have to revert to grepping to do any real work
that being said, I don't fully support the idea of handcuffing your code in the name of greppability, but I think dismissing it as a metric under the premise that IDEs make grepping "obsolete" is a little bit hasty
> with all due respect, it sounds like you have the privilege of working in some relatively tidy codebases (and I'm jealous!)
I wish, but no. I've found people will make a mess of everything. Which is why I don't trust solutions that rely on humans having more discipline, like what this article advocates.
In any situation where grep is your last saviour, you cannot rely on the greppability of the code. You'll have to check and double check everything, and still accept the risk of errors.
Working on a 32MLOC project, text search is still the quickest way to find a hook that gets you to the deeper investigation. From there, finding definitions/usage definitely matters.
You can maybe skip the greppability if the code base is of a size that you can hold the rough shape and names in your head, but a "get a list of things that sound like they might be related to my problem" operation is still extremely helpful. And it's also worth keeping in mind that greppability matters to onboarding.
Does that mean it should be an overriding design concern? No. But it does mean that if it's cheap to build greppable, you probably should, because it's a net positive.
Sure, if you have the luxury of having a functional IDE for all of your code.
You can't imagine how much faster I was than everybody else at answering questions about a large codebase just because I knew how to use ripgrep (on Windows). "Knowing how to grep" is a superpower.
A bit on the other side of the argument, I use grep plus find plus some shell work to do source code analysis for security reviews. grep doesn't really understand the syntax of languages, and that is mostly OK.
I've used this technique when auditing many codebases, including the C family, Perl, Visual Basic, C# and SQL.
With this sort of tool, I don't need to look for language-particular parsers--so long as the source is in a text file, this works well.
IDEs are cool and all, but there is no way I'm gonna let VSCode index my 80GB yocto tmp directory. Ctags can crunch the whole thing in a few minutes, and so can grep.
Plus there are cases where grep is really what you need, for example after updating a particular command line tool whose output changed, I was able to find all scripts which grepped the output of the tool in a way that was broken.
It seems like a case of diminishing returns; while I'm sure this characteristic of a code-writing style is extremely useful in a few cases, it cuts into other things such as readability and conciseness. Fewer lines can mean fewer bugs, within reason; if you aren't in Lisp and are using more than 3 parentheses, you might want to split the expression up, because the compiler/JIT/interpreter is going to anyway.
Interface-heavy languages break IDEs. In .NET at least, "go to definition" jumps you to the interface definition which you probably aren't interested in (vs. the specific implementation you are trying to dig into). Also with .NET specifically XAML breaks IDE traceability as well.
I tried a good IDE recently: JetBrains IntelliJ and WebStorm. Considered the top dog of IDEs. I was working on a TypeScript project which uses npm link to symlink another local project into the node_modules of the current project.
The great IDEs IntelliJ and WebStorm stopped autosuggesting completions from the symlinked project.
Opened up Sublime Text again. Worked perfectly. That is why JetBrains and their behemoth IDEs are utter shite.
Write your code to have symmetry and make it easy to grep.
By not using literals everywhere.
Every literal is defined once somewhere (at the start of the function, the class, etc.) as an enum or a variable, and that definition is what gets used.
Just because I have 20 usage of 'shipping_address' doesn't mean I'll have this string 20 times in different places.
Grep has its place, and I often need to grep codebases that were written without much thought toward DX. But writing code this way allows the LSP to take over.
This is what the article starts with: "Even in projects exclusively written by myself, I have to search a lot: function names, error messages, class names, that kind of thing."
All of that is trivial to search for with a tool that understands the language.
> Honestly, posts like this sound like the author needs to invest some time in learning about better tools for his language. A good IDE alone will save you so much time.
Completely agreed. The React component example in the article is trivially solvable with any modern IDE: right-click on the class name, "Find Usages" (or use the appropriate hotkey, of course). Trying to grep for a class name when you could just do that is insane.
I mainly see this from juniors who don't know any better, but as seen in this thread and the article, there are also experienced engineers who are stubborn and refuse to use tools made after 1990 for some reason.
> experienced engineers who are stubborn and refuse to use tools made after 1990 for some reason.
Before calling people stubborn or assuming they got left behind out of ignorance, consider your assumptions. 40+ years experience, senior in both experience and age at this point. Long-term vim + command line tools user.
Do you have any evidence that shows "A good IDE alone will save you so much time?" Have you seen studies comparing productivity or code quality or any metric written by people using IDEs vs those using a plain editor with grep?
By "so much faster" what do you mean exactly? I have decades of experience with vim + ctags + grep (rg these days, because I don't want to get called a stubborn stick in the mud). I can find and change things in large codebases pretty fast. I used VSCode for a year on the same codebases and I didn't feel "so much faster," and I committed to it and watched numerous how-to videos and learned the tool well enough to train other programmers on it. No 10x improvement, not even 1.5x. For most tasks I would call it close to the same in terms of time taken to write code. After getting burned a couple times with "Replace symbol" in VSCode I stopped trusting it. After noticing the LSP failed to find some references I trusted it less. I know grep/ack/rg/ctags aren't perfect, but I also know their weaknesses and how to work with them to get them to do what I want. After a year I went back to vim + ctags + rg.
We might have more productive (and friendly) interactions as programmers if we remembered that not everyone works the same way, or on the same kind of code and projects. What we call "best practices" or "modern tools" largely come down to familiarity, received wisdom, opinion, and fashion -- almost never from rigorous metrics and testing. You like your IDE? Great! I like my tools too. Would either of us get "so much faster" using a different set of tools? Probably not. Trying to find the silver bullet that reduces accidental complexity in software development presents an ongoing challenge, but history shows that editors and IDEs don't do much because if they did programmers today would outperform old guys like me by 10x in a measurable way.
At the last full-time job I had, at an educational software company with 30+ programmers, everyone used Eclipse. My first day I got a new desktop with two big monitors, Eclipse installed, ready to go. I installed vim and the CLI subversion client and some other stuff and worked from the command line, as I usually do. I left one of the monitors off, I don't need that much screen space, and I don't have Twitter and Facebook and other junk running on a second monitor all day like most of the other people did. I got made fun of, old man using old tools. Then once a week, like clockwork, Eclipse would auto-install some updates and everyone came to a halt trying to resolve plugin version conflicts, getting the team in sync. Hours and hours wasted regularly just getting the IDE to work. That didn't affect me, I never opened Eclipse. Watching the other programmers it seemed really slow. So just maybe Eclipse could jump to a definition faster than vim + ctags (I doubt it), but amortized over a month Eclipse all by itself wasted more time than anyone possibly saved with the more powerful tool. Anecdote, I know, but I've seen this play out in similar ways at more than one shop.
Just last year a new hire at a place I freelance for spent days trying to get Jetbrains PHPStorm working on a shared remote dev server. Like VSCode it runs a heavy process on the server (including the LSP). Unlike VSCode, PHPStorm can actually kill the whole server, wasting everyone's time and maybe losing work. I have never seen vim or grep bring a whole server down. I could add up how much "faster" PHPStorm might turn out compared to vim, but it will have to recoup the days lost trying to get it to work at all first.
The second point here made me realize that it'd be super useful for a grep tool to have a "super case insensitive" mode which expands a search for, say, "FooBar|first_name" to something like /foo[-_]?bar|first[-_]?name/i, so that any camel/snake/pascal/kebab/etc case will match. In fact, I struggle to come up with situations where that wouldn't be a great default.
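As a rough sketch of how such a mode could work (helper name invented): split the query on case humps and separators, then rejoin the parts with optional -/_ in between and match case-insensitively.

```
// Turn "FooBar", "foo_bar" or "foo-bar" into one permissive pattern.
function superCaseInsensitive(query: string): RegExp {
  const parts = query
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")  // split camelCase humps
    .split(/[-_\s]+/)                        // split snake/kebab separators
    .filter(Boolean)
    .map((p) => p.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")); // escape regex chars
  return new RegExp(parts.join("[-_]?"), "i");
}

superCaseInsensitive("FooBar").test("foo_bar");       // true
superCaseInsensitive("first_name").test("firstName"); // true
```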
Hey, I just created a new tool called Super Grep that does exactly what you described.
I implemented a format-agnostic search that can match patterns across various naming conventions like camelCase, snake_case, PascalCase, and kebab-case. If needed, I'll add support for space-separated words.
I've just published the tool to PyPI, so you can easily install it using pip (`pip install super-grep`), and then you just run it from the command line with `super-grep`. You can let me know if you think there's a smarter name for it.
You should post this as a Show HN! But maybe wait a while (like a couple weeks or something) for the current thread to get flushed out of the hivemind cache.
pretty cool, and to me a better approach than the prescriptive advice from the OP. to me the crux of the argument is making the code more readable from a popular tool, but if this can be well integrated into common IDEs (or even grep itself), it would reduce most of the argument to personal preference.
wow this is so cool!! it feels super amazing to dump a random idea on HN and then somebody makes it! i'm installing python as we speak just so i can use this.
Adding to that, I'm often bitten trying to search for user strings because they're split across lines to adhere to 80 characters.
So if I'm trying to locate the error message "because the disk is full" but it's in the code as:
... + " because the " +
"disk is full")
then it will fail.
So really, combining both our use cases, what would be great is to simply search for a given case-insensitive alphanumeric string in files that skips all non-alphanumeric characters.
So if I search for:
Foobar2
it would match all of:
FooBar2
foo_bar[2]
"Foo " + \
("bar 2")
foo.bar.2
And then in the search results, even if you get some accidental hits, you can be happy knowing that you didn't miss anything.
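A minimal per-line sketch of that idea (names invented; assumes Node.js): normalize both the query and each line down to lowercase alphanumerics before matching. Strings split across lines, like the examples above, would additionally need a sliding window over adjacent lines.

```
import { readFileSync } from "node:fs";

// Drop everything that isn't a letter or digit, then lowercase.
const normalize = (s: string) => s.toLowerCase().replace(/[^a-z0-9]/g, "");

// Return 1-based line numbers whose normalized text contains the query,
// so "Foobar2" matches foo_bar[2], FooBar2, foo.bar.2, ...
function superSearch(path: string, query: string): number[] {
  const needle = normalize(query);
  return readFileSync(path, "utf8")
    .split("\n")
    .flatMap((line, i) => (normalize(line).includes(needle) ? [i + 1] : []));
}
```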
These are both problems I regularly have. The first one I thought of immediately when reading the title of this submission was the "super case insensitive" issue, which I often hit when working on Go codebases, particularly when using a combination of Go structs and YAML or JSON. It also happens with command line arguments being converted to variables.
But the string-split thing you mentioned happens a lot when searching for OpenStack error messages in Python, which are often split across lines like you showed. My current solution is to randomly shift what I'm searching for, or to try to pick the most unique line.
fwiw I pretty frequently use `first.?name` - the odds of it matching something like "FirstSname" are low enough that it's not an issue, and it finds all cases and all common separators in one shot.
(`first\S?name` is usually better, by ignoring whitespace -> better ignores comments describing a thing, but `.` is easier to remember and type so I usually just do that)
Let's say someone made a plugin for their favorite IDE for this kind of search. What would the details look like?
To keep it simple, let's assume we just do the super-case-insensitivity, without the other regex condition. Let's say the user searches for "first_name" and wants to find "FirstName".
one simple solution would be a convention for where a word starts or ends, e.g. " ". The user would enter "first name" into the plugin's search field; the plugin turns it into "/first[-_]?name/i" and hands this regexp to the IDE's normal search.
another simple solution would be to ignore word boundaries entirely. When the user enters "first name", the regexp would become "/f[-_]?i[-_]?r[-_]?s[-_]?t[-_]?n[-_]?a[-_]?m[-_]?e[-_]?/i". Then the search would not only be super-case-insensitive, but super-duper-case-insensitive. I guess the biggest downside is that this could get very slow.
I think implementing a plugin like this would be trivial for most IDEs that support plugins.
Hm I'd go even simpler than that. Notably, I'd not do this:
> So the user would enter "first name" into the plugin's search field.
Why wouldn't the user just enter "first_name" or "firstName" or something like that? I'm thinking about situations like, you're looking at backend code that's snake_cased, but you also want it to catch frontend code that's camelCased. So when you search for "first_name" you automagically also match "firstName" (and "FirstName" and "first-name" and so on). I wouldn't personally introduce some convention that adds spaces into the mix, I'd simply convert anything that looks snake/kebab/pascal/camel-cased into a regex that matches all 4 forms.
Could even be as stupid as converting "first_name" or "firstName", or "FirstName" etc into "first_name|firstname|first-name", no character classes needed. That catches pretty much every naming convention right? (assuming it's searched for with case insensitivity)
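A quick sketch of that conversion (helper name invented): split the query into words, then emit the snake, concatenated, and kebab forms as one alternation.

```
function caseAlternation(query: string): string {
  const words = query
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // split camelCase humps
    .toLowerCase()
    .split(/[-_\s]+/)
    .filter(Boolean);
  return [words.join("_"), words.join(""), words.join("-")].join("|");
}

caseAlternation("firstName"); // "first_name|firstname|first-name"
```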
Shame on me for jumping past the simple solutions, but...
If you're going that far, and you're in a context which probably has a parser for the underlying language ready at hand, you might as well just convert all tokens to a common format and do the same with the queries. So searches for foo-bar find strings like FooBar because they both normalize to foo_bar.
Then you can index by more than just line number. For instance you might find "foo" and "bar" even when "foo = 6" shows up in a file called "bar.py" or when they show up on separate lines but still in the same function.
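A sketch of that normalization step (helper name invented): fold every identifier to snake_case before indexing, so FooBar, fooBar and foo-bar all land under the same key.

```
// Fold common identifier styles down to snake_case.
const toSnakeCase = (id: string) =>
  id
    .replace(/([a-z0-9])([A-Z])/g, "$1_$2") // FooBar -> Foo_Bar
    .replace(/[-\s]+/g, "_")                // foo-bar -> foo_bar
    .toLowerCase();

toSnakeCase("FooBar");  // "foo_bar"
toSnakeCase("foo-bar"); // "foo_bar"
```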
IIUC, you're not missing anything, though your interpretation is off from mine*. He wasn't saying it'd be hard; he was saying it should be done.
* my understanding was simply that the regex would (A) recognize `[a-z][A-Z]` and inject optional _'s and -'s between... and (B) notice mid-word hyphens or underscores and switch them to search for both.
Nim comes bundled with a `nimgrep` tool [0], which is essentially grep on steroids. It has a `-y` flag for style-insensitive matching, so "fooBar", "foo_bar" and even "Foo__Ba_R" can be matched with a simple "foobar" pattern.
The other killer feature of nimgrep is that instead of regex, you can use PEG grammar [1]
Fuzzy search is not the same. For instance, it might by default match not only “FooBar” and “foo_bar” but also e.g. “FooQux(BarQuux)”, which in a large code base might mean hundreds of false positives.
Let's say you have a FilterModal component and you're using it like this: x-filter-modal
Improving the IDE to find one or the other by searching for one or the other misses the point of the article: that consistency is important.
I'd rather have a simple IDE and a good codebase than the opposite.
In the example that I gave, the worst thing is that it's the framework which forces you to use these two names for the same thing.
My point is that if grep tools were more powerful we wouldn't need this very particular kind of consistency, which gives us the very big benefit of being allowed to keep every part of the codebase in its idiomatic naming convention.
I didn't miss the point, I disagreed with the point because I think it's a tool problem, not a code problem. I agree with most other points in the article.
I advocate for greppability as well – and in Swedish it becomes extra fun – as the equivalent phrase in Swedish becomes "grep-bar" or "grep-barhet" and those are actual words in Swedish – "greppbar" roughly means "understandable", "greppbarhet" roughly means "the possibility to understand"
We do tar, for xfz I think you have to look to the Slavic languages :)
Anyway, to answer your question:
$ grep -Fxf <(ls -1 /bin) /usr/share/dict/swedish
ack
ar
as
black
dialog
dig
du
ebb
ed
editor
finger
flock
gem
glade
grep
id
import
last
less
make
man
montage
pager
pass
pc
plog
red
reset
rev
sed
sort
sorter
split
stat
tar
test
transform
vi
:)
[edit]: Ironically, grep in that list is not the same word as the one OP is talking about. That one is actually based on grepp, with the double p. grep means pitchfork.
Norwegian still translates grep as "grip"/"grab". I always thought of grepping as reaching in with a hand into the text and grabbing lines. That association is close at hand (insert lame chuckle) for German and English speakers too.
Nah, you've got it backwards. The article isn't about dodging understanding - it's about making it way easier to spot patterns in your code. And that's exactly how you start to really get what's going on under the hood. Better searching = faster learning. It's like having a good map when you're exploring a new city
I've seen some pretty wild conditional string interpolation where there were like 3-4 separate phrases that each had a number of different options, something akin to `${a ? 'You' : 'we'} {b ? 'did' : 'will do' } {c ? 'thing' : 'things' }`.
When I was first onboarding to this project, I was tasked with updating a component and simply tried to find three of the words I saw in the UI, and this was before we implemented a straightforward path-based routing system. It took me far too long just to find what I was going to be working on, and that's the day I distinctly remember learning this lesson. I was pretty junior, but I'd later return to this code and threw it all away for a number of easily greppable strings.
I like the more robotic "Objects: 1" or "Objects: 2", since it avoids the pluralization problems entirely (e.g., in French 0 is singular, but in English it's plural; some words have special forms when pluralized, such as child -> children or attorney general -> attorneys general). And related to this article, it's more greppable/awkable, e.g. `awk '/^Objects:/ && $2 > 10'`.
This is the reason many coding styles and tools (including the Linux kernel coding style and the default Rust style as implemented in rustfmt) do not break string constants across lines even if they're longer than the desired line length: you might see the string in the program's output, and want to search for the same string in the code to find where it gets shown.
My team drives me bonkers with this. They hear the general principle "really long lines of code are bad", but extrapolate it to "no characters shall pass the soft gutter no matter what".
Even if you have, say, 5 sequential related structs that are all virtually identical, all written on one line so that the similarities and differences are obvious at a mere glance... Then someone comes through and touches my file and, while they're at it, "fixes" the line that went 2 characters past the 80 mark by reformatting the 4th struct to span several lines. Now when you see that list of structs, you wonder "why is this one different?" and you have to read carefully to determine that, nope, it just contained one longer string. Or god forbid they reformat all the structs to match, turning a 1-page file into 3 pages and making it so you have to read and understand each element of each struct just to see what's going on.
If I could have written the rule of thumb, I would have said "No logic or control shall happen past the end of the gutter." But if there's a paragraph-long string on one line, who cares?? We all have a single keystroke that can toggle soft-wrap, and the odds that you're going to need to know anything about that string other than "it's a long string" are virtually nil.
Yep this triggers the fuck out of me too. It drives me absolutely insane when I'm taking the time and effort to write good test cases that use inline per test data that I've taken the time to format so it's nice and readable for the next person, then the next person comes along, spends 30 seconds writing some 2 line rubbish to hit a code coverage metric, then spends another 60 seconds adding a linter rule that blows all the test data out to 400 lines of unreadable dogshit that uses only the left 15% of screen real estate.
My team also had a similar rule in place. I am saving this article in my Pocket saves so that I can give "proof" of why this is better.
From Zen of Python:
```
Special cases aren't special enough to break the rules.
Although practicality beats purity.
```
https://peps.python.org/pep-0020/
This is why autoformatters that frob with line breaks are just terrible and fundamentally broken.
I'm fairly firmly in the "wrap at 80" camp by the way; but sometimes a tad longer just makes sense. Or shorter for that matter: forced removal of line breaks is just as bad.
This is the world autoformatters have wrought. The central dogma of the autoformatter is that "formatting" is based on dumb syntactic rules with no inflow of imprecise human judgement.
I have been at places where long strings are allowed, but other things aren't, with 80 to 100 char limits generally applying otherwise. I like 100 for C++/Java and 80 for C. If a line (not being a string) gets much longer than that, then in most cases it's time for a rethink: the grouping/scoping symbols are getting too deep. I'm sure other languages may or may not have that as a reasonable argument. It is just a rule of thumb, though.
If I recall, rustfmt had a bug where long string literals (say, over 120 chars or so, or maybe it was strings long enough to extend beyond the gutter when properly indented?) would prevent formatting of the entire file they were in. Has this been fixed?
Not the whole file, but sufficiently long un-line-breakable code in a complex statement can cause rustfmt to give up on trying to format that statement. That's a known issue that needs fixing.
Rust and Javascript and Lisp all get extra points because they put a keyword in front of every function definition. Searching for “fn doTheThing” or “defun do-the-thing” ensures that you find the actual definition. Meanwhile C lacks any such keyword, so the best you can do is search for the name. That gets you a sea of callers with the declarations and definitions mixed in. Some C coding conventions have you split the definition into two lines, first the return type on a line followed by a second line that starts with the function name. It looks ugly, but at least you can search for “^doTheThing” to find just the definition(s).
The "top-level declarations" in source files are exactly: package, import, const, var, type, func. Nothing else. If you're searching for a function, it's always going to start with "func", even if it's an anonymous function. Searching for methods implemented by a struct similarly only needs one to know the "func" keyword and the name of the struct.
Coming from a background of mostly Clojure, Common Lisp, and TypeScript, the "greppability" of Go code is by far the best I have seen.
Of course, in any language, Go included, it's better to rely on static analysis tools (like the IDE or LSP server) to find references, definitions, etc. But when searching the code of some open source library, I always resort to ripgrep rather than setting up a development environment, unless I find something that I want to patch (in which case I set up the development environment and rely on the LSP instead of grep to discover definitions and references).
I'm not so sure about greppability in the context of Go. At least at Google (where Go originates, and whose style guide presumably has strong influence on other organizations' use of the language), we discourage "stuttering":
> A piece of Go source code should avoid unnecessary repetition. One common source of this is repetitive names, which often include unnecessary words or repeat their context or type. Code itself can also be unnecessarily repetitive if the same or a similar code segment appears multiple times in close proximity.
This is the style rule that motivates the sibling comment about method names being split between method and receiver, for what it's worth.
I don't think this use case has received much attention internally, since it's fairly rare at Google to use grep directly to navigate code. As you suggest, it's much more common to either use your IDE with LSP integration, or Code Search (which you can get a sense of via Chromium's public repository, e.g. https://source.chromium.org/search?q=v8&sq=&ss=chromium%2Fch...).
Golang gets zero points from me because function receivers are declared between func and the name of the function. God I hate this design choice, and boy am I glad I can use golsp.
Go is horrible due to the absence of specific "interface implementation" markers. Gets pretty hard to find where or how a type implements an interface.
JavaScript has multiple ways to define a function, so you sort of lose that benefit of finding the actual definition.
on edit: I see someone discussed that you can grep for both arrow functions and named functions at the same time, and I suppose you can also construct a query that handles a function constructor as well - but this does not really handle curried functions or similar patterns. I guess at that point one is letting the perfect become the enemy of the good.
Most people grepping know the code base and the patterns in use, so they probably only need to grep for one type of function declaration.
C is so much worse than that. Many people declare symbols using macros for various reasons, so you end up with things like DEFINE_FUNCTION(foo) {. In order to get a complete list of symbols you need to preprocess it, this requires knowing what the compiler flags are. Nobody really knows what their compiler flags are because they are hidden between multiple levels of indirection and a variety of build systems.
> C is so much worse than that. Many people declare symbols using macros for various reasons, so you end up with things like DEFINE_FUNCTION(foo) {.
That’s not really C; that’s a C-based DSL. The same problem exists with Lisp, except even worse, since its preprocessor is much more powerful, and hence encourages DSL-creation much more than C does. But in fact, it can happen with any language - even if a language lacks any built-in processor or macro facility, you can always build a custom one, or use a general purpose macro processor such as M4.
If you are creating a DSL, you need to create custom tooling to go along with it - in the ideal scenario, your tools are so customisable that supporting a DSL is more about configuration than coding something from scratch.
Yes, the usefulness of macros always has to be balanced against their cost. I know of only one codebase that does this particular thing though, Emacs. It is used to define Lisp functions that are implemented in C.
Not JavaScript. Cool kids never write “function” any more, it’s all arrow functions. You can search for const, which will typically work, but not always (could be a let, var, or multi-const intializer).
I want to talk to the developer who considers greppability when deciding whether to use the "function" keyword but requires his definitions to be greppable by distancing them from their call locations. I just have a few questions for him.
You can search for both: "function" and "=>" to find all function expressions and arrow function expressions.
All named functions are easily searchable.
All anonymous functions are throw away functions that are only called in one place so you don't need to search for them in the first place.
As soon as an anonymous function becomes important enough to receive a label (i.e. assigning it to a variable, being assigned to a parameter, converting to function expression), it has also become searchable by that label too.
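To illustrate with invented names, both forms end up greppable, just by different patterns:

```
// Found by grepping for "function doTheThing":
function doTheThing(): void {}

// Found by grepping for the label it was given, e.g. "doTheOtherThing =":
const doTheOtherThing = (): void => {};
```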
I used to define functions as `funcname (arglist)`
And always call the function as `funcname(args)`
So definitions have a space between the name and arg parentheses, while calls do not. Seemed to work well, even in languages with extraneous keywords before definitions since space + paren is shorter than most keywords.
Nowadays I don't bother, since it really isn't that useful, especially with tags or an LSP.
I still put the return type on a line of its own, not for search/grep, but because it is cleaner and looks nice to me—overly long lines are the ugliest of coding IMO. Well that and excessive nesting.
> Meanwhile C lacks any such keyword, so the best you can do is search for the name. That gets you a sea of callers with the declarations and definitions mixed in
That’s why in my personal projects I follow classic “type\nname” and grep with “^name\>”.
> looks ugly
Single line definitions with long, irregular type names and unaligned function names look ugly. Col 1 names are not only greppable but skimmable. I can speedscroll through code and still see where I am.
Yet you reply to an article that defines functions as variables, which I've seen a lot of developers do, usually for no good reason at all.
To me, that's a much more common and worse practice with regards to greppability than splitting identifiers using strings, which I haven't seen much in the wild.
Although in Rust, function-like macros make it super hard to trace code. I like them when I am writing the code and hate them when I have to read others' macros.
In the bygone days of ctags, C function definitions included a space before opening parenthesis, while function calls never had that space. I have a hard time remembering that modern coding styles never have that space and my IDE complains about it. (AFAIK, the modern gtags doesn't rely on that space to determine definitions.) Even without *tags, the convention made it easy to grep for definitions.
One thing which works for C is to search for something like `[a-z] foo\(.+\) \{`, assuming the spacing matches the coding style. Often the shorter form `[a-z] foo\(` works well; it tries to ensure there is a type definition, and not an assignment or something, before the name. Then there are only a handful of false positives.
Not sure this is very true for Common Lisp. A classic example is accessor functions, where the generic function is created by whichever class is defined first, and the method where the class is defined. Other macros will construct new symbols for function names (or take them from the macro arguments).
That’s true, but I regard it as fairly minor. Accessor functions don't have any logic in them, so in practice you don’t have to grep for them. But it can be confusing for new players, since they don't know ahead of time which ones are accessors and which are not.
This is why in C projects libs go in "lib/" and sources go in "src/". If your header files have the same directory structure as libs, then "include/" is a also a decent way to find definitions.
C has "classical" tooling like Cscope and Exuberant Ctags. The stuff works very well, except on the odd weird code that does idiotic things that should not be done with preprocessing.
Even for Lisp, you don't want to be grepping, or at least not all the time for basic things.
For TXR Lisp, I provide a program that will scan code and build (or add to) your tags file (either a Vim or Emacs compatible one).
Given
(defstruct point ()
  x
  y)
it will let your editor jump to the definition of point, x and y.
Python is the only one mentioned that "actually works" without endless exceptions to the rule in the normal case. The others mentioned (Rust/Javascript/Lisp/Go) all have specific syntax, commonly enough used, that makes them harder to search. Possible, absolutely, but still harder.
Do people really use text search for this rather than an IDE that parses all of the code and knows exactly where each declaration is, able to instantly jump to them from a key press on any usage...? Wild.
Yes. Not everyone uses or likes an IDE. Also, when you lean on an IDE for navigation, there is a tendency to write more complicated code: since it feels easy to navigate, you don't feel the pain.
That’s right, not everyone uses an LSP. Nothing wrong with LSPs, very useful tools. I use ripgrep, or plain grep if I have to, far more often than an LSP.
Working with legacy code — the scenario the author describes — I often can’t install anything on the server.
Rust though does lose some of those points by more or less forcing[1] snake_case. It's really annoying to navigate bindings which are converted from camelCase.
I don't care which case is used. It's a trivial superficial thing, and tribal zealotry about such doesn't reflect well on the language and community.
[1] The warnings can be turned off, but in some cases it requires ugly hacks, and the community seems to be actively hostile to making it easier.
The Rust community is no more zealous about naming conventions than any other language which has naming conventions. Perhaps you're arguing against the concept of naming conventions in general, but that's not a Rust thing, every language of the past 20 years suggests naming conventions if for no other reason than every language provides a standard library which needs to follow some sort of naming conventions itself. Turning off the warnings emitted by the Rust compiler takes two lines of code, either at the root of the crate or in the crate manifest.
Hard agree with the idea of greppability, but hard disagree about keeping names the same across boundaries.
I think the benefit of having one symbol exist in only one domain (e.g. “user_request” only showing up in the database-handling code, where it’s used 3 times, and not in the UI code, where it might’ve been used 30 times) reduces more cognitive load than is added by searching for 2 symbols instead of 1 common one.
I’ve also found that I sometimes really like when I grep for a symbol and hit some mapping code. Just knowing that some value goes through a specific mapping layer and then is never mentioned again until the spot where it’s read often answers the question I had by itself, while without the mapping code there’d just be no occurrences of the symbol in the current code base and I’d have no clue which external source it’s coming from.
Probably depends on how your system is structured. If you know you only want to look in the DB code, hopefully it's either all in one place or there's a folder naming pattern you can use to limit where you search.
The upside of doing it this way is that it makes your grepping more flexible: you can search just one part of the codebase to see, say, the DB code, or search everything to see all the DB and UI things using the concept.
Not to mention the readability hit from identifiers like foo.user_request in JavaScript, which triggers both linters and my own sense of language convention.
Both of those are easy to fix. You'll adapt quickly if you pick a different convention.
Additionally, I find that in practice such "unusual" code is actually beneficial - it often makes it easy to see at a glance that the code is somehow in sync with some external spec. Especially when it comes to implicit usages such as in (de)serialization, noticing that quickly is quite valuable.
I'd much rather trash every languages' coding conventions than use subtly different names for objects serialized and shared across languages. It's just a pain.
I agree that code searchability is a good thing but I disagree with those examples. They intentionally increase the chance of errors.
Maybe there’s an alternative way to achieve what the author set out but increasing searchability at the cost of increasing brittleness isn’t it for me.
The input string and output are coupled. If you add string conditionals as the author did, you introduce the chance of a mismatch between the input and output.
const getTableName = (addressType: 'shipping' | 'billing') => {
  if (addressType === 'shipping') {
    return 'shipping_addresses'
  }
  if (addressType === 'billing') {
    return 'billing_addresses'
  }
  throw new TypeError('addressType must be billing or shipping')
}
Similarly, flattening dictionaries for readability introduces the chance of a random typo making our lives hell. A single typo in such repetitions will be awful.
Typos aren’t unlikely. In a codebase I work with, we have a perpetually open ticket about how ARTISTS is mistyped as ATRISTS in a similarly flat enum.
The issue can’t be solved easily because the enum is now copied across several codebases. But the ticket has a counter for the number of developers that independently discovered the bug and it’s in the mid two digits.
Typos are find-and-fix-once, while unsearchability is a maintenance burden forever.
I don't think coupling variable names by making sure they contain the same strings is the best way to show they're related, compared to an actual map from address type to table name. There might be a lot of things called 'shipping' in my app, only some of which are coupled to `shipping_addresses`.
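A sketch of that alternative, reusing the invented table names from the snippet above: the relation is stated once as data, both string literals stay fully greppable, and the compiler rejects unknown address types.

```
const TABLE_BY_ADDRESS_TYPE = {
  shipping: 'shipping_addresses',
  billing: 'billing_addresses',
} as const;

type AddressType = keyof typeof TABLE_BY_ADDRESS_TYPE; // 'shipping' | 'billing'

const getTableName = (addressType: AddressType) =>
  TABLE_BY_ADDRESS_TYPE[addressType];
```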
Shouldn't a linter be able to catch that there is no enum member called MyEnum.ATRISTS, or is it not an actual enum?
> The input string and output are coupled. If you add string conditionals as the author did, you introduce the chance of a mismatch between the input and output.
I think it depends on whether the repetition is accidental or intrinsic. Does the table name happen to contain the address type as a prefix, or does it intrinsically have to? Greppability aside, when things are incidentally related, it's often better to repeat yourself to not give the wrong impression that they're intrinsically related. Conversely, if they are intrinsically related (i.e. it's an invariant of the system that the table name starts with the address type as a prefix) then it's better for the code to align with that.
What happens when translation files get too big and you want to split and send only relevant parts?
Like send only auth keys when user is unauthenticated?
`return translations[auth][login]` is no longer possible.
Or just imagine you want to iterate through `auth` keys. _shudders_
Entrenched typos like ATRISTS are actually a greppability goldmine. Chances are there are other occurrences of pluralized people-who-make-art in the codebase, but ATRISTS is unambiguously the one from that enum.
I certainly would not suggest deliberately mistyping, but there are places where the benefit is approaching the cost. Certain log messages can absolutely benefit from subtle letter garbling that retains readability while adding uniqueness.
We also had a livegrep instance that we could use to grep any corporate repo, regardless of where it was hosted. That was extremely useful for investigating failures in build scripts that spanned multiple repositories (e.g. building a Go sidecar that relies on a service config in the Java monorepo).
I don't think we need to restrict the benefits quite that much: if it's a project that isn't my day-to-day but is in a language I already have set up in my IDE, I'd much prefer to open it up in my IDE and use jump-to-definition and friends than to try to grep and hope that the developers made it greppable.
Going further, I'd equally rather have plugins ready to go for every language my company works in and use them for exploring a foreign codebase. The navigation tools all work more or less the same, so it's not like I need to invest effort learning a new tool in order to benefit from navigation.
> Yes, an IDE will save you time daily driving. But there's no reason to sabotage all the other usecases.
Certainly don't sabotage, but some of these suggestions are bad for other reasons that aren't about grep.
For example: breaking the naming conventions of your language in order to avoid remapping is questionable at best. Operating like that binds your business logic way too tightly to the database representation, and while "just return the db object" sounds like a good optimization in theory, I've never not regretted having frontend code that assumes it's operating directly on database objects.
Bold of everyone here to assume that everyone has a day-to-day project. If you're a consultant, or you're otherwise switching projects on a month-to-month basis, greppability is probably the top metric, second only to unit test coverage.
You need a better IDE.
> - You just opened the project for the first time.
Go grab a coffee
> - It's in a language you don't daily drive
Jetbrains all products pack, baby.
> - You haven't even downloaded the project and are checking things out in github (or some similar site for your project).
On GitHub, press `.` to open it in a web-based vscode. Download it & open it in your IDE while you are doing this.
> - You're remoting via SSH and have access to code there (say it's a python server).
Don't do this. Check the git hash that was deployed and checkout the code locally.
LSP-based tools are generally fine with this, as long as compile_commands.json or equivalent is available. A purely syntactic understanding is an incomplete solution; I suspect GP meant LSP.
Many of those other caveats are non-issues once LSPs are widespread. Even GitHub has LSP-like go-to-def/go-to-ref, though it's not perfect.
Your other points make sense, but in this case, at least for C/C++, you can generate a compile_commands.json that will let clangd interpret your code accurately.
If building with make just do `bear -- make` instead of `make`. If building with cmake pass `-DCMAKE_EXPORT_COMPILE_COMMANDS=1`.
- the project is large enough that the IDE can't cope.
- you want to also match comments, commented out code or in-project documentation
- you want fuzzy search and match similarly named functions
I use clangd integration in my IDE all the time, but often brute force is the right solution.
I attribute that mostly to my several decades of experience with vi(m) and command line tools, not to anything inherently bad about VSCode.
What counts as “better” tools has a lot of subjectivity and circumstances implied. No one set of tools works for everyone. I very often have to work over ssh on servers that don’t allow installing anything, much less Node and npm for VSCode, so I invest my time in the tools that always work everywhere, for the work I do.
The main project I’ve worked on for the last few years has a little less than 500,000 lines of code. VSCode’s LSP fairly often takes a few seconds to update its indexes. Running ctags over the same code takes about a second, and I can control when that happens. vim has no delays at all, and ripgrep can search all of the files in a second or two.
That won’t make LSP go any faster though. There’s still something interesting in the fact that a ripgrep of every line in the codebase can still be faster than a dedicated tool.
A kind of in-between I've found for some search and replace action is comby (https://comby.dev/). Having a matching braces feature is a godsend for doing some kind of replacements properly.
And grep cuts right through that in a pretty universal way. What the post describes are just ways to not work against grep to optimize for something ephemeral.
For example if I want to rename all “Dog” (DogModel, DogView, DogController) symbols to “Wolf”, find/replace is much better at that because it will tell me about symbols I had forgotten about.
[1] https://www.youtube.com/watch?v=MZPR_SC9LzE
Some language servers support modifying the symbols in contexts like docstrings as well.
However, it does suggest that there is an opportunity for factoring "Dog" out in the code, at least by namespacing (e.g. Dog.Model).
I don't map out projects in terms of semantics, I map out projects in files and code. That makes querying intuitive, and I can easily compose queries that match the specificity of what I care about (e.g. I might want to find a `Server` but I want to show classes, interfaces, and abstract classes).
For the specific toolchain I'm using - typescript - the symbol search is also unusable once it hits a certain project size, it's just way too slow for it to be part of my core workflow
They're either incomplete (you don't get ALL references or you get false references) or way too slow (>10 seconds when rg takes 1-2).
Recommendations are most welcome.
If your feelings are anemic when tasked with doing a grep, it's because you have lost a very valuable skill by delegating it to a computer. There are some things the IDE is never going to be able to find, lest it become the development environment, so keeping your grep fu sharpened is wise beyond the decades.
(Disclaimer: 40 years of software development, and vim+cscope+grep/silversearcher are all I really need, next to my compiler..)
Also, I can clearly remember switching from vim/emacs to Microsoft Visual Studio (please, don't throw your tomatoes just yet!). I was blown away by IntelliSense. Suddenly, I was focusing more on writing business logic, and less time searching for APIs.
At the end of the day, I'm here to solve problems, and there's no end to them -- might as well get a head start.
I'm not feeling anemic. The tool is anemic, as in, underpowered. It returns crap you don't want, and doesn't return stuff you do want.
My grep-fu is fine. It's a perfectly good tool if you have nothing better. But usually you do have something better.
Using the wrong tool to make yourself feel cool is stupid. Using the wrong tool because a good tool could make you lazy shows a lack of respect for the end result.
1. <http://jamie-wong.com/2013/07/12/grep-test/>
Sometimes it's because I don't completely trust the IDE to find everything I'm interested in (justifiably; sometimes it doesn't). Sometimes it's because I'm not looking to dive into the code and do serious work on it; I'm just doing a quick drive-by check/lookup for something. Sometimes it's because I'm ssh'd into another machine and I don't have the ability to easily open the sources in an IDE.
But being able to grep is really nice when trying to figure something out about a source tree that I don't yet have set up to compile, nor am I a developer of. I.e., I've downloaded the source for a tool I've been using pre-built binaries of and am now trying to trace why I might be getting a particular error.
- dynamically built identifiers: 100% correct, never do this. It breaks both text search and symbol search and results in complete garbage code. I had to deal with bugs in early versions of docker-compose because of this.
- same name for things across the stack? Shouldn't matter, just use find usages on `getAddressById`. Also easy way to bait yourself because database fields aren't 1:1 with front-end fields in anything but the simplest of CRUD webshit.
- translation example: the fundamental problem is using strings as keys when they should be symbols (see the sketch after this list). Flat vs nested is irrelevant here because you should be using neither.
- react component example: As I mentioned in another comment, trivially managed with Find Usages.
Nothing in here strikes me as "routinely solves harder problems," it's just standard web dev.
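For what it's worth, a sketch of the symbols-over-strings point from the translation bullet above (a hypothetical shape, not from the article):
// keys are plain nested identifiers, not dotted strings
const messages = {
  auth: {
    login: { title: 'Login', emailLabel: 'Email', passwordLabel: 'Password' },
    register: { title: 'Register', emailLabel: 'Email', passwordLabel: 'Password' },
  },
} as const

// find-usages, rename, and typo detection now come from the compiler
const title = messages.auth.login.title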
With a legacy codebase, or a fork of a dependency that had to be patched and uses an incompatible build system, or any C/C++/Obj-C/etc. code that heavily uses the preprocessor or nonstandard build practices, or codebases that mix lots of different languages over awkward FFI boundaries, and so on and so forth -- there are so many situations where an IDE just can't get you 100% of the way there and you have to revert to grepping to do any real work.
That being said, I don't fully support the idea of handcuffing your code in the name of greppability, but I think dismissing it as a metric under the premise that IDEs make grepping "obsolete" is a little hasty.
I wish, but no. I've found people will make a mess of everything. Which is why I don't trust solutions that rely on humans having more discipline, like what this article advocates.
In any situation where grep is your last saviour, you cannot rely on the greppability of the code. You'll have to check and double check everything, and still accept the risk of errors.
You can maybe skip the greppability if the code base is of a size that you can hold the rough shape and names in your head, but a "get a list of things that sound like they might be related to my problem" operation is still extremely helpful. And it's also worth keeping in mind that greppability matters to onboarding.
Does that mean it should be an overriding design concern? No. But it does mean that if it's cheap to build greppable, you probably should, because it's a net positive.
You can't imagine how much faster I was than everybody else at answering questions about a large codebase just because I knew how to use ripgrep (on Windows). "Knowing how to grep" is a superpower.
I've used this technique on auditing many code bases including the C family, perl, Visual Basic, C# and SQL.
With this sort of tool, I don't need to look for language-particular parsers--so long as the source is in a text file, this works well.
Plus there are cases where grep is really what you need, for example after updating a particular command line tool whose output changed, I was able to find all scripts which grepped the output of the tool in a way that was broken.
I only use grep to filter the output of CLI tools.
For code, I use my IDE or repository features.
Unfortunately sometimes you can't, and sometimes you can but people can't be arsed, so this is still a consideration.
I am also waiting for world peace! ; )
The great IDEs IntelliJ and Webstorm stopped autosuggesting completions from the symlinked project.
Opened up Sublime Text again. Worked perfectly. That is why Jetbrains and their behemoth IDEs are utter shite.
Write your code to have symmetry and make it easy to grep.
Having dealt with IntelliJ for 3 years due to education stuff, I laughed out loud here. Even VS is better than IDEA.
Just because I have 20 usages of 'shipping_address' doesn't mean I'll have this string 20 times in different places.
Grep has its place and I often need to grep codebases which have been written without much thought towards DX. But writing it nicely allows the LSP to take over.
All of that is trivial to search for with a tool that understands the language.
Completely agreed. The React component example in the article is trivially solvable with any modern IDE; right click on the class name, "Find Usages" (or use the appropriate hotkey, of course). Trying to grep for a class name when you could just do that is insane.
I mainly see this from juniors who don't know any better, but as seen in this thread and the article, there are also experienced engineers who are stubborn and refuse to use tools made after 1990 for some reason.
Before calling people stubborn or assuming they got left behind out of ignorance, consider your assumptions. 40+ years experience, senior in both experience and age at this point. Long-term vim + command line tools user.
Do you have any evidence that shows "A good IDE alone will save you so much time?" Have you seen studies comparing productivity or code quality or any metric written by people using IDEs vs those using a plain editor with grep?
By "so much faster" what do you mean exactly? I have decades of experience with vim + ctags + grep (rg these days, because I don't want to get called a stubborn stick in the mud). I can find and change things in large codebases pretty fast. I used VSCode for a year on the same codebases and I didn't feel "so much faster," and I committed to it and watched numerous how-to videos and learned the tool well enough to train other programmers on it. No 10x improvement, not even 1.5x. For most tasks I would call it close to the same in terms of time taken to write code. After getting burned a couple times with "Replace symbol" in VSCode I stopped trusting it. After noticing the LSP failed to find some references I trusted it less. I know grep/ack/rg/ctags aren't perfect, but I also know their weaknesses and how to work with them to get them to do what I want. After a year I went back to vim + ctags + rg.
We might have more productive (and friendly) interactions as programmers if we remembered that not everyone works the same way, or on the same kind of code and projects. What we call "best practices" or "modern tools" largely come down to familiarity, received wisdom, opinion, and fashion -- almost never from rigorous metrics and testing. You like your IDE? Great! I like my tools too. Would either of us get "so much faster" using a different set of tools? Probably not. Trying to find the silver bullet that reduces accidental complexity in software development presents an ongoing challenge, but history shows that editors and IDEs don't do much because if they did programmers today would outperform old guys like me by 10x in a measurable way.
At the last full-time job I had, at an educational software company with 30+ programmers, everyone used Eclipse. My first day I got a new desktop with two big monitors, Eclipse installed, ready to go. I installed vim and the CLI subversion client and some other stuff and worked from the command line, as I usually do. I left one of the monitors off, I don't need that much screen space, and I don't have Twitter and Facebook and other junk running on a second monitor all day like most of the other people did. I got made fun of, old man using old tools. Then once a week, like clockwork, Eclipse would auto-install some updates and everyone came to a halt trying to resolve plugin version conflicts, getting the team in sync. Hours and hours wasted regularly just getting the IDE to work. That didn't affect me, I never opened Eclipse. Watching the other programmers it seemed really slow. So just maybe Eclipse could jump to a definition faster than vim + ctags (I doubt it), but amortized over a month Eclipse all by itself wasted more time than anyone possibly saved with the more powerful tool. Anecdote, I know, but I've seen this play out in similar ways at more than one shop.
Just last year a new hire at a place I freelance for spent days trying to get Jetbrains PHPStorm working on a shared remote dev server. Like VSCode it runs a heavy process on the server (including the LSP). Unlike VSCode, PHPStorm can actually kill the whole server, wasting everyone's time and maybe losing work. I have never seen vim or grep bring a whole server down. I could add up how much "faster" PHPStorm might turn out compared to vim, but it will have to recoup the days lost trying to get it to work at all first.
I implemented a format-agnostic search that can match patterns across various naming conventions like camelCase, snake_case, PascalCase, kebab-case. If needed, I'll integrate in space-separated words.
I've just published the tool to PyPI, so you can easily install it using pip (`pip install super-grep`), and then you just run it from the command line with `super-grep`. You can let me know if you think there's a smarter name for it.
Source: https://www.github.com/msmolkin/super-grep
If you do, email a link to hn@ycombinator.com and we'll put it in the second-chance pool (https://news.ycombinator.com/pool, explained at https://news.ycombinator.com/item?id=26998308), so it will get a random placement on HN's front page.
So if I'm trying to locate the error message "because the disk is full" but it's in the code as:
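(say, hypothetically, something like this sketch, with the phrase split across concatenated fragments:)
// hypothetical sketch: the phrase is broken across string fragments,
// so a line-based search for "because the disk is full" never matches
const message = 'cannot save your changes ' +
  'because the disk ' +
  'is full'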
then it will fail. So really, combining both our use cases, what would be great is to simply search for a given case-insensitive alphanumeric string in files, skipping all non-alphanumeric characters.
So if I search for the alphanumeric-only form of the phrase, it would match all of the variants. And then in the search results, even if you get some accidental hits, you can be happy knowing that you didn't miss anything. But the string split thing you mentioned happens a lot when searching for OpenStack error messages in Python, which are often split across lines like you showed. My current solution is to randomly shift what I'm searching for, or try to pick the most unique line.
(`first\S?name` is usually better: by excluding whitespace it avoids matching prose comments describing the thing, but `.` is easier to remember and type, so I usually just do that)
Let's say someone made a plugin for their favorite IDE for this kind of search. What would the details look like?
To keep it simple, let's assume we just do the super-case-insensitivity, without the other regex condition. Let's say the user searches for "first_name" and wants to find "FirstName".
One simple solution would be to have a convention for where a word starts or ends, e.g. with " ". So the user would enter "first name" into the plugin's search field. The plugin turns it into "/first[-_]?name/i" and gives this regexp to the normal search of the IDE.
Another simple solution would be to ignore all word boundaries. So when the user enters "first name", the regexp would become "/f[-_]?i[-_]?r[-_]?s[-_]?t[-_]?n[-_]?a[-_]?m[-_]?e[-_]?/i". Then the search would not only be super-case-insensitive, but super-duper-case-insensitive. I guess the biggest downside would be that this could get very slow.
I think implementing a plugin like this would be trivial for most IDEs that support plugins.
Am I missing something?
> So the user would enter "first name" into the plugin's search field.
Why wouldn't the user just enter "first_name" or "firstName" or something like that? I'm thinking about situations like, you're looking at backend code that's snake_cased, but you also want it to catch frontend code that's camelCased. So when you search for "first_name" you automagically also match "firstName" (and "FirstName" and "first-name" and so on). I wouldn't personally introduce some convention that adds spaces into the mix, I'd simply convert anything that looks snake/kebab/pascal/camel-cased into a regex that matches all 4 forms.
Could even be as stupid as converting "first_name" or "firstName" or "FirstName" etc. into "first_name|firstname|first-name", no character classes needed. That catches pretty much every naming convention, right? (assuming it's searched for with case insensitivity)
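A minimal sketch of that conversion (hypothetical helper name; it only handles the four conventions mentioned):
// sketch: normalize camelCase/PascalCase/snake_case/kebab-case queries
// into one case-insensitive pattern with optional separators
function greppablePattern(query: string): RegExp {
  const words = query
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2') // break camel/Pascal humps
    .split(/[-_\s]+/)                        // split on '-', '_' and spaces
    .filter(Boolean)
  return new RegExp(words.join('[-_]?'), 'i')
}

greppablePattern('first_name').test('FirstName') // true
greppablePattern('firstName').test('first-name') // true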
If you're going that far, and you're in a context which probably has a parser for the underlying language ready at hand, you might as well just convert all tokens to a common format and do the same with the queries. So searches for foo-bar find strings like FooBar because they both normalize to foo_bar.
Then you can index by more than just line number. For instance you might find "foo" and "bar" even when "foo = 6" shows up in a file called "bar.py" or when they show up on separate lines but still in the same function.
* my understanding was simply that the regex would (A) recognize `[a-z][A-Z]` and inject optional _'s and -'s between... and (B) notice mid-word hyphens or underscores and switch them to search for both.
So you'd search for "/first\_name/i".
Basically in vim to substitute text you'd usually do something with :substitute (or :s), like:
:%s/textToSubstitute/replacementText/g
...and have to add a pattern for each differently-cased version of the text.
With the :Subvert command (or :S) you can do all three at once, while maintaining the casing for each replacement. So given these three forms in the buffer:
textToSubstitute
TextToSubstitute
texttosubstitute
running:
:%S/textToSubstitute/replacementText/g
...results in:
replacementText
ReplacementText
replacementtext
[1] https://www.gnu.org/software/emacs/manual/html_node/emacs/Re...
:S/textToFind
matches all of: textToFind, TextToFind, texttofind, TEXTTOFIND.
But not TeXttOfFiND.
Golly!
The other killer feature of nimgrep is that instead of regex, you can use a PEG grammar [1]
Improving the IDE to find one or the other by searching for one or the other is missing the point of the article, that consistency is important.
I'd rather have a simple IDE and a good codebase than the opposite. In the example that I gave, the worst thing is that it's the framework which forces you to use these two names for the same thing.
I didn't miss the point, I disagreed with the point because I think it's a tool problem, not a code problem. I agree with most other points in the article.
I know that they invented "curl". Do you tar xfz?
Anyway, to answer your question:
:) [edit]: Ironically, grep in that list is not the same word as the one OP is talking about. That one is actually based on grepp, with the double p. grep means pitchfork.
The german equivalent of the word would be probably "greifbar". Being able to hold something, usually used metaphorically.
(Norwegian here. Our languages are similar, but we miss this one.)
More customarily: intelligibility.
Grijpbaarheid
I never saw grep as grijp
I guess I do now
(Dutch btw)
When I was first onboarding to this project, I was tasked with updating a component and simply tried to find three of the words I saw in the UI, and this was before we implemented a straightforward path-based routing system. It took me far too long just to find what I was going to be working on, and that's the day I distinctly remember learning this lesson. I was pretty junior, but I'd later return to this code and threw it all away for a number of easily greppable strings.
As opposed to "1 objects" or "1 object(s)". A UI filled with "(s)", ughh
This is roughly the logic:
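A sketch of the standard rule (hypothetical form names; it's the same shape gettext's Plural-Forms entry uses for Polish):
// sketch: which plural form Polish uses for a count n
// 1 plik / 2-4 pliki (but not 12-14) / 5+ plików
function polishPluralForm(n: number): 'one' | 'few' | 'many' {
  if (n === 1) return 'one'
  const mod10 = n % 10
  const mod100 = n % 100
  if (mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14)) return 'few'
  return 'many'
}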
Basically pluralizing words in Polish is a fizz-buzz problem :) In other Slavic languages it should be similar, BTW. https://www.foo.be/docs/tpj/issues/vol4_1/tpj0401-0013.html
Even if you have, say, 5 sequential related structs, that are all virtually identical, all written on one line so that the similarities and differences are obvious at a mere glance... Then someone comes through and touches my file, and while they're at it, "fix" the line that went 2 characters past the 80 mark by reformatting the 4th struct to span several lines. Now when you see that list of structs, you wonder "why is this one different?" and you have to read carefully to determine, nope, it just contained one longer string. Or god forbid they reformat all the structs to match, turning a 1-page file into 3 pages, and making it so you have to read and understand each element of each struct just to see what's going on.
If I could have written the rule of thumb, I would have said "No logic or control shall happen after the end of the gutter." But if there's a paragraph-long string on one line- who cares?? We all have a single keystroke that can toggle soft-wrap, and the odds that you're going to need to know anything about that string other than "it's a long string" are virtually nil.
Sorry. I got triggered. :-)
From the Zen of Python: "Special cases aren't special enough to break the rules. Although practicality beats purity." https://peps.python.org/pep-0020/
I'm fairly firmly in the "wrap at 80" camp by the way; but sometimes a tad longer just makes sense. Or shorter for that matter: forced removal of line breaks is just as bad.
The "top-level declarations" in source files are exactly: package, import, const, var, type, func. Nothing else. If you're searching for a function, it's always going to start with "func", even if it's an anonymous function. Searching for methods implemented by a struct similarly only needs one to know the "func" keyword and the name of the struct.
Coming from a background of mostly Clojure, Common Lisp, and TypeScript, the "greppability" of Go code is by far the best I have seen.
Of course, in any language, Go included, it's always better to rely on static analysis tools (like the IDE or LSP server) to find references, definitions, etc. But when searching the code of some open source library, I always resort to ripgrep rather than setting up a development environment, unless I find something that I want to patch, in which case I set up the development environment and rely on LSP instead of grep to discover definitions and references.
> A piece of Go source code should avoid unnecessary repetition. One common source of this is repetitive names, which often include unnecessary words or repeat their context or type. Code itself can also be unnecessarily repetitive if the same or a similar code segment appears multiple times in close proximity.
https://google.github.io/styleguide/go/decisions#repetitive-...
(see also https://google.github.io/styleguide/go/best-practices#avoid-...)
This is the style rule that motivates the sibling comment about method names being split between method and receiver, for what it's worth.
I don't think this use case has received much attention internally, since it's fairly rare at Google to use grep directly to navigate code. As you suggest, it's much more common to either use your IDE with LSP integration, or Code Search (which you can get a sense of via Chromium's public repository, e.g. https://source.chromium.org/search?q=v8&sq=&ss=chromium%2Fch...).
on edit: I see someone discussed that you can grep for both arrow functions and named function at the same time and I suppose you can also construct a query that handles a function constructor as well - but this does not really handle curried functions or similar patterns - I guess at that point one is letting the perfect become the enemy of the good.
Most people grepping know the code base and the patterns in use, so they probably only need to grep for one type of function declaration.
That’s not really C; that’s a C-based DSL. The same problem exists with Lisp, except even worse, since its preprocessor is much more powerful, and hence encourages DSL-creation much more than C does. But in fact, it can happen with any language - even if a language lacks any built-in processor or macro facility, you can always build a custom one, or use a general purpose macro processor such as M4.
If you are creating a DSL, you need to create custom tooling to go along with it; in the ideal scenario, your tools are so customisable that supporting a DSL is more about configuration than coding something from scratch.
Other than that, functions should be defined by the keyword.
You can search for both: "function" and "=>" to find all function expressions and arrow function expressions.
All named functions are easily searchable.
All anonymous functions are throw away functions that are only called in one place so you don't need to search for them in the first place.
As soon as an anonymous function becomes important enough to receive a label (i.e. assigning it to a variable, being assigned to a parameter, converting to function expression), it has also become searchable by that label too.
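A tiny sketch of that progression (hypothetical names):
const items = [{ id: 'a' }, { id: 'b' }]

// throwaway: called in exactly one place, nothing worth searching for
items.map(item => item.id)

// promoted to a label: now greppable as "toId"
const toId = (item: { id: string }) => item.id
items.map(toId)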
And always call the function as `funcname(args)`
So definitions have a space between the name and arg parentheses, while calls do not. Seemed to work well, even in languages with extraneous keywords before definitions since space + paren is shorter than most keywords.
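For illustration, a sketch of that convention (hypothetical function name):
// definition: space before the parameter list, so searching "getUser (" finds only this
function getUser (id: string) {
  return { id, name: 'example user' }
}

// call sites: no space, so searching "getUser(" finds only usages
const user = getUser('42')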
Nowadays I don’t bother since it really isn’t that useful, especially with tags or LSP.
I still put the return type on a line of its own, not for search/grep, but because it is cleaner and looks nice to me—overly long lines are the ugliest of coding IMO. Well that and excessive nesting.
That’s why in my personal projects I follow classic “type\nname” and grep with “^name\>”.
> looks ugly
Single line definitions with long, irregular type names and unaligned function names look ugly. Col 1 names are not only greppable but skimmable. I can speedscroll through code and still see where I am.
To me, that's a much more common and worse practice with regards to greppability than splitting identifiers using strings, which I haven't seen much in the wild.
int
foo(void) { }
vs the Linux coding style:
int foo(void) { }
The BSD style allows me to find function definitions using git grep ^foo.
Many people insist that IDEs make the entire point moot, but that's the kind of thing that make IDEs easier to write and debug, so I disagree.
Most of us use Exuberant Ctags to allow jumping to definitions.
This is why in C projects libs go in "lib/" and sources go in "src/". If your header files have the same directory structure as libs, then "include/" is a also a decent way to find definitions.
Even for Lisp, you don't want to be grepping, or at least not all the time for basic things.
For TXR Lisp, I provide a program that will scan code and build (or add to) your tags file (either a Vim or Emacs compatible one).
Given
it will let your editor jump to the definition of point, x and y. It's a hassle. But not the end of the world.
I usually search for "doTheThing\(.+?\) \{" first.
If I don't get a hit, or too many hits I move to "doTheThing\([^\)]*?\) \{" and so on.
Working with legacy code — the scenario the author describes — I often can’t install anything on the server.
...use source code tagging or LSP.
I don't care which case is used. It's a trivial superficial thing, and tribal zealotry about such doesn't reflect well on the language and community.
[1] The warnings can be turned off, but in some cases it requires ugly hacks, and the community seems to be actively hostile to making it easier.
I think the benefit of having one symbol exist in only one domain (e.g. “user_request” only showing up in the database-handling code, where it’s used 3 times, and not in the UI code, where it might’ve been used 30 times) reduces more cognitive load than is added by searching for 2 symbols instead of 1 common one.
The upside to doing it this way is it makes your grepping more flexible by allowing you to either only search the one part of the codebase to see say DB code or see all the DB and UI things using the concept.
rg -i 'foo.?bar' finds all of foo_bar, fooBar, and FooBar.
Additionally, I find that in practice such "unusual" code is actually beneficial - it often makes it easy to see at a glance that the code is somehow in sync with some external spec. Especially when it comes to implicit usages such as in (de)serialization, noticing that quickly is quite valuable.
I'd much rather trash every languages' coding conventions than use subtly different names for objects serialized and shared across languages. It's just a pain.