Readit News
micksmix commented on Show HN: Globstar – Open-source static analysis toolkit · Posted by u/sanketsaurav
sanketsaurav · 6 months ago
> One of the main benefits of Semgrep is its unified DSL that works across all supported languages.

> People can disagree, but I'm not sure that tree-sitter S-expressions are an upgrade over a DSL.

100% agree: a DSL is a better user experience for sure. But not inventing a new DSL and using tree-sitter natively is a deliberate choice we made. We've addressed this directly and agree that the S-expressions are gnarly, but we're optimizing for a scenario where you wouldn't need to write them by hand anyway.

It's a trade-off. We don't want to spend time inventing a DSL and porting every language's idiosyncrasies to it; we'd rather improve our runtime and add support for things that other tools don't support, or support only on a paid tier (like cross-file analysis, which you can do on Globstar today).

micksmix · 6 months ago
That makes a lot of sense. I wish you the best of luck and will be happy to try it out as you continue to develop it!
micksmix commented on Show HN: Globstar – Open-source static analysis toolkit · Posted by u/sanketsaurav
micksmix · 6 months ago
One of the main benefits of Semgrep is its unified DSL that works across all supported languages. In contrast, using the Go module "smacker/go-tree-sitter" can expose you to differences in S-expression output, because each grammar is maintained independently and grammars both vary from one another and change over time.

I've seen grammars bundled in "smacker/go-tree-sitter" change their syntax between versions, which can break existing S-expressions. Semgrep avoids this with its DSL, because the DSL also abstracts away those kinds of grammar changes.
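
One partial mitigation is to review a rule set against a new grammar release before upgrading. Tree-sitter grammars generate a `node-types.json` file listing their named node types and fields, so a script can flag names a query uses that the grammar no longer defines. Below is a minimal sketch under those assumptions: the function names are hypothetical, the scanning is regex-based rather than a real query parser, and the sample grammar data is hand-written, not taken from any actual grammar.

```python
import json
import re

# "(name" opens a named-node pattern; "(#..." predicates and "(_)" wildcards
# don't match because the name must start with a letter.
NODE_RE = re.compile(r"\(\s*([A-Za-z][A-Za-z0-9_]*)")
# "name:" introduces a field constraint on a child pattern.
FIELD_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*:")
STRING_RE = re.compile(r'"(?:\\.|[^"\\])*"')
COMMENT_RE = re.compile(r";[^\n]*")


def grammar_vocabulary(node_types_json: str) -> tuple[set[str], set[str]]:
    """Collect named node types and field names from node-types.json content."""
    types: set[str] = set()
    fields: set[str] = set()
    for entry in json.loads(node_types_json):
        if entry.get("named"):
            types.add(entry["type"])
        fields.update(entry.get("fields", {}))  # adds the field-name keys
    return types, fields


def unknown_names(query: str, node_types_json: str) -> dict[str, list[str]]:
    """List node types and fields a query uses that the grammar doesn't define.

    Regex-based approximation, not a real query parser.
    """
    # Blank out string literals first, then strip ";" comments.
    text = COMMENT_RE.sub("", STRING_RE.sub('""', query))
    types, fields = grammar_vocabulary(node_types_json)
    return {
        "types": sorted(set(NODE_RE.findall(text)) - types),
        "fields": sorted(set(FIELD_RE.findall(text)) - fields),
    }


# Hand-written sample in the node-types.json shape (not real grammar data).
SAMPLE_NODE_TYPES = json.dumps([
    {"type": "call", "named": True, "fields": {"function": {}, "arguments": {}}},
    {"type": "identifier", "named": True},
    {"type": "argument_list", "named": True},
])

query = '(call function: (identifier) @fn (#eq? @fn "eval") arguments: (args))'
print(unknown_names(query, SAMPLE_NODE_TYPES))
# {'types': ['args'], 'fields': []}
```

Tree-sitter already rejects unknown node types and fields when a query is compiled, so the value of a standalone check like this is reporting every stale name across a rule set in one pass. It still can't catch structural changes where the names stay the same, which is the harder half of the problem a DSL layer absorbs.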

I'm a bit concerned that tree-sitter S-expressions can become "write-only", since reading one requires also understanding the grammar it was written for.

For example, here's a semgrep rule for detecting a Jinja2 environment with autoescaping disabled:

  rules:
  - id: incorrect-autoescape-disabled
    languages: [python]
    severity: WARNING
    message: "jinja2.Environment called with autoescape not set to True or jinja2.select_autoescape(...)"
    patterns:
      - pattern: jinja2.Environment(... , autoescape=$VAL, ...)
      - pattern-not: jinja2.Environment(... , autoescape=True, ...)
      - pattern-not: jinja2.Environment(... , autoescape=jinja2.select_autoescape(...), ...)
      - focus-metavariable: $VAL

Now, compare it to the corresponding tree-sitter S-expression (generated by o3-mini-high):

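  ; Match calls spelled jinja2.Environment(..., autoescape=<value>, ...)
  (call
    function: (attribute
                object: (identifier) @module (#eq? @module "jinja2")
                attribute: (identifier) @func (#eq? @func "Environment"))
    ; Child patterns are unanchored, so this matches the keyword
    ; argument at any position in the argument list.
    arguments: (argument_list
                 (keyword_argument
                   name: (identifier) @key (#eq? @key "autoescape")
                   value: (_) @val
                   ; Exclude the two values the Semgrep rule also excludes.
                   (#not-match? @val "^True$")
                   (#not-match? @val "^jinja2\\.select_autoescape\\(")))
  ) @incorrect_autoescape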
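
For a third point of comparison, here's a minimal sketch of the same check written directly against Python's standard-library `ast` module rather than either tool. It's illustrative only: the function names are made up for this example, and like the tree-sitter query above it only matches the literal spelling `jinja2.Environment`, with no import-alias resolution.

```python
import ast


def is_jinja2_attr(node: ast.expr, name: str) -> bool:
    """True if node is the attribute access jinja2.<name> (literal spelling only)."""
    return (
        isinstance(node, ast.Attribute)
        and node.attr == name
        and isinstance(node.value, ast.Name)
        and node.value.id == "jinja2"
    )


def is_safe_autoescape(value: ast.expr) -> bool:
    """Mirror the rule's two pattern-not clauses: True or select_autoescape(...)."""
    if isinstance(value, ast.Constant) and value.value is True:
        return True
    return isinstance(value, ast.Call) and is_jinja2_attr(value.func, "select_autoescape")


def find_unsafe_autoescape(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, value source) pairs for flagged autoescape arguments."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and is_jinja2_attr(node.func, "Environment"):
            for kw in node.keywords:
                if kw.arg == "autoescape" and not is_safe_autoescape(kw.value):
                    # Report the value itself, like focus-metavariable: $VAL.
                    findings.append((kw.value.lineno, ast.unparse(kw.value)))
    return sorted(findings)


SAMPLE = """\
import jinja2
a = jinja2.Environment(autoescape=False)
b = jinja2.Environment(loader=None, autoescape=True)
c = jinja2.Environment(autoescape=jinja2.select_autoescape())
d = jinja2.Environment(autoescape=flag)
"""
print(find_unsafe_autoescape(SAMPLE))  # [(2, 'False'), (5, 'flag')]
```

Written this way the logic is explicit, but it's tied to one language's AST, which is exactly the per-language work a shared DSL is meant to hide.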

People can disagree, but I'm not sure that tree-sitter S-expressions are an upgrade over a DSL. I'm hoping I'm proven wrong ;-)

micksmix commented on Show HN: Globstar – Open-source static analysis toolkit · Posted by u/sanketsaurav
micksmix · 6 months ago
This is a really interesting project!

I'd love to hear how this project differs from Bearer (https://github.com/Bearer/bearer), which is also written in Go and based on tree-sitter.

Regardless, given that there's a large existing collection of open-source Semgrep rules, could they be adapted or transpiled to tree-sitter S-expressions so they can be reused with Globstar?
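
As a rough illustration of what such a transpiler might involve, here's a minimal sketch that accepts exactly one Semgrep pattern shape, `module.func(..., kw=$VAR, ...)`, and emits a query for tree-sitter's Python grammar. The function names and the supported subset are hypothetical; this is not how Semgrep or Globstar work internally.

```python
import re

# Hypothetical subset: only Semgrep patterns shaped like
#   module.func(..., kw=$VAR, ...)
# are accepted; everything else raises ValueError.
PATTERN_RE = re.compile(
    r"^(\w+)\.(\w+)\(\s*\.\.\.\s*,\s*(\w+)\s*=\s*\$([A-Z_][A-Z0-9_]*)\s*,\s*\.\.\.\s*\)$"
)


def quote(text: str) -> str:
    """Render text as a tree-sitter query string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def transpile(pattern: str, excluded_values: tuple[str, ...] = ()) -> str:
    """Translate one Semgrep pattern into a query for tree-sitter's Python grammar.

    excluded_values stands in for simple pattern-not clauses whose keyword
    value is a literal, such as "True".
    """
    match = PATTERN_RE.match(pattern.strip())
    if match is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    module, func, kw, var = match.groups()
    cap = "@" + var.lower()  # $VAL becomes the capture @val
    lines = [
        "(call",
        "  function: (attribute",
        f"    object: (identifier) @module (#eq? @module {quote(module)})",
        f"    attribute: (identifier) @func (#eq? @func {quote(func)}))",
        "  arguments: (argument_list",
        "    (keyword_argument",
        f"      name: (identifier) @key (#eq? @key {quote(kw)})",
        f"      value: (_) {cap}",
    ]
    lines += [f"      (#not-eq? {cap} {quote(v)})" for v in excluded_values]
    # Close keyword_argument, argument_list and call, then capture the call.
    lines[-1] += "))) @match"
    return "\n".join(lines)


print(transpile("jinja2.Environment(... , autoescape=$VAL, ...)", ("True",)))
```

Note that the `jinja2.select_autoescape(...)` exclusion from the Semgrep rule earlier on this page can't be expressed in this subset, which hints at how much of Semgrep's pattern semantics (ellipsis matching, metavariable unification, `pattern-inside`, and so on) a real transpiler would have to re-implement.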

u/micksmix

Karma: 2 · Cake day: February 24, 2025
About: https://github.com/micksmix