Readit News
mccanne commented on Examples for the tcpdump and dig man pages   jvns.ca/blog/2026/03/10/e... · Posted by u/ibobev
mccanne · 2 days ago
Hey, really cool! Tcpdump co-author here. If you're interested in the origin story of the tcpdump language, BPF, and pcap, I had fun putting together a talk a number of years ago: https://www.youtube.com/watch?v=XHlqIqPvKw8
mccanne commented on SQL needed structure   scattered-thoughts.net/wr... · Posted by u/todsacerdoti
cousin_it · 6 months ago
Nobody mentioned nested data in BigQuery? It's a superset of SQL where fields can be arrays of structs (so basically nested tables) and there's some extra syntax to nest/unnest things, so you can often get everything you wanted in one result set.
mccanne · 6 months ago
For sure... the use of JSON here seems orthogonal to Jamie's point about constructing complex nested values in SQL with a single backend query. Modern SQL dialects support all this with native types (presuming the result fits in a homogeneous relational type, which it does in this case). For example, Jamie's query can be written with a record expression (DuckDB dialect):

  with theTitle as (
    from title.parquet
    where tconst = 'tt3890160'
  ),
  principals as (
    select array_agg({id:principal.nconst,name:primaryName,category:category})
    from principal.parquet, person.parquet
    where principal.tconst = (from theTitle select tconst)
    and person.nconst = principal.nconst
  ),
  characters as (
    select array_agg(c.character) as characters, p.u.name
    from principal_character.parquet c
    join (select unnest((from principals)) as u) p
      on c.character is not null and u.id=c.nconst and c.tconst=(select tconst from theTitle)
    group by p.u
  )
  select {
    title: (select primaryTitle from theTitle),
    director: list_transform(
                list_filter((from principals), lambda elem: elem.category='director'),
                lambda elem: elem.name),
    writer: list_transform(
              list_filter((from principals), lambda elem: elem.category='writer'),
                lambda elem: elem.name),
    genres: (select genres from theTitle),
    characters: (select array_agg({name:name,characters:characters}) from characters),
  } as result
And if you query typeof on the result, you'll get:

  STRUCT(
    title VARCHAR,
    director VARCHAR[],
    writer VARCHAR[],
    genres VARCHAR,
    characters STRUCT(
      "name" VARCHAR,
      characters VARCHAR[]
    )[]
  )

mccanne commented on A love letter to the CSV format   github.com/medialab/xan/b... · Posted by u/Yomguithereal
mccanne · a year ago
Relevant discussion from a few years back

https://news.ycombinator.com/item?id=28221654

mccanne commented on Decoding JSON sum types in Go without panicking   nicolashery.com/decoding-... · Posted by u/misonic
mccanne · a year ago
Nice article!

Decoding sum types into Go interface values is obviously tricky stuff, but it gets even harder when you have recursive data structures as in an abstract syntax tree (AST). The article doesn't address this. Since there wasn't anything out there to do this, we built a little package called "unpack" as part of the SuperDB project.

The package is here...

https://github.com/brimdata/super/blob/main/pkg/unpack/refle...

and an example use in SuperDB is here...

https://github.com/brimdata/super/blob/main/compiler/ast/unp...

Sorry it's not very well documented, but once we got it working, we found the approach quite powerful and easy to use.

mccanne · a year ago
And somewhat ironically here, SuperDB not only implements sum-type decoding of JSON in package unpack, but it also implements native sum types in a superset of JSON that we call Super JSON (with a query language that understands how to rip and stitch sum types for columnar analytics... work in progress)

https://superdb.org/docs/formats/data-model/#25-union

mccanne commented on DeWitt and Stonebraker's "MapReduce: A major step backwards" (2009)   craig-henderson.blogspot.... · Posted by u/mooreds
mccanne · 2 years ago
Necessity is the mother of invention. MapReduce-based systems were developed because the state-of-the-art RDBMS systems of that age could not scale to the needs of the Googles/Yahoos/Facebooks during the phenomenal growth spurt of the early Web. The novelty here was the tradeoffs they made to scale out and up using the compute and storage footprints available at the time.

"We thought of that" vs "we built it and made it work".

mccanne commented on Easier data debugging with Zed’s first-class errors   brimdata.io/blog/debuggin... · Posted by u/jameskerr
rpxio · 3 years ago
It took me a minute to realize that this isn’t about the Zed text editor.
mccanne · 3 years ago
Yes, the name conflict is unfortunate. We named our project Zed in 2021 before the Zed text editor was a thing.
mccanne commented on Easier data debugging with Zed’s first-class errors   brimdata.io/blog/debuggin... · Posted by u/jameskerr
yyyk · 3 years ago
"Wouldn’t it be great if you could see errors in place instead of mysterious NULLs?"

It's not a good ad when the error message is inadequate even in the supplied example and you need to hack around it.

mccanne · 3 years ago
Apologies if the examples in the article weren't rich or clear enough. The beauty of the error type is that you can wrap any value in an error and make the error as rich as you'd like, even stacking errors from different stages of an ingest pipeline so you can see an error's lineage alongside the data that wasn't subject to errors. For example, imagine an error like this:

  error({
      stage: "transform",
      err: "input error",
      value: {
          stage: "normalize",
          err: "input error",
          value: {
              stage: "metrics",
              err: "divide by zero",
              value: {
                  sum: 123.5,
                  n: 0
              }
          }
      }
  })
... and you can quickly deduce that your "metrics" stage is dividing by "n" even when n is 0. You can then fix up your logic, and once the bug in the ingest pipeline is fixed, repair the errors in place as well.
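
For readers more at home in Go, the stacking idea can be sketched with a nested error struct that wraps the offending value at each stage (an illustrative analogy with hypothetical names, not Zed's implementation):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StageError records which pipeline stage failed and wraps the
// offending value, so lineage survives through every later stage.
type StageError struct {
	Stage string `json:"stage"`
	Err   string `json:"err"`
	Value any    `json:"value"`
}

func (e *StageError) Error() string {
	return fmt.Sprintf("%s: %s", e.Stage, e.Err)
}

func main() {
	// Innermost failure: the metrics stage divided by zero on this value.
	inner := &StageError{
		Stage: "metrics",
		Err:   "divide by zero",
		Value: map[string]any{"sum": 123.5, "n": 0},
	}
	// Each later stage wraps the earlier error instead of dropping it.
	mid := &StageError{Stage: "normalize", Err: "input error", Value: inner}
	outer := &StageError{Stage: "transform", Err: "input error", Value: mid}

	// Marshaling the outermost error reproduces the full lineage,
	// much like the nested error value shown above.
	b, err := json.MarshalIndent(outer, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

The difference in Zed is that such errors are first-class values in the data itself, so they flow through queries rather than living only in application code.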

mccanne commented on Easier data debugging with Zed’s first-class errors   brimdata.io/blog/debuggin... · Posted by u/jameskerr
TheAlchemist · 3 years ago
Oh wow! I was just trying to articulate exactly this kind of approach and looking for it. I would love to see a coherent approach to data errors, and this seems like a step in the right direction.

One question: the blog post basically covers debugging the data-ingestion part. A common issue I have with older data is that at some point you discover a problem with it (say it's slightly wrong, but not too badly), so you want to somehow let users know, or let them select only the data without the issue (while still telling them how much of it they're missing). Is this framework helpful in that situation?

mccanne · 3 years ago
Yes, great point! The idea is that you can fix up a problem with the data in place while you're updating your ingest pipeline to handle whatever is causing it. You can transform the errors into clean data, delete the errors, and commit the changes atomically. In the meantime, queries and searches can still run on the data that isn't problematic, and even if there are errors inside a hierarchical value, queries can run on the portions of the value that are clean and intact while the errors are being addressed.

u/mccanne

Karma: 412 · Cake day: May 23, 2014