Readit News
mccanne commented on Examples for the tcpdump and dig man pages   jvns.ca/blog/2026/03/10/e... · Posted by u/ibobev
mccanne · 2 days ago
Hey, really cool! Tcpdump co-author here. If you're interested in the origin story of the tcpdump language, BPF, and pcap, I had fun putting together a talk a number of years ago: https://www.youtube.com/watch?v=XHlqIqPvKw8
mccanne commented on SQL needed structure   scattered-thoughts.net/wr... · Posted by u/todsacerdoti
cousin_it · 6 months ago
Nobody mentioned nested data in BigQuery? It's a superset of SQL where fields can be arrays of structs (so basically nested tables) and there's some extra syntax to nest/unnest things, so you can often get everything you wanted in one result set.
mccanne · 6 months ago
For sure... the use of JSON here seems orthogonal to Jamie's point about constructing complex nested values in SQL with a single backend query. Modern SQL dialects support all this with native types (presuming the result fits in a homogeneous relational type, which it does in this case). For example, Jamie's query can be written with a record expression (DuckDB dialect):

  with theTitle as (
    from title.parquet
    where tconst = 'tt3890160'
  ),
  principals as (
    select array_agg({id:principal.nconst,name:primaryName,category:category})
    from principal.parquet, person.parquet
    where principal.tconst = (from theTitle select tconst)
    and person.nconst = principal.nconst
  ),
  characters as (
    select array_agg(c.character) as characters, p.u.name
    from principal_character.parquet c
    join (select unnest((from principals)) as u) p
      on c.character is not null and u.id=c.nconst and c.tconst=(select tconst from theTitle)
    group by p.u
  )
  select {
    title: (select primaryTitle from theTitle),
    director: list_transform(
                list_filter((from principals), lambda elem: elem.category='director'),
                lambda elem: elem.name),
    writer: list_transform(
              list_filter((from principals), lambda elem: elem.category='writer'),
                lambda elem: elem.name),
    genres: (select genres from theTitle),
    characters: (select array_agg({name:name,characters:characters}) from characters),
  } as result
And if you query typeof on the result, you'll get:

  STRUCT(
    title VARCHAR,
    director VARCHAR[],
    writer VARCHAR[],
    genres VARCHAR,
    characters STRUCT(
      "name" VARCHAR,
      characters VARCHAR[]
    )[]
  )

mccanne commented on A love letter to the CSV format   github.com/medialab/xan/b... · Posted by u/Yomguithereal
mccanne · a year ago
Relevant discussion from a few years back

https://news.ycombinator.com/item?id=28221654

mccanne commented on Decoding JSON sum types in Go without panicking   nicolashery.com/decoding-... · Posted by u/misonic
mccanne · a year ago
Nice article!

Decoding sum types into Go interface values is obviously tricky stuff, but it gets even harder when you have recursive data structures as in an abstract syntax tree (AST). The article doesn't address this. Since there wasn't anything out there to do this, we built a little package called "unpack" as part of the SuperDB project.

The package is here...

https://github.com/brimdata/super/blob/main/pkg/unpack/refle...

and an example use in SuperDB is here...

https://github.com/brimdata/super/blob/main/compiler/ast/unp...

Sorry it's not very well documented, but once we got it working, we found the approach quite powerful and easy to use.

mccanne · a year ago
And somewhat ironically here, SuperDB not only implements sum-type decoding of JSON in package unpack, but it also implements native sum types in a superset of JSON that we call Super JSON (with a query language that understands how to rip and stitch sum types for columnar analytics... work in progress)

https://superdb.org/docs/formats/data-model/#25-union

mccanne commented on DeWitt and Stonebraker's "MapReduce: A major step backwards" (2009)   craig-henderson.blogspot.... · Posted by u/mooreds
mccanne · 2 years ago
Necessity is the mother of invention. MapReduce-based systems were developed because the state-of-the-art RDBMS systems of that age could not scale to the needs of the Googles/Yahoos/Facebooks during the phenomenal growth spurt of the early Web. The novelty here was the tradeoffs they made to scale out and up using the compute and storage footprints available at the time.

"We thought of that" vs "we built it and made it work".

mccanne commented on Easier data debugging with Zed’s first-class errors   brimdata.io/blog/debuggin... · Posted by u/jameskerr
rpxio · 3 years ago
It took me a minute to realize that this isn’t about the Zed text editor.
mccanne · 3 years ago
Yes, the name conflict is unfortunate. We named our project Zed in 2021 before the Zed text editor was a thing.
mccanne commented on Easier data debugging with Zed’s first-class errors   brimdata.io/blog/debuggin... · Posted by u/jameskerr
yyyk · 3 years ago
"Wouldn’t it be great if you could see errors in place instead of mysterious NULLs?"

It's not a good ad when the error message is inadequate even in the supplied example and you need to hack around it.

mccanne · 3 years ago
Apologies if the examples in the article weren't rich or clear enough. The beauty of the error type is that you can wrap any value in an error and make the error as rich as you'd like, even stacking errors from different stages of an ingest pipeline so you can see an error's lineage alongside the data that wasn't subject to errors. For example, imagine an error like this:

  error({
      stage: "transform",
      err: "input error",
      value: {
          stage: "normalize",
          err: "input error",
          value: {
              stage: "metrics",
              err: "divide by zero",
              value: {
                  sum: 123.5,
                  n: 0
              }
          }
      }
  })
... and you can quickly deduce that your "metrics" stage is dividing by "n" even when n is 0. You can then fix up your logic, and once the bug in the ingest pipeline is fixed, repair the errors in place as well.
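
For readers more at home in Go, the stacking idea can be sketched with a nested error struct that wraps the offending value at each stage (an illustrative analogy with hypothetical names, not Zed's implementation):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StageError records which pipeline stage failed and wraps the
// offending value, so lineage survives through every later stage.
type StageError struct {
	Stage string `json:"stage"`
	Err   string `json:"err"`
	Value any    `json:"value"`
}

func (e *StageError) Error() string {
	return fmt.Sprintf("%s: %s", e.Stage, e.Err)
}

func main() {
	// Innermost failure: the metrics stage divided by zero on this value.
	inner := &StageError{
		Stage: "metrics",
		Err:   "divide by zero",
		Value: map[string]any{"sum": 123.5, "n": 0},
	}
	// Each later stage wraps the earlier error instead of dropping it.
	mid := &StageError{Stage: "normalize", Err: "input error", Value: inner}
	outer := &StageError{Stage: "transform", Err: "input error", Value: mid}

	// Marshaling the outermost error reproduces the full lineage,
	// much like the nested error value shown above.
	b, err := json.MarshalIndent(outer, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

The difference in Zed is that such errors are first-class values in the data itself, so they flow through queries rather than living only in application code.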

mccanne commented on Easier data debugging with Zed’s first-class errors   brimdata.io/blog/debuggin... · Posted by u/jameskerr
TheAlchemist · 3 years ago
Oh wow! I was just trying to articulate exactly this kind of approach and looking for it. I would love to see a coherent approach to data errors, and this seems like a step in the right direction.

One question: the blog post basically covers debugging the data-ingestion part. A common issue I have with older data is that at some point you discover a problem with it (say it's slightly wrong, but not too badly), so you want to somehow let users know, or let them select only the data without the issue (while still telling them how much of it they're missing). Is this framework helpful in that situation?

mccanne · 3 years ago
Yes, great point! The idea is that you can fix up a problem with the data in place while you're updating your ingest pipeline to handle whatever is causing it. You can transform the errors into clean data, delete the errors, and commit the changes atomically. In the meantime, queries and searches can still run on the data that isn't problematic, and even if there are errors inside a hierarchical value, queries can run on the portions of the value that are clean and intact while the errors are being addressed.

u/mccanne

Karma: 412 · Cake day: May 23, 2014