- Parquet metadata is Thrift, but with comments saying "if this field exists, this other field must exist" and no code that actually verifies it, so I'm pretty sure you could feed a reader bogus Thrift metadata and crash it.
- Parquet metadata has to be parsed, meaning you must allocate a buffer, read the metadata bytes into it, and then keep dynamically allocating a whole bunch of objects as you parse, since you don't know the size of the materialized metadata up front. Too many heap allocations! This file format's Flatbuffers approach seems to solve this, since you can interpret Flatbuffer bytes directly without deserializing them first.
- The encodings are much more powerful. I think a lot of people in the database community have been saying for a long time that we need composable/recursive lightweight encodings. As far as I remember, BtrBlocks was the first open format to do this, and then FastLanes followed up. Both were much better than Parquet by itself, so I'm glad ideas from those two formats are being taken up.
- Parquet did the Dremel record-shredding thing, which just made my brain explode, and I'm glad they got rid of it. It seemed to needlessly complicate the format with no real benefit.
- Parquet data pages can contain different numbers of rows, so you have to scan the whole ColumnChunk to find the row you want. Here it seems like you can jump straight to the DataPage (IOUnit) you want.
- They got rid of the heavyweight compression and just stuck with the Delta/Dictionary/RLE stuff. Heavyweight compression never did anything anyway, was super annoying to implement, and basically required you to pull in 20 dependencies.
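For anyone who hasn't seen the "composable lightweight encodings" idea before, here is a toy Python sketch of stacking the Delta/Dictionary/RLE stuff. It is not how BtrBlocks, FastLanes, or this format actually implement it, and all the function names are hypothetical. The point is that each encoding outputs a plain integer array, which can itself be fed into another encoding.

```python
from itertools import accumulate
from typing import Hashable, List, Tuple

# Toy illustration of cascading lightweight encodings.
# Hypothetical helpers, not any real file format's API.

def delta_encode(values: List[int]) -> List[int]:
    # First element is stored as its delta from 0, the rest as
    # differences between consecutive values.
    return [v - prev for prev, v in zip([0] + values[:-1], values)]

def delta_decode(deltas: List[int]) -> List[int]:
    # A running sum undoes the deltas.
    return list(accumulate(deltas))

def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    # Collapse runs of equal values into (value, run_length) pairs.
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    # Expand each (value, run_length) pair back into a run.
    return [v for v, n in runs for _ in range(n)]

def dict_encode(values: List[Hashable]) -> Tuple[List[Hashable], List[int]]:
    # Replace each value with an integer code into a table of distinct values.
    dictionary: List[Hashable] = []
    index = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

def dict_decode(dictionary: List[Hashable], codes: List[int]) -> List[Hashable]:
    # Look each code up in the dictionary.
    return [dictionary[c] for c in codes]

if __name__ == "__main__":
    # Sorted timestamps: delta makes them repetitive, then RLE shrinks that.
    ts = [1000, 1010, 1020, 1030, 1040]
    ts_runs = rle_encode(delta_encode(ts))
    print(ts_runs)  # [(1000, 1), (10, 4)]
    assert delta_decode(rle_decode(ts_runs)) == ts

    # Low-cardinality strings: dictionary codes are ints, so RLE applies to them.
    countries = ["US", "US", "DE", "DE", "DE"]
    dictionary, codes = dict_encode(countries)
    code_runs = rle_encode(codes)
    print(dictionary, code_runs)  # ['US', 'DE'] [(0, 2), (1, 3)]
    assert dict_decode(dictionary, rle_decode(code_runs)) == countries
```

Because every step's output is just another integer array, a format can pick and stack encodings per column instead of hard-coding one scheme, which is roughly what "composable/recursive" means here.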
Overall a great improvement; I'm looking forward to this taking over the data analytics space.
Nowadays I code for a living, and this is for sure the game that sparked it for me.
It was a great time, and I feel I can always run this game and get back to that childhood feeling.
Web apps can ask for your location or microphone the same way native apps can. Just reject the request; nothing says you have to accept on either platform, so calling that a negative for native apps is odd.
The biggest downside of native apps is you can’t customize them with extensions or user styles like you can with websites.
On the other hand, for mobile apps, there is still a device-specific mentality.
Imagine web apps being built with a different flavor for all the major browsers...
I hope the same level of standardization comes to mobile apps too, with the option to use device-specific features on top of the generic UI.
- one single machine
- nginx proxy
- many services on the same machine; some are internal, some are supposed to be public, and all are accessible via the web!
- internal ones are behind HTTP basic auth with a humongous password that I store in an external password manager (Firefox's built-in one)
- public ones are either fully public or behind Google OAuth
I coded all of them from scratch, since that's the point of my homelabbing. You want images? Browsers can display them. Videos? Browsers can play them.
For me the hard part is the backend. The frontend is very much "90s HTML".
It's nice that the author shared the results of his exercise/experiment. I just got sad when the 100 USD was mentioned, as it reminded me that this whole game is 90%+ about money and hardware rather than skills.
That being said, I really like the author's initiative.
A more skilled person who understands all the underlying steps will always scale up more efficiently, because they know where to allocate more resources.
Basically... you always need the skills, and the money is the fine-tuning.