- We actually built our own query layer called Dataset to make sure that the dataset is materialized. This way, if you put it in your BI tool, you can always go back to the dataset, which points directly to the activity stream.
Single source of truth & traceability - 100%. We really aim to have activities be genuinely distinct. Each activity is modeled via SQL, usually by a data engineer or analyst, so you cannot just create 1,000 activities; 90% of our customers have between 20 and 40. This keeps your activities unique. Also, unlike tables, activities are building blocks, so they map to something real (e.g. "paid invoice", "sent contract").
So far we haven't seen many people struggling with activities being too similar.
Also, modeling the activity helps clear up the garbage in -> garbage out problem that often happens with CDPs (Mixpanel, Segment, etc.).
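To make the "paid invoice" example concrete, a single activity transformation could look roughly like this. A sketch only: the source table and target columns are made up for illustration, not our exact schema.

    -- Illustrative only: hypothetical billing.invoices source, made-up target
    -- columns loosely shaped like a fixed activity schema.
    SELECT
        i.id              AS activity_id,
        i.paid_at         AS ts,
        i.customer_email  AS customer,
        'paid_invoice'    AS activity,
        i.invoice_number  AS feature_1,
        i.currency        AS feature_2,
        i.amount          AS revenue_impact,
        i.invoice_url     AS link
    FROM billing.invoices AS i
    WHERE i.status = 'paid'

Because the output shape is fixed, each new activity is just another short query like this rather than a new table design.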
In terms of analysis, we did build a tool called Narrative (actionable analysis in a story format). It is designed to get users to write their analysis with context built in vs. just numbers on a screen. With that context, plus the ability to click to see the activities and their relationships, people can quickly understand what data powers the result. Does this solve the problem 100%? Nope, but it does take us huge steps in the right direction.
Coherent model - I think our tool Dataset helps with this problem. We started as a consultancy and answered thousands of questions over 3 years until our tool was able to answer any question. I usually demo by asking the customer to ask any question they have, and I try to answer it live. So far, we have been able to answer them all, so I am SUPER excited to find the limits of our tool.
Yeah, for data EL'd via Stitch or Fivetran this is easy. Dirty data that is a bunch of JSON blobs, etc. takes a bit more effort, but building the activity is done once. You also don't have to deal with how concepts relate, identity resolution, or a lot of the other things that make SQL complex.
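For example, pulling an activity out of raw JSON payloads might look roughly like this (a sketch only, assuming Postgres-style JSON operators and a made-up events table):

    -- Hypothetical raw webhook payloads landed as JSON; table and keys are made up.
    SELECT
        e.id                                    AS activity_id,
        e.received_at                           AS ts,
        e.payload ->> 'email'                   AS customer,
        'called_us'                             AS activity,
        e.payload ->> 'phone_number'            AS feature_1,
        e.payload -> 'call' ->> 'duration_sec'  AS feature_2
    FROM raw_events.support_calls AS e
    WHERE e.payload ->> 'direction' = 'inbound'

The extra effort is mostly in that extraction, and you pay it once per activity.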
Overall, I love this conversation and would like to continue. I am excited to hear some of your edge cases. Maybe we can even setup some time and talk face to face: https://calendly.com/ahmed-narrator/30min-1
The main thing this product strikes me as is "The ETL tool that understands your business". Whereas the domain language of most ETL tools is at the level of DW technologies (rows, columns, schemas, facts, dimensions, indexes, join algos, views, dags, orchestration schedules), the domain language of Narrator is at the level of the business (activities, customers, relationships, spend, etc). In a way it's sort of similar to the old convention over configuration religious war. I could see companies using Narrator for the 80% of ETL that is just plain table stakes in order to compete nowadays and offloading most of the definition and minor customization of this ETL to less technical folks. And maybe in parallel the data engineers would use plain old code to do the last 20% of ETL that is truly proprietary and specific to the business.
Not sure if my biased initial reading of your pitch was off but it seemed like you were focusing heavily on addressing the pain points of the star schema. I've found that most people fall into two camps: either they don't care at all about the Kimball star schema world and they're just loading tables however they see fit into their warehouse or they are willing to go to their grave defending the star schema and its variants. In either case, I don't think you gain much by positioning yourself as the antidote to the star schema. I think you could capture customers in both camps by focusing instead on the fact that your ETL tool has a deep understanding of how companies that rely heavily on a web presence work. I think this would also better align you with the ability to increase your customers' revenue as opposed to optimizing engineering/infrastructure concerns, which is an easier sell.
Anyway, sorry for the rant. I'm going to shoot you a short email in case you want to connect.
This was quite a challenge, and I think it's what makes the traceability and source-of-truth problems a lot simpler.
In Narrator, the data team writes small SQL queries to create single, customer-centric business concepts that we call activities. These are around 25 lines each and designed to be understood by anyone in the company (e.g. "viewed page", "called us", ...).
Now, every question you or a stakeholder has will simply be a rearrangement of these activities. If you can describe what you want, then Narrator can assemble a table that represents it.
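As a rough sketch of what "rearranging activities" means (the activity names and the single activity_stream table here are just illustrative, not our exact layout), a question like "how long after viewing a page does a customer first call us?" is two activities lined up in time:

    -- Illustrative only: first 'called_us' after each 'viewed_page', per customer,
    -- assembled from one hypothetical activity_stream table.
    SELECT
        v.customer,
        v.ts      AS viewed_page_at,
        MIN(c.ts) AS first_called_us_at
    FROM activity_stream AS v
    LEFT JOIN activity_stream AS c
        ON  c.customer = v.customer
        AND c.activity = 'called_us'
        AND c.ts > v.ts
    WHERE v.activity = 'viewed_page'
    GROUP BY v.customer, v.ts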
Source of truth - whatever is in the activity stream. Traceability - always Dataset (activities and how they relate), then activities (~25 lines of SQL). Coherent model - customers doing actions over time.
Does that make sense? Some of these things are easier to show in a demo than to describe in text.
This is the problem with EAV/nosql/schemaless/etc and ultimately the problem I think you are going to have to solve. Instead of using ETL to model how the activities relate and reifying that model as database objects, EAV just kicks the can down the road to the query/BI tool.
Sprawl - The BI tool will end up containing most of the real business logic sprawled across many reports.
Single source of truth - A lot of the reports will be very similar but they will be based off slightly different activities or slightly different filtering logic. Which report is the correct one? (See the sketch after this list.)
Traceability - I think this is more of an end-to-end "garbage-in, garbage-out" problem that all ETL/BI tools have that wouldn't be specific to your tool. It's more of an organizational/people problem.
Coherent model - In my experience, EAV isn't enough to cover the breadth of analyses mature businesses need to do and most business users won't be able to wrap their head around it. There will have to be some data person that creates a more coherent, tabular/spreadsheet-like model and in the case of this tool it looks like that model will have to exist in the BI tool. Which brings us back to sprawl/single source of truth issues.
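To make the single-source-of-truth worry concrete, here's a hedged sketch (activity names and filters are hypothetical) of two "monthly active customers" reports built off the same activity stream that will quietly disagree:

    -- Report A: any activity counts as "active".
    SELECT date_trunc('month', ts) AS month, COUNT(DISTINCT customer) AS active_customers
    FROM activity_stream
    GROUP BY 1;

    -- Report B: only 'viewed_page' counts, and internal users are excluded.
    SELECT date_trunc('month', ts) AS month, COUNT(DISTINCT customer) AS active_customers
    FROM activity_stream
    WHERE activity = 'viewed_page'
      AND customer NOT LIKE '%@ourcompany.com'
    GROUP BY 1;

Both look reasonable sitting in a BI tool, and nothing in the schema tells you which one is "the" number.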
Just some thoughts. But always glad to see more people working on stuff like this!
Edit - one last thing I wanted to mention. I think in reality you are going to find it takes more than ~25 lines of SQL to define activities. That may be the case if the source is a schema that gets spit out of something like Stitch, but many other schemas in the wild will take a lot more than 25 LOC to massage into your 11-column schema.
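For example (everything here is hypothetical), the same "paid invoice" activity sourced from two billing systems, with a CRM email lookup standing in for identity resolution, already blows past a tidy couple dozen lines:

    -- Hypothetical: two billing sources plus a CRM email lookup, unioned into one activity.
    WITH new_billing_paid AS (
        SELECT
            c.charge_id::text  AS activity_id,
            c.paid_at          AS ts,
            c.customer_email   AS customer,
            c.amount / 100.0   AS revenue_impact   -- cents to dollars in this made-up source
        FROM new_billing.charges AS c
        WHERE c.status = 'succeeded'
    ),
    legacy_paid AS (
        SELECT
            p.payment_ref            AS activity_id,
            p.settled_on::timestamp  AS ts,
            m.email                  AS customer,   -- crude identity resolution via CRM
            p.gross_amount           AS revenue_impact
        FROM legacy_billing.payments AS p
        LEFT JOIN crm.account_emails AS m
            ON m.account_id = p.account_id
        WHERE p.state = 'SETTLED'
    )
    SELECT activity_id, ts, customer, 'paid_invoice' AS activity, revenue_impact
    FROM new_billing_paid
    UNION ALL
    SELECT activity_id, ts, customer, 'paid_invoice' AS activity, revenue_impact
    FROM legacy_paid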
https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80...
IMO, it doesn't matter what kind of db technology, schema, or query tool you use. A company will always have analysis sprawl regardless of whether those analyses are represented as data lake files, SQL tables, materialized views, regular views, or (as is often the case with EAV and other such schemas) queries saved in some BI tool. There is no silver bullet and it will always take some work to maintain single source of truth, traceability, a coherent model that is understandable by average business users, etc.
This is Ba Sing Se levels of delusion for some people.
But more importantly, the challenge for any such tool is to go beyond use by 2-3 people. At 2-3 people anything will work. Where BI tools (open source and closed source) struggle is scale: having all the right features for, essentially, a group of users who actually don't know how to work with data (did I just say that aloud?). Chartio caps at 20 people. RJ capped at 50-100 (and later became Stitch for that reason). We haven't seen where Metabase caps, but I bet it is in a similar range. Very few BI products have actually surpassed 100 users at target installations. And beyond 1,000 is a real challenge that only a few, and even then with a lot of assistance, can support: Tableau, Looker, Microstrategy, maybe Birst, maybe Domo.
Also, a combination of BI with LookML is a complicated product. During my days at Looker, we were handling 50+ bugs/week and filing 1,000+ tickets. Every day we were filing over 100 new feature requests.
So with all that, the question is, is it really worth the struggle? What's the end vision for supporting this? Why should someone who implements BI for a living bet on this product?
Also, if you think this is a rip off of LookML, you should take a look at what GitLab is doing with Meltano. They completely jacked LookML.
It's the backlinks that allow it to rank. Getting them requires a lot of knowledge & work, like publishing articles on Medium or receiving links on HN.
Unsurprisingly, it looks like the creator is an SEO expert with years of experience and dozens of projects.