- We actually built our own query layer called Dataset to make sure that the dataset is materialized. This way, if you put it in your BI tool, you can always go back to the dataset, which points directly to the activity stream.
Single source of truth & traceability - 100%. We really aim to have activities be genuinely distinct. Each activity is modeled via SQL, usually by a data engineer or analyst, so you cannot just create 1,000 activities; 90% of our customers have between 20 and 40. This keeps your activities unique. Also, unlike tables, activities are building blocks, so they map to something real (e.g. "paid invoice", "sent contract").
So far we haven't seen many people struggling with activities being too similar.
Also, modeling the activity helps clear up the garbage in -> garbage out problem that often happens with CDPs (Mixpanel, Segment, etc.).
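To make the "paid invoice" example concrete, a single activity transformation could look roughly like this. A sketch only: the source table and target columns are made up for illustration, not our exact schema.

    -- Illustrative only: hypothetical billing.invoices source, made-up target
    -- columns loosely shaped like a fixed activity schema.
    SELECT
        i.id              AS activity_id,
        i.paid_at         AS ts,
        i.customer_email  AS customer,
        'paid_invoice'    AS activity,
        i.invoice_number  AS feature_1,
        i.currency        AS feature_2,
        i.amount          AS revenue_impact,
        i.invoice_url     AS link
    FROM billing.invoices AS i
    WHERE i.status = 'paid'

Because the output shape is fixed, each new activity is just another short query like this rather than a new table design.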
In terms of analysis, we did build a tool called Narrative (actionable analysis in a story format). It is designed to get users to write their analysis with context built in vs. just numbers on a screen. With that context, plus the ability to click to see the activities and their relationships, people can quickly understand what data powers the result. Does this solve the problem 100%? Nope, but it does take us huge steps in the right direction.
Coherent model - I think our tool Dataset helps with this problem. We started as a consultancy and answered thousands of questions over 3 years until our tool was able to answer any question. I usually demo by asking the customer to ask any question they have, and I try to answer it live. So far, we have been able to answer them all, so I am SUPER excited to find the limits of our tool.
Yeah, for data EL'd via Stitch or Fivetran this is easy. Dirty data that is a bunch of JSON blobs, etc. takes a bit more effort, but building the activity is done once. You also don't have to deal with how concepts relate, identity resolution, or a lot of the other things that make SQL complex.
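For example, pulling an activity out of raw JSON payloads might look roughly like this (a sketch only, assuming Postgres-style JSON operators and a made-up events table):

    -- Hypothetical raw webhook payloads landed as JSON; table and keys are made up.
    SELECT
        e.id                                    AS activity_id,
        e.received_at                           AS ts,
        e.payload ->> 'email'                   AS customer,
        'called_us'                             AS activity,
        e.payload ->> 'phone_number'            AS feature_1,
        e.payload -> 'call' ->> 'duration_sec'  AS feature_2
    FROM raw_events.support_calls AS e
    WHERE e.payload ->> 'direction' = 'inbound'

The extra effort is mostly in that extraction, and you pay it once per activity.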
Overall, I love this conversation and would like to continue. I am excited to hear some of your edge cases. Maybe we can even setup some time and talk face to face: https://calendly.com/ahmed-narrator/30min-1
The main thing this product strikes me as is "The ETL tool that understands your business". Whereas the domain language of most ETL tools is at the level of DW technologies (rows, columns, schemas, facts, dimensions, indexes, join algos, views, dags, orchestration schedules), the domain language of Narrator is at the level of the business (activities, customers, relationships, spend, etc). In a way it's sort of similar to the old convention over configuration religious war. I could see companies using Narrator for the 80% of ETL that is just plain table stakes in order to compete nowadays and offloading most of the definition and minor customization of this ETL to less technical folks. And maybe in parallel the data engineers would use plain old code to do the last 20% of ETL that is truly proprietary and specific to the business.
Not sure if my biased initial reading of your pitch was off but it seemed like you were focusing heavily on addressing the pain points of the star schema. I've found that most people fall into two camps: either they don't care at all about the Kimball star schema world and they're just loading tables however they see fit into their warehouse or they are willing to go to their grave defending the star schema and its variants. In either case, I don't think you gain much by positioning yourself as the antidote to the star schema. I think you could capture customers in both camps by focusing instead on the fact that your ETL tool has a deep understanding of how companies that rely heavily on a web presence work. I think this would also better align you with the ability to increase your customers' revenue as opposed to optimizing engineering/infrastructure concerns, which is an easier sell.
Anyway, sorry for the rant. I'm going to shoot you a short email in case you want to connect.
This was quite a challenge, and I think it's what makes the traceability and source-of-truth problems a lot simpler.
In Narrator, the data team writes small SQL queries to create single, customer-centric business concepts that we call activities. These are around 25 lines each and designed to be understood by anyone in the company (e.g. "viewed page", "called us", ...).
Now, every question you or a stakeholder has will simply be a rearrangement of these activities. If you can describe what you want, then Narrator can assemble a table that represents it.
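As a rough sketch of what "rearranging activities" means (the activity names and the single activity_stream table here are just illustrative, not our exact layout), a question like "how long after viewing a page does a customer first call us?" is two activities lined up in time:

    -- Illustrative only: first 'called_us' after each 'viewed_page', per customer,
    -- assembled from one hypothetical activity_stream table.
    SELECT
        v.customer,
        v.ts      AS viewed_page_at,
        MIN(c.ts) AS first_called_us_at
    FROM activity_stream AS v
    LEFT JOIN activity_stream AS c
        ON  c.customer = v.customer
        AND c.activity = 'called_us'
        AND c.ts > v.ts
    WHERE v.activity = 'viewed_page'
    GROUP BY v.customer, v.ts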
Source of truth - whatever is in the activity stream. Traceability - always Dataset (activities and how they relate), then activities (~25 lines of SQL). Coherent model - customers doing actions over time.
Does that make sense? Some of these things are easier to show in a demo than to describe in text.
This is the problem with EAV/nosql/schemaless/etc and ultimately the problem I think you are going to have to solve. Instead of using ETL to model how the activities relate and reifying that model as database objects, EAV just kicks the can down the road to the query/BI tool.
Sprawl - The BI tool will end up containing most of the real business logic sprawled across many reports.
Single source of truth - A lot of the reports will be very similar but they will be based off slightly different activities or slightly different filtering logic. Which report is the correct one? (See the sketch after this list.)
Traceability - I think this is more of an end-to-end "garbage-in, garbage-out" problem that all ETL/BI tools have that wouldn't be specific to your tool. It's more of an organizational/people problem.
Coherent model - In my experience, EAV isn't enough to cover the breadth of analyses mature businesses need to do and most business users won't be able to wrap their head around it. There will have to be some data person that creates a more coherent, tabular/spreadsheet-like model and in the case of this tool it looks like that model will have to exist in the BI tool. Which brings us back to sprawl/single source of truth issues.
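To make the single-source-of-truth worry concrete, here's a hedged sketch (activity names and filters are hypothetical) of two "monthly active customers" reports built off the same activity stream that will quietly disagree:

    -- Report A: any activity counts as "active".
    SELECT date_trunc('month', ts) AS month, COUNT(DISTINCT customer) AS active_customers
    FROM activity_stream
    GROUP BY 1;

    -- Report B: only 'viewed_page' counts, and internal users are excluded.
    SELECT date_trunc('month', ts) AS month, COUNT(DISTINCT customer) AS active_customers
    FROM activity_stream
    WHERE activity = 'viewed_page'
      AND customer NOT LIKE '%@ourcompany.com'
    GROUP BY 1;

Both look reasonable sitting in a BI tool, and nothing in the schema tells you which one is "the" number.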
Just some thoughts. But always glad to see more people working on stuff like this!
Edit - one last thing I wanted to mention. I think in reality you are going to find it takes more than ~25 lines of SQL to define activities. That may be the case if the source is a schema that gets spit out of something like Stitch, but many other schemas in the wild will take a lot more than 25 LOC to massage into your 11-column schema.
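For example (everything here is hypothetical), the same "paid invoice" activity sourced from two billing systems, with a CRM email lookup standing in for identity resolution, already blows past a tidy couple dozen lines:

    -- Hypothetical: two billing sources plus a CRM email lookup, unioned into one activity.
    WITH new_billing_paid AS (
        SELECT
            c.charge_id::text  AS activity_id,
            c.paid_at          AS ts,
            c.customer_email   AS customer,
            c.amount / 100.0   AS revenue_impact   -- cents to dollars in this made-up source
        FROM new_billing.charges AS c
        WHERE c.status = 'succeeded'
    ),
    legacy_paid AS (
        SELECT
            p.payment_ref            AS activity_id,
            p.settled_on::timestamp  AS ts,
            m.email                  AS customer,   -- crude identity resolution via CRM
            p.gross_amount           AS revenue_impact
        FROM legacy_billing.payments AS p
        LEFT JOIN crm.account_emails AS m
            ON m.account_id = p.account_id
        WHERE p.state = 'SETTLED'
    )
    SELECT activity_id, ts, customer, 'paid_invoice' AS activity, revenue_impact
    FROM new_billing_paid
    UNION ALL
    SELECT activity_id, ts, customer, 'paid_invoice' AS activity, revenue_impact
    FROM legacy_paid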
https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80...
IMO, it doesn't matter what kind of db technology, schema, or query tool you use. A company will always have analysis sprawl regardless of whether those analyses are represented as data lake files, SQL tables, materialized views, regular views, or (as is often the case with EAV and other such schemas) queries saved in some BI tool. There is no silver bullet and it will always take some work to maintain single source of truth, traceability, a coherent model that is understandable by average business users, etc.
This is Ba Sing Se levels of delusion for some people.
But more importantly, the challenge for any such tool is to go beyond use by 2-3 people. At 2-3 people anything will work. Where BI tools (open source and closed source) struggle is scale: having all the right features for, essentially, a group of users who actually don't know how to work with data (did I just say that aloud?). Chartio caps at 20 people. RJ capped at 50-100 (and later became Stitch for that reason). We haven't seen where Metabase caps, but I bet it is in a similar range. Very few BI products have actually surpassed 100 users at target installations. And beyond 1,000 is a real challenge that only a few, and even then with a lot of assistance, can support: Tableau, Looker, Microstrategy, maybe Birst, maybe Domo.
Also, a combination of BI with LookML is a complicated product. During my days at Looker, we were handling 50+ bugs/week and filing 1,000+ tickets. Every day we were filing over 100 new feature requests.
So with all that, the question is, is it really worth the struggle? What's the end vision for supporting this? Why should someone who implements BI for a living bet on this product?
Also, if you think this is a rip off of LookML, you should take a look at what GitLab is doing with Meltano. They completely jacked LookML.
It's the backlinks that allow it to rank. Getting them requires a lot of knowledge & work, like publishing articles on Medium or receiving links on HN.
Unsurprisingly, it looks like the creator is an SEO expert with years of experience and dozens of projects.