I've seen a lot of products similar to this, and it looks like one of the nicer ones. But the use case in the video and screenshots makes me immediately think it wouldn't be viable in any of the organizations I've written SQL for in the past 10 years.
At least 95% of the challenge in querying data doesn't come from the difficulty of writing SQL statements IMO, but from the complexity/brokenness of the data itself. Especially with the move to unstructured data streams, most data warehouses seem to have become pretty bad when it comes to extracting truth out of the information within. Many business users know some SQL but blanch at creating coherent reports from what's available to them. What BI people are paid for seems to be having knowledge about the problems particular to a domain.
If your data is relatively clean and follows a good model, this would be a great way to help someone join, group, and aggregate their data without knowing SQL. I think a lot of people would use Tableau for this if it didn't also have a steep learning curve / high expense. However as soon as you get into data where you're writing case statements, coalescing null fields, matching on different data types, decoding, partitioning over streaming data etc., it doesn't help someone without knowledge of the caveats within the data sources themselves. Show me someone who doesn't know SQL using this to produce insight out of compromised data and I will be impressed.
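To make the "compromised data" point concrete, here is a minimal sketch in Python with SQLite; the table, its inconsistencies, and the column names are invented for illustration. Normalizing codes with CASE, defaulting NULLs with COALESCE, and CASTing text to numbers is exactly the kind of query a point-and-click builder rarely produces:

```python
import sqlite3

# Hypothetical "compromised" orders table: status codes are inconsistent,
# amounts are stored as text, and region may be NULL.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (id INTEGER, status TEXT, amount TEXT, region TEXT);
INSERT INTO orders VALUES
  (1, 'complete',  '100.50', 'EU'),
  (2, 'COMPLETE',  '200',    NULL),
  (3, 'shipped',   '99.99',  'US'),
  (4, 'cancelled', '0',      'US');
""")

# CASE to normalize codes, COALESCE to default NULLs, CAST to fix types.
rows = con.execute("""
SELECT COALESCE(region, 'UNKNOWN')                      AS region,
       SUM(CASE WHEN LOWER(status) IN ('complete','shipped')
                THEN CAST(amount AS REAL) ELSE 0 END)   AS recognized_revenue
FROM orders
GROUP BY COALESCE(region, 'UNKNOWN')
ORDER BY region
""").fetchall()
print(rows)  # [('EU', 100.5), ('UNKNOWN', 200.0), ('US', 99.99)]
```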
I fully agree with your experience here. We have a super hard mission at Chartio - it's not just about the interface but also how the data is set up. The interface, being as flexible as it is and enabling full schema (instead of dataset) browsing, is a pretty big part, though, in allowing a more agile version of data modeling. We had that very much in mind, and we've written a book (soon to be published with Wiley) on proper modern data governance techniques: https://dataschool.com/data-governance/
Our next phase is to help people get to that cleaner source of truth much more quickly than traditional dimensional modeling approaches. Tools like Visual SQL and DBT (https://www.getdbt.com) are really changing the complexities here.
I'd love to see the data modeling book. I spend a lot of energy shouting the virtues of The Data Warehouse Toolkit into the void. You are right it is outdated but it isn't entirely (or even mostly) wrong.
My coworkers seem much more interested in making a bigger EMR or adding nodes to Redshift than designing a reasonable data mart because "star schemas don't scale". I'm interested to see what you come up with, it is a huge gap in the current literature.
I would love to be involved with that book. I'm a data engineer myself and built SQLBucket, a Python library I've been told is similar to getdbt.com (although I'm not familiar with DBT, the similarity has been mentioned to me on various occasions). Shameless plug: http://github.com/socialpoint-labs/sqlbucket
What you wrote in your article is very interesting. Most interesting is how, while designing your product, you came to realize that the spreadsheet surface is the most intuitive one for users. They also like the ready-made results you present quickly. You can really see your customer's problem there.
What would be really good is to assemble a library of visual queries for the customer. This is a good idea because many users have the same fundamental types of queries on their data. Once you have enough of the basic queries that users can do useful work without programming, you can then find a way to let them customize.
Do you have data on how many similar queries customers use? From that you should know how to create the basic set of important operations.
SQL is easy. Data is hard.
If you aren't investing the work up front to make your data useful, then no amount of tooling or magic beans will make writing meaningful SQL easy. If your reports (SQL) are hard to write, it's because you have bad data.
Having said that, making data exploration easier is always a worthwhile exercise. The more business people can self-serve their questions, the better. It falls to data people to make that possible, not just with tools but with the data itself.
SQL is easy the way solving differential equations is easy. If you know how, you just work through it, but that's only because you already learned all the prerequisites and then spent a year getting good at it. The point is that either way, you'll never have an entire organization working at this level with SQL or solving equations, and that's fine. The CEO doesn't need it; sales don't need it. They do need numbers, though, so a dumbed-down interface that can show some curves after clicking "net sales" and "country" lets them move forward with what matters to them, without having to learn the difference between an inner and outer join, which doesn't matter to them.
Not everyone needs to be a programmer even if they do need the value typically buried in programmer interfaces.
Questions asked of data are always an afterthought, arriving only after the systems that gather and produce the data have already been designed.
The idea that it’s ok to just give a BI tool to business people later, and not involve them at the beginning to inform the system and data model design, is maybe why we’re in the tar pit.
A really excellent process/tool would help business figure out what kind of questions they need answered, and work backwards to the data model and implementation.
What do you guys think about a system that would 'guide' the user through the process of creating a star schema data warehouse (Kimball style)?
The advantage of this approach is that you do the dirty laundry up front, during the modeling and population phases. The end result is a data structure that is fool-proof (there is only one way to join facts/dimensions) and self-documenting. In fact, pre-joined views could be created automatically (and persisted if need be), giving the business user a clean structure to interact with.
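As a sketch of that idea, here is a toy Kimball-style star in Python with SQLite: one fact table, two dimensions, and a pre-joined view so business users never have to figure out join paths themselves. All table and column names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales  (date_key INTEGER, product_key INTEGER, amount REAL);

INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO fact_sales VALUES (20240101, 1, 10.0), (20240101, 2, 5.0), (20240201, 1, 7.5);

-- There is exactly one way to join facts to dimensions, so persist it once:
CREATE VIEW sales AS
SELECT d.year, d.month, p.name, p.category, f.amount
FROM fact_sales f
JOIN dim_date d    ON d.date_key = f.date_key
JOIN dim_product p ON p.product_key = f.product_key;
""")

# Business users query the flat view; no join knowledge required.
rows = con.execute(
    "SELECT month, SUM(amount) FROM sales WHERE year = 2024 GROUP BY month ORDER BY month"
).fetchall()
print(rows)  # [(1, 15.0), (2, 7.5)]
```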
The payoff from having some engineers do the T of ETL (transforming the data to be more queryable, denormalizing values, etc.) is insane if you have problems democratizing data analysis/reporting.
The dumb version, for people who don't know what this looks like: make a separate database. Create tables based on what you often want to query (but don't try to keep the same shape as the base data!). Then figure out how to fill in those tables from the base data.
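A minimal sketch of that pattern in Python with SQLite, with an entirely made-up schema: a denormalized reporting table shaped for the question you actually ask, filled from the normalized base tables:

```python
import sqlite3

base = sqlite3.connect(":memory:")
base.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);

-- Reporting table: deliberately NOT the same shape as the base data.
-- One pre-aggregated row per customer, matching the report you run weekly.
CREATE TABLE report_customer_revenue (customer_name TEXT, order_count INTEGER, revenue REAL);
INSERT INTO report_customer_revenue
SELECT c.name, COUNT(o.id), SUM(o.total)
FROM customers c JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;
""")

rows = base.execute(
    "SELECT * FROM report_customer_revenue ORDER BY customer_name"
).fetchall()
print(rows)  # [('Acme', 2, 350.0), ('Globex', 1, 75.0)]
```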
We use SAS Enterprise Guide in my org. It has a visual SQL generator similar to what is previewed here, and it is quite good for simple queries, but I find it falls down when you introduce complexity.
Joins between tables are one of the big pain points for me. I am not sure how the demoed product does joins, but in SAS EG you join tables by clicking and dragging a line between the two columns you want to join, and it then gives you a pop-up to select Inner, Outer, Left, Right, etc. (although it uses plain English: rather than saying "left join" it says something like "All rows from Table 1 and matching rows from Table 2"). It also gives you a visual Venn-diagram-style preview. This is frustrating for a few reasons. If a table is large (several hundred columns), you have to scroll for an eternity to find the columns you want to join on; a drop-down box that accepts keyboard shortcuts to jump to a column name would be much better UI, IMO. It is also difficult to do complex joins, "WHERE A.Col1 BETWEEN B.Col1 AND B.Col2" for example. Maybe it's just my org, but this is a common feature of our data: we have a lot of event-driven records with start and end dates, plus raw time-series data you need to aggregate between the event frames. SAS's time-series tooling is very good, but the SQL side not so much...
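For readers unfamiliar with that join shape, here is a small sketch in Python with SQLite, using invented event and reading tables. The non-equi join below is exactly the kind most drag-a-line-between-columns UIs cannot express:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events   (event_id INTEGER, start_t INTEGER, end_t INTEGER);
CREATE TABLE readings (t INTEGER, value REAL);
INSERT INTO events VALUES (1, 0, 10), (2, 11, 20);
INSERT INTO readings VALUES (1, 5.0), (2, 7.0), (12, 3.0), (15, 9.0), (30, 99.0);
""")

# Aggregate raw time-series readings into each event's window using a
# range (non-equi) join rather than a simple column-to-column equality.
rows = con.execute("""
SELECT e.event_id, AVG(r.value)
FROM events e
JOIN readings r ON r.t BETWEEN e.start_t AND e.end_t
GROUP BY e.event_id
ORDER BY e.event_id
""").fetchall()
print(rows)  # [(1, 6.0), (2, 6.0)] -- the reading at t=30 falls in no window
```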
It is also very easy for unsuspecting users to join on unindexed columns, which leads to very poor database performance. In general there are a lot of performance footguns with generated SQL: a user can have a working query, change something in the GUI, and suddenly a query that ran in under a second takes several minutes. For complex queries I have developed the habit of using the GUI to generate the SQL and then hand-editing it before running it, to make sure it will be performant. Or I just hand-write it to begin with.
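One way to catch that footgun before running a generated query is to inspect its plan. A hypothetical sketch in Python with SQLite (table and index names are made up; the exact plan text varies by SQLite version):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (indexed_col INTEGER, plain_col INTEGER);
CREATE INDEX idx ON t(indexed_col);
""")

def plan(sql):
    # Each row is (id, parent, notused, detail) describing a plan step.
    return con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()

fast = plan("SELECT * FROM t WHERE indexed_col = 1")
slow = plan("SELECT * FROM t WHERE plain_col = 1")
print(fast[0][3])  # e.g. "SEARCH t USING INDEX idx (indexed_col=?)"
print(slow[0][3])  # e.g. "SCAN t" -- a full table scan
```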
Pattern matching and filtering is another pain point. The query editor lets you use "LIKE" but does a poor job of explaining how it works. People who only use the GUI are surprised when I explain that you use "%" and "_" as wildcards; a lot of people blindly assume "*" is used and then complain that their query isn't working.
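A quick sketch of that gotcha in Python with SQLite, on invented data: "%" matches any run of characters, "_" matches exactly one, and "*" is just a literal star:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (name TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [("smith",), ("smyth",), ("smithers",)])

def like(pattern):
    rows = con.execute("SELECT name FROM t WHERE name LIKE ? ORDER BY name", (pattern,))
    return [r[0] for r in rows]

print(like("sm_th"))   # ['smith', 'smyth']    -- '_' matches one character
print(like("smith%"))  # ['smith', 'smithers'] -- '%' matches any suffix
print(like("smith*"))  # []                    -- '*' has no wildcard meaning
```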
Microsoft Power BI also has a query editor. I've used it a little but don't have as much experience with it. From what I saw, it is a bit less flexible; it seemed easier to do the data extraction and transformation first and then load the results into Power BI.
Absolutely. I've been employed in healthcare IT for the last 5 years, doing data extracts for researchers and reporting. Many people don't quite appreciate the underlying complexity of extracting data from not only large EMR systems such as Cerner (with over 6000 schema-less tables), but integrating that with the countless other systems the hospital employs for radiology, pathology, ICD coding, etc. Much of the data isn't entered as you would expect, documentation is often lacking, and there are very few people who can tell you where exactly the data sits and how it is actually entered by medical staff doing various work all across the campuses. A simple research request becomes immensely complex when multiple systems are involved and you're dealing with something as complex and evolving as health data.
So I agree that while this tool may be handy for some, the real challenge with this sort of work is knowing where the (often unstructured) data sits, how it's entered and by whom, and how to extract meaning from it.
I see other use cases for this product than the one you mentioned; I don't see it as an end-user product for analysis. IMO a product like this can be used for iterating over backend design with other developers when discussing a new feature or model change. Imagine walking through a complex data model during a feature discussion and needing to quickly visualize things. I write SQL by hand pretty quickly, but I would love to use a product like this in a team design discussion: instead of hand-writing all the joins and filters as you walk 5-6 tables, you can quickly and easily see what data would be provided to your DAO layer. You can spend far more time focused on the discussion and less time waiting for data-access changes to be visualized. In the end the developer can take the auto-generated SQL and use it as a base for the DAO layer, then do the optimizations and cleanup to the query that you mentioned in your post.
So the agile world of data we're looking to bring about is one where your DAOs or marts are still regular schemas, not just flat tables or cubes. Developer teams would create these mart schemas, and business users could then do much of their own self-serve exploration and visualization with Visual SQL. I wrote about some of these modern architecture approaches here: https://chartio.com/blog/cloud-data-management-book-launch/
First, I want to applaud the effort here. This looks like a great tool for BI/dashboarding, and I've signed up to demo it.
I'm not here to poo-poo your product at all, but I want to explain why I think this can't work in all cases, and that is totally fine!
The application that is my bread and butter is about 25 years old and started as a mid-90s PowerBuilder application. It has since migrated to a 3-tier .NET enterprise app, and the data model has been dragged along with it largely unchanged. The data model is full of cruft and unnatural keys that are the result of a ton of technical debt. The newer tables in the database are saner and follow most SQL best practices, but a large portion was written (and I know the original data architect, who SWEARS this is true) in no-vowel notation, unless the table starts with a vowel, or maybe with a random "e" thrown in too. A few examples:
Sane column name -> actual column name
AccountDetailAccountCode -> AcctDtlAcctCdeID
AccountingPeriodID -> AccntngPrdID
PlannedMovementBatchID -> PlnndMvtBtchID
Provision -> prvsn
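One common mitigation for cryptic legacy names like the ones above is a view that aliases them to readable ones. A sketch in Python with SQLite; the table and columns are invented to mimic the examples:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE AcctDtl (AcctDtlAcctCdeID TEXT, AccntngPrdID INTEGER, prvsn REAL);
INSERT INTO AcctDtl VALUES ('AC-1', 202401, 12.5);

-- A view translating the no-vowel names into the "sane" names,
-- so ad-hoc queries never touch the cryptic originals.
CREATE VIEW AccountDetail AS
SELECT AcctDtlAcctCdeID AS AccountDetailAccountCode,
       AccntngPrdID     AS AccountingPeriodID,
       prvsn            AS Provision
FROM AcctDtl;
""")

row = con.execute(
    "SELECT AccountDetailAccountCode, Provision FROM AccountDetail"
).fetchone()
print(row)  # ('AC-1', 12.5)
```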
There are about 1000 tables in this database, so part of the problem with SQL is that you have to know the table structure before you can know the database. Another is that a lot of this data is constructed in stored procedures, functions, or views.
Most people need at least 12-18 months to grasp the application and the data model in a real way. I still have to create database diagrams to find a path to the data I want, and I've been at it for 10+ years. What I would love to see is a way to show table relationships that is filterable in some way, e.g. not just by the existence of a key relationship. It would be much more useful to know that table A is related to table B and that both pass some threshold of row counts. I don't care that a key joins table A and table B if table B has 0 rows. Building that kind of intelligent metadata about the database would be hugely helpful.
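A rough sketch of that "intelligent metadata" idea in Python with SQLite, on an invented three-table schema: list candidate join paths from the catalog, but prune relationships where either side is empty (here foreign keys stand in for the key relationships described above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY);
CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
CREATE TABLE c (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
INSERT INTO a VALUES (1);
INSERT INTO b VALUES (1, 1);
-- table c stays empty
""")

tables = [r[0] for r in con.execute("SELECT name FROM sqlite_master WHERE type='table'")]
counts = {t: con.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables}

# Keep only relationships where both sides actually contain rows.
useful = []
for t in tables:
    for fk in con.execute(f"PRAGMA foreign_key_list({t})"):
        referenced = fk[2]  # third field of each PRAGMA row is the parent table
        if counts[t] > 0 and counts[referenced] > 0:
            useful.append((t, referenced))
print(useful)  # [('b', 'a')] -- the c -> a key is pruned because c has 0 rows
```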
> At least 95% of the challenge in querying data doesn't come from the difficulty of writing SQL statements IMO, but from the complexity/brokenness of the data itself.
More and more of our consulting gigs are this - collating and cleaning crappy data, and then putting BI reporting on top of it.
I agree with this as well. Products like this seem to be designed primarily for well-structured and lovingly maintained databases. If you are lucky enough to have one of those, congrats!
Dave, founder of Chartio here. We're so excited to launch what we call Visual SQL today. It's been a lot of work based on customer feedback and extensive prototyping and user testing. If you'd rather skip the story attached here, you can also check out our product walkthrough video or give it a spin yourself here: https://chartio.com/product/visual-sql/
I sit in a monthly "metrics review" meeting where we mostly speculate about why the dashboards are broken, and what the data team will have to do to fix it.
If our TPM was able to self-service this through an intuitive interface it would be a massive productivity win.
Same here! Seeing how useful Tableau (a far less powerful tool) has been for our company, it's clear that there is a big market for empowering everyone to dig for data-driven insights.
I love the product, but I am sure most people don't want to allow direct connections to their databases. I haven't used the software, so it might put more load on my database than the server can handle. It would be best if I could import a SQL dump and then use the software on a copy of my data. The alternative right now is to run a separate server just to try out the software.
It's not clear to me, but this looks like a SaaS offering and not a client I can run in my own environment? If that's the case, I'm sure there's a market, but definitely not something I could use.
Looks nice, but like others here we need something that can be self hosted or even better – run offline.
Somewhat related: does anyone know of any good components/libraries that can be embedded in a web page to let users interactively build SQL queries of reasonable complexity? By that I mean something that not only lets you pick columns and filter and maybe group things, but also do joins on other tables as well.
Edit: just to clarify, it doesn't have to be visual only. In fact, something that lets you edit a query interactively either by visual components or actual SQL code is perfect, especially if it's bidirectional so that edits in the code also show in the UI and vice versa.
https://www.metabase.com/docs/latest/users-guide/custom-ques...
Thanks for this comment. I am working on something very close to this vision at https://boomadmin.com
The idea is to have an open-source product (with a paid license for commercial use) that people can self-host, run fully offline, or have hosted in the cloud with a single click.
I quit my job last month to work full-time on an MVP. If you're interested, please add your email for the early beta. I would love your feedback.
Just want to clarify – it doesn't have to be no code. In fact, a bidirectional workflow where you can edit the query either by writing SQL or clicking buttons is ideal.
For offline use I would recommend SQLeo [0]. It has been around for a long time, but I have yet to find anything more powerful for exploring a new database or visually constructing queries. You can also visualise queries not created with SQLeo. Don't be put off by the dated look and feel; you can change it with some startup parameters.
[0] https://sourceforge.net/projects/sqleo/
It would reallllly be nice if you set up a web demo. I looked all over trying to find the link that would let me just TRY this marvelous thing you're telling me about, but it's all behind sign-ups and accounts and I just want to see the product. Not in a video-- I mean play with it.
Can you make a demo session with dummy data for people to play with?
Ah yeah, you do have to sign up to try it out (though it does start you off with demo data). We wanted to offer a non-login version, but it makes things more complicated for a company that obviously needs to prioritize security highly. This video is maybe the closest you can get without trying it yourself - https://www.youtube.com/watch?v=YBXMTipHGfQ
As someone who used to work on ETL systems, from a distance, this looks basically like Tableau to me. (My first thought was actually "oh, neat, someone made an open-source Tableau clone!", but it's not open-source, is it?) You should describe how it's different.
If I'm going to pay money for a hosted application for serious work, I'm going to tend to prefer an established service that I'm confident will be around for a while. I'm sure you've got something good here but I'm not sure what it is.
We do compete very directly with Tableau, and we're often used alongside it as well. In the article I explain a lot of what's different about it from other BI products like Tableau. The main things: it's far more flexible as a visual language, and through a lot of prototypes and user tests it has shown that 80% of business users can write significant queries on their first try.
As for an established service - Chartio's been around for almost 10 years, we're profitable, and are the main data interface behind some really great companies and brands https://chartio.com/customers/
Over the years, we’ve found that even power users who know they’ll eventually go into SQL mode prefer to start in the visual mode, as the grouping, date formatting, and joins are all done automatically.
This is huge, right here. I can use and understand SQL well enough, but the syntax annoys me constantly. Avoiding that would be a huge boon.
Yeah - changing date formats, double typing things you're grouping and ordering by, remembering the oddities of each dialect, typing out full join paths - it's a nice experience to have that done automatically.
Right now the SQL we write is very proper - with quotes around all of the column names and table names listed before each column name. It's not what a human would write. I'd love to one day make it a little more human so that the switching into SQL mode will feel even better. It'll be a fun project.
What do you think of Looker's idea of turning SQL into a higher-level analytic DSL (which can be written/implemented by data engineers)? This DSL then makes it easy for end users to do more complicated analytics.
One issue I've had with pure SQL for analytics is that a user can't build a library of higher-order functions that can be parameterized (imagine a window function computing cumulative returns across various time periods, applied to different underlying source columns). So it's difficult to build up a common business-domain language without taking the typical approach of constructing SQL by gluing string fragments together, as most programming-language SQL libraries do.
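A minimal sketch of that string-gluing approach in Python with SQLite; the helper, schema, and names are all invented for illustration. A small function parameterizes a window function over different source columns, which is the "higher-order" definition SQL itself can't express:

```python
import sqlite3

def running_sum_sql(table: str, order_col: str, value_col: str) -> str:
    # Identifiers can't be bound as SQL parameters, hence the string assembly.
    return (
        f'SELECT "{order_col}", '
        f'SUM("{value_col}") OVER (ORDER BY "{order_col}") AS running_total '
        f'FROM "{table}"'
    )

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE daily (day INTEGER, revenue REAL, cost REAL);
INSERT INTO daily VALUES (1, 10.0, 4.0), (2, 20.0, 5.0), (3, 5.0, 1.0);
""")

# The same parameterized definition applied to two different columns:
rev = con.execute(running_sum_sql("daily", "day", "revenue")).fetchall()
cost = con.execute(running_sum_sql("daily", "day", "cost")).fetchall()
print(rev)   # [(1, 10.0), (2, 30.0), (3, 35.0)]
print(cost)  # [(1, 4.0), (2, 9.0), (3, 10.0)]
```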
All of my other SQL tools run right on my machine. I hope there will at least be a self-hosted version of this (or something just like it) available.
It would be amazing to have something like this embedded in Azure Data Studio.
Ah, unfortunately there isn't. We're 100% cloud hosted - though we call ourselves a Hybrid cloud solution because through our reverse SSH tunnel connections (where your servers SSH into ours) we also work well with on-prem data.
If security is your main reason for wanting self-hosting, you may be interested to know we're also SOC2, HIPAA, and GDPR compliant: https://chartio.com/product/security/
That's great you're compliant with those! For HIPAA, do you enter into a BAA with prospective customers who deal in HIPAA data (PHI)?
EDIT>> The implication being that you may not be HIPAA compliant if you don't enter into a BAA. IANAL, so I defer to the experts, but I figured I'd ask the original question anyway. If you are above board on all of this, I will definitely take this tool to my boss. For reference: https://aws.amazon.com/health/healthcare-compliance/?nc=sn&l...
Can we self-host, please? Is there a standalone app? We deal in PHI (HIPAA) and we cannot have a tool like this have any access to our data store unless it's private and secure.
We already have a live product using servers from a host that specializes in HIPAA. I'm not asking about how to host HIPAA-related data. I'm asking specifically about being able to use this tool. I'm not going to hand over credentials to our database to an untrusted party (meaning, not our company). So the hope is that we can either self-host this tool, or use it as a desktop application.
Redash can be self-hosted
(and our SeekTable also has on-premise version)