To give some context, Amphi is a low-code ETL tool for both structured and unstructured data. The key use cases include file integration, data preparation, data migration, and creating data pipelines for AI tasks like data extraction and RAG. What sets it apart from traditional ETL tools is that it generates Python code that you own and can deploy anywhere. Amphi is available as a standalone web app or as a JupyterLab extension.
To be fair, the only place the words ‘open’ and ‘source’ appear in the readme is once in a sub-heading, where it’s phrased ‘open-source’. It’s clearly labelled ELv2.
Possibly more of a subtle miscommunication or misunderstanding than a deliberate lie.
I liked the idea of leveraging JupyterLab as the server. Data engineers/scientists already use Jupyter, so this is a neat idea.
A custom extension for JupyterLab is a great way to leverage the existing JupyterLab install base: not everyone will be willing to jump through hoops to install software X, but installing an extension is one pip install away, and there is no need to run a separate process, since you are running inside the JupyterLab server.
This reminds me of Alteryx (another drag-and-drop ETL tool).
Thanks! Being based on JupyterLab also allows Amphi to benefit from the vast ecosystem of extensions already available, such as the Git extension or using different file systems (S3).
Some users pointed out they were Alteryx users but liked the Python code generation from Amphi :)
With all the data issues around quality and normalisation, I often get the impression that enabling more people with non-CS backgrounds to do this work is not necessarily a good thing.
In other words, if writing python and sql is the skill requirement that stops you from making an etl pipeline, maybe do something else.
While I’m not usually on board with gatekeeping, this field already struggles with a huge amount of very non-technical folks and their respective managers, producing overall mediocre results and giving the profession a bad rep, to the point where I now avoid the Data Engineer title and just call it “SWE specialized in large data processing” or something equally as fluffy.
For me it’s more accurate, too. At $work, there’s no difference to how an SWE vs a “DE” works. Same interview process too, DSA, distributed systems etc.
However, having done this for more than a decade, that is relatively rare. It’s usually a mix of GUI tools with zero reproducibility / infra-as-code, untyped python, copy-pasted shell scripts, zero tests, zero ci/cd, no linting/static analysis/code reviews etc., paired with generally zero understanding of the underlying tech. It’s all very formulaic with little to no actual understanding.
I will spare you my usual rant on why a language without a solid type system like python is a horrible idea for this field, too.
Which is why I much appreciate dbt. While some people scoff at the idea of “SQL with jinja templating”, their approach has certainly helped to move DE closer to SWE work, purely by virtue of their value prop mostly being exactly that. And it works out great.
So if Bob from accounts needs a new report generated, he has to wait 6 months for an IT guy to do it? Who probably won't do a very good job, because he doesn't understand what Bob needs as well as Bob does? Bob is going to hack something horrific together in Excel instead. Better surely to let Bob have a GUI point-and-click tool more appropriate to the job?
On the other hand, Bob keeps asking for a self-serve reporting tool but in my experience, he doesn't actually want to use it. We went the route of putting all the data into a lake and hooking up GUI reporting tools to it and what did we get? Bob doesn't understand this or that column, Bob made a report that is fundamentally flawed, Bob sent a request for the engineers to make him a report using the tool and so on. Bob wanted something, or someone else, to do the work for him. When it became apparent that the tool isn't magic and can neither read minds nor divine the true meaning of data in the DB, it became the engineers' problem again. So why not let engineers use the tools they prefer?
Bob doesn't write a specification, because Bob doesn't really know what he wants either. He will have to explore and try things out until he reaches something that he can work with. Nobody is willing to spend time planning and documenting stuff. Everyone feels one YouTube video away from being an expert in software development.
This is elitist and frankly, unhelpful. The answer to a skills shortage is not a practitioner lockdown, but policy, training, guidance and mentoring. If you're stuck in start up land and you have this issue, you have hired the wrong skills. If you're encountering this in enterprise land, your organisation, and potentially you depending on your position of influence, should be angling to improve compliance and literacy not through obstruction but through policy and upskilling. Failing to do so will kill your ability to innovate.
FWIW, while I disagree with the parent comment, I don't see you arguing against it.
They actually implied that you should try upskilling first — but if that fails, you shouldn't be doing ETL yourself.
I mostly disagree with the parent comment because there are so many things one can easily do up to a level, and then when the going gets tough, you need to call in an expert. E.g. most people can operate a screwdriver or impact driver to fix things, but to fix some problems, you really need a trained technician (or well, an experienced DIY person, but that's not everybody).
The fact that you are not strong enough to screw in an M14 bolt does not mean you should be forbidden from using an impact driver: tools are there to help you. The logic of the parent comment was seemingly that if you are not strong enough to tighten an M14 bolt, you probably don't know what you are doing regardless of the type of the bolt you are tightening, so you should simply not do it.
The point I agree with in the parent comment is that not everybody can achieve a similar level of proficiency: while upskilling and improving/simplifying tools can get you most of the way there, there's always going to be that extra bit that requires a sudden, sharp jump in knowledge, smartness or experience to be able to deal with it.
>This is elitist and frankly, unhelpful. The answer to a skills shortage is not a practitioner lockdown, but policy, training, guidance and mentoring.
I think the point is that these tools have their own learning curve, and non-tech business people are not doing it well, either; how different is it from learning SQL? Which one is more broadly valuable and transferable as a skill?
If this is the career you want (data or data-adjacent), why not just learn SQL? There are far more learning resources and the value of the knowledge will assuredly outlast any low-code tool.
That is source available, not open source. The term "open-source" is widely used to describe software that is licensed using a specific set of software licenses that grant certain freedoms to users. You can read more here[0]
It's open source if you're using language like a normal human being. If you're a bit of a pedant and wish everyone to adhere to definitions imposed from on high regardless of real-world usage, it's absolutely not open source.
Hey, Amphi's developer here. Those two tools are great, big fan of dlt myself :)
However, Amphi is a low-code solution while those two are code-based. Also, those two focus on the ingestion part (EL) while Amphi is focusing on different ETL use-cases (file integration, data preparation, AI pipelines).
I was not familiar with the acronym ETL and it is not explained anywhere in the website! My feedback would be to at least write it once, on the first instance so others like me will know what they are reading :)
It is a common term and practice among enterprise software users, i.e. generally medium or large companies that use packaged plus custom software for their business needs.
ETL is not common among startups, because they have a different focus, infrastructure and scale.
In computing, extract, transform, load (ETL) is a three-phase process where data is extracted from an input source, transformed (including cleaning), and loaded into an output data container. The data can be collated from one or more sources and it can also be output to one or more destinations.
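For anyone who prefers code to definitions, here is a minimal sketch of those three phases using only the Python standard library. Everything in it (the `RAW_CSV` sample data, the `payments` table, the function names) is made up for illustration; it is not what Amphi generates, just one plausible shape for a tiny ETL job.

```python
import csv
import io
import sqlite3

# Hypothetical input: a small CSV with messy whitespace and a missing value.
RAW_CSV = """name,amount
alice, 10.5
bob,
carol,7
"""


def extract(text):
    """Extract: read rows from a CSV source (an in-memory string here)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean names, drop rows with a missing amount, cast types."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip rows that have no amount
        cleaned.append((row["name"].strip().title(), float(amount)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(conn):
    """Run extract -> transform -> load, then read the table back."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute(
        "SELECT name, amount FROM payments ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline(sqlite3.connect(":memory:")))
```

In a real pipeline the source would be a file, API or database and the destination a warehouse, but the extract/transform/load split stays the same.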
Thanks for pointing that out, it's actually mentioned (Extract, transform and load ...) in the very first sentence below the tagline, but if you didn't get it then it's not clear.
Great, thanks for sharing. I was familiar with Dask and cudf separately but not this one. I was planning to implement dask support through Modin but I'll definitely take a look at dask_cudf.
It's published on GitHub under license ELv2 - Elastic License v2. This does not meet the open source definition, so indeed it's not Open Source. ELv2 is an open source sibling though, closer than many other openish licenses:
https://www.elastic.co/pricing/faq/licensing
Still, Amphi should not claim to be 'Open Source'.
Visit the GitHub: https://github.com/amphi-ai/amphi-etl. Give it a try and let me know what you think.
what makes it not OSS?
[0] https://opensource.org/licenses
https://dlthub.com/
https://hub.meltano.com/
we[0] use meltano in production and I'm happy with it. I've played around with dlt and it's great, just not a ton of sources yet.
0 - https://www.definite.app/
Good luck! Looks cool.
It seems Definite's use case is focused on connecting to lots of data sources. For much smaller scale, how does Amphi compare?
It looks like Amphi could handle some low code transformations (the "T" in ETL), but calling it ETL feels like a stretch.
So to rephrase a bit, if you're looking for open source, Python-based Fivetran alternatives, dlt and meltano would be my picks.
0 - https://mattturck.com/landscape/mad2024.pdf
Wikipedia:
https://en.m.wikipedia.org/wiki/Extract,_transform,_load
I'll just add:
It is a common term and practice among enterprise software users, i.e. generally medium or large companies that use packaged plus custom software for their business needs.
ETL is not common among startups, because they have a different focus, infrastructure and scale.
https://en.wikipedia.org/wiki/Extract,_transform,_load
Extract, transform and load...
vs
ETL: Extract, transform and load data...
Extract, transform and load (ETL) data...
Extract, Transform and Load data...