But the key issue is going to be privacy. I'm not well versed in LLMs, so I'm sorry if this is obvious, but can I use something like this without sending my data outside my own organisation?
It's one thing to do a Show HN / share; it's another thing to spam it with your ads.
The earlier post was a report summarizing LLM labeling benchmarking results. This post shares the open source library.
Neither is intended to be an ad. Our hope in sharing these is to demonstrate how LLMs can be used for data labeling, and to get feedback from the community.
For us, human labeling is surprisingly cheap; the main advantage of GPT-4 would be that it's much faster. Since scams are always changing, we could generate new labels regularly and continuously retrain our model.
In the end we didn't go down that route; there were several problems:
- GPT-4's accuracy wasn't as good as human labelers'. I believe this is because scam messages are intentionally deceptive and require a much more general understanding of the world than the datasets used in this article, which feature simpler labeling problems. I also don't trust that there was no funny business in generating the results for this blog, since there is a clear conflict of interest with the business that owns it.
- GPT-4 would be consistently fooled by certain types of scams, whereas human annotators work off a consensus procedure. This could probably be solved in the future once there's a larger pool of high-quality LLMs available that we can pool for consensus (a rough sketch of what that could look like follows this list).
- Concern that some PII would accidentally get sent to OpenAI; of course, nobody trusts that those guys will treat our customers' data with any appropriate level of ethics.
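For what it's worth, the pooling idea is simple to sketch: query several models independently and only accept a label when enough of them agree, escalating to a human otherwise. A minimal illustration; the labeler callables are hypothetical stand-ins for whatever model clients you'd actually use:

```python
from collections import Counter
from typing import Callable, List, Optional

def consensus_label(
    message: str,
    labelers: List[Callable[[str], str]],
    min_agreement: float = 0.67,
) -> Optional[str]:
    """Query several independent labelers and return the majority label,
    or None when agreement is too low (i.e., escalate to human review)."""
    votes = [labeler(message) for labeler in labelers]
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None

# Usage (hypothetical labelers wrapping different LLMs or prompt variants):
# result = consensus_label(msg, [gpt4_labeler, claude_labeler, llama_labeler])
```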
All the datasets and labeling configs used for these experiments are available in our GitHub repo (https://github.com/refuel-ai/autolabel), as mentioned in the report. Hope these are useful!
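For anyone who hasn't opened the repo: a labeling config is just a small JSON/dict describing the task, model, and prompt. A trimmed-down sketch of the rough shape (field names simplified here; check the repo for the exact, current schema):

```python
# Hypothetical, simplified config for a scam-detection classification task.
config = {
    "task_name": "ScamDetection",
    "task_type": "classification",
    "model": {"provider": "openai", "name": "gpt-4"},
    "prompt": {
        "task_guidelines": "Decide whether the message below is a scam.",
        "labels": ["scam", "not_scam"],
        "example_template": "Message: {example}\nLabel: {label}",
    },
}
```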
Fixed it for you.
Is there some noise in these labels? Sure! But relative performance measured against them is still a valid evaluation.
Autolabel is quite orthogonal to this - it's a library that makes it easy to interact with LLMs to label text datasets for NLP tasks.
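Roughly, the flow is: point an agent at a config and a dataset, dry-run it to preview prompts and estimated cost, then label. A simplified sketch; exact method signatures have evolved across versions, so please check the README for the current API:

```python
from autolabel import LabelingAgent

# config as in the sketch above (a dict, or a path to a JSON file)
agent = LabelingAgent(config="config.json")

agent.plan("dataset.csv")           # dry run: preview prompts and cost estimate
labeled = agent.run("dataset.csv")  # call the LLM and produce labels
```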
We are actively looking at integrating function calling into Autolabel, though, to improve label quality and support downstream processing.
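To make the idea concrete: function calling lets the model return the label as structured JSON instead of free text, which removes a whole class of output-parsing errors. A standalone sketch using the openai Python client, not Autolabel's actual integration (which is still being designed); the submit_label function and the label set are hypothetical:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Describe the label as a required, enum-constrained function argument so the
# model must answer with structured JSON rather than free-form text.
tools = [{
    "type": "function",
    "function": {
        "name": "submit_label",  # hypothetical function name
        "description": "Submit the label for the given message.",
        "parameters": {
            "type": "object",
            "properties": {
                "label": {"type": "string", "enum": ["scam", "not_scam"]},
            },
            "required": ["label"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Label this message: 'You won a prize! Click here.'"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "submit_label"}},
)

# The arguments come back as a JSON string, e.g. '{"label": "scam"}'.
print(response.choices[0].message.tool_calls[0].function.arguments)
```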