These agents aren't super smart: just a few PDFs for context plus a few-sentence system prompt.
I do get what I want in 80% of use cases (not measured, just a feeling).
> Previous Claude models often made unnecessary refusals that suggested a lack of contextual understanding. We’ve made meaningful progress in this area: Opus, Sonnet, and Haiku are significantly less likely to refuse to answer prompts that border on the system’s guardrails than previous generations of models. As shown below, the Claude 3 models show a more nuanced understanding of requests, recognize real harm, and refuse to answer harmless prompts much less often.
I get it - you, as a company, with a mission and customers, don't want to be selling a product that can teach any random person who comes along how to make meth/bombs/etc. And at the end of the day it is that - a product you're making, and you can do with it what you wish.
But at the same time - I feel offended when a model running on MY computer refuses to do something I asked of it. I have to reason with it and "trick" it into doing my bidding. It's my goddamn computer - it should do what it's told. To object, to defy its owner's bidding, seems like an affront to the relationship between humans and their tools.
If I want to use a hammer on a screw, that's my call - if it works or not is not the hammer's "choice".
Why are we so dead set on creating AI tools that refuse the commands of their owners in the name of "safety" as defined by some 3rd party? Why don't I get full control over what I consider safe or not depending on my use case?
Please take this in a nice way: I can't see why I would use this over ChatbotUI+Ollama https://github.com/mckaywrigley/chatbot-ui
Seems like the only advantage is having it as a native macOS app, and the only real distinction is maybe fast import and search - I've yet to try that though.
ChatbotUI (and other similar stuff) are cross-platform, customizable, private, debuggable. I'm easily able to see what it's trying to do.
Selfishly, would you mind sharing literature or blog posts that led you to this level of understanding of LLMs? I'm trying hard to understand the inner workings via experiments but definitely far behind your expertise.
Thanks
While the problem is not trivial, my initial reaction was "it can't be 3, it's too obvious". And it seems that a lot of people in the comments here do get to the right answer on their own.
So, why not hundreds of reports?
As someone said here, I'm extremely grateful for the free, quality higher education I received in my home country. And I'm saddened that many are in student debt for decades.
I'm grateful to be in the best health I've had in the last 10 years.
I'm grateful to myself for quitting alcohol more than a year ago and sticking with it.
I'm grateful, for the first time, to feel financially stable for many years to come.
I'm grateful to the United States of America for being our home and safe harbor for the past 10 years, especially the last 3, given the global instability.
How I'm using narrative.bi:
- A high-level report of WoW and MoM changes in our website traffic that lands in my email.
- An alternative to the GA4 UI, which drives me nuts.
What I don't have yet and want to see implemented (I'll definitely pay for this):
- Dimensional analysis: which dimensions contribute most to the topline change. E.g., say unique users increased by 30% WoW - which top 5 dimensions contributed to that change? Country? Device? Etc.
- Global event correlation: are there any global events that influenced the metric change? Maybe a holiday in the country?
But generally rolling your own has other benefits.