On the FAQ page there are links to images of the end result / physical
Instead of: GitHub Link => bizcardz => FAQ => "Show me the end result"
I'm sorry, but this is another example of not checking the AI's work. Whatever about the excessive recording, that's one thing, but blindly trusting the AI's output and then using it as a company document for a client is on you.
Ex: Hop on a conference call with a group of people. Person A "leaves early" but doesn't hang up the phone, and the remaining group talks about sensitive info they didn't want Person A to hear.
For example, the article above was insightful. But the author pointing to thousands of disparate workflows that could be solved with the right context, without actually providing one concrete example of how he accomplishes this, makes the post weaker.
So every message that gets generated by the first LLM is then passed to a second series of LLM requests along with a distilled version of the legislation, e.g. "Does this message imply likelihood of credit approval? (True/False)". Then we can score the original LLM response against that rubric.
All of the compliance checks are very standardized and require very little reasoning, since they can mostly be distilled into a series of ~20 booleans.
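To make that concrete, here's a minimal sketch of what a boolean-rubric second pass can look like. The ask_llm helper and the specific questions are placeholders made up for illustration, not our actual client or rubric:

    # Sketch of the second-pass boolean rubric. ask_llm is a stand-in for
    # your LLM client; the questions below are illustrative, not the real set.
    from typing import Callable

    COMPLIANCE_QUESTIONS = [
        "Does this message imply likelihood of credit approval?",
        "Does this message quote a specific rate or term?",
        "Does this message request protected personal information?",
        # ...roughly 20 such booleans in practice
    ]

    def compliance_checks(message: str, distilled_legislation: str,
                          ask_llm: Callable[[str], str]) -> dict[str, bool]:
        """Run each boolean check as its own LLM request and collect the answers."""
        results = {}
        for question in COMPLIANCE_QUESTIONS:
            prompt = (f"Legislation summary:\n{distilled_legislation}\n\n"
                      f"Message:\n{message}\n\n"
                      f"{question} Answer strictly True or False.")
            results[question] = ask_llm(prompt).strip().lower().startswith("true")
        return results

    def passes(results: dict[str, bool]) -> bool:
        # Compliant only if no check was tripped; real scoring depends on
        # which booleans count as violations in your rubric.
        return not any(results.values())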
Let us assume that the author's premise is correct, and LLMs are plenty powerful given the right context. Can an LLM recognize the context deficit and frame the right questions to ask?
They cannot: LLMs have no ability to understand when to stop and ask for directions. They routinely produce contradictions, fail simple tasks like counting the letters in a word, and so on. They cannot even reliably follow my "ok, modify this text in canvas" vs. "leave canvas alone, provide suggestions in chat, apply an edit once approved" instructions.
In our application we use a multi-step check_knowledge_base workflow before and after each LLM request. Essentially, we make a separate LLM request to check the query against the existing context to see if more info is needed, and run a second check after generation to see whether the output text exceeded its knowledge base.
And the results are really good. Coding agents in your example are definitely stepwise more complex, but the same guardrails can apply.
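For anyone curious, a rough sketch of that kind of pre/post guardrail, with ask_llm again standing in for whatever client you use and the prompts purely illustrative (the real workflow has more steps than this):

    # Sketch of a pre/post check_knowledge_base guardrail.
    from typing import Callable

    def needs_more_context(query: str, context: str,
                           ask_llm: Callable[[str], str]) -> bool:
        # Separate LLM request: can the query be answered from the existing context?
        prompt = (f"Context:\n{context}\n\nQuery:\n{query}\n\n"
                  "Can this query be answered from the context alone? Answer Yes or No.")
        return ask_llm(prompt).strip().lower().startswith("no")

    def exceeded_knowledge_base(answer: str, context: str,
                                ask_llm: Callable[[str], str]) -> bool:
        # Second check after generation: did the output go beyond the context?
        prompt = (f"Context:\n{context}\n\nAnswer:\n{answer}\n\n"
                  "Does the answer make claims not supported by the context? Answer Yes or No.")
        return ask_llm(prompt).strip().lower().startswith("yes")

    def guarded_answer(query: str, context: str,
                       ask_llm: Callable[[str], str]) -> str:
        if needs_more_context(query, context, ask_llm):
            return "NEED_MORE_CONTEXT"  # stop and ask for more info instead of guessing
        answer = ask_llm(f"Context:\n{context}\n\nAnswer the query:\n{query}")
        if exceeded_knowledge_base(answer, context, ask_llm):
            return "REJECTED_EXCEEDED_KNOWLEDGE_BASE"  # regenerate or escalate
        return answer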
We're currently researching surgery on the cache or attention maps to make larger batches of images work better for LLMs. Sliding-window attention or Infinite Retrieval seem like promising directions.
Also - and this is speculation - I think that the jump in multimodal capabilities that we're seeing from models is only going to increase, meaning long-context for images is probably not going to be a huge blocker as models improve.
Ex: Reading contracts or legal documents. Usually a 50-page document that you can't effectively cherry-pick from, since different clauses or sections are referenced multiple times across the full document.
In these scenarios it's almost always better to pass the full document into the LLM rather than running RAG. And if you're passing the full document, it's better to pass it as text rather than as images.
The biggest problem with direct image extraction is multipage documents. We found that single-page extraction (OCR => LLM vs. image => LLM) slightly favored the direct image approach, but anything beyond 5 images had a sharp fall-off in accuracy compared to OCR first.
Which makes sense: long-context recall over text is already a hard problem, but it's what LLMs are optimized for. Long-context recall over images is still pretty bad.
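As a rough illustration of the two pipelines (not our actual code; pdf2image/pytesseract are just example libraries, and ask_llm / ask_llm_with_images are placeholders for whatever text / multimodal client you use):

    # Contrast of OCR-first vs. direct image prompting for a multipage PDF.
    from pdf2image import convert_from_path  # pip install pdf2image
    import pytesseract                       # pip install pytesseract

    def ocr_then_llm(pdf_path: str, question: str, ask_llm) -> str:
        # OCR every page, then pass the whole document as plain text.
        # This is what held up better for us past ~5 pages.
        pages = convert_from_path(pdf_path)
        text = "\n\n".join(pytesseract.image_to_string(p) for p in pages)
        return ask_llm(f"Document:\n{text}\n\nQuestion: {question}")

    def images_to_llm(pdf_path: str, question: str, ask_llm_with_images) -> str:
        # Pass the raw page images to a multimodal model. Slightly better on a
        # single page, but accuracy fell off sharply beyond ~5 images.
        pages = convert_from_path(pdf_path)
        return ask_llm_with_images(question, pages)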
Caterpillar, John Deere, etc. already have remote-operation vehicles, and a lot of provisions on what types of kits can be retrofitted onto their equipment without violating their terms/warranties.
I'm sure this is already something they've taken into consideration, but it seems like this will be more focused on partnerships with existing OEMs rather than selling add-on kits for current fleets.
There really wasn’t any need for half the dumb shit they did in that show. It didn’t add to the drama, it just made the whole thing feel completely fake. Which is impressive considering they’re writing largely about real world computing history.
And don’t get me started on the characters themselves. I think I liked maybe half the cast. The others made me cringe every time they were on screen.
It’s such a pity, because they could have had just as successful a show if they had refined it a little.
It just had too much of that early 2000s cable-TV-style drama, which I understand was required since it was on network TV. I honestly think if it were made today as a Netflix/Prime series it would be a lot better.