It’s fascinating that Amazon Web Services has so many overlapping and competing services to achieve the same objective. Efficiency/small footprint was never their approach :D
For example, look how many different types of database they offer (many achieve the same objective but different instantiation)
As others have said, the product isn't the model, it's the API-based token usage. Happily selling whatever model you need, with easy integrations with the rest of your AWS stack, is the entire point.
This is a digression, but I really wish Amazon would be more normal in their product descriptions.
Amazon is rapidly developing its own jargon such that you need to understand how Amazon talks about things (and its existing product lineup) before you can understand half of what they're saying about a new thing. The way they describe their products seems almost designed to obfuscate what they really do.
Every time they introduce something new, you have to click through several pages of announcements and docs just to ascertain what something actually is (an API, a new type of compute platform, a managed SaaS product?)
Amazontalk: We will save you costs
Human language: We will make profit while you think you're saving the costs
Amazontalk: You can build on <product name> to analyze complex documents...
Human language: There is no product, just some DIY tools.
Amazontalk: Provides the intelligence and flexibility
Human language: We will charge your credit card in multiple obscure ways, and we'll be smart about it
Yeah but even then they won't describe it using the same sort of language that everyone else developing these things does. How many parameters? What kind of corpus was it trained on? MoE, single model, or something else? Will the weights be available?
It doesn't even use the words "LLM", "multimodal" or "transformer" which are clearly the most relevant terms here... "foundation model" isn't wrong but it's also the most abstract way to describe it.
Once upon a time there were (and still are) mainframes (and SAP is similar in this respect). These insular systems came with their own tools, their own ecosystem, their own terminology, their own certifications, etc. And you could rent compute & co on them.
If you think of clouds as cross-continent mainframes, a lot more things make sense.
No audio support: The models are currently trained to process and understand video content solely based on the visual information in the video. They do not possess the capability to analyze or comprehend any audio components that are present in the video.
This is blowing my mind. gemini-1.5-flash accidentally knows how to transcribe amazingly well, but it is -very- hard to figure out how to use it well, and now Amazon comes out with a Gemini Flash-like model and it explicitly ignores audio. It is so clear that multi-modal audio would be easy for these models, but it is like they are purposefully holding back releasing/supporting it. This has to be a strategic decision not to attach audio, probably because the margins on ASR are too high to strip with a cheap LLM. I can only hope Meta will drop a multi-modal audio model to force this soon.
They also announced speech to speech and any to any models for early next year. I think you are underestimating the effort required to release 5 competitive models at the same time.
'better' is always a loaded term with ASR. Gemini 1.5 flash can transcribe for 0.01/hour of audio and gives strong results. If you want timing and speaker info you need to use the previous version and a -lot- of tweaking of the prompt or else it will hallucinate the timing info. Give it a try. It may be a lot better for your use case.
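For anyone who wants to try that, a minimal sketch using the google-generativeai SDK (file name and API key are placeholders; the prompt tweaking for timestamps and speaker labels is the hard part, as noted above):

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key

    # Upload the audio via the Files API, then pass it alongside the prompt.
    audio = genai.upload_file("meeting.mp3")  # placeholder file

    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content([
        "Transcribe this recording. Include speaker labels and rough timestamps.",
        audio,
    ])
    print(response.text)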
Setting up AWS so you can try it via Amazon Bedrock API is a hassle, so I made a step-by-step guide: https://ndurner.github.io/amazon-nova. It's 14+ steps!
This is a guide for the casual observer who wants to try things out, given that getting started with other AI platforms is so much more straightforward. It's all open source, with transparent hosting, catering to any remaining concerns someone interested in exactly that may have.
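Once the IAM/credentials part of the guide is done, the call itself is short. A minimal sketch with boto3's Converse API (region, prompt, and inference settings are just examples):

    import boto3

    # Assumes AWS credentials are already configured and Nova model access is enabled.
    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    response = client.converse(
        modelId="us.amazon.nova-pro-v1:0",
        messages=[{"role": "user", "content": [{"text": "Summarize this announcement in one sentence."}]}],
        inferenceConfig={"maxTokens": 256, "temperature": 0.5},
    )
    print(response["output"]["message"]["content"][0]["text"])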
If you're already in the AWS ecosystem or have worked in it, it's no problem. If you're used to "make OpenAI account, add credit card, copy/paste API key" it can be a bit daunting.
AWS does not use the exact same authn/authz/identity model or terminology as other providers, and for people familiar with other models, it's pretty non-trivial to adapt to. I recently posted a rant about this to https://www.reddit.com/r/aws/comments/1geczoz/the_aws_iam_id...
Personally I am more familiar with directly using API keys or auth tokens than AWS's IAM users (which are more similar to what I'd call "service accounts").
Setting up Azure LLM access is a similar hellish process. I learned after several days that I had to look at the actual endpoint URL to determine how to set the “deployment name” and “version” etc.
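For anyone hitting the same wall: the endpoint URL is where those values live. A rough sketch with the openai SDK's AzureOpenAI client (resource name, deployment name, and api-version are placeholders, not recommendations):

    from openai import AzureOpenAI

    # The endpoint encodes what Azure calls the "deployment name" and "version":
    #   https://<resource>.openai.azure.com/openai/deployments/<deployment-name>/chat/completions?api-version=<version>
    client = AzureOpenAI(
        azure_endpoint="https://my-resource.openai.azure.com",  # placeholder resource
        api_key="...",
        api_version="2024-02-01",  # placeholder api-version
    )
    resp = client.chat.completions.create(
        model="my-deployment",  # the *deployment* name, not the underlying model name
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)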
Nice! FWIW, The only nova model I see on the HuggingFace user space page is us.amazon.nova-pro-v1:0. I cloned the repo and added the other nova options in my clone, but you might want to add them to yours. (I would do a PR, but... I'm lazy and it's a trivial PR :-)).
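For reference, these are the Nova text model IDs I'd expect such a space to offer; treat them as assumptions and double-check against the Bedrock model catalog:

    # Cross-region inference profile IDs as commonly referenced at launch (verify before use).
    NOVA_TEXT_MODELS = [
        "us.amazon.nova-micro-v1:0",  # text-only, fastest/cheapest
        "us.amazon.nova-lite-v1:0",   # multimodal, low cost
        "us.amazon.nova-pro-v1:0",    # the one already listed in the space
    ]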
I'm so confused about the value prop of Bedrock. It seems like it wants to be guardrails for implementing RAG with popular models, but it's not the least bit intuitive. Is it actually better than setting up a custom pipeline?
The value I get is:
1) one platform, largely one API, several models (see the sketch after this list),
2) includes Claude 3.5 "unlimited" pay-as-you-go,
3) part of our corporate infra (SSO, billing, ... corporate discussions are easier to have)
I'm using none to very little of the functionality they have added recently: not interested in RAG, not interested in Guardrails. Just Claude access, basically.
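To illustrate point 1: the same Converse call works across models, and only the modelId changes. A sketch, assuming both models are enabled in your account (the Claude ID should be verified against your region's model list):

    import boto3

    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    def ask(model_id: str, prompt: str) -> str:
        # One API shape for every hosted model; swap the ID, keep the code.
        resp = client.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
        return resp["output"]["message"]["content"][0]["text"]

    print(ask("anthropic.claude-3-5-sonnet-20240620-v1:0", "Hello"))  # ID assumed, verify
    print(ask("us.amazon.nova-lite-v1:0", "Hello"))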
They missed a big opportunity by not offering EU-hosted versions.
That's a big thing for compliance. All LLM providers reserve the right to save (for up to 30 days) and inspect/check prompts for their own compliance.
However, this means that company data is potentially stored outside your own cloud environment. This is already problematic, and even more so when the storage location is outside the EU.
I really wish they would left-justify instead of center-justify the pricing information so I'm not sitting here counting zeroes and trying to figure out how they all line up.
AWS is the golden goose. If Amazon doesn't tie up Anthropic, AWS customers who need a SOTA LLM will spend on Azure or GCP.
Think of Anthropic as the "premium" brand -- say, the Duracell of LLMs.
Nova is Amazon's march toward a house brand, Amazon Basics if you will, that minimizes the need for Duracell and slashes cost for customers.
Not to mention the potential benefits of improving Alexa, which has inexcusably languished despite popularizing AI services.
https://aws.amazon.com/products/?aws-products-all.sort-by=it...
Price is pretty good. I'm assuming 3.72 chars/tok on average though.. couldn't find that # anywhere.
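A back-of-envelope way to run that estimate, with the chars/token ratio and the per-million-token price both treated as assumptions you plug in yourself:

    # Rough cost estimate; 3.72 chars/token is the guess above, and the price
    # below is a placeholder to replace with the actual Bedrock pricing figure.
    CHARS_PER_TOKEN = 3.72
    PRICE_PER_1M_INPUT_TOKENS = 0.80  # placeholder USD

    def estimated_input_cost(num_chars: int) -> float:
        tokens = num_chars / CHARS_PER_TOKEN
        return tokens / 1_000_000 * PRICE_PER_1M_INPUT_TOKENS

    print(f"~${estimated_input_cost(10_000_000):.2f} for ~10M characters of input")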
I guess it depends on how sensitive your data is
What’s the subnet of the security group of my user group for an AWS Lambda application in a specific environment that calls KMS to get a secret for….
https://github.com/aws-samples/bedrock-access-gateway
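That gateway puts an OpenAI-compatible API in front of Bedrock, so the familiar SDK flow works against it. A rough sketch (base URL, path, and key are placeholders for whatever your deployment exposes):

    from openai import OpenAI

    client = OpenAI(
        base_url="https://your-gateway.example.com/api/v1",  # placeholder gateway endpoint
        api_key="your-gateway-api-key",                      # placeholder key
    )
    resp = client.chat.completions.create(
        model="us.amazon.nova-pro-v1:0",  # Bedrock model ID passed through
        messages=[{"role": "user", "content": "Hello from the OpenAI SDK"}],
    )
    print(resp.choices[0].message.content)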
The one that really stands out is GroundUI-1K, where it beats the competition by 46%.
Nova Pro looks like it could be a SOTA-comparable model at a lower price point.
Legally we're only allowed to use text-embeddings-3-large at work because Azure doesn't host text-embeddings-3-small within a European region.