Explicitly calling out that they are not going to train on enterprise data, together with SOC 2 compliance, is going to put a lot of enterprises at ease and encourage them to embrace ChatGPT in their business processes.
From our discussions with enterprises (trying to sell our LLM apps platform), we quickly learned how sensitive enterprises are when it comes to sharing their data. In many of these organizations, employees are already pasting a lot of sensitive data into ChatGPT unless access to ChatGPT itself is restricted. We know a few companies that ended up deploying chatbot-ui with Azure's OpenAI offering, since Azure claims not to use customers' data (https://learn.microsoft.com/en-us/legal/cognitive-services/o...).
We ended up adding support for Azure's OpenAI offering to our platform, as well as open-sourcing our engine to support on-prem deployments (LLMStack - https://github.com/trypromptly/LLMStack), to deal with the privacy concerns these enterprises have.
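For anyone wiring this up themselves: pointing the pre-1.0 openai Python client at an Azure OpenAI deployment only takes a few lines. A minimal sketch follows; the resource name, deployment name, API version, and environment variable are placeholders you would swap for your own values.

    # Minimal sketch: calling an Azure OpenAI deployment with the pre-1.0 openai client.
    # Resource name, deployment name, API version and env var below are placeholders.
    import os
    import openai

    openai.api_type = "azure"
    openai.api_base = "https://YOUR-RESOURCE-NAME.openai.azure.com/"  # your Azure OpenAI resource
    openai.api_version = "2023-05-15"                                  # a version supported by your resource
    openai.api_key = os.environ["AZURE_OPENAI_API_KEY"]

    response = openai.ChatCompletion.create(
        engine="YOUR-DEPLOYMENT-NAME",  # Azure routes by deployment name, not model name
        messages=[{"role": "user", "content": "Summarize these meeting notes: ..."}],
    )
    print(response["choices"][0]["message"]["content"])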
My company (Fortune 500 with 80,000 full-time employees) has a policy that forbids the use of any AI or LLM tool. The big concern listed in the policy is that we may inadvertently use someone else’s IP from training data. So, our data going into the tool is one concern, but the other is our using something we are not authorized to use because the tool has it already in its data. How do you prove that that could never occur? The only way I can think of is to provide a comprehensive list of everything the tool was trained on.
It’s a legal unknown. There’s nothing more to it. Your employer has opted for one side of the coin flip, and it’s the risk-averse one. Any reasonably sized org is going to be raising the same questions; most just opt instead to reap the benefits and take on the legal risk, which is something organisations do all the time anyway.
For me that discussion is always hard to grasp. If a human learned coding autodidactically by reading source code and later wrote new code, they could only do so because they had read licensed code. No one would ask them for the license, right?
> The only way I can think of is to provide a comprehensive list of everything the tool was trained on.
There are some startups working in the space that essentially plan to do something like this. https://www.konfer.ai/aritificial-intelligence-trust-managem... is one I know of that is trying to solve this. They enable these foundation model providers to maintain an inventory of training sources so they can easily deal with coming regulations etc.
Microsoft/OpenAI are selling a service. They’re both reputable companies. If it turns out that they are reselling stolen data, are you really liable for purchasing it?
If you buy something that fell off a truck, then you are liable for purchasing stolen goods. But if it turns out that all the bananas in Walmart were stolen from Costco, you as a customer are not liable for theft.
Similarly, I don’t know if Clarkson Intelligence has purchased proper licenses for all the data they are reselling. Maybe they are also scraping some proprietary source and now you are using someone else’s IP.
Realistically you can prove that just as well as you can prove that employees aren't using ChatGPT via their cellphones.
There are also organizations that forbid the use of Stack Overflow. As long as employees don't feel like you're holding back their careers and skills by prohibiting them from using modern tools, and they keep working there, hey. As long as you pay them enough to stay, people will put up with a lot, even if it hurts them.
To effectively sue you, I believe the plaintiff would have to prove the LLM you were using was trained on that IP and it was not in the public domain. Neither seems very doable.
Just curious, do they have bans on "traditional" online sources like Google search results, Wikipedia, and Stack Overflow?
From my view, copying information from Google search results isn't that much different from copying the response from ChatGPT.
Notably, Stack Overflow's license is Creative Commons Attribution-ShareAlike, which I believe very few people actually realize when copying snippets from there.
> but the other is our using something we are not authorized to use because the tool has it already in its data.
We won't know if this is legally sound until a company that isn't forbidding A.I. usage gets sued and claims this as a defense. For all we know, the court could determine that, as long as the content isn't directly regurgitated, it's seen as fair use of the input data.
i.e. Without ChatGPT an employee could still copy and paste something from somewhere. ChatGPT actually doesn't change the equation at all.
So, how do you plan to commercialize your product? I have noticed tons of cloud-based chatbot app providers built on top of the ChatGPT and Azure APIs (asking users to provide their own API key). Enterprises will still be very wary of putting their data on these multi-tenant platforms. I feel that even encryption is not going to be enough. This screams for virtual private LLM stacks for enterprises (the only way to fully isolate).
We have a cloud offering at https://trypromptly.com. We do offer enterprises the ability to host their own vector database to maintain control of their data. We also support interacting with open source LLMs from the platform. Enterprises can bring up https://github.com/go-skynet/LocalAI, run Llama or others and connect to them from their Promptly LLM apps.
We also provide support and some premium processors for enterprise on-prem deployments.
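For the LocalAI route mentioned above, the convenient part is that it exposes an OpenAI-compatible API, so existing client code mostly just needs its base URL swapped. A rough sketch, assuming the pre-1.0 openai Python client and a LocalAI instance on localhost:8080 with some Llama model loaded (the model name is whatever your LocalAI config exposes, so treat it as a placeholder):

    # Rough sketch: pointing the pre-1.0 openai client at a self-hosted,
    # OpenAI-compatible endpoint such as LocalAI. Host, port and model name are placeholders.
    import openai

    openai.api_base = "http://localhost:8080/v1"  # your LocalAI endpoint
    openai.api_key = "not-needed-for-local"       # LocalAI typically ignores the key unless configured otherwise

    response = openai.ChatCompletion.create(
        model="llama-2-13b-chat",  # whatever model name your LocalAI config exposes
        messages=[
            {"role": "system", "content": "You answer questions about internal documents."},
            {"role": "user", "content": "What is our data retention policy?"},
        ],
    )
    print(response["choices"][0]["message"]["content"])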
Usually that’s not a problem; it just means adding OpenAI as a data processor (at least under ISO 27017). There’s a difference between sharing data for commercial purposes (which is usually verboten) and sharing it for data-processing purposes.
I've been maintaining SOC2 certification for multiple years, and I'm here to say that it's largely performative and an ineffective indicator of security posture.
The SOC2 framework is complex and compliance can be expensive. This can lead organizations to focus on ticking the boxes rather than implementing meaningful security controls.
SOC2 is not a good universal metric for understanding an organization's security culture. It's frightening that this is the best we have for now.
Will be doing a show HN for https://proc.gg, a generative AI platform I've built during my sabbatical.
I personally believe that in addition to OpenAI's offering, the ability to swap to an open source model e.g. Llama-2 is the way to go for enterprise offerings in order to get full control.
Azure's ridiculous agreement likely put a lot of orgs off. They also shouldn't have tried to "improve" upon OpenAI's APIs. OpenAI's APIs are a little under-thought (particularly fine-tuning), but so what?
Non-use of enterprise data for training models is table-stakes for enterprise ML products. Google does the same thing, for example.
They'll want to climb the compliance ladder to be considered in more highly regulated industries. I don't think they're quite HIPAA-compliant yet. The next thing after that is probably in-transit geofencing, so the hardware used by an institution resides in a particular jurisdiction. This stuff seems boring, but it's an easy way to scale the addressable market.
Though at this point, they are probably simply supply-limited. Just serving the first wave will keep their capacity at a maximum.
(I do wonder if they'll start offering batch services that can run when the enterprise employees are sleeping...)
I thought they already didn't use input data from the API for training; it was only the consumer-facing ChatGPT product from which they'd use the data for training. Contributing inputs via the API is opt-in.
The ChatGPT model has violated pretty much every open source license (including the MIT license, which requires attribution; show me a single OSS project's license attribution before arguing, please) and is still standing. With the backing of Microsoft, I am confused. What will happen if they violate their promise and selectively train on data from competitors or potential small companies?
What is actually stopping them? Most companies won't have the firepower to go up against Microsoft-backed OpenAI. How can we ensure that they can't violate this? How can they be practically held accountable?
This as far as I am concerned is "Trust me bro!". How is it not otherwise?
> The ChatGPT model has violated pretty much all open source licenses
Are you claiming this because they used copyrighted material as training data? If so, I think you're starting from the wrong point.
Please correct me if I'm wrong, but last I heard using copyrighted data is pretty murky waters legally and they're operating in a gray area. Additionally, I don't think many open source licenses explicitly forbid using their code as training data. The issue isn't just that most other companies don't have the resources to go up against Microsoft/OpenAI, it's that even if they did, it isn't clear whether the courts would find that Microsoft/OpenAI did anything wrong.
I'm not saying that I side with Microsoft/OpenAI in this debate, but I just don't think this is as clear cut as you're making it seem.
> Are you claiming this because they used copyrighted material as training data? If so, I think you're starting from the wrong point.
All open source licenses come under copyright law. If they violate the OSS license, the license is void and the material falls back under plain copyright protection. So yes, it would mean the model is trained on copyrighted material.
> Additionally, I don't think many open source licenses explicitly forbid using their code as training data.
They don't forbid it. For example, code under a permissive license like MIT can be used to train LLMs if you are in compliance. The only requirement when you train on an MIT-licensed codebase is that you provide attribution. It is one of the easiest licenses to comply with: you just need to copy and paste the copyright notice. Below is the MIT license of Ember.js.
Copyright (c) 2011 Yehuda Katz, Tom Dale and Ember.js contributors
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
This copyright notice needs to appear somewhere in ChatGPT's website/product to be in compliance with the MIT license. If it doesn't, the MIT license is void and you are violating it; the end result is that you are training on copyrighted material. I am more than happy to be corrected if you can show me a single OSS license attribution displayed anywhere for the code used to train the OpenAI models.
Also, this can still be fixed by adding attribution for the code that was trained on. THIS IS MY ARGUMENT. The absolute ignorance and arrogance are their motivation and agenda.
Which is why I am asking, WHAT IS STOPPING THEM FROM VIOLATING THEIR OWN TERMS AND CONDITIONS FOR CHATGPT ENTERPRISE?
US copyright/IP management is such a shitsh*w. On one hand you can get sued by patent trolls who own the patent for 'button that turns things on' or get your video delisted for recording at a mall where some copyrighted music is playing in the background, on the other hand, you get people arguing that scraping code and websites with proprietary licences is 'fair use'
Taking this from a different perspective, let's say that ChatGPT, Copilot, or a similar service gets trained on Windows source code. Then a WINE developer uses ChatGPT or Copilot to implement one of the methods. Is WINE then liable for including Windows proprietary source code in their codebase, even if they have never seen that code?
The same would apply to any other application. What if company A uses code from company B via ChatGPT/Copilot because company B's code was used as training data? Imagine a startup database company using Oracle's database code through use of this technology.
And if a proprietary company accidentally uses GPL code through these tools, and the GPL project can prove that use, then the proprietary company will be forced to open source their entire application.
> What is actually stopping them? Most companies won't have the fire power to go against microsoft backed openai.
Microsoft/Amazon/Google already have competitors' data in their cloud. They could even fake encryption to get access to all of a customer's disks. Also, most employees use Google Workspace or Office 365 to store and share confidential files. What is different about OpenAI that makes it any more worrying?
> For all enterprise customers, it offers:
> Customer prompts and company data are not used for training OpenAI models.
> Unlimited access to advanced data analysis (formerly known as Code Interpreter)
> 32k token context windows for 4x longer inputs, files, or follow-ups
I'd thought all of those had been available for non-enterprise customers, but maybe I was wrong, or maybe something changed.
" We do not train on your business data or conversations, and our models don’t learn from your usage. ChatGPT Enterprise is also SOC 2 compliant and all conversations are encrypted in transit and at rest. "
Which part of that is new, because I was pretty sure they were saying "we do not train on your business data or conversations, and our models don’t learn from your usage" already. Maybe the SOC 2 and encryption is new?
>" We do not train on your business data or conversations, and our models don’t learn from your usage. ChatGPT Enterprise is also SOC 2 compliant and all conversations are encrypted in transit and at rest. "
That's great. But can customer prompts and company data be resold to data brokers?
But, can they provide a comprehensive dump of all data it was trained on that we can examine? Otherwise my company may end up using IP that belongs to someone else.
It's exactly the opposite. The entire point of an enterprise option would be that you DO train it on corporate data, securely. So the #1 feature is actually missing, yet is announced as in the works.
ChatGPT Enterprise is also SOC 2 compliant and all conversations are encrypted in transit and at rest. Our new admin console lets you manage team members easily and offers domain verification, SSO, and usage insights, allowing for large-scale deployment into enterprise.
I think this will have solid product-market fit. The product (ChatGPT) was ready, but not enterprise-ready. Now it is. They will get a lot of sales leads.
Just the SOC2 bit will generate revenue… If your organization is SOC2 compliant, using other services that are also compliant is a whole lot easier than risking having your SOC2 auditor spend hours digging into their terms and policies.
I believe the API (chat completions) has been private for a while now. ChatGPT (the chat application run by OpenAI on their chat models) has continued to be used for training… I believe this is why it’s such a bargain for consumers. This announcement allows businesses to let employees use ChatGPT with fewer data privacy concerns.
What about prompt input and response output retention for x days for abuse monitoring? Does it not do that for enterprise? For Microsoft Azure's OpenAI service, you have to get a waiver to ensure that nothing is retained.
I'm going to see if the word "Enterprise" convinces my organization to allow us to use ChatGPT with our actual codebase, which is currently against our rules.
- GPT-4 (ChatGPT Plus): has max 4K tokens(?)
- GPT-4 API: has max 8K tokens (for most users atm)
- GPT-3.5 API: has max 16K tokens
I'd consider the 32K GPT-4 context the most valuable feature. In my opinion OpenAI shouldn't discriminate in favor of large enterprises; it should be equally available to normal (paying) customers.
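If you want to check which of those limits a given prompt fits under, counting tokens locally with the tiktoken library is a quick sanity check. A small sketch (the input file is a hypothetical placeholder):

    # Count tokens locally to see which context window a prompt fits into.
    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-4")   # GPT-4 and GPT-3.5 use the same cl100k_base encoding
    prompt = open("meeting_notes.txt").read()    # hypothetical input
    n_tokens = len(enc.encode(prompt))

    for name, limit in [("GPT-4 8K", 8192), ("GPT-3.5 16K", 16384), ("GPT-4 32K", 32768)]:
        status = "fits" if n_tokens <= limit else "too long"
        print(f"{name}: {status} ({n_tokens} tokens)")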
Having conversations saved so you can go back to them, as in the default setting on Pro (which is disabled when a Pro user turns on the privacy setting), is another big difference.
Interesting, but I am a bit disappointed that this release doesn't include fine-tuning on an enterprise corpus of documents. This only looks like a slightly more convenient and privacy-friendly version of ChatGPT. Or am I missing something?
At the bottom, in their coming soon section: "Customization: Securely extend ChatGPT’s knowledge with your company data by connecting the applications you already use"
I saw it, but it only mentions "applications" (whatever that means) and not bare documents. Does this mean companies might be able to upload, say, PDFs, and fine-tune the model on that?
Azure-hosted GPT already lets you "upload your own documents" in their playground; it seems to be similar to how ChatGPT GPT-4 Code Interpreter handles file uploads.
You don't fine-tune on a corpus of documents to give the model knowledge, you use retrieval.
They support uploading documents to it for that via that code interpreter, and they're adding connectors to applications where the documents live, not sure what more you're expecting.
Yes, but what if they are very large documents that exceed the maximum context size, say, a 200-page PDF? In that case won't you be forced to do some form of fine-tuning, in order to avoid a very slow/computationally expensive on-the-fly retrieval?
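For what it's worth, the on-the-fly retrieval described above is usually cheap: the document chunks are embedded once (and can be cached), and at question time only the query is embedded before a similarity lookup picks the few chunks that go into the prompt. A minimal sketch, assuming the pre-1.0 openai client and numpy; the file name, chunk size, and model choices are placeholders:

    # Minimal retrieval sketch: embed document chunks once, then answer questions by
    # passing only the most similar chunks to the model instead of fine-tuning on them.
    import numpy as np
    import openai

    def embed(texts):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
        return np.array([item["embedding"] for item in resp["data"]])

    # e.g. the text extracted from a 200-page PDF, split into fixed-size chunks
    document = open("big_report.txt").read()                  # placeholder file
    chunks = [document[i:i + 2000] for i in range(0, len(document), 2000)]
    chunk_vecs = embed(chunks)                                # computed once, cacheable

    def answer(question, k=4):
        q_vec = embed([question])[0]
        # cosine similarity between the question and every chunk
        sims = chunk_vecs @ q_vec / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec))
        context = "\n---\n".join(chunks[i] for i in np.argsort(sims)[-k:])
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Answer using only the provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
            ],
        )
        return resp["choices"][0]["message"]["content"]

    print(answer("What were the key findings?"))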
Haha, I also thought about that Y Combinator video. Yep, their prediction didn't age well, and it's becoming clear that OpenAI is actually a direct competitor to most of the startups that are using their API. Most "chat your own data" startups will be killed by this move.
No different than Apple, then. A lot of value is provided to customers by providing these features through a stable organization not likely to shutter within 6 months, like these startup "ChatGPT Wrappers". I hope that they are able to make a respectable sum and pivot.
If your entire startup was just providing a UI on top of the ChatGPT API, it probably wasn't that valuable to begin with and shutting it down won't be a meaningful loss to the industry overall.
There's a common business intuition that any large company will confer business on a host of "satellite companies" that offer some offshoot of the product's value proposition catered to a niche sector. Most of these, however, are just "OpenAI API + a prefix prompt + a user interface + marketing". The issue, which has been raised since the release of the GPT-3 API three years ago, is that no such startup can offer much more value than the API alone, so it is comparatively easy for OpenAI to capture this business itself.
This has been the weirdest part of the current wave of AI hype, the idea that you can build some kind of real business on top of somebody else's tech which is doing 99.9% of the work. There are hard limits on how much value you can add.
If you want to build something uniquely useful, you probably have to do your own training at least.
Any startup that is using ChatGPT under the hood is just doing market research for OpenAI for free. The same happened when people started experimenting with GPT-3 for code completion, right before being replaced by Copilot.
If you want to build an AI start-up and need an LLM, you must use Llama or another model that you can control and host yourself; anything else is basically suicide.
>Any startup that is using ChatGPT under the hood is just doing market research for OpenAI for free
It's not free if you have paying clients.
> If you want to build an AI start-up and need an LLM, you must use Llama or another model that you can control and host yourself; anything else is basically suicide.
You're still doing market research for OpenAI. Just because you aren't using their model doesn't mean they can't copy your UX. Prompts are not viable trade secrets after all.
"Unlimited access to advanced data analysis (formerly known as Code Interpreter)"
Code Interpreter was a pretty bad name (not exactly meaningful to anyone who hasn't studied computer science), but what's the new name? "advanced data analysis" isn't a name, it's a feature in a bullet point.
Also I'd heard anecdotally on the internet (Ethan Mollick's twitter I think) that 'code interpreter' was better than GPT 4 even for tasks that weren't code interpretation. Like it was more like GPT 4.5. Maybe it was an experimental preview and only enterprises are allowed to use it now. I never had access anyway.
I still have access in my $20/m non-Enterprise Pro account, though it has indeed just updated its name from Code Interpreter to Advanced Data Analysis. I haven't personally noticed it being any better than standard GPT4 even for generation of code that can't be run by it (ie non-Python code).
Seemed like a great project. Hope to see it come back!
There are some great open-source projects in this space – not quite the same – many are focused on local LLMs like Llama2 or Code Llama, which was released last week:
- https://github.com/jmorganca/ollama (download & run LLMs locally - I'm a maintainer)
- https://github.com/simonw/llm (access LLMs from the cli - cloud and local)
- https://github.com/oobabooga/text-generation-webui (a web ui w/ different backends)
- https://github.com/ggerganov/llama.cpp (fast local LLM runner)
- https://github.com/go-skynet/LocalAI (has an openai-compatible api)
- https://github.com/trypromptly/LLMStack (build and run apps locally with LocalAI support - I'm a maintainer)
Full Disclosure: This is my tool
The UI is relatively mature, as it predates llama. It includes upstream llama.cpp PRs, integrated AI Horde support, lots of sampling tuning knobs, easy GPU/CPU offloading, and it's basically dependency-free.
Ollama is very neat. Given how compressible the models are, is there any work being done on using them in some kind of compressed format, other than reducing the word size?
All activity stopped a couple of weeks ago. It was extremely active and had close to 5 thousand stars/watch events before it was removed/made private. Unfortunately I never got around to indexing the code. You can find the insights at https://devboard.gitsense.com/microsoft/azurechatgpt
It looks like your account has been using HN primarily (in fact exclusively) for promotion for quite some time. I'm not sure how we didn't notice this before but someone finally complained, and they're right: you can't use HN this way. Note this, from https://news.ycombinator.com/newsguidelines.html: Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity.
Normally we ban accounts that do nothing but promote their own links, but as you've been an HN member for years, I'm not going to ban you, but please do stop doing this! We want people to use HN to read and post things that they personally find intellectually interesting—not just to promote something.
If I go back far enough (a couple hundred comments or so), it's clear that you used to use HN in the intended spirit, so this should be fairly easy to fix.
If it was transferred, the /microsoft link would have redirected to it. Instead, it's the git commits re-uploaded to another repo - so the commits are the same but it didn't transfer past issues, discussions or PRs https://github.com/matijagrcic/azurechatgpt/pulls?q=
Based on past discussion, my guess is it was removed because the name and description were wildly misleading. People starred it because it was a repo published by Microsoft called "azurechatgpt", but all it contained was a sample frontend UI for a chat bot which could talk to the OpenAI API.
So why do we care where LLMs learn from?
Except many companies deal with data of other companies, and these companies do not allow the sharing of data.
"They're huge pussies when it comes to security" - Jan the Man[0]
[0] https://memes.getyarn.io/yarn-clip/b3fc68bb-5b53-456d-aec5-4...
OpenAI offers a BAA (Business Associate Agreement) to select customers.
https://help.openai.com/en/articles/5722486-how-your-data-is...
That said, for enterprises that use the consumer product internally, it would make sense to pay to opt out of that input being used.
This is borderline extortion, and it's hilarious to witness as someone who doesn't have a dog in this fight.
I don't think they're removing all instances of your company from their existing data sources, which would make sense to call "borderline extortion".
TLDR: This might have just killed a LOT of startups
What a terrible name! They should have asked ChatGPT for suggestions.
https://github.com/microsoft/azurechatgpt
Past discussion:
https://news.ycombinator.com/item?id=37112741
https://github.com/matijagrcic/azurechatgpt