Explicitly calling out that they are not going to train on enterprise data, together with SOC 2 compliance, is going to put a lot of enterprises at ease and encourage them to embrace ChatGPT in their business processes.
From our discussions with enterprises (trying to sell our LLM apps platform), we quickly learned how sensitive enterprises are when it comes to sharing their data. In many of these organizations, employees are already pasting a lot of sensitive data into ChatGPT unless access to ChatGPT itself is restricted. We know a few companies that ended up deploying chatbot-ui with Azure's OpenAI offering, since Azure claims not to use customers' data (https://learn.microsoft.com/en-us/legal/cognitive-services/o...).
We ended up adding support for Azure's OpenAI offering to our platform, as well as open-sourcing our engine to support on-prem deployments (LLMStack - https://github.com/trypromptly/LLMStack), to deal with the privacy concerns these enterprises have.
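For anyone wiring this up themselves: pointing the pre-1.0 openai Python client at an Azure OpenAI deployment only takes a few lines. A minimal sketch follows; the resource name, deployment name, API version, and environment variable are placeholders you would swap for your own values.

    # Minimal sketch: calling an Azure OpenAI deployment with the pre-1.0 openai client.
    # Resource name, deployment name, API version and env var below are placeholders.
    import os
    import openai

    openai.api_type = "azure"
    openai.api_base = "https://YOUR-RESOURCE-NAME.openai.azure.com/"  # your Azure OpenAI resource
    openai.api_version = "2023-05-15"                                  # a version supported by your resource
    openai.api_key = os.environ["AZURE_OPENAI_API_KEY"]

    response = openai.ChatCompletion.create(
        engine="YOUR-DEPLOYMENT-NAME",  # Azure routes by deployment name, not model name
        messages=[{"role": "user", "content": "Summarize these meeting notes: ..."}],
    )
    print(response["choices"][0]["message"]["content"])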
My company (Fortune 500 with 80,000 full-time employees) has a policy that forbids the use of any AI or LLM tool. The big concern listed in the policy is that we may inadvertently use someone else’s IP from training data. So, our data going into the tool is one concern, but the other is our using something we are not authorized to use because the tool has it already in its data. How do you prove that that could never occur? The only way I can think of is to provide a comprehensive list of everything the tool was trained on.
It’s a legal unknown. There’s nothing more to it. Your employer has opted for one side of the coin flip, and it’s the risk-averse one. Any reasonably sized org is going to be raising the same questions; most just opt instead to reap the benefits and take on the legal risk, which is something organisations do all the time anyway.
For me that discussion is always hard to grasp. If a human learned coding autodidactically by reading source code and later wrote new code, they could only do so because they had read licensed code. No one would ask them for the license, right?
> The only way I can think of is to provide a comprehensive list of everything the tool was trained on.
There are some startups working in the space that essentially plan to do something like this. https://www.konfer.ai/aritificial-intelligence-trust-managem... is one I know of that is trying to solve this. They enable these foundation model providers to maintain an inventory of training sources so they can easily deal with coming regulations etc.
Microsoft/OpenAI are selling a service. They’re both reputable companies. If it turns out that they are reselling stolen data, are you really liable for purchasing it?
If you buy something that fell off a truck, then you are liable for purchasing stolen goods. But if it turns out that all the bananas in Walmart were stolen from Costco, you as a customer are not liable for theft.
Similarly, I don’t know if Clarkson Intelligence has purchased proper licenses for all the data they are reselling. Maybe they are also scraping some proprietary source and now you are using someone else’s IP.
Realistically you can prove that just as well as you can prove that employees aren't using ChatGPT via their cellphones.
There are also organizations that forbid the use of Stack Overflow. As long as employees don't feel like you're holding back their careers and skills by prohibiting them from using modern tools, and they keep working there, hey. As long as you pay them enough to stay, people will put up with a lot, even if it hurts them.
To effectively sue you, I believe the plaintiff would have to prove the LLM you were using was trained on that IP and it was not in the public domain. Neither seems very doable.
Just curious, do they have bans on "traditional" online sources like Google search results, Wikipedia, and Stack Overflow?
From my view, copying information from Google search results isn't that much different from copying the response from ChatGPT.
Notably, Stack Overflow's license is Creative Commons Attribution-ShareAlike, which I believe very few people actually realize when copying snippets from there.
> but the other is our using something we are not authorized to use because the tool has it already in its data.
We won't know if this is legally sound until a company that isn't forbidding A.I. usage gets sued and claims this as a defense. For all we know, the court could determine that, as long as the content isn't directly regurgitated, it's seen as fair use of the input data.
i.e. Without ChatGPT an employee could still copy and paste something from somewhere. ChatGPT actually doesn't change the equation at all.
So, how do you plan to commercialize your product? I have noticed tons of cloud-based chatbot app providers built on top of the ChatGPT and Azure APIs (asking users to provide their own API key). Enterprises will still be very wary of putting their data on these multi-tenant platforms. I feel that even encryption is not going to be enough. This screams for virtual private LLM stacks for enterprises (the only way to fully isolate).
We have a cloud offering at https://trypromptly.com. We do offer enterprises the ability to host their own vector database to maintain control of their data. We also support interacting with open source LLMs from the platform. Enterprises can bring up https://github.com/go-skynet/LocalAI, run Llama or others and connect to them from their Promptly LLM apps.
We also provide support and some premium processors for enterprise on-prem deployments.
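For the LocalAI route mentioned above, the convenient part is that it exposes an OpenAI-compatible API, so existing client code mostly just needs its base URL swapped. A rough sketch, assuming the pre-1.0 openai Python client and a LocalAI instance on localhost:8080 with some Llama model loaded (the model name is whatever your LocalAI config exposes, so treat it as a placeholder):

    # Rough sketch: pointing the pre-1.0 openai client at a self-hosted,
    # OpenAI-compatible endpoint such as LocalAI. Host, port and model name are placeholders.
    import openai

    openai.api_base = "http://localhost:8080/v1"  # your LocalAI endpoint
    openai.api_key = "not-needed-for-local"       # LocalAI typically ignores the key unless configured otherwise

    response = openai.ChatCompletion.create(
        model="llama-2-13b-chat",  # whatever model name your LocalAI config exposes
        messages=[
            {"role": "system", "content": "You answer questions about internal documents."},
            {"role": "user", "content": "What is our data retention policy?"},
        ],
    )
    print(response["choices"][0]["message"]["content"])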
Usually that’s not a problem; it just means adding OpenAI as a data processor (at least under ISO 27017). There’s a difference between sharing data for commercial purposes (which is usually verboten) and sharing it for data-processing purposes.
I've been maintaining SOC2 certification for multiple years, and I'm here to say that it's largely performative and an ineffective indicator of security posture.
The SOC2 framework is complex and compliance can be expensive. This can lead organizations to focus on ticking the boxes rather than implementing meaningful security controls.
SOC2 is not a good universal metric for understanding an organization's security culture. It's frightening that this is the best we have for now.
Will be doing a show HN for https://proc.gg, a generative AI platform I've built during my sabbatical.
I personally believe that in addition to OpenAI's offering, the ability to swap to an open source model e.g. Llama-2 is the way to go for enterprise offerings in order to get full control.
Azure's ridiculous agreement likely put a lot of orgs off. They also shouldn't have tried to "improve" upon OpenAI's APIs. OpenAI's APIs are a little under-thought (particularly fine-tuning), but so what?
Non-use of enterprise data for training models is table-stakes for enterprise ML products. Google does the same thing, for example.
They'll want to climb the compliance ladder to be considered in more highly regulated industries. I don't think they're quite HIPAA-compliant yet. The next thing after that is probably in-transit geofencing, so the hardware used by an institution resides in a particular jurisdiction. This stuff seems boring, but it's an easy way to scale the addressable market.
Though at this point, they are probably simply supply-limited. Just serving the first wave will keep their capacity at a maximum.
(I do wonder if they'll start offering batch services that can run when the enterprise employees are sleeping...)
I thought they already didn't use input data from the API for training; it was only the consumer-facing ChatGPT product from which they'd use the data for training. Contributing inputs via the API is opt-in.
The ChatGPT model has violated pretty much every open source license (including the MIT license, which requires attribution; show me a single OSS project's license attribution before arguing, please) and is still standing. With the backing of Microsoft, I am confused. What will happen if they violate their promise and selectively train on data from competitors or potential small companies?
What is actually stopping them? Most companies won't have the firepower to go up against Microsoft-backed OpenAI. How can we ensure that they can't violate this? How can they be practically held accountable?
This as far as I am concerned is "Trust me bro!". How is it not otherwise?
> The ChatGPT model has violated pretty much all open source licenses
Are you claiming this because they used copyrighted material as training data? If so, I think you're starting from the wrong point.
Please correct me if I'm wrong, but last I heard using copyrighted data is pretty murky waters legally and they're operating in a gray area. Additionally, I don't think many open source licenses explicitly forbid using their code as training data. The issue isn't just that most other companies don't have the resources to go up against Microsoft/OpenAI, it's that even if they did, it isn't clear whether the courts would find that Microsoft/OpenAI did anything wrong.
I'm not saying that I side with Microsoft/OpenAI in this debate, but I just don't think this is as clear cut as you're making it seem.
> Are you claiming this because they used copyrighted material as training data? If so, I think you're starting from the wrong point.
All open source licenses come under copyright law. If they violate the OSS license, the license is void and the material falls back under plain copyright protection. So yes, it would mean the model is trained on copyrighted material.
> Additionally, I don't think many open source licenses explicitly forbid using their code as training data.
They don't forbid it. For example, code under a permissive license like MIT can be used to train LLMs if you are in compliance. The only requirement when you train on an MIT-licensed codebase is that you provide attribution. It is one of the easiest licenses to comply with: you just need to copy and paste the copyright notice. Below is the MIT license of Ember.js.
Copyright (c) 2011 Yehuda Katz, Tom Dale and Ember.js contributors
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
This copyright notice needs to appear somewhere in ChatGPT's website/product to be in compliance with the MIT license. If it doesn't, the MIT license is void and you are violating it; the end result is that you are training on copyrighted material. I am more than happy to be corrected if you can show me a single OSS license attribution displayed anywhere for the code used to train the OpenAI models.
Also, this can still be fixed by adding attribution for the code that was trained on. THIS IS MY ARGUMENT. The absolute ignorance and arrogance are their motivation and agenda.
Which is why I am asking, WHAT IS STOPPING THEM FROM VIOLATING THEIR OWN TERMS AND CONDITIONS FOR CHATGPT ENTERPRISE?
US copyright/IP management is such a shitsh*w. On one hand you can get sued by patent trolls who own the patent for 'button that turns things on' or get your video delisted for recording at a mall where some copyrighted music is playing in the background, on the other hand, you get people arguing that scraping code and websites with proprietary licences is 'fair use'
Taking this from a different perspective, let's say that ChatGPT, Copilot, or a similar service gets trained on Windows source code. Then a WINE developer uses ChatGPT or Copilot to implement one of the methods. Is WINE then liable for including Windows proprietary source code in their codebase, even if they have never seen that code?
The same would apply to any other application. What if company A uses code from company B via ChatGPT/Copilot because company B's code was used as training data? Imagine a startup database company using Oracle's database code through use of this technology.
And if a proprietary company accidentally uses GPL code through these tools, and the GPL project can prove that use, then the proprietary company will be forced to open source their entire application.
> What is actually stopping them? Most companies won't have the fire power to go against microsoft backed openai.
Microsoft/Amazon/Google already have competitors' data in their cloud. They could even fake encryption to get access to all of a customer's disks. Also, most employees use Google Workspace or Office 365 to store and share confidential files. What is different about OpenAI that makes it any more worrying?
> For all enterprise customers, it offers:
> Customer prompts and company data are not used for training OpenAI models.
> Unlimited access to advanced data analysis (formerly known as Code Interpreter)
> 32k token context windows for 4x longer inputs, files, or follow-ups
I'd thought all of those had been available for non-enterprise customers, but maybe I was wrong, or maybe something changed.
" We do not train on your business data or conversations, and our models don’t learn from your usage. ChatGPT Enterprise is also SOC 2 compliant and all conversations are encrypted in transit and at rest. "
Which part of that is new, because I was pretty sure they were saying "we do not train on your business data or conversations, and our models don’t learn from your usage" already. Maybe the SOC 2 and encryption is new?
>" We do not train on your business data or conversations, and our models don’t learn from your usage. ChatGPT Enterprise is also SOC 2 compliant and all conversations are encrypted in transit and at rest. "
That's great. But can customer prompts and company data be resold to data brokers?
But, can they provide a comprehensive dump of all data it was trained on that we can examine? Otherwise my company may end up using IP that belongs to someone else.
It's exactly the opposite. The entire point of an enterprise option would be that you DO train it on corporate data, securely. So the #1 feature is actually missing, yet is announced as in the works.
ChatGPT Enterprise is also SOC 2 compliant and all conversations are encrypted in transit and at rest. Our new admin console lets you manage team members easily and offers domain verification, SSO, and usage insights, allowing for large-scale deployment into enterprise.
I think this will have solid product-market fit. The product (ChatGPT) was ready, but not enterprise-ready. Now it is. They will get a lot of sales leads.
Just the SOC2 bit will generate revenue… If your organization is SOC2 compliant, using other services that are also compliant is a whole lot easier than risking having your SOC2 auditor spend hours digging into their terms and policies.
I believe the API (chat completions) has been private for a while now. ChatGPT (the chat application run by OpenAI on their chat models) has continued to be used for training… I believe this is why it’s such a bargain for consumers. This announcement allows businesses to let employees use ChatGPT with fewer data privacy concerns.
What about prompt input and response output retention for x days for abuse monitoring? Does it not do that for enterprise? For Microsoft Azure's OpenAI service, you have to get a waiver to ensure that nothing is retained.
I'm going to see if the word "Enterprise" convinces my organization to allow us to use ChatGPT with our actual codebase, which is currently against our rules.
- GPT-4 (ChatGPT Plus): has max 4K tokens(?)
- GPT-4 API: has max 8K tokens (for most users atm)
- GPT-3.5 API: has max 16K tokens
I'd consider the 32K GPT-4 context the most valuable feature. In my opinion OpenAI shouldn't discriminate in favor of large enterprises; it should be equally available to normal (paying) customers.
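If you want to check which of those limits a given prompt fits under, counting tokens locally with the tiktoken library is a quick sanity check. A small sketch (the input file is a hypothetical placeholder):

    # Count tokens locally to see which context window a prompt fits into.
    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-4")   # GPT-4 and GPT-3.5 use the same cl100k_base encoding
    prompt = open("meeting_notes.txt").read()    # hypothetical input
    n_tokens = len(enc.encode(prompt))

    for name, limit in [("GPT-4 8K", 8192), ("GPT-3.5 16K", 16384), ("GPT-4 32K", 32768)]:
        status = "fits" if n_tokens <= limit else "too long"
        print(f"{name}: {status} ({n_tokens} tokens)")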
Having conversations saved so you can go back to them, as in the default setting on Pro (which is disabled when a Pro user turns on the privacy setting), is another big difference.
Interesting, but I am a bit disappointed that this release doesn't include fine-tuning on an enterprise corpus of documents. This only looks like a slightly more convenient and privacy-friendly version of ChatGPT. Or am I missing something?
At the bottom, in their coming soon section: "Customization: Securely extend ChatGPT’s knowledge with your company data by connecting the applications you already use"
I saw it, but it only mentions "applications" (whatever that means) and not bare documents. Does this mean companies might be able to upload, say, PDFs, and fine-tune the model on that?
Azure-hosted GPT already lets you "upload your own documents" in their playground; it seems to be similar to how ChatGPT GPT-4 Code Interpreter handles file uploads.
You don't fine-tune on a corpus of documents to give the model knowledge, you use retrieval.
They support uploading documents to it for that via that code interpreter, and they're adding connectors to applications where the documents live, not sure what more you're expecting.
Yes, but what if they are very large documents that exceed the maximum context size, say, a 200-page PDF? In that case won't you be forced to do some form of fine-tuning, in order to avoid a very slow/computationally expensive on-the-fly retrieval?
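For what it's worth, the on-the-fly retrieval described above is usually cheap: the document chunks are embedded once (and can be cached), and at question time only the query is embedded before a similarity lookup picks the few chunks that go into the prompt. A minimal sketch, assuming the pre-1.0 openai client and numpy; the file name, chunk size, and model choices are placeholders:

    # Minimal retrieval sketch: embed document chunks once, then answer questions by
    # passing only the most similar chunks to the model instead of fine-tuning on them.
    import numpy as np
    import openai

    def embed(texts):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
        return np.array([item["embedding"] for item in resp["data"]])

    # e.g. the text extracted from a 200-page PDF, split into fixed-size chunks
    document = open("big_report.txt").read()                  # placeholder file
    chunks = [document[i:i + 2000] for i in range(0, len(document), 2000)]
    chunk_vecs = embed(chunks)                                # computed once, cacheable

    def answer(question, k=4):
        q_vec = embed([question])[0]
        # cosine similarity between the question and every chunk
        sims = chunk_vecs @ q_vec / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec))
        context = "\n---\n".join(chunks[i] for i in np.argsort(sims)[-k:])
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Answer using only the provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
            ],
        )
        return resp["choices"][0]["message"]["content"]

    print(answer("What were the key findings?"))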
Haha, I also thought about that Y Combinator video. Yep, their prediction didn't age well, and it's becoming clear that OpenAI is actually a direct competitor to most of the startups that are using their API. Most "chat your own data" startups will be killed by this move.
No different than Apple, then. A lot of value is provided to customers by providing these features through a stable organization not likely to shutter within 6 months, like these startup "ChatGPT Wrappers". I hope that they are able to make a respectable sum and pivot.
If your entire startup was just providing a UI on top of the ChatGPT API, it probably wasn't that valuable to begin with and shutting it down won't be a meaningful loss to the industry overall.
There's a common business intuition that any large company will confer business on a host of "satellite companies" that offer some offshoot of the product's value proposition catered to a niche sector. Most of these, however, are just "OpenAI API + a prefix prompt + a user interface + marketing". The issue, which has been raised since the release of the GPT-3 API three years ago, is that no such startup can offer much more value than the API alone, so it is comparatively easy for OpenAI to capture this business itself.
This has been the weirdest part of the current wave of AI hype, the idea that you can build some kind of real business on top of somebody else's tech which is doing 99.9% of the work. There are hard limits on how much value you can add.
If you want to build something uniquely useful, you probably have to do your own training at least.
Any startup that is using ChatGPT under the hood is just doing market research for OpenAI for free. The same happened when people started experimenting with GPT-3 for code completion, right before being replaced by Copilot.
If you want to build an AI start-up and need an LLM, you must use Llama or another model that you can control and host yourself; anything else is basically suicide.
>Any startup that is using ChatGPT under the hood is just doing market research for OpenAI for free
It's not free if you have paying clients.
> If you want to build an AI start-up and need an LLM, you must use Llama or another model that you can control and host yourself; anything else is basically suicide.
You're still doing market research for OpenAI. Just because you aren't using their model doesn't mean they can't copy your UX. Prompts are not viable trade secrets after all.
"Unlimited access to advanced data analysis (formerly known as Code Interpreter)"
Code Interpreter was a pretty bad name (not exactly meaningful to anyone who hasn't studied computer science), but what's the new name? "advanced data analysis" isn't a name, it's a feature in a bullet point.
Also I'd heard anecdotally on the internet (Ethan Mollick's twitter I think) that 'code interpreter' was better than GPT 4 even for tasks that weren't code interpretation. Like it was more like GPT 4.5. Maybe it was an experimental preview and only enterprises are allowed to use it now. I never had access anyway.
I still have access in my $20/m non-Enterprise Pro account, though it has indeed just updated its name from Code Interpreter to Advanced Data Analysis. I haven't personally noticed it being any better than standard GPT4 even for generation of code that can't be run by it (ie non-Python code).
Seemed like a great project. Hope to see it come back!
There are some great open-source projects in this space – not quite the same – many are focused on local LLMs like Llama2 or Code Llama, which was released last week:
- https://github.com/jmorganca/ollama (download & run LLMs locally - I'm a maintainer)
- https://github.com/simonw/llm (access LLMs from the cli - cloud and local)
- https://github.com/oobabooga/text-generation-webui (a web ui w/ different backends)
- https://github.com/ggerganov/llama.cpp (fast local LLM runner)
- https://github.com/go-skynet/LocalAI (has an openai-compatible api)
- https://github.com/trypromptly/LLMStack (build and run apps locally with LocalAI support - I'm a maintainer)
Full Disclosure: This is my tool
The UI is relatively mature, as it predates llama. It includes upstream llama.cpp PRs, integrated AI Horde support, lots of sampling tuning knobs, easy GPU/CPU offloading, and it's basically dependency-free.
Ollama is very neat. Given how compressible the models are, is there any work being done on using them in some kind of compressed format, other than reducing the word size?
All activity stopped a couple of weeks ago. It was extremely active and had close to 5 thousand stars/watch events before it was removed/made private. Unfortunately I never got around to indexing the code. You can find the insights at https://devboard.gitsense.com/microsoft/azurechatgpt
It looks like your account has been using HN primarily (in fact exclusively) for promotion for quite some time. I'm not sure how we didn't notice this before but someone finally complained, and they're right: you can't use HN this way. Note this, from https://news.ycombinator.com/newsguidelines.html: Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity.
Normally we ban accounts that do nothing but promote their own links, but as you've been an HN member for years, I'm not going to ban you, but please do stop doing this! We want people to use HN to read and post things that they personally find intellectually interesting—not just to promote something.
If I go back far enough (a couple hundred comments or so), it's clear that you used to use HN in the intended spirit, so this should be fairly easy to fix.
If it was transferred, the /microsoft link would have redirected to it. Instead, it's the git commits re-uploaded to another repo - so the commits are the same but it didn't transfer past issues, discussions or PRs https://github.com/matijagrcic/azurechatgpt/pulls?q=
Based on past discussion, my guess is it was removed because the name and description were wildly misleading. People starred it because it was a repo published by Microsoft called "azurechatgpt", but all it contained was a sample frontend UI for a chat bot which could talk to the OpenAI API.
So why do we care where LLMs learn from?
Except many companies deal with data of other companies, and these companies do not allow the sharing of data.
"They're huge pussies when it comes to security" - Jan the Man[0]
[0] https://memes.getyarn.io/yarn-clip/b3fc68bb-5b53-456d-aec5-4...
OpenAI offers a BAA (Business Associate Agreement) to select customers.
https://help.openai.com/en/articles/5722486-how-your-data-is...
That said, for enterprises that use the consumer product internally, it would make sense to pay to opt out of that input being used.
This is borderline extortion, and it's hilarious to witness as someone who doesn't have a dog in this fight.
I don't think they're removing all instances of your company from their existing data sources, which would make sense to call "borderline extortion".
TLDR: This might have just killed a LOT of startups
What a terrible name! They should have asked ChatGPT for suggestions.
https://github.com/microsoft/azurechatgpt
Past discussion:
https://news.ycombinator.com/item?id=37112741
https://github.com/matijagrcic/azurechatgpt