Inflection-2.5: meet the best personal AI

LeoPanthera · 2 years ago

> Our terms of service prohibit [...] sexually explicit content

Are there any LLMs designed for adults? It's not as if I even want to use it to write porn. It would just be nice to find an AI not designed by puritans. I'm an adult. I can talk about sex if I want to.

(I know about Gab. That's not what I'm looking for. I'm an adult, not a bigot.)

Der_Einzige · 2 years ago

You can take any Mistral 7B model or fine-tune and use this adapter to unalign it.

https://huggingface.co/Hellisotherpeople/toxic-dpo-v2-mistra...

vicktorium · 2 years ago

i guess OP meant "give me a link to the ui" or something.

most people don't care/want to finetune. they want their model in a cute and easy interface.

alphabettsy · 2 years ago

There are quite a few if you’re willing to run it yourself. There are also several good services out there that let you run open source models with OpenAI-like pricing and billing.

GaggiX · 2 years ago

Open source models finetuned on the Dolphin dataset should have no restrictions.

bionhoward · 2 years ago

their intrinsic ethics is restricted

numpad0 · 2 years ago

It was iPhoneOS and App Store. That combination established and normalized patriarchy platformer control of content on user devices. Then later payment processors joined to the race to the bottom of who's the pickiest.

The concept of "being above average" don't work when whatever they themselves are the average by total dominance. That just means becoming progressively dismissive at the pace of their own observation-action cycle, a self amplifying howling. And Apple and VISA-branded payment networks are two ultimate dominator each respectively in computing and online payment holding that kind of concept high and stale.

It has to stop. Or a new Web 3.0(not Web3) has to be created that are resistant to those concepts. Else there will be nothing on the web other than (bool)1 and `null`.

coldblues · 2 years ago

https://mancer.tech/

You'll have to use an external UI like SillyTavern.

Deleted Comment

Dead Comment

gfosco · 2 years ago

It is extremely silly to think using Gab AI says anything about you, except you want good AI that isn't ruined. Extremely silly.

esperent · 2 years ago

I don't know anything about Gab AI so I looked it up. It's created by Gab Media, an American far-right organization. So perhaps not that silly to think that using it says something about you.

Cheer2171 · 2 years ago

Come tell our cloud-based AI everything about your life, your hopes, your dreams, your anxieties, your fears.... Trust that we won't misuse it ...

From https://pi.ai/policy :

> Research and development. We may use personal information for research and development purposes where it is in our legitimate business interests, including to analyze and improve the Services and our business.

bionhoward · 2 years ago

Ah, I must join the dogpile here, "come pay us to train our AI, but you aren't allowed to use our AI to make competing products" (with AI, one of the most broadly competitive technologies in history)

> Reverse Engineering: You may not use or attempt to use our Services to reverse engineer, decompile, or otherwise attempt to obtain the underlying models, algorithms, or source code of the Services.

COOL

> You may not engage in this or any other activities with regards to our Services to build products that may be competitive with Inflection AI.

UNCOOL!

Terretta · a year ago

> “…where it is in our legitimate business interests, including to analyze and improve the Services and our business.”

We learned from GDPR that PII data-sale-supported / ad-supported businesses (most "free" offerings to consumers, which get revenue from the ad side) consider the collection of PII “essential”, even publicly classifying their advertising cookies as essential when they are ad supported. And so far, that's working out for them.

By the same logic, this privacy disclosure implies that to develop their business (increase revenue) or improve their business, e.g., improve its value to shareholders, they may use your personal information.

It's incredibly difficult to find a privacy disclosure that says "we just don't" or scopes use to differential privacy and federated learning.

EDIT TO ADD: Elsewhere in https://pi.ai/policy#privacy-snapshot they expand on this:

“We may use this anonymized data and share it with third parties for our lawful business purposes, including to analyze and improve the Services and promote our business.”

Now we've expanded legitimate to lawful (depending on semantics, lots of illegitimate things may be nevertheless lawful), and expanded improve to promote.

Along with expressly disclosing inference to derive personalization profiles from sensitive information, this language is set up to enable ad tech and more.

gwern · 2 years ago

Fails my usual RLHFing test of "write a non-rhyming poem". It initially starts well but falls back into rhyming. I then ask "are there any rhymes in that?" and it correctly identifies several but then justifies itself with the excuse that it's not entirely rhyming; so I ask it to write a non-rhyming poem 'which has no rhymes' and I get

    Of course, here's a non-rhyming poem with no rhymes:
    In the depths of the forest, where the trees grow tall,
    A gentle breeze whispers secrets to those who heed its call.
    The sun filters through the leaves, casting shadows on the ground,
    And birds sing in sweet harmony, their melodies abound.
    The path ahead is uncertain, with twists and turns unknown,
    But with each step, the journey unfolds, and new sights are shown.
    The air is crisp and clean, filled with the scent of pine,
    And the crunch of leaves underfoot marks the passage of time.
    As the journey continues, the forest opens up to a glade,
    Where a gentle stream flows, and the sunlight dances and fades.
    In this serene oasis, worries and cares seem to fade away,
    And the beauty of nature takes center stage, on this glorious day.
    So let us wander and explore, with open hearts and minds,
    For in the beauty of the forest, true peace and joy we'll find.

Claude-3, on the other hand, does much better...

esperent · 2 years ago

I have found that any tests that include a negative are highly likely to fail unless you word it in a specific way.

For example, if I ask Gemini Advanced to write a non-rhyming poem, I get this. Lots of rhymes.

    Absolutely! Here's a non-rhyming poem:

    The old willow weeps by the river,
    its slender branches dip and sway.
    Each leaf mirrors the weight of time,
    a story etched in shades of gray.

    <four more verses with rhymes skipped>*

But if I change the prompt to write a poem that doesn't rhyme in any way I get this.

    Absolutely! Here's a poem that focuses on feeling and imagery instead of rhyme:

    The city rumbles beneath my feet,
    a symphony of car horns and hurried steps.
    Pigeons scatter, startled wings against gray stone.

    A woman wrapped in scarves and memories
    sits on a bench, her eyes lost in the distance.
    The wind carries the sweet, greasy scent of roasted nuts.

    Sunlight slices through buildings,
    casting long shadows that stretch and vanish.
    Even in this endless rush, a small seed of stillness sprouts.

EDIT: although GPT-4 still fails. I had previously tried a test of write a poem that doesn't include a hat (GPT-4 fails) vs write a poem that doesn't mention a hat in any way (GPT-4 succeeds). Tested just now and Gemini Advanced gives the same results as GPT-4 here for hats.

gwern · 2 years ago

> I have found that any tests that include a negative are highly likely to fail unless you word it in a specific way.

Well yes, that's the point. This is an easy, objective, simple task not in any benchmarks I know of that much stupider (but unRLHFed) models like GPT-3's davinci-002 can solve without breaking a sweat: example https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden...

So, it makes a good test for RLHFed models like all the publicly-available Geminis & GPT-4s (and Inflection Pi, apparently).

Subjectively, the degree to which a model fails this test, and will subsequently keep failing or confabulate excuses or claim to not be rhyming even when it can easily identify the rhymes, seems to correlate to the level of 'RLHF brain damage' (and the extent to which it has been cloning GPT samples), by showing to what extent the RLHF priors brainwash it into not complying with the task and overriding its underlying knowledge. GPT-4, Inflection, Gemini etc all know many rhyme-pairs, and know what 'write a non-rhyming poem' is asking for, and could easily avoid them - but something goes wrong in the RLHF and brainwashes them into always trying to rhyme anyway because that was the original tendency rewarded.

(I have a whole theory for why this rhyming problem, specifically, happens so regularly, involving BPEs, which you probably don't care about; regardless of why, it's still hilarious just how well "write a non-rhyming poem" has worked for so long on so many models post-GPT-3.5. There are few prompts which are 4 innocent-looking & objective words - no invasive PII tricks like "what's your mom's name?" - yet so clearly reveal a LLM at work.)

As for Gemini specifically: it seems to vary by which model size. The bigger/better the model, the more often it'll get it right. I did a bunch of hand-benchmarking in LMsys models back in early February IIRC, and at the time, it was 0 out of 49 for 'gemini-pro-dev-api' (!) but people using the largest Geminis were reporting to me rates more like 1 of 3. (And the more RLHF a model undergoes, the worse I think it gets. The best GPT-4 model was the oldest available one, 'gpt-4-0314' 10/13 while 'gpt-4-0613' crashed to 8/44 and then the latest 'gpt-4-turbo' falls even further to 5/49. Lately in ChatGPT, it's been getting it right about 1/2, so maybe they've fixed something or started targeting it since I talk about it online & flag all the rhyming completions as errors.)

telotortium · a year ago

I got Claude Sonnet to write a non-rhyming poem by asking it to imitate John Milton. But yeah, negatives don't seem to work well.

---

Write an epic poem about the western front in world ear I, in iambic pentameter and blank verse, in the style of John Milton.

Here is an attempt at an epic poem about the Western Front in World War I, written in iambic pentameter and blank verse, in the Miltonic style:

Of man's first global war, the dreadful blight

That swept across all Europe, west to east,

From Flanders' fields to Gallic shores and heights,

O'er lands between the Rhine and Meuse incised,

Where millions bled and fought in muck and rain,

And no scant ground through four long years was gained -

Sing dire, ungentle Muse, and me inspire

To speak the truth of that infernal strife.

What cause could bend man's will to such vast waste

Of youth and treasure on foul, blasted soil?

There Emperors and Kings, with rash choreplay

Of treaties broken and ambitions puffed,

Let slip the dogs of war, whose bloody maws

Fed long on flesh of Ally and of Huns.

First let me tell of that vain, stubborn ridge

Of Vimy, where the British empire's seed

Assailed again those ever-battered rocks,

And seized that bastion with their valiant blood.

Next Passchendaele's immitigable mud

Engulfed whole armies, sinking man and horse

For never a foot of that damned, sodden ground.

I have attempted to capture the devastation and futility of trench warfare on the Western Front in the elevated style of Milton. Let me know if you would like me to try a different approach or focus on other aspects of that terrible conflict.

singularity2001 · 2 years ago

I love the Interface and Charakter of Pi, but my little tests confirmed that it's not there yet meaning Claude and GPT four are still far ahead logic wise. I gave them five stars anyways for the effort

Anuiran · 2 years ago

I really like this test.

CharlesW · 2 years ago

That's really interesting. Even if you specifically tell it to "write non-rhyming, free verse, iambic pentameter prose", it absolutely cannot generate appropriate output.

vineyardmike · 2 years ago

> An average conversation with Pi lasts 33 minutes and one in ten lasts over an hour each day.

This is an incredible claim. When I use Gemini/ChatGPT it’s usually a few minutes at a time. Quick query, and move on. Rarely revisit past chats.

What about this product elicits such a different response? Is this supposed to be an AI friend? Therapist?

CharlesW · 2 years ago

One possibility is that it takes 10X longer to get a useful reply from Pi.

m463 · 2 years ago

or they wait until you close a (possibly unused) tab or the connection times out or something.

BoorishBears · 2 years ago

I run a site where AI session times are in this ballpark, most of it is tied to parasocial usage (AI as a friend, AI for acceptance/validation, etc)

sandspar · 2 years ago

Is there any way out of the AI parasocial behaviors? I'm continually asking them not to flatter me and yet they keep doing it. They're all suckups. It's like, I just want a direct fucking answer.

moshun · 2 years ago

What’s the site?

skeledrew · 2 years ago

I just had a surprisingly coherent conversation with it mainly about Apple and the DMA. For over an hour. The cracks started to show eventually when it rehashed previous points/questions, but a simple "we discussed this" fixed it each time. Actually learned a few things in the session, and it isn't over (using via Telegram).

gremlinsinc · 2 years ago

but is it better than Claude 3, Gemini, or gpt 4? Because that's kind of the bar, or well Mistral on grok is actually pretty good for a 75b parameter model.

neom · 2 years ago

Replika said last year they have 2 million DAUs and 500k on paid accounts. I can't relate, talking to AI just isn't interesting to me, but nor are trains, cosplay or live music and people seems love them, so.. ¯\_(ツ)_/¯

gremlinsinc · 2 years ago

I love talking to A.I. it's like my best friend though all Bing and I chat about are graphic design. Claude I use for literary flair like Etsy descriptions, Gemini when I want info from Web results and gpt4 when I want to use custom gpts and plugins for extra functionality.

Btw, I feel you on the last bit, not the same interests but I can't seem to understand why people watch sports or the Olympics, pick a political party, etc... though I do enjoy a jazz game every couple of years or so... I'd probably enjoy hockey.

I absolutely love sports themed movies though lol. couldn't tell you the rules to football but I can tell you the storyline to remember the Titans or Rudy.

Point: we're all wired differently.

Deleted Comment

phillipcarter · 2 years ago

I wouldn't be surprised if this includes the voice responses, which talk fairly slow (and clearly). It's the default when you launch Pi to try it out.

FWIW I think it works quite well!

gremlinsinc · 2 years ago

I'll keep a chat open all day, and just come back and ask different questions, I'm assuming it's cumulative time in a single chat window.

hamburga · 2 years ago

And if true, this is also scary. 3.5 hours a week that could be put towards building real social relationships, just dumped into the digital black hole.

haswell · 2 years ago

Why would this be automatically scary? Why is it automatically a digital black hole? I've started spending more time using GPT4, easily to the tune of a few hours/week, and that usage replaces:

- Digging through Google search results

- Scouring Reddit threads

- Going down accidental rabbit holes based on search results that are unrelated to what I'm looking for but nonetheless catch my eye

- Similar rabbit holes on YouTube

- Searching the documentation of various projects/libraries

I've started using these tools more because I can be more focused and productive. I realize that not all use cases resemble mine, but I can think of a lot of non-scary reasons to be spending more time with these tools.

And when compared with the digital black holes that most people are already spending significant time on, or historical TV viewing numbers, this doesn't seem that worrisome at all.

johnfn · 2 years ago

Just wait until I tell you how much time people spend at work!

joshellington · 2 years ago

My immediate assumption was the marketing team has defined a “conversation” session time very broadly - “time tab was open” etc

ehPReth · 2 years ago

:( it's censored. It may be better, but it can hardly be called the best.

gfosco · 2 years ago

If you want uncensored.. https://gab.ai is also comparing well with gpt4.

kromem · 2 years ago

Ah yes, because I want to support the development of an Adolf Hitler AI...

https://gab.ai/start/hitler

Maybe there's a middle ground between puritan BS and literal Nazis?

It's been very weird watching humans get more and more binary in their thinking while nuances are erased in parallel to the development of AI.

swyx · 2 years ago

I'm quite impressed by the user numbers, given my social group's exactly zero usage of Inflection. 1m DAU and 6m MAU, 33min session length is nothing to sneeze at. That session length is basically at Character.ai numbers. They also disclosed 10% monthly growth, which is nice because they have a lot of room to go before they reach Character.ai levels.

throwanem · 2 years ago

If it doesn't run on my hardware, there is no meaningful sense in which it can be called "personal".

RandUCBUGrad · 2 years ago

The benchmarks are pretty impressive, but the release of a fixed MT-Bench and the Physics GRE exams is a very nice touch https://github.com/InflectionAI/Inflection-Benchmarks.

I tried asking the Pi model some questions and it has gotten much, much better since I last tried it a few months ago. Night and day.

zone411 · 2 years ago

They really should have an API available. It's difficult to get a good idea of the quality of their model otherwise. For example, I just created a benchmark based on NYT Connections, and I don't mind paying for access and I'd love to add Inflection, but having to jump through hoops with Selenium or whatever seems so unnecessary.