There is a missing ingredient that LLMs lack, however. They lack insight. Writing is made engaging by the promise of insight teased in its setups, the depths that are dug through its payoffs, and the revelations found in its conclusion. It requires solving an abstract sudoku puzzle where each sentence builds on something prior and, critically, advances an agenda toward an emotional conclusion. This is the rhetoric inherent to all storytelling, but just as in a good political speech or debate, everything hinges on the quality of the central thesis—the key insight that LLMs do not come equipped to provide on their own.
This is hard. Insight is hard. And an AI supporter would gladly tell you "yes! this is where prompting becomes art!" And perhaps there is merit to this, or at least there is merit insofar as Sam Altman's dreams of AI producing novel insights remain unfulfilled. This condition notwithstanding, what merit exactly do these supporters have? Has prompting become an art the same way that it has become engineering? It would seem AlphaWrite would like to say so.
But let's look at this rubric and evaluate for ourselves what else AlphaWrite would like to say:
```python # Fallback to a basic rubric if file not found return """Creative writing evaluation should consider: 1. Creativity and Originality (25%) - Unique ideas, fresh perspectives, innovative storytelling 2. Writing Quality (25%) - Grammar, style, flow, vocabulary, sentence structure 3. Engagement (20%) - How compelling and interesting the piece is to read 4. Character Development (15%) - Believable, well-developed characters with clear motivations 5. Plot Structure (15%) - Logical progression, pacing, resolution of conflicts""" ```
It's certainly just a default, and I mean no bad faith in using this for rhetorical effect, but this default also acts as a template, and it happens to be informative to my point. Insight, genuine insight, is hard because it is contingent on one's audience and one's shared experiences with them. It isn't enough to check boxes. Might I ask what makes for a better story: a narrative about a well developed princess who provides fresh perspectives on antiquated themes, or a narrative about a well developed stock broker who provides fresh perspectives on contemporary themes? The output fails to find its audience no matter what your rubric is.
And here lies the dilemma regarding the idea that prompts are an art: they are not. The prompts are not art by the simple fact that nobody will read them. What is read is what all that is communicated and any discerning audience will be alienated by anything generated by something as ambiguous as a English teacher's grading rubric.
I write because I want to communicate my insights to an audience who I believe would be influenced by them. I may be early in my career, but this is why I do it. The degree of influence I shall have measures the degree of "art" I shall attain. Not by whether or not I clear the minimum bar of literacy.
Some of these questions change slightly, since we might end up with "unlimited resources" (i.e. instead of having e.g. 5 engineers on a team who can only get X done per sprint, we effectively have near-limitless compute to use instead) so maybe the answer is "build everything on the wish-list in 1 day" to the "what should we prioritize" type questions?
Interesting times.
My gut is that software engineers will end up as glorified test engineers, coming up with test cases (even if not actually writing the code) and asking the AI to write code until it passes.
Maybe this could be used as proof of work? To stop wasting computing resources in crypto currencies and get something useful as a byproduct.
What makes it for education? Why can't it be used as a general purpose proof checker?
In a general purpose theorem proving environment, such as with Lean, there is a different attitude about what level of abstraction to expose by default. It's less intuitive to a child to have a tutor need to explain what it means for a function to be `unsafe` than it is to explain what it means to `print` an expression.
By creating a separate platform, you can set these defaults to curate different kinds of engagement with users. Take the `processing` language as an example. While it's Java under the hood, the careful curation of the programming environment incentivizes learners to play with it like a toy, increasing creative expression and fault-less experimentation.
Also, a github link for those who don't want to use a google account: https://github.com/rj-calvin/verisimilitude/blob/069723c94df...