Readit News
causal · 2 years ago
This kind of reflects the fact that a lot of working with LLMs is just organizing text, and prompts can become a real engineering problem when you are orchestrating pipelines of dozens or more files with completions at various points with context windows of 100K tokens or more.

I've not found a satisfying framework yet, and I generally find raw Python best. But I still spend too much time on boilerplate, tweaking formatting or samplers, and chunking for context windows.

If anyone knows of a better tool for abstracting that away (LangChain is not it IMO) please let me know.
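
For reference, the "chunking for context windows" boilerplate mentioned above can be sketched in raw Python. This is only a minimal illustration: `estimate_tokens` and `chunk_by_budget` are hypothetical names, and the 4-characters-per-token ratio is a rough assumption rather than a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def chunk_by_budget(paragraphs: list[str], budget: int) -> list[str]:
    """Greedily pack paragraphs into chunks that stay near the token budget."""
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for para in paragraphs:
        cost = estimate_tokens(para)
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and used + cost > budget:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        # A single oversized paragraph still gets a chunk of its own.
        current.append(para)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A real pipeline would probably swap the estimate for the model's own tokenizer, since the ratio varies a lot between prose and code.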

stavros · 2 years ago
DSPy?
_andrei_ · 2 years ago
Pretty nice, made a CLI app for this as well, seems like a common need: https://github.com/3rd/promptpack

But sending whole files isn't always optimal, I'm thinking there has to be a better way, like picking workspace symbols and pulling in only the code they depend on from other files. Something something LSP/tree-sitter-based.

anotherpaulg · 2 years ago
This is what aider does: it uses tree-sitter to extract the AST from each source file, builds a call graph from the ASTs, and then runs a graph optimization to identify the most relevant parts of the code base, given the current state of the LLM chat.

There are more details in this article:

https://aider.chat/2023/10/22/repomap.html
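
To give a feel for the general idea (not aider's actual code), here is a small sketch under loud assumptions: it uses Python's standard `ast` module instead of tree-sitter, only handles top-level Python functions called by plain name, and swaps in a textbook PageRank as the "graph optimization". `build_call_graph` and `rank` are hypothetical names.

```python
import ast


def build_call_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each top-level function name to the known functions it calls."""
    trees = [ast.parse(code) for code in sources.values()]
    defined = {
        node.name
        for tree in trees
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }
    graph: dict[str, set[str]] = {name: set() for name in defined}
    for tree in trees:
        for func in tree.body:
            if not isinstance(func, ast.FunctionDef):
                continue
            for node in ast.walk(func):
                # Only plain-name calls like foo(); method calls are ignored.
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    if node.func.id in defined and node.func.id != func.name:
                        graph[func.name].add(node.func.id)
    return graph


def rank(graph: dict[str, set[str]], damping: float = 0.85,
         iterations: int = 50) -> dict[str, float]:
    """Plain PageRank by power iteration; widely called functions score higher."""
    nodes = sorted(graph)
    if not nodes:
        return {}
    n = len(nodes)
    score = {name: 1.0 / n for name in nodes}
    for _ in range(iterations):
        new = {name: (1.0 - damping) / n for name in nodes}
        for caller in nodes:
            # Functions that call nothing spread their score over every node.
            targets = graph[caller] or nodes
            share = damping * score[caller] / len(targets)
            for callee in targets:
                new[callee] += share
        score = new
    return score


# Hypothetical two-file code base, used only for illustration.
sources = {
    "a.py": "def load():\n    return parse()\n\ndef parse():\n    return helper()\n",
    "b.py": "def helper():\n    return 1\n\ndef main():\n    load()\n    helper()\n",
}
scores = rank(build_call_graph(sources))
most_relevant = sorted(scores, key=scores.get, reverse=True)
```

The ranked names could then decide which definitions get included in the prompt first. Per the description above, the real tool also takes the current chat state into account, which this sketch does not.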

_andrei_ · 2 years ago
Ah wasn't aware, nice work!
arthurcolle · 2 years ago
aider is super limited. solid approach but needs a lot of work to make it usable
bredren · 2 years ago
Cool, thanks for sharing.

That's a great point.

A tree-based file path browser with ability to select all or individual functions or classes would be cool.

JetBrains IDEs have a good interface for symbols via the refactoring UI. Maybe I'll look there for some inspiration.

Jimmc414 · 2 years ago
This is nice. I created something similar, https://github.com/jimmc414/1filellm

It converts papers, repositories, PRs, YT transcripts and web docs into one text file in the clipboard for LLM ingestion.
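
The core "many sources into one text" step can be sketched roughly like this for local files only. It is a simplified illustration, not the tool's code: `bundle_files`, the `=== path ===` header format, and the default extension list are all assumptions, and clipboard handling is left out.

```python
from pathlib import Path


def bundle_files(root: str, extensions: tuple[str, ...] = (".py", ".md", ".txt")) -> str:
    """Concatenate matching text files under root into one prompt-ready string."""
    base = Path(root)
    parts: list[str] = []
    for path in sorted(base.rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # A header line per file lets the model tell the sources apart.
        parts.append(f"=== {path.relative_to(base).as_posix()} ===\n{text.rstrip()}\n")
    return "\n".join(parts)
```

Calling `bundle_files(".")` in a project directory returns a single string that can be pasted into a chat or written to a file.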

Sakos · 2 years ago
This stuff sounds cool, but doesn't it quickly run into token/context limits on the models?
Jimmc414 · 2 years ago
Not anymore in the subscription LLM offerings. Claude seems to allow 70k tokens or more in their paid UI, ChatGPT seems to be about half of that while custom GPTs allow well over 100k.
smcleod · 2 years ago
I use code2prompt (https://github.com/mufeedvh/code2prompt) with the following zsh wrapper:

  function code2prompt() {

    # wrap the code2prompt command in a function that sets a number of default excludes
    # https://github.com/mufeedvh/code2prompt/

    local arguments excludeFiles excludeFolders templatesFolder excludeExtensions
    
    templatesFolder="${HOME}/git/code2prompt/templates"
    # build the lists with += rather than backslash-newline: inside double quotes a
    # continuation keeps the next line's leading spaces, which end up in the values
    excludeFiles=".editorconfig,.eslintignore,.eslintrc,tsconfig.json,.gitignore,.npmrc,LICENSE,esbuild.config.mjs,manifest.json,package-lock.json,"
    excludeFiles+="version-bump.mjs,versions.json,yarn.lock,CONTRIBUTING.md,CHANGELOG.md,SECURITY.md,.nvmrc,.env,.env.production,.prettierrc,.prettierignore,.stylelintrc,"
    excludeFiles+="CODEOWNERS,commitlint.config.js,renovate.json,pre-commit-config.yaml,.vimrc,poetry.lock,changelog.md,contributing.md,.pretterignore,.prettierrc.json,"
    excludeFiles+=".prettierrc.yml,.prettierrc.js,.eslintrc.js,.eslintrc.json,.eslintrc.yml,.eslintrc.yaml,.stylelintrc.js,.stylelintrc.json,.stylelintrc.yml,.stylelintrc.yaml"
    excludeFolders="screenshots,dist,node_modules,.git,.github,.vscode,build,coverage,tmp,out,temp,logs"
    excludeExtensions="png,jpg,jpeg,gif,svg,mp4,webm,avi,mp3,wav,flac,zip,tar,gz,bz2,7z,iso,bin,exe,app,dmg,deb,rpm,apk,fig,xd,blend,fbx,obj,tmp,swp,"
    excludeExtensions+="lock,DS_Store,sqlite,log,sqlite3,dll,woff,woff2,ttf,eot,otf,ico,icns,csv,doc,docx,ppt,pptx,xls,xlsx,pdf,cmd,bat,dat,baseline,ps1,bin,exe,app,tmp,diff,bmp,ico"

    echo "---"
    echo "Available templates:"
    ls -1 "$templatesFolder"
    echo "---"

    echo "Excluding files: $excludeFiles"
    echo "Excluding folders: $excludeFolders"
    echo "Run with -nn to disable the default excludes"

    # array of build arguments
    arguments=("--tokens")

    # if -t and a template name is provided, append the template flag with the full path to the template to the arguments array
    if [[ $1 == "-t" ]]; then
      arguments+=("--template" "$templatesFolder/$2")
      shift 2
    fi

    if [[ $1 == "-nn" ]]; then
      command code2prompt "${arguments[@]}" "${@:2}" # remove the -nn flag
    else
      command code2prompt "${arguments[@]}" --exclude-files "$excludeFiles" --exclude-folders "$excludeFolders" --exclude "$excludeExtensions" "$@" # "$@" keeps each argument separate; "${*}" would merge them into one word
    fi
  }

acbart · 2 years ago
"Isn't this just a GUI for the cat command" "Oh. That's the joke."
reidbarber · 2 years ago
Nice! I made something similar but for the browser recently: https://files2prompt.com

I think there are some CLI tools out there as well.

levysoft · 2 years ago
Thank you for sharing, I found it really useful and well done! You did a really great job!
reidbarber · 2 years ago
Thanks, I appreciate it! I hope to keep adding features. If there are missing features that you'd use, feel free to leave them in a reply.
Birdguy05761 · 2 years ago
This is sweet!! Organizing source docs manually is so tedious