It's pretty simple:
- Press a keyboard shortcut to take a screenshot of your active macOS window and start recording the microphone.
- Speak your question, then press the shortcut again to send your question + screenshot off to the OpenAI Vision API.
- The Vision response is overlaid in-context on the active window, and also read aloud to you as audio.
- The app keeps running in the background, only taking a screenshot and listening when activated by the keyboard shortcut.
It's built with Node.js/Electron, and uses the OpenAI Whisper, Vision, and TTS APIs under the hood (BYO API key).
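For the curious, the Vision step presumably boils down to a single chat-completions request carrying the transcribed question plus the screenshot as a base64 data URL. Here's a minimal sketch of what that payload could look like; this is illustrative, not the app's actual code, and `buildVisionRequest` and the model name are just placeholder choices:

```javascript
// Build the body of an OpenAI chat-completions request that pairs the
// transcribed question (from Whisper) with the window screenshot.
function buildVisionRequest(questionText, screenshotBase64) {
  return {
    model: "gpt-4-vision-preview", // illustrative model name
    max_tokens: 600,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: questionText },
          {
            type: "image_url",
            // Screenshot is inlined as a base64 data URL
            image_url: { url: `data:image/png;base64,${screenshotBase64}` },
          },
        ],
      },
    ],
  };
}
```

The response text would then go to the TTS endpoint for the spoken answer.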
There's a simple demo and a longer walk-through in the GH readme https://github.com/elfvingralf/macOSpilot-ai-assistant, and I also posted a different demo on Twitter: https://twitter.com/ralfelfving/status/1732044723630805212
I was skimming through the video you posted, and was curious.
https://www.youtube.com/watch?v=1IdCWqTZLyA&t=32s
code link: https://github.com/elfvingralf/macOSpilot-ai-assistant/blob/...
I suspect OSX vs macOS has marginal impact on the outcome :)
I prefer speaking over typing, and I sit alone, so probably won't add a text input anytime soon. But I'll hit you up on Discord in a bit and share notes.
That would be great for people with a Mac mini who don't have a mic.
Just kidding. Text seems to be the most requested addition, and it wasn't on my own list :) I'll see if I add it; it should be fairly easy to make it configurable and render a text input window with a button instead of triggering the microphone.
Won't make any promises, but might do it.
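If it does land, the toggle could be as simple as a config flag that picks which input path to run. A tiny sketch, where `inputMode` and both callbacks are hypothetical names, not anything from the actual repo:

```javascript
// Pick the question source based on a (hypothetical) config option:
// "text" renders a text prompt window, anything else records the mic.
function getQuestion(config, { recordAudio, showTextPrompt }) {
  return config.inputMode === "text" ? showTextPrompt() : recordAudio();
}
```

The rest of the pipeline (Vision request, TTS) would be unchanged either way.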
https://github.com/samoylenkodmitry/Linux-AI-Assistant-scrip...
- F1: ask the ChatGPT API about the current clipboard content
- F5: same, but opens an editor before asking
- num+: starts/stops recording the microphone, then passes the audio to a locally installed Whisper and copies the transcript to the clipboard
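The bindings above are essentially a key-to-action dispatch table. As a sketch only (the linked repo uses shell scripts; every name here is made up for illustration), it could look like:

```javascript
// Map each hotkey to an action; `ctx` bundles the (hypothetical)
// helpers for clipboard access, the editor, ChatGPT, and recording.
const bindings = {
  "F1": (ctx) => ctx.askChatGPT(ctx.readClipboard()),
  "F5": (ctx) => ctx.askChatGPT(ctx.openEditor(ctx.readClipboard())),
  "num+": (ctx) => ctx.toggleRecording(), // Whisper transcript -> clipboard
};

function handleKey(key, ctx) {
  const action = bindings[key];
  return action ? action(ctx) : null; // unbound keys are ignored
}
```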
I find myself rarely using them however.
EDIT: I checked again and it seems the pricing is comparable. Good stuff.
Right now there's also a daily request limit on the Vision API that kicks in before costs get really bad: roughly 100+ requests, depending on your max spend limit.
https://news.ycombinator.com/item?id=38244883
There are some pros and cons to that. I'm intrigued by your stand-alone macOS app.
I can see how much time it will save me when I'm working with software or a domain I don't know very well.
Here is the video of my interaction: https://www.youtube.com/watch?v=ikVdjom5t0E&feature=youtu.be
These negative comments are weird. Did people actually try it?
I sent him your video, hopefully he'll believe me now :)
MidiJourney: ChatGPT integrated into Ableton Live to create MIDI clips from prompts. https://github.com/korus-labs/MIDIjourney
I have some work on a branch that makes ChatGPT a lot better at generating symbolic music (a better prompt and music notation).
LayerMosaic lets you use MusicGen text-to-music loops together with our company's music library. https://layermosaic.pixelynx-ai.com/
"Here's a list of effects. Here's a list of things that make a song. Is it good? Yes. What about my drum effects? Yes here's the name of the two effects you are using on your drum channel"
None of this is really helpful and I can't get over how much it sounds like Eliza.
https://www.youtube.com/watch?v=zyMmurtCkHI
So... beware when you use it.
You can turn it on and off; there's no need to have it on when editing confidential documents.
You never enable screen-sharing in videoconferencing software?