https://github.com/ryao/gemini-chat
The main thing I do not like is that token counting is rate limited. My local offline copies have stripped out the token counting, since I found that the service becomes unusable if you get anywhere near the token limits, so there is no point in trimming the history to make it fit. Another thing I found is that I prefer to use the REST API directly rather than their Python wrapper.
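For reference, here is a minimal sketch of what calling the REST API directly looks like, using only the standard library instead of the google-generativeai wrapper. The endpoint, model name, and response shape follow Google's public generativelanguage API docs; adjust the model and API version for your setup (this is an illustration, not the code from the repo above).

```python
import json
import urllib.request

# Assumed endpoint/model; swap in your model and API version as needed.
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent"

def build_request(history, api_key):
    """Build the URL and JSON body for a generateContent call.

    `history` is a list of (role, text) tuples, e.g. [("user", "Hi")].
    """
    payload = {
        "contents": [
            {"role": role, "parts": [{"text": text}]}
            for role, text in history
        ]
    }
    url = f"{API_URL}?key={api_key}"
    return url, json.dumps(payload).encode("utf-8")

def send(history, api_key):
    """POST the chat history and return the generated text."""
    url, body = build_request(history, api_key)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # Generated text lives under candidates[0].content.parts[0].text
    return reply["candidates"][0]["content"]["parts"][0]["text"]
```

Since the whole request is just JSON over HTTP, dropping the wrapper also drops its client-side token-counting calls entirely.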
Also, that comment about 500 errors is obsolete. I will fix it when I do new pushes.
Example for 1.5:
https://github.com/googleapis/python-aiplatform/blob/main/ve...
We're a software engineering consultancy specializing in generative AI: a small team of senior ML engineers offering remote expertise to startups across the US and Europe, solving the unique challenges of running generative AI in production.
Looking for ML engineers who are a mix of ML expert, software engineer, researcher, and hacker. You'll work embedded in client projects on:
- Converting bleeding-edge open source AI models to production
- Building production LLM pipelines from scratch
- Improving models on speed, robustness, and performance
- Designing custom LLM benchmarks and evaluation
- Building and scaling ML infrastructure
- Setting up monitoring, tracing, and prompt management
Tech stack: Python, LLMs, AWS/GCP, MLOps tools, Docker, Git, Nix
We value: self-starters, quick learners, strong communication skills, software quality, open source. The role involves engineering, talking to clients, and outreach.
Requirements: Strong Python, experience with production LLM systems, cloud platforms, MLOps. Must be EU work eligible and live within 2 hours of Nijmegen, Netherlands. Remote work with occasional meetups.
Benefits: 25 days PTO, home office budget, professional development budget
Apply: https://datakami.com/careers
Recruiters/freelancers/agencies: we're not working with recruiters or considering freelancers or agencies at this time.