1) ($$m total, $$k/mo now) Server hosting! This started off as hosting game servers for friends in early high school. It expanded to friends of friends for a while before I pivoted more toward crypto mining (yes, I know, young kid with dubious ambitions). A lot of my first software experience came from here: I wrote some software for switching processes between mining and game hosting across the different boxes (raw processes on servers, the horror). Nowadays it's winding down to just game hosting and some scientific computing rentals to a few universities and their robotics clubs. It's still slightly profitable, but I have no interest in updating the servers (CPUs from 2013-2015, GPUs mostly resold except for what a few people requested), and it runs everything via containers now.
2) ($$k/mo) Sports film review. I wrote the first version of this my first year of college: a way to keep the stat book for basketball and football games and stitch the recorded actions together with the video footage. We had customers throughout ~20 states, primarily high schools but some colleges as well. In fact it still runs at a lot of them, but I'm not really connected with the management of it anymore. My co-founder still runs it, and we rotate a few students from our alma mater in as interns and occasionally juniors.
This became the basis for an esports version of the software that I created a few years ago, this time using computer vision to gather all the stats, letting users jump around in videos and analyze overall stats from the output. It started with Call of Duty for their then-new professional league but has since expanded to Halo, Rocket League, and Valorant. I still do some occasional retraining of the models, but the product itself got acqui-hired by a larger company, for which I still "consult".
EDIT: I’ve also had many more that cost me more money than they ever made, but I’m a big proponent of failing fast and iterating
An example from a previous job where we used a hand-tooled NLP system was querying for doctors/dentists/optometrists: taking something like "dentist near me who is available in the afternoons and speaks Spanish", we would parse the user query into a few different queries that ran against a search cluster and database to return the filtered result set, or the closest output.
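To make the shape of that hand-tooled parse concrete, here's a minimal sketch of the kind of keyword/rule extraction such a system might do. Everything here (`ParsedQuery`, the `SPECIALTIES`/`LANGUAGES` vocabularies) is illustrative, not from the actual system:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies the rules match against.
SPECIALTIES = {"dentist", "doctor", "optometrist"}
LANGUAGES = {"spanish", "mandarin", "vietnamese"}

@dataclass
class ParsedQuery:
    specialty: Optional[str] = None
    language: Optional[str] = None
    near_me: bool = False
    time_of_day: Optional[str] = None

def parse(query: str) -> ParsedQuery:
    q = query.lower()
    parsed = ParsedQuery()
    # Keyword matching against the fixed vocabularies.
    for word in re.findall(r"[a-z]+", q):
        if word in SPECIALTIES:
            parsed.specialty = word
        elif word in LANGUAGES:
            parsed.language = word
    # Phrase-level rules for geography and availability.
    parsed.near_me = "near me" in q
    for tod in ("morning", "afternoon", "evening"):
        if tod in q:
            parsed.time_of_day = tod
    return parsed

print(parse("dentist near me who is available in the afternoons and speaks spanish"))
```

Each populated field then fans out to the corresponding search-cluster or database filter.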
What would be the ideal way to prepare or tokenize this data for querying with an LLM? It's partially text (matching "dentist", "speaks Spanish"), partially geographic ("near me", i.e. a geo radius of N miles from the provider's location), and partially filtering (who meets all those criteria and has availability in a time frame). Is this a use case for large context windows, to be able to take in all possible providers? Or for parsing the query from human language -> SQL/other data store query language? Or perhaps for figuring out another way to encode this data?
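One common shape for the second option (human language -> query language) is to have the LLM emit JSON against a fixed schema, validate it, and compile it to SQL yourself, rather than letting the model write SQL directly. A rough sketch, where the schema fields, the canned `model_output`, and the `ST_DWithin`-style geo predicate are all assumptions standing in for whatever your store actually supports:

```python
import json

# Fields the model is allowed to emit; anything else is rejected.
SCHEMA_FIELDS = {"specialty", "language", "radius_miles", "available"}

# Stand-in for what an LLM would be prompted to return for:
# "dentist near me who is available in the afternoons and speaks spanish"
model_output = """
{"specialty": "dentist", "language": "spanish",
 "radius_miles": 10, "available": "afternoon"}
"""

def to_sql(raw: str):
    parsed = json.loads(raw)
    unknown = set(parsed) - SCHEMA_FIELDS
    if unknown:
        raise ValueError(f"model emitted unexpected fields: {unknown}")
    clauses, params = [], []
    if parsed.get("specialty"):
        clauses.append("specialty = ?")
        params.append(parsed["specialty"])
    if parsed.get("language"):
        clauses.append("? = ANY(languages)")
        params.append(parsed["language"])
    if parsed.get("available"):
        clauses.append("? = ANY(availability)")
        params.append(parsed["available"])
    # Geo radius delegated to a PostGIS-style predicate; "<user_point>" is a
    # placeholder for the caller's location, miles converted to meters.
    if parsed.get("radius_miles"):
        clauses.append("ST_DWithin(location, ?, ?)")
        params += ["<user_point>", parsed["radius_miles"] * 1609.34]
    return "SELECT * FROM providers WHERE " + " AND ".join(clauses), params

sql, params = to_sql(model_output)
print(sql)
```

The appeal of this over stuffing all providers into a large context window is that the data never leaves your database: the model only sees the query, and the validation step keeps a hallucinated field from ever reaching the SQL layer.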