I quickly realised that these conversations had value beyond the two of us - pretty much everyone else being onboarded had similar questions. Some were about pure onboarding friction, some were about workflows most folks didn't know existed, some were about theoretical concepts.
So I moved the questions to a public (within the company) channel and called it "Marek's Bitching" - because that's what it was: pretty much me complaining and moaning and asking annoying questions. I invited more London folks (Zygis), and before I knew it, half the company had joined.
It had tremendous value. It captured all the things that didn't have a real home anywhere else in the company, from technical novelties to discussions that escaped the org structure - at one point we suspected Intel firmware bugs, but that fell outside any specific team's remit at the time.
Then the channel was renamed to something more palatable - "Marek's technical corner" - and it held a clear place in the company's technical culture for more than a decade.
So yes, it's important to have a place to ramble, and it's important to have "your own channel" where there's less friction and stigma around asking stupid questions and complaining. Personal channels might be overkill, but a per-team or per-location "rambling/bitching" channel is a good idea.
I'm not familiar enough with Python or Jupyter to know how you would build similar visualizations with them. What would you use?
On the front end I've always had reasonable outcomes with `wandb` for tracking runs once you kind of get it all set up nicely, but there's a long tail of configuration and a bunch of glue code to write.
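The per-run glue itself is small; it's everything around it that sprawls. A minimal sketch of the core loop (the project name, config, and the fake training step are all made up here):

```python
import math
import wandb

def train_step(step: int) -> float:
    """Stand-in for a real training step; returns a fake decaying loss."""
    return math.exp(-step / 100)

# Hypothetical project name and config, just to show the shape of it.
run = wandb.init(project="toy-regressions", config={"n_iter": 500})
for step in range(run.config["n_iter"]):
    wandb.log({"loss": train_step(step)})  # the "log a row" call
run.finish()
```

Multiply that by sweeps, artifacts, and getting every new person authenticated, and you get the long tail I mean.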
In this situation I'm dealing with a pretty medium amount of data and very modest model training needs (closer to `sklearn` than some mega-CUDA thing), and it feels like I should be able to hand someone the company card and just buy one of those things with 7 programming languages tabbed across the top of the monospace "here's how to log a row" box. We do Smart Things, now you have this awesome web dashboard, and you can give your quants this `curl foo | sh` snippet and their VSCode Jupyter will be awesome.
I'm interested in your opinion as a user on a bit of a new conundrum for me: in every job / contract I can remember, the data science was central enough that we built the stack ourselves from, like, the object store up.
But in my current role, I'm managing a whole different kind of infrastructure that pulls in very different directions, and the people who need to interact with the data range from full-time quants to people with very little programming experience, so I'm kinda peeking around for an all-in-one solution. Log the rows here, connect the notebook there, right this way to your comprehensive dashboards and graphs with great defaults.
Is this what I should be looking at? The code that needs to run on the data is your standard statistical and numerics Python-type stuff (and if R were available it would probably get used, but I don't need it): I need a dataframe of all the foo from date to date, I want to run a regression, and maybe set up a little Monte Carlo thing. Hey, that one is really useful - let's compute it every night and put it on the wall.
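To make that concrete, each of those questions is roughly this much code (a sketch with pandas/sklearn/numpy; the parquet file and column names are invented):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# "A dataframe of all the foo from date to date" - file/columns hypothetical.
df = pd.read_parquet("foo.parquet")
df = df[(df["date"] >= "2024-01-01") & (df["date"] < "2024-07-01")]

# The regression.
model = LinearRegression().fit(df[["x"]], df["y"])
fitted = model.predict(df[["x"]])

# The little Monte Carlo thing: residual bootstrap for a band on the slope.
rng = np.random.default_rng(0)
resid = (df["y"] - fitted).to_numpy()
slopes = [
    LinearRegression().fit(df[["x"]], fitted + rng.choice(resid, len(df))).coef_[0]
    for _ in range(1000)
]
print(model.coef_[0], np.percentile(slopes, [2.5, 97.5]))
```

The "compute it every night and put it on the wall" part is exactly the bit I don't want to build myself.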
I think we'd pay a lot for an answer here, and I really don't want to, like, break out pyarrow and start setting up tables.
The other big thing Domino isn't is a database or data warehouse. You pair it with something like BigQuery, Snowflake, or plain S3, and it takes a huge amount of the headache out of using those things for the staff you're describing. The best way to understand it is to just look at this page: https://docs.dominodatalab.com/en/cloud/user_guide/fa5f3a/us...
People at my work, myself included, absolutely love this feature. We have an incredibly strict and complex cloud environment, and this makes it so people can skip the setup nonsense and it just works.
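For contrast, this is roughly the boilerplate every analyst otherwise wires up by hand (a sketch against BigQuery's Python client; the key path, project id, and table are all made up):

```python
# Without a connector layer, each person manages their own credentials.
import os
from google.cloud import bigquery

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/me/keys/bq-sa.json"  # hypothetical key
client = bigquery.Client(project="my-analytics-project")  # hypothetical project
df = client.query("SELECT * FROM analytics.foo WHERE date >= '2024-01-01'").to_dataframe()
```

Multiply that by service accounts, network rules, and rotation policy in a locked-down cloud and you see why "skip the setup nonsense" matters.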
This isn't to say you can't store data in Domino; it's just not a SQL engine. Another well-loved feature is their datasets: it's essentially EFS masquerading as NFS, but Domino handles the permissions and mounting. It's great for non-SQL file storage. https://docs.dominodatalab.com/en/cloud/user_guide/6942ab/us...
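In practice that means your code just reads files off a path; something like this (a sketch - the mount point and file name are assumptions, check your deployment's docs for the real path):

```python
# Reading straight off a mounted Domino dataset - no boto3, no credentials.
from pathlib import Path
import pandas as pd

data_dir = Path("/domino/datasets/local/research-shared")  # hypothetical dataset name
df = pd.read_csv(data_dir / "fills_2024.csv")              # hypothetical file
```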
So, with those constraints in mind, I'd say it's great for what you're describing. You can deploy apps or API endpoints. You can create on-demand, large-scale clusters - we have people using Spark, Ray, Dask, and MPI. You can schedule jobs, and you can interact with the whole platform programmatically.
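From the notebook side, the cluster bit feels like ordinary Dask once the platform hands you a scheduler; roughly like this (a sketch - the env var, bucket, and columns are assumptions about the setup, not Domino specifics):

```python
# Attach to an already-provisioned Dask cluster and run a distributed groupby.
import os
import dask.dataframe as dd
from dask.distributed import Client

client = Client(os.environ["DASK_SCHEDULER_ADDRESS"])  # assumed injected by the platform
ddf = dd.read_parquet("s3://my-bucket/foo/")           # hypothetical bucket
print(ddf.groupby("symbol")["pnl"].mean().compute())   # hypothetical columns
```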