tovacinni commented on Launch HN: Outerport (YC S24) – Instant hot-swapping for AI model weights · Posted by u/tovacinni
zackangelo · a year ago
Is this tied to a specific framework like pytorch or an inference server like vLLM?

Our inference stack is built using candle in Rust; how hard would it be to integrate?

tovacinni · a year ago
We'd just need to write a Rust client for the daemon and load the weights in a way that's compatible with candle; we can definitely look into this, since parts of what we're building are already in Rust!
tovacinni commented on Launch HN: Outerport (YC S24) – Instant hot-swapping for AI model weights · Posted by u/tovacinni
phyalow · a year ago
Genuine question: what's the difference between your startup and just calling the code below with a different model on a cloud machine, other than some ML/DevOps engineer not knowing what they are doing...?

  import torch

  def predict(X, model_config, model_path):
      # get_model is the commenter's own helper; the '_orig_mod.' prefixes
      # come from checkpoints saved after torch.compile
      model = get_model(*model_config)
      state_dict = torch.load(model_path, weights_only=True)
      new_state_dict = {k.replace('_orig_mod.', ''): v for k, v in state_dict.items()}
      model.load_state_dict(new_state_dict)
      model.eval()
      with torch.no_grad():
          output = model(torch.FloatTensor(X))
      probabilities = torch.softmax(output, dim=-1)
      return probabilities.numpy()

tovacinni · a year ago
The advantage of loading from a daemon, over loading all the weights at once in Python, is that it can serve multiple processes, or even the same process run consecutively (if it dies, or has to switch to something else).

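To sketch the idea with stock PyTorch (a toy stand-in, not our actual daemon; the model and sizes here are made up): a long-lived process can hold a state dict in shared host memory, and short-lived workers can then load it without ever touching disk.

  import torch
  import torch.multiprocessing as mp

  def inference_worker(shared_state):
      # Short-lived worker: loads weights that are already resident in
      # host RAM, so a crash or a model switch never re-reads the disk.
      model = torch.nn.Linear(1024, 1024)
      model.load_state_dict(shared_state)
      model.eval()

  if __name__ == "__main__":
      # Stand-in for the daemon: hold the weights once, in shared memory.
      state = torch.nn.Linear(1024, 1024).state_dict()
      for tensor in state.values():
          tensor.share_memory_()  # make the tensor visible to child processes
      worker = mp.Process(target=inference_worker, args=(state,))
      worker.start()
      worker.join()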
Loading from disk to VRAM can be super slow, so doing this every time you have a new process is wasteful. Instead, if you have a daemon process that keeps multiple models' weights in pinned RAM, you can load them much more quickly (~1.5 seconds for an 8B model, like we show in the demo).

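The pinned-RAM part is standard CUDA behavior: page-locked host memory can be copied to the GPU asynchronously, and much faster than pageable memory. A minimal sketch in plain PyTorch (the checkpoint path is made up):

  import torch

  # Stage the checkpoint in page-locked (pinned) host RAM once...
  state_dict = torch.load("model.pt", map_location="cpu", weights_only=True)
  pinned = {k: v.pin_memory() for k, v in state_dict.items()}

  # ...then every later disk-free "load" is just a fast host-to-device copy.
  device = torch.device("cuda")
  gpu_state = {k: v.to(device, non_blocking=True) for k, v in pinned.items()}
  torch.cuda.synchronize()  # wait for the async copies to finish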
You _could_ also make a single mega router process, but then there are issues like all services needing to agree on dependency versions. This has been a problem for me in the past (e.g. LAVIS requiring a certain transformers version that was not compatible with some other diffusion libraries).

tovacinni commented on Launch HN: Outerport (YC S24) – Instant hot-swapping for AI model weights · Posted by u/tovacinni
volkopat · a year ago
This is really exciting! I was hoping for someone to tackle inference time and this product will definitely be a boost to some of our use cases in medical imaging.
tovacinni · a year ago
Awesome to hear! That sounds like an application we'd love to help with!

(Please feel free to reach out to us too at towaki@outerport.com !)

tovacinni commented on Launch HN: Outerport (YC S24) – Instant hot-swapping for AI model weights · Posted by u/tovacinni
astroalex · a year ago
Cool! Will this work for multi-GPU inference?
tovacinni · a year ago
Yep, it'll work for multi-GPU as well!
tovacinni commented on Launch HN: Outerport (YC S24) – Instant hot-swapping for AI model weights · Posted by u/tovacinni
parrot987 · a year ago
This looks awesome! Will try it out.
tovacinni · a year ago
Thanks!!
tovacinni commented on Launch HN: Outerport (YC S24) – Instant hot-swapping for AI model weights · Posted by u/tovacinni
CuriouslyC · a year ago
This seems useful but honestly I think you guys are better off getting IP protection and licensing out the technology. This is a classic "feature not a product" and I don't see you competing against google/microsoft/huggingface in the model management space.
tovacinni · a year ago
Maybe! Many people don't want vendor lock-in, though, and there are new GPU cloud providers gaining traction. Some still prefer on-prem.

We hope to make it easier to bridge the multi-cloud landscape by being independent and 'outer'.

u/tovacinni

Karma: 79 · Cake day: March 11, 2017

About: gpu sararīman (https://tovacinni.github.io)