import torch

def predict_proba(model_config, model_path, X):  # wrapper name is illustrative
    model = get_model(*model_config)
    # strip the '_orig_mod.' prefix that torch.compile adds to checkpoint keys
    state_dict = torch.load(model_path, weights_only=True)
    new_state_dict = {k.replace('_orig_mod.', ''): v for k, v in state_dict.items()}
    model.load_state_dict(new_state_dict)
    model.eval()
    with torch.no_grad():
        output = model(torch.FloatTensor(X))
        # softmax over the class (last) dimension
        probabilities = torch.softmax(output, dim=-1)
    return probabilities.numpy()
Loading from disk to VRAM can be super slow, so doing it every time you spin up a new process is wasteful. Instead, if you have a daemon process that keeps multiple models' weights in pinned RAM, you can load them much more quickly (~1.5 seconds for an 8B model, as we show in the demo).
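For concreteness, here is a rough sketch of that pinned-RAM pattern in plain PyTorch (the cache and the `cache_weights` / `attach_weights` helpers are hypothetical names for illustration, not Outerport's actual API): pay the disk read once, keep the tensors in page-locked host memory, and serve later loads with fast host-to-device copies instead of hitting disk again.

import torch

# Hypothetical sketch, not Outerport's API: checkpoints read once from disk
# and held in pinned (page-locked) CPU RAM, keyed by model name.
_pinned_cache = {}

def cache_weights(name, model_path):
    state_dict = torch.load(model_path, map_location='cpu', weights_only=True)
    # pin_memory() gives DMA-friendly pages, enabling fast async host-to-device copies
    _pinned_cache[name] = {k: v.pin_memory() for k, v in state_dict.items()}

def attach_weights(name, model):
    # Copy pinned host tensors into VRAM; non_blocking overlaps the transfers
    gpu_state = {k: v.to('cuda', non_blocking=True) for k, v in _pinned_cache[name].items()}
    torch.cuda.synchronize()
    model.to('cuda')
    model.load_state_dict(gpu_state)
    return model

In this sketch the slow disk read happens only in cache_weights; every subsequent attach_weights call is bounded by PCIe bandwidth rather than disk throughput, which is where the bulk of the speedup comes from.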
You _could_ also make a single mega router process, but then there are issues like all services needing to agree on dependency versions. This has been a problem for me in the past (e.g., LAVIS requiring a certain transformers version that was not compatible with some other diffusion libraries).
(Please feel free to reach out to us too at towaki@outerport.com !)
We hope to make it easier to bridge the multi-cloud landscape by being independent and 'outer'.
Our inference stack is built using candle in Rust; how hard would it be to integrate?