LLM Inference with Ray: Expert parallelism and prefill/decode disaggregation

anyscale.com

1 point by mycelia 6 hours ago