I’ve been looking into self-hosting LLMs or Stable Diffusion models using something like LocalAI and/or Ollama and LibreChat.

Some questions to get a nice discussion going:

  • Any of you have experience with this?
  • What are your motivations?
  • What are you using in terms of hardware?
  • Considerations regarding energy efficiency and associated costs?
  • What about renting a GPU? Privacy implications?

  • robber@lemmy.ml (OP) · 6 months ago

    Thanks! Glad to see the 8x7B isn’t performing too badly. I assume that’s a Mistral model? Also, do you know whether the CPU significantly affects inference speed in such a setup?

    • Audalin@lemmy.world · 6 months ago

      If your CPU isn’t ancient, it’s mostly about memory speed, since generating each token means reading the model weights from wherever they are stored. VRAM is very fast, DDR5 RAM is reasonably fast, and swap is slow even on a modern SSD.

      8x7B is Mixtral, yeah.
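
To put rough numbers on the memory-speed point, here is a back-of-envelope sketch. It assumes token generation is purely bandwidth-bound, so the ceiling on tokens per second is simply memory bandwidth divided by the amount of weight data read per token. The bandwidth figures and the 7 GB per token are illustrative assumptions, not measurements of any particular setup, and the function name is made up for this example.

```python
def max_tokens_per_second(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on generation speed if each token requires streaming
    gb_read_per_token gigabytes of weights from memory exactly once."""
    if bandwidth_gb_s <= 0 or gb_read_per_token <= 0:
        raise ValueError("bandwidth and data read per token must be positive")
    return bandwidth_gb_s / gb_read_per_token


# Assumed, order-of-magnitude peak bandwidths in GB/s (not benchmarks).
ASSUMED_BANDWIDTH_GB_S = {
    "GPU VRAM": 900.0,
    "dual-channel DDR5": 80.0,
    "swap on NVMe SSD": 5.0,
}

# Assumed amount of weight data touched per generated token, in GB.
# The real value depends on the model, its quantization, and how many
# parameters are active per token.
ASSUMED_GB_READ_PER_TOKEN = 7.0


def estimate_all(gb_read_per_token: float = ASSUMED_GB_READ_PER_TOKEN) -> dict:
    """Return the bandwidth-bound ceiling for each assumed memory tier."""
    return {
        tier: max_tokens_per_second(bandwidth, gb_read_per_token)
        for tier, bandwidth in ASSUMED_BANDWIDTH_GB_S.items()
    }


if __name__ == "__main__":
    for tier, tokens_per_s in estimate_all().items():
        print(f"{tier}: at most ~{tokens_per_s:.1f} tokens/s")
```

Real throughput will land below these ceilings, but the ratios between the tiers are the point: with the same model, the slower the memory it sits in, the slower the output, almost regardless of the CPU.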