Trying something new, going to pin this thread as a place for beginners to ask what may or may not be stupid questions, to encourage both the asking and answering.

Depending on activity level I’ll either make a new one once in a while, or I’ll just leave this one up forever to be a place to learn and ask.

When asking a question, try to make it clear what your current knowledge level is and where you may have gaps; that should help people provide more useful, concise answers!

  • corvus@lemmy.ml · 20 hours ago

    Yeah, I tested with lower numbers and it works. I just wanted to offload the whole model, thinking it would fit, since 2GB is a lot. With other models it reports only about 250MB when it fails, and if you add up the model size it’s still well below the iGPU’s free memory, so I don’t get it… Anyway, I was thinking about upgrading the memory to 32GB or maybe 64GB, but I hesitate: with models around 7GB on CPU only I get around 5 t/s, and with 14GB models 2–3 t/s, so if I run one around 30GB I guess it will drop to about 1 t/s? My supposition is that increasing RAM doesn’t increase performance per se, it just lets you load bigger models into memory, so performance is approximately linear in model size… what do you think?

    • hendrik@palaver.p3x.de · 20 hours ago (edited)

      From what I know, I’d assume yes: speed should scale roughly linearly with model size, so twice the model means about half the tokens per second. Maybe there is some small additional overhead making it a bit faster or slower than expected. But I’m really not an expert on the maths, so don’t trust me.
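
      A quick back-of-the-envelope check, just a sketch under my own assumptions (not measured on your machine): on CPU, token generation is usually limited by memory bandwidth, because the whole model has to be streamed from RAM once per generated token, so t/s ≈ bandwidth / model size. In Python, with a guessed bandwidth figure:

          # Rough estimate only: assumes token generation is memory-bandwidth bound,
          # i.e. the whole model is streamed from RAM once per generated token.
          bandwidth_gb_s = 35  # guessed effective RAM bandwidth in GB/s; measure your own

          for model_size_gb in (7, 14, 30):
              tokens_per_s = bandwidth_gb_s / model_size_gb
              print(f"{model_size_gb:>3} GB model -> ~{tokens_per_s:.1f} t/s")

      With 35 GB/s that gives roughly 5 t/s at 7GB and 2.5 t/s at 14GB, which matches your numbers, so a ~30GB model would land somewhere around 1 t/s on the same machine.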

      And maybe have a look at this bug report: https://github.com/ggml-org/llama.cpp/issues/11332
      I think it matches your situation. They work around it by adjusting the batch size, and someone recommends not using Vulkan on an iGPU at all.
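
      If you happen to use the llama-cpp-python bindings rather than the CLI (an assumption on my part; the CLI has equivalent flags), the knobs for both workarounds look roughly like this. The values are just examples, not recommendations:

          # Sketch only, not a tuned configuration.
          from llama_cpp import Llama

          llm = Llama(
              model_path="model.gguf",  # hypothetical path, point it at your own GGUF file
              n_gpu_layers=0,           # offload nothing, i.e. effectively CPU-only instead of the iGPU
              n_batch=256,              # adjust the batch size, as discussed in the issue
          )
          print(llm("Hello", max_tokens=16)["choices"][0]["text"])

      Easiest is probably to try the batch size first and only fall back to CPU-only if the iGPU keeps failing to allocate.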