• kromem@lemmy.world
    9 months ago

    For anyone interested in algorithmic changes that improve efficiency, Microsoft’s recent research on moving from floating-point weights to ternary ones (-1, 0, 1) was really impressive:

    https://arxiv.org/abs/2402.17764

    Basically, at larger parameter sizes it matches or outperforms full-precision networks while using a fraction of the memory, and it replaces the multiplications inside matrix products with simple additions.
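    To see why the multiplies disappear, here’s a rough sketch (my own illustration, not code from the paper): multiplying by a weight of +1, 0, or -1 is just add, skip, or subtract.

    ```python
    import numpy as np

    # Illustrative sketch: with ternary weights in {-1, 0, +1}, a matrix-vector
    # product needs no multiplications -- add inputs where the weight is +1,
    # subtract where it is -1, and skip zeros entirely.
    def ternary_matvec(W, x):
        out = np.zeros(W.shape[0], dtype=x.dtype)
        for i in range(W.shape[0]):
            row = W[i]
            out[i] = x[row == 1].sum() - x[row == -1].sum()
        return out

    W = np.array([[1, 0, -1],
                  [0, 1, 1]])
    x = np.array([2.0, 3.0, 4.0])
    print(ternary_matvec(W, x))  # same result as W @ x
    ```

    Real kernels would vectorize this instead of looping, but the arithmetic reduction is the point.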

    It kind of makes sense that it works, too. Past research suggests these networks build a virtualized node topology out of combinations of physical nodes, so with enough nodes to work with there isn’t a loss in functionality, and discrete weights should settle at optimal thresholds more easily than slight adjustments to floating-point values would.

    The next generation of models built on this will need to be trained from scratch (this is about pretraining, not quantization after the fact), but it should open the door to new hardware architectures better optimized for networks of ternary weights.
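    For the curious, my reading of the paper’s “absmean” ternarization step looks roughly like this (a hedged sketch, not the authors’ code; during pretraining this runs alongside full-precision latent weights and a straight-through estimator, both omitted here):

    ```python
    import numpy as np

    # Sketch of absmean ternarization: scale the weight tensor by its mean
    # absolute value, then round and clip every entry to {-1, 0, +1}.
    def absmean_ternarize(W, eps=1e-5):
        gamma = np.abs(W).mean()                        # per-tensor scale
        Wq = np.clip(np.round(W / (gamma + eps)), -1, 1)
        return Wq, gamma

    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 4))
    Wq, gamma = absmean_ternarize(W)
    print(np.unique(Wq))  # only values drawn from {-1, 0, 1}
    ```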

  • Lettuce eat lettuce@lemmy.ml
    9 months ago

    Same pattern as crypto. Hype the tech, spend millions developing chips that can only do one specific thing, deploy them in a virtual gold rush, and eventually the bubble pops and the last fools are left holding the bag, trying to offload stacks of now-worthless ASICs.

    Billions wasted trying to capitalize on “AI” that will largely cause more harm than good.

    • mesamune@lemmy.world
      9 months ago

      Crypto did have ASIC miners (I think I have that right?) that were much, much faster than GPUs after the initial push. I would expect the AI craze to do the same.