
The rule of thumb is that inference speed halves with every doubling of parameter count (which, obviously, also doubles the memory footprint). This follows from autoregressive decoding being memory-bandwidth bound: each generated token requires streaming all of the weights through memory once.
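
A quick back-of-the-envelope sketch of that rule, assuming decoding is purely weight-read bound (the bandwidth figure and quantization width below are hypothetical, just for illustration):

    # Upper-bound decode speed when reading the weights dominates:
    # tokens/s ~= memory bandwidth / model size in bytes.
    def tokens_per_second(num_params: float,
                          bytes_per_param: float,
                          bandwidth_gb_s: float) -> float:
        model_bytes = num_params * bytes_per_param
        return (bandwidth_gb_s * 1e9) / model_bytes

    # Assumed example: 4-bit weights (~0.5 bytes/param) on a device
    # with ~100 GB/s of memory bandwidth.
    print(tokens_per_second(7e9, 0.5, 100))   # ~28.6 tok/s
    print(tokens_per_second(14e9, 0.5, 100))  # ~14.3 tok/s, half the speed

Doubling the parameter count doubles the bytes read per token, so the estimated tokens/s halves, matching the rule of thumb.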

You can check out real-world performance on various devices here: https://llm.mlc.ai/


