What I Read: Optimizing your LLM in production
https://huggingface.co/blog/optimize-llm
Optimizing your LLM in production
September 15, 2023
Patrick von Platen
"...efficient LLM deployment.... pros and cons of adopting lower precision, provide a comprehensive exploration of the latest attention algorithms, and discuss improved LLM architectures."