2024-01 Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

https://arxiv.org/abs/2401.10774

Medusa, inference optimization, LLM, 2024, speculative decoding

Last modified: 2024/03/23 02:42 by 127.0.0.1