Scaling Llama 3 beyond a 1M-token context window with ~perfect utilization, the difference between ALiBi and RoPE, how to use GPT-4 to create synthetic data for your context-extension finetunes, and more!
Phenomenal episode where Mark provides excellent explanations of scaling LLM context windows, LLM evals, tuning the theta hyperparameter, RULER, and Gradient's training of Llama-3 70B to a 1M-token context window. Add this to your MUST listen/read list!
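Since the episode touches on tuning RoPE's theta (base frequency) for context extension, here is a minimal sketch of the idea. It assumes the standard RoPE inverse-frequency formulation; the specific theta values are illustrative only, not Gradient's exact recipe from the episode.

```python
# Minimal sketch: how RoPE "theta" (base frequency) scaling relates to context extension.
# Assumption: standard RoPE inverse-frequency formula; the theta values below are
# illustrative, not the exact numbers discussed in the episode.
import numpy as np

def rope_inv_freq(head_dim: int, theta: float) -> np.ndarray:
    """Per-dimension rotation frequencies: theta ** (-2i / head_dim)."""
    return 1.0 / (theta ** (np.arange(0, head_dim, 2) / head_dim))

head_dim = 128
default = rope_inv_freq(head_dim, theta=500_000.0)       # Llama-3 default rope_theta
extended = rope_inv_freq(head_dim, theta=10_000_000.0)    # larger theta -> slower rotation

# A larger theta makes each dimension rotate more slowly with position, so the
# angles seen at very long positions stay closer to the range the model already
# learned -- the intuition behind raising theta before long-context finetuning.
print(default[-1], extended[-1])  # the lowest frequency shrinks as theta grows
```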