1 Comment

Phenomenal episode where Mark provides excellent explanations of scaling LLMs to large context windows, LLM evals, tuning the theta hyperparameter, RULER, and Gradient's training of Llama-3 70B with a 1M-token context window. Add this to your MUST listen/read list!
