Is finetuning GPT4o worth it? — with Alistair…

Aug 22, 2024

How Cosine Genie reached 50% on SWE-Bench Lite, 30% on the full SWE-Bench, and 44% on OpenAI's new SWE-Bench Verified: all state-of-the-art results, by the widest margin ever recorded.

2 Comments

Anthony Garland

Sep 9

An interesting observation from the talk is that SWE requires thinking and acting like a human engineer. I wonder whether there are non-human-like workflows that would be very different but better?

Nathan Lambert

Aug 23

Another good episode. I appreciate it making what an “agent” is a bit more scientific. I’m trying to expand into the space a bit in my work.