How to Run a Paper Club
Your ultimate Paper Club Starter Kit, from your friends at the Latent Space Paper Club, where we have now read >100 papers. Also: Announcing Latent Space Paper Club LIVE! at NeurIPS 2024! Join us!
We are excited to announce Latent Space LIVE at NeurIPS 2024! This will be the AI Engineer-focused remote+IRL event complementing NeurIPS with 3 categories:
Too Hot For NeurIPS (papers too new/rejected unfairly for NeurIPS),
Best Papers of 2024 (survey talks presenting top papers of the year in a domain, and audience will vote winners), and
Oxford Style Debates (controversial motions for debate with people arguing for/against).
Sign up here for the livestream + to be first for tickets, and apply to speak!!
We’ve really enjoyed running both the Latent Space academic conference coverage (NeurIPS 2023, ICLR 2024) and the Latent Space Paper Club (this essay) and view them as an essential part of our AI Engineering education. The Why of Paper Clubs is simple: 1) shared accountability, 2) learning to read papers and familiarize yourself with a field by spaced repetition, 3) staying close to the “ground truth” of papers rather than shallow social media discussion, 4) developing a critical mind that is as aware of the flaws in many papers as of the quality effort in them. We’ve greatly benefited from crossing the 100-paper mark at the LSPC and wanted to share some advice for others.
Eugene Yan wrote most of the following guide (xposted to his blog in original form), with Swyx’s comments in italics. See you in Vancouver!
Over the past 18 months, the Latent Space Paper Club (LSPC) has had an unbroken streak of hosting a well-attended online LLM paper club every single week (the attendee list went up to 1000 before culling inactives, and is now back at >450).
That's at least 80 papers, and likely >100, when we consider weeks when we cover multiple related papers (e.g., LoRA + QLoRA, or survey paper days). Together, we pre-read and discuss a paper weekly, covering the fundamentals such as:
Key components: Attention, LayerNorm, FlashAttention, LoRA/QLoRA, ALiBi, RoPE
Models: Transformer, BERT, T5, GPTs, Codex, LLaMAs, Mistral, RWKV, Jamba, Mamba, MoEs, CLIP, Whisper, Moshi, Molmo, the Consistency Models papers, TimeGPT
Training: InstructGPT, RLHF, RLAIF, PPO, REST, SPIN, Self-Play, Orca, Upcycling
Inference: Speculative decoding, Writing in the Margins, Test-time Compute
Trends: Scaling laws, Chinchilla, synthetic data, Gorilla, generative agents
Practice: RAG, EvalGen, DocETL, Copilot, Finetuning vs. RAG, industry surveys
Here's a year's worth of papers, starting from the basics of attention. You can also see:
swyx’s ai-notes or for a more curated version
This has equipped us with the foundation to understand how these techniques and models work, build systems and applications on top of them, and apply them effectively at work and for personal projects. But it's not just about technical knowledge. We've also benefitted from practitioners (including paper authors discussing their own papers with us) sharing their insider know-how, built friendships at in-person meetups, and grown a community of learners and builders.
We'd like you to benefit from this too. That's why we wrote this guide to help you start your own paper club and start learning with your peers.
The First Rule of Paper Club is… Start Now
Step one is to just start. It helps to have supportive friends/coworkers:
Inside a Single Paper Club meeting
Here we discuss some mechanics that we worked out over time:
Regular Schedules. Every Wednesday, we gather for an hour over lunch[1] to discuss a pre-selected paper. (swyx: The weekly cadence offers more connection and space for covering more papers than biweekly or monthly.)
Seek Next Week Volunteers. Paper Clubs live and die by their facilitators. Where we used to scramble last minute in the days prior to get volunteers, we’ve started to ask for volunteers upfront DURING a paper club, so as to establish the routine of expectations. We’ve not yet cracked how to continually encourage new voices and faces to volunteer, short of making it a rather draconian requirement of joining the club.
Pre-reading. Even when you are not presenting the paper, ESPECIALLY when you are not presenting, it helps to pre-read the paper to get the most out of the session. I usually do it the weekend before, and it takes me around an hour using Zotero (swyx: I also use EmergentMind, made by friend of the LSPC Matt Mazur). Reading the paper beforehand prepares you for the session proper, where you can clarify questions or doubts, share insights, and help others understand the material. By my estimate (and experience), skipping the pre-read reduces the value you'll get from the discussion by ~80% or more. (swyx: It also makes for less insightful discussion if you haven’t come up with independent thoughts or critiques of the paper to discuss with the presenter.)
More tips on How To Read Papers in the following section!
Facilitating. Each week, a volunteer guides the group through the paper, covering the motivation, related literature, methodology, results, etc. This usually takes around 45 minutes, with pauses after each section for quick questions. In the last 15 minutes, we have free-form discussion and discuss the implications of the paper, connect it to other work, and consider how to apply the ideas to our work and the broader industry.
More tips on How To Facilitate Papers below!
How to Run A Paper Club for >100 Papers
Selecting papers. Volunteers get to pick whatever paper they want to facilitate, within the overall theme. While our focus is language modeling, we occasionally explore key papers from other domains like vision (e.g., CLIP, ViT, LCMs), audio (e.g., Whisper, Moshi), and RL (e.g., DPO, PPO).
Swyx: The more theoretical/mathematical the paper gets, the harder it is to cover. This doesn’t mean such papers should be avoided, but some care needs to be taken not to leave the group behind or so lost that they lose interest. Curation is the single biggest week-to-week lever for the Paper Club experience. This is effectively a “high frequency book club”, so standard advice on how to select books for a book club may apply. We use Sli.do as both a paper idea buffer and a voting mechanism, but this is often overridden by 1) volunteers covering older papers, 2) more recent, obviously important papers that “skip to the front of the line”, e.g. the Llama 3 papers.
Since curation is so important, it’s worth sharing some thoughts on how to quickly evaluate a paper as a candidate for a paper club (without even reading it first):
Is it likely to be relevant in 1-2 years? The total surface area of your knowledge is a function of how much you gain per unit time (say via a regular Paper Club habit) vs how much becomes irrelevant per unit time. Nobody can perfectly predict long-term relevance[2], but you can likely predict whether you’ll still be referencing knowledge from a paper a year out (and if you can’t, it’s worth developing this estimator for your own mental health). Some reliable genres of long-lived papers:
it was already relevant 1-3 years ago and is still relevant today (Lindy Effect)
paper for a notable model (everything from GPT1 to GPT4 to Llama to OLMo)
paper for a notable benchmark (that ~everyone uses, e.g. SWEBench)
paper for a notable dataset (that ~everyone uses, e.g. FineWeb)
scaling laws papers (usually has tons of great ablations)
survey paper covering all relevant papers in an important field (e.g. Prompting)
technique Papers With Code (demonstrates seriousness, e.g. FlashAttention)
Does it come from reputable authors? We are against strict credentialism as a filter because good ideas can come from anywhere, but it should be uncontroversial that good institutions and authors with a good track record are simply much more likely to birth good papers than those without one. Given you have ~50 slots a year, and there are 500-5000 good authors, the vast majority of the papers you read should probably come from reputable authors[3] rather than standard “grad student slop” (it is impolite to point out specific examples, but the sooner you learn to tell “grad student kino” from “grad student slop” the better). Spend your “lowbie cards” wisely.
Good Word of Mouth. Sometimes even if a paper is obvious slop, if everyone else is talking about it, it’s worth reading anyway so as to be informed enough to criticize it substantively rather than by pattern match.
Scheduling. We use Luma for event management (paid, because with our multiple clubs and size we exceeded their free tier) even though they sadly lack support for indefinitely recurring events - because it is easy to reinvite people back on their calendars. We've also used Discord events but it's only visible to folks in the LS Discord. If you have a better solution, please share! We’re not 100% happy with this setup.
Hosting. We host sessions on a paid Zoom account (thanks Swyx!). We've also experimented with Discord stages for a bit but have had persistent issues with screen sharing and viewing at >70 person scale.
Recordings. We have recently started recording the paper facilitation (via Zoom), but not the Q&A. The intent is to encourage folks to share their experience and ghost knowledge (which is sometimes based on what they do on the job). This also incentivizes live attendance and participation in the Q&A, which I think is the main benefit.
Existing Community. Our paper club started out of an existing discord that is obviously tied to Latent Space (though some of our most active members found the discord BEFORE knowing the podcast existed). The club was simply a small group of folks who wanted to study a key paper each week, and level up on language modeling in the process. Over time, we've grown a core group of facilitators (Swyx, Vibhu, the Eugenes, Amgad, Eric, RJ, etc). Including myself, this rotation lets us have each person facilitating every two months on average and not have the workload be too heavy on any one person (many of us often have travel/work commitments that sometimes conflict, but paper club lives on).
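To make the rotation arithmetic concrete, here is a minimal sketch of a round-robin facilitator schedule. It assumes a pool of eight facilitators and a Wednesday start date, both of which are illustrative choices rather than the LSPC's actual roster or calendar; the function name `rotation` and the placeholder names are hypothetical.

```python
from datetime import date, timedelta
from itertools import cycle, islice


def rotation(facilitators, first_session, weeks):
    """Assign one facilitator per weekly session, round-robin."""
    sessions = (first_session + timedelta(weeks=i) for i in range(weeks))
    return list(zip(sessions, islice(cycle(facilitators), weeks)))


# Hypothetical pool of eight facilitators; the names are placeholders.
pool = [f"facilitator_{i}" for i in range(1, 9)]

# 2024-11-06 is a Wednesday; any weekly start date works the same way.
schedule = rotation(pool, date(2024, 11, 6), weeks=16)

for session_date, name in schedule:
    print(session_date.isoformat(), name)
```

With eight people in the pool, each name comes up again after eight sessions (56 days), which is roughly the two-month gap described above. In practice the rotation is looser than this, since volunteers swap weeks around travel and work.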
swyx: Attaching to an existing community lets the paper club “live on” in discussion forums between meetings, as Baader-Meinhof often dictates that we’ll see related tweets/papers relevant to a paper club before/after having one.
It also lets us hang out more socially as friends. Note that “paper” is only 50% of what it means to be a “paper club”, so don’t neglect the “club” element! To this end, swyx’s guide on Community Building can help, specifically the Jobs To Be Done of community! We have met up IRL, helped each other get jobs and funding, and gone off-topic on any number of subjects. Perhaps the hardest metric for a workplace-based community to hit: we all expect our membership in this community to outlast our current employment.
Inviting authors. Occasionally, we leverage our network and invite authors to present their work. We've had the authors of Matryoshka Embeddings, Writing in the Margins, and TimeGPT share their work, and invited Shreya and Nathan while facilitating their papers. This gives the group a valuable opportunity to clarify their questions live, and lets the authors share valuable behind-the-scenes insights. For example, we learned from Nathan why Molmo has such a heavy emphasis on analog clocks:
Top of funnel. (swyx) This is optional, as you may not want your paper club size to grow, but a larger group can offer more diverse opinions and interests (we would never have had enough Audio modeling coverage if Amgad hadn’t just shown up one day). X/Twitter/Bluesky/YouTube are probably the best options, if you don’t already have, say, a leading AI Engineering podcast/newsletter to solve that for you.
How to Read Papers (In An Hour)
Reading papers. This can be challenging, but you're not alone. The three-pass approach (video) is a popular method:
First, skim to get a general idea of the paper (~10 minutes)
Then, read to understand key points but not the details (~1 hour)
Finally, understand the paper in depth and take notes (3 - 6 hours)
Most of the time, unless you're trying to replicate the paper, two passes will suffice. Here's my walkthrough of applying the three-pass approach.
Tools. Zotero makes it convenient to save papers to your library. It also has built-in markup tools to highlight and annotate papers. And if you come across unfamiliar terms or inscrutable math equations, take a screenshot and ask Elicit or Google Scholar PDF Reader or Semantic Scholar or Claude or NotebookLM to explain!
How to facilitate a paper (even if it’s your first time)
For newbies, a basic framework can start like this (not in strict order):
Big Idea: What is the big idea for this paper? Or the main point the authors are trying to get across?
Results: What are the key findings and achievements?
Relevance: Why does this research matter? What problem does it address? i.e. How does this progress the field?
Related work: How does it improve on previous work?
Open Questions: What open-ended questions do you have after reading this piece?
Most facilitators simply walk through the PDF with their highlights and notes. Others create simple slides, including key graphs and tables to focus attention. There's no hard and fast rule—just do what suits you best as the facilitator.
Swyx: I do find that preparing slides 1) isn’t that hard, and 2) helps me organize my own thoughts better when I am presenting and 3) helps me understand the paper better when someone else is presenting.
Here is a sample 3-paper pre-reading + slide prep session done in 2 hours:
and here is the final Paper Club session that resulted.
Paper Club Variations
(this section is all swyx) Not every paper club should just be focused on one lengthy discussion about one paper. Here are some variations on the theme we’d suggest considering (we have not tried them all).
Limited run: instead of a long streak like the LSPC, you could have a set start/end date when you know everyone is available and there is a strict syllabus to go through. This can be helpful for people worried about the indefinite commitment. We recommend NOT recording these because people will often lie to themselves about wanting to watch back recordings.
Produce artefacts: recordings aren’t the only useful things that could come out of a paper club. The slide decks can often be very helpful to others, and the members of a paper club could also collaborate on a Google Doc to make TLDRs and writeups for their own future use (as well as that of others). This is just a collaborative form of Learning in Public: the more you rehash things in your own words, the more you share it and get things wrong, the better you retain it.
Survey paper clubs: Not every notable paper is worth a full hour’s discussion, and yet you’d be remiss not to cover it! So, inspired by the Big Block of Cheese Day, we have a “survey paper club” format where we just do 5-10 min quick hits on papers too smol to warrant a full day.
Conference paper clubs: Conferences are great occasions to meet up IRL!
Insert not-so-subtle plug for you to JOIN OUR NEURIPS EVENT whether live or online!
Participating in a weekly paper club means covering ~50 papers a year. With just two hours per week, one hour of pre-reading and one hour of discussion, you'll spend only ~100 hours (4 days) a year. And this minimal effort can place you in the top 5% of AI engineers and the top 0.1% of the world population in terms of AI knowledge.
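The time budget above can be checked with a few lines of arithmetic. This is a rough sketch that assumes about 50 sessions a year and one paper per session; the function name `yearly_budget` and its parameters are illustrative, so plug in your own club's numbers.

```python
def yearly_budget(sessions_per_year=50, pre_read_hours=1.0, discussion_hours=1.0):
    """Rough yearly time cost of attending a weekly paper club."""
    hours = sessions_per_year * (pre_read_hours + discussion_hours)
    return {
        "papers": sessions_per_year,  # assumes one paper per session
        "hours": hours,
        "days": hours / 24,           # expressed as full 24-hour days
    }


budget = yearly_budget()
print(budget)
```

Fifty sessions at two hours each comes to 100 hours, or a little over four full days. Weeks that cover several related papers push the paper count up without changing the hours much.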
So what are you waiting for? Start your own paper club, or see you at the Latent Space paper club!
[1] 12pm PT, since most of us are on the US West Coast. It’s also a relatively friendly time for the East Coast and Europe.
[2] Famously, Attention is All You Need didn’t even get an oral when it first came out at NeurIPS 2017.
[3] Note that in academia the “notable authors” often come at the END of the author list, as they are advisors to the first authors rather than first authors themselves. However, this is usually good enough, as the advisors are serving as the filtering mechanism for you.