[AINews] Midjourney Medical: scan your organs like you step on a scale
The only bootstrapped frontier lab announces its second product and second business venture.
It’s a tough choice whether or not the buzzy Midjourney Medical launch today counts as AINews. Yes, Midjourney is one of the most significant and unique AI labs in the world. No, as David Holz was quick to point out, there’s not even any AI immediately present in the Scanner or Spa. But yes, ultrasound CT imaging obviously needs heavy AI assistance, and unlocks massive new applications downstream as BioHub’s CryoEMs did for ESM. And no, as Hacker News is quick to point out, there are a lot of unsolved questions as to how ready or useful this scanner really is.
And yes, that was me in the livestream, which we are transcribing to save you 2 hours.
Overall the vibe was electric and inspiring. I sat next to Robert Scoble, who attended the original iPhone and Tesla (and Google Glass) launches and agreed that this launch was comparable in ambition, and next to Tanishq Abraham, who, by sheer coincidence, had just tweeted out this Nature paper on ultrasound CT:
On to the facts you must know.
Facts / Announcements
Midjourney announced a medical imaging project, calling it the Midjourney Scanner.
The device is described as a full-body ultrasonic CT / full-body ultrasound system.
David Holz framed it as the “first new whole-body medical imaging modality in 50 years.”
The scanner uses ultrasound rather than MRI, X-ray, or CT radiation.
The system involves:
8,960 transducers per chip/system
40 systems arranged in a ring
358,000 ultrasonic elements total
A 70 cm diameter ring
Waves traveling through water at about 1,481 m/s
Data capture around 17 GB/s
Around 40 GB of data per body slice
Reconstruction using 21 servers
Claimed 2 PFLOPS compute
Claimed 806 TB raw data
Lift movement at 4 cm/s
Goal of several hundred slices in 60 seconds
Claimed resolution of internal tissue details down to about 0.5 mm
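As a rough sanity check, the quoted figures can be multiplied out. The sketch below takes the numbers above at face value. The slice count is an assumption: "several hundred" is not a precise figure, so 300 is a hypothetical stand-in, and it is not stated whether the 17 GB/s and 40 GB figures refer to the same stage of the pipeline.

```python
# Figures quoted in the launch summary above.
TRANSDUCERS_PER_SYSTEM = 8_960
SYSTEMS_IN_RING = 40
RING_DIAMETER_M = 0.70
SOUND_SPEED_WATER_M_S = 1_481
CAPTURE_RATE_GB_S = 17
DATA_PER_SLICE_GB = 40
LIFT_SPEED_CM_S = 4
TARGET_SCAN_S = 60

# Assumption (not from the launch): a stand-in for "several hundred slices".
ASSUMED_SLICES_PER_SCAN = 300

# Element count: transducers per system times systems in the ring.
total_elements = TRANSDUCERS_PER_SYSTEM * SYSTEMS_IN_RING

# Time for a pulse to cross the full ring diameter through water.
crossing_time_us = RING_DIAMETER_M / SOUND_SPEED_WATER_M_S * 1e6

# Time to move one slice of data at the quoted capture rate.
seconds_per_slice_now = DATA_PER_SLICE_GB / CAPTURE_RATE_GB_S

# Data rate needed to hit the 60-second goal under the assumed slice count.
required_rate_gb_s = DATA_PER_SLICE_GB * ASSUMED_SLICES_PER_SCAN / TARGET_SCAN_S

# Vertical travel of the lift during a 60-second scan.
lift_travel_cm = LIFT_SPEED_CM_S * TARGET_SCAN_S

print(f"elements in ring:        {total_elements:,}")
print(f"ring crossing time:      {crossing_time_us:.0f} microseconds")
print(f"seconds per slice today: {seconds_per_slice_now:.2f}")
print(f"rate needed for target:  {required_rate_gb_s:.0f} GB/s")
print(f"lift travel in 60 s:     {lift_travel_cm} cm")
```

The element count comes out to 358,400, matching the rounded 358,000 figure. Under the assumed slice count, the gap between the quoted capture rate and the rate the 60-second goal would need is roughly an order of magnitude.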
Current imagery shown included:
Real body slices
Comparisons with MRI, especially thigh/muscle boundary imagery
Ultrasonic phantom images
Segmentations of organs and biological structures
The current system is Gen 1 / prototype, not the finished consumer product.
Midjourney says it is not yet using AI for the shown images.
The team size working on the device is about nine people.
About a dozen people have been scanned so far.
Current scans can take around 20 minutes, because the system is still bottlenecked by bandwidth, algorithms, DSP, and prototype data-transfer infrastructure.
Midjourney also announced the Midjourney Spa:
First location: around Union Square, San Francisco
About 25,000 sq ft
Four floors
Hot tubs, saunas, cold plunges, gym, and other spa amenities
Around 9–10 scanners
Lease signed and designs underway
Designed by architects associated with major spa projects such as Blue Lagoon
Target opening: end of 2027
Midjourney says it is self-funded, has no investors, and can fund the first spa itself.
The company says it has started discussions with the FDA.
Initial regulatory/commercial path is likely around body composition, because that is considered easier.
Jobs and more info are expected at midjourney.com/medical.
Projections / Goals / Claims About the Future
Midjourney’s stated long-term goal is a fleet of 50,000 scanners.
Claimed goal: enable up to 1 billion scans per month, enough to bring full-body imaging to everyone.
Holz suggested that fewer than a dozen such machines, operating at full speed, could perform more full-body scans than all MRI machines on Earth combined.
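The fleet numbers imply a per-scanner pace that is worth spelling out. This is a back-of-the-envelope sketch, not Midjourney's own model: the 30-day month and the opening hours are assumptions, and it ignores downtime and maintenance.

```python
# Figures quoted above.
FLEET_SIZE = 50_000
SCANS_PER_MONTH = 1_000_000_000

# Assumption (not from the launch): a 30-day month.
ASSUMED_DAYS_PER_MONTH = 30


def scans_per_scanner_per_month() -> float:
    """Scans each scanner must complete per month to reach the fleet goal."""
    return SCANS_PER_MONTH / FLEET_SIZE


def seconds_per_scan_slot(hours_open_per_day: float) -> float:
    """Time budget per scan, including turnover, for a given number of opening hours."""
    scans_per_day = scans_per_scanner_per_month() / ASSUMED_DAYS_PER_MONTH
    return hours_open_per_day * 3600 / scans_per_day


print(f"scans per scanner per month: {scans_per_scanner_per_month():,.0f}")
for hours in (24, 12):
    print(f"{hours} h/day -> {seconds_per_scan_slot(hours):.1f} s per scan slot")
```

Running around the clock, each scanner would get about 130 seconds per person. At 12 hours a day the slot shrinks to about 65 seconds, barely more than the 60-second scan target itself.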
The company expects:
Gen 2 scanner by the end of 2026
Gen 3 scanner with custom silicon later
Future systems to become much more advanced through custom chips, AI, physics simulations, and better compute infrastructure
Holz projected that the scanner could eventually support:
Frequent personal health tracking
Daily/weekly/monthly body scans
Preventive medicine
Detection of “weird” changes in the body
Body composition tracking
Doctor-facing review
AI-assisted first-pass analysis
Potentially thousands of diagnoses
Eventually, some therapeutic uses
He speculated that preventive imaging could reduce healthcare costs substantially, possibly by catching disease earlier.
He suggested the scanner could become hundreds or thousands of times cheaper than MRI on a per-scan basis, because the machine is cheaper and faster.
He said the marginal cost of a scan could be effectively zero, though the actual business model will involve spa/facility economics.
Possible pricing models mentioned:
Spa memberships
Walk-in scans
Scan-only pricing
Spa-only pricing
Some broader pricing matrix
The first spa is intended as a learning lab for usage patterns:
Do people want full spa + scan?
Gym + scan?
Quick scan and leave?
Daily, weekly, monthly, or annual scanning?
Holz estimated scaling to thousands of spas could require around $20B in upfront capex.
He speculated the facilities might pay themselves back quickly, even mentioning six months, but explicitly caveated uncertainty.
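To see what a six-month payback would require, the capex figure can be divided through. This is a purely illustrative sketch under two loud assumptions that are not from the talk: that the $20B build-out corresponds to the full 1-billion-scans-per-month fleet, and that payback comes only from per-scan margin (no spa revenue, ramp-up, or financing costs).

```python
# Figures quoted above.
CAPEX_USD = 20e9
PAYBACK_MONTHS = 6
SCANS_PER_MONTH = 1_000_000_000  # assumed to apply to the same build-out


def required_margin_per_scan(capex: float, months: float, scans_per_month: float) -> float:
    """Net margin per scan needed to recover capex within the given number of months."""
    return capex / months / scans_per_month


def payback_months(capex: float, margin_per_scan: float, scans_per_month: float) -> float:
    """Months to recover capex at a given net margin per scan."""
    return capex / (margin_per_scan * scans_per_month)


margin = required_margin_per_scan(CAPEX_USD, PAYBACK_MONTHS, SCANS_PER_MONTH)
print(f"margin needed per scan: ${margin:.2f}")
print(f"payback at $1 per scan: {payback_months(CAPEX_USD, 1.0, SCANS_PER_MONTH):.0f} months")
```

Under these assumptions the payback math only works at full utilization. At a fraction of the target scan volume, the required margin per scan rises proportionally.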
Therapeutic uses were described as long-term, not day-one:
Tendon/muscle healing
Focused ultrasound
Potential incisionless procedures
Possibly cancer tissue destruction at a distance, but explicitly not near-term.
Opinions / Vision / Framing
Holz framed Midjourney as a community-supported research lab, not a normal VC-backed startup.
He repeatedly emphasized that Midjourney’s image-generation revenue gives it freedom to fund ambitious R&D.
The scanner was presented as part of a broader mission around positive human futures, not just creativity tools.
He described the desired experience as:
“As powerful as an MRI”
“As casual as a trip to the spa”
He does not want scanning to feel like a doctor’s office.
He wants the spa to be desirable even without the scanner.
He personally wants frequent health feedback so everyday diet/exercise choices become measurably visible.
He sees the scanner as a possible new pillar of AI-enabled healthcare: AI needs fast, rich, cheap bodily data.
He argued that the future is not only about AI models but about new infrastructure that lets AI reason over the physical body.
He sees ultrasound as the right modality because it can be fast, safe, dense, and data-rich.
He appears especially excited about longitudinal, high-frequency, sub-millimeter differential tracking: not just “one scan,” but changes over time.
He positioned this as “day one of MRI” for full-body ultrasonic CT: early images may look rough, but the modality could improve dramatically.
Reasons / Rationale
Why ultrasound instead of MRI?
MRI is hard to make both fast and high-quality.
MRI scans are unpleasant: tubes, long sessions, loud sounds.
Ultrasound can push more energy through the body safely.
Ultrasound has no ionizing radiation.
Ultrasound can be repeated often.
Ultrasound is already widely used medically, making some regulatory paths easier.
Why water immersion?
Sound travels through water much faster than through air and couples into the body with far less reflection loss.
Water coupling enables whole-body ultrasound propagation.
The design requires the user to get wet, hence the spa concept.
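The coupling argument can be made concrete with the standard normal-incidence reflection formula, R = ((Z2 - Z1) / (Z2 + Z1))^2, where Z is acoustic impedance. This is a minimal sketch. The impedance values are approximate textbook numbers, not figures from the launch, and real skin and tissue vary.

```python
def intensity_reflection(z1: float, z2: float) -> float:
    """Fraction of acoustic intensity reflected at a flat boundary, normal incidence."""
    return ((z2 - z1) / (z2 + z1)) ** 2


# Approximate textbook acoustic impedances in rayl (kg / (m^2 * s)).
# These are assumptions for illustration, not figures from the launch.
Z_AIR = 413.0
Z_WATER = 1.48e6
Z_SOFT_TISSUE = 1.63e6

reflected_from_air = intensity_reflection(Z_AIR, Z_SOFT_TISSUE)
reflected_from_water = intensity_reflection(Z_WATER, Z_SOFT_TISSUE)

print(f"air -> tissue:   {reflected_from_air:.1%} of intensity reflected")
print(f"water -> tissue: {reflected_from_water:.1%} of intensity reflected")
```

With these values almost all of the energy bounces off an air-to-tissue boundary, while a water-to-tissue boundary passes nearly all of it. The same effect is why handheld ultrasound uses coupling gel.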
Why a vertical up/down scanner design?
Optimized for throughput.
Easier than having users lie in and climb out of tubs.
“Down and up” scanning supports faster repeated use.
Why build a spa first?
To learn real-world operations.
To test throughput.
To understand consumer behavior and willingness to use scans.
To refine pricing/business model.
To gather data.
To create a replicable template before scaling globally.
Why can Midjourney attempt this?
Existing image business generates revenue.
No investors means fewer constraints.
Midjourney already has compute infrastructure.
The company has skills across AI, imaging, sensors, visualization, and systems engineering.
Holz has prior hardware experience from Leap Motion.
Why not launch as a pure medical device immediately?
FDA/regulatory path is complex.
Some use cases are easier than others.
Body composition is an easier initial wedge.
Diagnostic and therapeutic claims require staged approval.
Why cloud processing?
Raw scanner data is enormous.
On-site compute can handle streaming/compression.
Midjourney’s large server clusters can process heavy reconstruction workloads.
They expect to use secure/private cloud workflows.
Criticisms / Risks / Open Questions
Regulatory ambiguity
Holz repeatedly avoided precise FDA claims.
He said body composition is on a good path, but diagnostics are not yet cleared.
The path from body composition to “thousands of diagnoses” is extremely uncertain.
Insurance billing, CPT codes, clinical adoption, and FDA classification remain open.
Medical validity not yet proven
The transcript presents impressive engineering claims, but not clinical validation.
No sensitivity/specificity numbers were given.
No disease-detection benchmarks were presented.
No peer-reviewed evidence was mentioned in the transcript.
“Can see weird things” is not yet the same as clinically actionable diagnosis.
Comparison to MRI is partly apples-to-oranges
Ultrasound and MRI measure different physical properties.
Holz acknowledged MRI is still better in some ways.
Current ultrasound images are not yet broadly better than MRI.
The thigh comparison may show areas where ultrasound CT (USCT) is better, but it is explicitly described as both “fair and unfair.”
Cost claims are speculative
“Effectively zero” marginal scan cost excludes facility, staffing, regulatory, radiologist/doctor review, liability, cleaning, membership ops, and real estate.
Six-month payback was explicitly speculative.
$20B capex to scale is a huge financing and execution challenge.
Throughput claims depend on future systems
Current scans take around 20 minutes.
The 60-second / high-throughput target depends on improvements in bandwidth, algorithms, DSP, and hardware.
Gen 1 is prototype-grade, not industrial-grade.
Data/privacy concerns
Scans generate very sensitive full-body health data.
Data likely goes to Midjourney cloud clusters after compression.
Holz said it would be secure/private, but details were not provided.
Health data governance, consent, storage, access, deletion, and medical liability were not deeply addressed.
False positives / overdiagnosis
Frequent full-body scanning could identify many ambiguous abnormalities.
This may create anxiety, unnecessary followups, incidentalomas, and downstream costs.
Holz acknowledged “flagging weird things” is not casual and could have downsides.
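The overdiagnosis worry is at bottom a base-rate problem, which Bayes' rule makes easy to see. The sketch below uses entirely hypothetical numbers. No sensitivity, specificity, or prevalence figures were given at the launch, so the values here are illustrative assumptions only.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a flagged finding is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical values for illustration only; none of these come from Midjourney.
ASSUMED_SENSITIVITY = 0.95
ASSUMED_SPECIFICITY = 0.95
ASSUMED_PREVALENCE = 0.005  # 0.5% of scanned people actually have the condition
SCANS = 100_000

ppv = positive_predictive_value(ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY, ASSUMED_PREVALENCE)
true_flags = ASSUMED_SENSITIVITY * ASSUMED_PREVALENCE * SCANS
false_flags = (1.0 - ASSUMED_SPECIFICITY) * (1.0 - ASSUMED_PREVALENCE) * SCANS

print(f"PPV: {ppv:.1%}")
print(f"per {SCANS:,} scans: {true_flags:.0f} true flags, {false_flags:.0f} false flags")
```

Even with a test that is right 95% of the time in both directions, fewer than one in ten flags would be real at this assumed prevalence. Scanning the same healthy people weekly or daily multiplies the number of chances for a false flag.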
Clinical workflow still unclear
Who reads the scans?
What does the user get back?
What goes to doctors?
What is legally considered diagnosis vs wellness/body composition?
How are urgent findings handled?
Spa-medical hybrid creates operational complexity
Medical device + wet spa + high-throughput consumer facility is a weird stack.
Cleaning, infection control, accessibility, privacy, emergency protocols, staffing, and medical oversight are all nontrivial.
Therapeutics are much further out
Focused ultrasound surgery/cancer destruction was mentioned as technically possible but not near-term.
Holz explicitly said imaging is the low-hanging fruit and therapeutics are scary/regulatory-heavy.
Brand coherence risk
Midjourney is known for image generation; scanner/spa/medical infra is a major category jump.
Holz acknowledged the company may be “confusing for the next six months” as it announces more projects.
The “so what”
Near-term reality: Midjourney has built a real prototype full-body ultrasound CT scanner and plans to open a spa-like facility in San Francisco, targeted for the end of 2027, as the first deployment/testbed.
Medium-term bet: frequent, cheap, pleasant body imaging becomes a new consumer-health behavior.
Long-term moonshot: Midjourney wants to build global medical imaging infrastructure, potentially making full-body scans routine and AI-analyzable.
Main skepticism: the engineering demo is exciting, but the clinical/regulatory/economic case is still mostly unproven. The gap between “cool full-body images” and “safe, reimbursable, diagnostic healthcare product” is the whole ballgame.
AI News for 6/16/2026-6/17/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!
AI Twitter Recap
Top Story: Midjourney Medical
What happened
Midjourney unveiled a medical imaging/scanning system and then published a technical dive on it, triggering a mix of fascination, skepticism, and broader discussion about AI labs moving into hardware/medical devices.
Midjourney’s official account posted “A technical dive inside our new ‘Midjourney Scanner’” in the main announcement tweet, which appears to be the core launch artifact for the project @midjourney.
The launch was preceded or paralleled by discussion of a scanner whose tradeoffs were summarized as: radiation-free, magnet-free, fast, and low-cost, but requiring the person to sit in a water immersion tank and currently having coarser resolution than CT/MRI @iScienceLuvr.
A demo appears to have been available in person: one attendee said, “I put my hand in the @midjourney demo scanner tonight”, framing it as a tangible prototype rather than a purely conceptual announcement @saranormous.
The announcement generated strong enthusiasm from supporters who viewed it as evidence of unusually ambitious product direction from Midjourney, including comments like “this is so amazing” and “let inventors like @DavidSHolz invent” @saranormous.
Others interpreted the launch competitively against more incremental AI hardware efforts; one reaction contrasted it with “boring lapel camera” bets and argued other AI labs should “slap yourself” if Midjourney is building this kind of thing @matvelloso.
There was also lightweight technical commentary from people interested in imaging methods, including speculation about detector/emitter arrangements and real-time variants @johnowhitaker, plus teasing that some users seemed unusually prepared for the launch topic @johnowhitaker.
Facts vs opinions
Factual claims explicitly present in the tweet set
Midjourney published a technical dive into a product called the “Midjourney Scanner” @midjourney.
The scanner was described as:
Radiation-free
Magnet-free
Fast
Low-cost
Requiring a water immersion tank
Having coarser resolution than CT/MRI @iScienceLuvr
A person physically tried a demo scanner with their hand @saranormous.
Interpretations/opinions/speculation
Strongly positive reactions framed the scanner as visionary or “the future” @saranormous.
Some observers took the launch as evidence that Midjourney is pursuing a more ambitious hardware roadmap than competing AI labs @matvelloso.
One humorous reply escalated the idea into “next up is full cargo transport by midjourney,” clearly not a factual claim @yacinelearning.
Independent technical commentary suggested possible future design directions, such as distributed scattered detectors and emitters or real-time systems, but these were not presented as features of Midjourney’s current scanner @johnowhitaker.
Technical details and inferred modality
The tweet corpus contains only a limited number of hard specs, but they are enough to outline the project’s positioning.
No ionizing radiation: “Radiation-free” implies the system is not using X-rays/CT-style ionizing modalities @iScienceLuvr.
No magnets: “Magnet-free” differentiates it from MRI, which relies on strong magnetic fields @iScienceLuvr.
Water immersion tank: This is a major clue about the physical sensing setup. Water coupling is common in some acoustic and wave-propagation imaging systems because it improves transmission and coupling between emitters, tissue, and detectors @iScienceLuvr.
Resolution below CT/MRI: The system is not being claimed, in these tweets, to outperform incumbent clinical imaging on resolution; in fact, an explicit limitation is that resolution is coarser than CT/MRI @iScienceLuvr.
Speed/cost positioning: It is framed as fast and low-cost, suggesting the value proposition is likely accessibility, throughput, or portability rather than top-end image fidelity @iScienceLuvr.
There is also technically informed reaction about the likely sensing challenges:
Jonathan Whitaker notes that systems based on light, ultrasound, electric current, etc. face a harder inverse problem than X-rays because their signals do not travel in straight lines the way X-rays do, which makes reconstruction more complex @johnowhitaker.
He also suggests a future version with many scattered detectors and emitters rather than mechanically moving components, indicating that at least some readers infer the current system may involve motion/scanning geometry rather than fully parallelized capture @johnowhitaker.
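The "not in straight lines" point can be illustrated with Snell's law for sound, sin(θ2) / c2 = sin(θ1) / c1: a ray bends whenever the speed of sound changes. This is a minimal sketch, not anything described in the tweets. The tissue sound speeds are approximate textbook values and are assumptions; only the water speed appears in the launch figures.

```python
import math
from typing import Optional


def refracted_angle_deg(incidence_deg: float, c1: float, c2: float) -> Optional[float]:
    """Refraction angle from Snell's law; None if the ray is totally reflected."""
    s = math.sin(math.radians(incidence_deg)) * c2 / c1
    if s > 1.0:
        return None  # past the critical angle: no transmitted ray
    return math.degrees(math.asin(s))


WATER_M_S = 1481.0  # from the launch figures

# Approximate textbook sound speeds in m/s (assumptions for illustration).
ASSUMED_TISSUE_M_S = {"fat": 1450.0, "muscle": 1580.0, "bone": 3500.0}

for tissue, speed in ASSUMED_TISSUE_M_S.items():
    angle = refracted_angle_deg(30.0, WATER_M_S, speed)
    if angle is None:
        print(f"water -> {tissue}: totally reflected at 30 degrees")
    else:
        print(f"water -> {tissue}: 30.0 degrees bends to {angle:.1f} degrees")
```

Soft tissue bends the ray by a degree or two per interface, while bone can reflect it outright. X-rays, by contrast, are treated as traveling in straight lines, so their reconstruction can skip this step.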
Taken together, the public discussion points toward a non-CT, non-MRI modality with wave-based reconstruction and meaningful algorithmic/inverse-problem content, though the tweets here do not provide definitive modality labeling or performance tables beyond the stated tradeoffs.
Different perspectives
Supportive / optimistic
The most enthusiastic camp sees this as exactly the kind of high-upside, weird, non-consensus invention AI founders should pursue, not just incremental chatbot/UI products. That tone is clear in “let inventors like @DavidSHolz invent” @saranormous.
In-person demo reactions emphasized the visceral novelty of interacting with a real scanner, not just reading a paper or watching a video @saranormous.
Some interpreted the move as a sign that Midjourney may be thinking beyond image generation and toward full-stack applied invention, possibly combining hardware, sensing, and AI reconstruction.
Neutral / technical-curious
The most grounded reaction in the set is the concise pros/cons summary: radiation-free, magnet-free, fast, low-cost versus water immersion and lower resolution than CT/MRI @iScienceLuvr.
Technically curious observers liked the strangeness of the modality while immediately identifying the physical and systems tradeoffs:
Non-straight-line propagation compared with X-rays
Need for better real-time capture arrangements
Questions about detector/emitter topology @johnowhitaker
Opposing / skeptical / cautionary
Direct hostile criticism is limited in this tweet set, but skepticism is implicit in several points:
Clinical utility skepticism: saying it has coarser resolution than CT/MRI is a substantive caveat, especially in medicine where image quality can directly affect diagnostic value @iScienceLuvr.
Practicality skepticism: requiring a water immersion tank is a serious ergonomic and deployment constraint for routine clinical or consumer use @iScienceLuvr.
Modality skepticism: technical comments about non-straight-line propagation hint at the usual challenge for alternative imaging systems: the physics and inverse reconstruction are hard, and the pretty demo may not automatically translate into robust, clinically reliable imaging @johnowhitaker.
Competitive framing
One notable perspective was less about the scanner itself and more about what it says strategically: if Midjourney is attempting hardware-medical invention, then AI companies pursuing narrower wearable-camera concepts look conservative by comparison @matvelloso.
Context: why this matters
Midjourney is primarily known as an image-generation company. That makes a medical/scanner reveal noteworthy for several reasons:
It suggests a willingness to move from generative media software into real-world sensing and hardware.
Medical imaging is a domain where inverse problems, signal processing, reconstruction, and increasingly ML-based interpretation all matter; it is not an obvious adjacency, but it is a technically deep one.
The scanner appears to be positioned not as “better than MRI/CT on all axes,” but as a potential entrant in the classic disruption lane: worse on a premium metric, better on cost/accessibility/operational burden.
If the system is genuinely fast and low-cost, the most plausible implications are in:
screening or triage,
settings where CT/MRI access is limited,
repeat imaging where avoiding radiation matters,
specialized anatomical use-cases where immersion-based setups are acceptable.
The launch also fits a broader 2025 pattern where AI-adjacent companies increasingly try to define themselves not just as model vendors, but as builders of new interfaces to the physical world. In that framing, Midjourney Medical is less about a single scanner and more about whether frontier AI-era startups can productize difficult sensing systems, not just generate content.
Implications and open questions
Regulatory path: nothing in these tweets addresses approvals, validation studies, or whether this is research-only versus intended for clinical deployment. For medical relevance, those questions are central.
Reconstruction stack: the phrase “technical dive” implies the company has discussed internals, but the tweet set here does not expose the actual algorithmic details. The likely crux is reconstruction quality under a constrained sensing setup.
Use-case specificity: lower resolution than CT/MRI does not necessarily doom the system; many imaging tools win by being good enough for a narrow workflow. But no specific target indication appears in these tweets.
Form factor challenge: a water immersion tank is acceptable for some scanning contexts and a major barrier for others. Whether this is a prototype artifact or a fundamental requirement matters.
Throughput and cost realism: “fast” and “low-cost” are meaningful only relative to benchmarks—scan time, hardware cost, consumables, operator burden, and downstream interpretation overhead. Those numbers are not provided in the tweets here.
AI’s role: the most interesting technical question may be whether Midjourney’s contribution is primarily in hardware design, inverse-problem reconstruction, learned denoising/super-resolution, automated interpretation, or an integrated stack spanning all of these. The social reaction suggests people are projecting a lot onto the project because Midjourney’s brand is associated with learned visual systems rather than classical medical devices.
AI research, agents, and open models
A notable research meta-point: Chinese open-source literature over the last year was highlighted as unusually high-ROI to follow, with the claim that the “alpha is insanely huge” @himanshustwts.
PapersWithCode’s top trending paper was VibeThinker-3B, described as a 3B parameter model exploring verifiable reasoning in small LMs and allegedly landing in the performance tier of DeepSeek V3.2, GLM-5, and Gemini 3 Pro @NielsRogge.
A computer-use paper, PreAct, was praised for compiling successful agent runs into a guarded replayable state machine, eliminating per-step LM calls on repeats and yielding 8.5x to 13x faster replay @dair_ai.
Another RL/agent paper proposed LLM-as-Environment-Engineer, where the policy uses its own failures to redesign the next training environment; the associated benchmark is MAPF-FrozenLake @dair_ai.
Elvis Saravia argued coding agents need verifiers and robust guardrails, not blind autonomous loops, reinforcing a trend toward constrained agentic execution @omarsar0.
David Khourshid’s coding-agent take was more operational: AI-generated code still has to be read, and not reading it simply defers the debugging burden @DavidKPiano.
On RL theory, John Schulman said PPO’s resurgence in the LLM era comes from effects not anticipated in the original paper, including the importance-ratio objective correcting biases from numeric error, async training, and forward-pass noise, while clipping alters entropy via a mechanism only later understood; he cites DAPO @johnschulman2.
Relatedly, Cameron Wolfe said recent post-GRPO analysis papers (e.g. DAPO, Dr. GRPO, GSPO, TIS) are exactly the kind of objective-analysis work he hopes to see for PPO in reasoning/agent contexts @cwolferesearch.
John Carmack posted a detailed critique of Temporal Differences for visual representation learning, summarizing the method: train a frame encoder and a “motion encoder” on RGB frame differences so latent(frame1) + delta ≈ latent(frame2), with a 0.25 second stride; he questioned the DINO EMA anti-collapse choice and the soundness of the delta construction @ID_AA_Carmack.
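For readers who want the delta objective in concrete terms, here is a toy sketch of the relation latent(frame1) + delta ≈ latent(frame2) as summarized above. It is not the paper's or Carmack's code. The linear encoders, the weights, and all function names are hypothetical, and frames are flattened to short lists of floats.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[Vector], Vector]


def frame_pairs(num_frames: int, fps: float, stride_s: float = 0.25) -> List[Tuple[int, int]]:
    """Index pairs of frames separated by the given time stride."""
    step = max(1, round(fps * stride_s))
    return [(i, i + step) for i in range(num_frames - step)]


def linear_encoder(weights: Sequence[Sequence[float]]) -> Encoder:
    """Hypothetical stand-in encoder: a plain matrix-vector product."""
    def encode(x: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) for row in weights]
    return encode


def delta_loss(frame_enc: Encoder, motion_enc: Encoder, frame1: Vector, frame2: Vector) -> float:
    """Mean squared error between latent(frame1) + delta and latent(frame2)."""
    diff = [b - a for a, b in zip(frame1, frame2)]  # raw frame difference
    z1, z2, delta = frame_enc(frame1), frame_enc(frame2), motion_enc(diff)
    return sum((p + d - q) ** 2 for p, q, d in zip(z1, z2, delta)) / len(z1)


weights = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
encoder = linear_encoder(weights)
no_motion = linear_encoder([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
f1, f2 = [1.0, 2.0, 3.0], [2.0, 2.0, 5.0]

print(frame_pairs(num_frames=10, fps=8.0))
print(delta_loss(encoder, encoder, f1, f2))
print(delta_loss(encoder, no_motion, f1, f2))
```

In this toy setup, using the same linear map for both encoders drives the loss to exactly zero by linearity, so the objective alone says nothing about whether the latents are useful. That is a property of the sketch, not a claim about the paper's nonlinear encoders.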
AI infrastructure, inference, and product rollouts
Xenova released a demo and kernels from the now-shut-down Fable 5 effort, claiming it had pushed Gemma 4 to 255 tok/s on WebGPU; the framing is that agentic kernel optimization could materially improve browser/on-device inference @xenovacom.
Fal announced Kling 3.0 Turbo and O3 upgrades:
faster generation
lower costs
better lip-sync
more stable motion
stronger prompt/reference consistency in “Omni”
up to 15s clips
full 4K generation with Omni
improved storyboard and multishot workflows @fal
Kling’s own account amplified the Fal rollout as a creator-facing quality/speed improvement @Kling_ai.
GitHub Copilot’s Auto mode now uses a custom routing model to choose among models based on reasoning depth, code complexity, debugging difficulty, and tool orchestration needs; a blog post and a linked research paper were shared @pierceboggan, @pierceboggan.
Kimi Code Web appears to be back online, per a brief ecosystem note @bigeagle_xd.
Grok image generation projects were mentioned via grok.com/imagine, but with no substantive technical detail @chaitu.
Talent, labs, and competitive dynamics
The biggest personnel story outside Midjourney: Noam Shazeer announced he is joining OpenAI, leaving Google after saying it was a difficult decision and praising his former team @NoamShazeer.
Sam Altman celebrated the move, saying Noam was one of the people he had most wanted to work with since OpenAI’s beginning @sama, then joked about OpenAI being SOTA “in noams” @sama.
Commentary emphasized Shazeer’s significance as a co-author of the Transformer, T5, and Switch Transformer papers and a pioneer of sparse MoE systems, with some calling it the most important AI talent move of the year @scaling01.
Aidan Clark signaled excitement about working with Noam and linked it to a sense that RSI is getting closer @aidan_clark.
A broader industry reading from replies:
DeepMind/Brain merger may have indirectly benefited Anthropic/OpenAI @arohan
Anthropic got Karpathy while OpenAI got Noam @TheTuringPost
speculation that the move says as much about Google disappointment as OpenAI pull @teortaxesTex
There was also chatter about relative power/valuation: Liam Fedus posted “Breaking: OpenAI overtakes Anthropic’s valuation” @LiamFedus.
More opinionated geopolitical/competitive takes argued that various actors have incentives to prevent Anthropic from maintaining too large a lead, though these were clearly speculative rather than factual reporting @teortaxesTex, @teortaxesTex.
Adoption, usage, and model quality discourse
Blanche Minerva offered a practical quality complaint: ChatGPT and Claude can disagree on something as concrete as the overlap in citations between two papers, underscoring persistent reliability issues in applied knowledge tasks @BlancheMinerva.
Several posts focused on GLM and Chinese model progress:
praise for the GLM team as “heroic” @teortaxesTex
follow-up saying the latest generation reached something like Opus-level expectations beyond prior assumptions @teortaxesTex
speculation that future frontier capability gains may hinge more on RL recipes than pure pretraining scale @teortaxesTex, @teortaxesTex
There was also a cluster of highly speculative posts about “Claude” identity/persona salience appearing in outputs, framed as memetic or steganographic behavior rather than established fact @teortaxesTex, @teortaxesTex, @teortaxesTex.
Broader tech and society
A Tacit Labs join announcement framed biology as the next place where AI should uncover genuinely new knowledge rather than just recombine existing understanding @maxisawesome538.
There was a joke about the White House demanding a solution to the halting problem, a reminder that AI-policy discourse still often compresses deep CS impossibilities into simple-sounding asks @the_engi_nerd.
In autonomy, one post noted the apparent lack of fresh AV startup activity despite Waymo/Tesla making the category seem increasingly feasible @gabriberton.
Miscellaneous opinion posts on learning, coding, and contribution included:
you can contribute to AI without deep formal math background @gabriberton
a token-understanding/generation interview question about whether a model can understand a token it cannot generate @gabriberton
a joke that a Slack alternative could be built with “half a day of vibe coding” @gabriberton
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. GLM-5.2 Open-Weights Frontier Benchmarks
GLM-5.2 is the first open-weights model to cross 80% on Terminal-Bench and beats every other open model available (Activity: 1569): The image is a technical benchmark bar chart for Terminal-Bench 2.1 showing GLM-5.2 scoring 81.0, making it the first open-weights model in the chart to clear the dashed 80% threshold, though closed models Claude Opus 4.8 (85.0) and GPT-5.5 (84.0) remain ahead overall (image). The post frames this as GLM-5.2 beating other open models and even Gemini 3.1 Pro, but a commenter notes Terminal-Bench 2.1 is an “easier” revision of Terminal-Bench 2 with relaxed timeouts/rules, so cross-version score comparisons may be inflated. Comments debate whether “open weights” meaningfully implies “local” usability: one user argues “if you can download it, it’s a local model,” while another says it is still impossible to run locally for 99% of users due to hardware requirements.
A commenter argues that Terminal-Bench 2.1 is not directly comparable to Terminal-Bench 2, claiming 2.1 is an easier revision with changed timeouts, relaxed problem rules, and broader harness compatibility. They note that models generally should not score lower on 2.1 than 2, and suggest the more meaningful signal will be initial Terminal-Bench 3 scores before labs start optimizing against the benchmark.
There is a technical deployment debate around whether GLM-5.2 should be considered a “local model.” One side argues that “if you can download it, it’s a local model” because unlike Claude or ChatGPT the weights can be run by users, while another points out that the model is effectively impossible to run locally for 99% of users due to hardware/performance constraints such as very low tokens-per-second on consumer systems.