The Humans in the Loop: Dollars and Cents
This Week in AI for Devs: Inference Doesn't Grow on Trees
Welcome to The Humans in the Loop, your executive summary of AI news for devs. I’m Andrew, the [human] author, and I’d love your feedback on this newsletter. Please feel free to email me at andrew@heavybit.com with your thoughts. Thanks!
Our top story: The Inference Cost Problem
It’s 2026, and orgs are running AI jobs regularly, which means inference bills are becoming a regular part of operations. But inference is anything but “regular.” It’s spiky and intermittent due to variations in demand, token lengths, and levels of complexity across models, all of which can lead to bottlenecks such as the KV cache. The most obvious gating factor is pricey silicon, with NVDA holding a powerful market advantage over AI margins by selling its chips at whatever prices it likes, after which cloud providers charge “rent” to model vendors, who then sell plans to end users.
But researcher Mathias Lechner argues there’s more to inference economics than silicon: users prioritize fast performance, which is at odds with the latency-inducing gating and queuing techniques commonly associated with large-batch GPU compute jobs. (Though GitHub engineer Sean Goedecke notes there are short-term technical “tricks” for faster inference that Anthropic and OpenAI use to enable “fast mode” for their newest models.) And the researchers at Epoch AI contend that inference will, as a whole, decrease in cost over time.
So at the moment, NVDA is arguably the sole owner of the base of the inference stack, from which everyone else rents compute and inference up the stack. But given the many pretenders to that throne, and the many other players on the hunt to improve their margins, the race to make inference cheaper and faster is going to get much hotter.
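For the curious, here is a back-of-the-envelope sketch of the batching tension described above: bigger batches spread GPU rent over more tokens, but every user waits longer for each token. The linear step-time model, the function name, and every number (GPU price, step timings) are hypothetical assumptions chosen for illustration, not figures from any of the sources cited here.

```python
from dataclasses import dataclass


@dataclass
class BatchEstimate:
    batch_size: int
    ms_per_token: float            # time each user waits per generated token
    usd_per_million_tokens: float  # GPU rent spread across all generated tokens


def estimate(batch_size: int,
             gpu_usd_per_hour: float = 3.0,   # hypothetical rental price
             fixed_step_ms: float = 20.0,     # hypothetical fixed cost per decode step
             per_seq_step_ms: float = 0.5     # hypothetical added cost per sequence
             ) -> BatchEstimate:
    """Toy model: one decode step emits one token for every sequence in the batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Assumed linear step time: a fixed overhead plus a per-sequence increment.
    step_ms = fixed_step_ms + per_seq_step_ms * batch_size
    steps_per_hour = 3_600_000 / step_ms
    tokens_per_hour = batch_size * steps_per_hour
    usd_per_million = gpu_usd_per_hour / tokens_per_hour * 1_000_000
    return BatchEstimate(batch_size, step_ms, usd_per_million)


if __name__ == "__main__":
    for size in (1, 8, 64, 256):
        e = estimate(size)
        print(f"batch={e.batch_size:>3}  "
              f"latency={e.ms_per_token:.1f} ms/token  "
              f"cost=${e.usd_per_million_tokens:.2f} per 1M tokens")
```

Under these made-up numbers, cost per token falls as the batch grows while per-token latency rises, which is the trade-off a “fast mode” would have to pay for. Real serving stacks also deal with queuing, prefill, and memory limits that this sketch ignores.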
Next Up: The AI news and trends that busy technical teams need to know.
💡 Spotlight: Newly Launched AI Startups
- Ineffable Intelligence: $1B from Sequoia for RL
- Entire: $60M from Felicis for agentic code management
- Nullify: $12.5M from SYN for AI-powered product sec
- Supper: $11M from Union Square Ventures for “AI-native data platform”
- Duckbill: $7.75M from Heavybit for predictable AI & cloud spend
- Fibr AI: $5.7M from Accel for agentic website customization
- Qontext: $2.7M from HV Capital for “an independent context layer for AI”
🚀 New Models & Products Spotlight
- Claude Sonnet 4.6: Improved dev workflows for enterprise
- GPT-5.3-Codex-Spark: OpenAI’s first model launch on non-NVDA chips
- Seed2.0: ByteDance’s new general purpose model family
- MiniMax M2.5: OSS model scoring 80.2% on SWE-Bench Verified
- Qwen3.5: The first open-weight models in BABA’s family
- microgpt: Andrej Karpathy’s minimalist 200-lines-of-code LLM
💻 Development
DORA on AI Capabilities
DORA’s latest: Getting the most out of AI takes a clear AI stance, work in small batches, healthy data ecosystems, user focus, AI-accessible data, quality internal platforms, and strong version control.
[GH] ClawSec
Interested in trying MoltBot/OpenClaw but concerned about security issues? This repo catalogs the skills to secure your lobster-related projects.
Cognition: Using Devin to Build Devin
Nader Dabit explains how the creator of the first “AI engineer” uses its own tools to build.
🤔 Interesting AI Projects, Research, and Updates
Research: LLM Reasoning Failures
This comprehensive study catalogs the main breakdowns of LLM reasoning, pertaining to LLM architecture, app-specific issues, and performance.
Reverse-Engineering GPT-5’s Tokenizer
Metehan Yeşilyurt cracks open tiktoken to reveal how the GPT-5 family processes queries (and how the processing affects AEO/GEO).
💼 Hiring and Community
Startups Hiring This Week:
- Applied AI Engineer → MLabs
- AI Engineer → iVoyant
Mid-Markets Hiring This Week:
- Computer Vision Engineer → RandomTrees
- Compute Server Platform Architect → Cerebras
Enterprises Hiring This Week:
- AI Engineer → Shipt
- Principal SWE, Full Stack → NVDA
OpenClaw Creator & META’s Partnerships Lead Join OpenAI
OpenClaw moves to an independent foundation as Peter Steinberger joins the ChatGPT vendor along with META’s celebrity whisperer Charles Porch.
🏭 Industry News & Trends
For Driving 4% of All GH Repos, Anthropic Gets $30B Funding
The Claude vendor claims that Claude Code now submits a growing percentage of all new repos (and also claims a $380B valuation).
Mistral Acquires Koyeb
The serverless inference startup joins the French frontier lab to build out European AI infrastructure.
META Expands NVDA Deal “to Use Millions of Chips”
META plans to use a lot of NVDA silicon as part of its $135B AI budget this year.
⚖️ Copyright, IP, Licensing, Regulation, and Safety
2026 International AI Safety Report
2026’s emerging and increasing AI risks include willfully malicious AI use, technical failures, and potential systemic problems.
Spain Opens Investigation on Alleged AI Child Abuse Material
The Spanish government calls for investigation of objectionable AI-generated material on social media.
MPA Lashes Out at ByteDance
Alleges that the TikTok creator’s new video model Seedance 2.0 “engaged in unauthorized use of U.S. copyrighted works on a massive scale.”

