The Humans in the Loop: Why AI Won't Replace Human Coders Yet

This Week in AI for Devs: Humans Still Better at Finding Root Causes

Feb 20, 2025

Welcome to The Humans in the Loop, your executive summary of AI news for devs. I’m Andrew, the [human] author, and I’d love your feedback on this newsletter. Please feel free to email me at andrew@heavybit.com with your thoughts. Thanks!

Our top story: OpenAI’s Bold Experiment, SWE-Lancer

OpenAI’s research teams have conducted an eye-opening experiment. What happens when you:

1.) Take 1,000+ actual, for-pay software engineering tasks listed on the freelance website Upwork, collectively worth $1M in payouts (including spot coding jobs and full-on engineering management work) and…
2.) Put them through three of the top foundation models on the market (GPT-4o, o1, and Anthropic’s Claude 3.5 Sonnet)?

It turns out none of the models earned the full $1M. According to the full research paper, “agents excel at localizing, but fail to root cause, resulting in partial or flawed solutions.” In other words, top-of-the-line LLMs can quickly pinpoint where an issue shows up and patch it superficially, but they struggle to understand what actually causes it and how that root cause may affect multiple components within a project. However, the models did post significantly higher success (upwards of 3x to 4x higher pass rates) on jobs tagged as managerial versus those tagged for individual contributors.

A handful of recent articles also suggest that AI may have other downsides in engineering work, such as biasing which tools developers adopt or quietly undermining the fundamentals for junior devs, though these editorials are anecdotal rather than data-driven. In any case, it’s clearer than ever that AI in software development isn’t a simple, across-the-board force multiplier, and that there are still many areas where there’s no substitute for an experienced engineer.

And now: Everything else busy devs need to know in AI news this week.

💻 Development

META Announces First LlamaCon - April 29, 2025
META’s first dev conference for its Llama model family will “share the latest on open-source AI developments to help developers do what they do best.”

[GH] CursorCore - Conversational Programming Framework
This open-source programming assistant attempts to integrate your coding history with your current project’s code.

Free Course: Evaluating AI Agents
DeepLearning.AI’s latest course covers adding observability and evaluation best practices to agentic workflows.

🤔 Interesting AI Projects, Research, and Updates

“Brain-Inspired” Computing Just Needs a Killer App
Neuromorphic engineering—running human brain-like neural networks—is ready for primetime. What’s missing is “some demonstration of a killer app.”

GOOG’s Gemini 2.0-Powered “Co-Scientist”
GOOG’s new multi-agent system, powered by Gemini 2.0, is an AI research collaborator that quickly poses and validates hypotheses.

Build DeepSeek R1 for $30, Train Models for $50
Researcher Jiayi Pan reproduces a DeepSeek-like model for peanuts, while other teams use distillation—large models “teaching” smaller models—to train models on the cheap.
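For readers new to distillation, here is a minimal sketch of the classic soft-target loss: the student is trained to match the teacher’s temperature-softened output distribution. This is a generic illustration, not the recipe used by any of the teams above; the function names, temperature, and toy logits are assumptions chosen for clarity.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into probabilities, softened by temperature > 1."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Multiplied by temperature**2, as in the common soft-target formulation,
    so the loss scale stays comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # teacher's "soft targets"
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits over a 3-token vocabulary (illustrative values only)
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]
far_student = [0.5, 1.0, 4.0]

print(distillation_loss(teacher, close_student))  # small: student mimics teacher
print(distillation_loss(teacher, far_student))    # large: student disagrees
```

In a real training loop this term is usually blended with an ordinary loss on ground-truth labels; the sketch shows only the “teaching” part.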

💼 Hiring and Community

Startups Hiring This Week:
- Founding Engineer → Baton AI
- ML Scientist I → Dyno Therapeutics

Mid-Markets Hiring This Week:
- Full-Stack Dev, AI/ML Integration → Pure Integration
- AI Decisioning Technical Architect → Hightouch

Enterprises Hiring This Week:
- Director, Driving Assistance and ML → Lucid Motors
- ML Engineer II → CHWY

💡 Spotlight: Newly-Launched AI Startups
- Gomboc AI: $13M from Ballistic Ventures for AI cloud security remediation
- Sawmills: $10M from Team8 for AI-powered telemetry optimization
- Singulr AI: $10M from Nexus Venture Partners for AI governance and security
- Safe Intelligence: $5.25M from Amadeus Capital for deep validation of AI systems
- OneTab: $3.3M from SOSV for SDLC agents
- CO/AI: $1.8M from Sequoia Scout Fund for AI tool discovery

🏭 Industry: M&A, Launches, Trends

NVDA Has Been 2X-ing Compute Power Every 10 Months
The chip leader hasn’t been resting on its laurels. Ongoing releases and sales have effectively doubled the total installed compute from its chips every 10 months.
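To put a 10-month doubling time in more familiar terms, here is a quick back-of-the-envelope conversion. It assumes smooth exponential growth, which is a simplification of how chip releases and sales actually land.

```python
DOUBLING_MONTHS = 10  # doubling period cited in the story

def growth_factor(months, doubling_months=DOUBLING_MONTHS):
    """Multiplier on total compute after `months`, assuming smooth exponential growth."""
    return 2 ** (months / doubling_months)

for months in (10, 12, 24, 60):
    print(f"{months:>2} months -> {growth_factor(months):.1f}x")
```

Under that assumption, doubling every 10 months works out to roughly 2.3x per year, about 5.3x over two years, and 64x over five years.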

Arm Gets Into the Chip Manufacturing Business
The chip designer, which has been content to create blueprints for decades, is now looking to compete with NVDA in the manufacturing arena.

New Grok Model Released, OpenAI Refuses $97B Takeover Bid
xAI’s new Grok 3 model is showing strong performance, while OpenAI turns down a $97.4B takeover offer led by xAI CEO Elon Musk.

Former OpenAI CTO’s New Startup Unveiled: Thinking Machines
The website for Mira Murati’s new startup reveals an impressive roster of OpenAI / Meta / Mistral / DeepMind talent and a vague mission statement about improving AI’s state of play.

RIP: Humane’s AI Wearable Pin
Humane, the startup whose wearable AI pin made headlines last year, has had its assets acquired by HP and its AI pin decommissioned.

⚖️ Copyright, IP, Licensing, and Regulation

Paris AI Summit: What Did It Mean?
Though the event drew Macron, US VP Vance, and a roster of AI VIPs, the summit ended inconclusively on AI regulation and safety measures.

First US District Court Case on AI Infringement Finds Against AI
In the first such ruling from a US district court (the federal trial courts whose decisions often shape later cases), the court finds against ROSS Intelligence in Thomson Reuters’ copyright infringement suit.

OpenEuroLLM to Develop LLMs for All EU Languages
The $39M digital sovereignty project calls for LLMs in all EU languages. A day later, Mistral drops a new model for the Middle East and South Asia.

France’s Answer to Stargate: $112B Investment and 1GW of Nuclear
Perhaps in response to the $500B Stargate program, France has pledged to invest $112B in AI infrastructure and devote a gigawatt of nuclear power.

About the Author

Hi there. My name is Andrew, and I work at Heavybit, the leading VC for developer-first startups. As the Editorial Lead, my goal is to find the most valuable and important AI news for developers and founders. The idea is to curate emerging trends and distill their bottom line, drawing on our 10+ years of coaching and funding developer-facing companies. Email me and let me know what you think.

That’s all for this edition. Until next time, please feel free to follow The Humans in the Loop on Bluesky, Twitter, and LinkedIn.
