The Humans in the Loop: The Problem(s) With Agents

This Week in AI for Devs: These Growing Pains Sure Are Painful

Jul 24, 2025

Welcome to The Humans in the Loop, your executive summary of AI news for devs. I’m Andrew, the [human] author, and I’d love your feedback on this newsletter. Please feel free to email me at andrew@heavybit.com with your thoughts. Thanks!

Our top story: The Agent Did What?

By now, you’ve heard how a hacker hijacked AMZN’s agentic GenAI assistant Q to wipe end users’ computers. You’ve probably also heard about how Replit’s vibe-coding agent deleted the production database of SaaStr’s Jason Lemkin this weekend. While there are some silver linings (Replit rolled out split production DBs the following Monday), there are still larger questions about getting agentic projects to work in the long run. AI engineer Utkarsh Kanwat suggests agents have fundamental challenges when faced with multi-step workflows, and can incur spiraling token costs. Neuroscientist and AI leader Roshan Nanu points out that agents have a fundamental, cart-before-the-horse problem of requiring a high-quality dataset and a robust pipeline to even have a shot at consistent performance. And Box CEO Aaron Levie suggests that aside from reliability, there are still huge obstacles keeping agents from crossing the chasm. Despite all the hype for agents and all the new resources every week (which we’ll cover below), they still have a ways to go, which could be why the war for AI talent is only getting hotter. More on that below.
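For a rough feel of why multi-step workflows and token costs can bite, here is a minimal back-of-the-envelope sketch. It is not taken from any of the linked posts. The function names, the 95% per-step success rate, and the 500-token turn size are illustrative assumptions, and it assumes steps fail independently and that every turn re-sends the full conversation history.

```python
def workflow_success_rate(per_step_success: float, steps: int) -> float:
    """Chance that every step succeeds, assuming steps fail independently."""
    return per_step_success ** steps


def cumulative_input_tokens(tokens_per_turn: int, turns: int) -> int:
    """Total input tokens if each turn re-sends all prior turns plus its own.

    Turn k sends k * tokens_per_turn tokens, so the total is an
    arithmetic series that grows quadratically with the number of turns.
    """
    return tokens_per_turn * turns * (turns + 1) // 2


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the shape of the curves.
    for steps in (1, 5, 10, 20):
        rate = workflow_success_rate(0.95, steps)
        print(f"{steps:>3} steps: {rate:.1%} end-to-end success")
    for turns in (10, 50, 100):
        total = cumulative_input_tokens(500, turns)
        print(f"{turns:>3} turns: {total:,} input tokens")
```

Under these assumptions, a step that works 95% of the time still leaves a 20-step workflow finishing cleanly only about a third of the time, and the input token bill grows with the square of the number of turns. Real agents may do better (retries, caching, context trimming) or worse, so treat this as intuition, not measurement.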

Up Next: The bottom line in AI news for busy devs.

💻 Development

[GH] RunAgent: Write Agents in Python, Run in Any Other Language
This handy project purports to let you code up agents in Python but run them using a multi-language SDK.

Compressing 2 Years of Coding a Web App into 120 Hours
IT leader Ryan Gravette shares quick tips and learnings from his experience on architecting, markdown files, data management, and tool selection.

What If AI, But Command Line?
Product lead and researcher Ryan Stortz discusses what CLI could mean for AI programs going forward.

If John Henry From the American Tall Tale Was a Coder, Would He Win?
OpenAI alum Przemysław Dębiak narrowly defeats ChatGPT in a coding competition. If humans can’t keep “beating” AI in coding, maybe they need to join it?

🤔 Interesting AI Projects, Research, and Updates

Comparing the Architecture of Major LLMs
Researcher Sebastian Raschka: GPT, Llama, and other major models have evolved to incorporate efficiency-boosting features like caching and attention management.

Context Engineering for Agents
Manus co-creator Yichao Ji discusses the importance of juggling memory, context, and attention in improving agentic performance.

AAPL Talks Latest Learnings About Its Newest Models at WWDC
AAPL shares its newest practices for model splitting, architecture, and data sourcing used by its latest models. Full research here.

Too Much Test-Time Compute Makes AI a Dull Boy
A new study suggests the performance of reasoning models can deteriorate when they are given more time to reason. Full research here.

💼 Hiring and Community

Startups Hiring This Week:
- AI/ML Engineer → In Tandem Families
- Software Engineer, Reinforcement Learning → AnyScale

Mid-Markets Hiring This Week:
- Software Engineer, AI SDK → Vercel
- Senior Team Lead, ML → Rocket Money

Enterprises Hiring This Week:
- AI/ML Research Engineer → DASTY
- Product Manager, Video → GOOG DeepMind

MSFT Poaches 20 Engineers From GOOG Gemini Team
MSFT brings in 20 Gemini engineers, including Amar Subramanya, the former head of engineering for the Gemini chatbot project.

META Poaches 3x Leading GOOG Researchers
A new Gemini version recently passed a tough math challenge (see below). META commemorated the occasion by poaching the researchers who built it.

AnySphere Acqui-Hires Koala Team Members
After poaching Anthropic’s Claude Code leads, the Cursor creator’s newest acqui-hire nabs team members from the AI-powered CRM.

Want a Promotion at AMZN? Prove Your AI Expertise
AMZN’s Ring team needs proof of AI expertise for any promotions, echoing MSFT making AI “no longer optional” and SHOP blocking new hires in favor of AI.

💡 Spotlight: Newly Launched AI Startups
- Arago: $26M from Earlybird for photonic AI hardware
- Kluisz.ai: $9.6M from RTP Global for secure AI cloud
- Daylight Security: $7M from Bain for agentic + human cyber security
- StrongestLayer: $5.2M from Recall Capital for AI phishing protection
- DroidRun: $2.4M from Merantix for mobile-native agents

🚀 New Model Launches
- Qwen3-235B-A22B-Instruct-2507: Alibaba’s latest rivals GPT-4o and Claude Opus
- OpenReason-Nemotron: NVDA’s open-source reasoning model
- Kimi K2: Moonshot AI’s model that scores highly on LiveCodeBench
- EXAONE 4.0: LG’s model with reasoning and non-reasoning variants

🚚 Shipping Now: AI Product Launches & Enhancements
- ChatGPT Agent: OpenAI’s first stab at a commercial agent product
- Gemini Embedding: DeepMind’s latest text embedding model goes GA
- Agent Mode & CLI for Gemini Code Assist: More tools for AI codegen
- Deep Research/Voice/Projects: Mistral’s Le Chat adds many new features
- GitHub Spark: GitHub gets into vibe coding

🏭 Industry News & Trends

International Math Olympiad: Both OpenAI and Gemini Claim Gold
In the challenging competition, experimental models from both vendors claim gold.

…But the New ARC-AGI-3 AI Intelligence Benchmark Stumps Major Models
The ARC Prize Foundation’s 2026 benchmark, which released a partial test preview in July, already has major foundation models reporting scores of zero.

Claude Code Max Suddenly Gets Throttled
Anthropic apparently throttled users on the highest tier of its powerful AI coding assistant…without telling them.

META and AWS Really Want to Help Your GenAI Startup
META and AWS have partnered to offer 6-month mentorship programs to startups building on top of Llama.

Perplexity to Add “Shortcuts,” Wants Comet AI Browser on Phones
The AI search vendor will add “shortcuts for repetitive tasks” to Comet, and is reportedly in talks to displace Chrome on new mobile devices.

⚖️ Copyright, IP, Licensing, and Regulation

White House Reveals New AI Policy Plan
The new plan supports US “global dominance” in AI with data center buildouts, deregulation, and exporting AI software and hardware to US allies.

META Politely Declines Voluntary EU AI Guidelines
As the European AI Act nears its launch, the EU has also issued voluntary guidelines. META has refused to accept them, which could open the company to more regulatory scrutiny.

Spotify Publishes AI Songs From Dead Artists Without Permission
Yeah. It’s pretty bad.

Study: 72% of Teens Use AI Companions, 24% Surrender Personal Information
Nearly 3/4 of US teens have used AI companions. Nearly 1/4 have shared personal details with them. Might be time to have “the AI talk” with your kids.

About the Author

Hi there. My name is Andrew, and I work at Heavybit, the leading VC for software infrastructure startups in the age of AI. As the Editorial Lead, my goal is to find the most valuable AI news for technical founders and teams. The idea is to give the bottom line on emerging trends, drawing on our 10+ years of coaching and funding infrastructure companies. Email me and let me know what you think.

That’s all for this edition. In the meantime, please feel free to follow The Humans in the Loop on LinkedIn, BlueSky, and Twitter.
