The Humans in the Loop: AI Agent Growing Pains

This Week in AI for Devs: The Great AI Agent Struggle

Jul 10, 2025

Welcome to The Humans in the Loop, your executive summary of AI news for devs. I’m Andrew, the [human] author, and I’d love your feedback on this newsletter. Please feel free to email me at andrew@heavybit.com with your thoughts. Thanks!

Our top story: Getting Agents to [Actually] Work

AI agents have generated great excitement in the headlines, along with great skepticism about their performance to date: Recent research suggests the most performant general AI agent on the market, Gemini 2.5, has a failure rate of more than 60%, and Gartner has predicted that 40% of agentic projects will be canceled in the next two years due to cost or unclear business goals. To scope your projects better, it may be a good idea to take stock of your project goals and understand the capabilities and limitations of agents as a whole. This quick review from engineering lead Will Larson covers how agents work, using real-world use cases like chatbot support and bug-report triage. Alternatively, this guide from data scientist Hugo Bowne-Anderson on choosing an LLM vs. an agent points out that in some cases, techniques like prompt chaining and routing with existing LLMs can get things done without building and relying on a potentially unpredictable agent. And as engineer Mathieu Triay points out, there remains the unspoken question of how much agency we’re comfortable relinquishing to agents that act on our behalf, particularly where our personal information and privacy preferences are concerned. While agents still seem to be the most exciting space in AI at the moment, there are big questions that need answers (and potentially simpler alternatives in the meantime).
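For readers who haven't seen prompt chaining and routing in practice, here is a minimal sketch of both in Python. It is an illustration under stated assumptions, not code from any of the linked posts: `fake_llm` is a hypothetical stub standing in for a real model call, and the prompts, labels, and handler names are invented for the example.

```python
from typing import Callable, Dict

# Hypothetical type for "something that takes a prompt and returns text".
# In a real project this would wrap your model provider's client.
LLM = Callable[[str], str]


def fake_llm(prompt: str) -> str:
    """Deterministic stub so the sketch runs offline (an assumption, not a real model)."""
    if prompt.startswith("Classify"):
        return "bug" if "crash" in prompt.lower() else "support"
    return f"[model output for: {prompt[:40]}]"


def chain(llm: LLM, text: str) -> str:
    """Prompt chaining: the output of one call becomes the input of the next."""
    summary = llm(f"Summarize in one sentence: {text}")
    return llm(f"Draft a reply based on this summary: {summary}")


def triage_bug(llm: LLM, message: str) -> str:
    """Specialized path for bug reports (hypothetical handler)."""
    return "BUG: " + llm(f"Extract reproduction steps: {message}")


# Each label maps to a fixed, predictable code path.
HANDLERS: Dict[str, Callable[[LLM, str], str]] = {
    "bug": triage_bug,
    "support": chain,
}


def route(llm: LLM, message: str, handlers: Dict[str, Callable[[LLM, str], str]]) -> str:
    """Routing: one call classifies the input, then plain code picks the handler."""
    label = llm(
        f"Classify the message as 'bug' or 'support'. Message: {message}"
    ).strip().lower()
    # Fall back to the support path if the model returns an unexpected label.
    handler = handlers.get(label, handlers["support"])
    return handler(llm, message)


if __name__ == "__main__":
    print(route(fake_llm, "The app crash happens on login", HANDLERS))
    print(route(fake_llm, "How do I reset my password?", HANDLERS))
```

The control flow lives in ordinary code here, and the model only fills in text at each step, which is what makes this approach easier to test and debug than an agent that chooses its own next action.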

Up Next: The bottom line in AI news for busy devs.

💻 Development

The Architecture Behind Lovable
This blog covers the basic steps of how to build a vibe-coding app, and suggests that models may not matter as much as prompts and context engineering (see below).

Using Rules Files as Guardrails for Vibe Coding
Vibe coding makes building faster, but it can lead to shipping products with security issues. Rules files give AI coding assistants guidance that can help.

Measuring the Impact of AI on Engineering Team Performance
Gregor Ojstersek and Laura Tacho suggest 3 key metrics: Utilization, cost, and impact.

Show HN Posts Are Increasing. AI Posts Get Fewer Interactions.
Ryan Farley observes that yes, there are a lot more Show HN posts these days…and that fewer people seem to care about the AI-related ones.

🤔 Interesting AI Projects, Research, and Updates

Move Over Prompt Engineering: It’s Time for Context Engineering
Researcher Elvis Saravia (amid many other voices) suggests that context engineering, or giving AIs full context beyond a prompt, is the new frontier. (It may also open a new avenue for prompt-injection attacks that throw off reasoning models with irrelevant distractions.)

Want a Favorable Research Review? Hide a Prompt in Your Paper
In less-than-inspiring news, researchers have been caught embedding hidden AI prompts to “give a positive review” within their papers. Examples here. 😬

Research: Multi-Model Branching Tree Search
Sakana AI has improved multi-model performance by 30% with a new team-based approach. Full research paper here.

We’re Getting 2x $10M Models a Month
Research institute Epoch AI reports that two new large-scale models requiring a $10M+ training budget are now released every month.

💼 Hiring and Community

Startups Hiring This Week:
- AI/ML Engineer → Air Apps
- Member of Technical Staff, Image Gen → Captions AI

Mid-Markets Hiring This Week:
- Applied ML Engineer → Nooks.ai
- ML Engineer → EvenUp

Enterprises Hiring This Week:
- AI Interaction Engineer → Tanium
- Senior AI Engineer, MLOps → NVDA

META Apparently Acqui-Hires NFDG, Sutskever Now SSI CEO
META has apparently acqui-hired NFDG co-founder and former Safe Superintelligence (SSI) CEO Daniel Gross, leaving SSI co-founder Ilya Sutskever as CEO.

Cursor Creator Anysphere Poaches Claude Code Team
Anysphere hires Claude Code creator Boris Cherny and Claude Code product manager Cat Wu to work on Cursor.

OpenAI’s Employee Stock Grants Exceed Annual Revenue
Is the AI talent market still competitive? Let’s put it this way: Last year, the AI vendor took in $3.7B in revenue and paid out $4.4B in stock grants.

💡 Spotlight: Newly Launched AI Startups
- Thinking Machines Lab: $2B from a16z for “shared AI science”
- Emerald AI: $24M from Radical Ventures for data center compute orchestration
- Cerebrium: $8.5M from Gradient Ventures for serverless AI infra
- RevEng.ai: $4.1M from Sands Capital for AI-powered cybersecurity
- Digger: $3.6M from Initialized Capital for AI-powered infra provisioning

🚀 New Model Launches
- T5Gemma for encoder-decoder tasks
- Gemma 3n for on-device
- FlexOlmo for asynchronous training to offer options for copyright protection
- Grok 4 and Grok 4 Heavy, which costs $300/mo

🏭 Industry: M&A, Launches, Trends

Moore’s Law Coming to an End?
GOOG quietly 2x-ed the input token cost of Gemini 2.5 Flash while 4x-ing output cost. Given the realities of pricing and compute, maybe AI won’t keep getting cheaper.

Then Again, DeepSeek Just Got Cheaper
A German consulting firm has just released a new DeepSeek variant with ~90% of R1’s performance at less than 40% of the original’s output token cost.

Is the US Losing the Open-Source Model Race to China?
Researcher Nathan Lambert suggests the US lacks a high-end open-source model and risks losing the open-source race to China in the next few years.

⚖️ Copyright, IP, Licensing, and Regulation

AI Vendors Are Starting to Win Their Copyright Cases
→ Anthropic found not in violation of book copyright law, though the company may face a separate piracy trial in December.
→ META found not in violation of book copyright law, though the ruling may have left META open to future, similar claims.
→ Stability has had infringement claims dropped by Getty Images, though Getty is “still pursuing other claims.”

Regulation Still Happening in US, EU
The US budget passed before July 4, but the proposed 10-year moratorium on AI regulation didn’t. Meanwhile, the EU has begun rollout of the AI Act.

These Four AI Execs Are Officially US Military Officers
Presumably thanks to their companies’ military contracts, execs from Palantir, META, OpenAI, and Thinking Machines are now lieutenant colonels in the US Army.

About the Author

Hi there. My name is Andrew, and I work at Heavybit, the leading VC for software infrastructure startups in the age of AI. As the Editorial Lead, my goal is to find the most valuable AI news for technical founders and teams. The idea is to give the bottom line on emerging trends, drawing on our 10+ years of coaching and funding infrastructure companies. Email me and let me know what you think.

That’s all for this edition. In the meantime, please feel free to follow The Humans in the Loop on LinkedIn, BlueSky, and Twitter.
