The Humans in the Loop: Can We Interest You in Some Agents?
This Week in AI for Devs: Loads of New Agentic Resources
Welcome to The Humans in the Loop, your executive summary of AI news for devs. I’m Andrew, the [human] author, and I’d love your feedback on this newsletter. Please feel free to email me at andrew@heavybit.com with your thoughts. Thanks!
Our top story: New Agentic Lessons, Tools, and Benchmarks
This week (and perhaps for next week, and the week after that), we see the headlines for AI development being chock-full of agent-related resources. If you have yet to build your own coding agents, DeepLearning has a course for Hugging Face’s smolagents to create self-contained code snippets. If “smol” is too big for you, HF has also launched Tiny Agents, which are MCP-powered and only need about 50 lines of code to get up and running. Need custom engineering agents specific to your current work? This open-source project “creates AI agents specialized in your codebase” for code analysis and testing. Need integrations for your agents? This open-source project connects your agents to 600+. Need to get a better handle on what agentic failure modes mean? MSFT has released comprehensive report on agentic failure taxonomy. Need to better understand how well coding agents actually resolve issues? AMZN researchers have developed SWE-PolyBench, a new benchmark for coding agents that goes beyond pass rate into more-granular evaluations like file-level localization and node-level retrieval. As the developer community continues to experiment with agents, there’s no shortage of resources to play with so far.
Next Up: The bottom line in AI news for busy devs.
💻 Development
Llama API Preview and Llama Defenders at LlamaCon 2025
At META’s first AI dev event: A preview of the developer-facing API to support Llama model usage and a new suite of sec tools.
DeepWiki: AI-Based Understanding of Repos
Cognition AI, creator of “AI engineer” Devin, unveils an AI-powered utility to help you quickly understand any public repo. Here’s a 30-second demo.
Swap in Gemini Reasoning Levels With 3 Lines of Code
GOOG now lets users set 4 reasoning levels in Gemini 2.5 Flash via the OpenAI compatability layer in just 3 lines of code.
🤔 Interesting AI Projects, Research, and Updates
Center for AI Safety Announces SafeBench Winners
The winning benchmarks of the CAIS’ 2025 contest include frameworks to counter cyber, prompt injection, and backdoor attacks.
Anthropic on Interpretability
Claude vendor’s CEO outlines the challenges of not being able to fully understand how your AI product “thinks.”
DeepSeek R1 Documentary: “OpenAI Is Not God”
This 30-minute YouTube documentary covers the rise of the disruptive DeepSeek R1 model as DeepSeek prepares to launch its successor R2.
💼 Hiring and Community
Startups Hiring This Week:
- Lead Data Scientist → Philo
- Robotics Software Eng → Dexterity, Inc.
Mid-Markets Hiring This Week:
- Senior Backend Eng, ML → HighSpot
- Senior Software Eng, ML → Nuro, Inc.
Enterprises Hiring This Week:
- Gen AI Engineer → Visa
- Principal Full-Stack Dev, AI & Agentic → Sandisk
💡 Spotlight: Newly-Launched AI Startups
- AuthMind: $19.3M from Cheyenne Ventures for agentic observability
- Corvic AI: $12M from M Ventures for enterprise “AI cognitive infrastructure”
- Terminal 3: $8M from Illuminate Financial for AI-powered blockchain privacy
- Terra Security: $8M from SYN Ventures for AI-driven pentesting
- Augur: $7M from General Advance for AI-powered preventive cyber
- Bagel: $5.5M from at.inc for AI-driven product insights
- Cluely: $5.3M from Abstract Ventures for AI-powered “cheating on everything”
- Lattica: $3.25M from Cyber Fund for AI encryption
- Etiq AI: $1M from GapMinder VC for ML model testing & debugging
🚀 New Model Launches
- BABA Qwen3 Family - 8 models including 235B MoE model
- BIDU Ernie 4.5 Turbo and X1 Turbo, boasting 50% lower costs
- Zhipu GLM-4 32B, comparable to DeepSeek R1 and GPT
🏭 Industry: M&A, Launches, Trends
You’ll Never Guess What They’re Talking About at RSA
Why yes, AI is a key topic, with Gemini and Anthropic admitting they were misused for malicious attacks. Perhaps just as unsurprising, the consensus seems to be that we all need to put much “more effort” into AI security.
New Models from China: Qwen3, Ernie 4.5, Zhipu GLM-4
From the far East come three new models to try:
- Alibaba Qwen3 Family - 8 models including 235B MoE model
- Baidu Ernie 4.5 Turbo and X1 Turbo, boasting 50% lower costs
- Zhipu GLM-4 32B, comparable to DeepSeek R1 and GPT
⚖️ Copyright, IP, Licensing, and Regulation
The White House Is All-In on AI
The White House has mandated both greater AI adoption at government agencies and greater adoption in education at the K-12 level.
GOOG Antitrust Case: Pre-Installed Gemini on Samsung Phones
GOOG’s ongoing antitrust investigation reveals it paid the phone vendor handsomely to have its AI model pre-installed on mobile devices.
The EU Wants You (to Weigh in on the AI Act)
Anyone with an opinion on AI regulation, even non-Europeans, is welcome to share their feedback here.
About the Author
Hi there. My name is Andrew, and I work at Heavybit, the leading VC for developer-first startups. As the Editorial Lead, my goal is to find the most valuable and important AI news for developers and founders. The idea is to curate and bottom-line emerging trends from our perspective in 10+ years of coaching and funding developer-facing companies. Email me and let me know what you think.
That’s all for this edition. In the meantime, please feel free to follow The Humans in the Loop on LinkedIn, BlueSky, and Twitter.


