The Humans in the Loop: Now Available in Strawberry

This Week in AI for Devs: OpenAI's "Thinking" Model

Sep 19, 2024

Welcome to The Humans in the Loop, your executive summary of AI news for devs. I’m Andrew, the [human] author, and I’d love your feedback on this newsletter. Please feel free to email me at andrew@heavybit.com with your thoughts. Thanks!

Our top story: Inside OpenAI’s o1

After quietly revealing that it will lean less on its non-profit DNA and become more of a traditional for-profit company, OpenAI has unveiled o1 (previously known as ‘Strawberry’), an “AI that can do general-purpose complex reasoning” trained with reinforcement learning, in o1-preview and o1-mini versions. o1’s accomplishments, like ranking in the 89th percentile on Codeforces and placing among the top 500 US students on the AIME, are certainly impressive at first blush. Cognition also found improved performance for its “AI engineering agent” Devin when using o1.

However, the new capabilities aren’t free, or perfect. The preview version of the new product has notable gaps, including API restrictions for function calling and streaming, rate limits, and higher latency on responses (as the model “takes longer to think”). o1 tokens are also notably more expensive, with input and output tokens costing 3x-4x more than GPT-4o’s. (Don’t try prompting o1 on how it works either: OpenAI may ban you.)
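
To get a feel for what a 3x-4x token price multiple means for a budget, here is a minimal Python sketch. The baseline per-token prices below are hypothetical placeholders (not published rates), and the function names are ours for illustration; swap in the current figures from the vendor's pricing page before relying on the numbers.

```python
# Hypothetical baseline prices in USD per 1M tokens.
# These are placeholders for illustration, not published rates.
BASELINE_INPUT_PER_M = 1.00
BASELINE_OUTPUT_PER_M = 2.00


def request_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Cost in USD for a workload, given prices per 1M tokens."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def cost_range(input_tokens, output_tokens, low_mult=3.0, high_mult=4.0):
    """Return (baseline, low estimate, high estimate) for a price multiple range."""
    base = request_cost(
        input_tokens, output_tokens, BASELINE_INPUT_PER_M, BASELINE_OUTPUT_PER_M
    )
    return base, base * low_mult, base * high_mult


if __name__ == "__main__":
    base, low, high = cost_range(input_tokens=2_000_000, output_tokens=500_000)
    print(f"baseline: ${base:.2f}, at 3x: ${low:.2f}, at 4x: ${high:.2f}")
```

Applying one multiple to the whole bill is a simplification: if input and output prices scale differently, pass separate prices to request_cost instead.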

Slightly more alarmingly, the company’s CEO has stepped down from its safety committee after o1 became the first of OpenAI’s models to be rated a ‘medium’ risk for CBRN (that is, “chemical, biological, radiological and nuclear weapons”). Testers also reported that the model “sometimes instrumentally faked alignment during testing.” Does this mean AIs will launch nukes anytime soon? Probably not. But we’re also still a ways off from AGI.

Next: The bottom line for the rest of this week’s AI news.

💻 Development

Guide: Browser-Based GPU Programming
This guide from researchers Sarah Pan and Austin Huang covers how to use gpu.cpp to program GPUs using puzzle challenges.

Guide: How to Create RAG Pipelines with Pinecone
This guide covers how to create a RAG pipeline to pull in data from an S3 bucket, create embeddings, and write vectors to Pinecone.

Model to Try: Piiranha-v1 for Data Privacy
Piiranha-v1 is a lightweight 280M encoder model specifically built to detect and protect personally identifiable information (PII).

🤔 Interesting AI Projects, Research, and Updates

DevGuild AI Summit: Code Generators
There’s still no clear framework for codegen best practices. Join 200+ tech leaders in SF on Oct 30 to change that.

DataGemma: GOOG Takes on Hallucinations With Real-World Data
These new RAG and retrieval-interleaved generation (RIG) models use real-time retrieval to debunk hallucinations.

Research: LLM Self-Correction
This new research paper surveys the ability of LLMs to self-correct.

💼 Hiring and Community

Startups Hiring This Week:
- ML Engineer → HackerPulse
- ML Engineer → Output Biosciences

Mid-Markets Hiring This Week:
- ML Engineer → Stellent IT
- Senior ML Engineer → Enigma.io

Enterprises Hiring This Week:
- Data Engineer → Tether.io
- Principal AI Architect → EY

💡 Spotlight: Newly-Launched AI Startups
- World Labs (founded by Fei-Fei Li): $230M from a16z for AI in 3D space
- Supermaven: $12M from BVP for AI coding assistants
- Protege: $10M from CRV for AI training data
- Acuvity: $9M from Foundation Capital for AI governance
- c/side: $6M from Uncork for AI-powered Web security
- SplxAI: $2M from Runtime Ventures to secure AI chatbots

🏭 Industry: M&A, Launches, Trends

In AI, The Data Pipeline Is the Secret Sauce
DevOps veteran Jesse Robbins explains why data management in AI is having its “DevOps moment.”

AWS Bets Its Bottom Chip
Amazon is getting into the AI chip business, unveiling Inferentia and Trainium to support small-scale AI tasks.

Mistral Launches Free API Tier
Free tier, improved small model, and better pricing across the board.

What to Think About When Evaluating AI Products
Snyk co-founder Guy Podjarny proposes a framework for teams to think about how they evaluate AI solutions.

⚖️ Copyright, IP, Licensing, and Regulation

White House “Data Center Task Force” With NVDA, GOOG, MSFT, OpenAI
AI’s leading corporations have begun not-so-quietly lobbying Uncle Sam to support the development of data centers to host inference.

Not in EU? META and LinkedIn Already Trained AI on Your Data
EU users could opt out; otherwise, Meta has already gobbled up your data going back to 2007. (LinkedIn offers a manual opt-out in your profile settings.)

NVDA Faces $1B Patent Challenge Over New Blackwell Chipset
Texas-based Xockets claims the chip giant “stole” Blackwell’s technology during a 2020 acquisition.

About the Author

Hi there. My name is Andrew, and I work at Heavybit, the leading VC for developer-first startups. As the Editorial Lead, my goal with The Humans in the Loop is to find the most valuable and important AI news for developers and founders. The idea is to curate and bottom-line emerging trends, drawing on our 10 years of coaching and funding developer-facing companies. Email me and let me know what you think.

That’s all for this edition. In the meantime, please feel free to follow The Humans in the Loop on Twitter and LinkedIn.
