The Humans in the Loop: The Snowflake-Databricks Arms Race, Level Up Your AI Dev Skills

Databricks and Snowflake Declare War, Be a Better AI Dev

Jul 03, 2023

Welcome to The Humans in the Loop: Your executive summary of AI/ML/LLM news that:

...Takes you through the week's biggest stories in AI development
...Covers business, copyright, community, and all the other good stuff
...Does it in three lines or less (usually)!

Know someone who might also want to receive this newsletter? Email us at thehumansintheloop@heavybit.com.

Our top story: Snowflake and Databricks go to war on a billion-dollar AI battlefield

The two data giants have thrown down their respective gauntlets this week:
⚔️ Snowflake teams up with Nvidia to embed the chip giant's NeMo GenAI platform into Snowflake Data Cloud to let Snowflake customers train models on their own data.
⚔️ Databricks acquired MosaicML (the startup focused on lowering GenAI costs) for an astonishing $1.3B while unveiling vector search via its Unity Catalog.

Data management will doubtless be a crucial part of any eventual AI stack, but the two firms are wasting no time, making big moves to try to outmaneuver one another.

Wishing you a Happy 4th as we get to what this week's AI news means for devs:

Development

Guides, Resources, and Events to Become a Better AI Dev...
🤖 Boost AI Adoption with DORA: DevOps best practices for AI-assisted coding.

🔥 Accelerate PyTorch Training: Ramp PyTorch deep learning models faster.

💬 Anatomy of a Chatbot: Everything to know about what you need to build.

🪚 July 7 - GOOG Cloud hackathon: Work with PaLM 2 and other GOOG AI tools.

☕️ Article: The New AI Stack for JS?

JavaScript devs: Could your future AI stack include auth + logic + vector database + LLM orchestration + model tools + deployment? This article explores a potential AI stack for JS.

🏋️ GH: Coding Assistants Will Make Us $1.5 Trillion! (In Productivity)

GitHub predicts AI assistants will make us so productive, we'll generate $1.5 trillion in value.

Copyright, IP, Licensing, and Regulation

⚖️ How Many Foundation Models Comply With EU AI Act? Zero.

Stanford says: None of the major foundation models complies with the EU's recently passed AI Act. The EU will open AI "crash test centers" to "ensure safety," but this will be a long, long road.

🇺🇸 Doesn't Look Like Uncle Sam Has Figured Out AI Regulation, Yet

Congress' new AI plan...lacks specifics. Biden meets AI CEOs + India's PM, but mainly on China competition. Toughest rules so far: House reps to pay for ChatGPT and be extra-careful using it!

Funding

Funding Rounds: Rare Earth, AI Analytics, Custom Models, and More...

💰 KoBold Metals (Series B, $200M): Bill Gates, Jack Ma, and Jeff Bezos bring this ML-powered mining company for rare earth minerals (used for EVs) to a billion-dollar valuation.

💰 Reka AI (Series A, $58M): Former GOOG, BIDU, META researchers Dani Yogatama, Qi Liu, Cyprien de Masson d’Autume, and Yi Tay un-stealth to build custom enterprise AI models.

💰 Calypso AI (Series A, $23M): Founder Neil Serebryany's team to focus on containerized guardrails that prevent toxicity and block sensitive data from being shared with external models.

💰 Faros AI (Series A, $20M): Ex-Salesforce founders Vitaly Gordon, Shubha Nabar, and Matthew Tovbin will focus on creating AI analytics to track org health and tech debt.

💰 BentoML (Seed, $9M): Databricks veteran Chaoyu Yang and ex-Samsung engineer Bozhao Yu will focus on helping devs code and ship AI applications.

🏢 Corporate Giants Are Getting Serious About Investing in AI Startups

Plenty of massive corporations are getting involved in VC for AI startups, including CRM, QCOM, CSCO, AMZN, META, ORCL, AMD, BIDU, and DBX. (There goes the neighborhood!)

🤑 Not Enough VC $ For You? Have Some Accelerators and Grants

GOOG unveils an Israel-based AI startup accelerator, taking applications until July 30. Also, ex-GitHub CEO Nat Friedman and Cue founder Daniel Gross are offering $250K startup grants.

Industry

🔍 MongoDB Adds Vector Search

We've seen huge interest in vector databases (Pinecone, Weaviate) to manage unstructured AI data, but MongoDB, like Databricks, is bolting vector search capabilities onto its core offering.

📰 Thomson Reuters Acquires 100-Person AI Legal Firm Casetext for $650M

Thomson Reuters buys the AI legal startup, creator of AI legal assistant CoCounsel, for $6.5M per person.

👨‍👩‍👧‍👦 AWS Wants "AI Models For Everything"

AWS product VP Matt Wood explains that AMZN's AI strategy is simple: Not choosing between models, and not choosing between OSS or proprietary.

🗻 Ramp Acquires AI Customer Support Startup Cohere.io

Finance automation firm Ramp has snapped up AI support startup Cohere.io, which had previously raised $3.5M itself, for an undisclosed sum. If you won't build your own AI, buy it!

🗣️ Conversational Chatbot Creator Enters Foundation Model Race

Inflection, which has raised $225M for its conversational chatbot Pi, has announced its new foundation model "Inflection-1," an LLM roughly comparable in size and capabilities to GPT-3.5.

☠️ Training on AI Data Leads to "Death Spiral"? That Doesn't Sound Good

Training AIs on AI outputs (instead of high-quality data) could cause "model collapse," yet outsourcing workers are increasingly submitting AI outputs. What's the worst that could happen?

Hiring and Community

👷 Did You Want an AI Job? There Were 20% More

Job listing site Indeed reports that in May, GenAI job listings increased 20% to 204 per million postings, even though tech job listings overall are down 43% from last year.

💼 Hiring This Week: TikTok, Cruise, Sony, Anthropic, Instacart

This week's newest listings include:

- Senior AI Researcher @ TikTok
- Staff AI Research Scientist @ Cruise
- Associate AI/ML Staff Research Scientist @ Sony
- Data Analyst @ Anthropic
- Staff ML Engineer @ Instacart

Interesting AI Projects and Updates

Research: Levanter - Training Foundation Models with JAX

Stanford researchers' new codebase is built "for training reproducible, legible foundation models using JAX," with "a number of checkpoints for models trained with Levanter on its HF page."

[GH] vllm: Memory-Efficient Transformer Optimization

A high-throughput and memory-efficient inference and serving engine for LLMs.

Research: Textbooks Are All You Need

This arXiv report introduces "phi-1," a new small-scale LLM with 1.3B parameters trained on "textbook-quality data" that shows surprisingly strong performance.

[GH] wanda: Pruning for LLMs

Wanda attempts to reduce LLM complexity by dropping less-significant network weights.

Research: Reducing Hallucinations With SequenceMatch

The report discusses a new method of using reinforcement learning when processing data sequences, which are often a root cause of hallucinations.

[GH] FlagAI - Training and Fine-Tuning Large Models

FlagAI (Fast LArge-scale General AI models) is an extensible toolkit for large-scale models.

Research: GPT Self-Repair

This report analyzes modern AI models' ability to self-repair, a rare capability that only GPT-4 seems to possess at the moment.

[GH] LMFlow: Fine-Tuning Foundation Models

An extensible toolkit for fine-tuning and inference of large foundation models.

Did we miss something? Email thehumansintheloop@heavybit.com and let us know.
