The Humans in the Loop: The Snowflake-Databricks Arms Race, Level Up Your AI Dev Skills
Databricks and Snowflake Declare War, Be a Better AI Dev
Welcome to The Humans in the Loop: Your executive summary of AI/ML/LLM news that:
...Takes you through the week's biggest stories in AI development
...Covers business, copyright, community, and all the other good stuff
...Does it in three lines or less (usually)!
Know someone who might also want to receive this newsletter? Email us at thehumansintheloop@heavybit.com.
Our top story: Snowflake and Databricks go to war on a billion-dollar AI battlefield
The two data giants have thrown down their respective gauntlets this week:
⚔️ Snowflake teams up with Nvidia to embed the chip giant's NeMo GenAI platform into Snowflake Data Cloud to let Snowflake customers train models on their own data.
⚔️ Databricks acquired MosaicML (a startup focused on lowering GenAI costs) for an astonishing $1.3B while unveiling vector search via its Unity Catalog.
Data management will doubtless be a crucial part of any eventual AI stack, but the two firms are wasting no time, making big moves to try to outmaneuver one another.
Wishing you a Happy 4th as we get to what this week's AI news means for devs:
Development
Guides, Resources, and Events to Become a Better AI Dev...
🤖 Boost AI Adoption with DORA: DevOps best practices for AI-assisted coding.
🔥 Accelerate PyTorch Training: Ramp PyTorch deep learning models faster.
💬 Anatomy of a Chatbot: Everything to know about what you need to build.
🪚 July 7 - GOOG Cloud hackathon: Work with PaLM2 and other GOOG AI tools.
☕️ Article: The New AI Stack for JS?
JavaScript devs: Could your future AI stack include auth + logic + vector database + LLM orchestration + model tools + deployment? This article explores a potential AI stack for JS.
🏋️ GH: Coding Assistants Will Make Us $1.5 Trillion! (In Productivity)
GitHub predicts AI assistants will make us so productive, we'll generate $1.5 trillion in value.
Copyright, IP, Licensing, and Regulation
⚖️ How Many Foundation Models Comply With EU AI Act? Zero.
Stanford says: None of the major foundation models complies with the EU's recently passed AI Act. The EU will open AI "crash test centers" to "ensure safety," but this will be a long, long road.
🇺🇸 Doesn't Look Like Uncle Sam Has Figured Out AI Regulation, Yet
Congress' new AI plan...lacks specifics. Biden meets AI CEOs + India's PM, but mainly on China competition. Toughest rules so far: House reps to pay for ChatGPT and be extra-careful using it!
Funding
Funding Rounds: Rare Earth, AI Analytics, Custom Models, and More...
💰 KoBold Metals (Series B, $200M): Bill Gates, Jack Ma, and Jeff Bezos bring this ML-powered mining company for rare earth minerals (used for EVs) to a billion-dollar valuation.
💰 Reka AI (Series A, $58M): Former GOOG, BIDU, META researchers Dani Yogatama, Qi Liu, Cyprien de Masson d’Autume, and Yi Tay un-stealth to build custom enterprise AI models.
💰 Calypso AI (Series A, $23M): Founder Neil Serebryany's team to focus on containerized guardrails that prevent toxicity and block sensitive data from being shared with external models.
💰 Faros AI (Series A, $20M): Ex-Salesforce founders Vitaly Gordon, Shubha Nabar, and Matthew Tovbin will focus on creating AI analytics to track org health and tech debt.
💰 BentoML (Seed, $9M): Ex-Databricks veteran Chaoyu Yang and Ex-Samsung engineer Bozhao Yu will focus on helping devs code and ship AI applications.
🏢 Corporate Giants Are Getting Serious About Investing in AI Startups
Plenty of massive corporations are getting involved in VC for AI startups, including CRM, QCOM, CSCO, AMZN, META, ORCL, AMD, BIDU, and DBX. (There goes the neighborhood!)
🤑 Not Enough VC $ For You? Have Some Accelerators and Grants
GOOG unveils an Israel-based AI startup accelerator, taking applications until July 30. Also, ex-GitHub CEO Nat Friedman and Cue founder Daniel Gross are offering $250K startup grants.
Industry
We've seen huge interest in vector databases (Pinecone, Weaviate) to manage unstructured AI data, but MongoDB, like Databricks, is bolting vector search capabilities onto its core offering.
📰 Thomson Reuters Acquires 100-Person AI Legal Firm Casetext for $650M
Thomson Reuters buys the AI legal startup, creator of AI legal assistant CoCounsel, for $6.5M per person.
👨👩👧👦 AWS Wants "AI Models For Everything"
AWS product VP Matt Wood explains that AMZN's AI strategy is simple: Not choosing between models, and not choosing between OSS or proprietary.
🗻 Ramp Acquires AI Customer Support Startup Cohere.io
Finance automation firm Ramp has snapped up AI support startup Cohere.io, which had previously raised $3.5M itself, for an undisclosed sum. If you won't build your own AI, buy it!
🗣️ Conversational Chatbot Creator Enters Foundation Model Race
Inflection, which has raised $225M for its conversational chatbot Pi, has announced its new foundation model "Inflection-1," an LLM roughly comparable in size and capabilities to GPT-3.5.
☠️ Training on AI Data Leads to "Death Spiral"? That Doesn't Sound Good
Training AIs on AI outputs (instead of high-quality data) could cause "model collapse," yet outsourcing workers are increasingly submitting AI outputs. What's the worst that could happen?
Hiring and Community
👷 Did You Want an AI Job? There Were 20% More
Job listing site Indeed reports that for the month of May, GenAI job listings increased 20% to 204 per million postings, despite tech jobs being down 43% from last year overall.
💼 Hiring This Week: TikTok, Cruise, Sony, Anthropic, Instacart
This week's newest listings include:
- Senior AI Researcher @ TikTok
- Staff AI Research Scientist @ Cruise
- Associate AI/ML Staff Research Scientist @ Sony
- Data Analyst @ Anthropic
- Staff ML Engineer @ Instacart
Interesting AI Projects and Updates
Research: Levanter - Training Foundation Models with JAX
Stanford researchers' new codebase is built "for training reproducible, legible foundation models using JAX," with "a number of checkpoints for models trained with Levanter on its HF page."
[GH] vllm: Memory-Efficient Transformer Optimization
A high-throughput and memory-efficient inference and serving engine for LLMs.
Research: Textbooks Are All You Need
This arXiv report introduces "phi-1," a new small-scale LLM with 1.3B parameters trained on "textbook-quality data" that shows surprisingly strong performance.
Wanda attempts to reduce LLM complexity by dropping less-significant network weights.
Research: Reducing Hallucinations With SequenceMatch
The report discusses a new method of using reinforcement learning when processing data sequences, which are often a root cause of hallucinations.
[GH] FlagAI - Training and Fine-Tuning Large Models
FlagAI (Fast LArge-scale General AI models) is an extensible toolkit for large-scale models.
This report analyzes modern AI models' ability to self-repair--a rare capability that only GPT-4 seems to possess at the moment.
[GH] LMFlow: Fine-Tuning Foundation Models
An extensible toolkit for fine-tuning and inference of large foundation models.
Did we miss something? Email thehumansintheloop@heavybit.com and let us know.