Ivan Zhou

Joining Databricks Mosaic

I am thrilled to be joining the Databricks Mosaic team to work on applied AI research. This is a fantastic opportunity for me to keep pushing the boundaries of AI and to contribute to building a successful AI platform for enterprises. I am eager to collaborate with this amazing group of people at Databricks and build exciting things together!

Why am I excited about this opportunity?

The Team

I have been following the Mosaic team closely since MPT-7B. Shortly after that release, the field of training state-of-the-art LLMs became significantly more crowded. Mosaic stands out from its peers through its unwavering commitment to making LLM technologies more accessible, in terms of both cost and tooling, for enterprise adoption as well as the broader AI ecosystem. The team has consistently stayed at the forefront of development, creating popular tools that make the technology more usable. Examples of their work are numerous: llm-foundry, StreamingDataset, training MoE, DSPy, quantization, MixAttention, and more.

This team is deeply technical and excels at engineering execution. At the same time, through my limited interactions with the team so far, I have been captivated by their culture. I had great conversations with people like Xing Chen, Jonathan Frankle, Jasmine Collins, and others. I resonate with their mission and direction, and feel a strong connection with the team's overall vibe. I am truly excited about the prospect of working closely with them and building great things together.

Enterprise AI

Throughout my career, I have focused on Enterprise AI. At Landing AI, I helped manufacturing companies achieve human-level automation using the latest computer vision and data-centric AI techniques; at Uber's Applied AI, I built foundation models to automate document processing for Uber's global operations. I thrive on diving into problem domains within the enterprise context and developing AI techniques for problem-solving and value creation. Since its founding, Databricks has been dedicated to helping enterprises adopt the latest data and AI technologies to achieve success. It is undoubtedly the perfect place for me to continue helping enterprises harness the value of the latest AI developments.

The company is growing at an astronomical rate, with Mosaic AI being a crucial part of its business story and strategy. I see a strong, sustained incentive for the company to continue investing heavily in AI, staying at the cutting edge of technology, and, importantly, building practical tools for the vast number of customers on its Data Intelligence Platform.

What am I looking for from the opportunity?

I want to continue growing my technical depth, from applied AI research to building successful AI products.

These are the areas that I want to focus on:

Data-centric AI: Systematic approaches to improving training data quality to build high-performing AI systems. In the pre-LLM era, our team at Landing AI saw the critical importance of data quality in the AI applications we built with manufacturers, so we explored techniques like consensus labeling and mislabel detection to surface label inconsistencies and curate high-quality training data. Data quality matters just as much with Generative AI, if not more so, for tasks like fine-tuning and RAG. What’s different now is that LLMs themselves are powerful tools for data iteration cycles: helping automate label creation, synthesizing more examples, rejecting mislabels, and more. Data-centric AI remains an important domain with great potential to evolve.

Multimodal: This seems to be a significant theme of the year, with the introduction of GPT-4 (proprietary) and Llama vision models (open source). There has been rapid progress in technical development. However, from what I have seen, there remains a significant gap between the capabilities of multimodal LLMs and human-level performance. There is ample room for improvement in challenging multimodal tasks that require reasoning and high accuracy.

Enterprise AI adoption: I am excited to see the continuous breakthroughs in Generative AI technologies converted into transformative effects and tremendous value for enterprises. I am inspired to be part of this movement and to build new, useful products that truly benefit people's day-to-day work and life.


Lastly, going forward, I want to resume sharing what I learn and work on through blog posts and public talks. I will try to update my blog and reading notes more regularly.

Let’s keep building!