I gave a presentation at the Ray Summit on my work building Multimodal Foundation Models for Document Automation at Uber. It is always a great pleasure to publicly share what I have been building over the past year!
Computer Vision
Things I Learned at Landing AI
Over the past four and a half years at Landing AI, I have had the incredible opportunity to work with Andrew Ng, Dillon Laird, and other amazing people to build AI applications across various industries. Each project brought its own unique challenges, pushing me to dive deeper into the ever-evolving world of AI. As I look back on this enriching journey, I am grateful and humbled to share the lessons I've learned, in the hope of inspiring others in the field.
Fast and Simple Image Search with Foundation Models
In this blog post, I will walk you through how to build a fast and simple image search tool. I developed an image search application that uses multimodal foundation models to return highly accurate and relevant results. By following this blog post and our code base, you can easily build one yourself!
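The core idea behind this kind of search can be sketched as embedding-based retrieval: images and text queries are encoded into a shared vector space, and images are ranked by cosine similarity to the query. This is a minimal sketch under stated assumptions, not the code base from the post: the toy 3-d vectors stand in for embeddings from a multimodal encoder (for example a CLIP-style model), and the helper names `build_index` and `search` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_index(image_embeddings: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Hypothetical helper: normalize every image embedding once, at indexing time."""
    return {name: normalize(emb) for name, emb in image_embeddings.items()}


def search(
    index: Dict[str, List[float]], query_embedding: Sequence[float], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the top_k images ranked by cosine similarity."""
    q = normalize(query_embedding)
    # Brute-force scan: score every indexed image against the query.
    scores = [(name, sum(a * b for a, b in zip(q, emb))) for name, emb in index.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    # Toy embeddings (assumed); a real system would compute these with a
    # multimodal encoder that maps images and text into the same space.
    index = build_index({
        "beach.jpg": [0.9, 0.1, 0.0],
        "forest.jpg": [0.1, 0.9, 0.1],
        "city.jpg": [0.0, 0.2, 0.9],
    })
    # Pretend this is the text embedding of a query such as "sunny beach".
    query = [0.8, 0.2, 0.1]
    for name, score in search(index, query, top_k=2):
        print(f"{name}: {score:.3f}")
```

Normalizing embeddings once at indexing time keeps each query to a single dot product per image. The linear scan is fine for small collections; for large ones, an approximate nearest-neighbor index is the usual choice.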
Paper Explained - LAION-5B
In this blog post, I cover one of the award-winning papers at NeurIPS 2022. The paper presents LAION-5B, a dataset of 5.9 billion image-text pairs, which further pushes the scale of open datasets for training and studying state-of-the-art language-vision models. Training at this scale yields strong gains in zero-shot transfer and robustness.
Tech Talk on Photography
I gave this tech talk at Landing AI this week and would like to share it on my website. It covers the core concepts of photography and touches briefly on the recent trend of computational photography.
I removed a section on the imaging solution design for one of our internal projects due to IP restrictions. I will share a blog post on that topic later.