I am an incoming CS Ph.D. student at the University of Southern California (Fight On!), co-advised by Prof. Yue Wang and Prof. Daniel Seita. I am currently an AI Resident at the FPT Software AI Center, working closely with Prof. Anh Nguyen. Previously, I graduated with a bachelor's degree in Computer Science from Ho Chi Minh City University of Science. These days, my research interests lie at the intersection of Robotics, Multimodal Learning, and Generative Modeling.
2024-12: I accepted the CS Ph.D. offer from the University of Southern California for Fall 2025. I will work with Prof. Yue Wang and Prof. Daniel Seita! 🦾 🦿
2024-12: I attend ACCV 2024, hosted in my home country. My beloved Hanoi! 🇻🇳 ⛩️
2024-09: I attend ECCV 2024 in-person and give one oral and one poster presentation. Hello Milan! 🇮🇹 🤌
2024-07: One paper on language-driven 6-DoF grasp detection gets accepted to ECCV 2024 as an oral presentation!!!
2024-06: One paper on crowd navigation gets accepted to IROS 2024.
2024-01: Two papers on text-based affordance-pose learning and open-vocabulary affordance detection get accepted to ICRA 2024.
2023-09: One paper on language-driven scene synthesis gets accepted to NeurIPS 2023.
2023-09: I attend IROS 2023 in-person and give one oral and one poster presentation. First time abroad, Hello Michigan!!! 🇺🇸 🌆
2023-09: Our paper is nominated for best overall paper and best student paper awards at IROS 2023!!! This is a great honor!
2023-06: One paper on open-vocabulary affordance detection gets accepted to IROS 2023.
We introduce a novel diffusion model that incorporates negative prompt guidance learning to tackle the task of 6-DoF grasp detection in cluttered point clouds.
We introduce HabiCrowd, a new dataset and benchmark for crowd-aware visual navigation that surpasses other benchmarks in terms of human diversity and computational utilization.
We address the task of language-driven affordance-pose detection in 3D point clouds. Our method simultaneously detects open-vocabulary affordances and generates affordance-specific 6-DoF poses.
We introduce the Language-Driven Scene Synthesis task, which leverages human-input text prompts to generate physically plausible and semantically reasonable objects.