I am currently an AI Research Resident at FSoft AI Center, working closely with Prof. Anh Nguyen. My research lies at the intersection of Robotics, Multimodal Learning, and Generative Modeling.
Previously, I was an Undergraduate Research Assistant at AISIA Lab. I graduated with a bachelor's degree in Computer Science from Ho Chi Minh City University of Science.
In the long term, I envision and yearn for a world where robots assist us in every aspect of daily life.
As a football lover, I am especially excited about a future where robots can not only dexterously and effectively play sports like football with us, but also coach us to improve our skills.
DeepMind's recent research on table tennis has fueled my excitement even more.
In the short term, I believe that building world models is a critical step in vastly enriching the data needed for robot training, with generative models playing a key role.
The recent work by 1X has given me renewed hope for this future.
News
2024-12: I attend ACCV 2024, hosted in my home country. My beloved Hanoi! 🇻🇳 ⛩️
2024-09: I attend ECCV 2024 in-person and give one oral and one poster presentation. Hello Milan! 🇮🇹 🤌
2024-07: One paper on language-driven 6-DoF grasp detection gets accepted to ECCV 2024 as an Oral presentation!!!
2024-06: One paper on crowd navigation gets accepted to IROS 2024.
2024-01: Two papers on text-based affordance-pose learning and open-vocabulary affordance detection get accepted to ICRA 2024.
2023-09: One paper on language-driven scene synthesis gets accepted to NeurIPS 2023.
2023-09: I attend IROS 2023 in-person and give one oral and one poster presentation. First time abroad, Hello Michigan!!! 🇺🇸 🌆
2023-09: Our paper is nominated for best overall paper and best student paper awards at IROS 2023!!! This is a great honor!
2023-06: One paper on open-vocabulary affordance detection gets accepted to IROS 2023.
We introduce a novel diffusion model incorporating the new concept of negative prompt guidance learning to tackle language-driven 6-DoF grasp detection in cluttered point clouds.
We introduce HabiCrowd, a new dataset and benchmark for crowd-aware visual navigation that surpasses existing benchmarks in human diversity and computational utilization.
We address the task of language-driven affordance-pose detection in 3D point clouds. Our method simultaneously detects open-vocabulary affordances and generates affordance-specific 6-DoF poses.
We introduce the Language-Driven Scene Synthesis task, which leverages human-input text prompts to generate physically plausible and semantically reasonable objects.