Toan Nguyen

/twan ŋwɪn/

I am currently an AI Research Resident at FSoft AI Center, working closely with Prof. Anh Nguyen. My research lies at the intersection of Robotics, Multimodal Learning, and Generative Modeling. Previously, I was an Undergraduate Research Assistant at AISIA Lab. I graduated with a bachelor's degree in Computer Science from Ho Chi Minh City University of Science.

Email  /  CV  /  Google Scholar  /  Github  /  Twitter  /  LinkedIn

Thoughts on Intelligent Robots

In the long term, I envision and yearn for a world where robots assist us in every aspect of daily life. As a football lover, I am especially excited about a future where robots can not only play sports like football with us dexterously and effectively, but also coach us to improve our skills. Recent research by DeepMind on table tennis has fueled my excitement even more.

In the short term, I believe that building world models is a critical step in vastly enriching the data needed for robot training, with generative models playing a key role. The recent work by 1X has given me so much renewed hope for this future.

News

  • 2024-12: I attend ACCV 2024, hosted in my home country. My beloved Hanoi! 🇻🇳 ⛩️
  • 2024-09: I attend ECCV 2024 in person and give one oral and one poster presentation. Hello Milan! 🇮🇹 🤌
  • 2024-07: One paper on language-driven 6-DoF grasp detection gets accepted to ECCV 2024 as an Oral presentation!!!
  • 2024-06: One paper on crowd navigation gets accepted to IROS 2024.
  • 2024-01: Two papers on text-based affordance-pose learning and open-vocab affordance detection get accepted to ICRA 2024.
  • 2023-09: One paper on language-driven scene synthesis gets accepted to NeurIPS 2023.
  • 2023-09: I attend IROS 2023 in person and give one oral and one poster presentation. First time abroad. Hello Michigan!!! 🇺🇸 🌆
  • 2023-09: Our paper is nominated for best overall paper and best student paper awards at IROS 2023!!! This is a great honor!
  • 2023-06: One paper on open-vocabulary affordance detection gets accepted to IROS 2023.
Publications

    * indicates equal contribution

    Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance
    Toan Nguyen, Minh Nhat Vu, Baoru Huang, An Vuong, Quan Vuong, Dung Nguyen, Ngan Le, Thieu Vo, Anh Nguyen
    European Conference on Computer Vision (ECCV), 2024, Oral
    [arXiv] [Project] [Code]

    We introduce a novel diffusion model incorporating the new concept of negative prompt guidance learning to tackle the task of 6-DoF grasp detection in cluttered point clouds.

    HabiCrowd: A High Performance Simulator for Crowd-Aware Visual Navigation
    An Vuong, Toan Nguyen, Minh Nhat Vu, Baoru Huang, Dung Nguyen, Binh Huynh, Thieu Vo, Anh Nguyen
    IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024
    [arXiv] [Project] [Code]

    We introduce HabiCrowd, a new dataset and benchmark for crowd-aware visual navigation that surpasses other benchmarks in terms of human diversity and computational utilization.

    Language-Conditioned Affordance-Pose Detection in 3D Point Clouds
    Toan Nguyen, Minh Nhat Vu, Baoru Huang, Tuan Vo, Vy Truong, Ngan Le, Thieu Vo, Bac Le, Anh Nguyen
    IEEE International Conference on Robotics and Automation (ICRA), 2024
    [arXiv] [Project] [Code]

    We address the task of language-driven affordance-pose detection in 3D point clouds. Our method simultaneously detects open-vocabulary affordances and generates affordance-specific 6-DoF poses.

    Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation
    Tuan Vo, Minh Nhat Vu, Baoru Huang, Toan Nguyen, Ngan Le, Thieu Vo, Anh Nguyen
    IEEE International Conference on Robotics and Automation (ICRA), 2024
    [arXiv] [Code]

    We introduce a new open-vocabulary affordance detection method using knowledge distillation and text-point correlation.

    Language-Driven Scene Synthesis Using Multi-Conditional Diffusion Model
    An Vuong, Minh Nhat Vu, Toan Nguyen, Baoru Huang, Dung Nguyen, Thieu Vo, Anh Nguyen
    Conference on Neural Information Processing Systems (NeurIPS), 2023
    [arXiv] [Project] [Code]

    We introduce the Language-Driven Scene Synthesis task, which leverages human-input text prompts to generate physically plausible and semantically reasonable objects.

    Open-Vocabulary Affordance Detection in 3D Point Clouds
    Toan Nguyen, Minh Nhat Vu, An Vuong, Dung Nguyen, Thieu Vo, Ngan Le, Anh Nguyen
    IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023, Best Overall & Best Student Paper Awards Finalist
    [arXiv] [Project] [Code]

    Our method detects potentially unlimited textual affordance labels.

    Boosting Insect Pest Recognition with Deep-Wide Learning
    Toan Nguyen, Huy Nguyen, Huy Ung, Hieu Ung, Binh Nguyen
    Under Review
    [arXiv] [Code]

    DeWi obtains state-of-the-art performance on insect pest classification benchmarks by combining image data augmentation and representation learning.

Services

  • Conference reviewer: ICCV 2025, ECCV 2024, ICRA 2024, IROS 2024.
  • Journal reviewer: RA-L 2025.
