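Research Interests and Introduction
Computer vision (object detection and tracking, crowd counting, gaze estimation, garment editing), 3D point clouds (normal estimation, point cloud generation), digital humans, multimodal large models, image generation and multi-view synthesis
I have long worked on frontier artificial intelligence algorithms and applications, including deep learning, computer vision, and multimodal large models. Guided by the principle of "combining theory with application," I have proposed a series of high-performance visual learning models that significantly improve robustness and transferability across a range of visual application scenarios. To date, I have produced research results in several visual tasks, including image classification, object detection, UAV video compression and perception, object counting, image generation and editing, AI image tampering localization, and gaze estimation. This work has been published in high-level international journals and conferences such as TNNLS, TMM, Pattern Recognition, ACM MM, ICME, PRCV, WACV, CGF, CAD, ICMR, and ICASSP. Several projects have been deployed in real-world settings and have attracted wide attention: for example, large-model research has been deployed at the Social Governance Center of Xiaoshan District, Hangzhou, and object detection research has been deployed at Zhejiang Telecom. The team has also established broad collaborations with well-known universities and companies in China, including The Chinese University of Hong Kong, Zhejiang University, Nanjing University, ByteDance, and China Mobile. Students with a solid foundation in mathematics and computer science who are interested in artificial intelligence and enthusiastic about academic research or technology development are welcome to apply.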
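Education
09/2021-10/2022 University of Guelph, Canada, Computer Science, Ph.D.
11/2020-05/2021 Visiting scholar, Deep Learning Lab, Westlake University
09/2017-04/2021 Memorial University of Newfoundland, Canada, Computer Science, Ph.D.
09/2014-01/2017 Tianjin University, Software Engineering, M.S.
09/2010-06/2014 Zhejiang Sci-Tech University, Information and Computing Science, B.S.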
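Research Projects and Publications
Google Scholar: https://scholar.google.com/citations?user=MfsLOqcAAAAJ&hl=zh-CN
Research projects:
[1] Principal investigator of two government-funded projects, one from the National Natural Science Foundation of China and one from the provincial natural science foundation;
[2] Principal investigator of one sub-project of a Zhejiang University JKW innovation team project;
[3] Principal investigator of one industry-funded project.
Publications: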
[1] Yuan, S., Shen, Y., Yi, Z., Zhou, J., Gong, M., & Wang, M.* (2026). PatternDiff: A New Benchmark for Tailored Garment Design with a Pattern-centric Multimodal Diffusion Model. IEEE Transactions on Multimedia (TMM). (SCI Zone 1, Top journal; student first author; major revision)
[2] Wang, M., Li, Z., Dai, Y., Eric, B., & Gong, M. (2026). VLCounting: Taming Zero-shot Counting via Language-driven Exemplar Grounding. Pattern Recognition (PR). (SCI Zone 1, Top journal; minor revision)
[3] Wang, M.*, Yuan, S., Han, X., & Yi, Z. (2025). Draw What You Hear: High-fidelity Image Generation and Manipulation via SoundAdapter. IEEE Transactions on Neural Networks and Learning Systems (TNNLS). (SCI Zone 1, Top journal)
[4] Wang, M., Li, Y., Zhou, J., Taylor, G. W., & Gong, M. (2024). GCNet: Probing Self-similarity Learning for Generalized Counting Network. Pattern Recognition (PR). (SCI Zone 1, Top journal)
[5] Wang, M., Cai, H., Han, X. F., Zhou, J., & Gong, M. (2023). STNet: Scale Tree Network with Multi-level Auxiliator for Crowd Counting. IEEE Transactions on Multimedia (TMM). (SCI Zone 1, Top journal)
[6] Wang, M., Zhou, J., Cai, H., & Gong, M. (2023). CrowdMLP: Weakly-supervised Crowd Counting via Multi-granularity MLP. Pattern Recognition (PR). (SCI Zone 1, Top journal)
[7] Wang, M., Cai, H., Zhou, J., & Gong, M. (2021). Interlayer and Intralayer Scale Aggregation for Scale-invariant Crowd Counting. Neurocomputing. (SCI Zone 2, Top journal)
[8] Wang, M., Cai, H., Dai, Y., & Gong, M. (2023). Dynamic Mixture of Counter Network for Location-Agnostic Crowd Counting. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). (CORE A conference)
[9] Wang, M., Cai, H., Zhou, J., & Gong, M. (2020). Stochastic Multi-scale Aggregation Network for Crowd Counting. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (Oral; CCF B conference)
[10] Wang, M., Cai, H., Huang, X., & Gong, M. (2020). ADNet: Adaptively Dense Convolutional Neural Networks. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). (CORE A conference)
[11] Wang, M., Zhou, J., Mao, W., & Gong, M. (2019). Multi-scale Convolution Aggregation and Stochastic Feature Reuse for DenseNets. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). (CORE A conference)
[12] Wang, M., Yuan, S., Li, Z., Zhu, L., Buys, E., & Gong, M. (2024). Language-Guided Zero-Shot Object Counting. In Proceedings of the IEEE International Conference on Multimedia and Expo Workshops (ICMEW).
[13] Li, Y., Wang, M., Gong, M., Lu, Y., & Liu, L. (2024). FER-former: Multi-modal Transformer for Facial Expression Recognition. IEEE Transactions on Multimedia (TMM). (SCI Zone 1, Top journal)
[14] Cai, H., Wang, M.*, Mao, W., & Gong, M. (2020). No-reference Image Sharpness Assessment based on Discrepancy Measures of Structural Degradation. Journal of Visual Communication and Image Representation (JVCIR). (SCI journal)
[15] Hou, J., Lu, Y., Wang, M., Ouyang, W., Yang, Y., Zou, F., ... & Liu, Z. (2024). A Markov Chain Approach for Video-based Virtual Try-on with Denoising Diffusion Generative Adversarial Network. Knowledge-Based Systems (KBS). (SCI Zone 1, Top journal)
[16] Huang, X., Wang, M., & Gong, M. (2019). Hierarchically-fused Generative Adversarial Network for Text to Realistic Image Synthesis. In Proceedings of the Conference on Computer and Robot Vision (CRV). (Best Paper Award)
[17] Xu, J., Tang, B., Wang, M., Li, M., & Ma, M. (2023). CPNet: Exploiting CLIP-based Attention Condenser and Probability Map Guidance for High-fidelity Talking Face Generation. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME). (CCF B conference)
[18] Xie, T., Liao, L., Bi, C., Tang, B., Yin, X., Yang, J., Wang, M., ... & Ma, Z. (2021). Towards Realistic Visual Dubbing with Heterogeneous Sources. In Proceedings of the ACM International Conference on Multimedia (ACM MM). (CCF A conference)
[19] Zhou, J., Jin, W., Wang, M.*, Liu, X., Li, Z., & Liu, Z. (2023). Improvement of Normal Estimation for Point Clouds via Simplifying Surface Fitting. Computer-Aided Design (CAD). (SCI journal)
[20] Zhou, J., Jin, W., Wang, M.*, Liu, X., Li, Z., & Liu, Z. (2022). Fast and Accurate Normal Estimation for Point Cloud via Patch Stitching. Computer-Aided Design (CAD). (SCI journal)
[21] Mao, W., Wang, M., Huang, H., & Gong, M. (2022). A Robust Framework for Multi-view Stereopsis. The Visual Computer (TVC). (SCI journal)
[22] Zhou, J., Wang, M., Mao, W., Gong, M., & Liu, X. (2020). SiamesePointNet: A Siamese Point Network Architecture for Learning 3D Shape Descriptor. Computer Graphics Forum (CGF). (SCI journal)