💻 Selected Research Papers
My full paper list is shown at my personal homepage.
🎙 Multi-modal(view) Learning
ICME 2023
Deep Metric Multi-View Hashing for Multimedia Retrieval, Jian Zhu, Xiaohu Ruan, Yongli Cheng, Zhangmin Huang, Yu Cui, Lingfang Zeng
NeurIPS 2019
FastSpeech: Fast, Robust and Controllable Text to Speech, Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu
NeurIPS 2021
PortaSpeech: Portable and High-Quality Generative Text-to-Speech, Yi Ren, Jinglin Liu, Zhou Zhao
AAAI 2022
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism, Jinglin Liu, Chengxi Li, Yi Ren, Feiyang Chen, Zhou Zhao
ICML 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models, Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, Zhou Zhao
ACL 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training, Zhenhui Ye, Rongjie Huang, Yi Ren, Ziyue Jiang, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao
ACL 2023
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models, Ziyue Jiang, Qian Yang, Jialong Zuo, Zhenhui Ye, Rongjie Huang, Yi Ren, Zhou Zhao
ACL 2023
Revisiting and Incorporating GAN and Diffusion Models in High-Fidelity Speech Synthesis, Rongjie Huang, Yi Ren, Ziyue Jiang, Chenye Cui, Jinglin Liu, Zhou Zhao
ACL 2023
Improving Prosody with Masked Autoencoder and Conditional Diffusion Model For Expressive Text-to-Speech, Rongjie Huang, Chunlei Zhang, Yi Ren, Zhou Zhao, Dong Yu
ICLR 2023
Bag of Tricks for Unsupervised Text-to-Speech, Yi Ren, Chen Zhang, Shuicheng Yan
ACL 2022
Learning the Beauty in Songs: Neural Singing Voice Beautifier, Jinglin Liu, Chengxi Li, Yi Ren, Zhiying Zhu, Zhou Zhao
NeurIPS 2022
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech, Ziyue Jiang, Zhe Su, Zhou Zhao, Qian Yang, Yi Ren, Jinglin Liu, Zhenhui Ye
NeurIPS 2022
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech, Rongjie Huang, Yi Ren, Jinglin Liu, Chenye Cui, Zhou Zhao
👄 Talking Face Generation
ICLR 2023
GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis, Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Jinzheng He, Zhou Zhao
AAAI 2022
Parallel and High-Fidelity Text-to-Lip Generation, Jinglin Liu, Zhiying Zhu, Yi Ren, Wencan Huang, Baoxing Huai, Nicholas Yuan, Zhou Zhao
ACM-MM 2020
FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire, Jinglin Liu, Yi Ren, Zhou Zhao, Chen Zhang, Baoxing Huai, Jing Yuan
📚 Machine Translation
ACL 2023
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation, Rongjie Huang, Huadai Liu, Xize Cheng, Yi Ren, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Liu, Xiang Yin, Zhou Zhao
ICLR 2023
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation, Rongjie Huang, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, Jinzheng He, Zhou Zhao
ACL 2020
SimulSpeech: End-to-End Simultaneous Speech to Text Translation, Yi Ren, Jinglin Liu, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu
ICLR 2019
Multilingual Neural Machine Translation with Knowledge Distillation, Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu
🎼 Music Generation
ACM-MM 2020
PopMAG: Pop Music Accompaniment Generation, Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu
🧑‍🎨 Generative Model
ICLR 2022
Pseudo Numerical Methods for Diffusion Models on Manifolds, Luping Liu, Yi Ren, Zhijie Lin, Zhou Zhao