and evolution trends of post-training algorithms for multimodal large models (e.g., RLHF, DPO, curriculum reinforcement learning...), with a deep understanding of multimodal large models and the reinforcement-learning post-training technology stack. Core...