Curriculum Learning: Theories, Approaches, Applications, Tools, and Future Directions in the Era of Large Language Models
Xin Wang, Yuwei Zhou, Hong Chen, Wenwu Zhu
Media and Network Lab, Tsinghua University

# Abstract

This tutorial focuses on curriculum learning (CL), an important topic in machine learning that has gained increasing attention in the research community. CL is a learning paradigm that enables machines to learn from easy data to hard data, imitating the meaningful procedure of human learning with curricula. As an easy-to-use plug-in, CL has demonstrated its power in improving the generalization capacity and convergence rate of various models in a wide range of scenarios such as computer vision, natural language processing, data mining, and reinforcement learning. Therefore, it is essential to introduce CL to more scholars and researchers in the machine learning community. However, there have been no tutorials on CL so far, motivating the organization of our tutorial on CL at WWW 2024. To give a comprehensive tutorial on CL, we plan to organize it around the following aspects: (1) theories, (2) approaches, (3) applications, (4) tools, and (5) future directions. First, we introduce the motivations, theories, and insights behind CL. Second, we advocate novel, high-quality approaches, as well as innovative solutions to the challenging problems in CL. Then we present the applications of CL in various scenarios, followed by relevant tools. In the end, we discuss open questions and future directions in the era of large language models. We believe this topic is at the core of the scope of WWW and is attractive to audiences interested in machine learning from both academia and industry.

Keywords: Curriculum Learning, Machine Learning Paradigm, Training Strategy, Machine Learning Library and Tool, Large Language Models

# Topic and Relevance

# Description

Curriculum learning is a machine learning strategy that trains a model from easy to hard, mimicking the way humans learn with curricula. It can guide and denoise the machine learning process, thereby accelerating model convergence and improving model generalization. Bengio et al. first gave its formal definition and proposed a simple method named Baby Step. Since then, various methods have continually emerged, including Self-Paced Learning, Transfer Teacher, Reinforcement Learning Teacher, and others. The key components of curriculum learning are a difficulty measurer that tells what is hard to learn and a training scheduler that decides when to introduce the harder part. Through this tutorial, we aim to depict a comprehensive development skeleton of curriculum learning.

# Scope

The scope of the tutorial includes theories, approaches, applications, tools, and future directions of curriculum learning. We will try our best to comprehensively cover all relevant aspects and advocate novel, high-quality research findings in CL.

# Importance

We believe this tutorial is important and necessary for the tutorial program at WWW 2024 for two reasons.

  1. CL is a research topic worth studying: it can help models generalize better and converge faster, and it is easy to use as a plug-in.
  2. This tutorial can ease the use of CL both theoretically and practically, helping the audience form potentially novel CL-based solutions for scenarios involving difficulty measurement and noisy data.

# Relevance to WWW

The Web Conference is a premier venue to present and discuss progress in research, development, standards, and applications of topics related to the Web. With the development of web technology, methods from machine learning and many other relevant fields have been applied to it. Curriculum learning, as an easy-to-use training strategy for machine learning, makes it possible to handle noisy multimedia data, and data collected from the Web is often more or less noisy. Therefore, CL can be an essential technique when training a machine learning model on tremendous amounts of Web data, making it a highly relevant topic for this conference.

# Eligibility

Research Achievements. As the presenters of this tutorial, both Wenwu Zhu and Xin Wang have been deeply involved in CL research with a considerable number of recent publications, including one survey paper on general curriculum learning in TPAMI (1), one survey paper on curriculum graph machine learning in IJCAI (2), and several technical papers in top-tier conferences covering multimodal learning (3), neural architecture search (4-5), meta-learning (6), video grounding (7-8), combinatorial optimization (9), and recommendation (10-12), as well as an open-source library (13).

  1. Xin Wang, Yudong Chen, and Wenwu Zhu. A survey on curriculum learning. In TPAMI, 2021.
  2. Haoyang Li, Xin Wang, and Wenwu Zhu. Curriculum machine learning on graphs: A survey. In IJCAI, 2023.
  3. Yuwei Zhou, Xin Wang, Hong Chen, Xuguang Duan, and Wenwu Zhu. Intra- and Inter-Modal Curriculum for Multimodal Learning. In ACM Multimedia, 2023.
  4. Yijian Qin, Xin Wang, Ziwei Zhang, Hong Chen, and Wenwu Zhu. Multi-task graph neural architecture search with task-aware collaboration and curriculum. In NeurIPS, 2023.
  5. Yuwei Zhou, Xin Wang, Hong Chen, Xuguang Duan, Chaoyu Guan, and Wenwu Zhu. Curriculum-nas: Curriculum weight-sharing neural architecture search. In ACM Multimedia, 2022.
  6. Yudong Chen, Xin Wang, Miao Fan, Jizhou Huang, Shengwen Yang, and Wenwu Zhu. Curriculum meta-learning for next poi recommendation. In ACM SIGKDD, 2021.
  7. Xiaohan Lan, Yitian Yuan, Hong Chen, Xin Wang, Zequn Jie, Lin Ma, Zhi Wang, and Wenwu Zhu. Curriculum multi-negative augmentation for debiased video grounding. In AAAI, 2023.
  8. Houlun Chen, Xin Wang, Xiaohan Lan, Hong Chen, Xuguang Duan, Jia Jia, and Wenwu Zhu. Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Grounding. In ACM Multimedia, 2023.
  9. Zeyang Zhang, Ziwei Zhang, Xin Wang, and Wenwu Zhu. Learning to solve travelling salesman problem with hardness-adaptive curriculum. In AAAI, 2022.
  10. Hong Chen, Yudong Chen, Xin Wang, Ruobing Xie, Rui Wang, Feng Xia, and Wenwu Zhu. Curriculum disentangled recommendation with noisy multi-feedback. In NeurIPS, 2021.
  11. Xin Wang, Zirui Pan, Yuwei Zhou, Hong Chen, Chendi Ge, and Wenwu Zhu. Curriculum co-disentangled representation learning across multiple environments for social recommendation. In ICML. PMLR, 2023.
  12. Zihao Wu, Xin Wang, Hong Chen, Kaidong Li, Yi Han, Lifeng Sun, and Wenwu Zhu. Diff4Rec: Sequential Recommendation with Curriculum-scheduled Diffusion Augmentation. In ACM Multimedia Brave New Idea, 2023.
  13. Yuwei Zhou, Hong Chen, Zirui Pan, Chuanhao Yan, Fanqi Lin, Xin Wang, and Wenwu Zhu. CurML: A curriculum machine learning library. In ACM Multimedia, 2022.

Previous Tutorials. The presenters have given 10 tutorials on machine-learning-related topics over the past four years, with the following titles and venues.

  1. "Disentangled Representation Learning for Multimedia" in ACM Multimedia 2023.
  2. "Towards Out-of-Distribution Generalization on Graphs" in IJCAI 2023.
  3. "Towards Out-of-Distribution Generalization on Graphs" in ACM Web Conference 2023.
  4. "Video Grounding and Its Generalization" in ACM Multimedia 2022.
  5. "Disentangled Representation Learning: Approaches and Applications" in IJCAI 2022.
  6. "Out-of-Distribution Generalization and Its Applications for Multimedia" in ACM Multimedia 2021.
  7. "Automated Machine Learning on Graph" in ACM SIGKDD 2021.
  8. "Meta-learning and AutoML: Approaches and Applications" in IJCAI 2020.
  9. "Multimedia Intelligence: When Multimedia Meets Artificial Intelligence" in ACM Multimedia 2020.
  10. "Meta-learning and Automated Machine Learning for Multimedia" in ACM Multimedia 2019.

Teaching. Xin Wang and Wenwu Zhu are faculty members at the Department of Computer Science and Technology, Tsinghua University. They have extensive experience in teaching both undergraduate- and graduate-level courses.

# Style

It is a lecture-style half-day tutorial (lasting 3 hours).

# Content

# Introduction

Curriculum learning (CL) has continuously gained attention since it was first proposed by Bengio et al. (Bengio et al. 2009). CL borrows the idea of human learning curricula that proceed from easy content to hard content, forming a general training strategy for various machine learning models and applications (Wang, Chen, and Zhu 2021). Given that CL can help models generalize better and converge faster (Bengio et al. 2009; Gong et al. 2016; Weinshall, Cohen, and Amir 2018), researchers have proposed numerous CL algorithms and shown their effectiveness in a wide range of tasks. Therefore, we believe it is necessary to introduce CL to more researchers in the machine learning community and provide an overall picture of CL, including comprehensible and elaborate answers to the following questions: (1) Theories: what are the definitions of CL and why are they effective? (2) Approaches: what methods should be included? (3) Applications: how do we design different curricula for different scenarios? (4) Tools: what tools are available for ease of use and understanding of CL?

Figure 1: Illustration of curriculum learning concept (Wang, Chen, and Zhu 2021).

# Theories

A curriculum is a sequence of training criteria over $T$ steps: $\mathcal{C} = \langle Q_1, \ldots, Q_t, \ldots, Q_T \rangle$. Each criterion $Q_t$ is a reweighting of the target distribution $P(z)$: $Q_t(z) \propto W_t(z)\,P(z),\ \forall z \in D$, where $z$ is an example and $D$ is the training set, and the following three conditions are satisfied (a toy instance follows the list):

  • The entropy of the distributions increases, i.e., $H(Q_t) < H(Q_{t+1})$.
  • The weights increase, i.e., $W_t(z) \le W_{t+1}(z),\ \forall z \in D$.
  • $Q_T(z) = P(z)$.
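
To make these three conditions concrete, here is a minimal toy instance (our own illustration, not drawn from the original paper): a two-example training set with a uniform target distribution, trained in two steps.

```latex
% Toy curriculum: D = {z_e (easy), z_h (hard)}, uniform target P = (1/2, 1/2).
\begin{align*}
  t = 1:\quad & W_1 = (1, 0) \;\Rightarrow\; Q_1 = (1, 0),
    && H(Q_1) = 0,\\
  t = 2:\quad & W_2 = (1, 1) \;\Rightarrow\; Q_2 = (\tfrac{1}{2}, \tfrac{1}{2}) = P,
    && H(Q_2) = \log 2.
\end{align*}
% All three conditions hold: H(Q_1) < H(Q_2); the weight of each example
% never decreases; and the final criterion Q_T equals the target P.
```

In words, the first step trains only on the easy example, and the second step recovers the full training distribution.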

Since the concept of Original Curriculum Learning was formally proposed as above, the academic community has followed and further extended the definition of CL in the spirit of "training from easier data (tasks) to harder data (tasks)", i.e., relaxing the conditions in the definition to enable more flexible CL strategies. For example, Data-level Generalized Curriculum Learning is defined as a sequence of reweightings of the target training distribution over $T$ training steps, discarding the three conditions. Generalized Curriculum Learning is defined as a sequence of training criteria over $T$ training steps, discarding both the three conditions and the specific form of $Q_t$.

Another important theoretical question is why this human-curriculum-like training strategy works at all. Broadly, existing analyses uncover the essence of CL from the perspectives of optimization and data distribution, based on which we can summarize the two main motivations for applying CL: to guide and to denoise.

To begin with, from the optimization perspective, Bengio et al. (Bengio et al. 2009) initially point out that CL can be seen as a particular continuation method (Allgower and Georg 2012), which, in the same spirit as simulated annealing, provides a sequence of optimization objectives that starts from a heavily smoothed objective and gradually sharpens toward the original one throughout training. In this way, CL guides the training towards better regions in parameter space.
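
A toy 1-D demonstration of this continuation view is sketched below (our own illustration; the objective, the Gaussian smoothing scheme, and the grid-based local search are all illustrative choices): each progressively less-smoothed problem is warm-started from the previous solution.

```python
import numpy as np

def f(x):
    """A nonconvex toy objective with many shallow local minima."""
    return np.sin(5 * x) + 0.5 * x ** 2

def smoothed(x, sigma, n=512):
    """Gaussian-smoothed objective E[f(x + e)], e ~ N(0, sigma^2), by sampling."""
    eps = np.random.default_rng(0).normal(0.0, sigma, n)
    return float(np.mean(f(x + eps)))

# Continuation: solve a sequence of progressively less-smoothed problems,
# warm-starting each one from the previous solution.
x = 2.0  # a deliberately bad starting point, far from the global minimum
for sigma in [2.0, 1.0, 0.5, 0.1, 0.0]:
    grid = np.linspace(x - 1.0, x + 1.0, 401)  # local search around x
    vals = [smoothed(g, sigma) if sigma > 0 else f(g) for g in grid]
    x = float(grid[int(np.argmin(vals))])
print("continuation solution:", x)  # ends up near the global minimum of f
```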

On the other hand, researchers also analyze the CL mechanism from the perspective of data distribution. In the CL setting, noisy data tends to correspond to the harder examples in a dataset, while cleaner data forms the easier part. Since the CL strategy encourages training more on the easier data, an intuitive hypothesis is that the CL learner wastes less time on the harder, noisy examples, achieving faster training while reducing the negative impact of low-confidence noisy examples, thus denoising the training process.
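
As a concrete illustration of this denoising effect, the sketch below shows a minimal training step in the spirit of Self-Paced Learning (Kumar, Packer, and Koller 2010), assuming a generic PyTorch setup; the binary thresholding and the names are illustrative simplifications, not the canonical formulation.

```python
import torch

def self_paced_weights(losses: torch.Tensor, lam: float) -> torch.Tensor:
    """Binary self-paced weights: keep only examples whose loss is below lam.

    Noisy or hard examples tend to incur large losses early in training,
    so they receive zero weight until lam has grown enough to admit them.
    """
    return (losses < lam).float()

def spl_step(model, batch, criterion, optimizer, lam):
    """One training step with self-paced reweighting (illustrative sketch).

    criterion must be constructed with reduction='none' so that we obtain
    one loss value per example rather than a batch average.
    """
    inputs, targets = batch
    per_example_loss = criterion(model(inputs), targets)
    weights = self_paced_weights(per_example_loss.detach(), lam)
    loss = (weights * per_example_loss).sum() / weights.sum().clamp(min=1.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full self-paced learner, lam is also annealed upward (e.g., multiplied by a growth factor each epoch), so the effective training set gradually expands from the clean, easy examples to the whole dataset.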

Figure 2: Theoretical analysis on curriculum learning (Wang, Chen, and Zhu 2021).

# Approaches

A general framework for curriculum design consists of two core components, a Difficulty Measurer and a Training Scheduler, which respectively decide two things: 1) what kind of training data should be considered easier than other data, and 2) when to present harder data for training, and how much more of it.

According to these two components, we can divide existing CL methods into two types: when both the Difficulty Measurer and the Training Scheduler are designed from human prior knowledge with no data-driven algorithms involved, we call the method predefined CL (Spitkovsky, Alshawi, and Jurafsky 2010). If either (or both) of the two components is learned by data-driven models or algorithms, we denote the method as automatic CL (Kumar, Packer, and Koller 2010; Weinshall, Cohen, and Amir 2018; Matiisen et al. 2019; Shu et al. 2019; Saxena, Tuzel, and DeCoste 2019; Wang et al. 2020; Zhou, Wang, and Bilmes 2020; Castells, Weinzaepfel, and Revaud 2020; Sinha, Garg, and Larochelle 2020; Kong et al. 2021). Representative approaches in both categories will be reviewed and discussed in this tutorial, with a minimal predefined-CL sketch given below.
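
To ground the two components, here is a minimal sketch of a predefined CL loop (our own illustration: sentence length as the Difficulty Measurer and a linear pacing function as the Training Scheduler are common but by no means the only choices).

```python
import numpy as np

def difficulty_measurer(example) -> float:
    """Predefined Difficulty Measurer: shorter sentences are assumed easier."""
    return len(example["tokens"])

def training_scheduler(epoch: int, total_epochs: int,
                       start_frac: float = 0.2) -> float:
    """Linear pacing function: the fraction of the sorted dataset that is
    available at a given epoch, growing from start_frac to 1.0."""
    frac = start_frac + (1.0 - start_frac) * epoch / max(total_epochs - 1, 1)
    return min(frac, 1.0)

def curriculum_subset(dataset, epoch, total_epochs):
    """Return the easiest examples unlocked at this epoch."""
    order = np.argsort([difficulty_measurer(ex) for ex in dataset])
    n_available = int(training_scheduler(epoch, total_epochs) * len(dataset))
    return [dataset[i] for i in order[:n_available]]

# Usage: at each epoch, train only on the currently unlocked easy subset.
# for epoch in range(total_epochs):
#     subset = curriculum_subset(train_data, epoch, total_epochs)
#     train_one_epoch(model, subset)   # train_one_epoch is a placeholder
```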

# Applications

CL has a wide range of applications. In this tutorial, we discuss them in terms of six aspects:

  • Combinatorial optimization problems: traveling salesman problem, secretary problem, etc.
  • Computer vision: image classification, object detection, semantic segmentation, face recognition, image generation and translation, video processing, etc.
  • Natural language processing: text classification, machine translation, question answering, etc.
  • Graph machine learning: node classification, graph classification, link prediction, etc.
  • Robotics: navigation, control, games, etc.
  • Other cutting-edge areas: meta-learning, continual learning, neural architecture search, etc.

# Tools

This tutorial will also introduce our contributed Curriculum Machine Learning library, CurML, the first public open-source library for CL. It implements a considerable number of existing CL algorithms within a unified and extensible framework that is easy to use and extend.

# Future Directions

Evaluation benchmarks. In the existing literature, the datasets and metrics differ across applications. It is necessary, but challenging, to design a unified dataset with unified metrics to evaluate and compare CL algorithms.

More advanced theories. Existing theoretical analyses provide different angles for understanding CL. Nevertheless, more theory is still needed to guarantee the effectiveness of CL and to guide its application to specific tasks.

Applications to LLMs. Previous research on CL has mainly focused on smaller models. With the rise of LLMs, a pressing question emerges: how can we effectively combine CL with LLMs? Specifically, can CL help with tasks such as pre-training, fine-tuning, and prompting in LLMs, ultimately speeding up learning, improving generalization, and making the models more adaptable to new tasks or domains? We will explore some examples and analyze how CL can be applied to LLMs, e.g., by staging fine-tuning data from easy to hard as sketched below.
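
As one simple instance of this direction, a curriculum can be imposed on an instruction-tuning dataset by splitting it into easy-to-hard stages according to a difficulty proxy. The sketch below is a hypothetical setup of ours, not an established recipe: it uses target-response length as a crude difficulty proxy, assumes each example is a dict with a "response" string, and fine_tune stands in for any standard fine-tuning routine.

```python
def llm_curriculum_stages(examples, n_stages=3, proxy=None):
    """Split an instruction-tuning set into easy-to-hard stages.

    proxy: a per-example difficulty estimate; defaults to the length of
    the target response, a crude but common heuristic.
    """
    proxy = proxy or (lambda ex: len(ex["response"].split()))
    ordered = sorted(examples, key=proxy)
    stage_size = (len(ordered) + n_stages - 1) // n_stages
    return [ordered[i * stage_size:(i + 1) * stage_size]
            for i in range(n_stages)]

# Baby-Step-style fine-tuning: accumulate stages so the model keeps
# revisiting easy data while harder data is gradually introduced.
# seen = []
# for stage in llm_curriculum_stages(sft_data):
#     seen.extend(stage)
#     fine_tune(model, seen)   # fine_tune is a placeholder training routine
```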

# Q&A

This tutorial includes 15 minutes for questions and answers. We welcome any questions about CL from the audience.

# Audience and Background

Target Audience. This tutorial will be highly accessible to the whole machine learning community, including researchers, scholars, engineers and students with related backgrounds in computer vision, natural language processing, graph machine learning, reinforcement learning, meta-learning, etc. The expected number of attendees will be around 100 for this tutorial.

Prerequisite Knowledge. This tutorial is self-contained and designed for introductory and intermediate audiences. No special prerequisite knowledge is required to attend.

Potential Learning Outcomes.

  1. To promote the importance of CL in advancing machine learning research, as well as reduce the marginal cost of studying CL.
  2. To encourage the audience to combine their research with CL, which can be a possibly promising solution for problems involving difficulty-measurement and noisy data.
  3. To push forward the development of CL research by pointing out future directions.

To summarize, CL is a promising research area that will have a positive impact on machine learning, so we believe this tutorial can greatly benefit audience members interested in machine learning and inspire them to produce exciting research results on this topic.

# Tutorial Materials

This tutorial provides detailed materials from a particularly relevant IEEE TPAMI survey by the organizers (https://arxiv.org/abs/2010.13166) and an open-source code library by the organizers for hands-on coding demonstrations (https://github.com/THUMNLab/CurML). There are no copyright issues.

# CVs of presenters

Wenwu Zhu (https://scholar.google.com/citations?user=7t2jzpgAAAAJ) is currently a Professor at the Department of Computer Science and Technology, Tsinghua University, and Vice Dean of the National Research Center for Information Science and Technology. His email is wwzhu@tsinghua.edu.cn. Prior to his current post, he was a Senior Researcher and Research Manager at Microsoft Research Asia. He was the Chief Scientist and Director at Intel Research China from 2004 to 2008, and worked at Bell Labs New Jersey as a Member of Technical Staff from 1996 to 1999. He received his Ph.D. degree from New York University in 1996. His research interests include graph machine learning, curriculum learning, data-driven multimedia, and big data. He has published over 400 refereed papers and is the inventor of over 80 patents. He has received ten Best Paper Awards, including ACM Multimedia 2012 and IEEE Transactions on Circuits and Systems for Video Technology in 2001 and 2019. He served as Editor-in-Chief of IEEE Transactions on Multimedia (2017-2019), Chair of the steering committee for IEEE Transactions on Multimedia (2020-2022), member of the steering committee for IEEE Transactions on Mobile Computing (2007-2010), and General Co-Chair for ACM Multimedia 2018 and ACM CIKM 2019. He is an AAAS Fellow, IEEE Fellow, ACM Fellow, SPIE Fellow, and a member of Academia Europaea.

Xin Wang (http://mn.cs.tsinghua.edu.cn/xinwang/) is currently an Assistant Professor at the Department of Computer Science and Technology, Tsinghua University. His email is xin_wang@tsinghua.edu.cn. He received both his Ph.D. and B.E. degrees in Computer Science and Technology from Zhejiang University, China, and also holds a Ph.D. degree in Computing Science from Simon Fraser University, Canada. His research interests include curriculum learning, multimedia intelligence, and recommendation in social media. He has published over 100 high-quality research papers in top conferences and journals including IEEE TPAMI, IEEE TKDE, ICML, NeurIPS, ACM KDD, WWW, ACM SIGIR, ACM Multimedia, etc. He is the recipient of the 2020 ACM China Rising Star Award and the 2022 IEEE TCMC Rising Star Award.

Hong Chen is currently a Ph.D. student at the Department of Computer Science and Technology, Tsinghua University. His email is h-chen20@mails.tsinghua.edu.cn. He received his B.E. degree from the Department of Electronic Engineering, Tsinghua University. His main research interests include curriculum learning and auxiliary learning.

Yuwei Zhou is currently a Ph.D. student at the Department of Computer Science and Technology, Tsinghua University. His email is zhou-yw21@mails.tsinghua.edu.cn. He received his B.E. degree from the Department of Computer Science and Technology, Tsinghua University. His main research interests include curriculum learning and multimodal learning.