Meta-learning (computer science)

Meta-learning[1][2] is a subfield of machine learning where automatic learning algorithms are applied to metadata about machine learning experiments. As of 2017, the term had not found a standard interpretation; however, the main goal is to use such metadata to understand how automatic learning can become flexible in solving learning problems, and hence to improve the performance of existing learning algorithms or to learn (induce) the learning algorithm itself, which gives rise to the alternative term learning to learn.[1]

Flexibility is important because each learning algorithm is based on a set of assumptions about the data, known as its inductive bias.[3] This means that it will only learn well if the bias matches the learning problem. A learning algorithm may perform very well in one domain, but not in the next. This places strong restrictions on the use of machine learning or data mining techniques, since the relationship between the learning problem (often some kind of database) and the effectiveness of different learning algorithms is not yet understood.

By using different kinds of metadata, such as properties of the learning problem, algorithm properties (such as performance measures), or patterns previously derived from the data, it is possible to learn, select, alter or combine different learning algorithms to effectively solve a given learning problem. Critiques of meta-learning approaches bear a strong resemblance to the critique of metaheuristics, a possibly related problem. A good analogy to meta-learning, and the inspiration for Jürgen Schmidhuber's early work (1987)[1] and Yoshua Bengio et al.'s work (1991),[4] considers that genetic evolution learns the learning procedure encoded in genes and executed in each individual's brain. In an open-ended hierarchical meta-learning system[1] using genetic programming, better evolutionary methods can be learned by meta evolution, which itself can be improved by meta meta evolution, etc.[1]

Definition


A proposed definition[5] for a meta-learning system combines three requirements:

Bias refers to the assumptions that influence the choice of explanatory hypotheses[6] and not the notion of bias represented in the bias-variance dilemma. Meta-learning is concerned with two aspects of learning bias.

Common approaches


There are three common approaches:[8]

  1. using (cyclic) networks with external or internal memory (model-based)
  2. learning effective distance metrics (metric-based)
  3. explicitly optimizing model parameters for fast learning (optimization-based).

Model-Based


Model-based meta-learning models update their parameters rapidly within a few training steps, which can be achieved by their internal architecture or controlled by another meta-learner model.[8]

Memory-Augmented Neural Networks


A Memory-Augmented Neural Network, or MANN for short, is claimed to be able to encode new information quickly and thus to adapt to new tasks after only a few examples.[9]

Meta Networks


Meta Networks (MetaNet) learns meta-level knowledge across tasks and shifts its inductive biases via fast parameterization for rapid generalization.[10]

Metric-Based


The core idea in metric-based meta-learning is similar to that of nearest-neighbor algorithms, in which the weight given to each labelled example is generated by a kernel function. It aims to learn a metric or distance function over objects. The notion of a good metric is problem-dependent: it should represent the relationship between inputs in the task space and facilitate problem solving.[8]
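
As a minimal sketch of the kernel-weighted idea, the following Python code predicts a label as a weighted vote over a small labelled support set, with weights given by a softmax over negative squared Euclidean distances. The function names, the choice of kernel and the toy data are illustrative assumptions rather than details from the cited source; in metric-based meta-learning the distances would be computed on learned embeddings rather than on raw inputs.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kernel_weights(query, support_inputs):
    """Softmax over negative squared distances: closer support points weigh more."""
    scores = [-squared_distance(query, s) for s in support_inputs]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(query, support_inputs, support_labels):
    """Return per-label probabilities as a kernel-weighted vote over the support set."""
    weights = kernel_weights(query, support_inputs)
    probs = {}
    for w, label in zip(weights, support_labels):
        probs[label] = probs.get(label, 0.0) + w
    return probs

# Toy support set (assumed data): two examples per class.
support_inputs = [[0.0, 0.0], [0.1, 0.2], [3.0, 3.0], [3.2, 2.9]]
support_labels = ["a", "a", "b", "b"]
probs = predict([0.2, 0.1], support_inputs, support_labels)
predicted = max(probs, key=probs.get)
```

Here the query lies close to the two "a" examples, so nearly all of the weight falls on that label.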

Convolutional Siamese Neural Network


A Siamese neural network is composed of two twin networks whose outputs are jointly trained, with a function on top that learns the relationship between pairs of input samples. The twin networks are identical, sharing the same weights and network parameters.[11]
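
The weight sharing described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the single tanh layer, the weighted L1 distance fed into a sigmoid, and all parameter values are hypothetical and untrained, whereas the cited work uses convolutional twins trained on image pairs.

```python
import math

def embed(x, weights):
    """One shared tanh layer; both twins call this with the same weights."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def similarity(x1, x2, shared_weights, alpha, bias=0.0):
    """Sigmoid of a weighted L1 distance between the two twin embeddings."""
    h1 = embed(x1, shared_weights)
    h2 = embed(x2, shared_weights)
    score = sum(a * abs(u - v) for a, u, v in zip(alpha, h1, h2)) + bias
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical, untrained parameters chosen only to make the example concrete.
shared_weights = [[0.5, -0.3], [0.8, 0.1]]
alpha = [-4.0, -4.0]  # negative, so larger distances give lower similarity
bias = 2.0

same = similarity([1.0, 2.0], [1.0, 2.0], shared_weights, alpha, bias)
different = similarity([1.0, 2.0], [-2.0, 0.5], shared_weights, alpha, bias)
```

Because both inputs pass through the same `embed` function with the same weights, the score is symmetric in its two arguments, and identical inputs always receive the highest score available under these parameters.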

Matching Networks


Matching Networks learn a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types.[12]

Relation Network


The Relation Network (RN) is trained end-to-end from scratch. During meta-learning, it learns a deep distance metric to compare a small number of images within episodes, each of which is designed to simulate the few-shot setting.[13]

Prototypical Networks


Prototypical Networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class. Compared to other approaches for few-shot learning that were recent at the time, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and they achieve satisfactory results.[14]
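
The classification rule can be sketched as follows: each class prototype is the mean of the embedded support examples of that class, and a query is assigned probabilities by a softmax over negative squared Euclidean distances to the prototypes. In this sketch the embedding is an identity placeholder and the data are made up; both are assumptions for illustration, since in practice the embedding is a trained neural network.

```python
import math

def class_prototypes(support_inputs, support_labels, embed):
    """Average the embedded support examples of each class into one prototype."""
    grouped = {}
    for x, label in zip(support_inputs, support_labels):
        grouped.setdefault(label, []).append(embed(x))
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }

def classify(query, prototypes, embed):
    """Softmax over negative squared Euclidean distances to each prototype."""
    z = embed(query)
    logits = {
        label: -sum((a - b) ** 2 for a, b in zip(z, proto))
        for label, proto in prototypes.items()
    }
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def identity(x):
    """Placeholder embedding (assumption); a trained network would be used in practice."""
    return list(x)

support_inputs = [[1.0, 1.0], [1.0, 3.0], [5.0, 5.0], [7.0, 5.0]]
support_labels = ["cat", "cat", "dog", "dog"]
prototypes = class_prototypes(support_inputs, support_labels, identity)
probs = classify([2.0, 2.0], prototypes, identity)
```

With these toy inputs the prototypes are [1.0, 2.0] for "cat" and [6.0, 5.0] for "dog", so the query [2.0, 2.0] is assigned to "cat".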

Optimization-Based


Optimization-based meta-learning algorithms aim to adjust the optimization algorithm so that the model can learn well from only a few examples.[8]

LSTM Meta-Learner


An LSTM-based meta-learner learns the exact optimization algorithm used to train another learner neural network classifier in the few-shot regime. The parametrization allows it to learn appropriate parameter updates specifically for the scenario where a set number of updates will be made, while also learning a general initialization of the learner (classifier) network that allows for quick convergence of training.[15]
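
The parametrization can be made concrete by comparing a gradient-descent step with the cell-state update of an LSTM. The notation below is an assumption introduced here for illustration: \theta_t are the learner's parameters, \alpha_t is a learning rate, \mathcal{L}_t is the learner's loss at step t, and f_t, i_t, \tilde{c}_t are the forget gate, input gate and candidate cell state.

```latex
% Gradient descent on the learner's parameters
\theta_t = \theta_{t-1} - \alpha_t \nabla_{\theta_{t-1}} \mathcal{L}_t

% LSTM cell-state update
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t

% The two coincide under the identification
f_t = 1, \qquad c_{t-1} = \theta_{t-1}, \qquad i_t = \alpha_t, \qquad
\tilde{c}_t = -\nabla_{\theta_{t-1}} \mathcal{L}_t
```

Letting the gates f_t and i_t be learned functions rather than fixed constants is what allows the meta-learner to produce update rules other than plain gradient descent, and treating the initial cell state c_0 as a learned quantity corresponds to learning the initialization of the learner.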

MAML


Model-Agnostic Meta-Learning (MAML) is a fairly general optimization algorithm, compatible with any model that learns through gradient descent.[16]
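
A minimal sketch of the idea, assuming a one-parameter model y = w * x trained with mean squared error: each task adapts w with one inner gradient step on its support set, and the meta-update differentiates the query loss through that step. For this scalar model the second-order term can be written by hand, so no automatic differentiation is needed. The task family, learning rates and function names are hypothetical choices made for illustration.

```python
def loss_grad(w, xs, ys):
    """Gradient with respect to w of the mean squared error of the model y = w * x."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def loss_hessian(xs):
    """Second derivative of that loss with respect to w (it does not depend on w)."""
    return sum(2 * x * x for x in xs) / len(xs)

def maml_step(w, tasks, inner_lr, outer_lr):
    """One meta-update: adapt per task, then backpropagate the query loss through the adaptation."""
    meta_grad = 0.0
    for (support_x, support_y), (query_x, query_y) in tasks:
        # Inner loop: one gradient step on the task's support set.
        adapted = w - inner_lr * loss_grad(w, support_x, support_y)
        # Chain rule: derivative of the adapted parameter with respect to w.
        d_adapted_d_w = 1.0 - inner_lr * loss_hessian(support_x)
        meta_grad += loss_grad(adapted, query_x, query_y) * d_adapted_d_w
    return w - outer_lr * meta_grad / len(tasks)

def make_task(slope):
    """Hypothetical task: fit y = slope * x, split into support and query sets."""
    support_x, query_x = [1.0, 2.0], [0.5, 1.5]
    support = (support_x, [slope * x for x in support_x])
    query = (query_x, [slope * x for x in query_x])
    return support, query

tasks = [make_task(s) for s in (1.0, 2.0, 3.0)]
w = 0.0
for _ in range(100):
    w = maml_step(w, tasks, inner_lr=0.05, outer_lr=0.1)
```

In this symmetric toy problem the meta-learned initialization settles near the mean slope of 2.0, the starting point from which a single inner step does best on average across the three tasks.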

Reptile


Reptile is a remarkably simple meta-learning optimization algorithm. It is similar to MAML in that both rely on meta-optimization through gradient descent and both are model-agnostic.[17]
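
The Reptile update can be sketched as follows: run a few ordinary gradient steps on one task, then move the shared initialization a fraction of the way toward the adapted parameters. The one-parameter model y = w * x, the task slopes, the step counts and the learning rates below are assumptions chosen to keep the example small.

```python
def loss_grad(w, xs, ys):
    """Gradient with respect to w of the mean squared error of the model y = w * x."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def reptile_step(w, task, inner_lr, inner_steps, meta_lr):
    """Adapt to one task with plain gradient descent, then interpolate toward the result."""
    xs, ys = task
    adapted = w
    for _ in range(inner_steps):
        adapted -= inner_lr * loss_grad(adapted, xs, ys)
    # First-order meta-update: no gradients flow through the inner loop.
    return w + meta_lr * (adapted - w)

def make_task(slope):
    """Hypothetical task: fit y = slope * x on two points."""
    xs = [1.0, 2.0]
    return xs, [slope * x for x in xs]

tasks = [make_task(s) for s in (1.0, 2.0, 3.0)]
first = reptile_step(0.0, tasks[0], inner_lr=0.05, inner_steps=3, meta_lr=0.1)

w = 0.0
for _ in range(200):
    for task in tasks:  # cycle through the tasks in a fixed order
        w = reptile_step(w, task, inner_lr=0.05, inner_steps=3, meta_lr=0.1)
```

Unlike the MAML update, this one needs no second derivatives, and in this toy setting the initialization still ends up close to the mean slope of 2.0.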

Examples


Some approaches which have been viewed as instances of meta-learning:

References

  1. Schmidhuber, Jürgen (1987). "Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-... hook" (PDF). Diploma Thesis, Tech. Univ. Munich.
  2. Schaul, Tom; Schmidhuber, Jürgen (2010). "Metalearning". Scholarpedia. 5 (6): 4650. Bibcode:2010SchpJ...5.4650S. doi:10.4249/scholarpedia.4650.
  3. P. E. Utgoff (1986). "Shift of bias for inductive concept learning". In R. Michalski; J. Carbonell; T. Mitchell (eds.). Machine Learning: An Artificial Intelligence Approach. Morgan Kaufmann. pp. 163–190. ISBN 978-0-934613-00-2.
  4. Bengio, Yoshua; Bengio, Samy; Cloutier, Jocelyn (1991). Learning to learn a synaptic rule (PDF). IJCNN'91.
  5. Lemke, Christiane; Budka, Marcin; Gabrys, Bogdan (2013-07-20). "Metalearning: a survey of trends and technologies". Artificial Intelligence Review. 44 (1): 117–130. doi:10.1007/s10462-013-9406-y. ISSN 0269-2821. PMC 4459543. PMID 26069389.
  6. Brazdil, Pavel; Carrier, Christophe Giraud; Soares, Carlos; Vilalta, Ricardo (2009). Metalearning - Springer. Cognitive Technologies. doi:10.1007/978-3-540-73263-1. ISBN 978-3-540-73262-4.
  7. Gordon, Diana; Desjardins, Marie (1995). "Evaluation and Selection of Biases in Machine Learning" (PDF). Machine Learning. 20: 5–22. doi:10.1023/A:1022630017346. Retrieved 27 March 2020.
  8. Weng, Lilian (30 November 2018). "Meta-Learning: Learning to Learn Fast". OpenAI Blog. Retrieved 27 October 2019.
  9. Santoro, Adam; Bartunov, Sergey; Wierstra, Daan; Lillicrap, Timothy. "Meta-Learning with Memory-Augmented Neural Networks" (PDF). Google DeepMind. Retrieved 29 October 2019.
  10. Munkhdalai, Tsendsuren; Yu, Hong (2017). "Meta Networks". arXiv:1703.00837 [cs.LG].
  11. Koch, Gregory; Zemel, Richard; Salakhutdinov, Ruslan (2015). "Siamese Neural Networks for One-shot Image Recognition" (PDF). Toronto, Ontario, Canada: Department of Computer Science, University of Toronto.
  12. Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. (2016). "Matching networks for one shot learning" (PDF). Google DeepMind. Retrieved 3 November 2019.
  13. Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P. H. S.; Hospedales, T. M. (2018). "Learning to compare: relation network for few-shot learning" (PDF).
  14. Snell, J.; Swersky, K.; Zemel, R. S. (2017). "Prototypical networks for few-shot learning" (PDF).
  15. Ravi, Sachin; Larochelle, Hugo (2017). Optimization as a model for few-shot learning. ICLR 2017. Retrieved 3 November 2019.
  16. Finn, Chelsea; Abbeel, Pieter; Levine, Sergey (2017). "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks". arXiv:1703.03400 [cs.LG].
  17. Nichol, Alex; Achiam, Joshua; Schulman, John (2018). "On First-Order Meta-Learning Algorithms". arXiv:1803.02999 [cs.LG].
  18. Schmidhuber, Jürgen (1993). "A self-referential weight matrix". Proceedings of ICANN'93, Amsterdam: 446–451.
  19. Hochreiter, Sepp; Younger, A. S.; Conwell, P. R. (2001). "Learning to Learn Using Gradient Descent". Proceedings of ICANN'01: 87–94.
  20. Andrychowicz, Marcin; Denil, Misha; Gomez, Sergio; Hoffmann, Matthew; Pfau, David; Schaul, Tom; Shillingford, Brendan; de Freitas, Nando (2017). "Learning to learn by gradient descent by gradient descent". Proceedings of ICML'17, Sydney, Australia. arXiv:1606.04474.
  21. Schmidhuber, Jürgen (1994). "On learning how to learn learning strategies" (PDF). Technical Report FKI-198-94, Tech. Univ. Munich.
  22. Schmidhuber, Jürgen; Zhao, J.; Wiering, M. (1997). "Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement". Machine Learning. 28: 105–130. doi:10.1023/a:1007383707642.
  23. Schmidhuber, Jürgen (2006). "Gödel machines: Fully Self-Referential Optimal Universal Self-Improvers". In B. Goertzel & C. Pennachin, Eds.: Artificial General Intelligence: 199–226.
  24. Zintgraf, Luisa; Schulze, Sebastian; Lu, Cong; Feng, Leo; Igl, Maximilian; Shiarlis, Kyriacos; Gal, Yarin; Hofmann, Katja; Whiteson, Shimon (2021). "VariBAD: Variational Bayes-Adaptive Deep RL via Meta-Learning". Journal of Machine Learning Research. 22 (289): 1–39. ISSN 1533-7928.
  25. Greenberg, Ido; Mannor, Shie; Chechik, Gal; Meirom, Eli (2023-12-15). "Train Hard, Fight Easy: Robust Meta Reinforcement Learning". Advances in Neural Information Processing Systems. 36: 68276–68299.
  26. Begoli, Edmon (May 2014). "Procedural-Reasoning Architecture for Applied Behavior Analysis-based Instructions". Doctoral Dissertations. Knoxville, Tennessee, USA: University of Tennessee, Knoxville: 44–79. Retrieved 14 October 2017.
  27. "Robots Are Now 'Creating New Robots,' Tech Reporter Says". NPR.org. 2018. Retrieved 29 March 2018.
  28. "AutoML for large scale image classification and object detection". Google Research Blog. November 2017. Retrieved 29 March 2018.