Learning by experiencing versus learning by registering


This text is based upon the article "Learning by experiencing versus learning by registering" by Olivier L. Georgeon (2014) Constructivist Foundations, 9(2): 211-213 (open peer commentary of "Subsystem formation driven by double contingency" by Bernd Porr and Paolo Di Prodi).

Abstract

Agents that learn from perturbations of closed control loops are considered constructivist by virtue of the fact that their input (the perturbation) does not convey ontological information about the environment. That is, they learn by actively experiencing their environment through interaction, as opposed to learning by registering directly input data characterizing the environment. Generalizing this idea, the notion of learning by experiencing provides a broader conceptual framework than cybernetic control theory for studying the double contingency problem, and may yield more progress in constructivist agent design.

Glasersfeld differentiated radical constructivism from realist epistemology by the relation between knowledge and reality:

"Whereas in the traditional view of epistemology, as well as of cognitive psychology, that relation is always seen as a more or less picture-like (iconic) correspondence or match, radical constructivism sees it as an adaptation in the functional sense" (Glasersfeld 1984:20).

This suggests differentiating constructivist artificial agents from realist artificial agents by the relation between their input data and their environment. As illustrated by Bernd Porr and Paolo Di Prodi's implementation, in cybernetic theory, the agent's input (called perturbation) does not hold an "iconic correspondence" with the environment but rather consists of feedback from the agent's output (called action). In contrast, as we shall develop below, most machine-learning algorithms implement this iconic correspondence because they implement and exploit the agent's input as if it directly characterized the environment, thus representing a direct access to the ontological essence of reality.

Here, we call learning by experiencing those learning mechanisms that implement and exploit input data as feedback from the agent's output, and learning by registering those learning mechanisms that implement and exploit input data as a direct observation of the environment (either a simulated environment or the real world in the case of robots). This formulation complies, for example, with Etienne Roesch et al.'s (2013) formulation that constructivist epistemology considers knowledge as resulting from experience of interaction with the environment, as opposed to existing "in an ontic reality [...] available to registration from the physical world" (Roesch et al. 2013:26).

Partially Observable Markov Decision Process models (POMDP, Kaelbling, Littman & Cassandra 1998) exemplify learning by registering well because they typically formalize the agent's input as a function of the environment's state only. A similar argument can show that many other machine-learning approaches learn by registering, even supposedly constructivist approaches based upon schema mechanisms (e.g., Drescher 1991) and many approaches based upon multi-agent systems, such as Roesch et al.'s (2013) agents, as we discussed in our open peer commentary (Georgeon & Hassas 2013).

For the sake of argument, consider a POMDP in which the agent's input (called observation) is reduced to a single bit. The states in a subset S0 of the set S of all the environment's states are observed as "0", and the states in the complementary subset S1 are observed as "1". Because of stochastic noise, some elements of S0 may occasionally be observed as "1" and vice versa. Yet the observation statistically reflects the state of the environment, and the agent's policy generally exploits this assumption to try to construct an internal model of the agent's situation. To our knowledge, there is no POMDP implementation that would exhibit interesting behaviors with as little observation as a single bit when the number of states is large. This limitation is known as the perceptual aliasing problem (Whitehead & Ballard 1991), and is inherent to learning by registering.
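The single-bit case above can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not an implementation from any cited work: the function `observe`, the state set `STATES`, the partition `S1`, and the `noise` parameter are all hypothetical names chosen here. The point it shows is that the bit depends on the state only, so many distinct states collapse onto the same observation.

```python
import random
from collections import defaultdict

def observe(state: int, s1: frozenset, noise: float, rng: random.Random) -> int:
    """Single-bit observation that depends on the state only (learning by registering)."""
    bit = 1 if state in s1 else 0      # states in S1 read "1", all others (S0) read "0"
    if rng.random() < noise:           # stochastic noise occasionally flips the bit
        bit = 1 - bit
    return bit

# Hypothetical environment: 1000 states, arbitrarily split into S0 (even) and S1 (odd).
STATES = range(1000)
S1 = frozenset(s for s in STATES if s % 2 == 1)

rng = random.Random(0)
aliased = defaultdict(list)
for s in STATES:
    aliased[observe(s, S1, noise=0.0, rng=rng)].append(s)

# With no noise, 500 distinct states are indistinguishable behind each bit value.
print({bit: len(states) for bit, states in aliased.items()})  # {0: 500, 1: 500}
```

Whatever the agent does, the bit only narrows the state down to one half of S, which is the perceptual aliasing described above.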

Note that some variations of POMDPs have been proposed in which the scope of the observation depends on the previous action, thus involving a form of active perception (e.g., McCallum 1996). However, the observation still reflects the state of the environment, as if the environment were observed through a filter that varied with the action.

In contrast, mechanisms of learning by experiencing implement the agent's input such that it conveys information about the effect of an "experiment" performed by the agent. In the case of a single input bit, this bit indicates one out of two possible outcomes of the experiment. The same particular state of the environment induces different input bits depending on the experiment initiated by the agent. No partitioning of the set of states S can be made according to the input bit because all states may induce input "0" or "1" depending on the experiment. In this case, the learning algorithm must not exploit the agent's input as if it statistically and partially corresponded to the state of reality, because it does not. In contrast with learning by registering, there exist single-input-bit learning-by-experiencing agents that exhibit interesting learning behaviors (e.g., Georgeon & Hassas 2013; Georgeon & Marshall 2013).
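A toy sketch may help to see why no partition of S by input bit exists in this setting. Everything in it is an assumption made for illustration (a two-cell world, the experiment names, the function `enact`); it is not one of the agents or environments from the cited papers. The input bit is the outcome of the experiment the agent chose, so the same hidden state yields "0" or "1" depending on that choice.

```python
from typing import Tuple

LEFT, RIGHT = 0, 1  # hidden state: which cell of a two-cell world the agent occupies
EXPERIMENTS = ("touch_left", "touch_right", "move_left", "move_right")

def enact(state: int, experiment: str) -> Tuple[int, int]:
    """Return (input_bit, next_state) for an experiment tried in a hidden state.

    touch_*: bit 1 if a wall is felt on that side, else 0; the state is unchanged.
    move_*:  bit 1 if the agent bumps into a wall, else bit 0 and the agent moves.
    """
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment}")
    side = LEFT if experiment.endswith("left") else RIGHT
    wall = state == side               # the wall lies on the side of the occupied cell
    if experiment.startswith("move") and not wall:
        return 0, side                 # the move succeeds and the agent changes cell
    return (1 if wall else 0), state   # touch, or bump against the wall

# Every state can induce both bits, so the bit alone says nothing about the state.
bits_per_state = {
    s: {enact(s, e)[0] for e in EXPERIMENTS} for s in (LEFT, RIGHT)
}
print(bits_per_state)  # {0: {0, 1}, 1: {0, 1}}
```

An agent coupled to such an environment can only make sense of its input relative to the experiment it initiated, which is the inversion that distinguishes this setting from the POMDP case.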

Besides cybernetic control theory, in §68, Porr and Di Prodi mention other examples of learning by experiencing: Sutton et al.'s (2011) Horde architecture, and our work. Horde relies on a swarm of reinforcement-learning agents to learn hierarchical temporal regularities of interaction through experience. More broadly, learning by experiencing implements a form of conceptual inversion of the perception-action cycle recommended by some authors (e.g., Pfeifer & Scheier 1994; Tani & Nolfi 1999). In learning by experiencing, however, calling the input a perception or an observation is misleading because the input does not hold a direct correspondence with reality.

Concerning our approach, we shall clarify that it does not only "act in discrete space," as Porr and Di Prodi wrote in §68. Instead, our agents are indifferent to the structure of their environment's space, which is precisely an advantage of learning by experiencing. We demonstrated that our algorithms could control agents in continuous two-dimensional simulated environments (Georgeon & Sakellariou 2012) and robots in the real world (Georgeon, Wolf & Gay 2013). It is true that our agent's set of possibilities of experience (the relational domain defined by the coupling between the agent and the environment, e.g., Froese & Ziemke 2009) is discrete, but this does not prevent the agent from learning interesting behaviors in continuous space.

Since learning-by-experiencing (LbE) agents do not directly access the state of the environment, they incorporate no reward function or heuristics defined as a function of the state of the environment. This places LbE agents in sharp contrast with reinforcement-learning agents and problem-solving agents. Notably, LbE agents even differ from reinforcement-learning agents with an intrinsic reward (e.g., Singh, Barto & Chentanez 2005), which consider some elements of the state of the world to be internal to the agent. In general, an LbE agent gives value to the mere fact of enacting interactive behaviors rather than to the state resulting from behaviors. We expect LbE agents to demonstrate that they learn to "master the laws of sensorimotor contingencies" (O'Regan & Noë 2001). Consequently, as some authors in the domain of intrinsic motivation also argued (e.g., Oudeyer, Kaplan & Hafner 2007), we recommend assessing LbE agents' learning through behavioral analysis rather than through a measure of their performance in reaching specific goals.
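The idea of valuing enacted interactions rather than resulting states can be sketched as follows. This is a deliberately simplified illustration and not the algorithm of any cited paper: the class `ValenceAgent`, its `valence` table, and its one-step memory are assumptions introduced here. Value is attached to (experiment, input bit) pairs only, and the agent never receives or evaluates a state.

```python
from typing import Dict, Optional, Tuple

Interaction = Tuple[str, int]  # (experiment, input bit)

class ValenceAgent:
    """Prefers experiments whose anticipated interaction has the highest valence."""

    def __init__(self, valence: Dict[Interaction, float]):
        self.valence = valence                          # value of enacting an interaction
        self.experiments = sorted({e for e, _ in valence})
        # Learned regularities: (previous interaction, experiment) -> last input bit seen.
        self.predicted: Dict[Tuple[Optional[Interaction], str], int] = {}
        self.previous: Optional[Interaction] = None

    def _anticipated(self, experiment: str) -> float:
        bit = self.predicted.get((self.previous, experiment))
        if bit is None:
            return 0.0                                  # untried in this context: neutral
        return self.valence.get((experiment, bit), 0.0)

    def choose(self) -> str:
        # Ties are broken in favor of the alphabetically first experiment.
        return max(self.experiments, key=self._anticipated)

    def record(self, experiment: str, bit: int) -> None:
        """Store what followed the previous interaction, then shift the context."""
        self.predicted[(self.previous, experiment)] = bit
        self.previous = (experiment, bit)

agent = ValenceAgent({("a", 0): 1.0, ("a", 1): -1.0, ("b", 0): 0.5, ("b", 1): 0.5})
agent.record("a", 1)
agent.record("a", 1)   # the agent has now seen that "a" after ("a", 1) yields bit 1 again
print(agent.choose())  # b
```

Because nothing here measures progress toward a goal state, such an agent would have to be assessed by analyzing the behavior it settles into, in line with the recommendation above.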

In accordance with our view on LbE agent assessment, Porr and Di Prodi assess their agent's learning through behavioral analysis (Section 4). Their agents are motivated to interact with entities present in the environment by controlling sensorimotor loops (approaching food or other agents, §18). For each sensorimotor loop, Porr and Di Prodi define Prediction Utilization as a measure of the agent's commitment to control this loop. We wish to support their effort in specifying this kind of measure. This effort contributes to defining general quantifiers that could be used with other learning-by-experiencing approaches to characterize the agent's engagement in interactive behaviors.

As Porr and Di Prodi noted in §68, simple linear control theory does not realize "the generation of more complex actions, the switching of actions and the sequencing of actions". However, other learning-by-experiencing approaches tackle these issues. Addressing the double contingency problem with approaches that generate such learning would allow more sophisticated subsystem organization because each subsystem could control more sophisticated interactions than a linear control loop. Therefore, we anticipate that addressing the problem of subsystem formation driven by double contingency within the general framework of learning by experiencing would allow more advances in constructivist agent design.

References

Drescher G. L. (1991). Made-up minds, a constructivist approach to artificial intelligence. Cambridge, MA: MIT Press.

Froese T. & Ziemke T. (2009). Enactive artificial intelligence: Investigating the systemic organization of life and mind. Artificial Intelligence 173(3-4): 466-500.

Georgeon O. & Hassas S. (2013). Single agents can be constructivist too. Constructivist Foundations 9(1): 40-42.

Georgeon O. & Marshall J. (2013). Demonstrating sensemaking emergence in artificial agents: A method and an example. International Journal of Machine Consciousness 5(2): 131-144.

Georgeon O. & Sakellariou I. (2012). Designing Environment-Agnostic Agents. In proceedings of the Adaptive Learning Agents workshop (ALA), at the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). Valencia, Spain, 25-32.

Georgeon O., Wolf C. & Gay S. (2013). An Enactive Approach to Autonomous Agent and Robot Learning. Third Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics (EPIROB2013). Osaka, Japan.

Glasersfeld E. von (1984). An introduction to radical constructivism. In P. Watzlawick (Ed.), The invented reality (pp. 16-38). New York, NY (USA): Norton.

Kaelbling L., Littman M. & Cassandra A. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence 101: 99-134.

McCallum A. (1996). Learning to use selective attention and short-term memory in sequential tasks. In proceedings of the Fourth International Conference on Simulating Adaptive Behavior.

Pfeifer R. & Scheier C. (1994). From perception to action: The right direction? In P. Gaussier and J.-D. Nicoud (Eds.), From Perception to Action (pp. 1-11). IEEE Computer Society Press.

O'Regan J. K. & Noë A. (2001). A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences 24(5): 939-1031.

Oudeyer P.-Y., Kaplan F. & Hafner V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation 11(2): 265-286.

Roesch E., Spencer M., Nasuto S., Tanay T., & Bishop J.-M. (2013). Exploration of the Functional Properties of Interaction: Computer Models and Pointers for Theory. Constructivist Foundations 9(1): 26-32.

Singh S., Barto A. & Chentanez N. (2005). Intrinsically motivated reinforcement learning. In L. K. Saul, Y. Weiss, & L. Bottou (Eds), Advances in Neural Information Processing Systems (pp. 1281-1288). Cambridge, MA: MIT Press.

Sutton R., Modayil J., Delp M., Degris T., Pilarski P. M., White A. & Precup D. (2011). Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In: Proceedings of the Tenth International Conference on Autonomous Agents and Multiagent Systems (AAMAS'11), Volume 2: 761-776. IFAAMAS, Taipei.

Tani J. & Nolfi S. (1999). Learning to perceive the world as articulated: An approach for hierarchical learning in sensory-motor systems. Neural Networks 12: 1131-1141.

Whitehead S. D. & Ballard D. H. (1991). Learning to perceive and act by trial and error. Machine Learning 7(1): 45-83.


Last updated February 19th 2014.