1. What do RI agents do? Do they try to learn to predict things for example?
Yes, they learn to predict whether particular sequences of behaviors can be successfully enacted in particular contexts. To do so, they simultaneously learn categories of behaviors and categories of contexts. More precisely, they represent contexts by the behaviors that they afford. Then, they tend to enact complex behaviors when they see that the context affords these behaviors. As a result, they exhibit a form of intrinsic motivation: being "in control" of their activity.
2. What does it buy me? If I adopt your RI approach, and I'm making a practical robot system to work in a home environment let's say - what advantage will I get by using your system? Give me a specific example.
Mostly, you will have fun interacting with a robot that looks self-motivated to you. It won't be more useful to you than your cat or your dog, but people still pay a lot to have cats and dogs. In the long term, it will open the way towards more intelligent systems. Cognitive theories have strong arguments that we must go through this if we want to move on towards more intelligent systems anyway. Knowledge must be grounded in experience of interaction.
3. What about the existing community that works on intrinsic motivation? Oudeyer, Singh/Barto, Schmidhuber, haven't they already got techniques for making self-motivated robots? They also talk about curiosity. What's special about your RI approach?
None of these methods allows the design of self-programming agents. Self-programming occurs when the data learned by the agent is executable data. That is, the agent executes previously learned programs in appropriate contexts. Self-programming is a crucial aspect of creating robots that look intrinsically motivated because it is what makes the robot behave as if it had free will (as opposed to executing pre-programmed behaviors). Two instances of robots with the same initial algorithm develop different behaviors and make different choices depending on their own individual experience.
4. Self-programming is interesting, but how does it differ from genetic programming or inductive logic programming which can also learn programs?
In genetic programming, new programs are created from one generation of agents to the next, but each agent does not acquire new programs during its individual life. That is, genetic programming implements phylogenetic evolution whereas RI implements ontogenetic evolution. Note that the two approaches could be combined. Also, most genetic programming techniques only recombine pre-defined programs. If you know a genetic programming technique that truly creates new programs by assembling individual instructions from scratch, please let me know.
As far as I can tell, inductive logic programming works in a formalized problem space, and would not apply to agents without ontological assumptions about their environment.
5. I'm not clear on where the thing learns a program for itself, or where the human designers craft the next version of the agent.
RI agents begin with a predefined set of possibilities of interaction, called "primitive interactions". This is the set of possible interactions that the robot can experience given its motors and sensors. The environment affords regularities in sequences of interactions that can be enacted. The possibilities of interaction and their regularities define a specific "interactional domain". From one version of the agent to the next, we test increasingly complex relational domains.
By exploring a given interactional domain, the agent learns episodes of interactions that capture regularities. Autonomously finding useful episodes is a hard problem known as "automatic segmentation of sequences". We have found solutions to address this problem but are still exploring new ones. Additionally, the agent must organize episodes hierarchically in memory to permit open-ended incremental learning. Again, we have solutions to that but still need to improve them. These are improvements that the agent does not acquire by itself: they are improvements in our algorithms that we make from one version of the agent to the next.
Once the agent has learned useful episodes, it tends to reenact them as a single "composite interaction" when it finds contexts in which it believes they can be successfully enacted (that's the self-programming effect). Contexts are represented as a set of interactions (primitive or previously learned composite). That is, the agent "perceives" the world as a set of affordances. We improve this capacity of representation from one version of the agent to the next. Finally, the agent must learn to successfully reenact episodes not only in time but also in space. This is also a hard problem that we are still exploring. We cannot fully anticipate what will happen during the life of an instance of the agent (its ontogenetic development). We have to run it, observe how it works, and improve it in the next version. This is why we are progressing slowly and incrementally.
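As an illustration of the ideas above (episodes stored as executable composite interactions, contexts represented as sets of interactions), here is a minimal sketch. It is not the actual RI implementation: the names Primitive, Composite, learn, propose and flatten are hypothetical, and summing valences and ranking proposals by weight times valence are simplifying assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Primitive:
    """A primitive interaction with a predefined valence (hypothetical class)."""
    label: str
    valence: int


@dataclass(frozen=True)
class Composite:
    """An episode: when `pre` has just been enacted, `post` may be enacted next."""
    pre: "Interaction"
    post: "Interaction"

    @property
    def valence(self) -> int:
        # Assumption of this sketch: a composite's valence is the sum of its parts.
        return self.pre.valence + self.post.valence


Interaction = Union[Primitive, Composite]


def flatten(interaction):
    """Expand an interaction into the sequence of primitives it would enact."""
    if isinstance(interaction, Primitive):
        return [interaction]
    return flatten(interaction.pre) + flatten(interaction.post)


def learn(memory, previous, enacted):
    """Record (or reinforce) the episode 'previous, then enacted'."""
    episode = Composite(previous, enacted)
    memory[episode] += 1
    return episode


def propose(memory, context):
    """Rank the interactions afforded by the context, best first.

    `context` is a set of interactions (primitive or composite). Each learned
    episode whose `pre` is in the context proposes its `post`, scored by
    weight * valence (an assumption of this sketch).
    """
    scores = Counter()
    for episode, weight in memory.items():
        if episode.pre in context:
            scores[episode.post] += weight * episode.post.valence
    return [interaction for interaction, _ in scores.most_common()]
```

For example, after the agent has twice experienced "turn, then step" and once "turn, then bump", a context containing "turn" proposes "step" first. Because a composite is itself an interaction, it can serve as the `pre` or `post` of a higher-level composite, which gives the hierarchical organization mentioned above.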
6. You say that traditional machine learning algorithms treat the agent's input as if it conveyed direct ontological information about the world. I don't agree. Reinforcement learning approaches tend to learn things based on the evidence from experience, and can learn probabilities over what the expected future experiences might be. They adjust whenever things turn out differently from what was expected. "Direct ontological information about the world" does not seem to be assumed by modern AI for the most part, as far as I can see. I'd call it something more associated with old-fashioned AI. Modern AI uses a lot of probabilistic approaches, which assume the real world is not accessible, and we just make some guesses, which are adjusted with every new bit of evidence. Consider "bag of words" things, e.g. in computer vision, or "topic models". Or even POMDPs. The general assumption made is that the reality of the world is not accessible.
POMDPs are the perfect example because they are well formalized. In POMDPs, the agent's input (called "observation") is partial and noisy; this is true. However, it is still a function of the environment's state. This becomes very clear when you reduce the agent's input to a single bit for the sake of the argument: you have a subset S0 of the set of states S that is observed as "0", and the states belonging to the complementary subset S1 are observed as "1" (if you implement stochastic noise, some states of S0 will occasionally be observed as "1" and conversely, but that does not change the argument). No reinforcement-learning algorithm can generate interesting behaviors in a POMDP in which the set of states S is large and in which the observation is a single bit. It is widely acknowledged that POMDPs don't "scale up".
In contrast, you can design an RI agent with a single bit input. In RI, you don't design this bit as a function of the state but as a response to an experiment initiated by the agent. In short, this bit tells the agent whether the experiment succeeded or failed. However, the agent doesn't know what the experiment means. An RI agent only learns (discovers, records, exploits) regularities in experiments (its experience interacting with the world). The agent's input is not a direct function of the state: the same state may generate input 0 or 1 depending on the experiment performed by the agent (again, you can add stochastic noise, this does not change the argument).
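To make the single-bit setting concrete, here is a small sketch. Everything in it is assumed for illustration and is not taken from the RI software: a two-cell world called Corridor, two experiments named "left" and "right", and a function explore that only counts which interaction follows which. The agent never reads the hidden position; it only receives one bit per experiment.

```python
from collections import Counter


class Corridor:
    """Hypothetical two-cell world. The agent never reads `position`."""

    def __init__(self):
        self.position = 0  # hidden state: 0 (left cell) or 1 (right cell)

    def enact(self, experiment):
        """Run an experiment and return one bit: 1 = succeeded, 0 = failed."""
        if experiment == "right":
            if self.position == 0:
                self.position = 1
                return 1
            return 0  # bumped into the right wall
        if experiment == "left":
            if self.position == 1:
                self.position = 0
                return 1
            return 0  # bumped into the left wall
        raise ValueError(f"unknown experiment: {experiment}")


def explore(env, experiments):
    """Enact the experiments in order and count consecutive interaction pairs.

    An interaction is the pair (experiment, result bit). The returned counts
    are regularities in the agent's own experience, not a model of the state.
    """
    pairs = Counter()
    previous = None
    for experiment in experiments:
        interaction = (experiment, env.enact(experiment))
        if previous is not None:
            pairs[(previous, interaction)] += 1
        previous = interaction
    return pairs
```

In this sketch the same hidden state (position 0) yields input 1 for the experiment "right" and input 0 for the experiment "left", so the bit cannot be read as an observation of the state. What the agent can record is, for instance, that a successful "right" is followed by a failed "right".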
7. What do you mean by "learning by experiencing versus learning by registering"? A POMDP environment seems to me observable - you have observations. You don't know how those correspond to states, but you still have observations. But something not observable - can you give me an example?
That's really all about the way the "input data" is implemented. In a POMDP, the input data reflects the state: a given state induces a given observation (partially, and setting the noise aside). This is why you can call it an "observation of the state". You don't know how observations correspond to states but they do (statistically), and you assume that they do, so you are trying to learn how they do.
In learning by experiencing, the "input data" does not correspond to the state at all (not even statistically). If your algorithm tries to learn how the input data corresponds to the state then it will not learn anything because the input data does not correspond to the state at all. That's what I mean by saying that an RI agent cannot "observe" the environment but can only "experience" it. In the same state, the input data depends on the agent's previous output.
8. Doesn't sense data also statistically correspond to the state, even if a robot moves randomly around the world?
You think it does because intuitively you think that you receive data that informs you about the state of the world, but this intuition is misleading. Books on the psychology of perception (e.g., O'Regan's book "Why Red Doesn't Sound Like a Bell") explain well why you should renounce this naive intuitive view of perception.
9. Is that really true? Sounds odd - won't there be some statistical correlation between the input data and the state of the world?
Imagine that the agent has an experiment e1 at its disposal that returns the input data "1" when the world is in state s and "0" when the world is in any other state. Now imagine that the agent also has another experiment e2 at its disposal that returns input data "0" when the world is in state s and "1" when the world is in any other state. Then the input data does not correlate with the state of the world at all.
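The e1/e2 argument can be checked with a few lines of code. This is only an illustrative sketch: the extra state names "a" and "b", the function names, and the assumption that the agent tries e1 and e2 equally often are not part of the original discussion.

```python
# Hypothetical world states; "s" is the special state from the argument above.
STATES = ["s", "a", "b"]
EXPERIMENTS = ["e1", "e2"]


def result(experiment, state):
    """Input bit returned by an experiment in a given state."""
    in_s = state == "s"
    if experiment == "e1":
        return 1 if in_s else 0
    if experiment == "e2":
        return 0 if in_s else 1
    raise ValueError(f"unknown experiment: {experiment}")


def mean_input_by_state():
    """Average input bit per state, assuming each experiment is tried equally often."""
    return {
        state: sum(result(e, state) for e in EXPERIMENTS) / len(EXPERIMENTS)
        for state in STATES
    }


def input_by_experiment_and_state():
    """Input bit for every (experiment, state) pair."""
    return {(e, state): result(e, state) for e in EXPERIMENTS for state in STATES}
```

Under these assumptions, mean_input_by_state() gives 0.5 for every state: the input bit alone says nothing about whether the world is in state s. The regularity only appears in input_by_experiment_and_state(), that is, when the input is taken together with the experiment that produced it.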
10. But why should I implement such a bizarre agent?
Precisely because there is no theory that supports the hypothesis that animals' sensory system reflects the state of reality. On the contrary, constructivist theory suggests that making this hypothesis would be incorrect because framing the input as a representation of reality implies committing to a predefined model of reality.
11. I am annoyed by the fact that, in RI, interactions have a predefined numerical valence. It seems that it would be easier to not make this hypothesis and program the agent to seek to "get a full stomach" rather than specifying the interactions (ingesting food) that the agent should enact to "get a full stomach".
Reasoning in terms of interactions rather than in terms of states of the world (including the state of the agent's stomach) has the advantage of allowing chaining interactions hierarchically. But more fundamentally, it is consistent with constructivist epistemology and phenomenology, which state that behavior is first, and knowledge is second. See, for example, Glasersfeld's radical constructivism.
12. Ok but then you are dependent on the number of primitive interactions. If this number is too great, things will go wrong. I agree it is different from reinforcement learning but I am not sure it avoids the "curse of dimensionality".
It is true that the system is dependent on the number of primitive interactions, but this is not necessarily a problem. Nothing obliges the agent to exploit all the possibilities of interaction at its disposal. We should clarify what we mean by "things will go wrong". To me, the most important point is that the agent does not depend on the complexity of the world itself. The RI approach lets you design agents that generate satisfying behaviors in infinitely complex environments. In our experiments, these satisfying behaviors are still modest, but RI allows imagining smarter behaviors in the future. For more about complexity within the constructivist paradigm, see, for example, Alexander Riegler's article "The radical constructivist dynamics of cognition":
"As we can no longer speak of information input and the vicissitude of stimuli, organisms are no longer exposed to information overload as a result of processing the entirely available information. They no longer need to devote their cognitive resources to filter out irrelevant information in order to retain useful knowledge. It becomes clear that even insect brains can accomplish navigational tasks and sophisticated cognitive deeds in nontrivial environments without falling prey to the frame problem. Therefore, cognitive research on perception should not focus on filtering mechanisms and data reduction. [...] Cognitive overload should not be considered a problem of the environment, as it is the case when talking, e.g., about the overload that comes with the information flood on the internet. Perception has to be explored in terms of the organism that performs the perceptive act" (Riegler, 2007).
We gratefully thank Frank Guerin and Stéphane Doncieux for their questions and their participation in this discussion.
Last updated August 11th 2014.