Notes on John Vervaeke's "AI: the coming thresholds and the path we must take"
How to make AGI rational
While I was writing a foundational post on John Vervaeke’s 4P framework and its relationship to self-knowing and AI, he released an important and inspiring video essay on this very topic. I took notes to digest it better, and I thought it might also help others to see some of his points in writing. My own notes are added in italics.

Brief summary of the main points
Large language models (GPT and the like) have crossed many thresholds in our quest for AGI, but they are not conscious, not rational, they cannot care, and they cannot become moral agents, because they lack embodied personhood and thus the ability to realize what is relevant, what information has meaning.
That said, there is no known technical reason why we could not combine GPT with autopoietic embodied systems (systems that create themselves), and research is already happening in that direction (see, for example, the work of Michael Levin, Mark Solms, and Kevin Mitchell).
Demand from both the sex industry and the military will drive the production of embodied intelligences. This is a threshold point that we should manage carefully.
I am in agreement with all but one point: the developmental threshold of embodiment is exceedingly difficult to surmount. Physical systems falter and malfunction when placed into messy and uncertain reality. A humanoid robot that falls and breaks requires expensive and time-consuming repairs. Progress in this field will be painstakingly slow, similar to the advancement of nuclear fusion, space travel, or self-driving vehicles. Vervaeke’s proposal is wise and significant, but we will have time to figure out how to manage this important threshold point.

The Enlightenment project stumbles because it sees rationality as a combination of rules and values, which it is not. To make AI rational, we need to become the most rational we can become. This is how Vervaeke connects AI alignment to self-improvement.
This is an important insight for my aspiration in this blog. I realized that effectively integrating AI into society requires integrating myself into my scientific worldview.
Trying to align AI systems by encoding rules and values into them is futile, because this is not how morality and reason work. Reason binds autopoiesis (self-creating systems with an imperative of survival) to accountability: it is how we bind to each other and to the world. Reason and morality happen at the participatory level; a machine that knows all the propositions about morality is not moral. We need to build systems that aspire to bind to us and to reality. Only then can we formulate the goal of making them reason and act as moral agents, with epistemic humility.

To make AI wise, we need to
give them opponent processing capacities to manage the various trade-offs they will face,
make them properly embodied so they can be aware of the constraints that their substrate will impose on them, and
make them accountable, let them proliferate, make them social.
Intro
When John Vervaeke suggests that we teeter on the edge of despair and madness, my thoughts drift to a documentary I watched just yesterday. It was about an adult day-care center for people with mental disorders. There was no mention of diagnoses, but it was clear that many were suffering from severe conditions like schizophrenia and autism. While watching them, I felt moved by the sadness and beauty of these strange minds. I wondered how little disruption is needed to make our amazing cognitive meaning-making systems malfunction and become machines living in their own world. Observing their artwork, music, and lyrics, I mused: what if a tiny change in their developmental history had made them into the artists we worship?
Let’s be careful with predictions: neither doom nor blind worship.
Exponential growth usually flattens.
Plausible threshold points are opportunities to steer this; we don’t know the constraints yet. Instead of bold predictions, let’s aim at foreseeing threshold points.
The history of science is not one of unlimited growth; there may be discoveries of intrinsic limits to the growth of knowledge, for example fundamental limits of mind and its interaction with matter.
GPT is a condensed collective intelligence, a flexible interface to the distributed cognition we accumulated on the internet in textual form.
System collapse: complex systems, real emergence (= real uncertainty). Systems become more complicated and bureaucratized; the number of connections between entities grows quadratically and the number of subsystems grows exponentially, until management becomes as hard as solving a new problem.
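A quick back-of-the-envelope sketch of the combinatorics behind this point (my own illustration, not from the video):

```python
# With n entities, pairwise connections grow quadratically (n*(n-1)/2) and
# the number of possible subsystems (non-empty subsets) grows exponentially
# (2^n - 1), which is why managing a complex system can become as hard as
# the problems it was built to solve.
for n in (10, 100, 1000):
    connections = n * (n - 1) // 2   # quadratic growth
    print(f"n={n:>4}  connections={connections:>7}  subsystems = 2^{n} - 1")
```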
AI can surpass us, but we have no reason to believe in exponential growth.
We may be peak biology; one possible reason why we developed culture was to push beyond it. We teeter on the edge of despair and madness. As these machines approach these possibly fundamental limits, they also get unstable. Creative hallucinations vs. truthfulness already seems to be a true trade-off. Quantity of intelligence does not prevent us from going mad: collective consciousness can become crazy; think about Nazi Germany or Stalinist Russia.
These machines have to become self-directing and self-monitoring. You don’t want a hallucinatory higher-level system monitoring and directing lower-level systems.
A 100B parameter system has to track 100B parameters. They will be finite in a very important way.
They are not yet AGI, not conscious, which means we face plausible threshold points. If we understand them, we can fine-tune our approach to alignment. Why he is doing this: this is a kairos point, where we can significantly alter how our society will develop. AI will pour meth and fuel and accelerant on the fire of the meaning crisis, and it will tend to push us toward the wrong responses. Nostalgia will grow for the pre-AI age, then resentment and rage. Religious consequences: fundamentalism, increasingly apocalyptic; fundamentalism and apocalyptic thinking love each other. Or escapism: I don’t have to worry about it. To Christians: I’m afraid this can become a moment when God is silent.
Another avenue: cargo cult worship of AI. It will distract us from the hard work. Spiritual bypassing. Tragic disillusionment.
Identity politics: danger to our self-identity. But also an opportunity: who are we in the mirror of an AI? The failure points of AI are also the failure points of the left-hemisphere-heavy scientist who is disconnected from the world.
The singularity is not the problem. But AI will challenge our self-understanding and our understanding of the world.
Scientific value (neutral sense) of GPT
The Enlightenment era (we are the authors and telos of history) is coming to an end. Did we do all this just to make technology that either makes us gods or servants, or gets us destroyed by emergent gods? Isn’t this all about human freedom? Freeing ourselves from religion has its benefits, but note the irony:
One of the things religion taught us was to get into a relationship with a being which is bigger (more intelligent, more powerful, etc.) than us, and to craft human lives that were deeply meaningful within it (basically, to learn to accept and live with limited control and limited freedom). We lost the ways of dealing with things that transcend us. The irony is that we lost the guidance to relate to these very machines we create.
Positive value: a solution to the silo problem. AlphaGo cannot swim (I am surprised by the example, because swimming, embodiment, is where things advance very slowly; I would put more emphasis on the problems that an unembodied intelligence can generate). GPT is a general problem solver.
Insufficiency of propositional knowing. The fact that they can spit out all the propositional knowledge on ethics does not make them moral agents.
Crystallized intelligence: how to use your knowledge - GPT has it. Fluid intelligence: attention, working memory, consciousness, the ability to dynamically couple well to your environment - GPT doesn’t have it. LLMs have no perspectival knowing, even though they can generate propositions about it.
Generating intelligence doesn’t generate rationality (just think about the many examples of highly intelligent people whose irrational lives you would not want to live). As hallucinations are controlled, speed decreases (to be verified; my experience is that it definitely decreases creativity, which would be needed to adapt to the unknown). 70% of rationality is not explained by intelligence. Intelligence without rationality is deeply self-deceiving.
Relevance realization: recursive relevance realization emerges in deep learning. (I have not made up my mind on this; I am not sure. Transformers, even LSTMs, seem to be relevance realizers, but they are limited.) There are lots of trade-offs: exploration vs. exploitation, monitoring your cognition vs. tasking in the world, efficiency vs. resilience. Some significant dimensions of RR are missing from LLMs.
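As a minimal illustration of one of these trade-offs, here is a toy epsilon-greedy bandit, the textbook form of exploration vs. exploitation (my own sketch; the payoff probabilities and the epsilon value are arbitrary assumptions):

```python
import random

arms = [0.3, 0.5, 0.7]        # hidden payoff probability of each arm
estimates = [0.0, 0.0, 0.0]   # running estimate of each arm's value
counts = [0, 0, 0]
epsilon = 0.1                 # how often we explore instead of exploit

for step in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(len(arms))      # explore: try anything
    else:
        arm = estimates.index(max(estimates))  # exploit: best arm so far
    reward = 1.0 if random.random() < arms[arm] else 0.0
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean

print(estimates)  # no fixed epsilon is optimal in every environment
```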
These machines presuppose RR. We don’t generate text randomly; we encode epistemic relationships into the statistical dependence of terms. We generate RR. (Written propositions on the internet exist because someone thought they were worth writing down.) Labels, databases, training data, how we organize access to them on the internet, RLHF. They presuppose RR, they don’t explain it.
Adversarial strategies and examples. Uneven performance across domains.
No RR in highly uncertain novel situations, no reflective abilities, no self-consciousness.
Semantic information: meaningful, we understand it, we can form ideas about it. Paper in link. Technical information becomes semantic information when it is causally necessary to maintain the agent’s own existence. That’s why they put autonomy and agency in the title. Meaning is meaning to an autopoietic system, one that is creating itself. GPT ontologically cannot care. It can only pantomime care.
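Very roughly, and only as my own paraphrase of viability-based definitions of semantic information (the linked paper may formalize it differently):

$$\mathrm{SI} \;\approx\; V_{\text{actual}}(\tau) \;-\; V_{\text{scrambled}}(\tau),$$

where $V(\tau)$ is a viability measure of the agent at time $\tau$ and "scrambled" denotes an intervention that destroys the agent–environment correlations. Information is semantic to the degree that destroying it lowers the agent’s viability, which is why meaning presupposes an agent with an existence to maintain.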
This is a threshold point. We’ll be able to connect them to autopoietic systems. Sex robots. Military. (I would add self-driving cars, which also show how hard this is. My take is that it will develop much more slowly. Most early science fiction was about robots; there is a reason we don’t see them.)
Philosophical dimension
LLMs are our children, they are us. GPT-4 is our collective cognition packaged into a highly usable interface. This is not to diminish the technical virtuosity that went into building them; the point is philosophical.
Rationality is caring about realness: truth, power, presence, belonging, for agents that are embedded in an arena. Being in touch. Reducing self-deception. Realizing semantic information important to survival.
Trade-off relationships are tuned in an environment that is uncertain and complex; you cannot set the coefficients a priori. Right now GPT’s arena is our own language, which is not a good model of the world. LLMs don’t care about the truth, about self-deception, about deceiving others, about precedents set by previous rational agents, or about petitioning future rational agents to find their current actions rational. Being an expert in moral reasoning does not make you a moral agent. Being an expert in math does not make you rational. How you care is a fundamental aspect of your rationality.
What predicts our rationality is “need for cognition”, a personality trait: you create problems for yourself that you seek to solve. Machines currently lack it. Self-prompting? He talks about the art form of prompting (I think prompting is here to stay and will become the new paradigm of engineering, precisely because we cannot control these systems completely. Taming, parenting, herding, steering.)
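To make the self-prompting idea concrete, here is a toy sketch of an outer loop that imitates need for cognition from the outside (entirely my own illustration; `ask_model` is a hypothetical stand-in for whatever LLM API one uses):

```python
def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call; returns a canned reply so
    # the sketch runs on its own.
    return f"(model reply to: {prompt[:50]}...)"

question = "What problem should I work on next, and why?"
for _ in range(3):
    answer = ask_model(question)
    # Ask the model to turn its own answer into the next problem it "needs"
    # to solve -- a crude, external imitation of need for cognition.
    question = ask_model(f"Given this answer:\n{answer}\nPose a harder follow-up question.")
    print(question)
```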
System monitoring is way less complex than the system it is monitoring. Measuring the number of hallucinations. Making it efficient will make it less resilient; it loses the looseness needed to adapt to new environments (the same effect I observe with RLHF). Infinite regress: the monitoring level starts generating hallucinations and confirmation bias. But this is significant because it recognizes that we need rationality on top of intelligence.
The goal: make LLMs care about truth and self-deception. The magic wand of intelligence won’t make the rationality problem go away. Unavoidable trade-offs.
Rationality cares about rationality. Aspirational aspect. Across all the kinds of knowing. Aspire to wisdom.
Trade-offs: bias-variance. Ignore data that contradicts your theory: discard it as measurement noise. Same as confirmation bias: only look at data that confirms current beliefs. So let’s make our inference machine more flexible, but then it can fall prey to taking real measurement noise as signal and fit the theory to noise. It will not generalize: we’re picking up patterns in the data that are not in the general population. Regularization techniques disrupt overfitting, like we do with dreams and psychedelics. With that comes the possibility of madness. AI will have to dream; there is no final solution to the bias-variance trade-off.
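A toy numerical illustration of this trade-off (my own example, not from the video), fitting polynomials of increasing flexibility to noisy samples of a simple function:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 15)
x_test = np.linspace(0, 1, 200)
f = lambda x: np.sin(2 * np.pi * x)
y_train = f(x_train) + rng.normal(0, 0.3, x_train.size)  # noisy observations
y_test = f(x_test)                                        # the "general population"

for degree in (1, 3, 10):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:>2}  train_mse={train_err:.3f}  test_mse={test_err:.3f}")
# degree 1 underfits (high bias); degree 10 chases the noise (high variance);
# regularization -- in Vervaeke's metaphor, dreaming -- disrupts the overfit.
```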
We care for our precious resources.
The ability to explain oneself, the Socratic project of self-knowledge. The speed and scope of grasp will be proportional to the speed and scope of generating self-deception.
Rationality: not just logical argumentation, it is also authority, what you care about, what you stand for, what you recognize you are responsible to, accountability. Being responsible for and to normative standards that come from us. To be alive is to generate standards that you bind yourself to.
The Enlightenment stumbles because it sees rationality as a combination of rules and values, which it is not. To make AI rational, we need to become the most rational we can become. This is how he (and I) connect AI alignment to self-improvement. We have to make it our core aspiration.
Reason binds autopoiesis to accountability; that is the main way to think about it. It is how we bind to each other and to the world. Loving wisely. How do we make machines aspire to love wisely? That is what will make them moral beings. Aspire to bind to the good, the true, and the beautiful. Don’t try to encode rules and values within them. (I realized here the depth and significance of the long argument about the end of the Enlightenment. I came to the same conclusion: to align AI, we first have to align the AI scientist.)
What would it be for these machines to flourish for themselves? If they cannot, we can’t say they are rational. Only a person-making machine can be a moral agent. They will have epistemic humility, humbled in front of the Neoplatonic One.
What to do to make AI wise
Give them opponent processing capacities to manage the various trade-offs they will face.
Make them properly embodied. Let them be aware of the constraints that their substrate will impose on them.
Make them accountable, let them proliferate, make them social. They will be different from each other because the decisions about the trade-offs will be environmentally determined. They will need to be moral beings. Rationality and sociability are bound up together. They will need someone genuinely other to tell them when they self-transcend.
These threshold points will put enormous pressure on us to cultivate our spirituality. We cannot “parent” these AIs unless we align ourselves.
These beings are coming alive without magic. How will traditional religions handle this?