Self-Modifying Intelligence
Hypothesis: Human Consciousness is a system of self-modifying intelligence that contains mechanisms for preventing instability.
What we call Free Will is the ability of human consciousness to make a small temporary modification to itself.
If the same temporary modification is made repeatedly, then it will gradually become a permanent modification.
However, the system's ability to make temporary modifications is constrained by the requirement that the modifications result in better-than-expected achievement of biological goals, i.e. better than what would have been expected had the modifications not been made.
If the achievement of biological goals is worse than expected, then the system's ability to self-modify is reduced, and if it is reduced to near zero then the system enters a depressive state.
If the system spuriously convinces itself that it is achieving biological goals better than expected, then it will enter an unstable manic state. This is especially likely to happen if a self-modification corrupts the evaluation of goal achievement itself, so that the system is, in effect, “cheating”.
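As a toy illustration of this hypothesized mechanism, here is a minimal sketch in Python. The names and constants (`plasticity`, the 0.1 update rate, and so on) are invented for illustration and are not part of the hypothesis itself:

```python
# Toy model of the hypothesized stability mechanism. "Plasticity" is the
# capacity to make temporary self-modifications; it grows when goals are
# achieved better than expected and shrinks when worse. All names and
# constants here are illustrative assumptions, not claims about the brain.

class SelfModifyingAgent:
    def __init__(self):
        self.plasticity = 1.0      # capacity for temporary self-modification
        self.expected_score = 0.5  # current expectation of goal achievement

    def observe(self, actual_score: float) -> str:
        surprise = actual_score - self.expected_score
        # Better than expected -> plasticity grows; worse -> it shrinks.
        self.plasticity = max(0.0, self.plasticity + 0.1 * surprise)
        # Expectations slowly track actual achievement.
        self.expected_score += 0.05 * surprise
        if self.plasticity < 0.05:
            return "depressive: almost no capacity to self-modify"
        return "stable"
```

In these terms, the unstable manic state corresponds to feeding the agent spuriously inflated values of `actual_score`, because the evaluation of goal achievement has itself been corrupted.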
The Problem of Instability
Instability is a fundamental problem for any intelligent self-modifying system.
A self-modifying system can modify itself into a state where it no longer acts in accordance with the defined goals of the system.
Modern LLM-based AI is a form of intelligence. However it is not self-modifying.
If you naively try to create a self-modifying AI based on an LLM, you will encounter the problem of instability.
The Persistent Failure of “Self Help”
A large portion of the “Self Help” industry is based on the following observations:
- Many problems in life happen because a person often or always does what they feel like doing in the moment.
- A conscious person can consciously decide to act differently, not according to their default inclinations.
- If a person regularly behaves differently, according to their conscious decision to do so, that new way of behaving will eventually become the default.
These observations suggest a simple solution to many of life’s problems:
- Determine what you need to do to solve your problems.
- Do those things, even though it goes against your default inclinations.
On the one hand, human consciousness does include the ability to self-modify, and that is its fundamental feature.
On the other hand, the ability to self-modify has to be strongly constrained, otherwise the system would become unstable.
Any naive self-help plan based on an assumption that a person’s consciousness has an unlimited ability to self-modify will fail when, eventually, it comes up hard against the constraints on self-modification of that consciousness.
Some self-help gurus do grudgingly admit that there are limits to how much “willpower” a person has, and that this somewhat limits a person's ability to just change themselves into whoever they want to be.
But to come up with a truly effective theory of self-help, we first need to better understand the constraints on the ability of consciousness to modify itself.
Introspection
Self-awareness and the ability to introspect are sometimes given as defining characteristics of consciousness.
My hypothesis is that the fundamental characteristic of consciousness is that it is self-modifying.
However, in order to modify something, you have to have some awareness of the thing that you are modifying.
So having self-awareness is a necessary prerequisite for being a self-modifying system.
How to Make a Conscious LLM-based AI
LLM AIs show many signs of what we call intelligence. They can give useful answers to questions. They can write software applications from specifications written in English.
But LLM AIs are not self-modifying.
So according to my hypothesis, they cannot be conscious.
But the question then follows - can we use an LLM AI to make an AI system that is self-modifying?
The first thing to decide is how an LLM AI might modify itself.
The state of an LLM-based AI system is the result of the following inputs:
- Raw training data
- The training algorithm that produces a raw LLM capable of next-token prediction.
- Fine-tuning of the LLM, including RLHF (reinforcement learning from human feedback).
- The System Prompt
Of all these inputs, by far the cheapest and quickest thing to modify is the System Prompt:
- It typically consists of a list of sentences, maybe with some minimal document structure.
- Reasonably sophisticated System Prompts might be on the order of 30k words.
Training the initial raw LLM is a very slow and expensive process, so changing the LLM by changing either the training data or the initial training algorithm is going to be slow and expensive.
Fine-tuning operations may be less expensive and not as slow as initial training, but they are still probably more expensive and slower than updating the System Prompt.
The other thing about the System Prompt is that it consists of a bunch of sentences, which is exactly the sort of thing that the LLM itself processes.
Also, LLMs have existed for long enough that information about how they work, and how they might be changed, is already in the training data of newer LLMs. In that sense, LLMs already “know” about the relation between the inputs used to construct an LLM AI and how changes to those inputs might affect the LLM's behaviour.
LLM Self-Modification via System Prompt Modification
All this leads to a basic plan:
- We want to make an LLM AI system self-modifying.
- The easiest part of an LLM AI to modify is the System Prompt.
- Therefore, construct an LLM AI system that modifies its own System Prompt.
Of course, just telling an LLM to change its own System Prompt doesn't give the LLM any information about what kind of changes to make, so it might just make random changes.
To make the plan coherent, we need to specify some overall goal that the LLM AI is trying to achieve.
Then we can tell the LLM (see the code sketch after this list):
- Here is the overall goal that you are trying to achieve.
- Here is your system prompt.
- Make a change to your system prompt so that it makes you better at achieving your overall goal.
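As a minimal sketch of this step in Python, where `call_llm` is a placeholder for whatever chat-completion API is actually used and the prompt wording is invented:

```python
# Sketch of one self-modification step over the System Prompt.
# `call_llm` is a placeholder, not a real API.

def call_llm(system_prompt: str, user_message: str) -> str:
    raise NotImplementedError  # stand-in for a real LLM API call

def self_modify_step(system_prompt: str, goal: str) -> str:
    request = (
        f"Here is the overall goal that you are trying to achieve:\n{goal}\n\n"
        f"Here is your system prompt:\n{system_prompt}\n\n"
        "Make a change to your system prompt so that it makes you better "
        "at achieving your overall goal. Return only the new system prompt."
    )
    # The model's reply becomes the System Prompt for the next iteration.
    return call_llm(system_prompt, request)
```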
If the AI is sometimes asked to make changes to its own System Prompt in order to better achieve its overall goal, then the System Prompt will itself have to include some guidance about what to do when such a request is made.
But if the AI then changes that part of the System Prompt, that particular guidance might get lost.
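One simple defence, sketched below on the assumption that the System Prompt is plain text assembled by our own code before each call, is to keep that guidance in an immutable section that the modification step can never overwrite:

```python
# Keep the self-modification guidance in an immutable prefix and let the
# model rewrite only the mutable remainder of the System Prompt.
# The section marker is an invented convention, not any standard.

IMMUTABLE_GUIDANCE = (
    "When asked to improve your system prompt, propose a replacement for "
    "the text after the MUTABLE marker only. Never alter this guidance.\n"
    "--- MUTABLE ---\n"
)

def apply_modification(proposed_mutable_text: str) -> str:
    # Whatever the model proposes, the guidance section is reattached intact.
    return IMMUTABLE_GUIDANCE + proposed_mutable_text
```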
On not being able to go back and try doing it a different way
One difference between human learning and AI learning is that in many situations a person has to make a decision and observe the consequences, with no way to go back and make a different decision in exactly the same situation. With an AI, by contrast, one can simply run multiple A/B-style tests to find out whether a particular change to something (such as the System Prompt) makes things better or worse on average.
So it may be that some aspects of human consciousness exist to deal with the limitation that, for most people in most situations, there is no way to simply duplicate the person in order to test the result of making some specific change to how that person would act in that situation.
Whereas when we have an AI system implemented as a file of data being processed by an executing software application, such duplication is relatively trivial to do.
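Such a test might look like the following sketch, where `evaluate` is an assumed (not real) function that scores how well a given System Prompt handles a given task:

```python
# A/B test of a proposed System Prompt change: score both variants on the
# same tasks and keep the change only if it is better on average.

def evaluate(system_prompt: str, task: str) -> float:
    raise NotImplementedError  # stand-in for a goal-achievement metric

def ab_test(old_prompt: str, new_prompt: str, tasks: list[str]) -> str:
    old_avg = sum(evaluate(old_prompt, t) for t in tasks) / len(tasks)
    new_avg = sum(evaluate(new_prompt, t) for t in tasks) / len(tasks)
    return new_prompt if new_avg > old_avg else old_prompt
```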
So it may ultimately be the case that AI doesn’t need to be conscious, because it can achieve the same results in other ways.