Author here. One particularly topical observation (topical because HN has recent...

TTPrograms · on Oct 21, 2020

I think you are hiding a key assumption - that an AGI must be capable of performing optimally on a task with Archimedean measure. This is a very strong assumption - performing eps-close to optimal may certainly fall in the definition of AGI, and this might be achieved by a non-Archimedean approximation. Defining AGI as performing optimally on any set of tasks is problematic from a computational theory perspective in general - even with real reward signals.

Furthermore, infinite rewards are not compatible with human behavior. Humans never optimize for a single event at infinite expense w.r.t. other goals.

Animats · on Oct 22, 2020

That an AGI must be capable of performing optimally on a task with Archimedean measure.

If that's correct, this may be just one of those many problems where the optimal solution is far harder than a near-optimal solution. Examples include linear programming and the traveling salesman problem, where a true optimum is NP-hard to find, but you can get very close with far less work.

xamuel · on Oct 22, 2020

Hi, thanks for looking at my paper. I do not assume that an AGI must be capable of performing optimally on all tasks in general--indeed, that's quite impossible. When measuring the performance of RL agents, one must come up with some way of aggregating performance across many environments, but that's beside the point of this paper. The point of this paper is that if you're forced to use real numbers as rewards, you can't even communicate all environments to the agent without misleading the agent. Whether the agent could perform well or poorly in the environments once communicated, is beside the point.

gwern · on Oct 22, 2020

"Assuming AGI agents are Turing computable, no individual AGI can possibly comprehend codes for all computable ordinals, because the set of codes of computable ordinals is badly non-computably-enumerable."

I was going to criticize this paper as crankery in the vein of Penrose, but first I thought I'd just compute all possible ordinals in my brain to make sure I'm a general intelligence.

brb.

xamuel · on Oct 22, 2020

Hi, thanks for looking at my paper. If you're interested in the relation between Lucas-Penrose stuff and enumeratability of ordinal codes, you might like I.J. Good (1969), "Godel's theorem is a red herring" (2 pages). Can you elaborate on what it is about my paper that strikes you as crankery? I'm a fan of yours so it would be much appreciated.

gwern · on Oct 22, 2020

Sadly, I was only able to count through a vanishingly small subset of the reals in a finite time, and therefore am not a general intelligence, and so it would be foolish of me, a machine made of a handful of atoms, to try to criticize this paper. It sure would be nice if I could appreciate music, but you've proven that's impossible, so it is what it is.

xamuel · on Oct 23, 2020

Whether man is machine can't be trivially answered using a few handwavy applications of Godel's theorem nor of observations about the structure of the reals. My paper does not make any attempt to weigh on that question, neither does it claim nor imply that humans have any supernatural powers of enumerating reals or ordinals, and you grossly misrepresent me by implying it does.

Rather, my paper is on the less ambitious question of whether the traditional RL model (with its real-valued rewards) accurately captures the full set of reward-giving environments an AGI should be capable of comprehending.

YeGoblynQueenne · on Oct 22, 2020

None, one, many, all.

Hah. Still got it.

sn41 · on Oct 22, 2020

I think you are right in constructing situations where real numbers are inadequate. It is also right that you do not claim that hyperreals or surreals suffice, you are merely pointing out that they may help you to do better than the reals.

But I have often wondered - why are people hung up on linear ordering? Why not non-total partial orders?

https://en.wikipedia.org/wiki/Ordinal_optimization

Is this insistence on linear orders because of simplistic modeling on the part of cognitive science and AI people, or is there some problem with general partial orders?

xamuel · on Oct 22, 2020

One could certainly contemplate versions of RL with non-linear orderings. I guess the reason people care about linear ordering is because you want the agent to at least understand "this outcome is better than that outcome". How would we hope for a good nonlinear-RL agent to behave in an environment with 2 buttons, one of which always gives reward X, and the other of which always gives reward Y, where X and Y are incomparable?

sn41 · on Oct 22, 2020

That's understandable. But many decisions in life are like that. Do you want to be close to your roots and your parents, or do you want a high-flying career in a remote city? Choices involve sacrifices as well as gains, and many meaningful outcomes are incomparable among themselves.

nx2059 · on Oct 21, 2020

The article was pretty over my head, but does your argument hold to the various augmented neural net systems such as Neural Turing Machines, or Differentiable Neural Computers?

xamuel · on Oct 21, 2020

The argument isn't so much about the type of agent (which I think is what Neural Turing Machines etc. are about), it's about the type of environment. In traditional Reinforcement Learning, environments give real-number-valued rewards (or even rational-number-valued rewards which is even more constrained). Presumably this was a decision that was made with hardly a second thought because real numbers are most familiar to people... but such "appeals to familiarity" are totally irrelevant for such an alien field as AGI :) The point of the paper is that a genuine AGI should be able to comprehend environments that involve rewards with a more sophisticated structure than can be accurately represented using real numbers.

visarga · on Oct 22, 2020

There are multi-reward agents and multi-task agents but all rewards get added into a final scalar value. And gradient based methods need to have this one scalar value to derive gradients from.

Since you mentioned higher dimensional representations for rewards, I want to remind that the sub-fields of Inverse RL and Model-based RL are concerned with reward representation and prediction by neural nets.

Also, it doesn't seem like a good idea to try to disprove an entire field with a purely theoretical (a-priori) argument. There should be at least some consideration given to the state of the art in the field.

xamuel · on Oct 23, 2020

I make no attempt to disprove an entire field, rather to question an implicit assumption, namely that real number rewards are flexible enough to capture all relevant environments an AGI should be able to comprehend. Quite the opposite, I indicate ways reinforcement learning could be modified to get past the roadblock I point out with real numbers.

ssivark · on Oct 21, 2020

IIUC, the claim is that the very idea of a (real valued) “objective function” to be “optimized” is broken?

xamuel · on Oct 21, 2020

Broken in the sense that it's not flexible enough to apply to all conceivable environments a genuine AGI could navigate, without misleading that AGI. But I should stress that real number objective functions are probably fine for many specific interesting environments, I'm not trying to say that real number objective functions are useless. Just that they aren't flexible enough to cover all environments :)

ssivark · on Oct 21, 2020

Fair enough. It would be interesting/instructive to construct (relatively simple) examples where we can see that they’re broken :-)

xamuel · on Oct 22, 2020

Easy to come up with examples using exotic money-related constructions. Suppose there's something called a "superdollar". If you have a superdollar, you can use it to create an arbitrary number of dollars for yourself, any time you want, which you can trade for goods and services. If you want, you can also trade the superdollar itself. Now picture an environment with two buttons, one of which always rewards you one dollar, and the other of which always rewards you one superdollar. Shoe-horning this environment into traditional RL, you'd have to assign the superdollar button some finite reward, say a million. But then you would mislead the traditional-RL-agent into thinking a million dollars was as good as one superdollar, which clearly is not true.

IshKebab · on Oct 21, 2020

But isn't the sophisticated structure what leads to the real number. If you change the rewards to something more complex surely you still have to pick between actions and at some point you'll have to evaluate which one is "better" and I can't see why you couldn't use real numbers to represent utility.

I mean, humans are general intelligences, and you can translate pretty much any human reward into money, which is a real number.

The paper is super long though. Maybe someone can give a TL;DR that makes sense.

xamuel · on Oct 22, 2020

Question: when you say "I can't see why you couldn't use real numbers to represent utility", does your reasoning for that have anything to do with Dedekind cuts, Cauchy sequences, or complete ordered fields? Because that's what the real numbers _are_. If your reasoning has nothing to do with these sort of things, then it can't possibly be sound because in order to argue that X has such-and-such property, you need to know what X actually _is_.

To repeat an example I posted for someone else: Suppose there's something called a "superdollar". If you have a superdollar, you can use it to create an arbitrary number of dollars for yourself, any time you want, which you can trade for goods and services. If you want, you can also trade the superdollar itself. Now picture an environment with two buttons, one of which always rewards you one dollar, and the other of which always rewards you one superdollar. Shoe-horning this environment into traditional RL, you'd have to assign the superdollar button some finite reward, say a million. But then you would mislead the traditional-RL-agent into thinking a million dollars was as good as one superdollar, which clearly is not true.

IshKebab · on Oct 22, 2020

Good example, although what if you just assigned it a reward of like 100 trillion dollars? It might not be exactly correct but then you're assuming that exactly correct rewards are required for AGI which seems like a pretty big assumption.

Actually I thought about this some more, and maybe money wasn't the best example, but I think there must be some internal measure of utility that humans use that can be represented by real numbers.

Imagine you are presented with an array of possible actions with associated (possibly estimated) rewards. You can only pick one. Maybe there are some doors but you can only open one - behind the first is $1m, behind the second is a superdollar, behind the third is a button that cures world hunger, behind the 4th is your loving family, whatever.

As a human I can pick one. No matter what the rewards are. Even if one reward is "you essentially become God". That means I can order them, and therefore that they can be represented by real numbers (plus infinity for the god option).

I don't see why the infinity would cause an issue: the "you can now do literally anything" reward is worth more than every other reward, but it's the only one. Also it doesn't actually exist so who cares?

Actually I guess it can exist in games, e.g. God mode in Quake. But that should have an infinite reward and agents should choose it over everything else so I can't see the problem really.

eli_gottlieb · on Oct 22, 2020

>I mean, humans are general intelligences, and you can translate pretty much any human reward into money, which is a real number.

A lot of people have written quite a lot of arguments that this is false.

IshKebab · on Oct 22, 2020

A lot of people have written a lot of arguments about everything. Has anyone actually demonstrated that it isn't true?

bra-ket · on Oct 21, 2020

do your claims apply in complex-valued RL context?

xamuel · on Oct 22, 2020

I'm surprised to see that that's actually a thing people write about. I'm afraid I can't answer your question right now, as I have no idea why anyone would do that (as opposed to, say, vector-valued rewards---i.e., why is the ability to complex-multiply two complex rewards relevant, as opposed to merely comparing separate components of rewards from R^2). I'll have to read up on the subject :)