Saturday, December 22, 2007

Python 3000 - how mutable is immutable?

I just downloaded Py3k alpha 2 and decided to play with it a bit. Well, besides print becoming a function, things seemed to be more or less the same as in 2.5. Then I decided to play with the more esoteric thingies in Python.
It has long been known that Python immutables aren't really immutable. For example, if you enter the following code in 2.5:


a = ([1], 2)
a[0] += [2]


you will get an error. However, printing a reveals that it has indeed been modified (a is now ([1, 2], 2)). Ok, so what happened to this little bug/hack/feature/gotcha/whatever-you-want-to-call-it in Py3k? Well, entering the code above raises the same error, and again it actually modifies a. Now, I don't know if this "feature" is documented anywhere, but I really don't see any point in it being a part of the language (if someone can give me an example where it would be useful to have this kind of behavior, let me know). So, why is it still present in Py3k, when it's been known for a while now? I guess someone either thinks this is a good thing or they just forgot about it. Either way, guys, please change this, it's annoying me :) (and what better motivation to change the language than a rant by some guy on the Internet that you never met or heard of?)
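To see both effects at once, here is a small sketch in Py3k syntax that catches the exception and then looks at the tuple. The names inner and error are mine, added only for illustration:

```python
a = ([1], 2)
inner = a[0]      # keep a second reference to the list inside the tuple
error = None

try:
    a[0] += [2]   # the list is extended first, then the tuple assignment fails
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

print(a)              # the list inside the tuple has grown anyway
print(a[0] is inner)  # the tuple still holds the very same list object
```

The tuple never ends up pointing at a different object; only the list it points at has changed.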
But, the fun doesn't stop here kids! Try the following:


a = ({1:2}, 2)
a[0][2] = 3


This little piece of code gets executed without any problems, both in 2.5 and in Py3k. I wonder what the rationale is for raising an error when we try to change (in place) a list that is part of a tuple, while changing a dict happens silently. Ok, I do admit that doing


a = ([1], 2)
a[0].append(2)


also doesn't throw an error, but this only confuses me more. So, I'd like to know---are tuples mutable or immutable? Or better yet, how mutable are immutables?
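One way to poke at the question is to check what the tuple actually stores. The sketch below (Py3k syntax; ids_before and ids_after are names I made up for the illustration) records the identity of each element, mutates the list and the dict through their own methods, and compares the identities afterwards:

```python
a = ([1], {1: 2}, 2)
ids_before = [id(item) for item in a]

a[0].append(2)   # mutates the list, never assigns into the tuple
a[1][2] = 3      # mutates the dict, never assigns into the tuple

ids_after = [id(item) for item in a]

print(a)
print(ids_before == ids_after)  # the tuple holds the same objects as before
```

So the tuple's slots never change; what changes is the state of the objects those slots refer to.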
As a final thought, consider the following code:


a = ({1:2}, 2)
a[0][2] = a
print a


In 2.5, this code will execute with no problems, and printing a will yield
({1: 2, 2: ({...}, 2)}, 2)
thereby showing us that we have an infinite dictionary on our hands (or that's what I think these three dots stand for). However, doing the same thing in Py3k (i.e., running the above code and trying to print a---remember that in Py3k print is a function), we get

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __repr__ returned non-string (type bytes)

Whoa! WTF?? What does this mean? Whatever it means, it tells me nothing about what is actually happening here. As far as I can see, there is something wrong with the __repr__ function: it isn't cut out to handle this kind of object. So, the first thing that came to my mind was "Hey, cool, they won't print infinite dictionaries any more. They'll allow their construction, but when I try to print them, I'll get an error (a weird one, to say the least, but an error still). This kinda suxx, but ok, I guess I'll have to live with it." No. I was wrong. Try

print(a[0])

The infinite dict in the first position of the tuple prints with no problems. So, why is it a problem to print the tuple (which basically has one number, two parentheses, and one comma extra)? No idea. Of course, trying to print a[0][2] (which is the tuple again) yields an error, printing a[0][2][0] goes without a problem, and so on. I guess that's why they call it an alpha version :) I sincerely hope that this won't be a part of the language when a stable release of Py3k is out.
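For anyone who wants to repeat the experiment on a later interpreter, here is a sketch of the same test written so that it can be checked rather than eyeballed. It assumes a stable Python 3 release, where I would expect repr() to cope with the cycle by printing a "..." placeholder instead of raising; that is an assumption about later versions, not something the alpha shows. The names tuple_text and dict_text are mine:

```python
a = ({1: 2}, 2)
a[0][2] = a               # the dict now refers back to the tuple that holds it

tuple_text = repr(a)      # what print(a) would show
dict_text = repr(a[0])    # what print(a[0]) would show

print(tuple_text)
print(dict_text)
```

If either repr() call raises TypeError, the interpreter still has the problem described above.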






4 comments:

Leo Soto said...

I can see where the confusion comes from: tuples are not mutable, but their contents may be.

So, the tuple ([1], [2]) is immutable, but the first element ([1]) is a list object, so it can be changed using its methods (as you show later using the append method).

Now, the weird cases you point out are caused by "magic methods". One is __iadd__, which is called if the left-hand side of the += operator implements it. As list implements __iadd__, that explains why the tuple contents are modified. The same happens with the dict case: x[i] = y is equivalent to x.__setitem__(i, y).
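A minimal sketch of those magic methods called by hand, in Python 3 syntax. The variable names items, same, fresh and d are illustrative only:

```python
items = [1]

same = items.__iadd__([2])   # what += uses on a list: extends in place
fresh = items.__add__([3])   # what + uses on a list: builds a new list

print(same is items)    # True: __iadd__ returned the list it just modified
print(fresh is items)   # False: __add__ left items alone
print(items, fresh)

d = {1: 2}
d.__setitem__(2, 3)     # the call behind d[2] = 3
print(d)
```

Since __iadd__ has already changed the list by the time the tuple refuses the assignment, the error arrives too late to undo anything.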

Lie said...

As Leo Soto M. has said, immutable objects like tuples may contain mutable objects like lists. The mutable object itself may still change, and this does not make the containing tuple any less immutable.

The code you've given here:
a = ([1], 2)
a[0] += [2]

would raise an error because, as the doc says: "[__ixxx__] should *attempt* to do the operation in-place (modifying self) and return the result (which could be, but *does not have to be*, self). If ... not defined, ... falls back to the normal methods." (emphasis and ellipses mine)

In short, doing:
a = ([1], 2)
a[0] += [2]

is equivalent to:
a = ([1], 2)
b = a[0].__iadd__([2]) # extends the list in place and returns it
a[0] = b # raises TypeError


But why doesn't this raise an error too?
a = ({1:2}, 2)
a[0][2] = 3

It's because a[0][2] = 3 directly mutates the dictionary, while leaving a untouched.

You've got to understand Python's object model to fully understand why these things happen. In short, it is because Python is pass-by-object.

Filox said...

Well, Lie, I see now that I didn't explain very well what I meant by this post. I am very well aware of WHY these things happen (all the calls to underlying functions etc), what I'm unhappy about is the fact that they exist in the language. One of the things I love about Python is the fact that it's very clean, and stuff happens as you expect it would. Now, imagine explaining to someone new to Python that he should never use these idioms. Doesn't look very clean any more, does it? This kind of stuff can be fixed, what I don't understand is why it's not. Will we really have to include special examples like this in every Python book so novices don't get bitten?

Tim Parkin said...

Isn't it just simply the fact that Python uses references? If you don't change the reference value then it's not changed... Perhaps the repr of a tuple should report the reference, not the contents of the reference.