Modifying a list while looping through it in Python

Here is an interesting thing that I found. x is a SymPy Symbol:
Code block 1

I would have expected to get a = [], but it only removes the first item. And yes, x + 1 passes the condition:

>>> (x + 1).has(x)

Clearly, it is a bad idea to modify a list while I am looping through it. I should instead be doing something like this:
Code block 2
But I am intrigued as to why exactly this fails. If any of the readers of the blog thinks that he know, please post in the comments. Or, if I figure it out, I will post an update. Also, here is a similar example, with a strange result:
Code block 3

(Sorry for the images by the way. WordPress’s so called “code” blocks are impervious to indentation.)

UPDATE (a few minutes later):
Well, I figured it would come to me as to why this was happening, and it didn’t take long. While I haven’t read the actual Python Language Reference, this is what I am assuming is happening. This is all just my guessing on how Python is implemented. Please correct me if I am wrong.

So, obviously, a Python list is just a C array. It is probably an array of pointers, which is the only way I can see that would let it be mutable with different objects (this is how compiled languages with dynamic typing like Objective-C more or less pull it off). Now C does not have the for i in list syntax that Python has (nor does any other language that I know of. That is one of the reasons that Python is so awesome!), so if you want to recurse a list (C array), you have to do the usual for (i=0; i<=len(list); i++) { from C (or it probably uses a while loop, which would allow for things like iterators, but a for loop in C is literally just a wrapper around a while loop anyway). Then of course, inside of the loop, you just have list[i] blocks. So when I was going through my list, for example, the list of numbers in the last example, it would hit item 0 (the first item), remove it, which would amount to rebuilding the list as [2, 3, 4, 5], then it would hit item 1, which is now 3, remove it, rebuilding the list, and so on. So the even numbered elements remain because it skips every element after one that it removes. CPython must have good error handling, because eventually this would cause the indices to go beyond the length of the list. It seems to me that this behavior is not very well defined. Personally, I think that whatever you are looping through in a for loop should become immutable within the loop block. I checked Python 3.1, and the behavior is exactly the same.

Based on this, .remove() rebuilds the list each time. I would have thought it would just set the value in the array to Null, but I guess that would make it more difficult to test equality with an equivalent list that doesn’t have Null values. It is good to know that .remove() does that, because it means that can be an expensive operation.


22 Responses to Modifying a list while looping through it in Python

  1. Fabian says:

    has looks in args, and x.args is [] (the same with any Atom like Integer, Symbol, pi, etc), that’s why x.has(x) is False.

  2. Fabian says:

    oops, I was wrong. x.has(x) returns True …

  3. Fabian says:

    Another solution would be to make a copy of the list, i.e.:

    for i in a[:]:

    And now you can safely remove, add, elements in a

  4. Andy Terrel says:

    You should use filter for this.

    a = filter(lambda i: i.has(x), a)

    And Java has the “for i in obj” as long as the obj implements the Iterable interface. Lots of languages have this, even Bash.

    • Aaron Meurer says:

      Thanks for the filter tip. I also didn’t know that list[:] created a copy. And, I guess I was wrong about the for i in list syntax. It looks like several languages support it ( Just not C, which other than Python, is the import one for me because that is what my programs for classes have to be in.

      Of course, none of this answers my question, which is why exactly does it do this? Is my theory right or is it some other thing?

      • Andy Terrel says:

        More or less why doesn’t matter because it will change in any implementation. You can postulate about CPython but JPython and IronPython are in languages with the for each syntax.

        Essentially the iteration though the sequence is not defined if the sequence is manipulated. This is true in most languages.

        Also a python list is not just a c array. Python lists support resizing, insertion, and deletion; c arrays do not.

      • Andy Terrel says:

        It looks like you are basically correct about the array inside the list object being blown away and rebuilt. See Objects/listobject.c::listremove

    • Ronan Lamy says:

      The recommended way (I think) of doing this is to avoid filter and use a list comprehension instead:

      a = [i for i in a if not i.has(x)]

      • Aaron Meurer says:

        I agree. I changed it to that. A list comprehension is much clearer and it avoids the use of the easily obfuscatory lambda keyword.

        By the way, it needs to be a = [i for i in a if i.has(x)].

  5. Aaron Meurer says:

    I downloaded the source and had a look. It looks like a list is not an “array” per se, but a struct that contains a “Vector of pointers to list elements”, which are realloced when the list size changes (with some optimizations, but that is the basic idea).

    I couldn’t find the relevant code on for loops, but I think it checks the size of a each time. For example, “>>> a = [1]; for i in a: a.append(i)” hangs forever, and when you Control-C, a is a huge list of 1’s.

    By the way, I get the same thing in jython (WordPress will probably kill this. It is the same as the last example above):
    Jython 2.5.0 (Release_2_5_0:6476, Jun 16 2009, 13:33:26)
    [Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_13
    Type “help”, “copyright”, “credits” or “license” for more information.
    >>> a = [1, 2, 3, 4, 5]
    >>> for i in a:
    … a.remove(i)

    >>> a
    [2, 4]

  6. Sebastian Haase says:

    Apparently it is more than just “good style” to not modify a list while iterating over it.
    But for your particular use case I always go backwards (!) and use the index (see below).
    That just solves the problem because the Remove() (or del a[i]) “invalidates” only indices AFTER the one removed – which have already been dealt with if you go backwards. So like this:
    for i in range(len(a)-1,-1,-1):
    if …. a[i]….:
    del a[i]

    • Nobody in particular says:

      Ouch, this is wrong on two counts:

      One, you still modify the list while iterating. It might work currently, but it may blow up in the future. Nasal Demons aren’t any fun.

      Two, range(len(a)-1,-1,-1) creates a list. Use xrange, it only creates a generator. This is both faster and uses less memory if a is large. This is especially important on implementations that don’t return integers to the pool.

  7. Jeremy says:

    The “for…in” syntax is known as the “iterator” pattern, and just FYI, lots of languages support this construct – including JavaScript, Java, Ruby, Visual Basic, C# – and those that do not can still accomplish the same thing it’s just manual, such as C and C++.

    As far as your loop “bug” goes, this is exactly why most languages raise an exception when modifying a list while iterating over it. I had no idea what Python would do in that situation, I personally would prefer an exception.

    It’s an example of an imperative programming bug. You are modifying the list that you are iterating over, so the result depends on implementation of the language.

    Nice post! It surprised me that Python even allows this.

    • Veky says:

      There is no way Python would even _know_ list has been modified in the widest sense (though it would know if len has changed, which it does exploit if you iterate through a set for example). You can refer to objects in a list from different places.

  8. […] Python Behavior (can someone please explain to me what is going on here?) Every once in a while, seemingly really simple Python code does something completely unexpected for me. Look at the […]

  9. […] Modifying a list while looping through it in Python July 200913 comments 4 […]

  10. […] for another one of my WTF Python blog posts. Yesterday, I randomly typed this in a Python session (it was late […]

  11. […] of this blog know that I sometimes like to write about some strange, unexpected, and unusual things in Python that I stumble across. This post is another one of […]

  12. i have just had the same issue

    the issue as i saw it is I was deleting an element of the list shortening it, then the for loop “goes out of range” because your using i or n or whatever in the for loop to target some part of the array (i’ll call it that if i want).. i had quite a few frankenstein solutions but to be honest I think the logical one is a try: and except:

    it just doesn’t make sense to build any crazy workarounds (i am already doing this on a nested list, so i could have an extra variable to be like a delete after loop tickbox) but yeah exceptions are fine, if you’re really OCD you program that particular exception in and you can then still be on the lookout for actual exceptions you don’t know about (i am not this ocd)

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: