I am familiar with the concept of "vectorization", and how pandas employs vectorized techniques to speed up computation. Vectorized functions broadcast operations over the entire Series or DataFrame, which makes them much faster than conventionally iterating over the data.
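For concreteness, here is a minimal sketch of the distinction I mean. The DataFrame and its column names (price, qty) are made up purely for illustration; the point is only that both styles compute the same result.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Vectorized: a single expression operates on whole columns at once.
total_vectorized = df["price"] * df["qty"]

# Loop-based: the same arithmetic, done one row at a time.
total_loop = pd.Series(
    [p * q for p, q in zip(df["price"], df["qty"])],
    index=df.index,
)

# Both approaches produce the same values.
assert total_vectorized.equals(total_loop)
```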
However, I am quite surprised to see a lot of code (including answers on Stack Overflow) offering solutions to problems that involve looping through data using for loops and list comprehensions. Having read the documentation, and with a decent understanding of the API, I am given to believe that loops are "bad", and that one should "never" iterate over arrays, Series, or DataFrames. So, how come I see users suggesting loopy solutions every now and then?
So, to summarise... my question is:
Are for loops really "bad"? If not, in what situation(s) would they be better than using a more conventional "vectorized" approach? [1]
[1] While it is true that the question sounds somewhat broad, there are very specific situations in which for loops are usually better than a conventional vectorized approach. This post aims to capture them for posterity.