By Bene


2017-07-12 12:40:59 8 Comments

I am using Python and several libraries such as pandas and SciPy to prepare data for deeper analysis. As part of the preparation I create new columns, for instance one holding the difference between two dates.
My code produces the expected results but is very slow, so I cannot use it on a table with about 80K rows. This simple operation alone would take roughly 80 minutes on that table.

The problem is definitely related to my write operation:

tableContent[6]['p_test_Duration'].iloc[x] = difference


Moreover, Python raises a warning:

[Image: screenshot of the warning]

Complete code example for the date difference:

import time
from datetime import date, datetime

tableContent[6]['p_test_Duration'] = 0

#for x in range (0,len(tableContent[6]['p_test_Duration'])):
for x in range (0,1000):
    p_test_ZEIT_ANFANG = datetime.strptime(tableContent[6]['p_test_ZEIT_ANFANG'].iloc[x], '%Y-%m-%d %H:%M:%S')
    p_test_ZEIT_ENDE = datetime.strptime(tableContent[6]['p_test_ZEIT_ENDE'].iloc[x], '%Y-%m-%d %H:%M:%S')
    difference = p_test_ZEIT_ENDE - p_test_ZEIT_ANFANG

    tableContent[6]['p_test_Duration'].iloc[x] = difference

the correct result table:

---

3 comments

@user7330431 2017-07-12 14:13:10

The other answers are fine, but I would recommend that you avoid chained indexing in general. The pandas docs explicitly discourage chained indexing as it either produces unreliable results or is slow (due to multiple calls to __getitem__). Assuming your data frame is multi-indexed, you might replace:

tableContent[6]['p_test_Duration'].iloc[x] = difference

with:

tableContent.loc[x, (6, 'p_test_Duration')] = difference

You can sometimes get around this issue, but why not learn the method least likely to cause problems in the future?
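As a minimal sketch of the single-step assignment described above: the small frame below is a hypothetical stand-in for the asker's data and assumes two-level column labels, as this answer does. The name `table` and the sample timestamps are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with two-level column labels, mimicking the
# tableContent[6]['...'] access pattern from the question.
columns = pd.MultiIndex.from_tuples([
    (6, "p_test_ZEIT_ANFANG"),
    (6, "p_test_ZEIT_ENDE"),
    (6, "p_test_Duration"),
])
table = pd.DataFrame(
    [
        ["2017-07-12 12:00:00", "2017-07-12 12:30:00", None],
        ["2017-07-12 13:00:00", "2017-07-12 14:15:00", None],
    ],
    columns=columns,
)

# Compute the difference for row 0 only.
start = pd.Timestamp(table.loc[0, (6, "p_test_ZEIT_ANFANG")])
end = pd.Timestamp(table.loc[0, (6, "p_test_ZEIT_ENDE")])
difference = end - start

# One .loc call selects row and column together, so pandas writes
# directly into `table` instead of into a possible temporary copy.
table.loc[0, (6, "p_test_Duration")] = difference

print(table.loc[0, (6, "p_test_Duration")])
```

If `tableContent[6]` is instead an ordinary DataFrame stored in a list or dict, the same idea would apply to that frame directly, with a plain column label in place of the tuple.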

@piRSquared 2017-07-12 13:46:02

You can vectorize the conversion of dates by using pd.to_datetime and avoid using apply unnecessarily.

tableContent[6]['p_test_Duration'] = (
    pd.to_datetime(tableContent[6]['p_test_ZEIT_ENDE']) -
    pd.to_datetime(tableContent[6]['p_test_ZEIT_ANFANG'])
)

Also, you were getting the SettingWithCopyWarning because of the chained-indexing assignment:

tableContent[6]['p_test_Duration'].iloc[x] = difference

You don't have to worry about that if you do it the way I suggested.
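A self-contained sketch of the vectorized approach, for anyone who wants to try it without the original data. The frame `df` and its two sample rows are invented; the column names and the timestamp format are taken from the question. Passing `format=` is optional and the extra seconds column is an addition not present in the answer.

```python
import pandas as pd

# Invented sample data using the column names from the question.
df = pd.DataFrame({
    "p_test_ZEIT_ANFANG": ["2017-07-12 12:00:00", "2017-07-12 13:00:00"],
    "p_test_ZEIT_ENDE": ["2017-07-12 12:30:00", "2017-07-12 14:15:00"],
})

fmt = "%Y-%m-%d %H:%M:%S"  # same format string as in the question

# Parse each column once, as a whole, instead of row by row.
start = pd.to_datetime(df["p_test_ZEIT_ANFANG"], format=fmt)
end = pd.to_datetime(df["p_test_ZEIT_ENDE"], format=fmt)

# Subtracting two datetime Series yields a timedelta Series.
df["p_test_Duration"] = end - start

# Optional: the duration as a plain number of seconds.
df["p_test_Duration_s"] = df["p_test_Duration"].dt.total_seconds()

print(df[["p_test_Duration", "p_test_Duration_s"]])
```

Because the new column is assigned to the frame in one step, there is no per-row write and no chained indexing involved.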

@WeNYoBen 2017-07-12 13:50:31

This answer is better. For efficiency, apply should be avoided as well.

@Meitham 2017-07-12 12:48:42

Take away the loop, and apply the functions to the whole series.

ZEIT_ANFANG = tableContent[6]['p_test_ZEIT_ANFANG'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
ZEIT_ENDE = tableContent[6]['p_test_ZEIT_ENDE'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
tableContent[6]['p_test_Duration'] = ZEIT_ENDE - ZEIT_ANFANG
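To get a feeling for how this compares with pd.to_datetime, a rough timing sketch on synthetic data can help. The 80,000 rows match the table size mentioned in the question, but the data itself is made up, and the measured times will depend on the machine and pandas version, so no numbers are claimed here.

```python
import time
from datetime import datetime

import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S"

# Synthetic input: 80,000 timestamp strings (row count assumed from the question).
n = 80_000
strings = pd.Series(["2017-07-12 12:40:59"] * n)

t0 = time.perf_counter()
# Element-wise parsing, as in this answer.
via_apply = strings.apply(lambda s: datetime.strptime(s, FMT))
t1 = time.perf_counter()
# Whole-column parsing with pandas.
via_to_datetime = pd.to_datetime(strings, format=FMT)
t2 = time.perf_counter()

print(f"apply + strptime: {t1 - t0:.3f} s")
print(f"pd.to_datetime:   {t2 - t1:.3f} s")

# Both approaches should parse to the same timestamps.
same = bool((via_apply == via_to_datetime).all())
print("identical results:", same)
```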

@Bene 2017-07-12 13:20:17

Thank you so much. It works perfectly! Just one comment on your code: there are two brackets missing to close the .apply() method.

@Meitham 2017-07-12 13:29:40

Glad it worked, added the missing brackets, I don't know how people manage with lisp brackets ;-)
