By user9092346


2018-06-13 14:10:46

Note: the aim is to apply the working method to text in a pandas DataFrame.

Given that I have sentences like the following ones:

"He invited 2 people and pet 3 dogs."

"She invited 3 friends and pet 1 cat."

For each sentence I want to count in a variable how many humans are invited and how many pets are pet. This works easily via regex:

import re

sentence = 'He invited 2 people and pet 3 dogs.'

human = [r'(\d+) people', r'(\d+) friend']

number = None
for h in human:
    match = re.search(h, sentence, re.IGNORECASE)
    if match is not None:
        number = match.group(1)
        break  # stop at the first match so a later non-matching pattern cannot overwrite it

print('humans invited: ', number)

Now the sentences are in a pandas DataFrame in the column "sentence". The DataFrame also has a column called "humans" and one called "pets". I want to take each sentence, process it as shown above, write the result for humans into the column "humans", then do the same for pets and write it into the column "pets". However, I am not sure how to apply this to a pandas DataFrame row by row.

2 comments

@ALollz 2018-06-13 14:37:49

If there are only ever two numbers in each sentence and humans always come before pets, you can get both at once:

df[['humans', 'pets']] = df.sentence.str.extract(r'(\d+).*?(\d+)', expand=True)

df is now:

                                          sentence humans    pets
0              He invited 2 people and pet 3 dogs.      2       3
1             She invited 3 friends and pet 1 cat.      3       1
2        She invited 13 friends and pet 145 frogs.     13     145
3  She invited 11243 friends and pet 141415 frogs.  11243  141415
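
For reference, a minimal self-contained sketch of the same idea. The two-row DataFrame is rebuilt from the question's sentences, and the conversion to integers is an addition that is not part of the answer above; it assumes every sentence contains both numbers, since a row without a match would yield NaN and make `astype(int)` fail.

```python
import pandas as pd

# Sample data taken from the sentences in the question.
df = pd.DataFrame({
    "sentence": [
        "He invited 2 people and pet 3 dogs.",
        "She invited 3 friends and pet 1 cat.",
    ]
})

# Two unnamed capture groups produce a DataFrame with columns 0 and 1.
counts = df["sentence"].str.extract(r"(\d+).*?(\d+)", expand=True)

# str.extract returns strings; convert them to integers (assumes every row matched).
df["humans"] = counts[0].astype(int)
df["pets"] = counts[1].astype(int)

print(df)
```

Assigning the two columns one at a time is only a stylistic choice here; the single multi-column assignment shown in the answer does the same thing.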

@Ben.T 2018-06-13 14:22:01

With pandas, you can use str.extract, for example:

df['humans'] = df['sentence'].str.extract(r'(\d+) (?:people|friend)', flags=re.IGNORECASE, expand=False)

The same approach works for pets, using a pattern that lists the animal words instead.
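
A possible end-to-end sketch of this approach, covering both columns. The pet pattern (`dog|cat|frog`) is an assumption based only on the animals that appear in this thread, and the third sentence is a made-up row added to show what happens when nothing matches; adjust both to your data.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "sentence": [
        "He invited 2 people and pet 3 dogs.",
        "She invited 3 friends and pet 1 cat.",
        "They went home.",  # hypothetical row with no numbers
    ]
})

human_pat = r"(\d+) (?:people|friend)"
pet_pat = r"(\d+) (?:dog|cat|frog)"  # assumed list of animal words

# expand=False with a single capture group returns a Series of strings.
df["humans"] = df["sentence"].str.extract(human_pat, flags=re.IGNORECASE, expand=False)
df["pets"] = df["sentence"].str.extract(pet_pat, flags=re.IGNORECASE, expand=False)

# Optional: convert the extracted strings to numbers; unmatched rows stay missing.
df["humans"] = pd.to_numeric(df["humans"])
df["pets"] = pd.to_numeric(df["pets"])

print(df)
```

Because each column uses its own pattern, this variant does not depend on humans being mentioned before pets, and rows without a match come back as missing values rather than raising an error.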
