By user9092346

2018-06-13 14:10:46

Note: the aim is to apply the working method below to text in a pandas DataFrame.

Given that I have sentences like the following ones:

"He invited 2 people and pet 3 dogs."

"She invited 3 friends and pet 1 cat."

For each sentence, I want to store in a variable how many humans were invited and how many pets were petted. For a single string this is easy with a regex:

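import re

sentence = 'He invited 2 people and pet 3 dogs.'

human = [r'(\d+) people', r'(\d+) friend']

number = None
for h in human:
    match = re.search(h, sentence, re.IGNORECASE)
    if match is not None:
        # keep the captured digits and stop at the first pattern that matches
        number = match.group(1)
        break

print('humans invited: ', number)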

Now the sentences are in a pandas DataFrame, in the column "sentence". The DataFrame also has a column called "humans" and one called "pets". For each row, I want to process the sentence as shown above, write the result for humans into the column "humans", then do the same for pets and write that result into the column "pets". However, I am not sure how to apply this to a pandas DataFrame row by row.


@ALollz 2018-06-13 14:37:49

If there are only ever two numbers in each sentence, and humans always come before pets, you can extract both at once:

df[['humans', 'pets']] = df.sentence.str.extract(r'(\d+).*?(\d+)', expand=True)

df is now:

                                          sentence humans    pets
0              He invited 2 people and pet 3 dogs.      2       3
1             She invited 3 friends and pet 1 cat.      3       1
2        She invited 13 friends and pet 145 frogs.     13     145
3  She invited 11243 friends and pet 141415 frogs.  11243  141415
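
For anyone who wants to try this end to end, here is a minimal sketch under the same assumption (exactly two numbers per sentence, humans first). The sample frame and the final conversion to integers are additions for illustration; str.extract itself returns the captured digits as strings.

```python
import pandas as pd

# Sample data, assumed for illustration
df = pd.DataFrame({'sentence': [
    'He invited 2 people and pet 3 dogs.',
    'She invited 3 friends and pet 1 cat.',
]})

# First captured number goes to humans, second to pets
df[['humans', 'pets']] = df['sentence'].str.extract(r'(\d+).*?(\d+)', expand=True)

# The captured values are strings; convert them if you need to do arithmetic
df[['humans', 'pets']] = df[['humans', 'pets']].astype(int)

print(df)
```

Note that astype(int) would fail on rows where the pattern does not match, since those rows come back as NaN.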

@Ben.T 2018-06-13 14:22:01

With pandas, you can use str.extract, for example:

df['humans'] = df['sentence'].str.extract(r'(\d+) (?:people|friend)', flags=re.IGNORECASE, expand=False)

and the same for the pets column, with the pet nouns in the alternation instead.
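
A minimal sketch of how both columns could be filled this way. The pet nouns (dog, cat), the extra non-matching sentence, and the choice to treat "no match" as 0 are all assumptions for illustration; adjust them to your data.

```python
import re
import pandas as pd

# Sample data, assumed for illustration; the last row matches neither pattern
df = pd.DataFrame({'sentence': [
    'He invited 2 people and pet 3 dogs.',
    'She invited 3 friends and pet 1 cat.',
    'Nobody came.',
]})

# One pattern per target column; the noun lists are assumptions
patterns = {
    'humans': r'(\d+) (?:people|friend)',
    'pets': r'(\d+) (?:dog|cat)',
}

for column, pattern in patterns.items():
    extracted = df['sentence'].str.extract(pattern, flags=re.IGNORECASE, expand=False)
    # Rows without a match come back as NaN; here they are counted as 0
    df[column] = pd.to_numeric(extracted).fillna(0).astype(int)

print(df)
```

Keeping the patterns in a dict means adding another category is one more entry rather than another copy of the extract line.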

