By Thanx


2009-04-13 12:48:44 8 Comments

I want my Python function to split a sentence (input) and store each word in a list. My current code splits the sentence, but does not store the words as a list. How do I do that?

def split_line(text):

    # split the text
    words = text.split()

    # for each word in the line:
    for word in words:

        # print the word
        print(words)


@BlackBeard 2018-10-24 09:06:30

If you want all the characters of a word or sentence in a list, do this:

print(list("word"))
#  ['w', 'o', 'r', 'd']


print(list("some sentence"))
#  ['s', 'o', 'm', 'e', ' ', 's', 'e', 'n', 't', 'e', 'n', 'c', 'e']
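Note that list() keeps the space as its own element. If you only want the non-whitespace characters, a comprehension can filter them out. This is a small sketch, not part of the original answer:

```python
sentence = "some sentence"

# Keep every character except whitespace (spaces, tabs, newlines).
chars = [ch for ch in sentence if not ch.isspace()]
print(chars)
```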

@tgray 2009-04-13 14:24:18

Depending on what you plan to do with your sentence-as-a-list, you may want to look at the Natural Language Toolkit (NLTK). It deals heavily with text processing and evaluation. You can also use it to solve your problem:

import nltk
words = nltk.word_tokenize(raw_sentence)

This has the added benefit of splitting out punctuation.

Example:

>>> import nltk
>>> s = "The fox's foot grazed the sleeping dog, waking it."
>>> words = nltk.word_tokenize(s)
>>> words
['The', 'fox', "'s", 'foot', 'grazed', 'the', 'sleeping', 'dog', ',', 'waking', 'it', '.']

This allows you to filter out any punctuation you don't want and use only words.
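For example, here is a sketch that drops tokens made up entirely of punctuation. It works on the token list shown above, hard-coded so the snippet runs without NLTK installed; `is_punctuation` is a hypothetical helper name, not an NLTK function:

```python
import string

# Token list copied from the nltk.word_tokenize() output above.
tokens = ['The', 'fox', "'s", 'foot', 'grazed', 'the', 'sleeping', 'dog', ',',
          'waking', 'it', '.']


def is_punctuation(token):
    """Return True if every character of the token is ASCII punctuation."""
    return all(ch in string.punctuation for ch in token)


words_only = [t for t in tokens if not is_punctuation(t)]
print(words_only)
```

Because "'s" contains a letter, it is kept. string.punctuation only covers ASCII punctuation, so characters such as curly quotes would have to be added by hand.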

Note that the other solutions using str.split() are better if you don't plan on doing any complex manipulation of the sentence.


@hobs 2011-12-14 13:10:06

split() relies on whitespace as the separator, so it will not separate hyphenated words, and phrases joined by long dashes will not be split either. Any punctuation that isn't surrounded by spaces also stays attached to the neighboring word. For any real-world text parsing (like for this comment), your nltk suggestion is much better than split().

@Mark Amery 2016-01-25 17:52:04

Potentially useful, although I wouldn't characterise this as splitting into "words". By any plain English definition, ',' and "'s" are not words. Normally, if you wanted to split the sentence above into "words" in a punctuation-aware way, you'd want to strip out the comma and get "fox's" as a single word.

@AnneTheAgile 2016-09-20 20:57:07

Python 2.7+ as of April 2016.

@zalew 2009-04-13 12:50:21

Split the string in text on any run of consecutive whitespace:

words = text.split()      

Split the string in text on the delimiter ",":

words = text.split(",")   

The words variable will be a list and contain the words from text split on the delimiter.
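To illustrate the difference (an illustrative sketch with made-up sample text): with an explicit delimiter, consecutive delimiters produce empty strings and surrounding spaces are kept, so you may want to strip and filter the pieces:

```python
text = "red, green,,blue"

by_whitespace = text.split()   # only the single space splits the text
by_comma = text.split(",")     # empty string between the two commas

# Strip spaces around each piece and drop empty pieces.
cleaned = [part.strip() for part in text.split(",") if part.strip()]

print(by_whitespace)  # ['red,', 'green,,blue']
print(by_comma)       # ['red', ' green', '', 'blue']
print(cleaned)        # ['red', 'green', 'blue']
```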

@Tarwin 2013-11-28 16:33:44

The shlex module has a split() function. Unlike str.split(), it removes quotes and treats a quoted phrase as a single word:

>>> import shlex
>>> shlex.split("sudo echo 'foo && bar'")
['sudo', 'echo', 'foo && bar']
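A side-by-side comparison with double quotes (the command string is just an example):

```python
import shlex

command = 'grep "hello world" notes.txt'

plain = command.split()       # quotes stay attached to the pieces
shell_like = shlex.split(command)  # quoted phrase kept together, quotes removed

print(plain)       # ['grep', '"hello', 'world"', 'notes.txt']
print(shell_like)  # ['grep', 'hello world', 'notes.txt']
```

Because shlex follows shell-style quoting rules, an unbalanced quote such as `shlex.split('say "hi')` raises ValueError instead of returning a list.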

@nstehr 2009-04-13 12:54:16

text.split()

This should be enough to store each word in a list. words is already a list of the words from the sentence, so there is no need for the loop.

Also, it might be a typo, but your loop is a little mixed up. If you really did want to use append, it would be:

words.append(word)

not

word.append(words)

@gimel 2009-04-13 12:54:58

str.split()

Return a list of the words in the string, using sep as the delimiter ... If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.

>>> line="a sentence with a few words"
>>> line.split()
['a', 'sentence', 'with', 'a', 'few', 'words']
>>> 
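The documented difference between the default (sep=None) and an explicit separator shows up clearly on text with irregular spacing (sample string chosen for illustration):

```python
line = "  a  sentence\twith   gaps "

# sep=None (the default): runs of any whitespace count as one separator,
# and leading/trailing whitespace produces no empty strings.
default_split = line.split()

# sep=" ": every single space is a separator, so empty strings appear,
# and the tab is not treated as a separator at all.
space_split = line.split(" ")

print(default_split)
print(space_split)
```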

@gimel 2015-12-16 09:27:50

@warvariuc - should have linked to docs.python.org/2/library/stdtypes.html#str.split

@Colonel Panic 2013-07-30 15:32:43

How about this algorithm? Split text on whitespace, then trim punctuation. This carefully removes punctuation from the edge of words, without harming apostrophes inside words such as we're.

>>> text
"'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'"

>>> text.split()
["'Oh,", 'you', "can't", 'help', "that,'", 'said', 'the', 'Cat:', "'we're", 'all', 'mad', 'here.', "I'm", 'mad.', "You're", "mad.'"]

>>> import string
>>> [word.strip(string.punctuation) for word in text.split()]
['Oh', 'you', "can't", 'help', 'that', 'said', 'the', 'Cat', "we're", 'all', 'mad', 'here', "I'm", 'mad', "You're", 'mad']
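One edge case: a token made only of punctuation (such as "..." or a lone "-") becomes an empty string after strip(). A small sketch that drops those; the function name is hypothetical:

```python
import string


def words_without_edge_punctuation(text):
    """Split on whitespace, strip punctuation from each token's ends,
    and drop tokens that were nothing but punctuation."""
    stripped = (token.strip(string.punctuation) for token in text.split())
    return [word for word in stripped if word]


print(words_without_edge_punctuation("Wait ... what? It's - fine."))
```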

@Mark Amery 2016-01-29 00:02:08

Nice, but some English words truly contain trailing punctuation. For example, the trailing dots in e.g. and Mrs., and the trailing apostrophe in the possessive frogs' (as in frogs' legs) are part of the word, but will be stripped by this algorithm. Handling abbreviations correctly can be roughly achieved by detecting dot-separated initialisms plus using a dictionary of special cases (like Mr., Mrs.). Distinguishing possessive apostrophes from single quotes is dramatically harder, since it requires parsing the grammar of the sentence in which the word is contained.

@Colonel Panic 2016-09-30 08:57:34

@MarkAmery You're right. It's also since occurred to me that some punctuation marks—such as the em dash—can separate words without spaces.
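If the em dash is the only extra separator you care about, a regular expression can treat it like whitespace. This is a sketch under that assumption (it does not handle other dash characters), and the function name is hypothetical:

```python
import re


def split_on_space_or_em_dash(text):
    """Treat runs of whitespace and/or em dashes (U+2014) as separators."""
    parts = re.split(r"[\s\u2014]+", text)
    # re.split yields '' when the text starts or ends with a separator.
    return [part for part in parts if part]


print(split_on_space_or_em_dash("It was fast\u2014very fast."))
```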

@Aditya Mukherji 2009-04-13 13:17:13

I think you are confused because of a typo.

Replace print(words) with print(word) inside your loop to print each word on its own line.

@dbr 2009-04-13 13:46:06

I want my python function to split a sentence (input) and store each word in a list

The str.split() method does this: it takes a string and splits it into a list:

>>> the_string = "this is a sentence"
>>> words = the_string.split(" ")
>>> print(words)
['this', 'is', 'a', 'sentence']
>>> type(words)
<type 'list'> # or <class 'list'> in Python 3

The problem you're having is caused by a typo: you wrote print(words) instead of print(word).

If we rename the word variable to current_word, this is what you had:

def split_line(text):
    words = text.split()
    for current_word in words:
        print(words)

...when you should have written:

def split_line(text):
    words = text.split()
    for current_word in words:
        print(current_word)

If you want to construct the list manually in the for loop (for example, because you want to lower-case every word), use the list append() method:

my_list = [] # make empty list
for current_word in words:
    my_list.append(current_word.lower())

Or, a bit more neatly, using a list comprehension:

my_list = [current_word.lower() for current_word in words]
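Putting it together: since split() already returns a list, the question's function only needs to return that list. A minimal sketch that keeps the asker's function name:

```python
def split_line(text):
    """Return the words of text as a list (split on runs of whitespace)."""
    return text.split()


words = split_line("this is a sentence")
print(words)

# Printing each word on its own line, as the original loop intended.
for current_word in words:
    print(current_word)
```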
