By harijay


2009-12-12 18:19:03 8 Comments

I was wondering what the simplest way is to convert a string representation of a list, like the following, into a list:

x = u'[ "A","B","C" , " D"]'

This should work even if the user puts spaces between the commas or inside the quotes. I need to handle that as well, producing:

x = ["A", "B", "C", "D"] 

in Python.

I know I can strip spaces with strip(), split the string with split(), and check for non-alphabetic characters, but the code was getting very kludgy. Is there a quick function that I'm not aware of?


@kinzleb 2019-05-01 03:54:25

Inspired by some of the answers above that work with base Python packages, I compared the performance of a few (using Python 3.7.3):

Method 1: ast

import ast
list(map(str.strip, ast.literal_eval(u'[ "A","B","C" , " D"]')))
# ['A', 'B', 'C', 'D']

import timeit
timeit.timeit(stmt="list(map(str.strip, ast.literal_eval(u'[ \"A\",\"B\",\"C\" , \" D\"]')))", setup='import ast', number=100000)
# 1.292875313000195

Method 2: json

import json
list(map(str.strip, json.loads(u'[ "A","B","C" , " D"]')))
# ['A', 'B', 'C', 'D']

import timeit
timeit.timeit(stmt="list(map(str.strip, json.loads(u'[ \"A\",\"B\",\"C\" , \" D\"]')))", setup='import json', number=100000)
# 0.27833264000014424

Method 3: no import

list(map(str.strip, u'[ "A","B","C" , " D"]'.strip('][').replace('"', '').split(',')))
# ['A', 'B', 'C', 'D']

import timeit
timeit.timeit(stmt="list(map(str.strip, u'[ \"A\",\"B\",\"C\" , \" D\"]'.strip('][').replace('\"', '').split(',')))", number=100000)
# 0.12935059100027502

I was disappointed to see that what I considered the least readable method was the fastest. There are tradeoffs to consider: for the type of workloads I use Python for, I usually value readability over a slightly faster option, but as usual it depends.
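Speed is not the only tradeoff. As a quick sanity check, here is a minimal sketch (the helper names `via_ast`, `via_json` and `via_split` are made up for illustration) showing that the three methods agree on the sample string, while the no-import version splits on every comma and so diverges once a value contains one. The `tricky` input is a hypothetical example.

```python
import ast
import json

def via_ast(s):
    return list(map(str.strip, ast.literal_eval(s)))

def via_json(s):
    return list(map(str.strip, json.loads(s)))

def via_split(s):
    # Drops brackets and all double quotes, then splits on every comma.
    return list(map(str.strip, s.strip('][').replace('"', '').split(',')))

sample = u'[ "A","B","C" , " D"]'
print(via_ast(sample), via_json(sample), via_split(sample))

tricky = '["x, y", "z"]'  # hypothetical value containing a comma
print(via_ast(tricky))    # ['x, y', 'z']
print(via_split(tricky))  # ['x', 'y', 'z']
```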

@JCMontalbano 2019-01-08 23:24:24

You can save yourself the .strip() call by slicing off the first and last characters of the string representation of the list (see the third line below):

>>> mylist=[1,2,3,4,5,'baloney','alfalfa']
>>> strlist=str(mylist)
>>> mylistfromstring=(strlist[1:-1].split(', '))
>>> mylistfromstring[3]
'4'
>>> for entry in mylistfromstring:
...     print(entry)
...     type(entry)
... 
1
<class 'str'>
2
<class 'str'>
3
<class 'str'>
4
<class 'str'>
5
<class 'str'>
'baloney'
<class 'str'>
'alfalfa'
<class 'str'>
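The entries produced this way still carry the quotes that repr() puts around strings (`'baloney'`). A minimal sketch that removes them as well, assuming no element itself contains `', '` or quote characters (`from_str_repr` is a hypothetical helper name):

```python
def from_str_repr(text):
    # Slice off the brackets, split on the ", " separator that str(list)
    # produces, then strip the quote characters repr() adds around strings.
    return [entry.strip("'\"") for entry in text[1:-1].split(', ')]

mylist = [1, 2, 3, 4, 5, 'baloney', 'alfalfa']
print(from_str_repr(str(mylist)))
# ['1', '2', '3', '4', '5', 'baloney', 'alfalfa']
```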

@Coding bat 2018-11-15 18:32:20

Let's assume your string is t_vector = "[34, 54, 52, 23]" and you want to convert it into a list. You can use the following two steps:

ls = t_vector.strip('][')
t_vector = ls.split(' ')

t_vector contains the list.

@ruohola 2019-04-11 11:20:45

This will not work: it gives t_vector = ['34,', '54,', '52,', '23']. You need to use split(',') as in my answer.
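A minimal sketch of the corrected version, assuming the string holds only integers (`parse_int_list` is a made-up name). Splitting on the comma and letting `int()` absorb the leftover spaces also turns the items into numbers:

```python
def parse_int_list(text):
    # strip('[]') removes the brackets; int() ignores the spaces around each item
    return [int(item) for item in text.strip('[]').split(',')]

t_vector = parse_int_list("[34, 54, 52, 23]")
print(t_vector)  # [34, 54, 52, 23]
```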

@ruohola 2018-08-28 13:02:10

Without importing anything:

x = u'[ "A","B","C" , " D"]'

ls = x.strip('][').split(',')
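On the question's input this split alone leaves the quotes and surrounding spaces in each item. A minimal sketch of the extra clean-up, still without imports and assuming the values themselves contain no commas or double quotes:

```python
x = u'[ "A","B","C" , " D"]'
ls = x.strip('][').split(',')
print(ls)  # [' "A"', '"B"', '"C" ', ' " D"']

# Strip outer whitespace, then the quotes, then the whitespace that was inside them.
cleaned = [item.strip().strip('"').strip() for item in ls]
print(cleaned)  # ['A', 'B', 'C', 'D']
```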

@Hassan Kamal 2018-10-03 18:12:01

Cautionary note: this gives wrong results if any of the strings inside the list contains a comma.

@passs 2018-08-06 10:54:36

So, following all the answers, I decided to time the most common methods:

from time import time
import ast
import json
import re


my_str = str(list(range(19)))
print(my_str)

reps = 100000

start = time()
for _ in range(reps):
    re.findall(r"\w+", my_str)
print("Regex method:\t", (time() - start) / reps)

start = time()
for _ in range(reps):
    json.loads(my_str)
print("json method:\t", (time() - start) / reps)

start = time()
for _ in range(reps):
    ast.literal_eval(my_str)
print("ast method:\t\t", (time() - start) / reps)

start = time()
for _ in range(reps):
    # note: this iterates over the individual characters of my_str,
    # so it strips single characters rather than parsing the list
    [n.strip() for n in my_str]
print("strip method:\t", (time() - start) / reps)



    regex method:    6.391477584838867e-07
    json method:     2.535374164581299e-06
    ast method:      2.4425282478332518e-05
    strip method:    4.983267784118653e-06

So in the end regex wins!
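The methods timed above do not return the same thing, which is worth checking before picking the fastest one. A minimal sketch (same test string; the `NUMBER` constant is an arbitrary choice) that compares the outputs and redoes the timing with `timeit`, which runs the loop itself and uses a high-resolution clock:

```python
import ast
import json
import re
import timeit

my_str = str(list(range(19)))

# The regex yields digit strings, while json and ast give back ints.
print(re.findall(r"\w+", my_str)[:3])  # ['0', '1', '2']
print(json.loads(my_str)[:3])          # [0, 1, 2]
print(ast.literal_eval(my_str)[:3])    # [0, 1, 2]

NUMBER = 1_000
for label, stmt in [
    ("regex", r"re.findall(r'\w+', my_str)"),
    ("json", "json.loads(my_str)"),
    ("ast", "ast.literal_eval(my_str)"),
]:
    seconds = timeit.timeit(stmt, globals=globals(), number=NUMBER)
    print(f"{label}:\t{seconds / NUMBER:.2e} s per call")
```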

@Jordy Van Landeghem 2018-06-01 09:32:00

I would like to provide a more intuitive pattern-based solution using regex. The function below takes as input a stringified list containing arbitrary strings.

Stepwise explanation: remove all whitespace, brackets and value separators (provided they are not part of the values you want to extract; otherwise make the regex more complex). Then split the cleaned string on single or double quotes and keep the non-empty values (or the odd-indexed values, whichever you prefer).

import re

def parse_strlist(sl):
    # drop brackets, commas and all whitespace
    clean = re.sub(r"[\[\],\s]", "", sl)
    # split on either quote character; the values sit between the quotes
    splitted = re.split(r"['\"]", clean)
    values_only = [s for s in splitted if s != '']
    return values_only

Test sample (raw input text): ['21',"foo" '6', '0', " A"]
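A self-contained usage sketch of the same steps on the test sample. Note that because every whitespace character is removed first, spaces inside a value disappear too (`"oh no"` becomes `ohno`):

```python
import re

def parse_strlist(sl):
    # drop brackets, commas and all whitespace, then split on either quote
    clean = re.sub(r"[\[\],\s]", "", sl)
    return [s for s in re.split(r"['\"]", clean) if s != '']

sample = '''['21',"foo" '6', '0', " A"]'''
print(parse_strlist(sample))       # ['21', 'foo', '6', '0', 'A']
print(parse_strlist('["oh no"]'))  # ['ohno']
```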

@CptHwK 2018-04-27 13:56:02

To further complete @Ryan's answer using json, one very convenient function for converting unicode is the one posted here: https://stackoverflow.com/a/13105359/7599285

For example, with double or single quotes:

>>> print byteify(json.loads(u'[ "A","B","C" , " D"]'))
['A', 'B', 'C', ' D']
>>> print byteify(json.loads(u"[ 'A','B','C' , ' D']".replace('\'','"')))
['A', 'B', 'C', ' D']
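The byteify step is a Python 2 concern. A minimal Python 3 sketch of the same two calls, where `json.loads` already returns `str`, plus a case (a made-up value containing an apostrophe) where the blind single-to-double quote swap breaks:

```python
import json

double_quoted = u'[ "A","B","C" , " D"]'
single_quoted = u"[ 'A','B','C' , ' D']"

print(json.loads(double_quoted))                    # ['A', 'B', 'C', ' D']
print(json.loads(single_quoted.replace("'", '"')))  # ['A', 'B', 'C', ' D']

# The quote swap fails once a value contains an apostrophe.
broken = str(['a', "don't"]).replace("'", '"')  # '["a", "don"t"]'
error = None
try:
    json.loads(broken)
except json.JSONDecodeError as exc:
    error = exc
print("decode failed:", error is not None)  # decode failed: True
```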

@Roger Pate 2009-12-12 18:30:49

>>> import ast
>>> x = u'[ "A","B","C" , " D"]'
>>> x = ast.literal_eval(x)
>>> x
['A', 'B', 'C', ' D']
>>> x = [n.strip() for n in x]
>>> x
['A', 'B', 'C', 'D']

ast.literal_eval:

With ast.literal_eval, you can safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
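A minimal sketch of what "only literal structures" means in practice: literals are evaluated as data, while something that would need to run code, such as a function call, raises `ValueError` instead of being executed. The `__import__('os').getcwd()` string is just an illustrative non-literal input.

```python
import ast

# Literal structures are evaluated as data.
print(ast.literal_eval('[1, "two", (3.0, None)]'))  # [1, 'two', (3.0, None)]

# A function call is not a literal, so it is rejected rather than run.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
print("call rejected:", rejected)  # call rejected: True
```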

@Paul Kenjora 2017-11-18 21:15:48

Per comment below, this is dangerous as it simply runs whatever python is in the string. So if someone puts a call to delete everything in there, it happily will.

@user2357112 2018-03-19 23:29:28

@PaulKenjora: You're thinking of eval, not ast.literal_eval.

@abarnert 2018-03-30 00:12:13

ast.literal_eval is safer than eval, but it's not actually safe. As recent versions of the docs explain: "Warning It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler." It may, in fact, be possible to run arbitrary code via a careful stack-smashing attack, although as far as I know nobody's built a public proof of concept for that.
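One way to narrow that risk for untrusted input is to cap the input size before parsing and to check the result's shape afterwards. A minimal sketch; the `MAX_INPUT_CHARS` value and the `parse_list_literal` name are hypothetical, and a length cap reduces the risk the docs warn about rather than proving the call safe:

```python
import ast

MAX_INPUT_CHARS = 10_000  # hypothetical cap; tune it to your own data

def parse_list_literal(text):
    """Parse a list-of-strings literal and strip each item.

    Raises ValueError for oversized input, malformed input, or anything
    that is not a list of strings.
    """
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError("input too long")
    try:
        value = ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError) as exc:
        raise ValueError(f"not a valid literal: {exc}") from exc
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("expected a list of strings")
    return [v.strip() for v in value]

print(parse_list_literal(u'[ "A","B","C" , " D"]'))  # ['A', 'B', 'C', 'D']
```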

@Ryan 2016-02-17 15:39:52

The json module is a better solution whenever there is a stringified list of dictionaries. The json.loads(your_data) function can be used to convert it to a list.

>>> import json
>>> x = u'[ "A","B","C" , " D"]'
>>> json.loads(x)
[u'A', u'B', u'C', u' D']

Similarly

>>> x = u'[ "A","B","C" , {"D":"E"}]'
>>> json.loads(x)
[u'A', u'B', u'C', {u'D': u'E'}]

@Mansoor Akram 2016-11-14 20:20:46

However, I don't want the returned list in unicode format, but it seems that even if I remove u' ' from the string it still treats the data as unicode.

@Paul Kenjora 2017-11-18 21:16:53

This works for ints but not for strings in my case, because each string is single-quoted, not double-quoted. Sigh.

@Skippy le Grand Gourou 2019-06-19 09:33:07

As per @PaulKenjora's comment, it works for '["a","b"]' but not for "['a','b']".
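Two notes on these comments. In Python 3, `json.loads` already returns plain `str`, so the `u''` prefixes above are Python 2 output. For the single-quote problem, one option is to try strict JSON first and fall back to `ast.literal_eval`, which accepts Python's quoting rules. A minimal sketch (`load_list` is a hypothetical helper name):

```python
import ast
import json

def load_list(text):
    """Try strict JSON first; fall back to Python literal syntax (e.g. single quotes)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return ast.literal_eval(text)

print(load_list('["a","b"]'))  # ['a', 'b']
print(load_list("['a','b']"))  # ['a', 'b']
```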

@kiriloff 2013-11-01 10:12:26

With numpy this works in a very simple way:

x = u'[ "A","B","C" , " D"]'
list_string = str(x)
import numpy as np
print np.array(list_string)

gives

>>> 
[ "A","B","C" , " D"]

@River 2016-07-18 13:23:17

This doesn't work. It simply makes a 0-d array of the string. Any array operations, such as accessing an element, fail with error.

@dirkjot 2009-12-12 22:18:37

Assuming that all your inputs are lists and that the double quotes in the input actually don't matter, this can be done with a simple regexp replace. It is a bit perl-y but works like a charm. Note also that the output is now a list of unicode strings, you didn't specify that you needed that, but it seems to make sense given unicode input.

import re
x = u'[ "A","B","C" , " D"]'
junkers = re.compile('[[" \]]')
result = junkers.sub('', x).split(',')
print result
--->  [u'A', u'B', u'C', u'D']

The junkers variable contains a compiled regexp (for speed) of all characters we don't want, using ] as a character required some backslash trickery. The re.sub replaces all these characters with nothing, and we split the resulting string at the commas.

Note that this also removes spaces from inside entries u'["oh no"]' ---> [u'ohno']. If this is not what you wanted, the regexp needs to be souped up a bit.
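A Python 3 sketch of the same approach, including the caveat about inner spaces. The pattern is written as a raw string and the leading `[` is escaped as well; recent Python 3 versions may otherwise warn about a possible nested set for `[[`. The `parse` wrapper name is made up for illustration.

```python
import re

# Character class of everything to discard: '[', '"', ' ' and ']'
junkers = re.compile(r'[\[" \]]')

def parse(x):
    return junkers.sub('', x).split(',')

print(parse(u'[ "A","B","C" , " D"]'))  # ['A', 'B', 'C', 'D']
print(parse(u'["oh no"]'))               # ['ohno']
```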

@PaulMcG 2009-12-12 21:38:54

If you know that your lists only contain quoted strings, this pyparsing example will give you your list of stripped strings (even preserving the original Unicode-ness).

>>> from pyparsing import *
>>> x =u'[ "A","B","C" , " D"]'
>>> LBR,RBR = map(Suppress,"[]")
>>> qs = quotedString.setParseAction(removeQuotes, lambda t: t[0].strip())
>>> qsList = LBR + delimitedList(qs) + RBR
>>> print qsList.parseString(x).asList()
[u'A', u'B', u'C', u'D']

If your lists can have more datatypes, or even contain lists within lists, then you will need a more complete grammar - like this one on the pyparsing wiki, which will handle tuples, lists, ints, floats, and quoted strings. Will work with Python versions back to 2.4.

@Mansoor Akram 2016-11-14 20:14:24

Would you let me know how to use "parseString().asList()" if I have this kind of string: '[ "A","B","C" , ["D"]]'? You stated that pyparsing can do that as well, but I don't seem to have found the right way to do it.

@PaulMcG 2016-11-14 22:39:02

"If your lists can have more datatypes, or even contain lists within lists, then you will need a more complete grammar" - please see the link I provided in my answer for a parser that will handle nested lists, and various other data types.

@PaulMcG 2019-03-18 10:56:07

Pyparsing is no longer hosted at wikispaces. The parsePythonValue.py example is now on GitHub at github.com/pyparsing/pyparsing/blob/master/examples/…

@Mark Byers 2009-12-12 18:29:08

The eval is dangerous - you shouldn't execute user input.

If you have Python 2.6 or newer, use ast instead of eval:

>>> import ast
>>> ast.literal_eval('["A","B" ,"C" ," D"]')
['A', 'B', 'C', ' D']

Once you have that, strip the strings.

If you're on an older version of Python, you can get very close to what you want with a simple regular expression:

>>> import re
>>> x='[  "A",  " B", "C","D "]'
>>> re.findall(r'"\s*([^"]*?)\s*"', x)
['A', 'B', 'C', 'D']

This isn't as good as the ast solution, for example it doesn't correctly handle escaped quotes in strings. But it's simple, doesn't involve a dangerous eval, and might be good enough for your purpose if you're on an older Python without ast.
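A small sketch of the escaped-quote limitation mentioned above, using a made-up input where one value contains `\"`. The regex pairs up the wrong quote characters, while `ast.literal_eval` decodes the escape:

```python
import ast
import re

pattern = re.compile(r'"\s*([^"]*?)\s*"')

x = '[  "A",  " B", "C","D "]'
print(pattern.findall(x))  # ['A', 'B', 'C', 'D']

tricky = r'["a\"b", "c"]'        # hypothetical value with an escaped quote
print(pattern.findall(tricky))   # ['a\\', ',']
print(ast.literal_eval(tricky))  # ['a"b', 'c']
```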

@Aaryan Dewan 2017-07-17 01:56:08

Could you please tell me why you said “The eval is dangerous - you shouldn’t execute user input.”? I am using 3.6.

@Abhishek Menon 2017-09-21 23:28:01

@AaryanDewan if you use eval directly, it will evaluate any valid python expression, which is potentially dangerous. literal_eval solves this problem by only evaluating Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.

@tosh 2009-12-12 18:29:02

import ast
l = ast.literal_eval('[ "A","B","C" , " D"]')
l = [i.strip() for i in l]

@Alexei Sholik 2009-12-12 18:24:11

There is a quick solution:

x = eval('[ "A","B","C" , " D"]')

Unwanted whitespaces in the list elements may be removed in this way:

x = [x.strip() for x in eval('[ "A","B","C" , " D"]')]

@tosh 2009-12-12 18:26:34

this would still preserve the spaces inside the quotes

@Nicholas Knight 2009-12-12 18:29:16

This is an open invitation to arbitrary code execution, NEVER do this or anything like it unless you know with absolute certainty that the input will always be 100% trusted.

@Alexei Sholik 2009-12-12 19:42:00

@tosh: it won't.

@Manish Ranjan 2016-03-11 20:44:09

I could use this suggestion because I knew my data would always be in that format, and this was a data-processing job.
