By Mark Ransom

2011-09-28 15:14:07 8 Comments

There appear to be two different ways to convert a string to bytes, as seen in the answers to "TypeError: 'str' does not support the buffer interface".

Which of these methods would be better or more Pythonic? Or is it just a matter of personal preference?

b = bytes(mystring, 'utf-8')

b = mystring.encode('utf-8')
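A quick check confirms that the two spellings produce identical `bytes` objects. This is a minimal sketch; `mystring` is just a sample value chosen for illustration:

```python
mystring = "naïve café"  # sample text containing non-ASCII characters

b1 = bytes(mystring, 'utf-8')   # constructor form
b2 = mystring.encode('utf-8')   # method form

# Both produce an immutable bytes object with the same contents
print(type(b1).__name__, type(b2).__name__)
print(b1 == b2)
```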


@Antti Haapala 2017-07-23 20:35:05

The best way is neither of the two, but a third. The first parameter to encode has defaulted to 'utf-8' ever since Python 3.0. Thus the best way is

b = mystring.encode()

This is also faster, because with the default argument the C code receives NULL rather than the string "utf-8", and NULL is much faster to check!
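A small sketch showing that, on Python 3, calling encode() with no argument gives exactly the same bytes as passing 'utf-8' explicitly, even for non-ASCII text (the sample string is arbitrary):

```python
text = "äöä"

# In Python 3, str.encode() with no arguments uses UTF-8
default = text.encode()
explicit = text.encode('utf-8')

print(default == explicit)  # True
print(default)              # b'\xc3\xa4\xc3\xb6\xc3\xa4'
```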

Here are some timings:

In [1]: %timeit -r 10 'abc'.encode('utf-8')
The slowest run took 38.07 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 183 ns per loop

In [2]: %timeit -r 10 'abc'.encode()
The slowest run took 27.34 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 137 ns per loop

Despite the warning, the times were very stable across repeated runs; the deviation was only about 2 percent.
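Outside IPython, a similar measurement can be sketched with the standard timeit module. The call count `N` and the repeat count are arbitrary choices, and absolute numbers will differ by machine and Python version:

```python
import timeit

N = 200_000  # calls per measurement (arbitrary)

def best_ns(stmt):
    # Take the fastest of several runs (as the %timeit output above reports
    # "best of 10") and convert total seconds into nanoseconds per call
    return min(timeit.repeat(stmt, number=N, repeat=5)) / N * 1e9

explicit_ns = best_ns("'abc'.encode('utf-8')")
default_ns = best_ns("'abc'.encode()")
print(f"encode('utf-8'): {explicit_ns:.0f} ns per call")
print(f"encode():        {default_ns:.0f} ns per call")
```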

Using encode() without an argument is not Python 2 compatible, as in Python 2 the default character encoding is ASCII.

>>> 'äöä'.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
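If the same code has to run on both Python 2 and Python 3, one option is to keep the text as a Unicode object and pass the codec explicitly. This sketch assumes the input is text rather than an already-encoded Python 2 byte string; the u'' prefix is accepted again from Python 3.3 onward:

```python
# -*- coding: utf-8 -*-
# Passing the codec explicitly behaves the same on Python 2 and Python 3,
# provided the value is a unicode string (u'' literals are valid in 3.3+).
text = u'äöä'
data = text.encode('utf-8')
print(repr(data))
```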

@abarnert 2018-06-23 05:22:46

There's only a sizable difference here because (a) the string is pure ASCII, meaning the internal storage is already the UTF-8 version, so looking up the codec is almost the only cost involved at all, and (b) the string is tiny, so even if you did have to encode, it wouldn't make much difference. Try it with, say, '\U00012345'*10000. Both take 28.8 µs on my laptop; the extra 50 ns is presumably lost in the rounding error. Of course this is a pretty extreme example, but 'abc' is just as extreme in the opposite direction.
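A sketch of that experiment with the timeit module, using a string built from the non-BMP code point U+12345 ('\U00012345') so that the text is genuinely non-ASCII. The loop counts are arbitrary, and the timings will vary by machine:

```python
import timeit

# '\U00012345' is a single code point outside the BMP; UTF-8 needs 4 bytes for it
big = '\U00012345' * 10000
print(len(big), len(big.encode()))  # 10000 40000

N = 1000  # calls per measurement (arbitrary)
for stmt in ("big.encode('utf-8')", "big.encode()"):
    # Fastest of five runs, reported as microseconds per call
    best = min(timeit.repeat(stmt, globals={'big': big}, number=N, repeat=5))
    print(f"{stmt}: {best / N * 1e6:.1f} us per call")
```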

@Antti Haapala 2018-06-23 07:19:30

@abarnert true, but even then, there is no reason to pass the argument as a string.

@agf 2011-09-28 15:27:58

If you look at the docs for bytes, it points you to bytearray:

bytearray([source[, encoding[, errors]]])

Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.

The optional source parameter can be used to initialize the array in a few different ways:

If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().

If it is an integer, the array will have that size and will be initialized with null bytes.

If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.

If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.
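Each kind of source parameter listed in the quoted documentation can be tried directly; a short sketch with arbitrary sample values:

```python
# Each kind of source parameter described in the documentation above
from_str = bytearray('héllo', 'utf-8')   # string + encoding, via str.encode()
from_int = bytearray(3)                   # integer: that many null bytes
from_buffer = bytearray(b'abc')           # object supporting the buffer interface
from_iterable = bytearray([104, 105])     # iterable of ints in 0 <= x < 256
empty = bytearray()                       # no argument: size 0

print(from_str)       # bytearray(b'h\xc3\xa9llo')
print(from_int)       # bytearray(b'\x00\x00\x00')
print(from_iterable)  # bytearray(b'hi')
print(len(empty))     # 0

# bytes() accepts the same kinds of source, but returns an immutable object
print(bytes([104, 105]))  # b'hi'
```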

So bytes can do much more than just encode a string. It's Pythonic that it would allow you to call the constructor with any type of source parameter that makes sense.

For encoding a string, I think that some_string.encode(encoding) is more Pythonic than using the constructor, because it is the most self-documenting: "take this string and encode it with this encoding" is clearer than bytes(some_string, encoding), where there is no explicit verb.

Edit: I checked the Python source. If you pass a unicode string to bytes using CPython, it calls PyUnicode_AsEncodedString, which is the implementation of encode; so you're just skipping a level of indirection if you call encode yourself.
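From the Python side, the constructor requires an encoding for str input and otherwise matches encode, including the optional errors argument. A minimal sketch with an arbitrary sample string:

```python
s = 'Grüße'

# The constructor refuses str input without an encoding...
try:
    bytes(s)
except TypeError as exc:
    print(type(exc).__name__, exc)

# ...and with one, it produces the same result as calling encode directly,
# including the optional errors argument ('replace' substitutes '?').
print(bytes(s, 'ascii', 'replace') == s.encode('ascii', 'replace'))  # True
print(s.encode('ascii', 'replace'))  # b'Gr??e'
```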

Also, see Serdalis' comment -- unicode_string.encode(encoding) is also more Pythonic because its inverse is byte_string.decode(encoding) and symmetry is nice.

@Serdalis 2011-09-28 15:30:32

+1 for having a good argument and quotes from the Python docs. Also unicode_string.encode(encoding) matches nicely with bytearray.decode(encoding) when you want your string back.
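The symmetry is easy to see in a round trip; a short sketch with an arbitrary sample string:

```python
original = 'Ωmega'
encoded = original.encode('utf-8')   # str -> bytes
decoded = encoded.decode('utf-8')    # bytes -> str, the inverse operation
print(decoded == original)  # True

# bytearray has the same decode method
print(bytearray(encoded).decode('utf-8') == original)  # True
```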

@hamstergene 2011-09-28 15:41:42

bytearray is used when you need a mutable object. You don't need it for simple str-to-bytes conversions.
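The mutable/immutable distinction in practice, as a minimal sketch:

```python
frozen = 'abc'.encode()        # bytes: immutable
mutable = bytearray(frozen)    # bytearray: a mutable copy

mutable[0] = ord('x')          # in-place change is allowed
print(mutable)                 # bytearray(b'xbc')

try:
    frozen[0] = ord('x')       # bytes does not support item assignment
except TypeError:
    print("bytes is immutable")
```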

@agf 2011-09-28 15:43:53

@EugeneHomyakov This has nothing to do with bytearray except that the docs for bytes don't give details, they just say "this is an immutable version of bytearray" so I have to quote from there.

@holdenweb 2017-08-20 10:09:48

Just a cautionary note from Python in a Nutshell about bytes: Avoid using the bytes type as a function with an integer argument. In v2 this returns the integer converted to a (byte)string because bytes is an alias for str, while in v3 it returns a bytestring containing the given number of null characters. So, for example, instead of the v3 expression bytes(6), use the equivalent b'\x00'*6, which seamlessly works the same way in each version.
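A quick check of the two expressions on Python 3 (on Python 2, bytes(6) would instead give the one-character string '6'):

```python
# In Python 3 both expressions give six null bytes; only the second
# means the same thing under Python 2, where bytes is an alias for str.
print(bytes(6) == b'\x00' * 6)   # True on Python 3
print(b'\x00' * 6)               # b'\x00\x00\x00\x00\x00\x00'
```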

@lmiguelvargasf 2017-09-04 12:42:51

You can simply convert a string to bytes using:

a_string.encode()

and you can simply convert bytes to a string using:

some_bytes.decode()

bytes.decode and str.encode have encoding='utf-8' as default value.

The following functions (taken from Effective Python) might be useful to convert str to bytes and bytes to str:

def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode() # uses 'utf-8' for encoding
    else:
        value = bytes_or_str # already bytes: return unchanged
    return value # Instance of bytes

def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode() # uses 'utf-8' for encoding
    else:
        value = bytes_or_str # already str: return unchanged
    return value # Instance of str
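A hypothetical variant of these helpers, sketched here for illustration only: the names ensure_bytes/ensure_str, the encoding parameter, bytearray support, and the TypeError for other types are additions, not part of the Effective Python versions:

```python
def ensure_bytes(value, encoding='utf-8'):
    """Return value as bytes, encoding str input with the given codec."""
    if isinstance(value, str):
        return value.encode(encoding)
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")

def ensure_str(value, encoding='utf-8'):
    """Return value as str, decoding bytes-like input with the given codec."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")

print(ensure_bytes('café'))             # b'caf\xc3\xa9'
print(ensure_str(b'caf\xc3\xa9'))       # café
print(ensure_bytes('café', 'latin-1'))  # b'caf\xe9'
```

Raising on unexpected types, rather than passing them through unchanged, surfaces mistakes such as handing an int to a function that expects text.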

@gerardw 2017-04-05 16:16:21

so_string = 'stackoverflow'
so_bytes = so_string.encode()

@Mark Ransom 2017-04-05 17:35:53

If you read the whole question again you'll see that this doesn't really answer it.

@Toby Speight 2017-04-05 21:56:27

Although this code may help to solve the problem, it doesn't explain why and/or how it answers the question. Providing this additional context would significantly improve its long-term value. Please edit your answer to add explanation, including what limitations and assumptions apply.

@gerardw 2017-04-06 14:43:50

As explained at, "Pythonic" code is brief and uses standard language idioms. Because the code I present in the example is the simplest and most direct way of accomplishing the task, as a working coder that's what I'd like to see. My traditional assumption for Stack Overflow had been that participants are working programmers helping each other solve problems, rather than nitpicking about stupid stuff. Lesson learned.

@erm3nda 2017-04-23 05:37:12

I think Stack Overflow has grown so much since then... :-) It's an invaluable resource for learning. Even if the main audience is programmers... you were SOO lazy in your reply. You even wrote more in your explanatory comment than in the answer itself :-)

@Davos 2017-11-23 05:52:54

One day when the internet becomes sentient and starts coding itself it will learn from S.O. Even with amazing semantic understanding and natural language processing it still won't be able to learn from answers like this. Maybe that's a good thing.

@hasanatkazmi 2013-07-06 07:09:28

It's easier than you might think:

my_str = "hello world"
my_str_as_bytes = str.encode(my_str)
type(my_str_as_bytes) # ensure it is byte representation
my_decoded_str = my_str_as_bytes.decode()
type(my_decoded_str) # ensure it is string representation

@agf 2013-09-30 17:50:42

He knows how to do it, he's just asking which way is better. Please re-read the question.

@Mike 2014-08-13 09:33:19

FYI: str.decode(bytes) didn't work for me (Python 3.3.3 said "type object 'str' has no attribute 'decode'") I used bytes.decode() instead

@jfs 2015-06-22 11:51:23

@Mike: use obj.method() syntax instead of cls.method(obj) syntax i.e., use bytestring = unicode_text.encode(encoding) and unicode_text = bytestring.decode(encoding).
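A short sketch of the obj.method() style for both directions, plus a check of why str.decode(...) fails on Python 3:

```python
text = 'hello'
data = text.encode('utf-8')      # str has encode()...
print(data.decode('utf-8'))      # ...and bytes has decode()

# In Python 3, str has no decode method at all, which is why str.decode(...) fails
print(hasattr(str, 'decode'), hasattr(bytes, 'decode'))  # False True
```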

@Ted 2017-03-17 13:44:35

Mike and shenshin fixed the errors in the answer -- it is working now for py 3.6

@Vladimircape 2017-05-14 20:20:35

You should be very careful, because encode creates bytes but the class will still be str, while the bytes method creates the bytes class.
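One way to check what each call actually returns is to inspect the types directly; a minimal sketch reusing the sample string from the answer:

```python
my_str = "hello world"
encoded = my_str.encode()

# str.encode returns a new bytes object; the original string is unchanged
print(type(encoded))   # <class 'bytes'>
print(type(my_str))    # <class 'str'>
print(type(bytes(my_str, 'utf-8')) is type(encoded))  # True
```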

@cpburnz 2017-06-16 21:13:32

This answer looks more like a comment to me. How does this actually answer the question?

@Antti Haapala 2017-07-23 20:36:46

str.encode(my_str) really should be my_str.encode()...

@Antti Haapala 2018-04-11 07:41:12

... i.e. you're needlessly making an unbound method, and then calling it passing the self as the first argument
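Both spellings give the same result for a plain str, but they are not interchangeable in general. In this sketch, `Shouty` is a hypothetical subclass invented to show that calling through the class always uses str's own implementation, while the usual bound call respects an override:

```python
my_str = "hello world"

# Calling through the class and passing the instance explicitly...
via_class = str.encode(my_str)
# ...gives the same result as the usual bound-method call for a plain str
via_instance = my_str.encode()
print(via_class == via_instance)  # True

class Shouty(str):
    """Hypothetical str subclass that overrides encode."""
    def encode(self, *args, **kwargs):
        return super().encode(*args, **kwargs).upper()

s = Shouty("hi")
print(s.encode())      # b'HI'  (the override is used)
print(str.encode(s))   # b'hi'  (str's implementation, override bypassed)
```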

@Kolob Canyon 2018-05-01 21:56:32

@agf who cares? it helps people who come to this page looking to perform this operation

@abarnert 2018-06-23 05:16:39

@KolobCanyon The question already shows the right way to do it: call encode as a bound method on the string. This answer suggests that you should instead call the unbound method and pass it the string. That's the only new information in the answer, and it's wrong.
