By Berry Tsakala


2013-08-20 14:18:18 8 Comments

sample code:

>>> import json
>>> json_string = json.dumps("ברי צקלה")
>>> print json_string
"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"

The problem: it's not human readable. My (smart) users want to verify or even edit text files with JSON dumps (and I’d rather not use XML).

Is there a way to serialize objects into UTF-8 JSON strings (instead of \uXXXX)?

11 comments

@sivi 2019-10-18 08:59:53

Thanks for the original answer here. With python 3 the following line of code:

print(json.dumps(result_dict,ensure_ascii=False))

worked fine. Consider avoiding embedding too much literal text in the code when it's not essential.

@Martijn Pieters 2013-08-20 14:33:20

Use the ensure_ascii=False switch to json.dumps(), then encode the value to UTF-8 manually:

>>> json_string = json.dumps("ברי צקלה", ensure_ascii=False).encode('utf8')
>>> json_string
b'"\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94"'
>>> print(json_string.decode())
"ברי צקלה"

If you are writing to a file, just use json.dump() and leave it to the file object to encode:

with open('filename', 'w', encoding='utf8') as json_file:
    json.dump("ברי צקלה", json_file, ensure_ascii=False)
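A quick round-trip check (my own sketch; the temporary path is arbitrary) confirms the file ends up holding literal UTF-8 rather than \uXXXX escapes:

```python
import json
import os
import tempfile

data = {"name": "ברי צקלה"}
path = os.path.join(tempfile.mkdtemp(), 'data.json')

# Write with ensure_ascii=False; the text-mode file object does the encoding
with open(path, 'w', encoding='utf8') as json_file:
    json.dump(data, json_file, ensure_ascii=False)

# Read the raw text back to inspect what actually landed on disk
with open(path, encoding='utf8') as json_file:
    raw = json_file.read()

print(raw)                      # {"name": "ברי צקלה"}
assert '\\u' not in raw         # no \uXXXX escape sequences in the file
assert json.loads(raw) == data  # and it is still valid JSON
```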

Caveats for Python 2

For Python 2, there are some more caveats to take into account. If you are writing this to a file, you can use io.open() instead of open() to produce a file object that encodes Unicode values for you as you write, then use json.dump() to write to that file:

with io.open('filename', 'w', encoding='utf8') as json_file:
    json.dump(u"ברי צקלה", json_file, ensure_ascii=False)

Do note that there is a bug in the json module where the ensure_ascii=False flag can produce a mix of unicode and str objects. The workaround for Python 2 then is:

with io.open('filename', 'w', encoding='utf8') as json_file:
    data = json.dumps(u"ברי צקלה", ensure_ascii=False)
    # unicode(data) auto-decodes data to unicode if str
    json_file.write(unicode(data))

In Python 2, when using byte strings (type str), encoded to UTF-8, make sure to also set the encoding keyword:

>>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }
>>> d
{1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94', 2: u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'}

>>> s=json.dumps(d, ensure_ascii=False, encoding='utf8')
>>> s
u'{"1": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4", "2": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"}'
>>> json.loads(s)['1']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> json.loads(s)['2']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> print json.loads(s)['1']
ברי צקלה
>>> print json.loads(s)['2']
ברי צקלה

@Chandan Sharma 2019-07-30 11:07:59

If you are loading a JSON string from a file whose contents are Arabic text, the following will work.

Assume a file named arabic.json:

{ 
"key1" : "لمستخدمين",
"key2" : "إضافة مستخدم"
}

Get the Arabic contents from the arabic.json file:

with open('arabic.json', encoding='utf-8') as f:
    # deserialise the file contents
    json_data = json.load(f)


# json formatted string
json_data2 = json.dumps(json_data, ensure_ascii = False)

To use the JSON data in a Django template, follow the steps below:

# If you have to get the JSON index in a Django template file, simply decode the encoded string.

json.JSONDecoder().decode(json_data2)

Done! Now we can get the results by JSON key, with the Arabic values intact.
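The whole round trip above can be sketched in one self-contained snippet (it writes a temporary arabic.json first, since the file path in the answer is an assumption):

```python
import json
import os
import tempfile

# Create a sample arabic.json (stand-in for the file described above)
path = os.path.join(tempfile.mkdtemp(), 'arabic.json')
with open(path, 'w', encoding='utf-8') as f:
    json.dump({"key1": "لمستخدمين", "key2": "إضافة مستخدم"}, f, ensure_ascii=False)

# Deserialise it
with open(path, encoding='utf-8') as f:
    json_data = json.load(f)

# Re-serialise without escaping, keeping the Arabic readable
json_data2 = json.dumps(json_data, ensure_ascii=False)
print(json_data2)  # {"key1": "لمستخدمين", "key2": "إضافة مستخدم"}
assert json_data["key1"] == "لمستخدمين"
```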

@Trần Quang Hiệp 2016-11-14 09:35:26

To write to a file

import codecs
import json

with codecs.open('your_file.txt', 'w', encoding='utf-8') as f:
    json.dump({"message":"xin chào việt nam"}, f, ensure_ascii=False)

To print to stdout

import json
print(json.dumps({"message":"xin chào việt nam"}, ensure_ascii=False))

@Alex 2017-05-17 07:08:59

SyntaxError: Non-ASCII character '\xc3' in file json-utf8.py on line 5, but no encoding declared; see python.org/dev/peps/pep-0263 for details

@Karim Sonbol 2018-06-29 09:48:42

Thank you! I didn't realize it was that simple. You only need to be careful if the data you are converting to json is untrusted user input.
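A quick check (my own sketch, not from the answer) shows that ensure_ascii=False still escapes JSON-significant characters such as quotes and backslashes, so the output remains valid JSON even for awkward input:

```python
import json

tricky = 'he said "שלום" \\ end'
out = json.dumps({"msg": tricky}, ensure_ascii=False)
# Quotes and backslashes are still escaped; only the \uXXXX escaping is off
print(out)
assert json.loads(out)["msg"] == tricky  # round-trips losslessly
```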

@Gabriel Fair 2018-07-09 06:58:25

@Alex did you figure out how to avoid that issue?

@Alex 2018-08-29 05:32:54

@Gabriel frankly, I don't remember. It was not something so important to put snippet aside :(

@Nik 2019-01-20 13:56:37

As of Python 3.7 the following code works fine:

from json import dumps
result = {"symbol": "ƒ"}
json_string = dumps(result, sort_keys=True, indent=2, ensure_ascii=False)
print(json_string)

Output:

{"symbol": "ƒ"}

@Berry Tsakala 2019-02-13 17:20:40

also in python 3.6 (just verified).

@Yulin GUO 2018-08-14 06:58:31

Use codecs if possible,

with codecs.open('file_path', 'a+', 'utf-8') as fp:
    fp.write(json.dumps(res, ensure_ascii=False))

@Cheney 2017-01-07 13:11:16

The following is my understanding, from reading the answers above and Google.

# coding:utf-8
r"""
@update: 2017-01-09 14:44:39
@explain: str, unicode, bytes in python2to3
    #python2 UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 7: ordinal not in range(128)
    #1.reload
    #importlib,sys
    #importlib.reload(sys)
    #sys.setdefaultencoding('utf-8') #python3 don't have this attribute.
    #not suggest even in python2 #see:http://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script
    #2.overwrite /usr/lib/python2.7/sitecustomize.py or (sitecustomize.py and PYTHONPATH=".:$PYTHONPATH" python)
    #too complex
    #3.control by your own (best)
    #==> all string must be unicode like python3 (u'xx'|b'xx'.encode('utf-8')) (unicode 's disappeared in python3)
    #see: http://blog.ernest.me/post/python-setdefaultencoding-unicode-bytes

    #how to Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence
    #http://stackoverflow.com/questions/18337407/saving-utf-8-texts-in-json-dumps-as-utf8-not-as-u-escape-sequence
"""

from __future__ import print_function
import json

a = {"b": u"中文"}  # add u for python2 compatibility
print('%r' % a)
print('%r' % json.dumps(a))
print('%r' % (json.dumps(a).encode('utf8')))
a = {"b": u"中文"}
print('%r' % json.dumps(a, ensure_ascii=False))
print('%r' % (json.dumps(a, ensure_ascii=False).encode('utf8')))
# print(a.encode('utf8')) #AttributeError: 'dict' object has no attribute 'encode'
print('')

# python2:bytes=str; python3:bytes
b = a['b'].encode('utf-8')
print('%r' % b)
print('%r' % b.decode("utf-8"))
print('')

# python2:unicode; python3:str=unicode
c = b.decode('utf-8')
print('%r' % c)
print('%r' % c.encode('utf-8'))
"""
#python2
{'b': u'\u4e2d\u6587'}
'{"b": "\\u4e2d\\u6587"}'
'{"b": "\\u4e2d\\u6587"}'
u'{"b": "\u4e2d\u6587"}'
'{"b": "\xe4\xb8\xad\xe6\x96\x87"}'

'\xe4\xb8\xad\xe6\x96\x87'
u'\u4e2d\u6587'

u'\u4e2d\u6587'
'\xe4\xb8\xad\xe6\x96\x87'

#python3
{'b': '中文'}
'{"b": "\\u4e2d\\u6587"}'
b'{"b": "\\u4e2d\\u6587"}'
'{"b": "中文"}'
b'{"b": "\xe4\xb8\xad\xe6\x96\x87"}'

b'\xe4\xb8\xad\xe6\x96\x87'
'中文'

'中文'
b'\xe4\xb8\xad\xe6\x96\x87'
"""

@Neit Sabes 2016-08-26 09:56:06

Here's my solution using json.dump():

def jsonWrite(p, pyobj, ensure_ascii=False, encoding=SYSTEM_ENCODING, **kwargs):
    with codecs.open(p, 'wb', 'utf_8') as fileobj:
        json.dump(pyobj, fileobj, ensure_ascii=ensure_ascii,encoding=encoding, **kwargs)

where SYSTEM_ENCODING is set to:

locale.setlocale(locale.LC_ALL, '')
SYSTEM_ENCODING = locale.getlocale()[1]

@monitorius 2014-09-27 19:41:39

UPDATE: This is a wrong answer, but it's still useful to understand why it's wrong. See the comments.

How about unicode-escape?

>>> d = {1: "ברי צקלה", 2: u"ברי צקלה"}
>>> json_str = json.dumps(d).decode('unicode-escape').encode('utf8')
>>> print json_str
{"1": "ברי צקלה", "2": "ברי צקלה"}

@jfs 2015-05-11 08:09:09

unicode-escape is not necessary: you could use json.dumps(d, ensure_ascii=False).encode('utf8') instead. And it is not guaranteed that json uses exactly the same rules as unicode-escape codec in Python in all cases i.e., the result might or might not be the same in some corner case. The downvote is for an unnecessary and possibly wrong conversion. Unrelated: print json_str works only for utf8 locales or if PYTHONIOENCODING envvar specifies utf8 here (print Unicode instead).

@Martijn Pieters 2015-06-06 23:55:05

Another issue: any double quotes in string values will lose their escaping, so this'll result in broken JSON output.
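The breakage described here can be demonstrated (a Python 3 sketch of the same idea, since str.decode() no longer exists there): unicode-escape also un-escapes the \" sequences that keep embedded quotes safe, so the result no longer parses:

```python
import json

d = {"msg": 'he said "hi"'}
dumped = json.dumps(d)  # {"msg": "he said \"hi\""}
# Round-tripping through unicode-escape strips the backslashes
broken = dumped.encode('latin-1').decode('unicode-escape')
print(broken)  # {"msg": "he said "hi""} -- not valid JSON
try:
    json.loads(broken)
except json.JSONDecodeError:
    print("broken JSON, as expected")
```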

@Gank 2016-04-18 13:59:54

error in Python3 :AttributeError: 'str' object has no attribute 'decode'

@Worker 2016-05-11 11:33:17

unicode-escape works fine! I would accept this answer as the correct one.

@turingtested 2018-11-27 10:09:56

@jfs No, json.dumps(d, ensure_ascii=False).encode('utf8') is not working, for me at least. I'm getting UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position ...-error. The unicode-escape variant works fine however.

@jfs 2018-11-27 11:42:38

@turingtested the error is likely in your other code. It is hard to say without a minimal complete code example that reproduces the issue.

@Jonathan Ray 2015-01-19 20:14:42

Pieters' Python 2 workaround fails on an edge case:

d = {u'keyword': u'bad credit  \xe7redit cards'}
with io.open('filename', 'w', encoding='utf8') as json_file:
    data = json.dumps(d, ensure_ascii=False).decode('utf8')
    try:
        json_file.write(data)
    except TypeError:
        # Decode data to Unicode first
        json_file.write(data.decode('utf8'))

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 25: ordinal not in range(128)

It was crashing on the .decode('utf8') part of line 3. I fixed the problem by making the program much simpler by avoiding that step as well as the special casing of ascii:

with io.open('filename', 'w', encoding='utf8') as json_file:
  data = json.dumps(d, ensure_ascii=False, encoding='utf8')
  json_file.write(unicode(data))

cat filename
{"keyword": "bad credit  çredit cards"}

@Martijn Pieters 2015-01-27 07:42:32

The 'edge case' was simply a dumb untested error on my part. Your unicode(data) approach is the better option rather than using exception handling. Note that the encoding='utf8' keyword argument has nothing to do with the output that json.dumps() produces; it is used for decoding str input the function receives.

@jfs 2015-02-07 17:43:19

@MartijnPieters: or simpler: open('filename', 'wb').write(json.dumps(d, ensure_ascii=False).encode('utf8')) It works whether dumps returns (ascii-only) str or unicode object.

@Martijn Pieters 2015-02-07 17:46:47

@J.F.Sebastian: right, because str.encode('utf8') decodes implicitly first. But so does unicode(data), if given a str object. :-) Using io.open() gives you more options though, including using a codec that writes a BOM and you are following the JSON data with something else.

@jfs 2015-05-11 07:55:22

@MartijnPieters: the .encode('utf8')-based variant works on both Python 2 and 3 (the same code). There is no unicode on Python 3. Unrelated: JSON files should not use a BOM (though a conforming JSON parser may ignore the BOM, see errata 3983).
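That bytes-based variant can be sketched as follows (shown on Python 3; the temporary path is arbitrary):

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'out.json')

# Writing bytes sidesteps the str/unicode distinction entirely:
# encode the serialised text to UTF-8 and write in binary mode.
with open(path, 'wb') as f:
    f.write(json.dumps({"k": "ברי צקלה"}, ensure_ascii=False).encode('utf8'))

with open(path, encoding='utf8') as f:
    assert json.load(f) == {"k": "ברי צקלה"}
```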

@Max L 2016-02-07 18:30:41

Adding encoding='utf8' to json.dumps solves the problem. P.S. I have Cyrillic text to dump.

@JeffThompson 2017-07-23 13:51:03

This is the only one that worked with my text.

@Vadorequest 2018-11-17 18:35:26

Worked where so many other attempts just failed.

@Ryan X 2014-01-05 02:25:35

Using ensure_ascii=False in json.dumps is the right direction to solve this problem, as pointed out by Martijn. However, this may raise an exception:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 1: ordinal not in range(128)

You need extra settings in either site.py or sitecustomize.py to make sys.getdefaultencoding() return the right value. site.py is under lib/python2.7/ and sitecustomize.py is under lib/python2.7/site-packages.

If you want to use site.py, under def setencoding(): change the first if 0: to if 1: so that Python will use your operating system's locale.

If you prefer to use sitecustomize.py, which may not exist if you haven't created it yet, simply put in these lines:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

Then you can do some Chinese json output in utf-8 format, such as:

name = {"last_name": u"王"}
json.dumps(name, ensure_ascii=False)

You will get a UTF-8 encoded string, rather than a \u-escaped JSON string.

To verify your default encoding:

print sys.getdefaultencoding()

You should get "utf-8" or "UTF-8" to verify your site.py or sitecustomize.py settings.

Please note that you cannot call sys.setdefaultencoding("utf-8") at the interactive Python console.

@jfs 2014-01-05 02:49:12

no. Don't do it. Modifying default character encoding has nothing to do with json's ensure_ascii=False. Provide a minimal complete code example if you think otherwise.

@Martijn Pieters 2014-05-15 00:09:29

You only get this exception if you either feed in non-ASCII byte strings (i.e. not Unicode values) or try to combine the resulting JSON value (a Unicode string) with a non-ASCII byte string. Setting the default encoding to UTF-8 is essentially masking an underlying problem where you are not managing your string data properly.
