Handling control characters (escaping) in python for json and mysql
JSON loads() and dumps(): how control characters are handled
The json lib escapes control characters when you dump(s) a python object into a json string.
import jsonOutput:
aa = "String with\ttab and\nnewline"
print "1. Py string with control chars, unescaped: '%s'" % aa
aa = json.dumps(aa, indent = 2, encoding="utf8") # dumping py-obj into json-string will cause control chars to be escaped
print "\n2. JSON Encoded String with control chars, escaped: '%s'" % aa
aa = json.loads(aa) # loading json-string into py-obj will leave control chars unescaped
print "\n3. Py object (= string) with control chars, unescaped: '%s'" % aa
1. Py string with control chars, unescaped: 'String with tab andNote that json.loads(s) will raise an exception if the json string (passed as its first argument) contains unescaped control chars. (This is the default behaviour, which can be changed by setting the underlying JsonDecoder's strict argument to False.) For example, the following code will raise a ValueError: 'Invalid control character at: line 1 column 11 (char 11)'
newline'
2. JSON Encoded String with control chars, escaped: '"String with\ttab and\nnewline"'
3. Py object (= string) with control chars, unescaped: 'String with tab and
newline'
Note for python 2.x : when calling json.dumps() do not set 'ensure_ascii = False'. This will result in a unicode string with a different character set than 'utf8'. Instead, pass the value "utf8" (not "utf-8") via the encoding parameter.
aa = '{"test\n":"my"}'
aa = json.loads(aa) # aa should contain no un-escaped control chars
print aa
Storing json into mysql : double escaping requiredMake sure, when storing json into a db, to escape the json string for the db as well. For example, take the following workflow:
- you load a json string (e.g. from the db) into a python obj
- you modify the python obj (e.g. add a string to an array or object)
- you call dumps(obj) to turn the python object back into a json string
- you store the json string into the mysql db
- you load the new json string from mysql
For example, a json string that looks like this (after step 3)
'{"test":"my string with \\n newline"}' should look like this after step 4:'{\"test\":\"my string with \\\n newline\"}' and will be retrieved from the db (in step 5) as '{"test":"my string with \\n newline"}'To do this, use the escape_string() function of the MysqlDb connection object. This will also escape single quotes, so you can safely write your insert or update stmts like this:import MySQLdbconn.escape_string() can handle unicode, but make sure your python unicode string has the same character set as your database table. Otherwise, conn.escape_string() will implicitly call decode(), which may result in a UnicodeEncodeError being raised.
conn = MySQLdb.connect(host= dbhost, user= dbuser, passwd= dbpwd, db= dbname, use_unicode = True)
cursor = conn.cursor()
mysql_json_string = conn.escape_string(json_string)
query = "UPDATE mytable SET json_data = '%s' WHERE id = %s" % (mysql_json_string, id)
cursor.execute(query)
conn.commit()
cursor.close()
conn.close()
Comments