Handling control characters (escaping) in python for json and mysql


JSON loads() and dumps(): how control characters are handled

The json lib escapes control characters when you dump(s) a python object into a json string.
import json

aa = "String with\ttab and\nnewline"
print "1. Py string with control chars, unescaped: '%s'" % aa

aa = json.dumps(aa, indent = 2, encoding="utf8") # dumping py-obj into json-string will cause control chars to be escaped
print "\n2. JSON Encoded String with control chars, escaped: '%s'" % aa

aa = json.loads(aa) # loading json-string into py-obj will leave control chars unescaped
print "\n3. Py object (= string) with control chars, unescaped: '%s'" % aa
Output:
1. Py string with control chars, unescaped: 'String with tab and
newline'

2. JSON Encoded String with control chars, escaped: '"String with\ttab and\nnewline"'

3. Py object (= string) with control chars, unescaped: 'String with tab and
newline'
Note that json.loads(s) will raise an exception if the json string (passed as its first argument) contains unescaped control chars. (This is the default behaviour, which can be changed by setting the underlying JsonDecoder's strict argument to False.) For example, the following code will raise a ValueError: 'Invalid control character at: line 1 column 11 (char 11)'

Note for python 2.x : when calling json.dumps() do not set 'ensure_ascii = False'. This will result in a unicode string with a different character set than 'utf8'. Instead, pass the value "utf8" (not "utf-8") via the encoding parameter.
aa = '{"test\n":"my"}'
aa = json.loads(aa) # aa should contain no un-escaped control chars
print aa

Storing json into mysql : double escaping required

Make sure, when storing json into a db, to escape the json string for the db as well. For example, take the following workflow:
  1. you load a json string (e.g. from the db) into a python obj
  2. you modify the python obj (e.g. add a string to an array or object)
  3. you call dumps(obj) to turn the python object back into a json string
  4. you store the json string into the mysql db
  5. you load the new json string from mysql
If you haven't escaped the json string in step 4, any control character will results in a ValueError when you call loads() in step 5. This is because the control characters only have a single escape character (step 3) on mysql insert, which will be removed by the insert cmd.

For example, a json string that looks like this (after step 3)
'{"test":"my string with \\n newline"}'
should look like this after step 4:
'{\"test\":\"my string with \\\n newline\"}'
and will be retrieved from the db (in step 5) as
'{"test":"my string with \\n newline"}'
To do this, use the escape_string() function of the MysqlDb connection object. This will also escape single quotes, so you can safely write your insert or update stmts like this:
import MySQLdb

conn = MySQLdb.connect(host= dbhost, user= dbuser, passwd= dbpwd, db= dbname, use_unicode = True)
cursor = conn.cursor()
mysql_json_string = conn.escape_string(json_string)
query = "UPDATE mytable SET json_data = '%s' WHERE id = %s" % (mysql_json_string, id)
cursor.execute(query)
conn.commit()
cursor.close()
conn.close()
conn.escape_string() can handle unicode, but make sure your python unicode string has the same character set as your database table. Otherwise, conn.escape_string() will implicitly call decode(), which may result in a UnicodeEncodeError being raised.

Comments

Popular posts from this blog

python port sniffer with pcapy and impacket

Django field, form and model validation process