I was testing my final-year project today, and there was so much noise in the data that I was really frustrated by the end of it.
One of the most troublesome problems, and the hardest to figure out, involved the urllib.quote(movie) function.
You should see the movie titles people enter; here are a few:
I ♥ Bollywood, Funniest movies ever ツ
Sending titles like these in an API call was a real challenge. We thought of stripping the special characters out, but a few French and Italian movies always have some odd character or other in their titles. When I passed the titles to quote(), I kept getting a KeyError exception.
I finally figured it out: you have to encode the string into UTF-8 before quoting it. In Python 2, quote() looks up each character in a table built from the 256 single-byte characters, so a unicode character such as ♥ has no entry and the lookup raises KeyError, while every UTF-8 byte does have one. So when calling a URL that contains any special characters, encode it to UTF-8 first and then send it across.
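To make the failure concrete, here is a simplified model of Python 2's quote() lookup table, written in Python 3 so it runs on a current interpreter. The table SAFE_MAP, the helper quote_model(), and the latin-1 round trip used to imitate a Python 2 byte string are illustrative assumptions, not the real stdlib code; the point is only that a non-ASCII character is not a key in the table, while each UTF-8 byte is.

```python
# Simplified model of Python 2's urllib.quote() (illustrative, not the real stdlib code):
# a lookup table keyed by the 256 single-byte characters.
ALWAYS_SAFE = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-")
SAFE_MAP = {}
for i in range(256):
    c = chr(i)
    # Keep letters, digits, "_.-" and the default safe "/"; escape everything else
    SAFE_MAP[c] = c if (c in ALWAYS_SAFE or c == "/") else "%%%02X" % i


def quote_model(s):
    """Hypothetical stand-in for Python 2 quote(): every character must be a table key."""
    return "".join(SAFE_MAP[c] for c in s)


title = u"I \u2665 Bollywood"
try:
    quote_model(title)  # u"\u2665" (the heart) is not one of the 256 keys
except KeyError as exc:
    print("KeyError:", exc)

# Encode to UTF-8 first; decoding as latin-1 maps each byte to a one-character key,
# imitating how Python 2 iterates over a byte string.
utf8_chars = title.encode("utf-8").decode("latin-1")
print(quote_model(utf8_chars))  # I%20%E2%99%A5%20Bollywood
```

The heart becomes three escaped bytes (%E2%99%A5), which is exactly what the server on the other end decodes back into ♥.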
Example (Google App Engine):
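# -*- coding: utf-8 -*-
# Python 2 on Google App Engine
import urllib
from google.appengine.api import urlfetch

# Runs inside a webapp RequestHandler method (e.g. get), where self.response exists
data = u'♥+ツ'
url = 'http://www.google.com?search='
# Encode the unicode string to UTF-8 bytes first; quote() then percent-encodes each byte
response = urlfetch.fetch(url + urllib.quote(data.encode('utf-8')))
if response.status_code == 200:
    output = response.content
    self.response.out.write(output)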
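On Python 3 the same idea carries over, with one difference: urllib.parse.quote() and urlencode() encode str values as UTF-8 by default, so the explicit .encode('utf-8') step is no longer required. The sketch below only builds the URL (urlfetch is App Engine-specific, so the actual request is left out); build_search_url() is a hypothetical helper, and the endpoint is the same placeholder URL used in the example above.

```python
from urllib.parse import quote, unquote, urlencode

BASE_URL = "http://www.google.com?"  # placeholder endpoint, as in the example above


def build_search_url(title, param="search"):
    """Hypothetical helper: BASE_URL plus a UTF-8 percent-encoded query for one title."""
    # urlencode() encodes str values as UTF-8 by default;
    # quote_via=quote escapes spaces as %20 instead of "+"
    return BASE_URL + urlencode({param: title}, quote_via=quote)


for title in [u"I \u2665 Bollywood", u"Funniest movies ever \u30c4"]:
    url = build_search_url(title)
    print(url)
    # Decoding the query value gives back the original title
    assert unquote(url.split("=", 1)[1]) == title
```

Using urlencode() rather than gluing strings together also escapes characters such as "+" and "&" that would otherwise be misread as query syntax.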
I wasted almost an hour figuring out what to do. If you ever get a KeyError when using urllib.quote, this is the solution.