problems importing some feeds
February 10, 2005 | Filed Under Computer

As some may have noticed, importing entries from RSS feeds into planet #luusa sometimes fails. planet.py seems to choke on certain characters in a URL. The traceback is as follows:
Traceback (most recent call last):
  File "/var/www/luusa/planet/planetlib.py", line 240, in cache_read
    self.update(cache_uri)
  File "/var/www/luusa/planet/planetlib.py", line 184, in update
    self._update(baseuri, data)
  File "/var/www/luusa/planet/planetlib.py", line 288, in _update
    feed.feed(data)
  File "/usr/lib/python2.3/sgmllib.py", line 95, in feed
    self.goahead(0)
  File "/usr/lib/python2.3/sgmllib.py", line 134, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python2.3/sgmllib.py", line 293, in parse_endtag
    self.finish_endtag(tag)
  File "/usr/lib/python2.3/sgmllib.py", line 333, in finish_endtag
    self.unknown_endtag(tag)
  File "/var/www/luusa/planet/feedparser.py", line 358, in unknown_endtag
    method()
  File "/var/www/luusa/planet/feedparser.py", line 778, in _end_content
    value = self.pop('content')
  File "/var/www/luusa/planet/feedparser.py", line 480, in pop
    output = resolveRelativeURIs(output, self.baseuri)
  File "/var/www/luusa/planet/feedparser.py", line 897, in resolveRelativeURIs
    data = p.output()
  File "/var/www/luusa/planet/feedparser.py", line 853, in output
    return "".join(self.pieces)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
Maybe someone has time to look into that problem and provide a fix.
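
For anyone who wants to dig in, here is a rough sketch of what the last frame of the traceback appears to be hitting: a list of pieces that mixes byte strings and unicode strings gets joined, and the byte strings go through an ASCII decode that fails on 0xc3. The sketch is written for current Python 3, where that decode has to be spelled out explicitly, rather than for the Python 2.3 in the traceback. The helper name join_pieces, the sample data, and the assumption that the feed bytes are UTF-8 are all mine for illustration. This is not the actual feedparser code or a tested patch for it.

```python
def join_pieces(pieces, encoding="utf-8"):
    """Join a mix of byte strings and text strings into one text string.

    Hypothetical helper: the encoding default is an assumption, a real
    feed would have to declare or reveal its own encoding.
    """
    text = []
    for piece in pieces:
        if isinstance(piece, bytes):
            # Decode explicitly with a known encoding instead of ASCII.
            # Undecodable bytes become U+FFFD rather than raising.
            piece = piece.decode(encoding, "replace")
        text.append(piece)
    return "".join(text)


# "Straße" encoded as UTF-8: the byte 0xc3 sits at position 4.
raw = b"Stra\xc3\x9fe"

# Reproduce the failure from the traceback: an ASCII decode of these bytes.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Made-up pieces, similar in spirit to markup rebuilt from a feed entry.
pieces = ["<a href='/", raw, "'>link</a>"]
joined = join_pieces(pieces)
print(ascii_error)
print(joined)
```

If the cause really is mixed string types in self.pieces, decoding every byte string to unicode before the join (or as soon as the data is read from the feed) should avoid the error, provided the right encoding is known.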
[…] ify the exact cause yet. When it tries to merge the byte strings and the unicode strings, this error occurs and causes the offending feed to be ignored. I found a […]
Pingback by Sebastian Kirsch: Blog » Python string handling, 11/2/2005