Python Tips

Quick-and-dirty parsing of XML whose DTD cannot be resolved properly

Use minidom instead of Sax2.

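from xml.dom import minidom
import urllib.request

# Python 3: urllib.urlopen(..., proxies=...) no longer exists, so route the
# request through a ProxyHandler instead. args holds the URL to fetch.
proxy = urllib.request.ProxyHandler({'http': 'http://localhost:8080/'})
stream = urllib.request.build_opener(proxy).open(args)
dom = minidom.parseString(stream.read())
elms = dom.getElementsByTagName("item")    # every <item> element
title = dom.getElementsByTagName("title")  # every <title> element, at any depth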
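Calling getElementsByTagName on the document returns matching elements at any depth, so a document-wide search for "title" also picks up titles that do not belong to an item. The following is a minimal sketch of pulling the title text out of each item only. The sample XML, node_text, and item_titles are hypothetical names made up for illustration; the real feed layout is an assumption.

```python
from xml.dom import minidom, Node

# Hypothetical RSS-like sample; the structure of the real feed is an assumption.
SAMPLE = """<?xml version="1.0"?>
<rss>
  <channel>
    <title>Example feed</title>
    <item><title>First entry</title></item>
    <item><title>Second &amp; last</title></item>
  </channel>
</rss>"""


def node_text(node):
    """Concatenate the data of the node's direct text and CDATA children."""
    return "".join(
        child.data
        for child in node.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


def item_titles(xml_text):
    """Return the title text of every <item>, skipping titles outside items."""
    dom = minidom.parseString(xml_text)
    titles = []
    for item in dom.getElementsByTagName("item"):
        # Search inside the item so the channel-level <title> is not included.
        for title in item.getElementsByTagName("title"):
            titles.append(node_text(title))
    return titles


print(item_titles(SAMPLE))
```

The parser already decodes entity references such as &amp; in element text, so no separate unescape step is needed for text read this way.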

Unescaping HTML entities

The cgi module provides only escape; it has no inverse.

def unescape(s):
  s = s.replace("&lt;", "<")
  s = s.replace("&gt;", ">")
  # &amp; has to be replaced last; otherwise "&amp;lt;" would be unescaped
  # twice and end up as "<" instead of "&lt;":
  s = s.replace("&amp;", "&")
  return s
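On Python 3.4 and later the standard library's html module provides html.unescape, which also handles quotes and numeric references, so the hand-written function may not be needed there. A short sketch, with a made-up sample string:

```python
import html

# Made-up sample containing &lt;, &gt;, &amp;, &quot; and a doubly escaped "&lt;".
escaped = "&lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt; &amp;lt;"

# html.unescape works in a single pass, so "&amp;lt;" becomes "&lt;", not "<".
plain = html.unescape(escaped)
print(plain)

# html.escape is the inverse direction; by default it escapes quotes as well.
print(html.escape(plain) == escaped)
```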

A format string pitfall

>>> print("%d/%d" % 3, 7)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: not enough arguments for format string
>>> print("%d/%d" % (3, 7))
3/7
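The first call fails because % binds more tightly than the comma: the expression is read as ("%d/%d" % 3), 7, so the format string receives only one value. The sketch below reproduces the error outside of print and shows the tuple form; fraction is a hypothetical helper name used only for illustration.

```python
def fraction(numerator, denominator):
    """Format two integers as "n/d", passing both values as one tuple."""
    return "%d/%d" % (numerator, denominator)


error = None
try:
    # Parsed as (("%d/%d" % 3), 7): the % operator sees a single argument.
    "%d/%d" % 3, 7
except TypeError as exc:
    error = str(exc)

print(error)
print(fraction(3, 7))

# A single non-tuple value needs no parentheses ...
print("%d items" % 5)
# ... but a tuple that is itself the value must be wrapped in a one-element tuple.
print("%s" % ((1, 2),))
```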