Preventing XSS injection in Python – 源码巴士

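What is an XSS injection attack
See this article for an explanation.
The core defense is to escape specific characters, either on the back end before the data passes through certain interfaces, or on the front end. This article covers only back-end escaping and unescaping.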
Escaping
The quote argument controls whether double quotes are escaped as well:

>>> import cgi
>>> cgi.escape('<script>&"', quote=True)
'&lt;script&gt;&amp;&quot;'
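cgi.escape dates from Python 2: it was deprecated in Python 3.2 and removed in 3.8. On Python 3 the standard replacement is html.escape, whose quote argument defaults to True and also escapes the single quote (as &#x27;). A minimal sketch; render_comment is a hypothetical helper used only to show escaping at the point where untrusted text is inserted into HTML:

```python
import html


def render_comment(user_text):
    """Hypothetical helper: embed untrusted text in an HTML fragment.

    html.escape converts &, < and > and, because quote=True is the
    default, also " and ', so the text is safe both in element content
    and inside quoted attribute values.
    """
    return '<p class="comment">' + html.escape(user_text) + '</p>'


payload = '<script>alert("x")</script>'
print(html.escape('<script>&"'))         # &lt;script&gt;&amp;&quot;
print(html.escape("it's", quote=False))  # it's  (quotes left alone)
print(render_comment(payload))
```

The last call prints the payload as inert text: the angle brackets and quotes arrive in the browser as entities instead of markup.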

Unescaping
There are several ways to do this. For ASCII-only (pure English) content, HTMLParser works:

>>> from HTMLParser import HTMLParser
>>> HTMLParser.unescape.__func__(HTMLParser, '&lt;script&gt;&amp;&quot;')
u'<script>&"'
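HTMLParser.unescape is Python 2 era as well; in Python 3 it was deprecated in 3.4 and removed in 3.9. The supported replacement is html.unescape, which works on str (always Unicode in Python 3) and decodes all HTML5 named references plus numeric ones such as &#39; and &#x27;. A minimal sketch:

```python
import html

escaped = '&lt;script&gt;&amp;&quot;'
print(html.unescape(escaped))                # <script>&"

# Numeric references and named references beyond the basic five
# are decoded too.
print(html.unescape('&#x27;&#39;&eacute;'))  # ''é

# Unescaping is single-level: &amp;lt; becomes &lt;, not <.
print(html.unescape('&amp;lt;'))             # &lt;
```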

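The catch is that the result is unicode. When the input is a byte string containing non-ASCII bytes, Python 2 implicitly decodes it as ASCII and fails, for example: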

>>> HTMLParser.unescape.__func__(HTMLParser, 'ss&amp;\xe9')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 475, in unescape
    return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 155, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
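The error comes from mixing a unicode result with a byte string that contains the byte \xe9 (é in Latin-1). Another way around it is to decode the bytes explicitly with their real encoding before unescaping. A sketch in Python 3, assuming the data is Latin-1 (that encoding is an assumption; use whatever encoding the data actually has); decode_and_unescape is a hypothetical helper:

```python
import html


def decode_and_unescape(raw, encoding):
    """Hypothetical helper: unescape HTML entities in a byte string.

    Decoding with the data's real encoding first avoids the implicit
    ASCII decode that raises UnicodeDecodeError in the Python 2 example.
    Returns a str; encode it again only if bytes are really needed.
    """
    return html.unescape(raw.decode(encoding))


raw = b'ss&amp;\xe9'  # \xe9 is "é" in Latin-1 (assumed encoding)
print(decode_and_unescape(raw, 'latin-1'))  # ss&é
```

Returning str rather than re-encoding avoids a second failure mode: an entity such as &euro; decodes to a character that Latin-1 cannot represent.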

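Instead, the XML module's unescape can be used. It does plain string replacement and returns the same type it was given, so byte strings are never forced to unicode: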

>>> from xml.sax.saxutils import unescape
>>> unescape('&lt;script&gt;&amp;&quot;', {"&apos;": "'", "&quot;": '"'})
'<script>&"'

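The dictionary in the second argument holds extra, user-defined replacements. By default saxutils.unescape only handles &amp;, &lt;, and &gt;, so the quote entities &quot; and &apos; have to be supplied.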
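xml.sax.saxutils also provides escape, which takes the same kind of extra-entity dictionary, so escaping and unescaping can be paired symmetrically. A sketch in Python 3 (both functions also exist in Python 2); QUOTE_ESCAPES, QUOTE_UNESCAPES, xml_escape and xml_unescape are illustrative names, not part of the library. Because saxutils.unescape only knows the entities it is given, numeric references such as &#x27; (which html.escape emits for ') must be listed explicitly:

```python
from xml.sax.saxutils import escape, unescape

# Illustrative mappings (hypothetical names): saxutils handles only
# &amp;, &lt; and &gt; by default, so quotes must be supplied.
QUOTE_ESCAPES = {'"': '&quot;', "'": '&apos;'}
QUOTE_UNESCAPES = {
    '&quot;': '"',
    '&apos;': "'",
    '&#x27;': "'",  # form produced by html.escape
    '&#39;': "'",
}


def xml_escape(text):
    # escape() replaces & first, so the & in the entities it adds
    # is not escaped a second time.
    return escape(text, QUOTE_ESCAPES)


def xml_unescape(text):
    # unescape() replaces &amp; last, so &amp;lt; becomes &lt;, not <.
    return unescape(text, QUOTE_UNESCAPES)


original = '<script>&"\''
escaped = xml_escape(original)
print(escaped)                   # &lt;script&gt;&amp;&quot;&apos;
print(xml_unescape(escaped))     # <script>&"'
print(xml_unescape('&amp;lt;'))  # &lt;
```

Other HTML named references such as &nbsp; or &eacute; are still left untouched unless they are added to the dictionary as well.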

Original article: https://blog.csdn.net/weixin_41571449/article/details/79549693