## Example 1: Removing script tags
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from BeautifulSoup import BeautifulSoup
# The <h1> is left unclosed on purpose; BeautifulSoup closes it when it prints the tree.
html = '<script>a</script>baba<script>b</script><h1>'
soup = BeautifulSoup(html)
[s.extract() for s in soup('script')]
print soup
Output:
baba<h1></h1>
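The same approach removes any other tag, along with its contents.
Calling soup('script') is shorthand for soup.findAll('script'), so you can also replace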
[s.extract() for s in soup('script')]
with the equivalent:
[s.extract() for s in soup.findAll('script')]
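The listings in this article use the BeautifulSoup 3 import and the Python 2 print statement. As a rough sketch, the same tag removal might look like this under Python 3 with the bs4 package (Beautiful Soup 4), which the original does not use. The helper name strip_tags and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed


def strip_tags(html, names):
    """Return html with every tag listed in names removed, contents included.

    strip_tags is a hypothetical helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all is the bs4 spelling of findAll; it accepts a list of tag names.
    for tag in soup.find_all(names):
        # decompose() removes the tag and destroys it; extract() would also
        # work and returns the removed tag in case it is still needed.
        tag.decompose()
    return str(soup)


sample = "<script>a</script>baba<style>p {}</style><h1>hi, world</h1>"
cleaned = strip_tags(sample, ["script", "style"])
print(cleaned)
```

Passing a list of names handles several unwanted tags in one pass, which is one way to apply the "other tags" remark above.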
## Example 2: Removing comments
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from BeautifulSoup import BeautifulSoup, Comment
data = """<div class="foo">
cat dog sheep goat
<!--
<p>test</p>
-->
</div>"""
soup = BeautifulSoup(data)
# Comments are stored as Comment text nodes; find each one and detach it from the tree.
for element in soup(text=lambda text: isinstance(text, Comment)):
    element.extract()
print soup.prettify()
Output:
<div class="foo">
cat dog sheep goat
</div>
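For comparison, here is a hedged sketch of the same comment removal under Python 3 with bs4 (again an assumption, not the article's setup). It relies on the string= filter of find_all, which recent bs4 releases offer as the newer name for the text= argument. remove_comments is a hypothetical helper.

```python
from bs4 import BeautifulSoup, Comment  # assumption: Beautiful Soup 4 is installed


def remove_comments(html):
    """Return html with all HTML comments removed (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Comment is a subclass of NavigableString, so a string filter can select it.
    comments = soup.find_all(string=lambda s: isinstance(s, Comment))
    for comment in comments:
        comment.extract()  # detach the comment node from the tree
    return str(soup)


data = """<div class="foo">
cat dog sheep goat
<!--
<p>test</p>
-->
</div>"""
result = remove_comments(data)
print(result)
```

Markup that sits inside a comment, such as the p element here, disappears with it, because the parser treats the whole comment as a single text node.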
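Copyright notice: This is an original article by letiantian, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.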