Ways to strip HTML tags from text in Python
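Method 1: walk the string one character at a time and use a flag to skip everything between '<' and '>' (test is the HTML string to clean).
In : str_ = ''
...: flag = 1                  # 1 = outside a tag, 0 = inside a tag
...: for ele in test:
...:     if ele == "<":
...:         flag = 0
...:     elif ele == '>':
...:         flag = 1
...:         continue          # do not copy the closing '>'
...:     if flag == 1:
...:         str_ += ele
...:
In : str_
Out: 'just for testjust for testtest'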
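A variant of the same loop that puts a space where each tag ended instead of dropping the tag silently:
In : str_ = ''
...: flag = 1
...: for ele in test:
...:     if ele == "<":
...:         flag = 0
...:     elif ele == '>':
...:         flag = 1
...:         ele = ' '         # the closing '>' is copied as a space
...:     if flag == 1:
...:         str_ += ele
...:
In : str_
Out: ' just for test just for testtest '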
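The two loops above can be folded into one reusable function. This is only a minimal sketch: the name strip_tags, its replacement parameter, and the sample string are all made up for illustration, and unlike the loops above it keeps a stray '>' that appears outside any tag.

```python
def strip_tags(text, replacement=""):
    """Drop everything between '<' and '>' using a single flag."""
    out = []
    inside_tag = False
    for ch in text:
        if ch == "<":
            inside_tag = True
        elif ch == ">" and inside_tag:
            # End of a tag: emit the replacement ("" or " ") in its place.
            inside_tag = False
            out.append(replacement)
        elif not inside_tag:
            # Ordinary text, including a '>' that does not close a tag.
            out.append(ch)
    return "".join(out)


sample = "<p>just for test</p><br><b>test</b>"  # hypothetical input
print(strip_tags(sample))        # tags dropped
print(strip_tags(sample, " "))   # each tag replaced by a space
```

Collecting the pieces in a list and joining once avoids repeated string concatenation on long inputs.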
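Method 2: use a lookbehind and a lookahead to match the text between each '>' and the next '<', then join the pieces.
In : import re
In : pat = re.compile(r'(?<=>).*?(?=<)')
In : pat.findall(test)
Out: ['just for test', '', '', 'just for test', '', 'test']
In : ''.join(pat.findall(test))
Out: 'just for testjust for testtest'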
Method 3: the same idea with a capturing group instead of lookarounds.
pat = re.compile('>(.*?)<')
''.join(pat.findall(test))
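One thing to watch for with the findall approaches: they only return text that sits between a '>' and a later '<', so anything before the first tag or after the last tag is lost, and without re.S the '.' will not cross a line break. The sketch below illustrates this on a made-up sample string; the variable names are illustrative only.

```python
import re

sample = "intro <b>bold\ntext</b> outro"  # hypothetical input

# Without re.S, '.' cannot match the newline, so nothing is found.
print(re.findall(r">(.*?)<", sample))

# With re.S the text inside the tag is found, but "intro " and " outro"
# are still dropped because they are not enclosed by '>' ... '<'.
pat = re.compile(r">(.*?)<", re.S)
between_tags = "".join(pat.findall(sample))
print(repr(between_tags))

# Removing the tags themselves keeps the surrounding text.
tags_removed = re.sub(r"<[^>]+>", "", sample)
print(repr(tags_removed))
```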
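Method 4: delete the tags themselves with sub, which also keeps any text before the first tag and after the last one.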
In : pat = re.compile('<[^>]+>', re.S)
In : pat.sub('', test)
Out: 'just for testjust for testtest'
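The regex in Method 4 treats the first '>' it sees as the end of a tag, so it can cut a tag short when an attribute value contains '>', and it leaves entities such as &lt; undecoded. If that matters, one possible alternative (not part of the methods above) is the standard library's html.parser. The class name TextExtractor, the helper html_to_text, and the sample string are assumptions made for this sketch.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes; tags and comments are ignored."""

    def __init__(self):
        # convert_charrefs=True decodes entities like &lt; before handle_data.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end
    return "".join(parser.chunks)


# Hypothetical input: a '>' inside an attribute, an entity, and a comment.
sample = '<a title="x > y">1 &lt; 2</a><!-- note --><p>done</p>'
print(html_to_text(sample))
```

Note that this sketch also returns the contents of script and style elements as text, so those would need extra handling on real pages.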