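8. Capture only the words, dropping the whitespace
Use () to capture the word. This version is not quite right because it misses the first word; see item 9.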
```python
import re

s = 'This module provides regular expression matching operations similar to those found in Perl'
pat = r'\s([a-zA-Z]+)'  # a whitespace character, then capture the letters that follow it
r = re.findall(pat, s)
print(r)
# ['module', 'provides', 'regular', 'expression', 'matching', 'operations', 'similar', 'to', 'those', 'found', 'in', 'Perl']
```
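The reason the spaces disappear is that when a pattern contains a capture group, re.findall returns only the text inside the group, not the whole match. A minimal comparison on a shortened string (chosen here only for illustration):

```python
import re

s = 'This module provides'

# Without a group, findall returns each whole match, leading whitespace included
whole = re.findall(r'\s[a-zA-Z]+', s)
# With a group, findall returns only what the parentheses captured
groups = re.findall(r'\s([a-zA-Z]+)', s)

print(whole)   # [' module', ' provides']
print(groups)  # ['module', 'provides']
```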
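9. Include the first word as well
In item 8 the first word was not extracted, because the pattern requires a whitespace character before every word and "This" has none. Adding ? makes the preceding \s optional (it may occur zero times or once). Be careful with ?, though: it has a second meaning, turning a greedy quantifier into a non-greedy one.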
```python
import re

s = 'This module provides regular expression matching operations similar to those found in Perl'
pat = r'\s?([a-zA-Z]+)'  # the whitespace is now optional, so "This" also matches
r = re.findall(pat, s)
print(r)
# ['This', 'module', 'provides', 'regular', 'expression', 'matching', 'operations', 'similar', 'to', 'those', 'found', 'in', 'Perl']
```
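To make the two meanings of ? concrete, here is a small sketch on made-up sample strings: after a plain character, ? makes that character optional; after a quantifier such as *, it makes the quantifier non-greedy.

```python
import re

# ? after a character: that character is optional (zero or one time)
optional = re.findall(r'colou?r', 'color colour')

text = '<b>x</b><b>y</b>'
# * alone is greedy: it runs to the last </b> it can find
greedy = re.findall(r'<b>.*</b>', text)
# *? is non-greedy: it stops at the first </b>
lazy = re.findall(r'<b>.*?</b>', text)

print(optional)  # ['color', 'colour']
print(greedy)    # ['<b>x</b><b>y</b>']
print(lazy)      # ['<b>x</b>', '<b>y</b>']
```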
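10. Split the words directly with split
The approaches above are not a concise way to split words; they are only for demonstration. The simplest way to split words is still the split function.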
```python
import re

s = 'This module provides regular expression matching operations similar to those found in Perl'
pat = r'\s+'  # one or more whitespace characters act as the separator
r = re.split(pat, s)
print(r)
# ['This', 'module', 'provides', 'regular', 'expression', 'matching', 'operations', 'similar', 'to', 'those', 'found', 'in', 'Perl']
```
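One detail worth knowing: if the text starts or ends with whitespace, re.split keeps empty strings at the edges, while the built-in str.split() with no argument drops them. A minimal sketch with a padded sample string (made up for illustration):

```python
import re

s = '  This module  provides '

# re.split produces an empty string before a leading separator and after a trailing one
by_regex = re.split(r'\s+', s)
# str.split() with no argument splits on runs of whitespace and discards empty pieces
by_str = s.split()

print(by_regex)  # ['', 'This', 'module', 'provides', '']
print(by_str)    # ['This', 'module', 'provides']
```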
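11. Extract words starting with m or t, ignoring case
The result below is not what we want, and the culprit is the ?. Because \s? may match nothing, a match can begin in the middle of a word, which produces fragments such as 'tions' and 'milar'. (This code also omits re.I, so 'This' is not found.)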
```python
import re

s = 'This module provides regular expression matching operations similar to those found in Perl'
pat = r'\s?([mt][a-zA-Z]*)'  # find words starting with m or t
r = re.findall(pat, s)
print(r)
# ['module', 'matching', 'tions', 'milar', 'to', 'those']
```
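One way to stop matches from starting mid-word is the word-boundary anchor \b, which is a different technique from the ^ used in item 12. A minimal sketch, also passing re.I so that the capital T in "This" counts:

```python
import re

s = 'This module provides regular expression matching operations similar to those found in Perl'

# \b only matches where a word begins or ends, so [mt] must be the first letter of a word;
# re.I makes [mt] match M and T as well
pat = re.compile(r'\b[mt][a-zA-Z]*', re.I)
r = pat.findall(s)
print(r)  # ['This', 'module', 'matching', 'to', 'those']
```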
12. Use ^ to find the word at the start of the string
By combining items 11 and 12, we can get all the words that start with m or t.
```python
import re

s = 'This module provides regular expression matching operations similar to those found in Perl'
pat = r'^([mt][a-zA-Z]*)\s'  # find a word starting with m or t at the very start of the string
r = re.compile(pat, re.I).findall(s)
print(r)
# ['This']
```
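A sketch of one way to combine the two ideas: require that each word be preceded either by the start of the string (^) or by whitespace (\s). The non-capturing group (?:...) is used so that findall still returns only the word itself.

```python
import re

s = 'This module provides regular expression matching operations similar to those found in Perl'

# (?:^|\s) accepts either the start of the string or one whitespace character,
# without capturing it; only the word in the second pair of parentheses is returned
pat = re.compile(r'(?:^|\s)([mt][a-zA-Z]*)', re.I)
r = pat.findall(s)
print(r)  # ['This', 'module', 'matching', 'to', 'those']
```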
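13. Split first, then pick out the words that meet the requirement
re.match checks whether the pattern matches at the beginning of each word.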
```python
import re

s = 'This module provides regular expression matching operations similar to those found in Perl'
pat = r'\s+'
r = re.split(pat, s)
# keep only the words whose first letter is m, M, t or T
res = [i for i in r if re.match(r'[mMtT]', i)]
print(res)
# ['This', 'module', 'matching', 'to', 'those']
```
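This works because re.match only tries the pattern at the beginning of the string, whereas re.search scans the whole string. A minimal sketch showing the difference on one word from the sentence:

```python
import re

# re.match only tries position 0, where 'o' is not m or t
m = re.match(r'[mt]', 'operations')
# re.search scans forward and finds the 't' inside the word
found = re.search(r'[mt]', 'operations')

print(m)                             # None
print(found.group(), found.start())  # t 5
```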
14. Greedy matching
A greedy quantifier matches as many characters as possible.
```python
import re

content = '<h>ddedadsad</h><div>graph</div>bb<div>math</div>cc'
pat = re.compile(r"<div>(.*)</div>")  # greedy mode
m = pat.findall(content)
print(m)
# ['graph</div>bb<div>math']
```
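For contrast, adding ? after * switches the quantifier to non-greedy mode, so each match stops at the first </div> it reaches. A minimal sketch on the same content string:

```python
import re

content = '<h>ddedadsad</h><div>graph</div>bb<div>math</div>cc'
# .*? is non-greedy: it matches as few characters as possible before </div>
pat = re.compile(r"<div>(.*?)</div>")
m = pat.findall(content)
print(m)  # ['graph', 'math']
```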









