I have a list of strings that I am trying to parse for data that is meaningful to me. I need an ID number that is contained within the string. Sometimes it might be two or even three of them. Example string might be:
lst1 = [
"(Tower 3rd floor window corner_ : option 3_floor cut out_large : GA - floors : : model lines : id 3999595(tower 4rd floor window corner : option 3_floor: : whatever else is in iit " new floor : id 3999999)",
"(Tower 3rd floor window corner_ : option 3_floor cut out_large : GA - floors : : model lines : id 3998895(tower 4rd floor window corner : option 3_floor: : id 5555456 whatever else is in iit " new floor : id 3998899)"
]
I would like to be able to iterate over that list of strings and extract only those highlighted id values.
Output would be a lst1 = ["3999595; 3999999", "3998895; 5555456; 3998899"] where each id values from the same input string is separated by a colon but list order still matches the input list.
解决方案
You can use id\s(\d{7}) regular expression.
Iterate over items in a list and join the results of findall() call by ;:
import re
lst1 = [
'(Tower 3rd floor window corner_ : option 3_floor cut out_large : GA - floors : : model lines : id 3999595(tower 4rd floor window corner : option 3_floor: : whatever else is in iit " new floor : id 3999999)',
'(Tower 3rd floor window corner_ : option 3_floor cut out_large : GA - floors : : model lines : id 3998895(tower 4rd floor window corner : option 3_floor: : id 5555456 whatever else is in iit " new floor : id 3998899)'
]
pattern = re.compile(r'id\s(\d{7})')
print ["; ".join(pattern.findall(item)) for item in lst1]
prints:
['3999595; 3999999', '3998895; 5555456; 3998899']