
Parse HTML File Using Python Without External Module

I am trying to parse an HTML file using Python without using any external module. The reason is that I am triggering a Jenkins job and running into some import issues with lxml and Beau…

Solution 1:

For a single element, you could use the re module or even plain string methods.

data = '''<tr class="test"><td class="test"><a href="no.html">track</a></td><td class="duration">0.390s</td><td class="zero number">0</td><td class="zero number">0</td><td class="zero number">0</td><td class="passRate">N/A</td></tr><tr class="suite"><td colspan="2" class="totalLabel">Total</td><td class="passed number">271</td><td class="zero number">0</td><td class="failed number">3</td><td class="passRate suite">98%</td></tr>'''

# re module

import re

print(re.search(r'suite">(\d+)%', data).group(1))

# string functions

before = 'passRate suite">'
after  = '%'
start = data.find(before) + len(before)
stop  = data.find(after, start)

print(data[start:stop])
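One caveat with the str.find approach: if the before marker is missing, find returns -1, so start silently becomes len(before) - 1 and the slice returns unrelated text instead of failing. A minimal sketch of a helper that checks for that, assuming you would rather get None than a wrong value (the name text_between is hypothetical, not from the answer):

```python
def text_between(text, before, after, start=0):
    """Return the text between the first `before` marker (at or after `start`)
    and the next `after` marker, or None if either marker is missing."""
    i = text.find(before, start)
    if i == -1:          # marker not found: do not slice from a bogus offset
        return None
    i += len(before)
    j = text.find(after, i)
    if j == -1:
        return None
    return text[i:j]


data = '<td class="passRate suite">98%</td>'
print(text_between(data, 'passRate suite">', '%'))  # 98
print(text_between(data, 'missing">', '%'))         # None
```

Returning None keeps the caller's code short; raising an exception instead would make a changed report layout fail the Jenkins job loudly, which may be preferable.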

EDIT: to get the other values with re:

import re

# Raw strings (r'...') keep \d from being treated as a string escape
print('passed:', re.search(r'passed number">(\d+)', data).group(1))
print('zero:', re.search(r'zero number">(\d+)', data).group(1))
print('failed:', re.search(r'failed number">(\d+)', data).group(1))
print('Rate:', re.search(r'suite">(\d+)', data).group(1))

Output:

passed: 271
zero: 0
failed: 3
Rate: 98
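If you need every value from the totals row, one re.search per class gets repetitive, and it is easy to paste the wrong pattern. A minimal sketch that pulls all cells of the suite row into a dictionary keyed by class name, assuming (as in the sample) that the totals row is `<tr class="suite">` and each class attribute directly precedes the closing > of its td (the helper name suite_totals is hypothetical):

```python
import re

data = ('<tr class="test"><td class="test"><a href="no.html">track</a></td>'
        '<td class="duration">0.390s</td><td class="zero number">0</td>'
        '<td class="zero number">0</td><td class="zero number">0</td>'
        '<td class="passRate">N/A</td></tr>'
        '<tr class="suite"><td colspan="2" class="totalLabel">Total</td>'
        '<td class="passed number">271</td><td class="zero number">0</td>'
        '<td class="failed number">3</td><td class="passRate suite">98%</td></tr>')


def suite_totals(html):
    """Map each cell's class attribute to its stripped text in the suite row."""
    row = re.search(r'<tr class="suite">(.*?)</tr>', html, re.DOTALL)
    if row is None:
        return {}
    # Each match is a (class, text) pair; \s* tolerates whitespace around values.
    cells = re.findall(r'class="([^"]+)">\s*([^<]*?)\s*</td>', row.group(1))
    return dict(cells)


totals = suite_totals(data)
print(totals['passed number'], totals['zero number'],
      totals['failed number'], totals['passRate suite'])  # 271 0 3 98%
```

Restricting the search to the suite row first avoids picking up the "zero number" cells of earlier rows. Values stay strings, and if a class repeated within the row, the later cell would overwrite the earlier one in the dictionary.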

Solution 2:

import re

# HTML_FILE holds the path to the HTML test report (defined elsewhere in the job)
with open(HTML_FILE) as f:
    data = f.read()

# Narrow the search to the text between the "Total" label and the first "%<" after it
before = '<td colspan="2" class="totalLabel">Total</td>'
after = '%<'
start = data.find(before) + len(before)
stop = data.find(after, start)

suite_filter = data[start:stop].strip()

# \s* tolerates spaces or newlines between the tag and the number
RATE_PASS = re.search(r'suite">\s*(\d+)', suite_filter).group(1)
PASS_COUNT = re.search(r'passed number">(\d+)', suite_filter).group(1)
SKIPPED_COUNT = re.search(r'zero number">(\d+)', suite_filter).group(1)
FAIL_COUNT = re.search(r'failed number">(\d+)', suite_filter).group(1)

TESTS_TOTAL = int(PASS_COUNT) + int(SKIPPED_COUNT) + int(FAIL_COUNT)

print(RATE_PASS, PASS_COUNT, SKIPPED_COUNT, TESTS_TOTAL)

Here is my solution, based on the suggestions from @furas. Any improvements or suggestions are welcome.
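Since the constraint is "no external modules" rather than "no parser", the standard library's html.parser.HTMLParser is another option: it ships with Python, so the Jenkins job needs no extra installs, and it is less sensitive to attribute order and whitespace than matching raw strings. This is a swapped-in technique, not what the answers above used. A minimal sketch, assuming the report marks the totals row with class="suite" and uses the same td classes as the sample (SuiteRowParser and parse_totals are hypothetical names):

```python
from html.parser import HTMLParser


class SuiteRowParser(HTMLParser):
    """Collect {class attribute: cell text} for the <td> cells inside <tr class="suite">."""

    def __init__(self):
        super().__init__()
        self.in_suite_row = False
        self.current_class = None  # class of the <td> currently open, if any
        self.totals = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'tr':
            # Assumes the totals row is marked exactly class="suite", as in the sample.
            self.in_suite_row = attrs.get('class') == 'suite'
        elif tag == 'td' and self.in_suite_row:
            self.current_class = attrs.get('class')
            if self.current_class is not None:
                self.totals.setdefault(self.current_class, '')

    def handle_endtag(self, tag):
        if tag == 'td':
            self.current_class = None
        elif tag == 'tr':
            self.in_suite_row = False

    def handle_data(self, data):
        if self.current_class is not None:
            # Accumulate raw text; it is stripped once parsing is done.
            self.totals[self.current_class] += data


def parse_totals(html):
    """Return the suite-row cells as {class: stripped text}."""
    parser = SuiteRowParser()
    parser.feed(html)
    parser.close()
    return {cls: text.strip() for cls, text in parser.totals.items()}


report = ('<table><tr class="suite"><td colspan="2" class="totalLabel">Total</td>'
          '<td class="passed number">271</td><td class="zero number">0</td>'
          '<td class="failed number">3</td>'
          '<td class="passRate suite">\n  98%\n</td></tr></table>')
totals = parse_totals(report)
print(totals['passed number'], totals['failed number'], totals['passRate suite'])  # 271 3 98%
```

For the real report you would pass the file contents, for example with open(HTML_FILE) as f: totals = parse_totals(f.read()), reusing the HTML_FILE path from Solution 2.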
