
Why Is This Tag Empty When Parsed With Beautiful Soup?

I am parsing https://au.finance.yahoo.com/q/is?s=AAPL with Beautiful Soup. I am attempting to get the total revenue for 27/09/2014 (42,123,000), which is one of the first ...

Solution 1:

There is no <tbody> tag in the HTML.

If you look at the page in a browser (e.g. with Chrome developer tools), it looks like there is a <tbody> tag, but the browser inserts that tag into the DOM while parsing; it is not in the HTML source.

Try omitting both tags in your search chain. I am certain the first one isn't there and (although the HTML is hard to read) I'm pretty sure the second isn't there either.

Update: Here is the HTML, beginning with the table you are interested in:

<TABLE class="yfnc_tabledata1" width="100%" cellpadding="0" cellspacing="0" border="0">
  <TR><TD>
    <TABLE width="100%" cellpadding="2" ...>
      <TR class="yfnc_modtitle1" style="border-top:none;">
        <td colspan="2" style="border-top:2px solid #000;"><small><span class="yfi-module-title">Period Ending</span></small></td>
        <th scope="col" style="border-top:2px ...">27/09/2014</th>
        <th scope="col" style="border-top:2px ...">28/06/2014</th>
          ...

so no <tbody> tags.
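To illustrate the point, here is a minimal sketch using a hypothetical, heavily simplified table (not the real Yahoo markup) and Python's built-in "html.parser" backend. It assumes, like the page above, that the source contains no <tbody>; the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: like the Yahoo page, it has no <tbody>.
html = """
<table class="yfnc_tabledata1">
  <tr><td>Total Revenue</td><td>42,123,000</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="yfnc_tabledata1")

# A search chain copied from the browser's DOM view fails here, because
# this parser does not invent a <tbody> the way a browser does.
print(table.find("tbody"))  # None

# Going from the table straight to the rows works.
cells = [td.get_text(strip=True) for td in table.find("tr").find_all("td")]
print(cells)  # ['Total Revenue', '42,123,000']
```

Note that some parser backends (html5lib, for instance) follow browser behaviour and do add a <tbody>, so the result can depend on which parser Beautiful Soup is using.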

Solution 2:

Let's be specific and practical.

The idea is to find the Total Revenue label and get the next cell's text using .next_sibling:

import re

# soup is the BeautifulSoup object built from the page
table = soup.find("table", class_="yfnc_tabledata1")
total_revenue_label = table.find(text=re.compile(r'Total Revenue'))
print(total_revenue_label.parent.parent.next_sibling.get_text(strip=True))

Demo:

>>> import re
>>> import requests
>>> import bs4
>>>
>>> page = requests.get("https://au.finance.yahoo.com/q/is?s=AAPL")
>>> soup = bs4.BeautifulSoup(page.content)
>>>
>>> table = soup.find("table", class_="yfnc_tabledata1")
>>> total_revenue_label = table.find(text=re.compile(r'Total Revenue'))
>>> total_revenue_label.parent.parent.next_sibling.get_text(strip=True)
42,123,000
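One caveat with .next_sibling: if the markup has whitespace or newlines between cells, the next sibling is a text node rather than the next <td>. A possible variant, sketched below on a hypothetical, simplified row (the real page's markup may differ), uses find_parent() and find_next_sibling() to skip such text nodes.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, simplified row; the real page's markup may differ.
html = """
<table class="yfnc_tabledata1">
  <tr>
    <td colspan="2"><strong>Total Revenue</strong></td>
    <td align="right"><strong>42,123,000</strong></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="yfnc_tabledata1")
label = table.find(string=re.compile(r"Total Revenue"))

# Climb from the text node to its enclosing cell, then take the next <td>,
# ignoring any whitespace-only text nodes in between.
label_cell = label.find_parent("td")
value_cell = label_cell.find_next_sibling("td")
value = value_cell.get_text(strip=True) if value_cell else None
print(value)  # 42,123,000
```

Searching by tag name this way also avoids counting .parent hops, which break as soon as the label is wrapped in one more or one fewer element.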

Solution 3:

To answer your general question:

I suggest the book "Mining the Social Web", second edition, especially chapter 5, "Mining Web Pages".

Source code for the book is available on GitHub.

Solution 4:

I think there are probably better ways of getting the data you want. It has been provided for free for a number of years by a number of institutions. For example, is the information you want in here somewhere?

http://www.afr.com/share_tables/
