How To Get Orphaned Text With Jsoup?
I have an html: This is the first text More text here Another line of text Text in the span Another text in span
Solution 1:
I would use a recursive method that takes your starting tag and iterates over its child nodes. For each TextNode, print its contents. For each Element, recurse into its child nodes.
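import java.io.File;
import java.io.IOException;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

// The class name is arbitrary; it only makes the two methods compilable.
public class PrintTagText
{
    public static void main(String[] args) throws IOException
    {
        // I put your HTML in the body tag in a local file
        Document doc = Jsoup.parse(new File("input/20160505.html"), "UTF-8");
        Elements elements = doc.getElementsByTag("body");
        Element rootTag = elements.get(0);
        printTextOfTag(rootTag);
    }

    // Walks the children of currentTag depth-first and prints every text node.
    public static void printTextOfTag(Element currentTag)
    {
        List<Node> nodes = currentTag.childNodes();
        for (Node n : nodes)
        {
            if (n instanceof TextNode)
            {
                // Text that sits directly inside currentTag
                System.out.println(((TextNode) n).text());
            }
            else if (n instanceof Element)
            {
                // Descend into the nested tag
                printTextOfTag((Element) n);
            }
        }
    }
}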
Output
This is the first text
More text here Another line of text
Text in the span
Another text in span
This is another line
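If you want the text collected rather than printed, or only the text that sits directly inside one tag (the "orphaned" text), the same idea can be sketched as below. This is a minimal sketch, not the code from the answer above: the class name `OrphanTextDemo`, the method names `allText` and `orphanText`, and the sample HTML string are all made up for illustration. It relies on jsoup's `Element.textNodes()` and `TextNode.isBlank()`, and it skips whitespace-only text nodes, which the method above would print as empty lines.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical demo class; names and sample HTML are assumptions.
public class OrphanTextDemo {

    // Depth-first walk: every non-blank text node under root, in document order.
    public static List<String> allText(Element root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Element current, List<String> out) {
        for (Node n : current.childNodes()) {
            if (n instanceof TextNode) {
                TextNode t = (TextNode) n;
                if (!t.isBlank()) {          // skip whitespace-only nodes
                    out.add(t.text().trim());
                }
            } else if (n instanceof Element) {
                collect((Element) n, out);   // recurse into nested tags
            }
        }
    }

    // Only the text that is a direct child of root, ignoring nested tags.
    public static List<String> orphanText(Element root) {
        List<String> out = new ArrayList<>();
        for (TextNode t : root.textNodes()) {
            if (!t.isBlank()) {
                out.add(t.text().trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample markup, parsed from a string instead of a file.
        String html = "<div>First orphan<p>Inside a paragraph</p>"
                    + "Second orphan<span>Inside a span</span></div>";
        Element div = Jsoup.parse(html).body().child(0);

        System.out.println(allText(div));
        System.out.println(orphanText(div));
    }
}
```

With the sample string above, the first call should list all four pieces of text in document order, and the second only "First orphan" and "Second orphan", since those are the only text nodes that are direct children of the div.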