Write a method called stripHtmlTags that accepts a Scanner representing an input file containing an HTML web page as its parameter, then reads that file and prints the file’s text with all HTML tags removed. A tag is any text between the characters < and >. For example, consider the following text:
<html>
<head>
<title>My web page</title>
</head>
<body>
<p>There are many pictures of my cat here,
as well as my <b>very cool</b> blog page,
which contains <font color="red">awesome
stuff about my trip to Vegas.</p>
Here's my cat now:<img src="cat.jpg">
</body>
</html>
If the file contained these lines, your program should output the following text:
My web page
There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.
Here's my cat now:
You may assume that the file is a well-formed HTML document and that there are no < or > characters inside tags.
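Because a tag is simply the text from < to the next >, and tags are assumed to contain no further < or > characters, every tag within a line matches the regular expression <[^>]*>. As a minimal sketch of that idea, one line can be cleaned as shown here. The names TagRegexSketch and stripTags are hypothetical, and the sketch assumes no tag is split across two lines.

```java
// Hypothetical helper for illustration; not part of the exercise's required API.
final class TagRegexSketch {

    // Removes every "<...>" run from a single line of text.
    // Assumes each tag opens and closes on the same line.
    static String stripTags(String line) {
        return line.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        // prints: as well as my very cool blog page,
        System.out.println(stripTags("as well as my <b>very cool</b> blog page,"));
        // prints: Here's my cat now:
        System.out.println(stripTags("Here's my cat now:<img src=\"cat.jpg\">"));
    }
}
```

A regular expression keeps the code short, but it cannot skip a tag that continues onto the next line; a character-by-character scan that remembers whether it is inside a tag can.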
// package chap456;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class StripHTMLTags {

    // Prints the text of the given HTML input with every tag (the text from
    // '<' through the next '>') removed, keeping the original line breaks.
    public static void stripHtmlTags(Scanner input) {
        // True while we are between '<' and '>'. Declared outside the loop so
        // that a tag split across two lines is still skipped in full.
        boolean insideTag = false;
        while (input.hasNextLine()) {
            String htmlLine = input.nextLine();
            StringBuilder text = new StringBuilder();
            for (int i = 0; i < htmlLine.length(); i++) {
                char c = htmlLine.charAt(i);
                if (c == '<') {
                    insideTag = true;   // a tag starts: stop copying characters
                } else if (c == '>') {
                    insideTag = false;  // the tag ends: resume copying
                } else if (!insideTag) {
                    text.append(c);
                }
            }
            // Lines that held only tags are now empty; skip them so the result
            // matches the expected output listed above.
            if (text.length() > 0) {
                System.out.println(text);
            }
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Build the path from its parts so it works on any operating system.
        File resources = new File(System.getProperty("user.dir"), "resources");
        File file = new File(resources, "stripHtml.txt");
        // try-with-resources closes the Scanner (and the file) when done.
        try (Scanner input = new Scanner(file)) {
            stripHtmlTags(input);
        }
    }
}
Input file: stripHtml.txt
<html>
<head>
<title>My web page</title>
</head>
<body>
<p>There are many pictures of my cat here,
as well as my <b>very cool</b> blog page,
which contains <font color="red">awesome
stuff about my trip to Vegas.</p>
Here's my cat now:<img src="cat.jpg">
</body>
</html>
Output:
My web page
There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.
Here's my cat now:
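One way to check a tag-stripping method without a file on disk is to hand it a Scanner built over a String and capture what it writes. The following is only a sketch under assumptions: the class InMemoryCheckSketch and its strip and run methods are hypothetical, and strip takes the destination PrintStream as a second parameter, which the exercise itself does not ask for. This variant writes one output line per input line and drops lines that end up empty.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Scanner;

// Hypothetical test harness for illustration only.
final class InMemoryCheckSketch {

    // Character-by-character tag stripping that writes to a caller-supplied
    // stream, so the result can be inspected instead of going to the console.
    static void strip(Scanner input, PrintStream out) {
        boolean insideTag = false;
        while (input.hasNextLine()) {
            String line = input.nextLine();
            StringBuilder text = new StringBuilder();
            for (char c : line.toCharArray()) {
                if (c == '<') {
                    insideTag = true;
                } else if (c == '>') {
                    insideTag = false;
                } else if (!insideTag) {
                    text.append(c);
                }
            }
            // Use an explicit "\n" so the captured text is the same on every platform.
            if (text.length() > 0) {
                out.print(text + "\n");
            }
        }
    }

    // Runs strip over an in-memory HTML string and returns what was written.
    static String run(String html) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer);
        strip(new Scanner(html), out);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        String html = "<body>\n<p>my <b>very cool</b> blog</p>\n</body>\n";
        System.out.print(run(html));
    }
}
```

Passing the stream in as a parameter is a design choice made here for testability; the method required by the exercise prints to System.out directly.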