Write a method called stripHtmlTags that accepts a Scanner

Authors: Stuart Reges, Marty Stepp

Chapter: File Processing

Exercise: Exercises

Question: 10 | ISBN: 9780136091813 | Edition: 2

Question

Write a method called stripHtmlTags that accepts a Scanner representing an input file containing an HTML web page as its parameter, then reads that file and prints the file’s text with all HTML tags removed. A tag is any text between the characters < and >. For example, consider the following text:

<html>
<head>
<title>My web page</title>
</head>
<body>
There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.
Here's my cat now:<img src="cat.jpg">
</body>
</html>

If the file contained these lines, your program should output the following text:

My web page
There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.

Here's my cat now:

You may assume that the file is a well-formed HTML document and that there are no < or > characters inside tags.
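
Because a tag is defined as any text between < and >, and no < or > appears inside a tag, the definition can be written as the regular expression `<[^>]*>`. The block below is a minimal sketch of that idea, not necessarily the approach the exercise intends. The class name `RegexTagSketch` and the helper `stripTags` are hypothetical names chosen for illustration, and a `Scanner` over a `String` stands in for the input file.

```java
import java.util.Scanner;

class RegexTagSketch {

    // Hypothetical helper (not part of the exercise statement): removes
    // every "<...>" run from a single line. The pattern reads: '<', then
    // any number of characters that are not '>', then '>'.
    static String stripTags(String line) {
        return line.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        // A Scanner over a String stands in for the HTML input file.
        Scanner input = new Scanner(
                "<title>My web page</title>\n"
                + "Here's my cat now:<img src=\"cat.jpg\">\n");
        while (input.hasNextLine()) {
            System.out.println(stripTags(input.nextLine()));
        }
    }
}
```

Since this sketch works one line at a time, it only removes tags that open and close on the same line.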

Answer

// package chap456;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class StripHTMLTags {

    public static void stripHtmlTags(Scanner input) {

        // inTag is true while we are between '<' and '>'. It is declared
        // outside the line loop so that a tag that continues onto the
        // next line is still skipped in full.
        boolean inTag = false;

        while (input.hasNextLine()) {
            String htmlLine = input.nextLine();

            for (int i = 0; i < htmlLine.length(); i++) {
                char c = htmlLine.charAt(i);
                if (c == '<') {
                    inTag = true;       // start of a tag: stop printing
                } else if (c == '>') {
                    inTag = false;      // end of a tag: resume printing
                } else if (!inTag) {
                    System.out.print(c);
                }
            }

            // Keep the file's own line breaks: one newline per input line,
            // not one per closing '>'.
            System.out.println();
        }

    }

    public static void main(String[] args) throws FileNotFoundException {

        // Build the path with File.separator so it also works outside Windows.
        File file = new File(System.getProperty("user.dir"),
                "resources" + File.separator + "stripHtml.txt");
        // try-with-resources closes the Scanner (and the file) when done.
        try (Scanner input = new Scanner(file)) {
            stripHtmlTags(input);
        }

    }

}

input file: stripHtml.txt

<html>
<head>
<title>My web page</title>
</head>
<body>
<p>There are many pictures of my cat here,
as well as my <b>very cool</b> blog page,
which contains <font color="red">awesome
stuff about my trip to Vegas.</p>
Here's my cat now:<img src="cat.jpg">
</body>
</html>

Output (the blank lines printed for input lines that contain only tags are omitted here):

My web page
There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.
Here's my cat now:
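
The character-scanning idea in the solution can also be packaged so that it returns the stripped text instead of printing it, which makes it easy to try on a `String` without creating a file. The block below is a minimal sketch under that assumption. The class name `StripTagsToStringSketch`, the method `stripToString`, and the variable `inTag` are hypothetical names, not part of the posted answer. The in-tag state is kept across lines, so a tag that is split over two lines is skipped as well.

```java
import java.util.Scanner;

class StripTagsToStringSketch {

    // Hypothetical variant: same scanning idea, but the text is collected
    // in a StringBuilder and returned rather than printed.
    static String stripToString(Scanner input) {
        StringBuilder out = new StringBuilder();
        boolean inTag = false; // kept across lines so multi-line tags are skipped
        while (input.hasNextLine()) {
            String line = input.nextLine();
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (c == '<') {
                    inTag = true;
                } else if (c == '>') {
                    inTag = false;
                } else if (!inTag) {
                    out.append(c);
                }
            }
            out.append('\n'); // one newline per input line
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The <img> tag is deliberately split across two lines.
        String page = "<title>My web page</title>\n"
                + "as well as my <b>very cool</b> blog page,\n"
                + "Here's my cat now:<img\n"
                + "src=\"cat.jpg\">\n";
        System.out.print(stripToString(new Scanner(page)));
    }
}
```

Returning a `String` keeps the scanning logic separate from `System.out`, so the result can be compared directly against an expected value.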
