
The Tradition of Sharing

Help your friends and juniors by posting answers to the questions you know. Also post questions that are not yet available.


Sr2Jr's first goal is to reduce the cost of education. To achieve this, Sr2Jr organizes textbook questions and answers. Sr2Jr is community-based and needs your support to fill in the questions and answers. All questions and answers posted here are available free of charge.

Authors:
Stuart Reges, Marty Stepp
Chapter:
File Processing
Exercise:
Exercises
Question:10 | ISBN:9780136091813 | Edition: 2

Question

Write a method called stripHtmlTags that accepts a Scanner representing an input file containing an HTML web page as its parameter, then reads that file and prints the file’s text with all HTML tags removed. A tag is any text between the characters < and >. For example, consider the following text:

<html>
<head>
<title>My web page</title>
</head>
<body>
<p>There are many pictures of my cat here,
as well as my <b>very cool</b> blog page,
which contains <font color="red">awesome
stuff about my trip to Vegas.</p>
Here's my cat now:<img src="cat.jpg">
</body>
</html>

If the file contained these lines, your program should output the following text:
My web page
There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.

Here's my cat now:
You may assume that the file is a well-formed HTML document and that there are no < or > characters inside tags.
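The exercise's assumption that no `<` or `>` appears inside a tag means every tag matches the pattern "`<`, then any characters except `>`, then `>`". The sketch below is a hedged alternative, not the textbook's intended loop-based solution. It removes tags with a regular expression, one line at a time. It additionally assumes that no tag is split across two lines, which holds for the sample page. `RegexStripHtml` and `stripLine` are hypothetical names chosen for illustration.

```java
import java.util.Scanner;

// Alternative sketch (hypothetical class): strip tags with a regular expression.
// Assumes no '<' or '>' inside tags (as the exercise states) and, additionally,
// that no tag spans two lines.
public class RegexStripHtml {

    // '<', then any run of characters other than '>', then '>'
    private static final String TAG_PATTERN = "<[^>]*>";

    // Returns one line of HTML with all of its tags removed.
    public static String stripLine(String line) {
        return line.replaceAll(TAG_PATTERN, "");
    }

    // Prints each line of the input with its tags removed, one output line per input line.
    public static void stripHtmlTags(Scanner input) {
        while (input.hasNextLine()) {
            System.out.println(stripLine(input.nextLine()));
        }
    }

    public static void main(String[] args) {
        String page = "<title>My web page</title>\n"
                + "as well as my <b>very cool</b> blog page,\n"
                + "Here's my cat now:<img src=\"cat.jpg\">";
        // prints: My web page / as well as my very cool blog page, / Here's my cat now:
        stripHtmlTags(new Scanner(page));
    }
}
```

The regex keeps the method short. However, it cannot skip a tag whose `<` and `>` fall on different lines, because each line is processed in isolation.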


Answer

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class StripHTMLTags {

    // Reads an HTML file line by line and prints its text with every tag
    // (anything from '<' to the next '>') removed. Each input line produces
    // exactly one output line, so the original line breaks are preserved.
    public static void stripHtmlTags(Scanner input) {
        // true while we are between '<' and '>'; kept across lines so that a
        // tag split over two lines is still skipped
        boolean insideTag = false;

        while (input.hasNextLine()) {
            String htmlLine = input.nextLine();

            for (int i = 0; i < htmlLine.length(); i++) {
                char ch = htmlLine.charAt(i);
                if (ch == '<') {
                    insideTag = true;          // a tag starts: stop printing
                } else if (ch == '>') {
                    insideTag = false;         // the tag ends: resume printing
                } else if (!insideTag) {
                    System.out.print(ch);      // ordinary text outside any tag
                }
            }
            System.out.println();              // end the output line once per input line
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Build the path with File's own constructor and File.separator
        // so that it works on any operating system.
        File file = new File(System.getProperty("user.dir"),
                "resources" + File.separator + "stripHtml.txt");
        Scanner input = new Scanner(file);
        stripHtmlTags(input);
        input.close();
    }
}

Input file (resources/stripHtml.txt):

<html>
<head>
<title>My web page</title>
</head>
<body>
<p>There are many pictures of my cat here,
as well as my <b>very cool</b> blog page,
which contains <font color="red">awesome
stuff about my trip to Vegas.</p>
Here's my cat now:<img src="cat.jpg">
</body>
</html>

Output (each input line that contains only tags prints as an empty line):


My web page


There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.
Here's my cat now:



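A method that only prints is awkward to check automatically. The sketch below is a hedged, testable variant of the same character-by-character idea. It collects the text in a `StringBuilder` and returns it instead of printing, and it keeps the in-tag flag across lines so that a tag broken over two lines is still skipped. `HtmlTagStripper` and `stripTags` are hypothetical names, and joining lines with `'\n'` is an assumption made for easy comparison in tests.

```java
import java.util.Scanner;

// Hypothetical helper class: returns the tag-free text instead of printing it.
public class HtmlTagStripper {

    // Removes every '<'...'>' tag from the input and returns the remaining text,
    // ending each input line with '\n'. insideTag is kept across lines, so a tag
    // that starts on one line and ends on the next is skipped completely.
    public static String stripTags(Scanner input) {
        StringBuilder result = new StringBuilder();
        boolean insideTag = false;
        while (input.hasNextLine()) {
            String line = input.nextLine();
            for (int i = 0; i < line.length(); i++) {
                char ch = line.charAt(i);
                if (ch == '<') {
                    insideTag = true;        // start skipping tag characters
                } else if (ch == '>') {
                    insideTag = false;       // tag closed; keep text again
                } else if (!insideTag) {
                    result.append(ch);       // text outside any tag
                }
            }
            result.append('\n');             // one output line per input line
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // The <font ...> tag is split across two lines here.
        String page = "<p>which contains <font\ncolor=\"red\">awesome</p>";
        System.out.print(stripTags(new Scanner(page)));
    }
}
```

Because the method takes any `Scanner`, it can be exercised with `new Scanner(someString)` in a test, or with `new Scanner(new File(...))` for a real page.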