
The Tradition of Sharing

Help your friends and juniors by posting answers to the questions you know. Also post questions that are not yet available.


To start with, Sr2Jr’s first step is to reduce the expenses related to education. To achieve this goal, Sr2Jr organizes textbook questions and answers. Sr2Jr is community based and needs your support to fill in the questions and answers. The questions and answers posted will be available free of cost to all.

 

Authors:
Walter Savitch, Kenrick Mock
Chapter:
Standard Template Library
Exercise:
Programming Projects
Question: 8 | ISBN: 9780132846813 | Edition: 5

Question

The field of information retrieval is concerned with finding relevant electronic documents based on a query. For example, given a group of keywords, a search engine retrieves Web pages (documents) and displays them in order, with the most relevant documents listed first. This technology requires a way to compare a document with the query to see which is most relevant to the query.

A simple way to make this comparison is to compute the binary cosine coefficient. The coefficient is a value between 0 and 1, where 1 indicates that the query is very similar to the document and 0 indicates that the query has no keywords in common with the document. This approach treats each document as a set of words. For example, consider the following sample document:

“Cows are big. Cows go moo. I love cows.”

This document would be parsed into keywords where case is ignored and punctuation discarded and turned into the set containing the words “{cows, are, big, go, moo, i, love}”. An identical process is performed on the query.
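The parsing step described above can be sketched as follows. This is only one possible approach, not the textbook's solution: the helper name toWordSet is hypothetical, and it assumes that every non-alphanumeric character (including apostrophes) separates words.

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper (not from the textbook): turn raw text into a set of
// lowercase words. Assumption: any non-alphanumeric character ends a word.
std::set<std::string> toWordSet(const std::string& text)
{
    std::set<std::string> words;
    std::string current;
    for (unsigned char ch : text)
    {
        if (std::isalnum(ch))
        {
            // Ignore case by storing every letter in lowercase.
            current += static_cast<char>(std::tolower(ch));
        }
        else if (!current.empty())
        {
            // Punctuation or whitespace ends the word; the set drops duplicates.
            words.insert(current);
            current.clear();
        }
    }
    if (!current.empty())
    {
        words.insert(current); // flush a word that runs to the end of the text
    }
    return words;
}
```

Applied to the sample document, this should yield the seven words listed above, since the three occurrences of "cows" collapse into one set element.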

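Once we have a query Q represented as a set of words and a document D represented as a set of words, the similarity between the query and document is computed by

$$\mathrm{sim}(Q, D) = \frac{|Q \cap D|}{\sqrt{|Q|}\,\sqrt{|D|}}$$

where |S| denotes the number of words in the set S.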



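For example, if D = {cows, are, big, go, moo, i, love} and Q = {love, holstein, cows} then

$$\mathrm{sim}(Q, D) = \frac{|Q \cap D|}{\sqrt{|Q|}\,\sqrt{|D|}} = \frac{|\{\text{love}, \text{cows}\}|}{\sqrt{3}\,\sqrt{7}} = \frac{2}{\sqrt{21}} \approx 0.44$$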
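The coefficient, the size of the intersection divided by the product of the square roots of the two set sizes, might be computed as in the sketch below. The function name binaryCosine is hypothetical, and returning 0 for an empty query or document is an assumption made here to avoid dividing by zero; the exercise does not specify that case.

```cpp
#include <algorithm>
#include <cmath>
#include <iterator>
#include <set>
#include <string>

// Hypothetical function (not from the textbook): binary cosine coefficient
// of a query set q and a document set d.
double binaryCosine(const std::set<std::string>& q,
                    const std::set<std::string>& d)
{
    // Assumption: treat an empty set as having no similarity to anything.
    if (q.empty() || d.empty())
    {
        return 0.0;
    }

    // std::set iterates in sorted order, which set_intersection requires.
    std::set<std::string> common;
    std::set_intersection(q.begin(), q.end(),
                          d.begin(), d.end(),
                          std::inserter(common, common.begin()));

    double qSize = static_cast<double>(q.size());
    double dSize = static_cast<double>(d.size());
    return common.size() / (std::sqrt(qSize) * std::sqrt(dSize));
}
```

For the sets D and Q above, the intersection holds two words, so the function should return 2 / (sqrt(3) * sqrt(7)), roughly 0.44.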




Write a program that allows the user to input a set of strings that represents a document and a set of strings that represents a query. (If you are more ambitious, you could write a program that parses an actual text file and computes the set of unique strings.) Represent the document and query as an STL set of strings. Then compute and print out the similarity between the query and document using the binary cosine coefficient. The sqrt function is in cmath. Use the generic set_intersection function to compute the intersection of Q and D.
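One way to handle the user-input part is to read a whole line and split it on whitespace. The sketch below is an assumption about the input format (one line per set, words separated by spaces), and the name readWordSet is hypothetical. It does no case folding or punctuation stripping.

```cpp
#include <istream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper (not from the textbook): read one line from the stream
// and return its whitespace-separated words as a set of strings.
// Assumption: the user types each set on a single line.
std::set<std::string> readWordSet(std::istream& in)
{
    std::string line;
    std::getline(in, line);

    // istream_iterator extracts words with operator>>, skipping whitespace.
    std::istringstream words(line);
    return std::set<std::string>(std::istream_iterator<std::string>(words),
                                 std::istream_iterator<std::string>());
}
```

A program could prompt for the document, call readWordSet(std::cin), prompt for the query, call it again, and then pass both sets to whatever function computes the coefficient.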

Here is an example of set_intersection to intersect set A with B and store the result in C, where all sets are sets of strings:


#include <iterator>
#include <algorithm>
#include <set>
#include <string>
...
using std::insert_iterator;
using std::set;
using std::set_intersection;
using std::string;

set<string> A, B, C;
// Code below assumes strings have been inserted into A and B
// Note space between > > in line below (required by compilers before C++11)
insert_iterator<set<string> > cIterator(C, C.begin( ));
// Write every string found in both A and B into C through cIterator
set_intersection(A.begin( ), A.end( ),
                 B.begin( ), B.end( ),
                 cIterator);
// set C now contains the intersection of A and B



Sorry the answer is not available at the moment…

If you are able to find the answer, please make sure to post it here, so that your juniors have a smile on their lips and feel happy.

Spread the 'tradition of sharing'.