Issue
I wrote this piece of code to remove stop words from my text.
Problem: the code removes stop words correctly, but it also removes words such as ant and ide when they appear in my text, because ant is contained in the stop words important and want, and ide is contained in side. I don't want to have to split words into letters to remove stop words.
String sCurrentLine;
List<String> stopWordsofwordnet = new ArrayList<>();
FileReader fr = new FileReader("G:\\stopwords.txt");
BufferedReader br = new BufferedReader(fr);
while ((sCurrentLine = br.readLine()) != null)
{
    stopWordsofwordnet.add(sCurrentLine);
}
//out.println("<br>"+stopWordsofwordnet);
List<String> wordsList = new ArrayList<>();
String text = request.getParameter("textblock");
text = text.trim().replaceAll("[\\s,;]+", " ");
String[] words = text.split(" ");
// wordsList.addAll(Arrays.asList(words));
for (String word : words) {
    wordsList.add(word);
}
out.println("<br>");
//remove stop words here from the temp list
for (int i = 0; i < wordsList.size(); i++)
{
    // get the item as string
    for (int j = 0; j < stopWordsofwordnet.size(); j++)
    {
        if (stopWordsofwordnet.get(j).contains(wordsList.get(i).toLowerCase()))
        {
            out.println(wordsList.get(i) + " ");
            wordsList.remove(i);
            i--;
            break;
        }
    }
}
out.println("<br>");
for (String str : wordsList) {
    out.print(str + " ");
}
Solution
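The words are removed because contains() is called on the stop word, so a word is dropped whenever it is a substring of any stop word (ant is a substring of important). Comparing whole words fixes that. Your code is also more complicated than it needs to be, and can be reduced to this, where the case-insensitive set only matches complete words: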
// Load stop words from file
Set<String> stopWords = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
stopWords.addAll(Files.readAllLines(Paths.get("G:\\stopwords.txt")));
// Get text and split into words
String text = request.getParameter("textblock");
List<String> wordsList = new ArrayList<>(Arrays.asList(
text.replaceAll("[\\s,;]+", " ").trim().split(" ")));
// Remove stop words from list of words
wordsList.removeAll(stopWords);
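To check the behavior without a servlet or a stop word file, the same approach can be wrapped in a small standalone program. This is only a sketch: the class name StopWordDemo, the helper removeStopWords, and the hard-coded stop word list are made up for illustration and stand in for request.getParameter("textblock") and G:\stopwords.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class StopWordDemo {

    // Hypothetical helper: returns the words of text that are not stop words.
    static List<String> removeStopWords(String text, Collection<String> stopWordLines) {
        // Case-insensitive set, so "The" and "the" count as the same stop word
        Set<String> stopWords = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        stopWords.addAll(stopWordLines);

        // Split on runs of whitespace, commas and semicolons
        List<String> words = new ArrayList<>(Arrays.asList(
                text.replaceAll("[\\s,;]+", " ").trim().split(" ")));

        // Removes only words that equal a stop word as a whole (ignoring case)
        words.removeAll(stopWords);
        return words;
    }

    public static void main(String[] args) {
        // Assumed stop word list, standing in for the contents of the file
        List<String> stopWords = Arrays.asList("important", "want", "side", "the", "is");

        // "ant" and "IDE" are kept even though they occur inside stop words
        System.out.println(removeStopWords("The ant is on the IDE side", stopWords));
        // prints [ant, on, IDE]
    }
}
```

If you would rather keep your original loops, replacing the contains() test with stopWordsofwordnet.get(j).equalsIgnoreCase(wordsList.get(i)) should give the same whole-word matching.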
Answered By - Andreas
Answer Checked By - Clifford M. (JavaFixing Volunteer)