Monday, February 7, 2022

[FIXED] Stanford NER: Can I use two classifiers at once in my code?

February 07, 2022 classification, netbeans, stanford-nlp

Issue

In my code, I get Person recognition from the first classifier. The second classifier is one I trained myself: I added some words for it to recognize (annotate) as Organization, but it does not annotate Person.

I need the benefit of both classifiers. How can I do that?

I'm using NetBeans, and this is the code:

String serializedClassifier = "classifiers/english.all.3class.distsim.crf.ser.gz";
String serializedClassifier2 = "/Users/ha/stanford-ner-2014-10-26/classifiers/dept-model.ser.gz";

if (args.length > 0) {
  serializedClassifier = args[0];
}

AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifier(serializedClassifier);
AbstractSequenceClassifier<CoreLabel> classifier2 = CRFClassifier.getClassifier(serializedClassifier2);

String fileContents = IOUtils.slurpFile("/Users/ha/NetBeansProjects/NERtry/src/nertry/input.txt");
List<List<CoreLabel>> out = classifier.classify(fileContents);
List<List<CoreLabel>> out2 = classifier2.classify(fileContents);

// Print the tags assigned by the stock 3-class model.
for (List<CoreLabel> sentence : out) {
  System.out.print("\nenglish.all.3class.distsim.crf.ser.gz: ");
  for (CoreLabel word : sentence) {
    System.out.print(word.word() + '/' + word.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
  }
}

// Print the tags assigned by the custom model (a separate loop, not nested in the first).
for (List<CoreLabel> sentence2 : out2) {
  System.out.print("\ndept-model.ser.gz");
  for (CoreLabel word2 : sentence2) {
    System.out.print(word2.word() + '/' + word2.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
  }
  System.out.println();
}

The problem comes from the result I get:

english.all.3class.distsim.crf.ser.gz: What/O date/O did/O James/PERSON started/O his/O job/O in/O Human/O and/O Finance/O ?/O 
dept-model.ser.gzWhat/O date/O did/O James/ORGANIZATION started/O his/O job/O in/O Human/ORGANIZATION and/O Finance/ORGANIZATION ?/O

The second classifier tags the name (James) as ORGANIZATION, but I need it to be annotated as PERSON. Any help?

Solution

The class you should use to make this easy is NERClassifierCombiner. It runs the classifiers in order from left to right as you specify them (any number can be passed to the constructor). Later classifiers cannot annotate an entity that overlaps with an entity tagged by an earlier classifier, but are otherwise free to add annotations. So earlier classifiers take precedence in a simple preference ranking. I give a complete code example below.

(If you are training all your own classifiers, it is generally best to train all the entities together, so they can influence each other in the categories assigned. But this simple preference ordering usually works pretty well, and we use it ourselves.)

import edu.stanford.nlp.ie.NERClassifierCombiner;
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreLabel;

import java.io.IOException;
import java.util.List;

public class MultipleNERs {

  public static void main(String[] args) throws IOException {
    String serializedClassifier = "classifiers/english.all.3class.distsim.crf.ser.gz";
    String serializedClassifier2 = "classifiers/english.muc.7class.distsim.crf.ser.gz";

    if (args.length > 0) {
      serializedClassifier = args[0];
    }

    NERClassifierCombiner classifier = new NERClassifierCombiner(false, false, 
            serializedClassifier, serializedClassifier2);

    String fileContents = IOUtils.slurpFile("input.txt");
    List<List<CoreLabel>> out = classifier.classify(fileContents);

    int i = 0;
    for (List<CoreLabel> lcl : out) {
      i++;
      int j = 0;
      for (CoreLabel cl : lcl) {
        j++;
        System.out.printf("%d:%d: %s%n", i, j,
                cl.toShorterString("Text", "CharacterOffsetBegin", "CharacterOffsetEnd", "NamedEntityTag"));
      }
    }
  }

}
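
To make the preference ordering concrete, here is a small stand-alone sketch of the merge rule as described above. It is only an illustration under simplifying assumptions, not the library's actual implementation: the class name PreferenceMergeSketch and the method merge are made up for this example, each classifier's output is reduced to one label per token, and an entity is taken to be a run of identical non-"O" labels. The sample labels in main are copied from the output shown in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; it does not use or mirror Stanford NER code.
class PreferenceMergeSketch {

  static final String BACKGROUND = "O";

  /**
   * Merges token-level labels from two classifiers. Labels from `primary`
   * are always kept; an entity from `secondary` is added only if `primary`
   * left every token of that entity untagged.
   */
  static List<String> merge(List<String> primary, List<String> secondary) {
    if (primary.size() != secondary.size()) {
      throw new IllegalArgumentException("Both label lists must cover the same tokens");
    }
    List<String> merged = new ArrayList<>(primary);
    int i = 0;
    while (i < secondary.size()) {
      String label = secondary.get(i);
      if (label.equals(BACKGROUND)) {
        i++;
        continue;
      }
      // Assumption: a run of identical labels is one entity span.
      int end = i;
      while (end < secondary.size() && secondary.get(end).equals(label)) {
        end++;
      }
      // Accept the span only if it does not overlap anything tagged by `primary`.
      boolean free = true;
      for (int k = i; k < end; k++) {
        if (!primary.get(k).equals(BACKGROUND)) {
          free = false;
          break;
        }
      }
      if (free) {
        for (int k = i; k < end; k++) {
          merged.set(k, label);
        }
      }
      i = end;
    }
    return merged;
  }

  public static void main(String[] args) {
    // Tokens: What date did James started his job in Human and Finance ?
    List<String> threeClass = Arrays.asList(
        "O", "O", "O", "PERSON", "O", "O", "O", "O", "O", "O", "O", "O");
    List<String> deptModel = Arrays.asList(
        "O", "O", "O", "ORGANIZATION", "O", "O", "O", "O",
        "ORGANIZATION", "O", "ORGANIZATION", "O");
    System.out.println(merge(threeClass, deptModel));
  }
}
```

With the 3-class labels listed first, James keeps PERSON, because the custom model's ORGANIZATION tag overlaps an earlier entity, while Human and Finance pick up ORGANIZATION from the custom model. Swapping the argument order would reverse the preference.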

Answered By - Christopher Manning
Answer Checked By - Terry (JavaFixing Volunteer)

This answer was collected from Stack Overflow and tested by JavaFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
