Issue
I'm trying to run the Hello World example of the Stanford POS tagger API in Java on French sentences (I used the same .jar from Python and it worked well). Here is my code:
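import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class TextPreprocessor {
    // Load the French UD model shipped with the standalone tagger download
    private static MaxentTagger tagger = new MaxentTagger("../stanford-tagger-4.1.0/stanford-postagger-full-2020-08-06/models/french-ud.tagger");

    public static void main(String[] args) {
        // tagString tokenizes the raw text first, then tags each token
        String taggedString = tagger.tagString("Salut à tous, je suis coincé");
        System.out.println(taggedString);
    }
}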
But I get the following exception:
Loading POS tagger from C:/Users/_Nprime496_/Downloads/Compressed/stanford-tagger-4.1.0/stanford-postagger-full-2020-08-06/models/french-ud.tagger ... done [0.3 sec].
Exception in thread "main" java.lang.IllegalArgumentException: PTBLexer: Invalid options key in constructor: asciiQuotes
    at edu.stanford.nlp.process.PTBLexer.<init>(PTBLexer.java)
    at edu.stanford.nlp.process.PTBTokenizer.<init>(PTBTokenizer.java:285)
    at edu.stanford.nlp.process.PTBTokenizer$PTBTokenizerFactory.getTokenizer(PTBTokenizer.java:698)
    at edu.stanford.nlp.process.DocumentPreprocessor$PlainTextIterator.<init>(DocumentPreprocessor.java:271)
    at edu.stanford.nlp.process.DocumentPreprocessor.iterator(DocumentPreprocessor.java:226)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.tokenizeText(MaxentTagger.java:1148)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger$TaggerWrapper.apply(MaxentTagger.java:1332)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagString(MaxentTagger.java:999)
    at modules.generation.preprocessing.TextPreprocessor.main(TextPreprocessor.java:19)
Can you help me?
Solution
You can use the full CoreNLP package instead of the standalone tagger, with this code:
package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;

import java.util.*;

public class PipelineExample {

    public static String text = "Paris est la capitale de la France.";

    public static void main(String[] args) {
        // set up pipeline properties
        Properties props = StringUtils.argsToProperties("-props", "french");
        // set the list of annotators to run
        props.setProperty("annotators", "tokenize,ssplit,mwt,pos");
        // build pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // create a document object
        CoreDocument document = pipeline.processToCoreDocument(text);
        // display tokens
        for (CoreLabel tok : document.tokens()) {
            System.out.println(String.format("%s\t%s", tok.word(), tok.tag()));
        }
    }
}
You can download CoreNLP here: https://stanfordnlp.github.io/CoreNLP/
Make sure to also download the latest French models jar and put it on your classpath next to the main CoreNLP jar.
I am not sure why your example with the standalone tagger does not work. What jars were you using?
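To help answer the question about jars, here is a minimal sketch that prints where the JVM actually loaded the relevant classes from. It rests on an assumption that nothing above confirms: that more than one Stanford NLP jar may be on the classpath, and that the tagger and the tokenizer come from different versions. The class name JarLocator and the method locate are made up for this illustration; only standard JDK calls are used.

```java
import java.security.CodeSource;

final class JarLocator {

    // Returns the jar or directory a class was loaded from, or a short note if unknown.
    static String locate(String className) {
        try {
            // Load without running static initializers; we only need the class metadata.
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (JDK classes) have no code source.
            return source == null
                    ? "(no code source, e.g. a JDK class)"
                    : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on the classpath)";
        }
    }

    public static void main(String[] args) {
        // The two classes that appear in the stack trace of the question.
        String[] names = {
            "edu.stanford.nlp.process.PTBLexer",
            "edu.stanford.nlp.tagger.maxent.MaxentTagger"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same classpath as your project. If the two lines point to different jars, that would be worth mentioning when you reply with the jars you are using.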
Answered By - StanfordNLPHelp
Answer Checked By - Cary Denson (JavaFixing Admin)