Class HyphenationCompoundWordTokenFilterFactory

java.lang.Object
org.apache.lucene.analysis.AbstractAnalysisFactory
org.apache.lucene.analysis.TokenFilterFactory
org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilterFactory
All Implemented Interfaces:
ResourceLoaderAware

public class HyphenationCompoundWordTokenFilterFactory extends TokenFilterFactory implements ResourceLoaderAware
Factory for HyphenationCompoundWordTokenFilter.

This factory accepts the following parameters:

  • hyphenator (mandatory): path to the FOP XML hyphenation pattern file. See http://offo.sourceforge.net/hyphenation/.
  • encoding (optional): character encoding of the XML hyphenation file. Defaults to UTF-8.
  • dictionary (optional): file containing a dictionary of words. Defaults to no dictionary.
  • minWordSize (optional): minimum length a word must have to be decomposed. Defaults to 5.
  • minSubwordSize (optional): minimum length of subwords. Defaults to 2.
  • maxSubwordSize (optional): maximum length of subwords. Defaults to 15.
  • onlyLongestMatch (optional): if true, adds only the longest matching subword to the stream. Defaults to false.

 <fieldType name="text_hyphncomp" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.HyphenationCompoundWordTokenFilterFactory" hyphenator="hyphenator.xml" encoding="UTF-8"
         dictionary="dictionary.txt" minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="false"/>
   </analyzer>
 </fieldType>
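
For programmatic use, the same parameters can be passed as string values to the Map<String,String> constructor. The following is a minimal sketch that uses only the JDK, so it stops short of instantiating the factory. The class name HyphenationArgsSketch and the helper buildArgs are hypothetical; the values mirror the documented defaults and the sample file names from the configuration above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HyphenationArgsSketch {

    /**
     * Hypothetical helper: assembles the argument map for the factory's
     * Map<String,String> constructor. All values are strings, as in the
     * XML configuration. The optional values are set to the documented
     * defaults purely for illustration; they could be omitted.
     */
    static Map<String, String> buildArgs(String hyphenatorPath, String dictionaryPath) {
        // "hyphenator" is the only mandatory parameter.
        if (hyphenatorPath == null || hyphenatorPath.isEmpty()) {
            throw new IllegalArgumentException("hyphenator is mandatory");
        }
        // A mutable map, on the assumption that the factory consumes the
        // entries it recognizes.
        Map<String, String> args = new LinkedHashMap<>();
        args.put("hyphenator", hyphenatorPath);
        args.put("encoding", "UTF-8");
        if (dictionaryPath != null) {
            // Optional: leaving it out means no dictionary is used.
            args.put("dictionary", dictionaryPath);
        }
        args.put("minWordSize", "5");
        args.put("minSubwordSize", "2");
        args.put("maxSubwordSize", "15");
        args.put("onlyLongestMatch", "false");
        return args;
    }

    public static void main(String[] unused) {
        // File names are the sample names from the configuration above.
        System.out.println(buildArgs("hyphenator.xml", "dictionary.txt"));
    }
}
```

With Lucene on the classpath, such a map would typically be handed to new HyphenationCompoundWordTokenFilterFactory(args). Because the class is ResourceLoaderAware, inform(ResourceLoader) would presumably need to be called before create(TokenStream) so that the hyphenation file can be loaded.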
Since:
3.1.0
  • Field Details

    • NAME

      public static final String NAME
      SPI name
    • dictionary

      private CharArraySet dictionary
    • hyphenator

      private HyphenationTree hyphenator
    • dictFile

      private final String dictFile
    • hypFile

      private final String hypFile
    • encoding

      private final String encoding
    • minWordSize

      private final int minWordSize
    • minSubwordSize

      private final int minSubwordSize
    • maxSubwordSize

      private final int maxSubwordSize
    • onlyLongestMatch

      private final boolean onlyLongestMatch
  • Constructor Details

    • HyphenationCompoundWordTokenFilterFactory

      public HyphenationCompoundWordTokenFilterFactory(Map<String,String> args)
      Creates a new HyphenationCompoundWordTokenFilterFactory
    • HyphenationCompoundWordTokenFilterFactory

      public HyphenationCompoundWordTokenFilterFactory()
      Default constructor for compatibility with SPI.
  • Method Details