Class NRTSuggester
- All Implemented Interfaces:
Accountable
NRTSuggester executes Top N search on a weighted FST specified by a CompletionScorer.
See lookup(CompletionScorer, Bits, TopSuggestDocsCollector) for more implementation details.
FST Format:
- Input: analyzed forms of input terms
- Output: Pair<Long, BytesRef> containing weight, surface form and docID
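As a rough illustration of the BytesRef half of that output pair, the sketch below packs a surface form and a docID into one byte array with a separator between them, then splits them apart again. It is not Lucene's encoder: the class name `PayloadSketch`, the fixed 4-byte big-endian docID, and the separator value used in the usage note are assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Lucene: packs surface + separator + docID. */
final class PayloadSketch {

  /** Assumed layout: UTF-8 surface bytes, one separator byte, 4-byte big-endian docID. */
  static byte[] make(String surface, int docId, int payloadSep) {
    byte[] surfaceBytes = surface.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(surfaceBytes.length + 1 + Integer.BYTES);
    buf.put(surfaceBytes);
    buf.put((byte) payloadSep);
    buf.putInt(docId);
    return buf.array();
  }

  /** The separator sits right before the fixed-width docID, so its index is known. */
  static int separatorIndex(byte[] payload, int payloadSep) {
    int idx = payload.length - Integer.BYTES - 1;
    if (idx < 0 || payload[idx] != (byte) payloadSep) {
      throw new IllegalArgumentException("malformed payload");
    }
    return idx;
  }

  /** Recovers the surface form: everything before the separator. */
  static String parseSurface(byte[] payload, int payloadSep) {
    int idx = separatorIndex(payload, payloadSep);
    return new String(payload, 0, idx, StandardCharsets.UTF_8);
  }

  /** Recovers the docID: the four bytes after the separator. */
  static int parseDocId(byte[] payload, int payloadSep) {
    int idx = separatorIndex(payload, payloadSep);
    return ByteBuffer.wrap(payload, idx + 1, Integer.BYTES).getInt();
  }
}
```

For example, `PayloadSketch.make("new york", 42, 0x1F)` followed by `parseSurface` and `parseDocId` with the same (hypothetical) separator value gives back the original surface form and docID.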
NOTE:
- having too many deletions or using a very restrictive filter can make the search inadmissible due to over-pruning of potential paths. See CompletionScorer.accept(int, Bits)
- when matched documents are arbitrarily filtered (CompletionScorer.filtered set to true), it is assumed that the filter will roughly filter out half the number of documents that match the provided automaton
- lookup performance will degrade as more accepted completions lead to filtered out documents
-
Nested Class Summary
Nested Classes (modifier and type, then description)
- (package private) static final class: Helper to encode/decode payload (surface + PAYLOAD_SEP + docID) output
- private static class: Compares partial completion paths using CompletionScorer.score(float, float), breaks ties comparing path inputs
-
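The ordering described for the path comparator above (score first, path input as tie-breaker) can be pictured with a plain `Comparator`. This is a sketch under assumptions, not the nested class itself: the `Path` type, its fields, the multiplicative `score` stand-in, and the choice to sort higher scores first are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of "compare by score, break ties on path input". */
final class PathComparatorSketch {

  /** Hypothetical partial path: the input consumed so far, plus index weight and query boost. */
  static final class Path {
    final String input;
    final float weight;
    final float boost;

    Path(String input, float weight, float boost) {
      this.input = input;
      this.weight = weight;
      this.boost = boost;
    }
  }

  /** Stand-in for CompletionScorer.score(float, float); the real scoring may differ. */
  static float score(float weight, float boost) {
    return weight * boost;
  }

  /** Higher score first (an assumption); equal scores fall back to comparing inputs. */
  static final Comparator<Path> BY_SCORE_THEN_INPUT = (a, b) -> {
    int cmp = Float.compare(score(b.weight, b.boost), score(a.weight, a.boost));
    return cmp != 0 ? cmp : a.input.compareTo(b.input);
  };

  /** Sorts a copy of the paths and returns their inputs in comparator order. */
  static List<String> sortedInputs(List<Path> paths) {
    List<Path> copy = new ArrayList<>(paths);
    copy.sort(BY_SCORE_THEN_INPUT);
    List<String> inputs = new ArrayList<>();
    for (Path p : copy) {
      inputs.add(p.input);
    }
    return inputs;
  }
}
```

Breaking ties on the path input keeps the ordering deterministic when two paths score the same, which matters when a bounded queue has to decide which of them to drop.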
Field Summary
Fields (modifier, type and field, then description)
- private final FST<PairOutputs.Pair<Long, BytesRef>> fst: FST<Weight,Surface>: input is the analyzed form, with a null byte between terms and a NRTSuggesterBuilder.END_BYTE to denote the end of the input; weight is a long; surface is the original, unanalyzed form followed by the docID
- private static final long MAX_TOP_N_QUEUE_SIZE: Maximum queue depth for TopNSearcher
- private final int maxAnalyzedPathsPerOutput: Highest number of analyzed paths we saw for any single input surface form.
- private final int payloadSep: Separator used between surface form and its docID in the FST output
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
Constructor Summary
Constructors (modifier, then constructor)
- private NRTSuggester(FST<PairOutputs.Pair<Long, BytesRef>> fst, int maxAnalyzedPathsPerOutput, int payloadSep)
-
Method Summary
Methods (modifier, type and method, then description)
- private static double calculateLiveDocRatio(int numDocs, int maxDocs)
- (package private) static long decode(long output)
- (package private) static long encode(long input)
- getChildResources(): Returns nested resources of this class.
- private static Comparator<PairOutputs.Pair<Long, BytesRef>> getComparator()
- private int getMaxTopNSearcherQueueSize(int topN, int numDocs, double liveDocsRatio, boolean filterEnabled): Simple heuristics to try to avoid over-pruning potential suggestions by the TopNSearcher.
- static NRTSuggester load(IndexInput input, CompletionPostingsFormat.FSTLoadMode fstLoadMode)
- void lookup(CompletionScorer scorer, Bits acceptDocs, TopSuggestDocsCollector collector): Collects at most TopSuggestDocsCollector.getCountToCollect() completions that match the provided CompletionScorer.
- long ramBytesUsed(): Return the memory usage of this object in bytes.
- private static boolean shouldLoadFSTOffHeap(IndexInput input, CompletionPostingsFormat.FSTLoadMode fstLoadMode)
-
Field Details
-
fst
FST<Weight,Surface>: input is the analyzed form, with a null byte between terms and a NRTSuggesterBuilder.END_BYTE to denote the end of the input; weight is a long; surface is the original, unanalyzed form followed by the docID
-
maxAnalyzedPathsPerOutput
private final int maxAnalyzedPathsPerOutput
Highest number of analyzed paths we saw for any single input surface form. This can be > 1 when the index analyzer creates graphs or when multiple surface forms yield the same analyzed form.
-
payloadSep
private final int payloadSep
Separator used between surface form and its docID in the FST output
-
MAX_TOP_N_QUEUE_SIZE
private static final long MAX_TOP_N_QUEUE_SIZE
Maximum queue depth for TopNSearcher
NOTE: value should be <= Integer.MAX_VALUE
-
Constructor Details
-
NRTSuggester
private NRTSuggester(FST<PairOutputs.Pair<Long, BytesRef>> fst, int maxAnalyzedPathsPerOutput, int payloadSep)
-
-
Method Details
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
- Specified by:
ramBytesUsed in interface Accountable
-
getChildResources
Description copied from interface: Accountable
Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
- Specified by:
getChildResources in interface Accountable
-
lookup
public void lookup(CompletionScorer scorer, Bits acceptDocs, TopSuggestDocsCollector collector) throws IOException
Collects at most TopSuggestDocsCollector.getCountToCollect() completions that match the provided CompletionScorer.
The CompletionScorer.automaton is intersected with the fst. CompletionScorer.weight is used to compute boosts and/or extract context for each matched partial path. A top N search is executed on fst, seeded with the matched partial paths. Upon reaching a completed path, CompletionScorer.accept(int, Bits) and CompletionScorer.score(float, float) are used on the document id, index weight and query boost to filter and score the entry, before it is collected via TopSuggestDocsCollector.collect(int, CharSequence, CharSequence, float).
- Throws:
IOException
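The last stage of that description (filter each completed path, score it, collect at most N) can be sketched without any FST. The code below is a simplification under stated assumptions, not this method's implementation: `Candidate`, `Hit`, `AcceptFn`, `ScoreFn` and `collectTopN` are hypothetical names, candidates arrive as a plain list instead of FST paths, and a bounded min-heap stands in for the top N search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical, FST-free illustration of filter -> score -> collect at most N. */
final class LookupSketch {

  /** Hypothetical completed path: surface form, owning document, index weight, query boost. */
  static final class Candidate {
    final String surface;
    final int docId;
    final float weight;
    final float boost;

    Candidate(String surface, int docId, float weight, float boost) {
      this.surface = surface;
      this.docId = docId;
      this.weight = weight;
      this.boost = boost;
    }
  }

  /** A collected completion with its final score. */
  static final class Hit {
    final String surface;
    final int docId;
    final float score;

    Hit(String surface, int docId, float score) {
      this.surface = surface;
      this.docId = docId;
      this.score = score;
    }
  }

  /** Stand-in for the accept step (for example, "is this document live and unfiltered?"). */
  interface AcceptFn {
    boolean accept(int docId);
  }

  /** Stand-in for the score step combining index weight and query boost. */
  interface ScoreFn {
    float score(float weight, float boost);
  }

  static List<Hit> collectTopN(
      List<Candidate> candidates, AcceptFn accept, ScoreFn scorer, int countToCollect) {
    // Min-heap on score: the weakest retained hit is always at the head.
    PriorityQueue<Hit> heap = new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    for (Candidate c : candidates) {
      if (!accept.accept(c.docId)) {
        continue; // rejected entries never reach the collector
      }
      heap.add(new Hit(c.surface, c.docId, scorer.score(c.weight, c.boost)));
      if (heap.size() > countToCollect) {
        heap.poll(); // evict the lowest-scoring hit to stay within the limit
      }
    }
    List<Hit> hits = new ArrayList<>(heap);
    hits.sort((a, b) -> Float.compare(b.score, a.score)); // best first
    return hits;
  }
}
```

The sketch also shows why rejected documents are costly: every candidate that fails `accept` was still visited, which is the effect the class-level NOTE describes for heavy deletions or restrictive filters.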
-
getComparator
-
getMaxTopNSearcherQueueSize
private int getMaxTopNSearcherQueueSize(int topN, int numDocs, double liveDocsRatio, boolean filterEnabled)
Simple heuristics to try to avoid over-pruning potential suggestions by the TopNSearcher. Since suggestion entries can be rejected if they belong to a deleted document, the length of the TopNSearcher queue has to be increased by some factor, to account for the filtered out suggestions. This heuristic will try to make the searcher admissible, but the search can still lead to over-pruning.
If a filter is applied, the queue size is increased by half the number of live documents.
The maximum queue size is MAX_TOP_N_QUEUE_SIZE.
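A minimal sketch of such a heuristic follows. The "half the number of live documents" term and the final cap come from the description above; starting from topN times the analyzed-path count, dividing by the live-document ratio, the helper `liveDocRatio`, and passing the cap as a parameter are assumptions made for illustration. The class name `QueueSizeSketch` is hypothetical.

```java
/** Hypothetical illustration of a queue-size heuristic for a pruned top N search. */
final class QueueSizeSketch {

  /** Share of documents that are still live; assumes maxDocs > 0. */
  static double liveDocRatio(int numDocs, int maxDocs) {
    return (double) numDocs / maxDocs;
  }

  /**
   * @param cap upper bound on the result; must itself be <= Integer.MAX_VALUE
   */
  static int maxQueueSize(
      int topN,
      int maxAnalyzedPathsPerOutput,
      int numDocs,
      double liveDocsRatio,
      boolean filterEnabled,
      long cap) {
    if (liveDocsRatio <= 0.0 || liveDocsRatio > 1.0) {
      throw new IllegalArgumentException("liveDocsRatio must be in (0, 1]");
    }
    // Assumed starting point: one slot per analyzed path of each wanted suggestion.
    long size = (long) topN * maxAnalyzedPathsPerOutput;
    // Assumed scaling: the fewer live documents, the more entries will be rejected.
    size = (long) (size / liveDocsRatio);
    if (filterEnabled) {
      // From the description: grow by half the number of live documents.
      size += numDocs / 2;
    }
    return (int) Math.min(cap, size);
  }
}
```

With topN = 5, two analyzed paths per output, 100 live documents and a live ratio of 0.5, this sketch yields 20 without a filter and 70 with one; a large index with a filter enabled simply hits the cap.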
-
calculateLiveDocRatio
private static double calculateLiveDocRatio(int numDocs, int maxDocs)
-
shouldLoadFSTOffHeap
private static boolean shouldLoadFSTOffHeap(IndexInput input, CompletionPostingsFormat.FSTLoadMode fstLoadMode)
-
load
public static NRTSuggester load(IndexInput input, CompletionPostingsFormat.FSTLoadMode fstLoadMode) throws IOException
- Throws:
IOException
-
encode
static long encode(long input)
-
decode
static long decode(long output)
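This page gives only signatures for encode and decode. One common way to store weights in a structure whose search prefers smaller values is to invert them against a fixed ceiling, so that the heaviest entry becomes the cheapest path. The sketch below assumes that scheme with Integer.MAX_VALUE as the ceiling; both the scheme and the class name `WeightCodecSketch` are assumptions for illustration, not statements about what these two methods do.

```java
/** Hypothetical weight codec: inverts weights so larger weights map to smaller costs. */
final class WeightCodecSketch {

  /** Maps a weight in [0, Integer.MAX_VALUE] to a cost; larger weight gives smaller cost. */
  static long encode(long input) {
    if (input < 0 || input > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("cannot encode value: " + input);
    }
    return Integer.MAX_VALUE - input;
  }

  /** Inverse of encode: recovers the original weight from a cost. */
  static long decode(long output) {
    return Integer.MAX_VALUE - output;
  }
}
```

Under this assumed scheme, `decode(encode(w))` returns `w` for any weight in range, and a heavier weight always encodes to a smaller value than a lighter one.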
-