java.lang.Object
org.apache.lucene.util.fst.FSTCompiler<T>
Builds a minimal FST (maps an IntsRef term to an arbitrary output) from pre-sorted terms with
outputs. The FST becomes an FSA if you use NoOutputs. The FST is written on-the-fly into a
compact serialized format byte array, which can be saved to / loaded from a Directory or used
directly for traversal. The FST is always finite (no cycles).
NOTE: The algorithm is described at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698
The parameterized type T is the output type. See the subclasses of Outputs
.
FSTs larger than 2.1GB are now possible (as of Lucene 4.2). FSTs containing more than 2.1B nodes are also now possible, however they cannot be packed.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescription(package private) static class
Expert: holds a pending (seen but not yet serialized) arc.static class
Fluent-style constructor for FSTFSTCompiler
.(package private) static final class
(package private) static class
Reusable buffer for building nodes with fixed length arcs (binary search or direct addressing).(package private) static interface
(package private) static final class
Expert: holds a pending (seen but not yet serialized) Node. -
Field Summary
FieldsModifier and TypeFieldDescription(package private) final boolean
(package private) long
(package private) long
(package private) final BytesStore
(package private) static final float
(package private) long
(package private) final float
(package private) long
private final boolean
(package private) final FSTCompiler.FixedLengthArcsBuffer
private FSTCompiler.UnCompiledNode<T>[]
(package private) long
private final IntsRefBuilder
private final int
private final int
private final T
(package private) long
(package private) int[]
(package private) int[]
private final int
-
Constructor Summary
ConstructorsModifierConstructorDescriptionprivate
FSTCompiler
(FST.INPUT_TYPE inputType, int minSuffixCount1, int minSuffixCount2, boolean doShareSuffix, boolean doShareNonSingletonNodes, int shareMaxTailLength, Outputs<T> outputs, boolean allowFixedLengthArcs, int bytesPageBits, float directAddressingMaxOversizingFactor) FSTCompiler
(FST.INPUT_TYPE inputType, Outputs<T> outputs) Instantiates an FST/FSA builder with default settings and pruning options turned off. -
Method Summary
Modifier and TypeMethodDescriptionvoid
Add the next input/output pair.compile()
Returns final FST.private void
compileAllTargets
(FSTCompiler.UnCompiledNode<T> node, int tailLength) private FSTCompiler.CompiledNode
compileNode
(FSTCompiler.UnCompiledNode<T> nodeIn, int tailLength) private void
freezeTail
(int prefixLenPlus1) long
long
float
long
long
long
private boolean
validOutput
(T output)
-
Field Details
-
DIRECT_ADDRESSING_MAX_OVERSIZING_FACTOR
static final float DIRECT_ADDRESSING_MAX_OVERSIZING_FACTOR- See Also:
-
dedupHash
-
fst
-
NO_OUTPUT
-
minSuffixCount1
private final int minSuffixCount1 -
minSuffixCount2
private final int minSuffixCount2 -
lastInput
-
frontier
-
lastFrozenNode
long lastFrozenNode -
numBytesPerArc
int[] numBytesPerArc -
numLabelBytesPerArc
int[] numLabelBytesPerArc -
fixedLengthArcsBuffer
-
arcCount
long arcCount -
nodeCount
long nodeCount -
binarySearchNodeCount
long binarySearchNodeCount -
directAddressingNodeCount
long directAddressingNodeCount -
allowFixedLengthArcs
final boolean allowFixedLengthArcs -
directAddressingMaxOversizingFactor
final float directAddressingMaxOversizingFactor -
directAddressingExpansionCredit
long directAddressingExpansionCredit -
bytes
-
-
Constructor Details
-
FSTCompiler
Instantiates an FST/FSA builder with default settings and pruning options turned off. For more tuning and tweaking, seeFSTCompiler.Builder
. -
FSTCompiler
private FSTCompiler(FST.INPUT_TYPE inputType, int minSuffixCount1, int minSuffixCount2, boolean doShareSuffix, boolean doShareNonSingletonNodes, int shareMaxTailLength, Outputs<T> outputs, boolean allowFixedLengthArcs, int bytesPageBits, float directAddressingMaxOversizingFactor)
-
-
Method Details
-
getDirectAddressingMaxOversizingFactor
public float getDirectAddressingMaxOversizingFactor() -
getTermCount
public long getTermCount() -
getNodeCount
public long getNodeCount() -
getArcCount
public long getArcCount() -
getMappedStateCount
public long getMappedStateCount() -
compileNode
private FSTCompiler.CompiledNode compileNode(FSTCompiler.UnCompiledNode<T> nodeIn, int tailLength) throws IOException - Throws:
IOException
-
freezeTail
- Throws:
IOException
-
add
Add the next input/output pair. The provided input must be sorted after the previous one according toIntsRef.compareTo(org.apache.lucene.util.IntsRef)
. It's also OK to add the same input twice in a row with different outputs, as long asOutputs
implements theOutputs.merge(T, T)
method. Note that input is fully consumed after this method is returned (so caller is free to reuse), but output is not. So if your outputs are changeable (egByteSequenceOutputs
orIntSequenceOutputs
) then you cannot reuse across calls.- Throws:
IOException
-
validOutput
-
compile
Returns final FST. NOTE: this will return null if nothing is accepted by the FST.- Throws:
IOException
-
compileAllTargets
private void compileAllTargets(FSTCompiler.UnCompiledNode<T> node, int tailLength) throws IOException - Throws:
IOException
-
fstRamBytesUsed
public long fstRamBytesUsed()
-