Class CFSA2Serializer

java.lang.Object
morfologik.fsa.builders.CFSA2Serializer
All Implemented Interfaces:
FSASerializer

public final class CFSA2Serializer extends Object implements FSASerializer
Serializes in-memory FSA graphs to CFSA2.

The automaton can also be serialized with the numbers required for perfect hashing; see the withNumbers() method.

  • Field Details

    • logger

      private final Logger logger
    • flags

      private static final EnumSet<FSAFlags> flags
      Supported flags.
    • NO_STATE

      private static final int NO_STATE
      No-state id.
    • withNumbers

      private boolean withNumbers
      true if the automaton should be serialized with numbers.
    • offsets

      private com.carrotsearch.hppc.IntIntHashMap offsets
      A hash map of [state, offset] pairs.
    • numbers

      private com.carrotsearch.hppc.IntIntHashMap numbers
      A hash map of [state, right-language-count] pairs.
    • scratch

      private final byte[] scratch
      Scratch array for serializing vints.
    • labelsIndex

      private byte[] labelsIndex
      The most frequent labels for integrating with the flags field.
    • labelsInvIndex

      private int[] labelsInvIndex
      Inverted index of the labels integrated with the flags field. The entry at index i holds the position of label i in labelsIndex, or zero if that label is not integrated.
  • Constructor Details

    • CFSA2Serializer

      public CFSA2Serializer()
  • Method Details

    • withNumbers

      public CFSA2Serializer withNumbers()
      Serialize the automaton with the number of right-language sequences in each node. This is required to implement perfect hashing. The numbering also preserves the order of input sequences.
      Specified by:
      withNumbers in interface FSASerializer
      Returns:
      Returns the same object for easier call chaining.
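Right-language counts enable perfect hashing because a sequence's index can be accumulated while walking the automaton: passing a final state adds one, and every arc skipped before the matching one adds its target's count. The sketch below illustrates this on a plain trie rather than a minimized CFSA2 automaton; the PerfectHashSketch class with its Node type and build and index helpers is hypothetical and not part of the Morfologik API. Because arcs are visited in ascending label order, the computed index equals the sequence's position in the sorted input, consistent with the statement that the numbering preserves the order of input sequences.

```java
import java.util.*;

// Hypothetical illustration of perfect hashing with right-language counts.
// Uses a plain trie, not Morfologik's FSA or CFSA2 classes.
class PerfectHashSketch {
    // A trie node; TreeMap keeps arcs in ascending label order.
    static final class Node {
        final TreeMap<Character, Node> arcs = new TreeMap<>();
        boolean isFinal; // true if an input sequence ends at this node
        int count;       // right-language count, filled in by countRightLanguage
    }

    // Builds a trie from sorted words and annotates every node with its count.
    static Node build(List<String> sortedWords) {
        Node root = new Node();
        for (String w : sortedWords) {
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.arcs.computeIfAbsent(c, k -> new Node());
            }
            n.isFinal = true;
        }
        countRightLanguage(root);
        return root;
    }

    // Post-order: count = (final ? 1 : 0) + sum of the children's counts.
    static int countRightLanguage(Node n) {
        int c = n.isFinal ? 1 : 0;
        for (Node child : n.arcs.values()) c += countRightLanguage(child);
        n.count = c;
        return c;
    }

    // Returns the 0-based rank of word among the input sequences, or -1 if absent.
    static int index(Node root, String word) {
        int idx = 0;
        Node n = root;
        for (char c : word.toCharArray()) {
            if (n.isFinal) idx++; // the shorter sequence ending here sorts first
            Node next = null;
            for (Map.Entry<Character, Node> e : n.arcs.entrySet()) {
                if (e.getKey() == c) { next = e.getValue(); break; }
                idx += e.getValue().count; // skip all sequences under smaller labels
            }
            if (next == null) return -1;
            n = next;
        }
        return n.isFinal ? idx : -1;
    }

    public static void main(String[] args) {
        List<String> words = List.of("car", "cart", "cat", "dog");
        Node root = build(words);
        // prints car -> 0, cart -> 1, cat -> 2, dog -> 3 (one per line)
        for (String w : words) System.out.println(w + " -> " + index(root, w));
    }
}
```

In a minimized automaton, shared suffix states would be counted once and their counts reused; the recursion here recomputes them because a trie has no sharing.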
    • serialize

      public <T extends OutputStream> T serialize(FSA fsa, T os) throws IOException
      Serializes any FSA to a CFSA2 stream.
      Specified by:
      serialize in interface FSASerializer
      Type Parameters:
      T - A subclass of OutputStream, returned for chaining.
      Parameters:
      fsa - The automaton to serialize.
      os - The output stream to serialize to.
      Returns:
      Returns os for chaining.
      Throws:
      IOException - Rethrown if an I/O error occurs.
    • computeLabelsIndex

      private void computeLabelsIndex(FSA fsa)
      Compute a set of labels to be integrated with the flags field.
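One plausible way to compute such a set is to count how often each byte label occurs on arcs and keep the most frequent ones. In the hypothetical sketch below, slot 0 of the index is reserved so that a zero in labelsInvIndex means "not integrated", as described for that field. The capacity of 31 indexed labels and the tie-breaking rule (smaller label value first) are assumptions, not values taken from CFSA2.

```java
import java.util.*;

// Hypothetical sketch of building a labels index from arc-label frequencies.
// MAX_INDEXED and the tie-breaking order are assumptions.
class LabelIndexSketch {
    static final int MAX_INDEXED = 31; // assumed number of labels that fit the flags field

    final byte[] labelsIndex;                  // slot -> label; slot 0 is unused
    final int[] labelsInvIndex = new int[256]; // label -> slot, or 0 if not integrated

    LabelIndexSketch(byte[] arcLabels) {
        // Count occurrences of each label, treating bytes as unsigned.
        int[] counts = new int[256];
        for (byte b : arcLabels) counts[b & 0xff]++;

        // Keep only labels that actually occur.
        List<Integer> labels = new ArrayList<>();
        for (int l = 0; l < 256; l++) if (counts[l] > 0) labels.add(l);

        // Descending frequency, then ascending label value.
        labels.sort((a, b) -> counts[a] != counts[b]
                ? Integer.compare(counts[b], counts[a])
                : Integer.compare(a, b));

        int n = Math.min(labels.size(), MAX_INDEXED);
        labelsIndex = new byte[n + 1];
        for (int i = 0; i < n; i++) {
            int label = labels.get(i);
            labelsIndex[i + 1] = (byte) label; // slots start at 1
            labelsInvIndex[label] = i + 1;
        }
    }
}
```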
    • getFlags

      public Set<FSAFlags> getFlags()
      Return the supported flags.
      Specified by:
      getFlags in interface FSASerializer
      Returns:
      Returns the set of flags supported by the serializer (and the output automaton).
    • linearize

      private com.carrotsearch.hppc.IntArrayList linearize(FSA fsa) throws IOException
      Linearize the states of the automaton.
      Throws:
      IOException
    • log

      private void log(Level level, String msg, Object... args)
    • linearizeAndCalculateOffsets

      private int linearizeAndCalculateOffsets(FSA fsa, com.carrotsearch.hppc.IntArrayList states, com.carrotsearch.hppc.IntArrayList linearized, com.carrotsearch.hppc.IntIntHashMap offsets) throws IOException
      Linearize all states, placing the given states at the front of the automaton and calculating stable state offsets.
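Offsets are "stable" only once they stop changing: a target offset written as a v-int takes more bytes as the offset grows, which can shift every later state. A minimal sketch of that fixed-point iteration follows, assuming each state costs a fixed number of bytes plus one v-int per arc target. The StableOffsetsSketch class and its array-based graph are hypothetical. Starting from zero offsets, sizes can only grow and a v-int never exceeds five bytes, so the loop terminates.

```java
import java.util.Arrays;

// Hypothetical fixed-point computation of state offsets whose sizes depend
// on the v-int length of the offsets they reference.
class StableOffsetsSketch {
    // Bytes needed to v-code a non-negative int, 7 payload bits per byte.
    static int vintSize(int value) {
        int size = 1;
        while (value > 0x7F) { value >>>= 7; size++; }
        return size;
    }

    // baseSize[i]: assumed fixed cost of state i (flags, labels, ...).
    // targets[i]: linear positions of the states that state i's arcs point to.
    static int[] stableOffsets(int[] baseSize, int[][] targets) {
        int n = baseSize.length;
        int[] offsets = new int[n]; // initial guess: every offset is 0
        while (true) {
            int[] next = new int[n];
            int pos = 0;
            for (int i = 0; i < n; i++) {
                next[i] = pos;
                int size = baseSize[i];
                for (int t : targets[i]) size += vintSize(offsets[t]);
                pos += size;
            }
            if (Arrays.equals(next, offsets)) return offsets; // nothing moved: stable
            offsets = next;
        }
    }
}
```

For example, with a 200-byte state pointing at a second state, the first pass places the target at 201, which needs a 2-byte v-int, so the next pass moves it to 202, where it stays.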
      Throws:
      IOException
    • linearizeState

      private void linearizeState(FSA fsa, com.carrotsearch.hppc.IntStack nodes, com.carrotsearch.hppc.IntArrayList linearized, BitSet visited, int node)
      Add a state to the linearized list.
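A minimal sketch of depth-first linearization with an explicit stack and a visited bit set, loosely mirroring the parameters of linearizeState. The int[][] adjacency representation and the LinearizeSketch class are assumptions made for illustration, not the serializer's actual traversal. Because the target of a state's last arc is pushed last, it is popped next and lands directly after its source state, a placement a compact format can exploit.

```java
import java.util.*;

// Hypothetical DFS linearization; arcs[s] lists the targets of state s in arc order.
class LinearizeSketch {
    static List<Integer> linearize(int[][] arcs, int... startStates) {
        List<Integer> linearized = new ArrayList<>();
        BitSet visited = new BitSet();
        Deque<Integer> nodes = new ArrayDeque<>();
        // Push start states in reverse so the first one is popped first.
        for (int i = startStates.length - 1; i >= 0; i--) nodes.push(startStates[i]);
        while (!nodes.isEmpty()) {
            int node = nodes.pop();
            if (visited.get(node)) continue; // a state may be pushed more than once
            linearizeState(arcs, nodes, linearized, visited, node);
        }
        return linearized;
    }

    // Appends node and schedules its unvisited targets; the last arc's target
    // ends up on top of the stack and is placed immediately after node.
    static void linearizeState(int[][] arcs, Deque<Integer> nodes,
                               List<Integer> linearized, BitSet visited, int node) {
        linearized.add(node);
        visited.set(node);
        for (int target : arcs[node]) {
            if (!visited.get(target)) nodes.push(target);
        }
    }
}
```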
    • computeFirstStates

      private int[] computeFirstStates(com.carrotsearch.hppc.IntIntHashMap inlinkCount, int maxStates, int minInlinkCount)
      Compute the set of states that should be linearized first to minimize the goto length of the other states.
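The selection can be sketched as a filter-and-sort over the in-link counts: keep states referenced at least minInlinkCount times, order them by descending count, and take at most maxStates of them. The sketch below uses a plain Map&lt;Integer, Integer&gt; in place of IntIntHashMap; the FirstStatesSketch class and the tie-breaking by smaller state id are assumptions.

```java
import java.util.*;

// Hypothetical selection of the most-referenced states.
class FirstStatesSketch {
    static int[] computeFirstStates(Map<Integer, Integer> inlinkCount,
                                    int maxStates, int minInlinkCount) {
        // Keep only states with enough incoming arcs.
        List<Integer> states = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : inlinkCount.entrySet()) {
            if (e.getValue() >= minInlinkCount) states.add(e.getKey());
        }
        // Most-referenced states first; ties broken by the smaller state id.
        states.sort((a, b) -> {
            int byCount = Integer.compare(inlinkCount.get(b), inlinkCount.get(a));
            return byCount != 0 ? byCount : Integer.compare(a, b);
        });
        int n = Math.min(maxStates, states.size());
        int[] first = new int[n];
        for (int i = 0; i < n; i++) first[i] = states.get(i);
        return first;
    }
}
```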
    • computeInlinkCount

      private com.carrotsearch.hppc.IntIntHashMap computeInlinkCount(FSA fsa)
      Compute the in-link count for each state.
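A sketch of in-link counting over a simple adjacency array that stands in for the FSA traversal API (which is not reproduced here): every state reachable from the root is expanded once, and each outgoing arc increments its target's counter, so states shared by many paths accumulate high counts. The InlinkCountSketch class and its graph representation are hypothetical.

```java
import java.util.*;

// Hypothetical in-link counting; arcs[s] lists the target states of state s.
class InlinkCountSketch {
    static Map<Integer, Integer> computeInlinkCount(int[][] arcs, int root) {
        Map<Integer, Integer> inlinks = new HashMap<>();
        BitSet visited = new BitSet();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(root);
        visited.set(root);
        while (!stack.isEmpty()) {
            int state = stack.pop();
            for (int target : arcs[state]) {
                inlinks.merge(target, 1, Integer::sum); // one more arc into target
                if (!visited.get(target)) {             // expand each state only once
                    visited.set(target);
                    stack.push(target);
                }
            }
        }
        return inlinks;
    }
}
```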
    • emitNodes

      private int emitNodes(FSA fsa, OutputStream os, com.carrotsearch.hppc.IntArrayList linearized) throws IOException
      Update arc offsets assuming the given goto length.
      Throws:
      IOException
    • emitNodeArcs

      private int emitNodeArcs(FSA fsa, OutputStream os, int state, int nextState) throws IOException
      Emit all arcs of a single node.
      Throws:
      IOException
    • emitArc

      private int emitArc(OutputStream os, int flags, byte label, int targetOffset) throws IOException
      Throws:
      IOException
    • emitNodeData

      private int emitNodeData(OutputStream os, int number) throws IOException
      Throws:
      IOException
    • withFiller

      public CFSA2Serializer withFiller(byte filler)
      Description copied from interface: FSASerializer
      Sets the filler separator (only if the set returned by FSASerializer.getFlags() contains FSAFlags.SEPARATORS).
      Specified by:
      withFiller in interface FSASerializer
      Parameters:
      filler - The filler separator byte.
      Returns:
      Returns this for call chaining.
    • withAnnotationSeparator

      public CFSA2Serializer withAnnotationSeparator(byte annotationSeparator)
      Description copied from interface: FSASerializer
      Sets the annotation separator (only if the set returned by FSASerializer.getFlags() contains FSAFlags.SEPARATORS).
      Specified by:
      withAnnotationSeparator in interface FSASerializer
      Parameters:
      annotationSeparator - The annotation separator byte.
      Returns:
      Returns this for call chaining.
    • writeVInt

      static int writeVInt(byte[] array, int offset, int value)
      Write a v-int to a byte array.
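A sketch of a common 7-bit variable-length integer layout (low-order groups first, high bit set on every byte except the last) that writes into a byte array and returns the next free offset, matching the writeVInt(byte[], int, int) signature. That CFSA2 uses exactly this byte layout is an assumption here; the VIntSketch class and its readVInt helper are hypothetical and only show the round trip.

```java
// Hypothetical v-int codec; the exact CFSA2 layout is assumed, not confirmed.
class VIntSketch {
    // Writes value starting at array[offset]; returns the offset after the last byte.
    static int writeVInt(byte[] array, int offset, int value) {
        if (value < 0) throw new IllegalArgumentException("Cannot v-code negative ints: " + value);
        while (value > 0x7F) {
            array[offset++] = (byte) (0x80 | (value & 0x7F)); // continuation bit set
            value >>>= 7;
        }
        array[offset++] = (byte) value; // last byte: continuation bit clear
        return offset;
    }

    // Reads a v-int starting at array[offset] (inverse of writeVInt).
    static int readVInt(byte[] array, int offset) {
        int value = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = array[offset++];
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;
        }
    }
}
```

For example, 300 is written as the two bytes 0xAC 0x02, and values up to 127 take a single byte.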