Package org.apache.uima.cas.impl
Class BinaryCasSerDes
java.lang.Object
org.apache.uima.cas.impl.BinaryCasSerDes
Binary (mostly non-compressed) CAS deserialization. The methods in this class were originally part
of CASImpl and were moved to this class for v3.
Binary non-compressed CAS serialization is in class CASSerializer, but that class uses routines
and data structures in this class.
There is one instance of this class per CAS (shared by all views of that CAS), created at the
same time the CAS is created.
This instance also holds data needed for binary serialization and deserialization. For binary
delta deserialization, it uses the data computed on a previous serialization or, if there is none,
re-computes it. See the scanAllFSsForBinarySerialization method.
The data is computed lazily, and is reset by a CAS reset.
Lifecycle:
created when a CAS (any view) is first created, as part of the shared view data for that CAS.
never re-created.
Data created when non-delta serializing, in case needed when delta-deserializing later:
xxxAuxAddr2fsa maps aux arrays to FSs
heaps and nextXXXHeapAddrAfterMark (in this case mark is the end).
Reset:
Instance Data:
baseCas - ref to the corresponding CAS (final)
tsi - the CAS's type system impl (can change; each use sets it from CAS API)
heaps - there is 1 main heap, and 4 aux heaps (Byte, Short, Long, and String).
Some uses of this class require that these be materialized. (May be input or output)
for Delta deserialization:
5 ints - representing the first free address in the above 5 heaps, after the mark
For delta deserialization: Maps for Aux arrays representing updatable arrays (not String):
From starting addr in the aux array to the corresponding V3 FS object
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
private class BinaryCasSerDes.BinDeserSupport - Binary Deserialization Support. An instance of this class is made for every reinit operation doing delta deserialization. Mainly used to convert addrs into the main heap into their corresponding FS info.
-
Field Summary
Fields
Modifier and Type, Field, Description
private static final int arrayContentOffset - The number of cells we need to skip to get to the array contents.
private static final int arrayLengthFeatOffset - The offset for the array length cell.
private final CASImpl baseCas
private final Int2ObjHashMap<TOP, TOP> byteAuxAddr2fsa - Map from an aux addr starting address for an array of boolean/byte/short/long/double to the V3 FS.
(package private) ByteHeap byteHeap
(package private) Heap heap
(package private) boolean isBeforeV3 - used to calculate total heap size
private final Int2ObjHashMap<TOP, TOP> longAuxAddr2fsa
(package private) LongHeap longHeap
(package private) int nextByteHeapAddrAfterMark
(package private) int nextHeapAddrAfterMark - These next are for delta (de)serialization, and identify the first slot in the aux or string tables for new FS data when there's a mark set.
(package private) int nextLongHeapAddrAfterMark
(package private) int nextShortHeapAddrAfterMark
(package private) int nextStringHeapAddrAfterMark
private final Int2ObjHashMap<TOP, TOP> shortAuxAddr2fsa
(package private) ShortHeap shortHeap
private static final boolean SOFA_AHEAD_OF_NORMAL_ORDER
private static final boolean SOFA_IN_NORMAL_ORDER
(package private) StringHeap stringHeap
private static final boolean TRACE_DESER
private TypeSystemImpl tsi
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
(package private) void addIdsToIntVector(Collection<TOP> fss, IntVector v, Obj2IntIdentityHashMap<TOP> fs2addr)
(package private) void addIdsToIntVector(Set<TOP> fss, IntVector v, Obj2IntIdentityHashMap<TOP> fs2addr)
private SerialFormat binaryDeserialization - Build a model of the heap, string and aux heaps.
void clear() - Called by CAS reset.
private void clearAuxAddr2fsa()
private void clearDeltaOffsets()
private void createFSsFromHeaps(boolean isDelta, int startPos, CommonSerDesSequential csds) - Given the deserialized main heap, byte heap, short heap, long heap and string heap, a) create the corresponding FSs, populating a b) addr2fs map, key = fsAddr, value = FS c) auxAddr2fs map, key = aux Array Start addr, value = FS corresponding to that primitive bool/byte/short/long/double array. For some use cases, the byte / short / long heaps have not yet been initialized.
(package private) void createStringTableFromArray(String[] stringTable)
private void extractFsToV2Heaps(TOP fs, boolean isMarkSet, Obj2IntIdentityHashMap<TOP> fs2addr) - Called in fs._id order to populate heaps from all FSs.
(package private) CASImpl getCas()
(package private) int[] getDeltaIndexedFSs(MarkerImpl mark, Obj2IntIdentityHashMap<TOP> fs2addr)
static int getFsSpaceReq(TOP fs, TypeImpl type)
(package private) int[] getIndexedFSs(Obj2IntIdentityHashMap<TOP> fs2addr) - Serialization support.
private Sofa getSofaFromAnnotBase(int annotBaseAddr, StringHeap stringHeap2, Int2ObjHashMap<TOP, TOP> addr2fs, CommonSerDesSequential csds)
private int getSortedArrayAddrsIndex(int[] sortedArrayAddrs, int auxAddr, int sortedArrayAddrsIndex) - Given an aux address representing an element of an array, find the start of the array. Fast path for the same as before array.
private int heapFeat(int nextFsAddr, FeatureImpl feat)
private Sofa makeSofaFromHeap(int sofaAddr, StringHeap stringHeap2, CommonSerDesSequential csds, boolean isUnordered)
(package private) static CASMgrSerializer maybeReadEmbeddedTSI
(package private) void reinit(int[] heapMetadata, int[] heapArray, String[] stringTable, int[] fsIndex, byte[] byteHeapArray, short[] shortHeapArray, long[] longHeapArray) - This is for deserializing (never delta) from a serialized Java object representation, or maybe from the JNI bridge; both callers do a CAS reset of some kind.
reinit(InputStream istream) - See Blob Format in CASSerializer. This reads in and deserializes CAS data from a stream.
void reinit(CASCompleteSerializer casCompSer) - Deserializer for CASCompleteSerializer instances - includes type system and index definitions. Never delta.
void reinit(CASSerializer ser) - Deserializer for Java-object serialized instance of CASSerializer.
SerialFormat reinit(CommonSerDes.Header h, InputStream istream, CASMgrSerializer casMgrSerializer, CasLoadMode casLoadMode, BinaryCasSerDes6 f6, AllowPreexistingFS allowPreexistingFS, TypeSystemImpl ts) - Deserialize a binary input stream, after reading the header and, optionally, an externally provided type system and index spec used previously in compressed form 6 serialization. This reads in and deserializes CAS data from a stream.
(package private) void reinitDeltaIndexedFSsInner(FSIndexRepositoryImpl ir, int[] fsindexes, int idx, int length, boolean isAdd, IntFunction<TOP> getFsFromAddr) - Given a list of FSs and a starting index and length: iterate over the FSs, and add or remove each one from the indexes.
(package private) void reinitIndexedFSs(int[] fsIndex, boolean isDeltaMods, IntFunction<TOP> getFsFromAddr) - This routine is used by several of the deserializers.
(package private) void reinitIndexedFSs(int[] fsIndex, boolean isDeltaMods, IntFunction<TOP> getFsFromAddr, int numViews, int idx)
(package private) void reinitIndexedFSs(int[] fsIndex, boolean isDeltaMods, IntFunction<TOP> getFsFromAddr, IntFunction<TOP> getSofaFromAddr)
(package private) int reinitIndexedFSsSofas(int[] fsIndex, boolean isDeltaMods, IntFunction<TOP> getFsFromAddr)
scanAllFSsForBinarySerialization - Called when serializing a cas, or deserializing a delta CAS, if not saved in that case from a previous binary serialization (in that case, the scan is done as if it is doing a non-delta serialization).
private void setFeatOrDefer(int heapIndex, FeatureImpl feat, List<Runnable> fixups4forwardFsRefs, Consumer<TOP> setter, Int2ObjHashMap<TOP, TOP> addr2fs)
(package private) void setHeapExtents()
(package private) void setupCasFromCasMgrSerializer(CASMgrSerializer casMgrSerializer)
(package private) int updateAuxArrayMods(CommonSerDes.Reading r, Int2ObjHashMap<TOP, TOP> auxAddr2fsa, Consumer_T_int_withIOException<TOP> setter) - Called 3 times to process non-compressed binary deserialization of aux array modifications: once each for byte/boolean, short, and long/double.
private void updateHeapSlot(BinaryCasSerDes.BinDeserSupport bds, int slotAddr, int slotValue, Int2ObjHashMap<TOP, TOP> addr2fs) - Doing updates for delta cas for existing objects.
private boolean updateStringFeature(TOP fs, FeatureImpl feat, String s, List<Runnable> fixups4forwardFsRefs)
-
Field Details
-
TRACE_DESER
private static final boolean TRACE_DESER
-
SOFA_IN_NORMAL_ORDER
private static final boolean SOFA_IN_NORMAL_ORDER
-
SOFA_AHEAD_OF_NORMAL_ORDER
private static final boolean SOFA_AHEAD_OF_NORMAL_ORDER
-
arrayLengthFeatOffset
private static final int arrayLengthFeatOffset
The offset for the array length cell. An array consists of length+2 cells, where the first cell contains the type, the second one the length, and the rest the actual content of the array.
-
arrayContentOffset
private static final int arrayContentOffset
The number of cells we need to skip to get to the array contents. That is, if we have an array starting at addr, the first content cell is at addr+arrayContentOffset.
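The cell layout described for arrayLengthFeatOffset and arrayContentOffset can be illustrated with a small sketch. The constant values 1 and 2 are inferred from the description (type cell first, length cell second) and are not taken from the source; the int[] heap, the class name and the helper names are hypothetical and do not reproduce the UIMA implementation.

```java
import java.util.Arrays;

/** Hypothetical sketch of the array layout in a main heap modeled as an int[]. */
final class ArrayLayoutSketch {
  // Inferred from the description: cell 0 = type, cell 1 = length, cells 2.. = content.
  static final int ARRAY_LENGTH_FEAT_OFFSET = 1;
  static final int ARRAY_CONTENT_OFFSET = 2;

  /** Reads the length cell of the array that starts at addr. */
  static int arrayLength(int[] heap, int addr) {
    return heap[addr + ARRAY_LENGTH_FEAT_OFFSET];
  }

  /** Copies the content cells of the array that starts at addr. */
  static int[] arrayContent(int[] heap, int addr) {
    int start = addr + ARRAY_CONTENT_OFFSET;
    return Arrays.copyOfRange(heap, start, start + arrayLength(heap, addr));
  }

  public static void main(String[] args) {
    // Cell 0 is unused; an array with made-up type code 7 and 3 elements starts at addr 1.
    int[] heap = {0, 7, 3, 10, 20, 30};
    System.out.println(arrayLength(heap, 1));
    System.out.println(Arrays.toString(arrayContent(heap, 1)));
  }
}
```

In this model the array occupies length+2 cells in total, which matches the description of arrayLengthFeatOffset.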
-
baseCas
-
tsi
-
heap
Heap heap -
byteHeap
ByteHeap byteHeap -
shortHeap
ShortHeap shortHeap -
longHeap
LongHeap longHeap -
stringHeap
StringHeap stringHeap -
nextHeapAddrAfterMark
int nextHeapAddrAfterMark
These next are for delta (de)serialization, and identify the first slot in the aux or string tables for new FS data when there's a mark set.
These values are read by CASSerializer when doing delta serialization, and set at the end of a matching binary deserialization.
When serializing a delta, the heaps used store just the delta, so any offsets they yield are adjusted by adding these values, so that when the delta is deserialized (and augments the existing heaps), the references are correct with respect to the deserialized heap model.
-
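The offset adjustment described for nextHeapAddrAfterMark can be sketched with a string table modeled as a plain list. This is only an illustration under assumptions: the class DeltaOffsetSketch, the method rebase and the list-based heap are hypothetical and are not the structures used by CASSerializer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: rebasing a delta-local index onto the receiver's full heap. */
final class DeltaOffsetSketch {
  /**
   * @param localIndex index of an entry within the delta-only heap
   * @param nextAddrAfterMark first free slot of the full heap when the mark was set
   * @return the index the entry has once the delta is appended to the full heap
   */
  static int rebase(int localIndex, int nextAddrAfterMark) {
    return localIndex + nextAddrAfterMark;
  }

  public static void main(String[] args) {
    List<String> fullHeap = new ArrayList<>(List.of("", "a", "b"));
    int nextStringHeapAddrAfterMark = fullHeap.size(); // first slot for new data
    List<String> delta = List.of("c", "d");            // heap holding just the delta
    int ref = rebase(1, nextStringHeapAddrAfterMark);  // reference to "d" as written out
    fullHeap.addAll(delta);                            // receiver augments its existing heap
    System.out.println(fullHeap.get(ref));
  }
}
```

Because the sender adds the offset before writing, the receiver can append the delta to its existing heap without rewriting any references.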
nextStringHeapAddrAfterMark
int nextStringHeapAddrAfterMark -
nextByteHeapAddrAfterMark
int nextByteHeapAddrAfterMark -
nextShortHeapAddrAfterMark
int nextShortHeapAddrAfterMark -
nextLongHeapAddrAfterMark
int nextLongHeapAddrAfterMark -
byteAuxAddr2fsa
Map from an aux addr starting address for an array of boolean/byte/short/long/double to the V3 FS.
key = simulated starting address in aux heap for the array
value = FS having that array
When deserializing a modification, used to find the v3 FS and the offset in the array to modify.
Created when serializing (in case a delta deserialization is received back).
Created when delta deserializing if not available from a previous serialization.
Updated when delta deserializing.
Reset at the end of delta deserialization because multiple mods are not supported.
-
shortAuxAddr2fsa
-
longAuxAddr2fsa
-
isBeforeV3
boolean isBeforeV3
used to calculate total heap size
-
-
Constructor Details
-
BinaryCasSerDes
-
-
Method Details
-
reinit
Deserializer for Java-object serialized instance of CASSerializer.
Parameters:
ser - The instance to convert back to a CAS
-
reinit
void reinit(int[] heapMetadata, int[] heapArray, String[] stringTable, int[] fsIndex, byte[] byteHeapArray, short[] shortHeapArray, long[] longHeapArray)
This is for deserializing (never delta) from a serialized Java object representation, or maybe from the JNI bridge; both callers do a CAS reset of some kind.
Parameters:
heapMetadata -
heapArray -
stringTable -
fsIndex -
byteHeapArray -
shortHeapArray -
longHeapArray -
-
setupCasFromCasMgrSerializer
-
reinit
Deserializer for CASCompleteSerializer instances - includes type system and index definitions. Never delta.
Parameters:
casCompSer -
-
reinit
See Blob Format in CASSerializer.
This reads in and deserializes CAS data from a stream. Byte swapping may be needed if the blob is from C++, because C++ blob serialization writes data in native byte order.
Supports delta deserialization. For that, the csds from the serialization event must be used.
Parameters:
istream -
Returns:
the format of the input stream detected
Throws:
CASRuntimeException - wraps IOException
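Why byte swapping can be needed for blobs written in native byte order can be shown with a generic sketch. Everything here is an assumption made for illustration: the marker value MAGIC, the two-int blob layout and the class ByteSwapSketch are hypothetical and do not describe the actual Blob Format or the detection logic in this class.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch: detect the writer's byte order from a known marker, then swap. */
final class ByteSwapSketch {
  static final int MAGIC = 0x01020304; // made-up marker, not the real blob key

  /** Returns the int that follows the marker, swapping bytes if the writer used the other order. */
  static int readPayload(byte[] blob) {
    ByteBuffer buf = ByteBuffer.wrap(blob); // ByteBuffer reads big-endian by default
    int key = buf.getInt();
    if (key == MAGIC) {
      return buf.getInt(); // same byte order as the reader
    }
    if (Integer.reverseBytes(key) == MAGIC) {
      return Integer.reverseBytes(buf.getInt()); // opposite byte order: swap each value
    }
    throw new IllegalArgumentException("unrecognized marker");
  }

  /** Writes the marker and one payload int in the given byte order. */
  static byte[] write(int payload, ByteOrder order) {
    return ByteBuffer.allocate(8).order(order).putInt(MAGIC).putInt(payload).array();
  }

  public static void main(String[] args) {
    System.out.println(readPayload(write(42, ByteOrder.BIG_ENDIAN)));
    System.out.println(readPayload(write(42, ByteOrder.LITTLE_ENDIAN)));
  }
}
```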
-
reinit
public SerialFormat reinit(CommonSerDes.Header h, InputStream istream, CASMgrSerializer casMgrSerializer, CasLoadMode casLoadMode, BinaryCasSerDes6 f6, AllowPreexistingFS allowPreexistingFS, TypeSystemImpl ts) throws CASRuntimeException
Deserialize a binary input stream, after reading the header and, optionally, an externally provided type system and index spec used previously in compressed form 6 serialization.
This reads in and deserializes CAS data from a stream. Byte swapping may be needed if the blob is from C++, because C++ blob serialization writes data in native byte order.
The corresponding serialization code is in org.apache.uima.cas.impl.Serialization; also see CasIOUtils.
Parameters:
h -
istream -
casMgrSerializer - null or the Java object representing the externally supplied type and maybe indexes definition (TSI)
casLoadMode - DEFAULT or REINIT. REINIT is required with compressed form 6 to reinitialize the CAS's type system and index definition.
f6 - only used for form 6 where an instance of BinaryCasSerDes6 has been initialized
allowPreexistingFS - only used for form 6 delta deserialization
ts - the type system
Returns:
the format that was deserialized
Throws:
CASRuntimeException - wraps IOException
-
maybeReadEmbeddedTSI
-
binaryDeserialization
Build a model of the heap, string and aux heaps. For delta deserialization, this is presumed to be in response to a previous serialization for delta - these can be just for the new ones read into these. Recreate / update V3 feature structures from this data.
Delta CAS supported use case:
CAS(1) -> binary serialize -> binary deserialize -> CAS(2).
CAS(2) has mark set (before any new activity in deserialized CAS).
CAS(2) has updates - new FSs, and mods to existing ones.
CAS(2) -> delta binary ser -> delta binary deser -> CAS(1).
V3 supports the above scenario by retaining some information in CAS(2) at the end of the initial deserialization, including the model heap size/cellsUsed. This is needed to properly do a compatible-with-v2 delta serialization.
Delta CAS edge use cases not supported:
serialize (not binary), then receive delta binary serialization
Both v2 and v3 assume that the delta mark is set immediately after binary deserialization; otherwise, subsequent binary deserialization of the delta will fail.
This method assumes a previous binary serialization was done, and the following data structures are still valid (i.e. no CAS altering operations have been done):
(these are reset: heap, stringHeap, byteHeap, shortHeap, longHeap)
csds,
[string/byte/short/long]auxAddr2fs (for array mods)
nextHeapAddrAfterMark, next[string/byte/short/long]HeapAddrAfterMark
Parameters:
h - the Header (read by the caller)
Returns:
the format of the incoming serialized data
-
setHeapExtents
void setHeapExtents() -
updateAuxArrayMods
int updateAuxArrayMods(CommonSerDes.Reading r, Int2ObjHashMap<TOP, TOP> auxAddr2fsa, Consumer_T_int_withIOException<TOP> setter) throws IOException
Called 3 times to process non-compressed binary deserialization of aux array modifications: once each for byte/boolean, short, and long/double.
Returns:
heapsz (used by caller to do word alignment)
Throws:
IOException
-
reinitIndexedFSs
This routine is used by several of the deserializers. Each one may have a different way to go from the addr to the fs, e.g.
Compressed form 6: fsStartIndexes.getSrcFsFromTgtSeq(...)
plain binary: addr2fs.get(...)
Gets the number of views and the number of sofas.
For all sofas: adds them to the index repo in the base index, registers the sofa, and ensures the initial view is created.
For all views: does the view action and updates the documentannotation.
Parameters:
fsIndex - array of fsRefs and counts, for sofas, and all views
isDeltaMods - true for calls which are for delta mods - these have adds/removes
-
reinitIndexedFSs
void reinitIndexedFSs(int[] fsIndex, boolean isDeltaMods, IntFunction<TOP> getFsFromAddr, IntFunction<TOP> getSofaFromAddr) -
reinitIndexedFSsSofas
-
reinitIndexedFSs
void reinitIndexedFSs(int[] fsIndex, boolean isDeltaMods, IntFunction<TOP> getFsFromAddr, int numViews, int idx) -
reinitDeltaIndexedFSsInner
void reinitDeltaIndexedFSsInner(FSIndexRepositoryImpl ir, int[] fsindexes, int idx, int length, boolean isAdd, IntFunction<TOP> getFsFromAddr)
Given a list of FSs and a starting index and length: iterate over the FSs, and add or remove each one from the indexes.
Parameters:
ir - index repository
length - the length
isAdd - true to add, false to remove
fss - the list having the fss
fsIdx - the starting index
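The add-or-remove loop over a slice of FS addresses can be sketched as follows. This is a simplified model under assumptions: a Set of strings stands in for the index repository, strings stand in for feature structures, and the class DeltaIndexSketch and method applyRange are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntFunction;

/** Hypothetical sketch: apply adds or removes for one slice of an address array. */
final class DeltaIndexSketch {
  static void applyRange(Set<String> index, int[] fsAddrs, int idx, int length,
      boolean isAdd, IntFunction<String> getFsFromAddr) {
    for (int i = idx; i < idx + length; i++) {
      String fs = getFsFromAddr.apply(fsAddrs[i]); // resolve the address to an FS
      if (isAdd) {
        index.add(fs);
      } else {
        index.remove(fs);
      }
    }
  }

  public static void main(String[] args) {
    Set<String> index = new LinkedHashSet<>(List.of("fs1", "fs2"));
    int[] addrs = {9, 3, 4, 2, 1};
    applyRange(index, addrs, 1, 2, true, a -> "fs" + a);  // adds the FSs at addrs 3 and 4
    applyRange(index, addrs, 3, 2, false, a -> "fs" + a); // removes the FSs at addrs 2 and 1
    System.out.println(index);
  }
}
```

Passing the address-to-FS lookup as an IntFunction mirrors the getFsFromAddr parameter, which lets each deserializer supply its own way of resolving addresses.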
-
getSortedArrayAddrsIndex
private int getSortedArrayAddrsIndex(int[] sortedArrayAddrs, int auxAddr, int sortedArrayAddrsIndex)
Given an aux address representing an element of an array, find the start of the array. Fast path for the same array as before; binary search for subsequent ones (the addresses in the serializations are not sorted).
Parameters:
auxAddr - the address being updated
sortedStarts - the sorted array of start addresses
currentStart - the last value found for fast path
Returns:
index into the sortedStarts
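The lookup described for getSortedArrayAddrsIndex (fast path for the previous array, binary search otherwise) can be sketched as below. This is not the class's implementation: it assumes that arrays are laid out back to back, so each array extends up to the next start address, and that auxAddr is never below the first start. The names ArrayStartLookupSketch and findStartIndex are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: map an element address to the index of its array's start address. */
final class ArrayStartLookupSketch {
  /**
   * @param sortedStarts sorted start addresses of all arrays in one aux heap
   * @param auxAddr address of the element being updated (assumed >= sortedStarts[0])
   * @param lastIndex index returned by the previous call, or -1 if there is none
   * @return index into sortedStarts of the array containing auxAddr
   */
  static int findStartIndex(int[] sortedStarts, int auxAddr, int lastIndex) {
    // Fast path: the address falls into the same array as the previous lookup.
    if (lastIndex >= 0 && sortedStarts[lastIndex] <= auxAddr
        && (lastIndex + 1 == sortedStarts.length || auxAddr < sortedStarts[lastIndex + 1])) {
      return lastIndex;
    }
    int r = Arrays.binarySearch(sortedStarts, auxAddr);
    // r >= 0: auxAddr is itself a start address.
    // Otherwise r == -(insertionPoint) - 1, and the containing array is the one
    // just before the insertion point, i.e. index -r - 2.
    return r >= 0 ? r : -r - 2;
  }

  public static void main(String[] args) {
    int[] starts = {1, 5, 12};
    int i = findStartIndex(starts, 7, -1); // binary search
    int j = findStartIndex(starts, 8, i);  // fast path, same array as before
    System.out.println(i + " " + j);
  }
}
```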
-
getIndexedFSs
Serialization support -
addIdsToIntVector
-
addIdsToIntVector
-
getDeltaIndexedFSs
-
createStringTableFromArray
-
getFsSpaceReq
-
scanAllFSsForBinarySerialization
Called when serializing a cas, or deserializing a delta CAS, if not saved in that case from a previous binary serialization (in that case, the scan is done as if it is doing a non-delta serialization).
Initialize the serialization model for binary serialization in CASSerializer from a CAS.
Do 2 scans, each by walking all the reachable FSs.
- The first one processes all fs (including for delta, those below the line)
-- computes the fs to addr map and its inverse, based on the size of each FS.
-- done by CommonSerDesSequential class's "setup" method
- The second one computes the values of the main and aux heaps and string heaps except for delta mods
-- for delta, the heaps only have "new" values that binary serialization will write out as arrays
--- mods are computed from FsChange info and added to the appropriate heaps, later
- for byte/short/long/string array use, compute auxAddr2fsa maps. This is used when deserializing delta mod info, to locate the fs to update
For delta serialization, the heaps are populated only with the new values.
- Values "nextXXHeapAddrAfterMark" are added to main heap refs to aux heaps and to string tables, so they are correct after deserialization does delta deserialization and adds the aux heap and string heap info to the existing heaps. This is also done for the main heap refs, so that refs to existing FSs below the line and above the line are treated uniformly.
The results must be retained for the use case of subsequently receiving back a delta cas.
Parameters:
mark - null or the mark to use for separating the new from the previously existing; used by delta cas.
cs - the CASSerializer instance used to record the results of the scan
Returns:
null or for delta, all the found FSs
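The first scan, which derives the fs to addr map and its inverse from the size of each FS, can be pictured with the following sketch. It is a simplified model and not the CommonSerDesSequential code: strings stand in for feature structures, sizes are given directly, and the names AddrAssignSketch, assignAddrs and invert are hypothetical. Starting at address 1 assumes that address 0 is reserved for null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: assign simulated heap addresses from per-FS sizes. */
final class AddrAssignSketch {
  /** Walks the FSs in order and gives each one the next free address. */
  static Map<String, Integer> assignAddrs(Map<String, Integer> sizesInIdOrder) {
    Map<String, Integer> fs2addr = new LinkedHashMap<>();
    int next = 1; // assumption: address 0 means null
    for (Map.Entry<String, Integer> e : sizesInIdOrder.entrySet()) {
      fs2addr.put(e.getKey(), next);
      next += e.getValue(); // the following FS starts right after this one
    }
    return fs2addr;
  }

  /** Builds the inverse map, from address back to FS. */
  static Map<Integer, String> invert(Map<String, Integer> fs2addr) {
    Map<Integer, String> addr2fs = new LinkedHashMap<>();
    fs2addr.forEach((fs, addr) -> addr2fs.put(addr, fs));
    return addr2fs;
  }

  public static void main(String[] args) {
    Map<String, Integer> sizes = new LinkedHashMap<>();
    sizes.put("fsA", 3);
    sizes.put("fsB", 5);
    sizes.put("fsC", 2);
    Map<String, Integer> fs2addr = assignAddrs(sizes);
    System.out.println(fs2addr);
    System.out.println(invert(fs2addr));
  }
}
```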
-
extractFsToV2Heaps
Called in fs._id order to populate heaps from all FSs. For delta cas, only called for new above-the-line FSs.
Parameters:
fs - Feature Structure to use to set heaps
isMarkSet - true if mark is set, used to compute first
-
createFSsFromHeaps
Given the deserialized main heap, byte heap, short heap, long heap and string heap,
a) create the corresponding FSs, populating a
b) addr2fs map, key = fsAddr, value = FS
c) auxAddr2fs map, key = aux Array Start addr, value = FS corresponding to that primitive bool/byte/short/long/double array
For some use cases, the byte / short / long heaps have not yet been initialized.
- when data is available, deserialization will update the values in the fs directly
Each new fs created augments the addr2fs map.
- forward fs refs are put into deferred update list deferModFs
Each new fs created which is a Boolean/Byte/Short/Long/Double array updates auxAddr2fsa map if the aux data is not available (update is put on deferred list).
deferModByte deferModShort deferModLong
Each new fs created which has a slot referencing a long/double not yet read in creates a deferred update specifying the fs, the slot, indexed by the addr in the aux table.
see deferModStr deferModLong deferModDouble
Notes:
Subtypes of AnnotationBase created in the right view
DocumentAnnotation - update out-of-indexes
FSs not subtypes of AnnotationBase are **all** associated with the initial view.
Delta serialization: this routine adds just the new (above-the-line) FSs, and augments existing addr2fs and auxAddr2fsa
-
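The deferral of forward FS references while building the addr2fs map can be sketched in miniature. The heap layout used here (two cells per FS: a marker cell and one reference cell, with 0 meaning null) is an assumption made for the example, and the Node class and ForwardRefSketch names are hypothetical; the real routine handles typed features, aux heaps and views.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: create objects from a heap and fix up forward references afterwards. */
final class ForwardRefSketch {
  /** Stand-in for a feature structure with a single reference slot. */
  static final class Node {
    final int addr;
    Node next;

    Node(int addr) {
      this.addr = addr;
    }
  }

  /** Assumed layout: cell 0 unused, then 2 cells per FS: [marker, refAddr]; refAddr 0 is null. */
  static Map<Integer, Node> build(int[] heap) {
    Map<Integer, Node> addr2fs = new HashMap<>();
    List<Runnable> fixups = new ArrayList<>();
    for (int addr = 1; addr < heap.length; addr += 2) {
      Node fs = new Node(addr);
      addr2fs.put(addr, fs); // each new FS augments the addr2fs map
      int ref = heap[addr + 1];
      if (ref == 0) {
        continue; // null reference, nothing to set
      }
      Node target = addr2fs.get(ref);
      if (target != null) {
        fs.next = target; // backward reference: target already exists
      } else {
        fixups.add(() -> fs.next = addr2fs.get(ref)); // forward reference: defer
      }
    }
    fixups.forEach(Runnable::run); // all FSs exist now, so deferred refs can be resolved
    return addr2fs;
  }

  public static void main(String[] args) {
    Map<Integer, Node> m = build(new int[] {0, 1, 3, 1, 5, 1, 1});
    System.out.println(m.get(1).next.addr + " " + m.get(5).next.addr);
  }
}
```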
setFeatOrDefer
private void setFeatOrDefer(int heapIndex, FeatureImpl feat, List<Runnable> fixups4forwardFsRefs, Consumer<TOP> setter, Int2ObjHashMap<TOP, TOP> addr2fs) -
heapFeat
-
getSofaFromAnnotBase
private Sofa getSofaFromAnnotBase(int annotBaseAddr, StringHeap stringHeap2, Int2ObjHashMap<TOP, TOP> addr2fs, CommonSerDesSequential csds) -
makeSofaFromHeap
private Sofa makeSofaFromHeap(int sofaAddr, StringHeap stringHeap2, CommonSerDesSequential csds, boolean isUnordered) -
updateHeapSlot
private void updateHeapSlot(BinaryCasSerDes.BinDeserSupport bds, int slotAddr, int slotValue, Int2ObjHashMap<TOP, TOP> addr2fs)
Doing updates for delta cas for existing objects. Cases:
- item in heap-stored-array = update the corresponding item in the FS
- non-ref in feature slot - update the corresponding feature
- ref (to long/double value, to string)
-- these always reference entries in long/string tables that are new (above the line)
-- these have already been deserialized
- ref (to main heap) - can update this directly
NOTE: entire aux arrays never have their refs to the aux heaps updated, for arrays of boolean, byte, short, long, double
NOTE: Slot updates for FS refs always point to addr which are in the addr2fs table or are 0 (null), because if the ref is to a new one, those have been already deserialized by this point, and if the ref is to a below-the-line one, those are already put into the addr2fs table
Parameters:
bds - helper data
slotAddr - the main heap slot addr being updated
slotValue - the new value
-
updateStringFeature
private boolean updateStringFeature(TOP fs, FeatureImpl feat, String s, List<Runnable> fixups4forwardFsRefs)
Parameters:
fs -
feat -
s -
fixups4forwardFsRefs -
Returns:
true if caller needs to do an appropriate fs._setStringValue...
-
getCas
CASImpl getCas() -
clearDeltaOffsets
private void clearDeltaOffsets() -
clearAuxAddr2fsa
private void clearAuxAddr2fsa() -
clear
public void clear()
Called by CAS reset.
-