Package org.apache.spark.api.java
Class JavaRDD<T>
Object
org.apache.spark.api.java.JavaRDD<T>
- All Implemented Interfaces:
Serializable, JavaRDDLike<T,JavaRDD<T>>
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
cache() - Persist this RDD with the default storage level (MEMORY_ONLY).
scala.reflect.ClassTag<T> classTag()
coalesce(int numPartitions) - Return a new RDD that is reduced into numPartitions partitions.
coalesce(int numPartitions, boolean shuffle) - Return a new RDD that is reduced into numPartitions partitions.
distinct() - Return a new RDD containing the distinct elements in this RDD.
distinct(int numPartitions) - Return a new RDD containing the distinct elements in this RDD.
filter(Function<T,Boolean> f) - Return a new RDD containing only the elements that satisfy a predicate.
static <T> JavaRDD<T> fromRDD(RDD<T> rdd, scala.reflect.ClassTag<T> evidence$1)
getResourceProfile() - Get the ResourceProfile specified with this RDD, or null if it wasn't specified.
intersection(JavaRDD<T> other) - Return the intersection of this RDD and another one.
persist(StorageLevel newLevel) - Set this RDD's storage level to persist its values across operations after the first time it is computed.
randomSplit(double[] weights) - Randomly splits this RDD with the provided weights.
randomSplit(double[] weights, long seed) - Randomly splits this RDD with the provided weights.
rdd()
repartition(int numPartitions) - Return a new RDD that has exactly numPartitions partitions.
sample(boolean withReplacement, double fraction) - Return a sampled subset of this RDD with a random seed.
sample(boolean withReplacement, double fraction, long seed) - Return a sampled subset of this RDD, with a user-supplied seed.
setName(String name) - Assign a name to this RDD.
sortBy(Function<T,S> f, boolean ascending, int numPartitions) - Return this RDD sorted by the given key function.
subtract(JavaRDD<T> other) - Return an RDD with the elements from this that are not in other.
subtract(JavaRDD<T> other, int numPartitions) - Return an RDD with the elements from this that are not in other.
subtract(JavaRDD<T> other, Partitioner p) - Return an RDD with the elements from this that are not in other.
static <T> RDD<T> toRDD(JavaRDD<T> rdd)
toString()
union(JavaRDD<T> other) - Return the union of this RDD and another one.
unpersist() - Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
unpersist(boolean blocking) - Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
withResources(ResourceProfile rp) - Specify a ResourceProfile to use when calculating this RDD.
wrapRDD(RDD<T> rdd)
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.spark.api.java.JavaRDDLike
aggregate, cartesian, checkpoint, collect, collectAsync, collectPartitions, context, count, countApprox, countApprox, countApproxDistinct, countAsync, countByValue, countByValueApprox, countByValueApprox, first, flatMap, flatMapToDouble, flatMapToPair, fold, foreach, foreachAsync, foreachPartition, foreachPartitionAsync, getCheckpointFile, getNumPartitions, getStorageLevel, glom, groupBy, groupBy, id, isCheckpointed, isEmpty, iterator, keyBy, map, mapPartitions, mapPartitions, mapPartitionsToDouble, mapPartitionsToDouble, mapPartitionsToPair, mapPartitionsToPair, mapPartitionsWithIndex, mapToDouble, mapToPair, max, min, name, partitioner, partitions, pipe, pipe, pipe, pipe, pipe, reduce, saveAsObjectFile, saveAsTextFile, saveAsTextFile, take, takeAsync, takeOrdered, takeOrdered, takeSample, takeSample, toDebugString, toLocalIterator, top, top, treeAggregate, treeAggregate, treeAggregate, treeReduce, treeReduce, zip, zipPartitions, zipWithIndex, zipWithUniqueId
-
Constructor Details
-
JavaRDD
-
-
Method Details
-
fromRDD
-
toRDD
-
rdd
-
classTag
-
wrapRDD
-
cache
Persist this RDD with the default storage level (MEMORY_ONLY).
- Returns:
- (undocumented)
-
persist
Set this RDD's storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet.
- Parameters:
newLevel - (undocumented)
- Returns:
- (undocumented)
-
withResources
Specify a ResourceProfile to use when calculating this RDD. This is only supported on certain cluster managers and currently requires dynamic allocation to be enabled. It will result in new executors with the resources specified being acquired to calculate the RDD.
- Parameters:
rp - (undocumented)
- Returns:
- (undocumented)
-
getResourceProfile
Get the ResourceProfile specified with this RDD, or null if it wasn't specified.
- Returns:
- the user-specified ResourceProfile, or null if none was specified
-
unpersist
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk. This method blocks until all blocks are deleted.
- Returns:
- (undocumented)
-
unpersist
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
- Parameters:
blocking - Whether to block until all blocks are deleted.
- Returns:
- (undocumented)
-
distinct
Return a new RDD containing the distinct elements in this RDD.
- Returns:
- (undocumented)
-
distinct
Return a new RDD containing the distinct elements in this RDD.
- Parameters:
numPartitions - (undocumented)
- Returns:
- (undocumented)
-
filter
Return a new RDD containing only the elements that satisfy a predicate.
- Parameters:
f - (undocumented)
- Returns:
- (undocumented)
-
coalesce
Return a new RDD that is reduced into numPartitions partitions.
- Parameters:
numPartitions - (undocumented)
- Returns:
- (undocumented)
-
coalesce
Return a new RDD that is reduced into numPartitions partitions.
- Parameters:
numPartitions - (undocumented)
shuffle - (undocumented)
- Returns:
- (undocumented)
-
repartition
Return a new RDD that has exactly numPartitions partitions.
Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data.
If you are decreasing the number of partitions in this RDD, consider using coalesce, which can avoid performing a shuffle.
- Parameters:
numPartitions - (undocumented)
- Returns:
- (undocumented)
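
To make the contrast with coalesce concrete, here is a minimal plain-Java model of a shuffle-free coalesce. It is not Spark code: partitions are modeled as a list of lists, and the class name `CoalesceSketch` and the choice to merge contiguous partitions are assumptions made for illustration. The sketch also assumes that, without a shuffle, the partition count can only stay the same or shrink.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not part of the Spark API.
final class CoalesceSketch {
    // Merge contiguous partitions into at most numPartitions groups.
    // Whole partitions are concatenated; no element is redistributed individually.
    static <T> List<List<T>> coalesce(List<List<T>> partitions, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int n = partitions.size();
        // Assumption: without a shuffle the number of partitions cannot grow.
        int target = Math.min(numPartitions, n);
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < target; i++) {
            // Evenly sized index ranges over the parent partitions.
            int start = (int) ((long) i * n / target);
            int end = (int) ((long) (i + 1) * n / target);
            List<T> merged = new ArrayList<>();
            for (int p = start; p < end; p++) {
                merged.addAll(partitions.get(p));
            }
            out.add(merged);
        }
        return out;
    }
}
```

In this model, coalescing the four partitions [1, 2], [3], [4, 5], [6] down to two yields [1, 2, 3] and [4, 5, 6], while asking for ten partitions leaves the four as they are. Growing the partition count is the case where repartition, with its shuffle, is needed.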
-
sample
Return a sampled subset of this RDD with a random seed.
- Parameters:
withReplacement - can elements be sampled multiple times (replaced when sampled out)
fraction - expected size of the sample as a fraction of this RDD's size. Without replacement: probability that each element is chosen; fraction must be in [0, 1]. With replacement: expected number of times each element is chosen; fraction must be greater than or equal to 0.
- Returns:
- (undocumented)
- Note:
- This is NOT guaranteed to provide exactly the fraction of the count of the given RDD.
-
sample
Return a sampled subset of this RDD, with a user-supplied seed.
- Parameters:
withReplacement - can elements be sampled multiple times (replaced when sampled out)
fraction - expected size of the sample as a fraction of this RDD's size. Without replacement: probability that each element is chosen; fraction must be in [0, 1]. With replacement: expected number of times each element is chosen; fraction must be greater than or equal to 0.
seed - seed for the random number generator
- Returns:
- (undocumented)
- Note:
- This is NOT guaranteed to provide exactly the fraction of the count of the given RDD.
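
The note about the sample size follows from how sampling without replacement is described: each element is chosen independently with probability fraction. The sketch below models that in plain Java over an in-memory list. It is not Spark's implementation; the class name `BernoulliSampleSketch` and the use of `java.util.Random` are assumptions for illustration, and the with-replacement case is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical model of sampling without replacement, not part of the Spark API.
final class BernoulliSampleSketch {
    // Keep each element independently with probability `fraction`.
    static <T> List<T> sample(List<T> data, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        Random rng = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T item : data) {
            // nextDouble() is in [0, 1), so fraction 0 keeps nothing and 1 keeps everything.
            if (rng.nextDouble() < fraction) {
                out.add(item);
            }
        }
        return out;
    }
}
```

Because every element is an independent trial, the result size varies around fraction times the input size rather than matching it exactly. Reusing the same seed on the same input reproduces the same sample in this model.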
-
randomSplit
Randomly splits this RDD with the provided weights.
- Parameters:
weights - weights for splits, will be normalized if they don't sum to 1
- Returns:
- split RDDs in an array
-
randomSplit
Randomly splits this RDD with the provided weights.
- Parameters:
weights - weights for splits, will be normalized if they don't sum to 1
seed - random seed
- Returns:
- split RDDs in an array
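
The weight normalization described for randomSplit can be sketched in plain Java as follows. This is an illustrative model over an in-memory list, not Spark's implementation. The class `RandomSplitSketch`, its method names, and the per-element bucket draw are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical model, not part of the Spark API.
final class RandomSplitSketch {
    // Scale the weights so that they sum to 1.
    static double[] normalize(double[] weights) {
        double sum = 0.0;
        for (double w : weights) {
            if (w < 0.0) throw new IllegalArgumentException("weights must be non-negative");
            sum += w;
        }
        if (sum <= 0.0) throw new IllegalArgumentException("weights must sum to a positive value");
        double[] out = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            out[i] = weights[i] / sum;
        }
        return out;
    }

    // Assign each element to exactly one split, with probability equal to the
    // split's normalized weight.
    static <T> List<List<T>> randomSplit(List<T> data, double[] weights, long seed) {
        double[] norm = normalize(weights);
        List<List<T>> splits = new ArrayList<>();
        for (int i = 0; i < norm.length; i++) {
            splits.add(new ArrayList<>());
        }
        Random rng = new Random(seed);
        for (T item : data) {
            double x = rng.nextDouble();
            double upper = 0.0;
            int chosen = norm.length - 1; // fallback guards against rounding in the running sum
            for (int i = 0; i < norm.length; i++) {
                upper += norm[i];
                if (x < upper) {
                    chosen = i;
                    break;
                }
            }
            splits.get(chosen).add(item);
        }
        return splits;
    }
}
```

With weights {3, 1}, normalize returns {0.75, 0.25}, so roughly three quarters of the elements land in the first split. As with sampling, the split sizes are approximate, but in this model every element ends up in exactly one split.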
-
union
Return the union of this RDD and another one. Any identical elements will appear multiple times (use .distinct() to eliminate them).
- Parameters:
other - (undocumented)
- Returns:
- (undocumented)
-
intersection
Return the intersection of this RDD and another one. The output will not contain any duplicate elements, even if the input RDDs did.
- Parameters:
other - (undocumented)
- Returns:
- (undocumented)
- Note:
- This method performs a shuffle internally.
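
The different duplicate handling of union and intersection can be illustrated with ordinary Java collections. The sketch below is a plain-Java model, not Spark code. The class `UnionIntersectionSketch` is hypothetical, and the element order it produces is an artifact of the model rather than something the RDD methods are documented to guarantee.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not part of the Spark API.
final class UnionIntersectionSketch {
    // Union keeps every element of both inputs, duplicates included.
    static <T> List<T> union(List<T> a, List<T> b) {
        List<T> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Distinct removes repeated elements.
    static <T> List<T> distinct(List<T> a) {
        return new ArrayList<>(new LinkedHashSet<>(a));
    }

    // Intersection keeps each common element once, even if the inputs repeat it.
    static <T> List<T> intersection(List<T> a, List<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.retainAll(new HashSet<>(b));
        return new ArrayList<>(result);
    }
}
```

For example, the union of [1, 2, 2] and [2, 3] is [1, 2, 2, 2, 3] in this model, and applying distinct to it gives [1, 2, 3]. The intersection of [1, 2, 2, 3] and [2, 3, 3, 4] is [2, 3], with no duplicates.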
-
subtract
Return an RDD with the elements from this that are not in other.
Uses this partitioner/partition size, because even if other is huge, the resulting RDD will be no larger than this one.
- Parameters:
other - (undocumented)
- Returns:
- (undocumented)
-
subtract
Return an RDD with the elements from this that are not in other.
- Parameters:
other - (undocumented)
numPartitions - (undocumented)
- Returns:
- (undocumented)
-
subtract
Return an RDD with the elements from this that are not in other.
- Parameters:
other - (undocumented)
p - (undocumented)
- Returns:
- (undocumented)
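
The subtract contract, elements from this that are not in other, can be modeled with plain Java collections. The sketch below is not Spark code. The class `SubtractSketch` is hypothetical, and keeping repeated elements of this that do not occur in other is an assumption of the model.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not part of the Spark API.
final class SubtractSketch {
    // Keep the elements of `self` that do not occur anywhere in `other`.
    static <T> List<T> subtract(List<T> self, List<T> other) {
        Set<T> exclude = new HashSet<>(other);
        List<T> out = new ArrayList<>();
        for (T item : self) {
            if (!exclude.contains(item)) {
                out.add(item);
            }
        }
        return out;
    }
}
```

Subtracting [2, 4] from [1, 1, 2, 3] gives [1, 1, 3] here. The result can never have more elements than the first input, however large the second one is.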
-
toString
-
setName
Assign a name to this RDD.
-
sortBy
Return this RDD sorted by the given key function.
- Parameters:
f - (undocumented)
ascending - (undocumented)
numPartitions - (undocumented)
- Returns:
- (undocumented)
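
Since the parameters of sortBy are undocumented here, the following plain-Java sketch shows one reading of them: f extracts a sort key from each element, and ascending selects the direction. It is a model over an in-memory list, not Spark code. The class `SortBySketch` is hypothetical, and numPartitions is left out because a single list has no partitions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Hypothetical model, not part of the Spark API.
final class SortBySketch {
    // Sort a copy of `data` by the key that `keyFn` extracts from each element.
    static <T, K extends Comparable<K>> List<T> sortBy(
            List<T> data, Function<T, K> keyFn, boolean ascending) {
        Comparator<T> byKey = Comparator.comparing(keyFn);
        if (!ascending) {
            byKey = byKey.reversed();
        }
        List<T> out = new ArrayList<>(data); // leave the input untouched
        out.sort(byKey);
        return out;
    }
}
```

Sorting ["pear", "fig", "banana"] by String::length gives ["fig", "pear", "banana"] when ascending is true and ["banana", "pear", "fig"] when it is false.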
-