repartition {SparkR}    R Documentation
The following options for repartition are possible:
1. Return a new SparkDataFrame that has exactly numPartitions partitions.
2. Return a new SparkDataFrame hash partitioned by the given column(s)
into numPartitions partitions.
3. Return a new SparkDataFrame hash partitioned by the given column(s),
using spark.sql.shuffle.partitions as the number of partitions.
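To make the hash-partitioning options concrete, here is a minimal base-R sketch that needs no Spark installation. It assumes a toy polynomial string hash. Spark's own hash partitioning uses a Murmur3-based hash, so real partition ids will differ. The functions toy_hash and toy_hash_partition are hypothetical illustrations, not part of SparkR. What carries over is the property options 2 and 3 rely on: rows with equal values in the partitioning column(s) always receive the same partition id, in the range 0 to numPartitions - 1. Under option 3, numPartitions would simply be the value of spark.sql.shuffle.partitions.

```r
# Toy stand-in for a hash function (assumption: NOT the hash Spark uses).
# Polynomial rolling hash over the character codes of a single string key.
toy_hash <- function(key) {
  h <- 0
  for (code in utf8ToInt(key)) {
    h <- (h * 31 + code) %% 2147483647
  }
  h
}

# Hypothetical helper: map each row to a partition id in 0..(numPartitions - 1)
# using only the values of the partitioning columns named in `cols`.
toy_hash_partition <- function(data, cols, numPartitions) {
  # Join the partitioning columns of each row into one key string
  keys <- do.call(paste, c(unname(as.list(data[cols])), sep = "\r"))
  vapply(keys, function(k) toy_hash(k) %% numPartitions, numeric(1),
         USE.NAMES = FALSE)
}

df <- data.frame(col1 = c("a", "b", "a", "c", "b", "a"), value = 1:6,
                 stringsAsFactors = FALSE)
df$partition <- toy_hash_partition(df, "col1", numPartitions = 2)

# Group the values by partition id: rows sharing a col1 value stay together
print(split(df$value, df$partition))
```

With this toy hash, the keys "a" and "c" share one partition and "b" gets the other. Which keys share a partition is an accident of the hash, but a given key is never split across partitions.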
Usage:
repartition(x, ...)

## S4 method for signature 'SparkDataFrame'
repartition(x, numPartitions = NULL, col = NULL, ...)
Arguments:
x              a SparkDataFrame.
...            additional column(s) to be used in the partitioning.
numPartitions  the number of partitions to use.
col            the column by which the partitioning will be performed.
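Which of the three options applies depends on which arguments are supplied. The base-R sketch below models that selection with a hypothetical function, repartition_mode, that has the same formals as the SparkDataFrame method. Plain strings stand in for Column objects, and the branch order (checking is.numeric(numPartitions) first) is an assumption about the method body, not a copy of it. Because the sketch relies on ordinary R argument matching, it also exposes a pitfall: an unnamed column passed after a named col is matched positionally to numPartitions, not to `...`.

```r
# Hypothetical model of the dispatch; same formals as the SparkDataFrame method.
# Character strings stand in for SparkR Column objects.
repartition_mode <- function(x, numPartitions = NULL, col = NULL, ...) {
  extra <- list(...)  # additional partitioning columns, if any
  if (is.numeric(numPartitions) && !is.null(col)) {
    # Option 2: hash by the given column(s) into numPartitions partitions
    list(mode = "hash by column(s) into numPartitions",
         cols = c(list(col), extra))
  } else if (is.numeric(numPartitions)) {
    # Option 1: exactly numPartitions partitions, no columns
    list(mode = "exactly numPartitions", cols = list())
  } else if (!is.null(col)) {
    # Option 3: hash by the given column(s); partition count taken from
    # spark.sql.shuffle.partitions. A non-numeric numPartitions is ignored here.
    list(mode = "hash by column(s), spark.sql.shuffle.partitions",
         cols = c(list(col), extra))
  } else {
    stop("sketch: supply numPartitions, col, or both")
  }
}

print(repartition_mode("df", 2L)$mode)
# "col2" is matched positionally to numPartitions, so only "col1" survives
str(repartition_mode("df", col = "col1", "col2")$cols)
# A numeric numPartitions, or naming numPartitions = NULL, routes "col2" into ...
str(repartition_mode("df", 3L, col = "col1", "col2")$cols)
str(repartition_mode("df", numPartitions = NULL, col = "col1", "col2")$cols)
```

The second call returns a single column, while the last two return both. Naming every argument that precedes `...` is the simplest way to make sure extra columns land in `...`.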
Note: repartition has been available since Spark 1.4.0.
Other SparkDataFrame functions: SparkDataFrame-class,
agg, arrange,
as.data.frame,
attach,SparkDataFrame-method,
cache, checkpoint,
coalesce, collect,
colnames, coltypes,
createOrReplaceTempView,
crossJoin, dapplyCollect,
dapply, describe,
dim, distinct,
dropDuplicates, dropna,
drop, dtypes,
except, explain,
filter, first,
gapplyCollect, gapply,
getNumPartitions, group_by,
head, hint,
histogram, insertInto,
intersect, isLocal,
isStreaming, join,
limit, merge,
mutate, ncol,
nrow, persist,
printSchema, randomSplit,
rbind, registerTempTable,
rename, sample,
saveAsTable, schema,
selectExpr, select,
showDF, show,
storageLevel, str,
subset, take,
toJSON, union,
unpersist, withColumn,
with, write.df,
write.jdbc, write.json,
write.orc, write.parquet,
write.stream, write.text
## Not run:
##D sparkR.session()
##D path <- "path/to/file.json"
##D df <- read.json(path)
##D newDF <- repartition(df, 2L)
##D newDF <- repartition(df, numPartitions = 2L)
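##D # Name numPartitions explicitly so that df$"col2" is passed through ...
##D # instead of being matched positionally to numPartitions
##D newDF <- repartition(df, numPartitions = NULL, col = df$"col1", df$"col2")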
##D newDF <- repartition(df, 3L, col = df$"col1", df$"col2")
## End(Not run)