pyspark.pandas.DataFrame.cummax
- DataFrame.cummax(skipna=True)
Return cumulative maximum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative maximum.
Note
The current implementation of cummax uses Spark's Window without a partition specification. This moves all data into a single partition on a single machine and can cause serious performance degradation. Avoid this method with very large datasets.
- Parameters
- skipna: boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- Returns
- DataFrame or Series
See also
DataFrame.max: Return the maximum over DataFrame axis.
DataFrame.cummax: Return cumulative maximum over DataFrame axis.
DataFrame.cummin: Return cumulative minimum over DataFrame axis.
DataFrame.cumsum: Return cumulative sum over DataFrame axis.
DataFrame.cumprod: Return cumulative product over DataFrame axis.
Series.max: Return the maximum over Series axis.
Series.cummax: Return cumulative maximum over Series axis.
Series.cummin: Return cumulative minimum over Series axis.
Series.cumsum: Return cumulative sum over Series axis.
Series.cumprod: Return cumulative product over Series axis.
Examples
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame([[2.0, 1.0], [3.0, None], [1.0, 0.0]], columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0
By default, cummax iterates over rows and returns the running maximum for each column.
>>> df.cummax()
     A    B
0  2.0  1.0
1  3.0  NaN
2  3.0  1.0
It works identically in Series.
>>> df.B.cummax()
0    1.0
1    NaN
2    1.0
Name: B, dtype: float64
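The effect of skipna on a single column can be sketched in plain Python. This is only an illustration of the semantics, not how pyspark.pandas computes the result: cummax_sketch is a hypothetical helper, and the skipna=False branch assumes the pandas behavior in which every position from the first NA onward becomes NA.

```python
import math
from typing import List, Optional


def cummax_sketch(values: List[Optional[float]], skipna: bool = True) -> List[float]:
    """Running maximum over one column (hypothetical helper, not pyspark code)."""
    result = []
    running = math.nan  # no valid value seen yet
    seen_na = False
    for v in values:
        is_na = v is None or math.isnan(v)
        if is_na:
            # An NA input always yields NA at its own position.
            seen_na = True
            result.append(math.nan)
            continue
        if not skipna and seen_na:
            # Assumption: with skipna=False, NA propagates to all later rows.
            result.append(math.nan)
            continue
        running = v if math.isnan(running) else max(running, v)
        result.append(running)
    return result


# Column B from the example above: the NA is skipped, so the running
# maximum of 1.0 carries through to the last row.
col_b = cummax_sketch([1.0, None, 0.0])
```

With skipna=True the NA row stays NA in the output but does not reset or block the running maximum, which is why the last row of column B above is 1.0 rather than 0.0 or NaN.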