pyspark.pandas.DataFrame.aggregate

DataFrame.aggregate(func: Union[List[str], Dict[Union[Any, Tuple[Any, …]], List[str]]]) → pyspark.pandas.frame.DataFrame

Aggregate using one or more operations over the specified axis.
Parameters
    func : dict or a list
        A dict mapping from column name (string) to aggregate functions (list of strings). If a list is given, the aggregation is performed against all columns.

Returns
    DataFrame
See also
DataFrame.apply
    Invoke function on DataFrame.
DataFrame.transform
    Only perform transforming type operations.
DataFrame.groupby
    Perform operations over groups.
Series.aggregate
    The equivalent function for Series.
Notes
agg is an alias for aggregate. Use the alias.
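For instance, the two spellings should be interchangeable; a minimal sketch checking this (the one-column frame here is illustrative only, not part of the examples below):

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'A': [1, 2, 3]})
>>> # agg and aggregate are the same method, so the results match
>>> df.agg(['sum']).to_pandas().equals(df.aggregate(['sum']).to_pandas())
True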
Examples
>>> df = ps.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])

>>> df
     A    B    C
0  1.0  2.0  3.0
1  4.0  5.0  6.0
2  7.0  8.0  9.0
3  NaN  NaN  NaN
Aggregate these functions over the rows.
>>> df.agg(['sum', 'min'])[['A', 'B', 'C']].sort_index()
        A     B     C
min   1.0   2.0   3.0
sum  12.0  15.0  18.0
Different aggregations per column.
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})[['A', 'B']].sort_index()
        A    B
max   NaN  8.0
min   1.0  2.0
sum  12.0  NaN
For multi-index columns:
>>> df.columns = pd.MultiIndex.from_tuples([("X", "A"), ("X", "B"), ("Y", "C")])

>>> df.agg(['sum', 'min'])[[("X", "A"), ("X", "B"), ("Y", "C")]].sort_index()
        X           Y
        A     B     C
min   1.0   2.0   3.0
sum  12.0  15.0  18.0
>>> aggregated = df.agg({("X", "A") : ['sum', 'min'], ("X", "B") : ['min', 'max']})
>>> aggregated[[("X", "A"), ("X", "B")]].sort_index()
        X
        A    B
max   NaN  8.0
min   1.0  2.0
sum  12.0  NaN
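The examples above call sort_index() because the row order of a pyspark.pandas result is not guaranteed. If the aggregated frame is small and you want it as ordinary pandas for further local work, a minimal sketch (reusing the multi-index aggregated frame from the previous example):

>>> # materialize on the driver as a pandas DataFrame, then sort locally
>>> aggregated[[("X", "A"), ("X", "B")]].to_pandas().sort_index()
        X
        A    B
max   NaN  8.0
min   1.0  2.0
sum  12.0  NaN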