pyspark.sql.functions.tuple_intersection_theta_double

pyspark.sql.functions.tuple_intersection_theta_double(col1, col2, mode=None)

Returns the intersection of a DataSketches TupleSketch with double summaries and a ThetaSketch.

New in version 4.2.0.

Parameters
col1 : Column or column name

The TupleSketch column with double summaries

col2 : Column or column name

The ThetaSketch column

mode : Column or str, optional

The summary mode: “sum” (default), “min”, “max”, or “alwaysone”

Returns
Column

The binary representation of the intersected TupleSketch.

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, 1.0, 1), (2, 2.0, 2), (3, 3.0, 4)], ["key1", "v1", "key2"])  # noqa
>>> df = df.agg(
...     sf.tuple_sketch_agg_double("key1", "v1").alias("sketch1"),
...     sf.theta_sketch_agg("key2").alias("sketch2")
... )
>>> df.select(sf.tuple_sketch_estimate_double(sf.tuple_intersection_theta_double(df.sketch1, "sketch2"))).show()  # noqa
+------------------------------------------------------------------------------------+
|tuple_sketch_estimate_double(tuple_intersection_theta_double(sketch1, sketch2, sum))|
+------------------------------------------------------------------------------------+
|                                                                                 2.0|
+------------------------------------------------------------------------------------+
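
As a rough intuition for why the example above returns 2.0, the following plain-Python sketch replaces both sketches with exact sets of keys. It is only an analogy under stated assumptions: real sketches are probabilistic and can return approximate estimates on large inputs, and summary handling (the mode argument) is not modeled here. The names rows, tuple_keys, theta_keys, common_keys, and exact_estimate are illustrative and not part of the PySpark API.

```python
# Exact-set analogy for the doctest above; this is NOT the DataSketches algorithm.
rows = [(1, 1.0, 1), (2, 2.0, 2), (3, 3.0, 4)]  # (key1, v1, key2)

# Distinct keys that tuple_sketch_agg_double("key1", "v1") would summarize.
tuple_keys = {key1 for key1, _v1, _key2 in rows}

# Distinct keys that theta_sketch_agg("key2") would summarize.
theta_keys = {key2 for _key1, _v1, key2 in rows}

# An intersection keeps only the keys present on both sides.
common_keys = tuple_keys & theta_keys

# With exact sets, the "estimate" is simply the number of common keys.
exact_estimate = float(len(common_keys))

print(sorted(common_keys))  # [1, 2]
print(exact_estimate)       # 2.0
```

Keys 1 and 2 appear in both key1 and key2, while 3 and 4 each appear on one side only, which matches the estimate of 2.0 shown in the example output.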