pyspark.sql.functions.rand

pyspark.sql.functions.rand(seed=None)

Generates a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

seed (int, optional): Seed value for the random generator. If omitted, a random seed is used.

Returns

Column: A column of random floating-point values in [0.0, 1.0).

See also

pyspark.sql.functions.randn()
pyspark.sql.functions.randstr()
pyspark.sql.functions.uniform()

Notes

The function is non-deterministic in the general case.

Examples

Example 1: Generate a random column without a seed

>>> from pyspark.sql import functions as sf
>>> spark.range(0, 2, 1, 1).select("*", sf.rand()).show() 
+---+-------------------------+
| id|rand(-158884697681280011)|
+---+-------------------------+
|  0|       0.9253464547887...|
|  1|       0.6533254118758...|
+---+-------------------------+

Example 2: Generate a random column with a specific seed

>>> spark.range(0, 2, 1, 1).select("*", sf.rand(seed=42)).show()
+---+------------------+
| id|          rand(42)|
+---+------------------+
|  0| 0.619189370225...|
|  1|0.5096018842446...|
+---+------------------+