spark

First, follow the conda workflow described in the Python section.

Install findspark in your conda virtual environment with pip install findspark.

Run spack load jdk and spack load spark. Loading the JDK is necessary so that JAVA_HOME is available; without it the Spark context cannot be created (see the related Stack Overflow discussion).
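A quick way to confirm the JDK is visible to the notebook kernel is to check JAVA_HOME from Python before creating the context. This is a minimal sanity-check sketch (not part of the original workflow; the quoted error text is only the typical symptom):

import os
## JAVA_HOME should point at the JDK loaded via spack; if it is missing,
## SparkContext creation typically fails with a "Java gateway process exited" error.
java_home = os.environ.get("JAVA_HOME")
if java_home is None:
    raise RuntimeError("JAVA_HOME is not set; run 'spack load jdk' before starting Jupyter")
print("Using JDK at:", java_home)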

Then start Jupyter with jupyter notebook as usual.

import findspark
findspark.init("/home/ubuntu/spack/opt/spack/linux-ubuntu18.04-x86_64/gcc-7.4.0/spark-2.3.0-ovs6bpfx4hfqncvdjdsoiuw2aoxbkuvb")
import pyspark
## below is a demo example: estimate pi with simple Monte Carlo sampling
import random
sc = pyspark.SparkContext(appName="Pi") ## create a spark context
num_samples = 100000000
def inside(p):
  ## draw a random point in the unit square and test whether it lands inside the unit circle
  x, y = random.random(), random.random()
  return x*x + y*y < 1
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
pi = 4 * count / num_samples  ## the fraction inside the quarter circle approximates pi/4
print(pi)
sc.stop()  ## release the executors when the job is finished
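If SPARK_HOME is already exported in the environment (whether the spack-generated module sets it depends on your spack configuration, so treat this as an assumption), findspark can locate Spark without the hardcoded installation path:

import findspark
## Assumption: SPARK_HOME is exported by the environment (e.g. the spack module).
## If it is not, pass the explicit installation path as in the demo above.
findspark.init()
import pyspark
print(pyspark.__version__)  ## confirm the pyspark version that was picked up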
