spark

First, set up a conda virtual environment as described in the conda workflow of the python section.

Then run pip install findspark inside that conda virtual environment.

Next, run spack load jdk and spack load spark. Loading jdk is necessary so that JAVA_HOME is set; without it the Spark context cannot be created.
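
As a quick sanity check (a minimal sketch, not part of the original instructions), you can verify from Python that JAVA_HOME is visible before trying to create a Spark context:

import os
## JAVA_HOME must point to the jdk loaded via spack,
## otherwise pyspark cannot launch the JVM
java_home = os.environ.get("JAVA_HOME")
if java_home is None:
    raise RuntimeError("JAVA_HOME is not set; run spack load jdk first")
print("Using JAVA_HOME:", java_home)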

Then start Jupyter with jupyter notebook as usual, and run the following in a notebook:

import findspark
## point findspark at the spark installation provided by spack
findspark.init("/home/ubuntu/spack/opt/spack/linux-ubuntu18.04-x86_64/gcc-7.4.0/spark-2.3.0-ovs6bpfx4hfqncvdjdsoiuw2aoxbkuvb")
import pyspark

## below is a demo example: estimate pi by Monte Carlo sampling
import random
sc = pyspark.SparkContext(appName="Pi")  ## create a spark context

num_samples = 100000000

def inside(p):
    ## draw a random point in the unit square and test
    ## whether it falls inside the quarter circle
    x, y = random.random(), random.random()
    return x*x + y*y < 1

count = sc.parallelize(range(0, num_samples)).filter(inside).count()
pi = 4 * count / num_samples  ## fraction inside the quarter circle gives pi/4
print(pi)
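
Beyond the raw SparkContext, Spark 2.x also exposes the SparkSession API for DataFrame-style work. The snippet below is a minimal sketch under the same findspark setup as above; the application name "demo" and the toy data are arbitrary placeholders.

from pyspark.sql import SparkSession
## a SparkSession wraps the SparkContext and is the usual
## entry point for DataFrame work in Spark 2.x
spark = SparkSession.builder.appName("demo").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
df.show()
## stop the session (and the underlying context) when finished,
## so the executors are released for other users
spark.stop()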