Download JDK 8 from the official website and install it. Unlike on Windows, we don’t need to configure anything extra; we can use it immediately.
//test it
java -version

//it succeeds
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
2. Install Scala
Download Scala 2.10.6 from the official website. Because we will use Spark 1.5.1, which is built against Scala 2.10.*, we download the latest 2.10.* release. Then we extract it into the installation folder; here I installed it in /usr/local/opt.
cd /usr/local/opt
sudo mkdir scala2.10.6    //create a new folder for scala
sudo tar -zxvf scala-2.10.6.tgz -C /usr/local/opt/scala2.10.6    //unzip scala into the installation folder

//test it
cd /usr/local/opt/scala2.10.6/scala-2.10.6
./bin/scala

//it succeeds
Welcome to Scala version 2.10.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_66).
Type in expressions to have them evaluated.
Type :help for more information.

scala>    //we can use Ctrl+D to exit the scala console
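If you want to verify a bit more than the welcome banner, you can evaluate a couple of expressions in the REPL. The snippet below is only a suggestion; any simple expression will do.

scala> 1 + 1
res0: Int = 2

scala> List(1, 2, 3).map(_ * 2)
res1: List[Int] = List(2, 4, 6)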
3. Install Spark
Download Spark 1.5.1 from the official website. There are several packages to choose from; we pick the one pre-built for Hadoop 2.6 and later. As with Scala, we extract it into /usr/local/opt.
cd /usr/local/opt
sudo mkdir spark1.5.1    //create a new folder for spark, like we did for scala
sudo tar -zxvf spark-1.5.1-bin-hadoop2.6.tgz -C /usr/local/opt/spark1.5.1    //unzip spark into the installation folder
//test it
cd /usr/local/opt/spark1.5.1/spark-1.5.1-bin-hadoop2.6
sudo ./bin/spark-shell

//it succeeds
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_66)
Type in expressions to have them evaluated.
Type :help for more information.
15/11/02 15:55:50 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
Spark context available as sc.
15/11/02 15:55:53 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/02 15:55:53 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/02 15:55:57 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
15/11/02 15:55:57 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
15/11/02 15:55:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/02 15:55:59 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/02 15:55:59 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
SQL context available as sqlContext.

scala>
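Inside the shell, the SparkContext is already available as sc, so a quick sanity check can be run right away. The expressions below are only a suggestion.

scala> sc.parallelize(1 to 100).sum()
res0: Double = 5050.0

scala> sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
res1: Long = 50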
To make it easier to invoke the Spark and Scala commands, we add the installation paths to the environment variables.
sudo nano /etc/profile

//add the content at the end of the file
export SCALA_HOME=/usr/local/opt/scala2.10.6/scala-2.10.6
export SPARK_HOME=/usr/local/opt/spark1.5.1/spark-1.5.1-bin-hadoop2.6
export CLASSPATH=.:${SCALA_HOME}/lib:${SPARK_HOME}/lib
export PATH=${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH

source /etc/profile
Then we can run the spark-shell or scala commands from the console in any directory.
//The results will be the same as before
sudo spark-shell

scala
4. Install IntelliJ
Download IntelliJ IDEA 14.1 from the official website and install it. When we launch it for the first time, it asks us to make some initial configurations, such as choosing the UI theme. It is best to install the Scala plugin at this point; otherwise we will have to do it later.
After installing it, we can create a new project to test.
Click Java; we need to set up the Project SDK, whose path is the JDK installation path. Then click Next, and don't select Groovy or Scala. You can write a HelloWorld program to test whether it works.
In the picture above, on the left, we can see the Scala option. Click it, then click Scala to create a new Scala project. Here we still need to set up the Scala SDK. After setting it up, you can write some Scala code to test it, as in the sketch below.
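Any trivial program is enough for this test; the object name below is just an example.

object HelloWorld {
  def main(args: Array[String]): Unit = {
    //if this prints, the Scala SDK is configured correctly
    println("Hello, Scala!")
  }
}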
5. Test Spark in IntelliJ
Create a Scala project following the instructions in section 4. To use Spark, we need to add the Spark jar file to the project. Click File -> Project Structure -> Libraries -> +, and add spark-assembly-1.5.1-hadoop2.6.0.jar from the Spark installation path.
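A quick way to confirm that the jar has been picked up is to check that Spark classes resolve in a small scratch object; this is only an illustrative check, and the object name is made up.

import org.apache.spark.SparkContext    //if this import resolves, the spark-assembly jar is on the classpath

object DependencyCheck {
  def main(args: Array[String]): Unit = {
    //prints org.apache.spark.SparkContext without starting Spark
    println(classOf[SparkContext].getName)
  }
}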
Then we can run one of the Spark examples. Create a new Scala file and paste the code below into it.
import scala.math.random

import org.apache.spark.{SparkConf, SparkContext}

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
Before running the code, we need to edit the run configuration; otherwise the program will fail because no Spark master has been set.
In the VM options field, add the line below:
-Dspark.master=local
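Alternatively, the master can be set directly in the code by replacing the line that creates conf in the example above; this is a standard SparkConf option, not something required by this setup.

val conf = new SparkConf()
  .setAppName("Spark Pi")
  .setMaster("local[2]")    //run locally with 2 threads, so the -Dspark.master VM option is not needed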
Then run it.
...
...
...
15/11/02 16:37:06 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/11/02 16:37:06 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:40) finished in 0.330 s
15/11/02 16:37:06 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:40, took 0.878039 s
Pi is roughly 3.1377
15/11/02 16:37:06 INFO SparkUI: Stopped Spark web UI at http://192.168.1.103:4041
15/11/02 16:37:06 INFO DAGScheduler: Stopping DAGScheduler
15/11/02 16:37:06 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/11/02 16:37:06 INFO MemoryStore: MemoryStore cleared
15/11/02 16:37:06 INFO BlockManager: BlockManager stopped
15/11/02 16:37:06 INFO BlockManagerMaster: BlockManagerMaster stopped
15/11/02 16:37:06 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/11/02 16:37:06 INFO SparkContext: Successfully stopped SparkContext
15/11/02 16:37:06 INFO ShutdownHookManager: Shutdown hook called
15/11/02 16:37:06 INFO ShutdownHookManager: Deleting directory /private/var/folders/kj/trxlbgrx6rjcv1rccwgxd5xm0000gn/T/spark-6a6b84a5-e3b3-4765-8c0d-6eb9275fc377
Process finished with exit code 0
You will see the result in the output.
Now we have configured all the environments we need to develop Spark applications. Let's start programming!