Spark 1.5.1 + Scala 2.10.6 + IntelliJ 14.1 environment setup

The whole process:

1. Install JDK 8

Download JDK 8 from the official website and install it. Unlike on Windows, we don't need to do any extra configuration; it can be used immediately.

# test it
java -version

# it succeeds
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)

2. Install Scala

Download Scala 2.10.6 from the official website. Because we will use Spark 1.5.1, the matching Scala version is 2.10.*, so we download the latest 2.10.* release. Then we unzip it into the installation folder; here I installed it in /usr/local/opt.

cd /usr/local/opt
sudo mkdir scala2.10.6  # create a new folder for Scala

sudo tar -zxvf scala-2.10.6.tgz -C /usr/local/opt/scala2.10.6  # unzip Scala into the installation folder

# test it
cd /usr/local/opt/scala2.10.6/scala-2.10.6
./bin/scala

# it succeeds
Welcome to Scala version 2.10.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_66).
Type in expressions to have them evaluated.
Type :help for more information.

scala>
# press Ctrl+D to exit the Scala console
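
For a quick sanity check you can evaluate a couple of expressions at the prompt (purely illustrative; any expression will do):

scala> 1 + 1
res0: Int = 2

scala> List(1, 2, 3).map(_ * 2)
res1: List[Int] = List(2, 4, 6)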

3. Install Spark

Download Spark 1.5.1 from the official website. There are several packages; we choose the one pre-built for Hadoop 2.6 and later. As with Scala, we unzip it to /usr/local/opt.

cd /usr/local/opt
sudo mkdir spark1.5.1

sudo tar -zxvf spark-1.5.1-bin-hadoop2.6.tgz -C /usr/local/opt/spark1.5.1

# test it
cd /usr/local/opt/spark1.5.1/spark-1.5.1-bin-hadoop2.6
sudo ./bin/spark-shell

# it succeeds
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_66)
Type in expressions to have them evaluated.
Type :help for more information.
15/11/02 15:55:50 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
Spark context available as sc.
15/11/02 15:55:53 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/02 15:55:53 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/02 15:55:57 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
15/11/02 15:55:57 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
15/11/02 15:55:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/02 15:55:59 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/02 15:55:59 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
SQL context available as sqlContext.

scala>

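Before moving on, you can run a small computation against the sc (SparkContext) that the shell provides, just as a sanity check. This is an illustrative snippet; the INFO log lines around the result are omitted here:

scala> sc.parallelize(1 to 100).reduce(_ + _)
res0: Int = 5050
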
To make it easier to invoke the Spark and Scala commands, we add the installation paths to the environment variables.

sudo nano /etc/profile

# add the following at the end of the file
export SCALA_HOME=/usr/local/opt/scala2.10.6/scala-2.10.6

export SPARK_HOME=/usr/local/opt/spark1.5.1/spark-1.5.1-bin-hadoop2.6
export CLASSPATH=.:${SCALA_HOME}/lib:${SPARK_HOME}/lib
export PATH=${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH

source /etc/profile

Then we can run the spark-shell or scala commands from any directory.

# the results will be the same as before
sudo spark-shell

scala

4. Install IntelliJ

Download IntelliJ IDEA 14.1 from the official website and install it. When we launch it for the first time, it asks us to make some initial configurations, such as choosing the UI theme. It is best to install the Scala plugin at this point; otherwise we will have to do it later.

After installation, we can create a new project to test it.

[Screenshot 1: New Project dialog]

Click Java and set up the Project SDK; the path is the JDK installation path. Then click Next, and don't select Groovy or Scala. You can write a HelloWorld program to test whether it works.

In the picture above, we can see the Scala option on the left. Click it, then click Scala to create a new Scala project. Here we still need to set up the Scala SDK. After that, you can write some Scala code to test it.
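
A minimal object like the one below (the name HelloWorld is arbitrary) is enough to confirm that the Scala SDK is wired up correctly; running it should print a greeting together with the Scala version.

object HelloWorld {
  def main(args: Array[String]): Unit = {
    // prints a greeting plus the Scala version the project compiles against
    println("Hello, world! Scala " + scala.util.Properties.versionString)
  }
}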

5. Test Spark in IntelliJ

Create a Scala project following the instructions in step 4. To use Spark, we need to add the Spark jar to the project: click File -> Project Structure -> Libraries -> +, and add spark-assembly-1.5.1-hadoop2.6.0.jar from the lib directory of the Spark installation path.

[Screenshot 2: Project Structure - Libraries]

Then we can run one of the Spark examples. Create a new Scala file and paste the code below into it.

// scalastyle:off println
package org.apache.spark.examples

import scala.math.random

import org.apache.spark._

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
// scalastyle:on println

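A quick note on why this works: the points (x, y) are sampled uniformly in the square [-1, 1] x [-1, 1], and the fraction of them that lands inside the unit circle approaches the area ratio pi/4, so 4.0 * count / n is an estimate of pi. With the default n = 200000 samples, expect only a couple of correct decimal places.
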
Before running the code, we need to edit the run configuration; otherwise it will throw errors.

[Screenshot 3: Run/Debug Configurations]

In the VM options field, add the following:

-Dspark.master=local

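Alternatively, if you prefer not to touch the run configuration, you can set the master in the code itself. This is just an equivalent sketch of the SparkConf setup used in the example above (local[2] is an arbitrary choice of two local threads):

val conf = new SparkConf()
  .setAppName("Spark Pi")
  .setMaster("local[2]") // equivalent to passing -Dspark.master=local[2] as a VM option
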
Then run it.

...
...
...
15/11/02 16:37:06 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/11/02 16:37:06 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:40) finished in 0.330 s
15/11/02 16:37:06 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:40, took 0.878039 s
Pi is roughly 3.1377
15/11/02 16:37:06 INFO SparkUI: Stopped Spark web UI at http://192.168.1.103:4041
15/11/02 16:37:06 INFO DAGScheduler: Stopping DAGScheduler
15/11/02 16:37:06 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/11/02 16:37:06 INFO MemoryStore: MemoryStore cleared
15/11/02 16:37:06 INFO BlockManager: BlockManager stopped
15/11/02 16:37:06 INFO BlockManagerMaster: BlockManagerMaster stopped
15/11/02 16:37:06 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/11/02 16:37:06 INFO SparkContext: Successfully stopped SparkContext
15/11/02 16:37:06 INFO ShutdownHookManager: Shutdown hook called
15/11/02 16:37:06 INFO ShutdownHookManager: Deleting directory /private/var/folders/kj/trxlbgrx6rjcv1rccwgxd5xm0000gn/T/spark-6a6b84a5-e3b3-4765-8c0d-6eb9275fc377

Process finished with exit code 0

You will see the results.

Now we have configured everything we need to develop Spark applications. Let's start programming!