Spark 1.5.1 + Scala 2.10.6 + IntelliJ 14.1 environment setup

The whole process:

1. Install JDK 8

Download JDK 8 from the official website and install it. Unlike on Windows, no extra configuration is needed; we can use it immediately.

# test it
java -version

# it succeeds
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)

2. Install Scala

Download Scala 2.10.6 from the official website. Because we will use Spark 1.5.1, the matching Scala version is 2.10.*, so we download the latest 2.10.* release. Then we extract it into the installation folder. Here I installed it in /usr/local/opt.

cd /usr/local/opt
sudo mkdir scala2.10.6   # create a new folder for Scala

sudo tar -zxvf scala-2.10.6.tgz -C /usr/local/opt/scala2.10.6   # extract Scala into the installation folder

# test it
cd /usr/local/opt/scala2.10.6/scala-2.10.6
./bin/scala

# it succeeds
Welcome to Scala version 2.10.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_66).
Type in expressions to have them evaluated.
Type :help for more information.

scala>
# press Ctrl+D to exit the Scala console

3. Install Spark

Download Spark 1.5.1 from the official website. There are several packages; we choose the one pre-built for Hadoop 2.6 and later. As with Scala, we extract it to /usr/local/opt.

cd /usr/local/opt
sudo mkdir spark1.5.1

sudo tar -zxvf spark-1.5.1-bin-hadoop2.6.tgz -C /usr/local/opt/spark1.5.1

# test it
cd /usr/local/opt/spark1.5.1/spark-1.5.1-bin-hadoop2.6
sudo ./bin/spark-shell

# it succeeds
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_66)
Type in expressions to have them evaluated.
Type :help for more information.
15/11/02 15:55:50 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
Spark context available as sc.
15/11/02 15:55:53 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/02 15:55:53 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/02 15:55:57 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
15/11/02 15:55:57 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
15/11/02 15:55:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/02 15:55:59 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/02 15:55:59 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
SQL context available as sqlContext.

scala>

To make it easier to invoke the spark-shell and scala commands, we add the installation paths to the environment variables.

sudo nano /etc/profile

# add the following at the end of the file
export SCALA_HOME=/usr/local/opt/scala2.10.6/scala-2.10.6

export SPARK_HOME=/usr/local/opt/spark1.5.1/spark-1.5.1-bin-hadoop2.6
export CLASSPATH=.:${SCALA_HOME}/lib:${SPARK_HOME}/lib
export PATH=${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH

source /etc/profile

Then we can run spark-shell or scala in the console from any directory.

# The results will be the same as before
sudo spark-shell

scala
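As a quick sanity check, you can run a tiny job inside spark-shell against the pre-created SparkContext sc. A minimal sketch (the resN name printed by the REPL may differ):

scala> sc.parallelize(1 to 100).reduce(_ + _)   // distribute the numbers 1..100 and add them up
res0: Int = 5050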

4. Install IntelliJ

Download IntelliJ IDEA 14.1 from the official website and install it. On the first launch it asks for some initial configuration, such as choosing the UI theme. It is best to install the Scala plugin at this point; otherwise we will have to do it later.

After installing it, we can create a new project to test the setup.

[Figure 1]

Click Java; we need to set up the Project SDK, whose path is the JDK installation path. Then click Next without selecting Groovy or Scala. You can write a HelloWorld program to check that it works.

In the picture above, the Scala option appears on the left. Click it, then click Scala to create a new Scala project. Here we also need to set up the Scala SDK. After that, you can write some Scala code to test it, for example the snippet below.
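A minimal sketch of such a test (the object name HelloScala is arbitrary, used only for illustration):

object HelloScala {
  def main(args: Array[String]): Unit = {
    // If the Scala SDK is wired up correctly, this compiles and prints the Scala version in use.
    println("Hello from Scala " + util.Properties.versionString)
  }
}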

5. Test Spark in IntelliJ

Create a Scala project following the instructions in section 4. To use Spark, we need to add the Spark jar to the project. Click File -> Project Structure -> Libraries -> +, and add spark-assembly-1.5.1-hadoop2.6.0.jar from the lib folder of the Spark installation path.

[Figure 2]
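As an aside, not part of the original setup: if you would rather let a build tool manage the dependency than attach the assembly jar by hand, a minimal sbt build.sbt for the same versions might look like the sketch below (the project name is made up; the library coordinates are the standard Spark ones on Maven Central):

name := "spark-test"

scalaVersion := "2.10.6"

// spark-core 1.5.1 built for Scala 2.10 is enough for the SparkPi example
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1"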

Then we can run the Spark examples. Create a new Scala file and paste the code below into it.

// scalastyle:off println
package org.apache.spark.examples

import scala.math.random

import org.apache.spark._

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
// scalastyle:on println

Before running the code, we need to edit the run configuration; otherwise Spark will complain that no master URL is set.

[Figure 3]

In the VM options field, add the line below:

-Dspark.master=local
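Alternatively, as a variation that is not what the pasted example does, you can hard-code the master when building the SparkConf, so no VM option is needed for local runs:

val conf = new SparkConf()
  .setAppName("Spark Pi")
  .setMaster("local")   // run Spark in-process; remove this before submitting to a real cluster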

Then run it.

...
...
...
15/11/02 16:37:06 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/11/02 16:37:06 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:40) finished in 0.330 s
15/11/02 16:37:06 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:40, took 0.878039 s
Pi is roughly 3.1377
15/11/02 16:37:06 INFO SparkUI: Stopped Spark web UI at http://192.168.1.103:4041
15/11/02 16:37:06 INFO DAGScheduler: Stopping DAGScheduler
15/11/02 16:37:06 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/11/02 16:37:06 INFO MemoryStore: MemoryStore cleared
15/11/02 16:37:06 INFO BlockManager: BlockManager stopped
15/11/02 16:37:06 INFO BlockManagerMaster: BlockManagerMaster stopped
15/11/02 16:37:06 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/11/02 16:37:06 INFO SparkContext: Successfully stopped SparkContext
15/11/02 16:37:06 INFO ShutdownHookManager: Shutdown hook called
15/11/02 16:37:06 INFO ShutdownHookManager: Deleting directory /private/var/folders/kj/trxlbgrx6rjcv1rccwgxd5xm0000gn/T/spark-6a6b84a5-e3b3-4765-8c0d-6eb9275fc377

Process finished with exit code 0

You will see the results.

Now we have configured everything we need to develop Spark applications. Let's start programming!

Installing the JDK on Ubuntu 14.04 64-bit

Environment

Operating system: Ubuntu 14.04 64-bit

Download the JDK

Download the latest version of the JDK from the official website.

Unpack the JDK

Usually we create a new folder to hold the extracted JDK files.

sudo mkdir /usr/lib/jvm

Then we extract the downloaded JDK archive into the jvm folder:

sudo tar zxvf jdk-8u60-linux-x64.tar.gz -C /usr/lib/jvm

Next, we need to configure environment variables.

# open the user environment variable file (no sudo needed for your own ~/.bashrc)
gedit ~/.bashrc


# add the following at the end of ~/.bashrc
# (JAVA_HOME points at the directory extracted from the tarball)
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_60
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

# make the changes take effect
source ~/.bashrc

Then we can test it:

java -version

If it succeeds, the output will look like this:

java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

Mac + GitHub + Hexo: building a personal blog

1. Create a GitHub account

Create a GitHub account by following the guide 创建github技术博客全攻略 (a complete walkthrough of setting up a GitHub tech blog).

2. Install Hexo

Installing Hexo requires Node.js. Here we use Homebrew to install everything, so the installation order is Homebrew -> Node.js -> Git -> Hexo.

2.1 Install Homebrew

Homebrew is quite simple to install; just enter the following command in the shell:

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

2.2 Install Node.js

Install it with Homebrew in a single command:

brew install node

2.3 Install Git

Again, use Homebrew to install it:

brew install git


# configure Git
git config --global user.name "***"
git config --global user.email "***"

2.4 Configure SSH

ssh-keygen -t rsa -C "***@***.com"

Add the newly generated SSH key to GitHub:

Log in to GitHub -> Account Settings -> SSH Public Keys -> Add another key: paste the contents of the id_rsa.pub file into the input box

Test the connection:

ssh git@github.com

Test whether Node.js and Git were installed successfully:

git --version
# git version 2.6.2
node -v
# v4.2.1
npm -v
# 2.14.7

If the version numbers are printed, the installation succeeded.

2.5 Install Hexo

npm install -g hexo-cli

Next, create the directory for the personal blog. Go to your home directory and create the blog directory there or in one of its subdirectories. It is best not to create it under the system root directory, because permission management becomes troublesome.

~ user$ mkdir blog
~ user$ hexo init blog
~ user$ cd blog
# install the necessary plugins
~ user$ npm install

Inside the blog folder you will find many files: _config.yml is the configuration file for the blog pages, the source folder holds our .md page files, and the themes folder holds the blog themes; Markdown blog themes can be downloaded from the web.

2.5.1 Configure _config.yml

# My configuration file
# Hexo Configuration
## Docs: http://hexo.io/docs/configuration.html
## Source: https://github.com/hexojs/hexo/

# The fields below need to be changed; note that each value needs a space after the colon, otherwise Hexo will report an error
# Site
title: Ping Li's Blog
subtitle: Love the Life You Live
description: Life Record
author: Ping Li
language: zh-cn
timezone:

# URL
## If your site is put in a subdirectory, set url as 'http://yoursite.com/child' and root as '/child/'
url: http://yoursite.com
root: /
permalink: :year/:month/:day/:title/
permalink_defaults:

# Directory
source_dir: source
public_dir: public
tag_dir: tags
archive_dir: archives
category_dir: categories
code_dir: downloads/code
i18n_dir: :lang
skip_render:

# Writing
new_post_name: :title.md # File name of new posts
default_layout: post
titlecase: false # Transform title into titlecase
external_link: true # Open external links in new tab
filename_case: 0
render_drafts: false
post_asset_folder: false
relative_link: false
future: true
highlight:
  enable: true
  line_number: true
  auto_detect: true
  tab_replace:

# Category & Tag
default_category: uncategorized
category_map:
tag_map:

# Date / Time format
## Hexo uses Moment.js to parse and display date
## You can customize the date format as defined in
## http://momentjs.com/docs/#/displaying/format/
date_format: YYYY-MM-DD
time_format: HH:mm:ss

# Pagination
## Set per_page to 0 to disable pagination
per_page: 10
pagination_dir: page

# Used to change the blog theme
# Extensions
## Plugins: http://hexo.io/plugins/
## Themes: http://hexo.io/themes/
theme: yilia

# Configure how the blog is deployed: publish to GitHub
# Deployment
## Docs: http://hexo.io/docs/deployment.html
deploy:
  type: git
  repository: git@github.com:PingLinju/PingLinju.github.io.git
  branch: master

2.5.2 Switch themes

Many themes can be found online. Download one with Git, then change the theme field in _config.yml to the downloaded theme.

git clone https://github.com/wuchong/jacman themes/jacman

# change the theme field in _config.yml
theme: jacman

# after every change, sync the changes to GitHub
hexo g
hexo d

You can also edit the theme's own _config.yml file to configure the theme.

2.5.3 Write a post and publish it to GitHub

# this command generates a .md file in blog/source/_posts; that file is our blog post
hexo new "New Post"

Go to blog/source/_posts and edit the newly generated .md file with a Markdown editor; that is how we write the post. Once it is finished, publish it to GitHub:

npm install hexo-deployer-git --save
# deploy to GitHub
hexo g
hexo d

Once that is done, you can view the post by entering username.github.io in a browser.