No Maven, no sbt, just three libraries: setting up a Hadoop + Spark development environment (with a WordCount walkthrough)

[Original] Written by Helenykwang, 2018-01-13 18:10:18

Step 1. First, set up a working Hadoop + Spark environment and make sure the services are running normally. This article uses WordCount as the example.

No Maven, no sbt; just three libraries (Java, Scala, and the Spark jars).

Step 2. Create the source file, i.e. the input. The hello.txt file has the following contents:

tom jerry
henry jim
suse lusy
tom jerry
henry jim
suse lusy

Note: fields are separated by spaces.

I. Environment

Cluster: Spark 2.1.2 + Hadoop 2.3

Development machine OS: Windows 7

JDK 1.8.0_151

Downloading the JRE is sufficient:
http://www.oracle.com/technetwork/java/javase/downloads/index.html

Note: the JDK is platform-specific software, with different installers for Windows, Mac, and Unix. The JDK can be seen as a superset of the JRE: it contains the JRE plus the Java compiler, debugger, and core classes.

Scala 2.11.8
http://www.scala-lang.org/download/

IntelliJ IDEA 2017.3

Spark libraries: spark-2.1.2-bin-hadoop2.3 (the prebuilt binary package)

II. Environment setup

1. Basic configuration

Install Java and Scala, and set the environment variables JAVA_HOME and SCALA_HOME to the corresponding installation paths.

Append %JAVA_HOME%\jre\bin;%SCALA_HOME%\bin to PATH.

[Windows] %JAVA_HOME%

[Linux] $JAVA_HOME

Note: the Scala installation path must not contain spaces, or you will get the error:

>> Could not find or load main class scala.tools.nsc.MainGenericRunner

Verification

Open CMD and run the java and scala commands respectively.

Step 3. Then run the following commands:

  hadoop fs -mkdir -p /Hadoop/Input (create the directory on HDFS)

  hadoop fs -put hello.txt /Hadoop/Input (upload hello.txt to HDFS)

  hadoop fs -ls /Hadoop/Input (list the uploaded file)

  hadoop fs -text /Hadoop/Input/hello.txt (view the file contents)

2. Install and configure IntelliJ IDEA 2017.3

After initialization, add the Scala plugin in File -> Settings, then restart.

III. Development walkthrough

1. Create a new project

In fact a plain Java project is fine here; no need to make it complicated. Remember, the main thing is to add the dependency libraries properly: Java, Scala, and the Spark libraries.

The figure below shows adding the Java SDK and Scala SDK while creating the project.

Create the directories you need; my demo's directory tree is as follows:

Right-click the scala directory and choose Mark Directory as -> Sources Root.

2. Add the Spark library dependency

Go to File -> Project Structure, add the library, and after following the steps shown below click Apply -> OK.

The External Libraries node of the project tree should now show these three entries: java, spark, and scala. (Emphasis: all three are essential; everything else is optional.)

3. Write the program

Create a new Scala file SparkDemo.scala with the following code:

package demo

import org.apache.spark._

object SparkDemo {

  def main(args: Array[String]): Unit = {

    val masterUrl = "local[1]"

    // Spark configuration. Keeping setMaster(local) is recommended while debugging;
    // when running on a real cluster it can be overridden on the command line.
    val sparkConf = new SparkConf().setAppName("helenApp").setMaster(masterUrl)

    val sc = new SparkContext(sparkConf)

    // multiply each element of the list (1,2,3,4,5,6) by 3
    val rdd = sc.parallelize(List(1, 2, 3, 4, 5, 6)).map(_ * 3)

    // print the numbers greater than 10
    rdd.filter(_ > 10).collect().foreach(println)

    // print the sum
    println(rdd.reduce(_ + _))

    println("hello world")  // the one line no demo can do without!!! [serious face]

  }

}

At this point the Scala editor pane may show a prompt; just click Setup Scala SDK.

The sign that the dependencies were added successfully is that import org.apache.spark._ no longer reports an error.

Click the green triangle and run!

The console prints the expected output! (*^__^*)
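Since the article's running example is WordCount, here is a sketch (my addition, not from the original) that adapts the same skeleton to count the words in the hello.txt uploaded to HDFS earlier; the path is the one used elsewhere in this article:

package demo

import org.apache.spark._

// Sketch: the SparkDemo skeleton, repointed at the WordCount input on HDFS.
object WordCountDemo {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("wordCountDemo").setMaster("local[1]")
    val sc = new SparkContext(sparkConf)
    sc.textFile("hdfs://hacluster/Hadoop/Input/hello.txt")
      .flatMap(_.split(" "))   // split each line into words
      .map((_, 1))             // pair each word with a count of 1
      .reduceByKey(_ + _)      // sum the counts per word
      .collect()
      .foreach(println)
    sc.stop()
  }
}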

Step 4. First, test the WordCount job with spark-shell.

(1) Start spark-shell. Strictly speaking this must be run from Spark's bin directory, but here I have configured the environment variable.

4. Build the jar

Once again starting from our old friend File -> Project Structure, add a jar under Artifacts.

Note: the packaged jar does not need to include the Spark libraries, since the cluster already has the Spark code; just keep the following few files. Click Apply -> OK.

In the main window, choose Build -> Build Artifacts to start the build. When it finishes, a new out directory appears, containing the final jar.

Checking the main class, the MANIFEST.MF file contains:

Manifest-Version: 1.0

Main-Class: demo.SparkDemo

5. Run the jar on the cluster

Put the jar under /home/hadoop, then:

>> spark-submit --class demo.SparkDemo --master spark://:7077 project_name.jar

Explanation: --class <main class name>; the final argument is our jar. --master specifies the cluster master. In between you can also set custom Spark configuration parameters, for example:

      --num-executors 100 \

      --executor-memory 6G \

      --executor-cores 4 \

      --driver-memory 1G \

      --conf spark.default.parallelism=1000 \

      --conf spark.storage.memoryFraction=0.5 \

      --conf spark.shuffle.memoryFraction=0.3 \

It runs successfully on the cluster!

[Figure 1]

[Figure 2]

Summary

I have actually looked at plenty of Maven-project and Scala-project build recipes, and they are all essentially the same. The crux is simply getting the Java, Spark, and Scala dependencies right; that is the core issue.

Pay attention to version consistency, including:

· The development machine and the cluster use the same Spark version.

· The Scala plugin in the IDE, the system-installed Scala, and the Scala on the cluster are the same version.

· The Scala and Spark versions are compatible. (Spark 2.x differs from 1.x in many ways; use Scala 2.11+.)
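A quick way to catch mismatches (my suggestion, not from the original) is to print the versions the program actually sees and compare them with the cluster's:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: print the Scala and Spark versions visible at runtime.
object VersionCheck {
  def main(args: Array[String]): Unit = {
    println(scala.util.Properties.versionString) // e.g. "version 2.11.8"
    val sc = new SparkContext(new SparkConf().setAppName("versionCheck").setMaster("local[1]"))
    println(sc.version) // e.g. "2.1.2"
    sc.stop()
  }
}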

================================================

Pitfalls I have stepped in ~ (๑ŐдŐ)b. Friends are welcome to share the problems they run into with me (*^__^*) ~

Error roundup

Symptom:

Running scala from the command line fails with: could not find or load main class scala.tools.nsc.MainGenericRunner.

Cause:

The Scala installation directory contains a space.

Fix:

Move Scala to a folder whose path contains no spaces and reset SCALA_HOME. Problem solved.

Symptom:

While writing the program, calling rdd.saveAsTextFile throws a NullPointerException.

Cause:

This is related to the Hadoop file-output configuration (on Windows); downloading a small patch and configuring it fixes it.

Fix:

1) Download the file winutils.exe.

2) Put the file in some directory, for example D:\hadoop\bin\.

3) Declare at the very beginning of the Scala program: System.setProperty("hadoop.home.dir", "D:\\hadoop\\")
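For context, here is a minimal runnable sketch of steps 2) and 3) together (my illustration, not from the original; it assumes winutils.exe really is under D:\hadoop\bin\):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: hadoop.home.dir must be set before any Spark/Hadoop classes initialize,
// assuming winutils.exe was placed under D:\hadoop\bin\ as in step 2).
object SaveDemo {
  def main(args: Array[String]): Unit = {
    System.setProperty("hadoop.home.dir", "D:\\hadoop\\")
    val sc = new SparkContext(
      new SparkConf().setAppName("saveDemo").setMaster("local[1]"))
    sc.parallelize(Seq("tom jerry", "suse lusy")).saveAsTextFile("out") // no NullPointerException now
    sc.stop()
  }
}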

Symptom:

When Maven initializes a project, the connection to https://repo.maven.apache.org/maven2 times out and fails.

Cause:

A problem with the PC's own network: the external network is unreachable.

Fix:

Use Maven's offline mode and manually import the dependencies into <user directory>\.m2\repository.

Q: When do you need Maven offline mode?

A: When there is no network access and only a local repository, yet the project is managed with Maven: while compiling or downloading third-party jars, Maven keeps trying to download automatically from the central repository, and things break.

1) Set Work offline globally, as shown in the figure below.

2) Edit <user directory>\.m2\settings.xml and add one line: <offline>true</offline>

(2) Back in the spark-shell from Step 4, type the Scala statements directly:

  val file = sc.textFile("hdfs://hacluster/Hadoop/Input/hello.txt")

  val rdd = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)

  rdd.collect()

  rdd.foreach(println)

[Figure 3]

[Figure 4]

OK, the test passes.
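As a side note (my illustration, not from the original): reduceByKey merges all the values of one key pairwise with the given function. The same dataflow can be modeled with plain Scala collections, no Spark involved:

// Plain-Scala model of the WordCount dataflow above (illustration only).
val lines = Seq("tom jerry", "tom lusy")
val pairs = lines.flatMap(_.split(" ")).map((_, 1)) // List((tom,1), (jerry,1), (tom,1), (lusy,1))
val counts = pairs.groupBy(_._1).mapValues(_.map(_._2).sum)
println(counts) // e.g. Map(tom -> 2, jerry -> 1, lusy -> 1)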

Step 5. WordCount in Scala.

package com.example.spark

/**
 * User: hadoop
 * Date: 2017/8/17 0010
 * Time: 10:20
 */
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

/**
 * Count how many times each word occurs.
 */
object ScalaWordCount {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: <file>")
      System.exit(1)
    }

    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val line = sc.textFile(args(0))

    line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).collect().foreach(println)

    sc.stop()
  }
}

Step 6. WordCount in Java.

package com.example.spark;

import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public final class WordCount {
    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("Usage: JavaWordCount <file>");
            System.exit(1);
        }
        SparkConf conf = new SparkConf().setAppName("JavaWordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> lines = sc.textFile(args[0],1);
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {

            private static final long serialVersionUID = 1L;

            // In Spark 2.x, FlatMapFunction.call returns an Iterator (Spark 1.x returned Iterable)
            @Override
            public Iterator<String> call(String s) {
                return Arrays.asList(SPACE.split(s)).iterator();
            }
        });

        JavaPairRDD<String, Integer> ones = words.mapToPair(new PairFunction<String, String, Integer>() {

            private static final long serialVersionUID = 1L;

            @Override
            public Tuple2<String, Integer> call(String s) {
                return new Tuple2<String, Integer>(s, 1);
            }
        });

        JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {

            private static final long serialVersionUID = 1L;

            @Override
            public Integer call(Integer i1, Integer i2) {
                return i1 + i2;
            }
        });

        List<Tuple2<String, Integer>> output = counts.collect();
        for (Tuple2<?, ?> tuple : output) {
            System.out.println(tuple._1() + ": " + tuple._2());
        }

        sc.stop();
    }
}

Step 7. Packaging in IDEA.

(1) File -> Project Structure

[Figure 5]

[Figure 6]

[Figure 7]

[Figure 8]

[Figure 9]

[Figure 10]

Click OK. With the configuration complete, choose Build -> Build Artifacts… from the menu bar and run the Build action to package. When packaging finishes, the status bar shows the message "Compilation completed successfully…"; then look in the jar output path for the jar, as shown below.

[Figure 11]

[Figure 12]

Upload this wordcount.jar to a node of the cluster: scp wordcount.jar root@10.57.22.244:/opt/, entering the VM's root password.

Step 8. Run the jar.

This article runs the jar in spark-on-YARN mode.

Run the Java WordCount with:

spark-submit --master yarn-client --class com.example.spark.WordCount --executor-memory 1G --total-executor-cores 2 /opt/wordcount.jar hdfs://hacluster/aa/hello.txt

Run the Scala WordCount with:

spark-submit --master yarn-client --class com.example.spark.ScalaWordCount --executor-memory 1G --total-executor-cores 2 /opt/wordcount.jar hdfs://hacluster/aa/hello.txt

This article uses the Java WordCount as the demonstration, as shown below:

[Figure 13]

[Figure 14]

The above submits the job directly with spark-submit; next is a way to submit it from a Java web application.

Step 9. Submitting a job to Spark from a Java web app.

Build the Java web skeleton with Spring Boot; the implementation is as follows:

  1) Create a new Maven project named spark-submit.

  2) The pom.xml contents. Note here that the Spark dependency jars must correspond to the Scala version: in spark-core_2.11, for example, the trailing 2.11 is the version of the Scala you installed. (An sbt comparison is sketched after the pom below.)

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.4.1.RELEASE</version>
    </parent>
    <groupId>wordcount</groupId>
    <artifactId>spark-submit</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <start-class>com.example.spark.SparkSubmitApplication</start-class>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <java.version>1.8</java.version>
        <commons.version>3.4</commons.version>
        <org.apache.spark-version>2.1.0</org.apache.spark-version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>${commons.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.tomcat.embed</groupId>
            <artifactId>tomcat-embed-jasper</artifactId>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-redis</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.jayway.jsonpath</groupId>
            <artifactId>json-path</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
            <exclusions>
                <exclusion>
                    <artifactId>spring-boot-starter-tomcat</artifactId>
                    <groupId>org.springframework.boot</groupId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-jetty</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>org.eclipse.jetty.websocket</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-jetty</artifactId>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>javax.servlet</groupId>
            <artifactId>jstl</artifactId>
        </dependency>
        <dependency>
            <groupId>org.eclipse.jetty</groupId>
            <artifactId>apache-jsp</artifactId>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-solr</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>javax.servlet</groupId>
            <artifactId>jstl</artifactId>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${org.apache.spark-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${org.apache.spark-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>${org.apache.spark-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>${org.apache.spark-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka_2.11</artifactId>
            <version>1.6.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-graphx_2.11</artifactId>
            <version>${org.apache.spark-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.0.0</version>
        </dependency>

        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.6.5</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.6.5</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-annotations</artifactId>
            <version>2.6.5</version>
        </dependency>

    </dependencies>
    <packaging>war</packaging>

    <repositories>
        <repository>
            <id>spring-snapshots</id>
            <name>Spring Snapshots</name>
            <url>https://repo.spring.io/snapshot</url>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>spring-milestones</id>
            <name>Spring Milestones</name>
            <url>https://repo.spring.io/milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>maven2</id>
            <url>http://repo1.maven.org/maven2/</url>
        </repository>
    </repositories>
    <pluginRepositories>
        <pluginRepository>
            <id>spring-snapshots</id>
            <name>Spring Snapshots</name>
            <url>https://repo.spring.io/snapshot</url>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </pluginRepository>
        <pluginRepository>
            <id>spring-milestones</id>
            <name>Spring Milestones</name>
            <url>https://repo.spring.io/milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </pluginRepository>
    </pluginRepositories>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-war-plugin</artifactId>
                <configuration>
                    <warSourceDirectory>src/main/webapp</warSourceDirectory>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.mortbay.jetty</groupId>
                <artifactId>jetty-maven-plugin</artifactId>
                <configuration>
                    <systemProperties>
                        <systemProperty>
                            <name>spring.profiles.active</name>
                            <value>development</value>
                        </systemProperty>
                        <systemProperty>
                            <name>org.eclipse.jetty.server.Request.maxFormContentSize</name>
                            <!-- -1 means no limit -->
                            <value>600000</value>
                        </systemProperty>
                    </systemProperties>
                    <useTestClasspath>true</useTestClasspath>
                    <webAppConfig>
                        <contextPath>/</contextPath>
                    </webAppConfig>
                    <connectors>
                        <connector implementation="org.eclipse.jetty.server.nio.SelectChannelConnector">
                            <port>7080</port>
                        </connector>
                    </connectors>
                </configuration>
            </plugin>
        </plugins>

    </build>

</project>
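As an aside (my note, not from the original article): sbt handles the same artifact/Scala matching with the %% operator, which appends the Scala binary version to the artifact name automatically. A minimal build.sbt sketch assuming the same versions as the pom above:

// build.sbt sketch: %% expands "spark-core" to "spark-core_2.11"
// from scalaVersion, keeping the Spark artifacts and Scala version in step.
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0",
  "org.apache.spark" %% "spark-sql"  % "2.1.0"
)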

  3) SubmitJobToSpark.java

package com.example.spark;

import org.apache.spark.deploy.SparkSubmit;

/**
 * @author kevin
 *
 */
public class SubmitJobToSpark {

    public static void submitJob() {
        String[] args = new String[] { "--master", "yarn-client", "--name", "test java submit job to spark", "--class", "com.example.spark.WordCount", "/opt/wordcount.jar", "hdfs://hacluster/aa/hello.txt" };
        SparkSubmit.main(args);
    }
}

  4) SparkController.java

package com.example.spark.web.controller;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

import com.example.spark.SubmitJobToSpark;

@Controller
@RequestMapping("spark")
public class SparkController {
    private Logger logger = LoggerFactory.getLogger(SparkController.class);

    @RequestMapping(value = "sparkSubmit", method = { RequestMethod.GET, RequestMethod.POST })
    @ResponseBody
    public String sparkSubmit(HttpServletRequest request, HttpServletResponse response) {
        logger.info("start submit spark tast...");
        SubmitJobToSpark.submitJob();
        return "hello";
    }

}

  5) Package the spark-submit project as a war, deploy it to Tomcat on the Master node, and access the following request:

  http://10.57.22.244:9090/spark/sparkSubmit

  The word-count results can be seen in Tomcat's log.

 

 
