I have started learning Spark to process some log files; here is a simple example.

For instructions on building Spark, see http://spark.apache.org/docs/latest/building-spark.html

Scala file:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("wordCount")
    val sc = new SparkContext(conf)
    // Load the input file as an RDD of lines
    val input = sc.textFile("/home/nickyang/develop/spark/spark-1.6.1/README.md")
    // Split each line into words
    val words = input.flatMap(line => line.split(" "))
    // Pair each word with 1, then sum the counts per word
    val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
    // Write the (word, count) pairs to the "result" directory
    counts.saveAsTextFile("result")
    sc.stop()
  }
}
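If you want to see what each transformation produces before packaging a jar, the same pipeline can be tried in spark-shell, where sc is already defined. This is just a sketch on a tiny made-up dataset:

val lines = sc.parallelize(Seq("to be or", "not to be"))   // made-up sample data
val words = lines.flatMap(line => line.split(" "))         // "to", "be", "or", "not", "to", "be"
val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
counts.collect().foreach(println)                          // e.g. (to,2), (be,2), (or,1), (not,1); order may vary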

sbt file (use sbt to build this example):

name := "SampleApp"

version := "0.0.1"

scalaVersion := "2.10.5"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"
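The "provided" scope keeps spark-core out of the packaged jar, because spark-submit supplies Spark's classes at runtime; ideally the version here should match the Spark installation you submit to.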
Build the jar with sbt and submit it to Spark:

sbt package
YOUR_SPARK_HOME/bin/spark-submit --class "SimpleApp" --master local[1] target/scala-2.10/sampleapp_2.10-0.0.1.jar
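Here --master local[1] runs the job locally with a single worker thread; on a real cluster you would pass the cluster's master URL instead. Also note that sbt writes the jar under target/scala-2.10 because the build uses Scala 2.10.5.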

The result is written to the result directory as two files: _SUCCESS, an empty marker file that signals the job completed successfully, and part-00000, which holds the words and their counts. Each line of part-00000 is a (word, count) pair, for example:

(Because, 1)
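To sanity-check the output, the saved pairs can be loaded back and the most frequent words printed. This is an optional sketch; the parsing assumes each saved line has the "(word, count)" form shown above:

val saved = sc.textFile("result")
val parsed = saved.map { line =>
  // Strip the surrounding parentheses and split at the last comma
  val body = line.stripPrefix("(").stripSuffix(")")
  val comma = body.lastIndexOf(',')
  (body.take(comma), body.drop(comma + 1).trim.toInt)
}
// Print the ten highest counts
parsed.top(10)(Ordering.by(_._2)).foreach(println)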

BTW, this article was written on Ubuntu, which has no Chinese input method installed (hence the English version).