I have started learning Spark to process some log files; here is a simple word-count example. For instructions on building Spark, see http://spark.apache.org/docs/latest/building-spark.html

Scala file:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("wordCount")
    val sc = new SparkContext(conf)
    // Read the input file as an RDD of lines
    val input = sc.textFile("/home/nickyang/develop/spark/spark-1.6.1/README.md")
    // Split each line into words
    val words = input.flatMap(line => line.split(" "))
    // Pair each word with 1, then sum the counts for each word
    val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
    counts.saveAsTextFile("/home/nickyang/develop/spark/spark-1.6.1/examples/wordCount/result")
    sc.stop()
  }
}
```

sbt file (use sbt to build this example):

```scala
name := "SampleApp"

version := "0.0.1"

scalaVersion := "2.10.5"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"
```

Build the jar and submit it (since scalaVersion is 2.10.5, sbt puts the jar under target/scala-2.10):

```
sbt package
YOUR_SPARK_HOME/bin/spark-submit --class "SimpleApp" --master local[1] target/scala-2.10/sampleapp_2.10-0.0.1.jar
```

The result directory contains two files: `_SUCCESS`, an empty marker file that tells us the job completed successfully, and `part-00000`, which contains the words and their counts.
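To see what each transformation produces without packaging a jar, the same pipeline can be run in spark-shell, where a SparkContext is already available as `sc`. Here is a minimal sketch on a small in-memory sample (the sample lines below are my own, not taken from README.md):

```scala
val sample = sc.parallelize(Seq("to be or not to be", "to do"))
val words  = sample.flatMap(line => line.split(" "))  // one element per word
val pairs  = words.map(word => (word, 1))             // (word, 1) for every occurrence
val counts = pairs.reduceByKey(_ + _)                 // sum the 1s per distinct word
counts.collect().foreach(println)
// prints (to,3), (be,2), (or,1), (not,1), (do,1) in some order
```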

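You can also read the saved result back as an RDD to check it; each line of `part-00000` is a (word, count) tuple rendered as text:

```scala
// Read back the output directory written by saveAsTextFile above
val result = sc.textFile("/home/nickyang/develop/spark/spark-1.6.1/examples/wordCount/result")
result.take(10).foreach(println)
```

One thing to keep in mind: saveAsTextFile fails if the output directory already exists, so delete the result directory before re-running the job.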