Apache Spark Setup

2 years ago

清, 宇

1 minute

Steps for setting up Apache Spark on macOS with Homebrew.

Installation

brew install apache-spark

That completes the setup.

Starting the REPL

The REPL lets you try things out interactively.

spark-shell

Playing around

Try reading a text file, such as a README.

scala> val textFile = spark.read.textFile("README.md")
textFile: org.apache.spark.sql.Dataset[String] = [value: string]

scala> textFile.first
res0: String = # 俺のアパッチスパーク

scala> textFile.count
res1: Long = 167
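
Once the file is loaded as a Dataset[String], the usual next step is to filter and aggregate it. The following is a minimal sketch along the lines of the official quick start, not part of the original session: it assumes a README.md in the current working directory, and the names sparkLines, sparkLineCount, and longestLine are made up for illustration. Inside spark-shell, getOrCreate() simply returns the session the shell already started.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell, getOrCreate() returns the session the shell already created.
// The app name and local master are assumptions for running this standalone.
val spark = SparkSession.builder.appName("readme-example").master("local[*]").getOrCreate()
import spark.implicits._

// Assumption: README.md exists in the current working directory.
val textFile = spark.read.textFile("README.md")

// Keep only the lines that mention "Spark" and count them.
val sparkLines = textFile.filter(line => line.contains("Spark"))
val sparkLineCount = sparkLines.count()

// Length in characters of the longest line in the file.
val longestLine = textFile.map(line => line.length).reduce((a, b) => math.max(a, b))

println(s"lines mentioning Spark: $sparkLineCount, longest line: $longestLine characters")
```

Note that filter and map are lazy; nothing is read until count or reduce runs.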

Next, try creating a simple DataFrame.

scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName(this.getClass.getName).getOrCreate()
import spark.implicits._

val people = Seq(
  ("Spark太郎", "spark-taro@example.com", "2021-01-01", 11),
  ("Spark二郎", "spark-jiro@example.com", "2021-02-02", 12),
  ("Spark三郎", "spark-saburo@example.com", "2021-03-03", 13)
).toDF("name", "email", "birthday", "age")

// Exiting paste mode, now interpreting.
...

scala> people.printSchema()
root
 |-- name: string (nullable = true)
 |-- email: string (nullable = true)
 |-- birthday: string (nullable = true)
 |-- age: integer (nullable = false)

scala> people.show(3)
+---------+--------------------+----------+---+
|     name|               email|  birthday|age|
+---------+--------------------+----------+---+
|Spark太郎|spark-taro@exampl...|2021-01-01| 11|
|Spark二郎|spark-jiro@exampl...|2021-02-02| 12|
|Spark三郎|spark-saburo@exam...|2021-03-03| 13|
+---------+--------------------+----------+---+
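
The schema above shows birthday as a plain string. As a rough sketch of where to go next (not something from the original session), the snippet below rebuilds the same people DataFrame, parses birthday into a date column with to_date, and filters by age. The names typed and older are hypothetical, and the "yyyy-MM-dd" pattern is an assumption based on the sample rows.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}

// In spark-shell, getOrCreate() returns the session the shell already created.
val spark = SparkSession.builder.appName("people-example").master("local[*]").getOrCreate()
import spark.implicits._

// Same sample rows as in the session above.
val people = Seq(
  ("Spark太郎", "spark-taro@example.com", "2021-01-01", 11),
  ("Spark二郎", "spark-jiro@example.com", "2021-02-02", 12),
  ("Spark三郎", "spark-saburo@example.com", "2021-03-03", 13)
).toDF("name", "email", "birthday", "age")

// Replace the string column with a parsed date column.
val typed = people.withColumn("birthday", to_date(col("birthday"), "yyyy-MM-dd"))

// Keep people aged 12 or older and project two columns.
val older = typed.filter(col("age") >= 12).select("name", "birthday")

typed.printSchema()
older.show()
```

With the three sample rows, older should contain the two people aged 12 and 13.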

From here, try out whatever else you like.
That's all.