Spark Operators: RDD Key-Value Transformations (4): cogroup/join


cogroup

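Function signatures: these methods are available on pair RDDs of type RDD[(K, V)]. cogroup can group this RDD with up to three other RDDs (four in total). The output partitioning can be set with either a Partitioner or numPartitions.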

def cogroup[W1, W2, W3](other1: RDD[(K, W1)], other2: RDD[(K, W2)], other3: RDD[(K, W3)], partitioner: Partitioner): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2], Iterable[W3]))]
def cogroup[W1, W2, W3](other1: RDD[(K, W1)], other2: RDD[(K, W2)], other3: RDD[(K, W3)], numPartitions: Int): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2], Iterable[W3]))]
def cogroup[W1, W2, W3](other1: RDD[(K, W1)], other2: RDD[(K, W2)], other3: RDD[(K, W3)]): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2], Iterable[W3]))]
def cogroup[W1, W2](other1: RDD[(K, W1)], other2: RDD[(K, W2)], partitioner: Partitioner): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2]))]
def cogroup[W1, W2](other1: RDD[(K, W1)], other2: RDD[(K, W2)], numPartitions: Int): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2]))]
def cogroup[W1, W2](other1: RDD[(K, W1)], other2: RDD[(K, W2)]): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2]))]
def cogroup[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (Iterable[V], Iterable[W]))]
def cogroup[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (Iterable[V], Iterable[W]))]
def cogroup[W](other: RDD[(K, W)]): RDD[(K, (Iterable[V], Iterable[W]))]
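The numPartitions overloads are shorthand. Based on Spark's source as I understand it, they wrap the number in a HashPartitioner(numPartitions), which sends a key to partition key.hashCode modulo numPartitions, made non-negative, with null keys going to partition 0. The plain-Scala sketch below imitates that placement rule under that assumption. partitionFor is a hypothetical helper and not part of Spark's API. In real code you would pass org.apache.spark.HashPartitioner (or another Partitioner) to the partitioner overloads.

```scala
object HashPartitionSketch {
  // Hypothetical helper imitating hash partitioning:
  // key.hashCode modulo numPartitions, shifted into [0, numPartitions).
  def partitionFor(key: Any, numPartitions: Int): Int =
    if (key == null) 0 // null keys go to partition 0
    else {
      val raw = key.hashCode % numPartitions // may be negative on the JVM
      if (raw < 0) raw + numPartitions else raw
    }

  def main(args: Array[String]): Unit = {
    // The integer keys from the cogroup example, spread over 2 partitions.
    for (k <- Seq(1, 2, 3))
      println(s"key $k -> partition ${partitionFor(k, 2)}")
  }
}
```

Equal keys always map to the same partition. That is why giving every input the same partitioner lets cogroup collect all values for a key from all inputs in one place.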

Input:

    val data1 = sc.parallelize(List((1, "1.101"), (2, "1.201"), (1, "1.102"), (2, "1.202"), (1, "1.103"), (2, "1.203")))
    val data2 = sc.parallelize(List((1, "2.101"), (2, "2.201"), (3, "2.301"), (1, "2.102"), (2, "2.202"), (3, "2.302")))
    val data3 = sc.parallelize(List((1, "3.101"), (2, "3.201"), (3, "3.303"), (1, "3.102"), (2, "3.202"), (3, "3.303")))
    val result = data1.cogroup(data2, data3)
    result.collect.foreach(println)

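Output. Key 3 does not occur in data1, so its first CompactBuffer is empty. The order of keys, and of values within each CompactBuffer, is not guaranteed: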

scala> result.collect.foreach(println)
(1,(CompactBuffer(1.101, 1.102, 1.103),CompactBuffer(2.102, 2.101),CompactBuffer(3.101, 3.102)))
(2,(CompactBuffer(1.201, 1.202, 1.203),CompactBuffer(2.202, 2.201),CompactBuffer(3.202, 3.201)))
(3,(CompactBuffer(),CompactBuffer(2.301, 2.302),CompactBuffer(3.303, 3.303)))
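To reproduce what cogroup computes without a cluster, the sketch below applies the same semantics to ordinary Scala Lists. It is a local illustration only, not Spark's distributed implementation, and cogroup3 and group are hypothetical helpers. Spark's CompactBuffer is an internal class, so this version uses plain Seqs and prints List(...) instead of CompactBuffer(...).

```scala
object CogroupSketch {
  // Collect the values of each key, keeping their input order.
  private def group[K, X](xs: Seq[(K, X)]): Map[K, Seq[X]] =
    xs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

  // Three-way cogroup on local collections: one entry per key found in ANY input,
  // with an empty Seq for each input that has no value for that key.
  def cogroup3[K, V, W1, W2](
      a: Seq[(K, V)],
      b: Seq[(K, W1)],
      c: Seq[(K, W2)]): Map[K, (Seq[V], Seq[W1], Seq[W2])] = {
    val ga = group(a)
    val gb = group(b)
    val gc = group(c)
    val keys = ga.keySet ++ gb.keySet ++ gc.keySet
    keys.map { k =>
      (k, (ga.getOrElse(k, Seq.empty[V]), gb.getOrElse(k, Seq.empty[W1]), gc.getOrElse(k, Seq.empty[W2])))
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Same pairs as data1, data2 and data3 in the Spark example, as plain Lists.
    val data1 = List((1, "1.101"), (2, "1.201"), (1, "1.102"), (2, "1.202"), (1, "1.103"), (2, "1.203"))
    val data2 = List((1, "2.101"), (2, "2.201"), (3, "2.301"), (1, "2.102"), (2, "2.202"), (3, "2.302"))
    val data3 = List((1, "3.101"), (2, "3.201"), (3, "3.303"), (1, "3.102"), (2, "3.202"), (3, "3.303"))
    // Sort by key only to make the printout stable; cogroup itself promises no order.
    cogroup3(data1, data2, data3).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Unlike the Spark run, this local version keeps values in input order, so 2.101 is printed before 2.102. Key 3 still gets an empty group for data1.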

join

Function signatures

def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))]

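join behaves like an inner join in SQL. It returns only the records whose key K appears in both RDDs, pairing every matching value on one side with every matching value on the other. join relates exactly two RDDs; to join more RDDs, chain several join calls.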

val rdd1 = sc.makeRDD(Array(("A","1"),("B","2"),("C","3")),2)
val rdd2 = sc.makeRDD(Array(("A","a"),("C","c"),("D","d")),2)
scala> rdd1.join(rdd2).collect
res10: Array[(String, (String, String))] = Array((A,(1,a)), (C,(3,c)))
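You can check the inner-join behavior and the chaining advice locally, without a cluster. The sketch below expresses join on plain Lists as a two-way cogroup followed by the per-key cross product of both sides' values. As far as I know, Spark's PairRDDFunctions.join is built along the same lines (cogroup, then flatMapValues over the cross product), but this is a simplified local illustration. cogroup, join and the third dataset data3 are hypothetical names used only here.

```scala
object JoinSketch {
  // Collect the values of each key, keeping their input order.
  private def group[K, X](xs: Seq[(K, X)]): Map[K, Seq[X]] =
    xs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

  // Two-way cogroup: every key from either side, empty Seq when a side lacks it.
  def cogroup[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (Seq[V], Seq[W]))] = {
    val gl = group(left)
    val gr = group(right)
    (gl.keySet ++ gr.keySet).toSeq.map { k =>
      (k, (gl.getOrElse(k, Seq.empty[V]), gr.getOrElse(k, Seq.empty[W])))
    }
  }

  // Inner join: per key, the cross product of both sides' values.
  // A key present on only one side yields an empty product and disappears.
  def join[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
    cogroup(left, right).flatMap { case (k, (vs, ws)) =>
      for (v <- vs; w <- ws) yield (k, (v, w))
    }

  def main(args: Array[String]): Unit = {
    val data1 = List(("A", "1"), ("B", "2"), ("C", "3"))
    val data2 = List(("A", "a"), ("C", "c"), ("D", "d"))
    // Hypothetical third dataset, only to show chaining.
    val data3 = List(("A", 100), ("C", 300))

    val joined = join(data1, data2)
    println(joined.sortBy(_._1))

    // Chaining for three inputs: the first result's (V, W) becomes the new value type.
    val chained: Seq[(String, ((String, String), Int))] = join(joined, data3)
    println(chained.sortBy(_._1))
  }
}
```

The first printed line holds the same two pairs as res10, (A,(1,a)) and (C,(3,c)). The chained result shows that each extra join nests the value tuple one level deeper.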
