Spark API Programming Hands-On 02: Using textFile, cache, and count in Cluster Mode

2024-02-01 12:18


Working with HDFS: first make sure HDFS is running:


Start the Spark cluster:


Run spark-shell against the Spark cluster:



Check the "LICENSE.txt" file that was uploaded to HDFS earlier:


Read the file with Spark:


Use count to get the number of lines in the file:


We can see that this count took 0.239708 s.
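The read-and-count steps above can be sketched in spark-shell as follows. This is a minimal sketch, not the author's exact commands: the HDFS path `/data/LICENSE.txt` and the names `lines` and `lineCount` are assumptions (the original screenshots are not available), while `sc` is the SparkContext that spark-shell creates automatically.

```scala
// Run inside spark-shell; `sc` (the SparkContext) is predefined there.
// ASSUMPTION: hypothetical path; use the location where LICENSE.txt
// actually lives on your HDFS.
val lines = sc.textFile("/data/LICENSE.txt")

// textFile is lazy: nothing is read until an action such as count() runs.
val lineCount = lines.count()
println(s"LICENSE.txt has $lineCount lines")
```

Spark logs how long each job took, which is where a figure such as the one quoted above would come from.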

Call cache on the RDD, then run count so that the cache is actually populated:


The result of running count:


This run took 0.21132 s.

Run count again:


This time it took only 0.029580 s, because the operation ran against the data that had already been cached.
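The cache-then-count sequence can be reproduced with a sketch like the one below. The path `/data/LICENSE.txt` and the `timed` helper are assumptions introduced for illustration (`timed` is not part of Spark); the timings you observe will differ from those quoted above.

```scala
// Run inside spark-shell; `sc` is predefined there.
// ASSUMPTION: hypothetical HDFS path, not shown in the article.
val lines = sc.textFile("/data/LICENSE.txt")

// Hypothetical helper (not a Spark API): prints wall-clock time of an action.
def timed[T](label: String)(body: => T): T = {
  val start = System.nanoTime()
  val result = body
  println(f"$label took ${(System.nanoTime() - start) / 1e9}%.6f s")
  result
}

lines.cache()                         // lazy: only marks the RDD for caching
timed("first count")(lines.count())   // reads from HDFS and fills the cache
timed("second count")(lines.count())  // served from the cached partitions
```

Because `cache()` is lazy, the first count after it still pays the cost of reading the file; only later actions benefit from the cached partitions.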

Next, run a word count on the RDD above:
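A word count over the lines of the file is commonly written as below. This is a sketch under assumptions: the path `/data/LICENSE.txt`, the names `lines` and `wordCounts`, and the choice to split on a single space are not confirmed by the article, whose screenshot is not available.

```scala
// Run inside spark-shell; `sc` is predefined there.
// ASSUMPTION: hypothetical HDFS path, not shown in the article.
val lines = sc.textFile("/data/LICENSE.txt")

val wordCounts = lines
  .flatMap(line => line.split(" "))  // ASSUMPTION: split on a single space
  .map(word => (word, 1))            // pair each token with a count of 1
  .reduceByKey(_ + _)                // sum the counts per token

// Peek at a few (word, count) pairs without collecting everything.
wordCounts.take(5).foreach(println)
```

Splitting on a single space leaves punctuation attached to words and yields empty-string tokens for blank lines and leading spaces, which would be consistent with entries such as `(NON-INFRINGEMENT,,1)` and `(,1299)` in the saved output.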



Save the result to HDFS with saveAsTextFile:
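A sketch of the save step, assuming the word-count RDD is built as a space-split count over a hypothetical input path `/data/LICENSE.txt`. The output directory `/data/resultLicenseWordCount` is the one that appears in the `hadoop fs -cat` commands in this article.

```scala
// Run inside spark-shell; `sc` is predefined there.
// ASSUMPTION: hypothetical input path and split rule, not shown in the article.
val wordCounts = sc.textFile("/data/LICENSE.txt")
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// Writes one part-NNNNN file per partition into the target directory.
// The job fails if the directory already exists.
wordCounts.saveAsTextFile("/data/resultLicenseWordCount")
```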


Check the result of the run in the web console:


View the contents of part-00000 and part-00001 from the command line:

[spark@S1PA222 ~]$ hadoop fs -cat /data/resultLicenseWordCount/part-00000
15/01/22 13:51:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(under,10)
(Unless,3)
(Contributions),1)
(offer,1)
(agree,1)
(BUSINESS,2)
(NON-INFRINGEMENT,,1)
(its,4)
(materials,2)
(event,1)
(intentionally,2)
(Grant,2)
(writing,1)
(include,3)
(responsibility,,1)
(have,2)
(MERCHANTABILITY,,1)
(Contribution,3)
(Massachusetts,1)
(express,2)
("Your"),1)
((i),1)
(However,,1)
(been,2)
(files;,1)
(This,1)
(stating,1)
(2-Clause,1)
(conditions.,1)
(non-exclusive,,2)
(appropriateness,1)
(marked,1)
(risks,1)
(any,28)
(IS",4)
(implementation,1)
(filed.,1)
(Sections,1)
(fee,1)
(losses),,1)
(out,1)
(contract,2)
(DISTRIBUTION,1)
(4.,1)
(file,6)
(documentation,,2)
(wherever,1)
(unless,1)
(below).,1)
(names,,1)
(verbal,,1)
(ANY,10)
(version,1)
(file.,2)
(are,10)
(no-charge,,2)
(2.,1)
(from,,1)
(reproduction,,3)
(2011-2014,,1)
(assume,1)
(licenses,1)
(DATA,,2)
(IS,2)
(recommend,1)
(prominent,1)
(revisions,,1)
("[]",1)
(FITNESS,3)
(otherwise,,3)
(distribution,,1)
(necessarily,1)
(Apache,5)
(grant,1)
(CONTRIBUTORS,4)
(as,15)
(irrevocable,2)
(inclusion,2)
(purpose,2)
(products,1)
(ARE,2)
(merely,1)
(File,1)
(Definitions.,1)
(form,10)
(IMPLIED,4)
(Warranty,1)
(Patent,1)
(incurred,1)
(8.,1)
(repository,1)
(contributors,1)
("printed,1)
(sell,,2)
(:,3)
(malfunction,,1)
(Version,2)
(origin,1)
(alongside,1)
(CRC,1)
(implied.,1)
(contract,,1)
(representatives,,1)
(warranty,1)
(offer,,1)
(org.apache.hadoop.util.bloom.*,1)
(KIND,,2)
(is,10)
(conspicuously,1)
(found,1)
(charge,1)
(make,,1)
(file,,1)
(associated,1)
(even,1)
(same,1)
((Don't,1)
(outstanding,1)
(link,1)
([name,1)
(Trademarks.,1)
(notice,2)
(endorse,1)
(shall,15)
(contact,1)
(Redistributions,4)
(using,1)
(class,1)
(name),1)
(behalf,5)
(form.,1)
(We,1)
(INTERRUPTION),2)
(responsible,1)
(annotations,,1)
(THIS,4)
(subject,1)
(acting,1)
(permitted,2)
(OUT,2)
(BASIS,,2)
(has,2)
(Accepting,1)
(defend,,1)
(University,1)
([yyyy],1)
((http://www.one-lab.org),1)
(EVENT,2)
(granting,1)
(portions,1)
(implied,,1)
(NOTICE,5)
(infringed,1)
(limitation,,1)
(names,2)
(electronic,,1)
(PURPOSE,2)
(licensable,1)
(section),1)
(conditions,14)
(EVEN,2)
(acts),1)
(law,3)
(licenses.,1)
(compression,1)
(readable,1)
(solely,1)
(configuration,1)
(information.,1)
(litigation,2)
(represent,,1)
(warranty,,1)
(shares,,1)
(supersede,1)
(governed,1)
(marks,,1)
(http://code.google.com/p/lz4/,1)
(modification,,2)
(fifty,1)
(sent,1)
(places:,1)
(means,2)
(identifying,1)
(this,22)
(Works",1)
(Louvain,1)
(prior,1)
(slicing-by-8,1)
(PROCUREMENT,2)
(changed,1)
(describing,1)
(only,4)
(contributory,1)
(normally,1)
(indirect,,2)
(WITHOUT,2)
(Works,12)
(documentation,3)
(agreement,1)
(otherwise,3)
("AS,4)
(damages,,1)
(patent,,1)
(APACHE,1)
(without,6)
("NOTICE",1)
(Limitation,1)
(SUBSTITUTE,2)
(Contribution(s),3)
(Subject,2)
(Submission,1)
(UCL,1)
(TITLE,,1)
(trademarks,,1)
((iii),1)
(2.0,1)
(Fast,1)
(exercise,1)
(accepting,2)
(example,1)
(distribution.,2)
(interfaces,1)
(conditions:,1)
(act,1)
(incorporated,2)
(provides,2)
(limited,4)
(LZ4,3)
(2008,2009,2010,1)
(can,2)
(contents,1)
(PURPOSE.,1)
(recipients,1)
("Contribution",1)
(failure,1)
(communication,3)
(commercial,1)
(works,1)
(language,1)
(permissions,3)
(WARRANTIES,4)
(media,1)
(reserved.,2)
(Works,,2)
(How,1)
(WARRANTIES,,2)
(controlled,1)
(Warranty.,1)
(2.0,,1)
((http://www.opensource.org/licenses/bsd-license.php),1)
(own,4)
(submit,1)
(SHALL,2)
(reasonable,1)
(reason,1)
(agreed,3)
(systems,1)
(patent,5)
(form,,4)
(Technology.,1)
(advised,1)
(systems,,1)
(classes:,1)
(HOWEVER,2)
(distribution,3)
(DAMAGES,2)
((c),2)
(src/main/native/src/org/apache/hadoop/util:,1)
(PROFITS;,2)
(perpetual,,2)
(applies,1)
(apply,2)
(subcomponents,2)
(modify,2)
(owner],1)
(one,1)
(modifying,1)
(counterclaim,1)
(January,1)
(discussing,1)
(CONTRACT,,2)
(with,16)
((C),1)
(infringement,,1)
(2004,1)
(lawsuit),1)
(specific,2)
(LZ,1)
(warranties,1)
(reproducing,1)
(promote,1)
(beneficial,1)
(ADVISED,2)
((a),1)
(other,9)
(date,1)
(met:,2)
(publicly,2)
(from,4)
(LIMITED,4)
(display,,1)
(MERCHANTABILITY,2)
(damages,3)
(SUBCOMPONENTS:,1)
(negligence),,1)
(remain,1)
(CONDITIONS,4)
(their,2)
(electronic,1)
(identification,1)
(determining,1)
(consistent,1)
(display,1)
(writing,,3)
(trade,1)
(third-party,2)
(,1299)
(description,1)
(REPRODUCTION,,1)
(attached,1)
(list,4)
(*,34)
(INDIRECT,,2)
(designated,1)
(Contribution.",1)
(complies,1)
(addendum,1)
(damages.,1)
(Yann,1)
(EXPRESS,2)
(License;,1)
(6.,1)
(GOODS,2)
(subsequently,1)
(included,2)
(replaced,1)
(notice,,5)
[spark@S1PA222 ~]$ hadoop fs -cat /data/resultLicenseWordCount/part-00001

15/01/22 13:52:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(For,6)
(reproduce,,1)
("Contributor",1)
((or,3)
(nothing,1)
(work.,1)
(content,1)
(HOLDERS,2)
(add,2)
(through,1)
(All,2)
(perform,,1)
(result,1)
(goodwill,,1)
(herein,1)
(direct,,1)
(used,1)
(To,1)
(harmless,1)
(9.,1)
(these,1)
(control,,1)
(INCIDENTAL,,2)
(indicated,1)
(part,4)
(alone,1)
(different,1)
(forms,,2)
(purposes,4)
(https://groups.google.com/forum/#!forum/lz4c,1)
(be,7)
(/**,2)
(carry,1)
(separable,1)
(including,5)
(contained,1)
(combination,1)
(calculation,1)
(license,7)
(FOR,6)
(thereof,,2)
(ARISING,2)
(constitutes,1)
(but,5)
(types.,1)
(stated,2)
(archives.,1)
(obligations,,1)
(5.,1)
(Works;,3)
(nor,1)
("Legal,1)
(Work,20)
(whole,,2)
(Copyright,5)
(at,3)
(copyright,,1)
(Redistribution,2)
(object,1)
(copy,3)
(indemnify,,1)
(asserted,1)
(HADOOP,1)
(attach,1)
("control",1)
(support,,1)
("Object",1)
(give,1)
(THEORY,2)
(may,10)
(except,2)
("Work",1)
(sublicense,,1)
(IF,2)
(granted,2)
(project,2)
(authorized,2)
(SPECIAL,,2)
(BY,2)
(retain,2)
(or,65)
(transfer,1)
(fields,1)
(Licensor,,1)
((b),1)
((ii),1)
(2005,,1)
(of,75)
(does,1)
(transformation,1)
((INCLUDING,2)
(DIRECT,,2)
(management,1)
(modified,1)
(Licensed,1)
(percent,1)
(Header,1)
(original,2)
(Contributor,,1)
(native,1)
((INCLUDING,,2)
(PARTICULAR,3)
(limitations,1)
(THE,10)
(INCLUDING,,2)
(power,,1)
(CAUSED,2)
(de,1)
(appropriate,1)
(against,,1)
(TORT,2)
("Source",1)
(each,4)
(1.,1)
(following,10)
(Liability.,2)
(acceptance,1)
("You",1)
(sole,1)
(from),1)
(See,1)
(tracking,1)
(for,19)
(cause,2)
(alleging,1)
(obtain,1)
(reproduce,3)
(source,,1)
(control,2)
(EXEMPLARY,,2)
(TERMS,2)
(terms,8)
(syntax,1)
(SERVICES;,2)
(made,,1)
(BUT,4)
(compiled,1)
(issue,1)
("submitted",1)
(OneLab,1)
(algorithm,1)
(was,1)
(While,1)
(entity,,1)
(do,3)
(PROVIDED,2)
(no,2)
(License,10)
(entity,3)
(Contributions.,2)
(mean,10)
(individual,3)
(Institute,1)
(computer,1)
(notices,9)
(Neither,1)
(Licensor,8)
(STRICT,2)
(made,1)
(authorship,,2)
(bind,1)
((the,1)
(indemnity,,1)
(distribute,3)
(You,24)
(grants,2)
(brackets,1)
(meet,1)
(for,,1)
(service,1)
(in,31)
(trademark,,1)
(boilerplate,1)
(WAY,2)
(LOSS,2)
(distributed,3)
(LIABILITY,,4)
(submitted,2)
(public,1)
(OF,19)
(managed,1)
(derived,2)
(Source,8)
(use,,4)
(name,2)
(definition,,2)
(that,25)
(src/main/native/src/org/apache/hadoop/io/compress/lz4/{lz4.h,lz4.c,lz4hc.h,lz4hc.c},,1)
(customary,1)
(BSD,1)
(thereof,1)
(claims,2)
(CONSEQUENTIAL,2)
(translation,1)
(format.,1)
(construed,1)
(DAMAGE.,2)
(applicable,3)
(binary,4)
(regarding,1)
(European,1)
(excluding,3)
(END,1)
((d),1)
(choose,1)
(NO,2)
(BE,2)
(direct,2)
(retain,,1)
(modifications,,3)
(forum,1)
(owner,4)
(USE,2)
(informational,1)
(The,3)
(legal,1)
((50%),1)
(document.,1)
(received,1)
(such,17)
(institute,1)
(distribute,,2)
(WHETHER,2)
(page",1)
((except,1)
(loss,1)
(common,1)
(additions,1)
(BSD-style,1)
(Appendix,1)
(Use,1)
(disclaimer,2)
(resulting,1)
(ON,2)
(hereby,2)
(License.,11)
(software,3)
(whom,1)
(along,1)
(lists,,1)
(required,4)
(OR,18)
(ownership,2)
(SOFTWARE,2)
(the,122)
(includes,1)
(obligations,1)
(import,,1)
(not,11)
(either,2)
(terminate,1)
(if,4)
(stoppage,,1)
(provided,9)
(submitted.,1)
(all,3)
(permission.,1)
("License");,1)
(written,2)
(generated,2)
(consequential,1)
(Derivative,17)
(AND,11)
(rights,3)
(http://www.apache.org/licenses/,1)
(terms.,1)
(Catholique,1)
(deliberate,1)
(entity.,2)
(Work,,4)
(special,,1)
(Additional,1)
(Legal,3)
(034819,1)
(least,1)
(text,4)
(on,11)
(editorial,1)
(redistributing,2)
("License",1)
(against,1)
(permission,1)
(9,1)
(separate,2)
(and/or,3)
(LICENSE,1)
(union,1)
((and,1)
(1,1)
(including,,1)
(Entity,3)
(negligent,1)
(LIABLE,2)
(IN,6)
(use,8)
(enclosed,2)
(contains,1)
(files,1)
(Entity",1)
(Work.,1)
(owner.,1)
(preferred,1)
(modifications,3)
(brackets!),1)
(available,1)
(code,5)
(http://www.apache.org/licenses/LICENSE-2.0,1)
(more,1)
(possibility,1)
(product,1)
(liable,1)
(SUCH,2)
(direction,1)
(must,8)
(making,1)
(Disclaimer,1)
(disclaimer.,2)
(Commission,1)
(OTHERWISE),2)
(Hadoop,1)
((an,1)
(APPENDIX:,1)
("Licensor",1)
(DISCLAIMED.,2)
("Derivative,1)
(elaborations,,1)
(incidental,,1)
(prepare,1)
(A,3)
(exercising,1)
(*/,3)
(which,2)
(pertain,2)
(explicitly,1)
(tort,1)
(3.,1)
(also,1)
(conversions,1)
(liability,2)
(whether,4)
(character,1)
(should,1)
(thereof.,1)
(of,,3)
(your,4)
(royalty-free,,2)
(entities,1)
(or,,1)
(NEGLIGENCE,2)
(author,1)
("Not,1)
(source,9)
(then,2)
((including,3)
(Redistribution.,1)
(attribution,4)
(by,21)
(TO,,4)
(defined,1)
(OWNER,2)
(If,2)
(an,6)
(/*,1)
(Collet.,1)
(improving,1)
(grossly,1)
(COPYRIGHT,4)
(above,,1)
(theory,,1)
(mailing,1)
(7.,1)
(Notwithstanding,1)
(code,,2)
(cross-claim,1)
(provide,1)
((such,1)
(arising,1)
(Object,4)
(In,1)
(-,7)
(those,3)
(work,,2)
(easier,1)
(based,1)
(medium,,1)
(within,8)
(worldwide,,2)
(authorship.,1)
(files.,1)
(inability,1)
(you,2)
(POSSIBILITY,2)
(cannot,1)
(copies,1)
(a,21)
(statement,1)
(above,4)
(state,1)
(work,5)
(by,,3)
(to,41)
(appear.,1)
(Your,9)
(where,1)
(liability.,1)
(governing,1)
(NOT,4)
(License,,6)
(hold,1)
(and,51)
(copyright,15)
(USE,,3)
(compliance,1)
(SOFTWARE,,2)
(comment,1)
(additional,4)
(executed,1)
(mechanical,1)
(Contributor,8)
[spark@S1PA222 ~]$
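
The way keys are divided between part-00000 and part-00001 is consistent with hash partitioning into two partitions by `key.hashCode`. The plain-Scala sketch below (no Spark required) checks a few keys taken from the listings above. `partitionFor` is a hypothetical helper written for this illustration, not a Spark API, and the assumption that the job used two hash partitions is inferred from the output rather than stated in the article.

```scala
// Hypothetical helper: non-negative modulo of the key's hashCode,
// the usual way a hash partitioner maps a key to a partition index.
def partitionFor(key: String, numPartitions: Int): Int = {
  val raw = key.hashCode % numPartitions
  if (raw < 0) raw + numPartitions else raw
}

// Keys copied from the two listings above ("" is the empty-string token).
val keys = Seq("its", "as", "", "For", "be", "the")
keys.foreach { k =>
  println(s"'$k' -> part-0000${partitionFor(k, 2)}")
}
```

With two partitions, "its", "as", and the empty string map to partition 0, while "For", "be", and "the" map to partition 1, matching the files in which those keys appear above.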

http://www.chinasem.cn/article/667254
