Ten Powerful Facts About Big Data

2023-10-09 08:50

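However you define it, big data has been controversial from the day it appeared, drawing both praise and condemnation. It means a great deal to many people, especially scientists and retailers. Yet the technology has also given rise to a host of privacy problems and security threats.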


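Is it a savior, a scam, or a bit of both? Either way, big data remains a hot topic among technology pundits, trend analysts, marketers, and security practitioners. In fact, it still has no universally accepted official definition. So what is it? Wikipedia's description is a good starting point: "any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications."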


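Managing data sets that are massive in volume, varied in form, and demanding in velocity (the classic 3V definition) is certainly a challenge, and that challenge keeps changing as the number of data-sharing devices grows exponentially. This hardware, collectively known as the Internet of Things (IoT), includes machine sensors and consumer-oriented devices such as connected thermostats, light bulbs, refrigerators, and wearable health monitors. IDC predicts that the IoT market will grow rapidly over the next few years, with the installed base rising from 9.1 billion units at the end of 2013 to 28.1 billion by 2020.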


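Organizations see actionable insights drawn from big data as a potential boon, not only because such insights can help sell more products and services, but also because they can improve healthcare management, stop the flow of counterfeit drugs, track terrorists, and even monitor the phone calls of specific targets. Big data itself is therefore neither good nor evil; what matters is how it is used.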



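Ironically, although big data has the potential to enhance the human experience, the valuable information it holds is often hard to collect, filter, analyze, and finally interpret. This article examines the challenges and opportunities of big data, and the facts and figures may surprise you. What can you expect? For one, the future looks bright for Hadoop, the leading big data platform. And data scientists and related big data specialists should be well paid for years to come.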


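Industry insiders have predicted that "big data" will fade away as a buzzword. "It is all just data, after all. Big data and all the predictions for this space will collapse into 'data management' by the analysts and all those following, including a lot of the 'big' vendors," wrote Hortonworks president Herb Cunitz in a December 2012 blog post.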


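Cunitz may have predicted the demise of "big data" prematurely, but his central point holds: in the end, it is all just data. Only the tools needed to manage it will change. Now follow along for a look at some revealing statistics and research on big data.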



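1. How much data is being ignored?

Most companies estimate that they analyze only about 12% of the data they hold, according to a recent Forrester Research survey. Is that good news or bad? The 88% they ignore may well contain information that would yield data-driven insights. On the other hand, they may be wisely avoiding the heavy resource cost of a "boil the ocean" strategy. According to Forrester, companies ignore most of their own data for two main reasons: first, a lack of analytics tools and "repressive" data silos; second, the difficulty of knowing which information is valuable and which is best left alone.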


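2. Big data jobs keep growing

The big data craze is a boon for workers with the right skills. According to Dice, a career site for technology and engineering professionals, demand for data experts is soaring. Job postings for NoSQL experts were up 54% year over year, and those for "big data talent" rose 46%, the site said in its April report. Impressive as those numbers are, they are small compared with postings for cyber-security specialists, which grew 162% year over year.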


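3. How big will big data get?

Over the next six years, the digital universe will grow from today's 3.2 ZB (zettabytes) to 40 ZB. (One ZB is roughly a billion TB.) "When we look at the data volumes coming at us, it's mind-blowing," said Hortonworks CEO Rob Bearden at Hadoop Summit 2014 in San Jose, California. "The data volume in the enterprise is going to grow 50x year-over-year between now and 2020. I think the most important thing to recognize is that 85% of that data is coming from net-new data sources." These new sources, including mobile, social media, and web- and machine-generated data, present both a major challenge and an opportunity for enterprises worldwide, Bearden noted.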
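As a rough check on the figures above, the growth from 3.2 ZB to 40 ZB over six years can be converted into an implied yearly growth factor. The sketch below is illustrative only: the helper name `implied_annual_growth` is hypothetical, and it assumes steady compound growth, which the source does not claim.

```python
def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Return the constant yearly factor that takes `start` to `end` in `years`.

    Assumes smooth compound growth (an assumption, not a claim from the article).
    """
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must all be positive")
    return (end / start) ** (1.0 / years)


# Decimal units: 1 ZB = 10**21 bytes, 1 TB = 10**12 bytes.
TB_PER_ZB = 10**21 // 10**12  # one billion terabytes per zettabyte

# Figures quoted in the article: 3.2 ZB today, 40 ZB six years later.
factor = implied_annual_growth(3.2, 40.0, 6)

print(f"terabytes per zettabyte: {TB_PER_ZB:,}")
print(f"implied growth factor per year: {factor:.3f}")
print(f"implied growth rate per year: {100 * (factor - 1):.0f}%")
```

Under that assumption the total grows by a factor of about 1.52 per year, or roughly 52% annually, which is far below a literal fiftyfold increase every year. Reading the quoted "50x" as something closer to 50% per year is an inference from this arithmetic, not something the source states.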


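4. Big data means big money

Big data jobs pay well. According to a data scientist salary report published by Burtch Works in April 2014, the 2014 mean base salary for a staff data scientist is $120,000 a year, and $160,000 for a manager. The estimates are based on an analysis of the Burtch Works employment database and on interviews with more than 170 data scientists. Pay is almost as good for the broader category of big data professionals, meaning those who "apply sophisticated quantitative skills to data describing transactions, interactions, or other behaviors of people to derive insights and prescribe actions." In this category the 2013 median base salary for staff was about $90,000 a year, and managers earned an enviable $145,000.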


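5. Are big data professionals ready for the IoT era?

Most IT professionals say they have not started preparing for the Internet of Things. In April, Spiceworks polled 440 IT professionals on how they view the IoT and how they are preparing for it. Sixty-two percent of respondents were in North America and 38% in EMEA (Europe, the Middle East, and Africa). More than half (59%) said they are not taking specific steps to handle the flood of data expected from sensors, cameras, and other IoT devices. However, the survey also found that many IT professionals are in fact preparing for the IoT by investing in infrastructure, security, applications, and analytics, and by expanding bandwidth.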


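6. Data scientists: still sexy, still in demand

An eye-catching article in the October 2012 Harvard Business Review called data science the "Sexiest Job of the 21st Century." That is debatable, but if "sexy" is taken as a synonym for "in demand," data scientists have lost none of their appeal. According to Modis, a global IT staffing services provider, data scientists remain in "high demand but short supply," which translates into generous six-figure salaries for some PhDs with relevant big data experience.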


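7. Tremble, data warehouse: Hadoop is coming for you

Should the data warehouse industry fear the rapid rise of Hadoop, or embrace it? That question was put to two Hadoop pioneers, Doug Cutting of Cloudera and Arun Murthy of Hortonworks, during a Q&A session at Hadoop Summit 2014. Although many enterprises are moving workloads from data warehouses to Hadoop, the practice has not yet become mainstream. Will that change? "If you've got a lot of people no longer increasing the size of their data warehouse, but rather capping the size or potentially even decreasing their investment because they find they can do much of the processing as effectively and much more affordably in a Hadoop-based system, I think that's a threat," said Cutting.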


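8. Privacy worries will not stop big data's advance

Concerns raised by a seemingly endless series of privacy and security breaches are unlikely to halt the progress of big data. The Economist reported in June that "there is scant evidence that concern about privacy is causing a fundamental change in the way data are used and stored." Gartner analyst Carsten Casper told the magazine that no "big privacy revolution" is brewing in the IT world. And while companies are asking more privacy-related questions, nine out of ten of those queries concern the location of data centers, Casper added.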

  

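9. Big data drives rapid software market growth

From 2013 to 2018, the compound annual growth rate (CAGR) of the worldwide software market will hover around 6%, research firm IDC predicts. But big data related categories, including collaborative applications, data access, analysis and delivery solutions, and structured data management software, will see a higher CAGR (around 9%) over that five-year period, IDC says.

Heightened interest in social media will help sustain this growth. "This is complementary to the increased attention to big data and analytics solutions, which help enterprises understand and act on anticipated customer behavior and new insights into product reliability and maintenance," IDC analyst Henry Morris said in a statement.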
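To make the gap between the two IDC growth rates concrete, the short sketch below compounds each rate over the five-year period. It is a minimal illustration: the helper name `compound` is hypothetical, and the 6% and 9% inputs are the approximate rates quoted above rather than exact IDC figures.

```python
def compound(rate: float, years: int) -> float:
    """Return the cumulative growth factor for a constant annual rate."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return (1.0 + rate) ** years


YEARS = 5  # 2013 to 2018

overall_market = compound(0.06, YEARS)       # software market as a whole, ~6% CAGR
big_data_segments = compound(0.09, YEARS)    # big data related categories, ~9% CAGR

print(f"overall software market after {YEARS} years: x{overall_market:.2f}")
print(f"big data related categories after {YEARS} years: x{big_data_segments:.2f}")
```

At these rates the overall market ends the period about 34% larger, while the big data related categories end it about 54% larger, so a three-point difference in CAGR widens noticeably over five years.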


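10. Almost everything will be connected

The Internet of Things will include many strange yet ingenious devices, many of them new to the world of big data. With that in mind, analysts at ABI Research predict that more than 30 billion devices will be wirelessly connected worldwide by 2020. Health-related data collection will play an important role in the IoT.

Here is one unusual example: Microsoft, together with researchers from the University of Rochester (New York) and the University of Southampton (UK), has designed a smart bra whose sensors monitor the wearer's heart and skin activity to estimate stress levels, the BBC reported. The bra collects data and sends it to a smartphone app, using wearable technology to track the user's stress and help her avoid stress-related overeating and keep healthy eating habits.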


【10 Powerful Facts About Big Data】

More than a buzzword
Big data, however you define it, has been praised and vilified. It's many things to many people: a boon to scientists and retailers, but also an enabling technology for a host of privacy and security threats.


Whether savior or scam -- or maybe even a mixture of the two -- big data remains a popular topic among pundits, prognosticators, marketers, and security buffs. Its unofficial definition is evolving as well. So what is it? Wikipedia's description is a good start: "any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications."


But the challenges of managing massive volumes of varied data sets arriving at high velocities -- the classic 3V's definition -- are changing as the number of data-sharing devices grows exponentially. This hardware, collectively known as the Internet of Things (IoT), includes machine sensors and consumer-oriented devices such as connected thermostats, light bulbs, refrigerators, and wearable health monitors. IDC predicts the IoT market will soar in the coming years -- from 9.1 billion installed units at the end of 2013 to 28.1 billion by 2020.


Organizations see a potential boon in actionable insights derived from big data, not only to sell more widgets and services, but also to better manage healthcare, stop the flow of counterfeit drugs, track terrorists, and maybe even track your phone calls. Hence it's a given that big data isn't inherently good or evil. It's how you use it that counts.


The irony of big data is that despite its potential to enhance the human experience, it's often difficult to collect, filter, analyze, and interpret to gain those cherished insights. This slideshow examines the challenges and capabilities of big data. The facts and figures may surprise you. What to expect? Well, the future appears bright for Hadoop, the leading big data platform. And data scientists and related big data gurus should be gainfully (and lucratively) employed for years to come.


Industry insiders have predicted the buzz term "big data" will fade away. "It is all just data, after all. Big data and all the predictions for this space will collapse into 'data management' by the analysts and all those following, including a lot of the 'big' vendors," wrote Hortonworks president Herb Cunitz in a December 2012 blog.


Cunitz may have prematurely predicted the demise of "big data," but he's spot on: It's all just data. Only the tools needed to manage it will change. Now dig into our slideshow and get a look at some revealing statistics and research.


Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.

How much data is ignored?
Most companies estimate they're analyzing a mere 12% of the data they have, according to a recent study by Forrester Research. Is this good or bad? Well, these firms might be missing out on data-driven insights hidden inside the 88% of data they're ignoring. Or perhaps they're wisely avoiding a resource-gobbling, boil-the-ocean strategy. A lack of analytics tools and "repressive" data silos are two reasons companies ignore a vast majority of their own data, says Forrester, as well as the simple fact that often it's hard to know which information is valuable and which is best left ignored.

Big data job growth
The big data craze is a boon for tech workers with a particular set of skills. According to Dice, a career site for tech and engineering professionals, demand is soaring for data mavens. Job postings for NoSQL experts were up 54% year over year, and those for "big data talent" rose 46%, the site reported in April. Similarly, postings for Hadoop and Python pros were up 43% and 16%, respectively. Impressive stats, certainly, but small potatoes compared with job postings for cyber-security specialists, which soared 162% year-over-year.

How big will big data get?
The digital universe will grow from 3.2 zettabytes today to 40 zettabytes in only six years. (One zettabyte is roughly a billion terabytes.) "When we look at the data volumes coming at us, it's mind-blowing," said Hortonworks CEO Rob Bearden in his keynote address at Hadoop Summit 2014 in San Jose, Calif. "The data volume in the enterprise is going to grow 50x year-over-year between now and 2020. I think the most important thing to recognize is that 85% of that data is coming from net-new data sources." And these sources, including mobile, social media, and web- and machine-generated data, present both a challenge and an opportunity for enterprises globally, Bearden noted.

Big data = big bucks
Big data jobs pay quite well. According to Salaries of Data Scientists, an April 2014 study from Burtch Works, the 2014 mean base salary for a staff data scientist is $120,000, and $160,000 for a manager. The estimates are based on interviews with more than 170 data scientists from a Burtch Works employment database. The pay scale is almost as good for the broader category of big data professionals, meaning those who "apply sophisticated quantitative skills to data describing transactions, interactions, or other behaviors of people to derive insights and prescribe actions." In this category the 2013 median base salary for staff is $90,000; for managers, it's a cool $145,000.

Are big data pros ready for the IoT?
Most IT pros say they haven't started preparing for the Internet of Things -- even if they have. Spiceworks polled 440 IT professionals in April 2014 to get their take on the IoT and how they're preparing for it. Sixty-two percent of respondents were in North America and 38% in EMEA (Europe, the Middle East, and Africa). More than half (59%) of respondents said they're not taking specific steps to address the expected data deluge from sensors, cameras, and numerous other IoT devices. However, the survey also found that many IT pros are, in fact, preparing for the IoT by investing in infrastructure, security, applications, and analytics, and by expanding bandwidth.

Data scientists: still sexy
The eye-grabbing headline of an October 2012 article in the Harvard Business Review called the data science profession the "Sexiest Job of the 21st Century." That's debatable, but if "sexy" is synonymous with "in demand," data scientists haven't lost any of their mojo. According to Modis, a global IT staffing services provider, data scientists remain in "high demand but short supply," which translates into generous six-figure salaries for some PhDs with relevant big data experience.

Be afraid, data warehouse: Hadoop's in town
Should the data warehouse industry fear the rise of Hadoop? Embrace it? That question was posed to two Hadoop pioneers -- Doug Cutting of Cloudera and Arun Murthy of Hortonworks -- during a Q&A at Hadoop Summit 2014. While many enterprises are moving workloads from data warehouses to Hadoop, that's not happening en masse. But will it? "If you've got a lot of people no longer increasing the size of their data warehouse, but rather capping the size or potentially even decreasing their investment because they find they can do much of the processing as effectively and much more affordably in a Hadoop-based system, I think that's a threat," said Cutting.

Privacy fears won't stop big data
The cacophony of concerns rising from a seemingly endless series of privacy and security breaches isn't likely to thwart big data's advancement. The Economist reports in its June 2014 issue that "there is scant evidence that concern about privacy is causing a fundamental change in the way data are used and stored." Gartner analyst Carsten Casper tells the magazine that no "big privacy revolution" is brewing in the IT world. And while companies are asking more privacy-related questions, nine of 10 of those queries have to do with the location of data centers, Casper adds.

Big data drives software growth
The compound annual growth rate (CAGR) for the 2013-2018 worldwide software market will hover near 6%, research firm IDC predicts. But big data related categories, including collaborative applications and data access, analysis and delivery solutions, and structured data management software, will show a higher CAGR (around 9%) over that five-year period, says IDC.


A heightened interest in social media will help drive this growth. "This is complementary to the increased attention to big data and analytics solutions, which help enterprises understand and act on anticipated customer behavior and new insights into product reliability and maintenance," said IDC analyst Henry Morris in a statement.

Almost everything will be connected
The Internet of Things will include many strange and wondrous devices, many of which are new to the world of big data. That's why analysts at ABI Research predict more than 30 billion devices will be wirelessly connected by 2020. Health-related data collection will play a large role in the IoT, of course.


Here's a unique example: Microsoft, in conjunction with researchers from the University of Rochester (New York) and University of Southampton (UK), have designed a bra with sensors that detects the wearer's stress level by monitoring heart and skin activity, the BBC reported. Designed to see if wearable tech can help control stress-related overeating, the bra collects and sends data to a smartphone app to help the user control eating habits.


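Original publication date: 2014-07-08

This article comes from "大数据文摘", a partner of 云栖社区 (the Yunqi Community). For more information, follow the "BigDataDigest" WeChat official account.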
