【6.824】分布式lab1 mapReduce

2023-11-11 12:01
文章标签 分布式 mapreduce 6.824 lab1

本文主要是介绍【6.824】分布式lab1 mapReduce,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

Paper main idea

请添加图片描述

Figure 1 shows the overall flow of a MapReduce operation in our implementation. When the user program calls the MapReduce function, the following sequence of actions occurs (the numbered labels in Figure 1 correspond to the numbers in the list below): 1. The MapReduce library in the user program first splits the input files into M pieces of typically 16 megabytes to 64 megabytes (MB) per piece (controllable by the user via an optional parameter). It then starts up many copies of the program on a cluster of machines. 2. One of the copies of the program is special – the master. The rest are workers that are assigned work by the master. There are M map tasks and R reduce tasks to assign. The master picks idle workers and assigns each one a map task or a reduce task. 3. A worker who is assigned a map task reads the contents of the corresponding input split. It parses key/value pairs out of the input data and passes each pair to the user-defined Map function. The intermediate key/value pairs produced by the Map function are buffered in memory4. Periodically, the buffered pairs are written to local disk, partitioned into R regions by the partitioning function. The locations of these buffered pairs on the local disk are passed back to the master, who is responsible for forwarding these locations to the reduce workers. 5. When a reduce worker is notified by the master about these locations, it uses remote procedure calls to read the buffered data from the local disks of the map workers. When a reduce worker has read all intermediate data, it sorts it by the intermediate keys so that all occurrences of the same key are grouped together. The sorting is needed because typically many different keys map to the same reduce task. If the amount of intermediate data is too large to fit in memory, an external sort is used. 6. The reduce worker iterates over the sorted intermediate data and for each unique intermediate key encountered, it passes the key and the corresponding set of intermediate values to the user's Reduce function. The output of the Reduce function is appended to a final output file for this reduce partition7. When all map tasks and reduce tasks have been completed, the master wakes up the user program. At this point, the MapReduce call in the user program returns back to the user code.

some ideas

Task Synchronous:

all map task should be earlier than reduce task!

some map task may wait seconds, so should first all map tasks done before reduce.

Lock:

workers run parallel, should add mutex to filelist when

Error Return:

workers may run some error, such as rpc sock connect and can not open files. When these errors happend, should return Error to master, other than just stop itself.

Heartbeat:

master should ask workers whether they are alive.

Implementation method:

  1. Set a new scheduled task and ask every 2 minutes.
  2. Every time a new worker requests a new task, add a judgment when rotating all tasks to assign tasks to the worker. If a task status is not Finish and times out, the task fails
  3. Write in the Done() method and continuously rotate to ensure that all tasks are completed. And take the opportunity to determine if there is a task timeout.

Performance comparison: 1>2>3

Judging by frequency. The execution frequency of 3 is too high, constantly executing and discussing the sequence.

status of task and worker

worker:

  • Finish: worker finish task X
  • Ready: worker ask for a new task
  • Wrong: worker meet some problems
  • Close: worker will be closed when master ask worker to close

task:

  • Run: some worker is running on this task
  • Done: this task is finished
  • (empty string): task not be allocated yet

lab test:

pass all the test!请添加图片描述

这篇关于【6.824】分布式lab1 mapReduce的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/389925

相关文章

集中式版本控制与分布式版本控制——Git 学习笔记01

什么是版本控制 如果你用 Microsoft Word 写过东西,那你八成会有这样的经历: 想删除一段文字,又怕将来这段文字有用,怎么办呢?有一个办法,先把当前文件“另存为”一个文件,然后继续改,改到某个程度,再“另存为”一个文件。就这样改着、存着……最后你的 Word 文档变成了这样: 过了几天,你想找回被删除的文字,但是已经记不清保存在哪个文件了,只能挨个去找。真麻烦,眼睛都花了。看

开源分布式数据库中间件

转自:https://www.csdn.net/article/2015-07-16/2825228 MyCat:开源分布式数据库中间件 为什么需要MyCat? 虽然云计算时代,传统数据库存在着先天性的弊端,但是NoSQL数据库又无法将其替代。如果传统数据易于扩展,可切分,就可以避免单机(单库)的性能缺陷。 MyCat的目标就是:低成本地将现有的单机数据库和应用平滑迁移到“云”端

laravel框架实现redis分布式集群原理

在app/config/database.php中配置如下: 'redis' => array('cluster' => true,'default' => array('host' => '172.21.107.247','port' => 6379,),'redis1' => array('host' => '172.21.107.248','port' => 6379,),) 其中cl

基于MySQL实现的分布式锁

概述 在单机时代,虽然不需要分布式锁,但也面临过类似的问题,只不过在单机的情况下,如果有多个线程要同时访问某个共享资源的时候,我们可以采用线程间加锁的机制,即当某个线程获取到这个资源后,就立即对这个资源进行加锁,当使用完资源之后,再解锁,其它线程就可以接着使用了。例如,在JAVA中,甚至专门提供了一些处理锁机制的一些API(synchronize/Lock等)。 但是到了分布式系统的时代,这种

Kafka 分布式消息系统详细介绍

Kafka 分布式消息系统 一、Kafka 概述1.1 Kafka 定义1.2 Kafka 设计目标1.3 Kafka 特点 二、Kafka 架构设计2.1 基本架构2.2 Topic 和 Partition2.3 消费者和消费者组2.4 Replica 副本 三、Kafka 分布式集群搭建3.1 下载解压3.1.1 上传解压 3.2 修改 Kafka 配置文件3.2.1 修改zookeep

Spring Cloud整合Seata实现分布式事务

文章目录 1.Seata1.1 官网1.2 下载1.3 通过安装包运行seata1.3.1 解压seata-server-1.3.0.zip1.3.2 修改 conf/file.conf 配置文件1.3.3 修改conf/registry.conf配置文件1.3.4 添加seata配置信息到nacos1.3.5 配置seata服务端数据库表结构1.3.6 启动seata 2.Spring

ELK+Spring Cloud搭建分布式日志中心

ELK+Spring Cloud搭建分布式日志中心 1.ELK简介2.资源包下载3.Elasticsearch安装3.1 解压Elasticsearch3.2 修改Elasticsearch的配置文件3.3 修改系统配置3.4 启动Elasticsearch 4.ElasticSearch-head插件安装5.Logstash安装6.Kibana安装7.SpringCloud集成logsta

Redis进阶(七):分布式锁

在分布式系统下,涉及到多个节点访问同一个公共资源的情况,此时需要通过 锁 进行互斥控制:避免出现 线程安全问题。 1.分布式锁的基本实现 超卖问题: 解决: 采用redis实现分布式锁 可用采取:在购票的时候,操作过程中需要先加锁。在redis上设置一个key - value,完成上述买票操作,再把key - value 删掉。如果发现key - value 存在,就加锁失败,无法进

聊聊分布式,再讨论分布式解决方案

前言 最近很久没有写博客了,一方面是因为公司事情最近比较忙,另外一方面是因为在进行 CAP 的下一阶段的开发工作,不过目前已经告一段落了。 接下来还是开始我们今天的话题,说说分布式事务,或者说是我眼中的分布式事务,因为每个人可能对其的理解都不一样。 分布式事务是企业集成中的一个技术难点,也是每一个分布式系统架构中都会涉及到的一个东西,特别是在微服务架构中,几乎可以说是无法避免,本文就分布式事

分布式 事务的几种实现方案

背景 四月初,去面试了本市的一家之前在做办公室无人货架的公司,虽然他们现在在面临着转型,但是对于我这种想从传统企业往互联网行业走的孩子来说,还是比较有吸引力的。 在面试过程中就提到了分布式事务问题。我又一次在没有好好整理的问题上吃了亏,记录一下,还是长记性 !!! 先看面试过程 面试官先是在纸上先画了这样一张图: 让我看这张图按照上面的流程走,有没有什么问题?面试官并没有直接说出来这里面