A Look at PowerJob Server High Availability

2024-02-08 15:04

This article examines how PowerJob Server achieves high availability.

PowerJobSpringWorker

tech/powerjob/worker/PowerJobSpringWorker.java

public class PowerJobSpringWorker implements ApplicationContextAware, InitializingBean, DisposableBean {

    /**
     * Composition over inheritance: hold a PowerJobWorker and reset the ProcessorFactory internally, which is more elegant
     */
    private PowerJobWorker powerJobWorker;
    private final PowerJobWorkerConfig config;

    public PowerJobSpringWorker(PowerJobWorkerConfig config) {
        this.config = config;
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        powerJobWorker = new PowerJobWorker(config);
        powerJobWorker.init();
    }

    @Override
    public void setApplicationContext(ApplicationContext applicationContext) throws BeansException {
        BuiltInSpringProcessorFactory springProcessorFactory = new BuiltInSpringProcessorFactory(applicationContext);

        BuildInSpringMethodProcessorFactory springMethodProcessorFactory = new BuildInSpringMethodProcessorFactory(applicationContext);
        // append BuiltInSpringProcessorFactory
        List<ProcessorFactory> processorFactories = Lists.newArrayList(
                Optional.ofNullable(config.getProcessorFactoryList())
                        .orElse(Collections.emptyList()));
        processorFactories.add(springProcessorFactory);
        processorFactories.add(springMethodProcessorFactory);
        config.setProcessorFactoryList(processorFactories);
    }

    @Override
    public void destroy() throws Exception {
        powerJobWorker.destroy();
    }
}

PowerJobSpringWorker implements the InitializingBean interface; its afterPropertiesSet method creates a PowerJobWorker and then calls its init method.

PowerJobWorker.init

tech/powerjob/worker/PowerJobWorker.java

    public void init() throws Exception {

        if (!initialized.compareAndSet(false, true)) {
            log.warn("[PowerJobWorker] please do not repeat the initialization");
            return;
        }

        Stopwatch stopwatch = Stopwatch.createStarted();
        log.info("[PowerJobWorker] start to initialize PowerJobWorker...");

        PowerJobWorkerConfig config = workerRuntime.getWorkerConfig();
        CommonUtils.requireNonNull(config, "can't find PowerJobWorkerConfig, please set PowerJobWorkerConfig first");

        ServerDiscoveryService serverDiscoveryService = new PowerJobServerDiscoveryService(config);
        workerRuntime.setServerDiscoveryService(serverDiscoveryService);

        try {
            PowerBannerPrinter.print();

            // validate appName
            WorkerAppInfo appInfo = serverDiscoveryService.assertApp();
            workerRuntime.setAppInfo(appInfo);

            // initialize network data; the reported address and the locally bound address are handled separately (the reported address is used externally)
            String localBindIp = NetUtils.getLocalHost();
            int localBindPort = config.getPort();
            String externalIp = PropertyUtils.readProperty(PowerJobDKey.NT_EXTERNAL_ADDRESS, localBindIp);
            String externalPort = PropertyUtils.readProperty(PowerJobDKey.NT_EXTERNAL_PORT, String.valueOf(localBindPort));
            log.info("[PowerJobWorker] [ADDRESS_INFO] localBindIp: {}, localBindPort: {}; externalIp: {}, externalPort: {}", localBindIp, localBindPort, externalIp, externalPort);
            workerRuntime.setWorkerAddress(Address.toFullAddress(externalIp, Integer.parseInt(externalPort)));

            // initialize the thread pools
            final ExecutorManager executorManager = new ExecutorManager(workerRuntime.getWorkerConfig());
            workerRuntime.setExecutorManager(executorManager);

            // initialize the ProcessorLoader
            ProcessorLoader processorLoader = buildProcessorLoader(workerRuntime);
            workerRuntime.setProcessorLoader(processorLoader);

            // initialize the actors
            TaskTrackerActor taskTrackerActor = new TaskTrackerActor(workerRuntime);
            ProcessorTrackerActor processorTrackerActor = new ProcessorTrackerActor(workerRuntime);
            WorkerActor workerActor = new WorkerActor(workerRuntime, taskTrackerActor);

            // initialize the communication engine
            EngineConfig engineConfig = new EngineConfig()
                    .setType(config.getProtocol().name())
                    .setServerType(ServerType.WORKER)
                    .setBindAddress(new Address().setHost(localBindIp).setPort(localBindPort))
                    .setActorList(Lists.newArrayList(taskTrackerActor, processorTrackerActor, workerActor));

            EngineOutput engineOutput = remoteEngine.start(engineConfig);
            workerRuntime.setTransporter(engineOutput.getTransporter());

            // connect to the server
            serverDiscoveryService.timingCheck(workerRuntime.getExecutorManager().getCoreExecutor());

            log.info("[PowerJobWorker] PowerJobRemoteEngine initialized successfully.");

            // initialize the logging system
            OmsLogHandler omsLogHandler = new OmsLogHandler(workerRuntime.getWorkerAddress(), workerRuntime.getTransporter(), serverDiscoveryService);
            workerRuntime.setOmsLogHandler(omsLogHandler);

            // initialize local storage
            TaskPersistenceService taskPersistenceService = new TaskPersistenceService(workerRuntime.getWorkerConfig().getStoreStrategy());
            taskPersistenceService.init();
            workerRuntime.setTaskPersistenceService(taskPersistenceService);
            log.info("[PowerJobWorker] local storage initialized successfully.");

            // initialize the scheduled tasks
            workerRuntime.getExecutorManager().getCoreExecutor().scheduleAtFixedRate(new WorkerHealthReporter(workerRuntime), 0, config.getHealthReportInterval(), TimeUnit.SECONDS);
            workerRuntime.getExecutorManager().getCoreExecutor().scheduleWithFixedDelay(omsLogHandler.logSubmitter, 0, 5, TimeUnit.SECONDS);

            log.info("[PowerJobWorker] PowerJobWorker initialized successfully, using time: {}, congratulations!", stopwatch);
        } catch (Exception e) {
            log.error("[PowerJobWorker] initialize PowerJobWorker failed, using {}.", stopwatch, e);
            throw e;
        }
    }

PowerJobWorker's init method calls serverDiscoveryService.timingCheck(workerRuntime.getExecutorManager().getCoreExecutor()), which schedules the periodic server check on the worker's core executor.

timingCheck

tech/powerjob/worker/background/discovery/PowerJobServerDiscoveryService.java

    public void timingCheck(ScheduledExecutorService timingPool) {
        this.currentServerAddress = discovery();
        if (StringUtils.isEmpty(this.currentServerAddress) && !config.isAllowLazyConnectServer()) {
            throw new PowerJobException("can't find any available server, this worker has been quarantined.");
        }
        // this must succeed
        timingPool.scheduleAtFixedRate(() -> {
            try {
                this.currentServerAddress = discovery();
            } catch (Exception e) {
                log.error("[PowerDiscovery] fail to discovery server!", e);
            }
        }, 10, 10, TimeUnit.SECONDS);
    }

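PowerJobServerDiscoveryService's timingCheck first runs discovery() once synchronously and throws a PowerJobException if no server is found and lazy connection is not allowed. It then uses timingPool to run discovery() every 10 seconds, refreshing the worker's current server address each time.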
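The pattern in timingCheck (one blocking lookup at startup, then a fixed-rate refresh whose exceptions are swallowed) can be sketched in isolation. The following is a minimal sketch, not PowerJob's code: TimingCheckSketch, its Supplier-based discovery stub, and the allowLazyConnect flag are hypothetical stand-ins for the real service, discovery(), and config.isAllowLazyConnectServer().

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical stand-in for PowerJobServerDiscoveryService; all names are illustrative.
class TimingCheckSketch {
    private volatile String currentServerAddress;
    private final Supplier<String> discovery;   // stub for discovery()
    private final boolean allowLazyConnect;     // stub for config.isAllowLazyConnectServer()

    TimingCheckSketch(Supplier<String> discovery, boolean allowLazyConnect) {
        this.discovery = discovery;
        this.allowLazyConnect = allowLazyConnect;
    }

    void timingCheck(ScheduledExecutorService pool, long periodMillis) {
        // The first lookup is synchronous, so startup fails fast when no server is reachable.
        currentServerAddress = discovery.get();
        if ((currentServerAddress == null || currentServerAddress.isEmpty()) && !allowLazyConnect) {
            throw new IllegalStateException("can't find any available server");
        }
        // Exceptions are caught inside the task: an exception escaping a
        // scheduleAtFixedRate task suppresses all of its later executions.
        pool.scheduleAtFixedRate(() -> {
            try {
                currentServerAddress = discovery.get();
            } catch (Exception e) {
                System.err.println("discovery failed: " + e.getMessage());
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    String currentServerAddress() {
        return currentServerAddress;
    }
}
```

The try/catch inside the lambda is the important design choice here: without it, a single failed lookup would silently end the refresh loop for the lifetime of the worker.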

discovery

    private String discovery() {

        // appId can only be empty in lazy-connect mode. appInfo is fetched again before every discovery. Because this is the lazy-load path, exceptions are deliberately ignored here
        if (appInfo.getAppId() == null || appInfo.getAppId() < 0) {
            try {
                assertApp0();
            } catch (Exception e) {
                log.warn("[PowerDiscovery] assertAppName in discovery stage failed, msg: {}", e.getMessage());
                return null;
            }
        }

        if (ip2Address.isEmpty()) {
            config.getServerAddress().forEach(x -> ip2Address.put(x.split(":")[0], x));
        }

        String result = null;

        // send the request to the current server first
        String currentServer = currentServerAddress;
        if (!StringUtils.isEmpty(currentServer)) {
            String ip = currentServer.split(":")[0];
            // request the current server's HTTP service directly, saving one network round trip and reducing load on the server
            String firstServerAddress = ip2Address.get(ip);
            if (firstServerAddress != null) {
                result = acquire(firstServerAddress);
            }
        }

        for (String httpServerAddress : config.getServerAddress()) {
            if (StringUtils.isEmpty(result)) {
                result = acquire(httpServerAddress);
            } else {
                break;
            }
        }

        if (StringUtils.isEmpty(result)) {
            log.warn("[PowerDiscovery] can't find any available server, this worker has been quarantined.");

            // with a highly available server cluster, repeated consecutive failures mean this node has lost contact with the outside world; the server has already moved its second-level (frequent) jobs to other workers, so the local ones must be killed
            if (FAILED_COUNT++ > MAX_FAILED_COUNT) {

                log.warn("[PowerDiscovery] can't find any available server for 3 consecutive times, It's time to kill all frequent job in this worker.");
                List<Long> frequentInstanceIds = HeavyTaskTrackerManager.getAllFrequentTaskTrackerKeys();
                if (!CollectionUtils.isEmpty(frequentInstanceIds)) {
                    frequentInstanceIds.forEach(instanceId -> {
                        HeavyTaskTracker taskTracker = HeavyTaskTrackerManager.removeTaskTracker(instanceId);
                        taskTracker.destroy();
                        log.warn("[PowerDiscovery] kill frequent instance(instanceId={}) due to can't find any available server.", instanceId);
                    });
                }

                FAILED_COUNT = 0;
            }
            return null;
        } else {
            // reset the failure count
            FAILED_COUNT = 0;
            log.debug("[PowerDiscovery] current server is {}.", result);
            return result;
        }
    }

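The discovery method first sends an acquire request to the HTTP address of the current server, if one is known. If that returns nothing, it iterates over the configured serverAddress list and calls the server-side acquire endpoint on each address until one returns an available server. When no server can be found for several consecutive rounds, the worker assumes it has been cut off and destroys its local frequent (second-level) task trackers.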
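The lookup order and the failure counter are easier to see with the network calls stubbed out. Below is a minimal sketch under stated assumptions: DiscoveryOrderSketch is a hypothetical class, the acquire function is a stub that returns the elected server or null, the probed list exists only to make the order observable, and the addresses used with it are made up. One detail it reproduces from the original is the post-increment comparison `failedCount++ > maxFailedCount`, which fires on the (maxFailedCount + 2)-th consecutive failure rather than on the maxFailedCount-th.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, network-free model of the worker-side lookup order.
class DiscoveryOrderSketch {
    private final List<String> serverAddresses;                   // configured HTTP addresses, "ip:port"
    private final Map<String, String> ip2Address = new LinkedHashMap<>();
    private final Function<String, String> acquire;               // stub: elected server, or null
    private String currentServer;                                 // last result, may be null
    private int failedCount;
    final List<String> probed = new ArrayList<>();                // probe order, for illustration only

    DiscoveryOrderSketch(List<String> serverAddresses, String currentServer,
                         Function<String, String> acquire) {
        this.serverAddresses = serverAddresses;
        this.currentServer = currentServer;
        this.acquire = acquire;
        serverAddresses.forEach(x -> ip2Address.put(x.split(":")[0], x));
    }

    String discover(int maxFailedCount, Runnable onIsolated) {
        String result = null;
        // 1. Ask the current server first, matched to its HTTP address by IP.
        if (!isEmpty(currentServer)) {
            String first = ip2Address.get(currentServer.split(":")[0]);
            if (first != null) {
                result = probe(first);
            }
        }
        // 2. Fall back to the configured list until one server answers.
        for (String address : serverAddresses) {
            if (!isEmpty(result)) {
                break;
            }
            result = probe(address);
        }
        // 3. Count consecutive failures; react only after the threshold is exceeded.
        if (isEmpty(result)) {
            if (failedCount++ > maxFailedCount) {
                onIsolated.run();
                failedCount = 0;
            }
            result = null;
        } else {
            failedCount = 0;
        }
        currentServer = result;
        return result;
    }

    private String probe(String address) {
        probed.add(address);
        return acquire.apply(address);
    }

    private static boolean isEmpty(String s) {
        return s == null || s.isEmpty();
    }
}
```

With a known current server that is still healthy, only one request is sent per round, which is the saving the original comment refers to.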

acquireServer

tech/powerjob/server/web/controller/ServerController.java

    @GetMapping("/acquire")
    public ResultDTO<String> acquireServer(ServerDiscoveryRequest request) {
        return ResultDTO.success(serverElectionService.elect(request));
    }

ServerController exposes the acquire endpoint, which delegates to serverElectionService.elect(request).

elect

tech/powerjob/server/remote/server/election/ServerElectionService.java

    public String elect(ServerDiscoveryRequest request) {
        if (!accurate()) {
            final String currentServer = request.getCurrentServer();
            // if it is this machine, there is no need for anything as heavy as a database lookup; return success directly
            Optional<ProtocolInfo> localProtocolInfoOpt = Optional.ofNullable(transportService.allProtocols().get(request.getProtocol()));
            if (localProtocolInfoOpt.isPresent()) {
                if (localProtocolInfoOpt.get().getExternalAddress().equals(currentServer) || localProtocolInfoOpt.get().getAddress().equals(currentServer)) {
                    log.info("[ServerElection] this server[{}] is worker[appId={}]'s current server, skip check", currentServer, request.getAppId());
                    return currentServer;
                }
            }
        }
        return getServer0(request);
    }

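ServerElectionService's elect method has a fast path: when accurate() returns false and the request's currentServer equals this server's own address or external address for the requested protocol, it returns currentServer immediately without touching the database. In every other case it falls through to getServer0.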
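The fast path can be reduced to a pure function. This is a minimal sketch with hypothetical names: ElectFastPathSketch is not a PowerJob class, the accurate flag is passed in as a plain boolean because the body of accurate() is not shown in the excerpt, and fullElection stands in for getServer0.

```java
import java.util.function.Supplier;

// Hypothetical reduction of elect(): skip the full election when the worker's
// current server is this very machine and an accurate check was not requested.
class ElectFastPathSketch {
    static String elect(String requestedCurrentServer,
                        String localAddress,
                        String localExternalAddress,
                        boolean accurate,
                        Supplier<String> fullElection) {
        if (!accurate && requestedCurrentServer != null) {
            boolean isLocal = requestedCurrentServer.equals(localExternalAddress)
                    || requestedCurrentServer.equals(localAddress);
            if (isLocal) {
                // No database read, no ping: the worker is already bound to us.
                return requestedCurrentServer;
            }
        }
        // Everything else goes through the database-backed election.
        return fullElection.get();
    }
}
```

For example, `ElectFastPathSketch.elect("192.168.1.10:10010", "192.168.1.10:10010", "203.0.113.5:10010", false, () -> "from-db")` returns the first argument without invoking the supplier; the addresses are made up for illustration.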

getServer0

    private String getServer0(ServerDiscoveryRequest discoveryRequest) {

        final Long appId = discoveryRequest.getAppId();
        final String protocol = discoveryRequest.getProtocol();
        Set<String> downServerCache = Sets.newHashSet();

        for (int i = 0; i < RETRY_TIMES; i++) {

            // read the current server from the database without locking
            Optional<AppInfoDO> appInfoOpt = appInfoRepository.findById(appId);
            if (!appInfoOpt.isPresent()) {
                throw new PowerJobException(appId + " is not registered!");
            }
            String appName = appInfoOpt.get().getAppName();
            String originServer = appInfoOpt.get().getCurrentServer();
            String activeAddress = activeAddress(originServer, downServerCache, protocol);
            if (StringUtils.isNotEmpty(activeAddress)) {
                return activeAddress;
            }

            // no available server: run the server election again, which requires a lock
            String lockName = String.format(SERVER_ELECT_LOCK, appId);
            boolean lockStatus = lockService.tryLock(lockName, 30000);
            if (!lockStatus) {
                try {
                    Thread.sleep(500);
                } catch (Exception ignore) {
                }
                continue;
            }
            try {

                // another machine may already have completed the election, so check again
                AppInfoDO appInfo = appInfoRepository.findById(appId).orElseThrow(() -> new RuntimeException("impossible, unless we just lost our database."));
                String address = activeAddress(appInfo.getCurrentServer(), downServerCache, protocol);
                if (StringUtils.isNotEmpty(address)) {
                    return address;
                }

                // usurp: if this machine supports the protocol, it becomes the server that schedules this worker
                final ProtocolInfo targetProtocolInfo = transportService.allProtocols().get(protocol);
                if (targetProtocolInfo != null) {
                    // note: the value written to AppInfoDO#currentServer is always the default protocol's bind address; it is converted to the protocol-specific address only when returned
                    appInfo.setCurrentServer(transportService.defaultProtocol().getAddress());
                    appInfo.setGmtModified(new Date());

                    appInfoRepository.saveAndFlush(appInfo);
                    log.info("[ServerElection] this server({}) become the new server for app(appId={}).", appInfo.getCurrentServer(), appId);
                    return targetProtocolInfo.getExternalAddress();
                }
            } catch (Exception e) {
                log.error("[ServerElection] write new server to db failed for app {}.", appName, e);
            } finally {
                lockService.unlock(lockName);
            }
        }
        throw new PowerJobException("server elect failed for app " + appId);
    }

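The getServer0 method retries up to 10 times. On each attempt it reads the app's currentServer from the database (not from the discoveryRequest) and checks it with activeAddress; if that server is alive, its address is returned. If no server is available, it takes the election lock, re-checks the database record in case another server has already finished the election, and, if there is still no live server, writes the local machine as the new currentServer, provided the local machine supports the requested protocol.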
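The shape of this loop (optimistic read, lock, re-check, then take over) can be sketched with in-memory stand-ins. This is a minimal sketch, not the real implementation: ServerElectionSketch is hypothetical, an AtomicReference plays the role of AppInfoDO#currentServer in the database, a ReentrantLock plays the role of lockService, and the isAlive predicate replaces the ping. The protocol handling of the original is left out.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Predicate;

// Hypothetical in-memory model of the election loop.
class ServerElectionSketch {
    private final AtomicReference<String> currentServerInDb;      // stand-in for the database record
    private final ReentrantLock electLock = new ReentrantLock();  // stand-in for the distributed lock
    private final Predicate<String> isAlive;                      // stand-in for the ping
    private final String localAddress;

    ServerElectionSketch(String initialServer, String localAddress, Predicate<String> isAlive) {
        this.currentServerInDb = new AtomicReference<>(initialServer);
        this.localAddress = localAddress;
        this.isAlive = isAlive;
    }

    String getServer(int retryTimes) {
        Set<String> downServerCache = new HashSet<>();
        for (int i = 0; i < retryTimes; i++) {
            // Optimistic path: read without locking and keep the recorded server if it is alive.
            String origin = currentServerInDb.get();
            if (active(origin, downServerCache)) {
                return origin;
            }
            // Election path: only one contender at a time may rewrite the record.
            if (!electLock.tryLock()) {
                sleepQuietly(50);
                continue;
            }
            try {
                // Double check: another contender may have finished the election already.
                String latest = currentServerInDb.get();
                if (active(latest, downServerCache)) {
                    return latest;
                }
                currentServerInDb.set(localAddress);
                return localAddress;
            } finally {
                electLock.unlock();
            }
        }
        throw new IllegalStateException("server elect failed");
    }

    String recordedServer() {
        return currentServerInDb.get();
    }

    private boolean active(String address, Set<String> down) {
        if (address == null || address.isEmpty() || down.contains(address)) {
            return false;
        }
        if (isAlive.test(address)) {
            return true;
        }
        down.add(address);   // remember the failure so later retries skip the ping
        return false;
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The second read under the lock is what keeps several servers that receive acquire requests at the same moment from overwriting each other: whichever one takes the lock first wins, and the others find a live record when their turn comes.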

activeAddress

    private String activeAddress(String serverAddress, Set<String> downServerCache, String protocol) {

        if (downServerCache.contains(serverAddress)) {
            return null;
        }
        if (StringUtils.isEmpty(serverAddress)) {
            return null;
        }

        Ping ping = new Ping();
        ping.setCurrentTime(System.currentTimeMillis());

        URL targetUrl = ServerURLFactory.ping2Friend(serverAddress);
        try {
            AskResponse response = transportService.ask(Protocol.HTTP.name(), targetUrl, ping, AskResponse.class)
                    .toCompletableFuture()
                    .get(PING_TIMEOUT_MS, TimeUnit.MILLISECONDS);
            if (response.isSuccess()) {
                // the check passed against the remote server's exposed address; the protocol address the worker needs must be returned
                final JSONObject protocolInfo = JsonUtils.parseObject(response.getData(), JSONObject.class).getJSONObject(protocol);
                if (protocolInfo != null) {
                    downServerCache.remove(serverAddress);
                    ProtocolInfo remoteProtocol = protocolInfo.toJavaObject(ProtocolInfo.class);
                    log.info("[ServerElection] server[{}] is active, it will be the master, final protocol={}", serverAddress, remoteProtocol);
                    // during an upgrade from 4.3.3 to 4.3.4, servers that have not been upgraded yet have no externalAddress, so fall back to address for compatibility
                    return Optional.ofNullable(remoteProtocol.getExternalAddress()).orElse(remoteProtocol.getAddress());
                } else {
                    log.warn("[ServerElection] server[{}] is active but don't have target protocol", serverAddress);
                }
            }
        } catch (TimeoutException te) {
            log.warn("[ServerElection] server[{}] was down due to ping timeout!", serverAddress);
        } catch (Exception e) {
            log.warn("[ServerElection] server[{}] was down with unknown case!", serverAddress, e);
        }
        downServerCache.add(serverAddress);
        return null;
    }

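The activeAddress method sends a ping request to the target server with a timeout of 1s. If the ping times out (TimeoutException), fails with any other exception, or succeeds but the server does not offer the requested protocol, the target server is added to downServerCache and null is returned. If the target server responds successfully, it is removed from downServerCache and its protocol address is returned. Because the method returns early whenever the address is already in downServerCache, that removal never actually finds an entry within a single getServer0 call.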
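The liveness check boils down to waiting on a future with a deadline and recording failures in a per-call cache. Here is a minimal sketch assuming a stubbed transport: PingSketch and its ping function (which returns a CompletableFuture<Boolean>) are hypothetical, and the protocol lookup of the original is omitted.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical reduction of activeAddress(): a bounded wait plus a negative cache.
class PingSketch {
    static boolean isActive(String address,
                            Set<String> downServerCache,
                            Function<String, CompletableFuture<Boolean>> ping,
                            long timeoutMs) {
        // Servers already known to be down are not pinged again within the same election.
        if (address == null || address.isEmpty() || downServerCache.contains(address)) {
            return false;
        }
        try {
            // get(timeout, unit) throws TimeoutException when no reply arrives in time.
            if (ping.apply(address).get(timeoutMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (TimeoutException te) {
            // no reply within the deadline: treat the server as down
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
        } catch (Exception e) {
            // connection errors and the like: also treated as down
        }
        downServerCache.add(address);
        return false;
    }
}
```

The negative cache matters because the caller may invoke the check twice per retry (before and after taking the lock); without it, a dead server would cost a full timeout each time.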

Summary

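During initialization, a PowerJob worker starts a scheduled task that runs discovery() every 10 seconds to refresh the worker's current server address. The discovery method first asks the current server, then iterates over the configured serverAddress list, calling the server-side acquire endpoint until an available server is returned. ServerController exposes the acquire endpoint, which delegates to serverElectionService.elect(request); apart from a fast path for the case where the worker's current server is the local machine, elect mainly runs getServer0. The getServer0 method retries up to 10 times: it reads the app's currentServer from the database and checks it with activeAddress, returning it if it is alive; if no server is available, it takes a lock, re-checks, and then reassigns the app to the local machine. The activeAddress method pings the target server with a timeout of 1s; if the ping times out or fails, the target server is added to downServerCache, and if the target server responds successfully, it is removed from downServerCache.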

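Worker scheduled task --> acquire request to the current server, then to each configured serverAddress --> the server pings the app's recorded server; if the ping fails, it takes a lock and makes the local machine the replacement server.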
