Preface
EFK may be unfamiliar to many people; it is in fact a variant of the well-known ELK logging stack.
Without centralized logging, every time a problem needs investigating you have to log in to the Linux server and hunt through files by hand: run cat -n xxxx|grep xxxx to find which line the entry is on, then something like cat -n xxx|tail -n +N|head -n M to print the surrounding lines. This is not only inefficient, it is also awkward for tracking down application errors. With small logs it is bearable, but once the aggregated logs reach several gigabytes or tens of gigabytes, even the search itself takes a long time. If you know exactly when the problem occurred the search is easier, but in practice you often do not, so you end up searching across aggregated logs (logs are usually split into one file per day; aggregated logs merge many days together to make querying easier). The purpose of building an EFK log analysis system is to aggregate the logs so they can be viewed and analyzed quickly. With EFK you can not only aggregate each day's logs quickly, but also bring together logs from different projects, which is especially convenient for microservice and distributed architectures; and because the logs are stored in Elasticsearch, queries are very fast.
In my view, log data is valuable in the following ways:
- Troubleshooting: search the logs to locate the corresponding bug and work out a fix
- Service diagnostics: aggregate and analyze the logs to understand server load and how services are running
- Data analysis: run further analysis on the data, for example using the course id in each request to find the TOP 10 courses users are most interested in.
Getting to know EFK
EFK is not a single piece of software but a set of open-source components that work together seamlessly and cover a wide range of use cases; it is one of the mainstream logging solutions today. EFK is an acronym for three open-source projects: Elasticsearch, Filebeat, and Kibana. Elasticsearch stores and searches the logs, Filebeat collects them, and Kibana provides the UI. The only difference between EFK and the famous ELK stack is that EFK replaces ELK's Logstash with Filebeat, which has two advantages over Logstash:
1. Low intrusiveness: no changes to the application's existing code or configuration are required
2. Higher performance than Logstash, which puts a heavy load on I/O and system resources
Filebeat is a rewrite of the logstash-forwarder source in Go. It does not need a Java runtime, is efficient, and uses relatively little memory and CPU, which makes it well suited to running as an agent on application servers. That said, Filebeat is not better than Logstash in every respect: Logstash is much stronger at parsing and formatting logs, while Filebeat mainly just reads lines out of log files. If your logs already have some structure, Filebeat can do a degree of parsing too, but it still falls short of Logstash.
Elasticsearch
Elasticsearch is an open-source distributed search engine that collects, analyzes, and stores data. Its features include distributed operation, zero configuration, automatic discovery, automatic index sharding, index replicas, a RESTful API, multiple data sources, and automatic search load balancing.
It is highly scalable, highly reliable, and easy to manage. It can be used for full-text search, structured search, and analytics, and can combine all three. Elasticsearch is built on Lucene and is now one of the most widely used open-source search engines; Wikipedia, Stack Overflow, GitHub, and others build their search on it.
Filebeat
A lightweight data shipper, built from the original logstash-forwarder source. In other words, Filebeat is the new logstash-forwarder and the first choice for the shipper role in the ELK stack.
Filebeat belongs to the Beats family. Beats currently includes six tools:
Packetbeat (collects network traffic data)
Metricbeat (collects system-, process-, and filesystem-level metrics such as CPU and memory usage)
Filebeat (collects log file data)
Winlogbeat (collects Windows event log data)
Auditbeat (a lightweight audit log shipper)
Heartbeat (a lightweight uptime and health monitor)
Kibana
A visualization platform. It searches and displays the data indexed in Elasticsearch, making it easy to present and analyze data with charts, tables, and maps.
It provides a friendly web UI for analyzing logs from Logstash, Beats, and Elasticsearch, helping you aggregate, analyze, and search important log data.
EFK architecture diagrams
Common log collection architectures and their use cases
1 The simplest architecture
In this architecture there is a single Logstash, Elasticsearch, and Kibana instance. Logstash pulls data from various sources (log files, standard input, and so on) through input plugins, processes it with filter plugins, and sends it to Elasticsearch through the Elasticsearch output plugin; Kibana then displays it. See Figure 1.
Figure 1. The simplest architecture
This architecture is very simple and its use cases are limited, but it is a good way for beginners to set up ELK and understand how it works.
2 Logstash as the log collector
This architecture extends the previous one by running multiple Logstash collection nodes spread across several machines. The parsed data is sent to an Elasticsearch server for storage, and Kibana is used for querying and building log reports. See Figure 2.
Figure 2. Logstash as the log collector
Because Logstash has to be deployed on every server and is fairly CPU- and memory-hungry, this setup suits servers with plenty of spare compute; otherwise it can degrade server performance or even keep the application from working properly.
3 Beats as the log collector
This architecture introduces Beats as the log collectors. The Beats covered here are four:
Packetbeat (collects network traffic data);
Topbeat (collects system-, process-, and filesystem-level metrics such as CPU and memory usage; since superseded by Metricbeat);
Filebeat (collects log file data);
Winlogbeat (collects Windows event log data).
Beats send the collected data to Logstash, which parses and filters it before forwarding it to Elasticsearch for storage; Kibana then presents it to users. See Figure 3.
Figure 3. Beats as the log collector
This architecture solves the problem of Logstash consuming a lot of system resources on every server node: compared with Logstash, the CPU and memory that Beats use are almost negligible. In addition, the connection between Beats and Logstash supports SSL/TLS encryption with mutual authentication between client and server, so the transport is secure; a sketch of the Filebeat-side settings follows.
This makes the architecture a good fit when data security matters and the application servers are sensitive to extra load.
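For reference, here is a minimal sketch of what the Filebeat side of such an encrypted connection could look like in filebeat.yml for a 6.x release (the host name and certificate paths are placeholders, not values from this article; older 1.x-style configs call this block tls instead of ssl):

output.logstash:
  hosts: ["logstash.internal:5044"]                        # hypothetical Logstash endpoint
  ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]    # CA used to verify the server
  ssl.certificate: "/etc/pki/client/cert.pem"              # client certificate for mutual authentication
  ssl.key: "/etc/pki/client/cert.key"                      # client private key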
4 Architecture with a message queue
In this architecture Logstash collects data from the various sources and writes it to a message queue through a queue output plugin; Logstash currently supports common queues such as Kafka, Redis, and RabbitMQ. Another Logstash then reads from the queue through a queue input plugin, parses and filters the data, sends it to Elasticsearch through the output plugin, and finally Kibana displays it. See Figure 4.
Figure 4. Architecture with a message queue
This architecture suits large log volumes. Because the Logstash parsing nodes and Elasticsearch carry a heavy load, they can be run as clusters to share it. The message queue evens out network transfer, reducing congestion and, in particular, the risk of losing data, but Logstash's high resource usage remains a problem.
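The examples in this article do not include a queue, but as an illustrative sketch (the broker addresses and topic name are made up), Filebeat itself can also publish to Kafka, with Logstash consuming from the topic:

output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092"]   # hypothetical brokers
  topic: "app-logs"                       # hypothetical topic name
  required_acks: 1                        # wait for the partition leader to acknowledge
  compression: gzip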
5 Configuration and deployment of a Filebeat-based architecture
As mentioned earlier, Filebeat has fully replaced logstash-forwarder as the new generation of log shipper, and thanks to its light weight and security more and more people are adopting it. This section explains in detail how to deploy a centralized ELK logging solution based on Filebeat; the architecture is shown in Figure 5.
Figure 5. Filebeat-based ELK cluster architecture
Because the free version of ELK has no built-in security, Nginx is used here as a reverse proxy so that users cannot reach the Kibana server directly, and simple user authentication is configured in Nginx to raise the security bar somewhat. Nginx can also act as a load balancer, improving access performance.
1. Filebeat
(1) Overview:
Filebeat is a log file shipper. After installing the client on your servers, Filebeat monitors the log directories or specific log files you point it at, tails them (tracking changes and reading continuously), and forwards the data to Elasticsearch or Logstash for storage.
Filebeat's workflow is as follows: when you start Filebeat, it launches one or more prospectors that scan the log directories or files you specified. For every log file a prospector finds, Filebeat starts a harvester; each harvester reads the new content of one log file and sends it to the spooler, which batches these events; finally Filebeat ships the batched data to the destination you configured.
(My own take: Filebeat is a lightweight Logstash; use it to collect logs when the machines you collect from do not have much spare capacity. In day-to-day use Filebeat is very stable; I have never seen it crash.)
(2) How it works:
Filebeat consists of two main components: prospectors and harvesters. They work together to read files and send event data to the output you specify.
1) What is a harvester?
A harvester is responsible for reading the content of a single file.
The harvester reads each file line by line and sends the content to the output. One harvester is started per file; the harvester is responsible for opening and closing the file, which means the file descriptor stays open while the harvester is running.
If a file is deleted or renamed while its harvester is still reading it, Filebeat keeps reading that file. The downside is that the disk space cannot be released until the harvester that owns the file is closed. By default, Filebeat keeps the file open until close_inactive is reached.
Closing a harvester has the following effects:
1) If the file was deleted while the harvester was still reading it, the file handle is closed and the underlying resources are released.
2) Collection of the file will only start again after scan_frequency has elapsed.
3) If the file is moved or removed while the harvester is closed, collection of that file will not continue.
To control when a harvester is closed, use the close_* configuration options; a minimal sketch follows.
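A minimal sketch of these options on a log input (the values shown are illustrative, roughly the documented defaults, not recommendations from this article):

filebeat.inputs:
- type: log
  paths:
    - /var/log/*.log
  scan_frequency: 10s    # how often to look for new or changed files
  close_inactive: 5m     # close the harvester after this much inactivity
  close_removed: true    # close the handle as soon as the file is removed
  close_renamed: false   # keep reading a file after it is renamed (rotation)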
2) What is a prospector?
A prospector manages the harvesters and finds all the file sources to read from. If the input type is log, the prospector finds every file on disk that matches the configured paths and starts a harvester for each of them. Every prospector runs in its own Go routine.
The following example configures Filebeat to read lines from all files matching the specified glob patterns:
filebeat.inputs:
- type: log
  paths:
    - /var/log/*.log
    - /var/path2/*.log
Filebeat currently supports two prospector types: log and stdin.
Each prospector type can be defined multiple times.
The log prospector checks each file to see whether a harvester needs to be started, whether one is already running, or whether the file can be ignored (see ignore_older). A file created while Filebeat is running is only picked up by the prospector once its size has changed after the previous harvester was closed.
Note: Filebeat prospectors can only read local files; there is no functionality for connecting to remote hosts to read stored files or logs. An example with both input types follows.
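As an illustration (the paths are invented, not from this article), two inputs can be defined side by side, one per type:

filebeat.inputs:
- type: log
  paths:
    - /var/log/nginx/*.log
  ignore_older: 72h      # skip files not modified within the last three days
- type: stdin            # the second supported type: read events from standard input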
3) How Filebeat keeps file state
Filebeat keeps the state of each file and frequently flushes it to a registry file on disk. The state records the last offset the harvester read and ensures that every log line is delivered to the output. If the output, such as Elasticsearch or Logstash, is unreachable, Filebeat remembers the last line it sent and resumes reading the file once the output becomes available again. While Filebeat is running, the state of each file is also kept in memory; when Filebeat restarts, it rebuilds the state from the registry file and resumes every harvester at the last known position.
For each input, Filebeat keeps the state of every file it finds. Because files can be renamed or moved, the file name and path are not enough to identify a file, so Filebeat stores a unique identifier per file to detect whether it has been read before.
If your setup creates a large number of new files every day, you may find that the registry file grows too large.
(Aside: Filebeat saves the state of each file to the registry file on disk, and when it restarts it uses that state to continue reading each file at its previous position. If many new files are generated every day, the registry file can become too big. Two configuration options help shrink it: clean_removed and clean_inactive. For old files that you no longer touch and that are being ignored, use clean_inactive; for files that have been deleted from disk, use clean_removed. A sketch of these options follows.)
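A sketch of those registry-cleanup options on a log input (the paths and durations are examples, not values from this article):

filebeat.inputs:
- type: log
  paths:
    - /var/log/app/*.log
  ignore_older: 48h      # stop picking up files older than two days
  clean_inactive: 72h    # drop registry state for files inactive this long
                         # (must be greater than ignore_older + scan_frequency)
  clean_removed: true    # drop registry state for files deleted from disk (default)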
4) How Filebeat ensures at-least-once delivery
Filebeat guarantees that events are delivered to the configured output at least once, with no data loss. It can do this because it stores the delivery state of every event in the registry file. If the output is blocked and has not acknowledged all events, Filebeat keeps trying to send them until the output confirms it has received them.
If Filebeat shuts down while it is sending events, it does not wait for the output to acknowledge all of them before exiting. When Filebeat restarts, any events that were sent to the output but not acknowledged before the shutdown are sent again. This ensures every event is sent at least once, but it can also mean duplicate events reach the output. You can make Filebeat wait for a specific amount of time before shutting down by setting the shutdown_timeout option, as sketched below.
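For example, the following line (the duration is chosen arbitrarily for illustration) tells Filebeat to give in-flight events up to five seconds to be acknowledged before it exits:

filebeat.shutdown_timeout: 5s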
Note:
Filebeat's at-least-once guarantee has limitations around log rotation and the deletion of old files. If log files are written to disk faster than Filebeat can process them, or if files are deleted while the output is unavailable, data may be lost.
On Linux, Filebeat can also skip lines because of inode reuse.
(3) Basic Filebeat usage
Step 1: Install
Step 2: Configure
Configuration file: filebeat.yml
To configure Filebeat:
1. Define the log file paths
For the most basic Filebeat configuration, you can use a single path. For example:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log
In this example, all files under the path /var/log/*.log are taken as input, which means Filebeat will pick up every file in the /var/log directory that ends with .log.
To fetch all files from a predefined level of subdirectories, you can use a pattern such as /var/log/*/*.log. This picks up every .log file in the immediate subfolders of /var/log; it does not pick up files from /var/log itself. It is currently not possible to recursively fetch all .log files from every subdirectory below that.
(Aside:
Suppose the configured input path is /var/log/*/*.log and the directory tree looks roughly like the sketch after this aside (the original illustration is missing from the source).
Then only 2.log and 3.log are picked up, while 1.log and 4.log are not, because /var/log/aaa/ccc/1.log and /var/log/4.log do not match the pattern.
)
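A hypothetical layout that reproduces the situation described above (the directory names aaa, bbb, and ccc are assumptions, since the original illustration is missing):

# /var/log/4.log           -> not matched: lives in /var/log itself
# /var/log/aaa/2.log       -> matched by /var/log/*/*.log
# /var/log/bbb/3.log       -> matched by /var/log/*/*.log
# /var/log/aaa/ccc/1.log   -> not matched: nested one level too deep
filebeat.inputs:
- type: log
  paths:
    - /var/log/*/*.log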
2. If you are sending the output straight to Elasticsearch (without Logstash), set the IP address and port so Filebeat can find Elasticsearch:
output.elasticsearch:
  hosts: ["192.168.1.42:9200"]
3. If you plan to use the Kibana dashboards, configure the Kibana endpoint like this:
setup.kibana:
  host: "localhost:5601"
4. If your Elasticsearch and Kibana are secured, you need to specify access credentials in the configuration file before starting Filebeat. For example:
output.elasticsearch:
  hosts: ["myEShost:9200"]
  username: "filebeat_internal"
  password: "{pwd}"
setup.kibana:
  host: "mykibanahost:5601"
  username: "my_kibana_user"
  password: "{pwd}"
Step 3: Configure Filebeat to use Logstash
If you want Logstash to perform additional processing on the data Filebeat collects, configure Filebeat to use Logstash:
output.logstash:
  hosts: ["127.0.0.1:5044"]
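Note that in the 6.x releases used later in this article only one output may be enabled at a time, so the Elasticsearch output has to be disabled (commented out) when shipping to Logstash; a sketch:

#output.elasticsearch:          # must be disabled while output.logstash is enabled
#  hosts: ["localhost:9200"]
output.logstash:
  hosts: ["127.0.0.1:5044"]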
Step 4: Load the index template in Elasticsearch
In Elasticsearch, index templates define the settings and mappings that determine how fields are analyzed. (Aside: this effectively defines the structure of the indexed documents, because the collected data has to be converted into a standard format for output.)
The Filebeat package ships with the recommended index template pre-installed. If you accept the defaults in filebeat.yml, Filebeat loads the template automatically after it successfully connects to Elasticsearch. If a template already exists it is not overwritten unless you configure Filebeat to do so.
By configuring the template loading options in the Filebeat configuration file, you can disable automatic template loading or load your own template.
Configuring template loading:
By default, if the Elasticsearch output is enabled, Filebeat automatically loads the recommended template file, fields.yml.
- Load a different template:

setup.template.name: "your_template_name"
setup.template.fields: "path/to/fields.yml"

- Overwrite an existing template:

setup.template.overwrite: true

- Disable automatic template loading:

setup.template.enabled: false

- Change the index name:

# By default, Filebeat writes events to an index named filebeat-6.3.2-yyyy.MM.dd,
# where yyyy.MM.dd is the date the event was indexed. To use a different name,
# set the index option in the Elasticsearch output. For example:
output.elasticsearch.index: "customname-%{[beat.version]}-%{+yyyy.MM.dd}"
setup.template.name: "customname"
setup.template.pattern: "customname-*"
setup.dashboards.index: "customname-*"
Load the template manually:
./filebeat setup --template -E output.logstash.enabled=false -E 'output.elasticsearch.hosts=["localhost:9200"]'
Step 5: Set up the Kibana dashboards
Filebeat ships with example Kibana dashboards and visualizations. Before you can use the dashboards, you need to create the index pattern filebeat-* and load the dashboards into Kibana. To do this, you can run the setup command, or configure dashboard loading in filebeat.yml (see the snippet after the command):
./filebeat setup --dashboards
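Alternatively (instead of running the setup command), dashboard loading can be enabled in filebeat.yml so the dashboards are loaded when Filebeat starts:

setup.dashboards.enabled: true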
Step 6: Start Filebeat
./filebeat -e -c filebeat.yml -d "publish"
Step 7: View the sample Kibana dashboards
http://127.0.0.1:5601
The complete configuration:
#=========================== Filebeat inputs =============================
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log

#============================== Dashboards ===============================
setup.dashboards.enabled: false

#============================== Kibana ===================================
setup.kibana:
  host: "192.168.101.5:5601"

#-------------------------- Elasticsearch output -------------------------
output.elasticsearch:
  hosts: ["localhost:9200"]
Start Elasticsearch:
/usr/local/programs/elasticsearch/elasticsearch-6.3.2/bin/elasticsearch
Start Kibana:
/usr/local/programs/kibana/kibana-6.3.2-linux-x86_64/bin/kibana
Set up the dashboards:
./filebeat setup --dashboards
Start Filebeat:
./filebeat -e -c filebeat.yml -d "publish"
Open http://192.168.101.5:5601 in a browser.
Check the indices:
Request:
curl -X GET "localhost:9200/_cat/indices?v"
Response:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank 59jD3B4FR8iifWWjrdMzUg 5 1 1000 0 475.1kb 475.1kb
green open .kibana DzGTSDo9SHSHcNH6rxYHHA 1 0 153 23 216.8kb 216.8kb
yellow open filebeat-6.3.2-2018.08.08 otgYPvsgR3Ot-2GDcw_Upg 3 1 255 0 63.7kb 63.7kb
yellow open customer DoM-O7QmRk-6f3Iuls7X6Q 5 1 1 0 4.5kb 4.5kb
For more detailed configuration options and explanations, see the follow-up post in this series: 日志收集系统EFK之Filebeat 模块与配置 (Filebeat modules and configuration).
filebeat.yml (annotated configuration reference):
################### Filebeat Configuration Example #########################

############################# Filebeat #####################################
filebeat:
  # List of prospectors to fetch data. Each "-" below is one prospector.
  prospectors:
    -
      # Paths that should be crawled and fetched (glob based). To fetch all
      # ".log" files from a specific level of subdirectories, /var/log/*/*.log
      # can be used. A harvester is started for each file found. Make sure a
      # file is not defined twice, as this can lead to unexpected behaviour.
      # Point this at the logs you want to monitor (e.g. /home/hadoop/app.log).
      paths:
        - /var/log/*.log
        #- c:\programdata\elasticsearch\logs\*

      # File encoding; both plain and utf-8 handle Chinese logs fine.
      # Samples: plain, utf-8, utf-16be-bom, utf-16be, utf-16le, big5,
      # gb18030, gbk, hz-gb-2312, euc-kr, euc-jp, iso-2022-jp, shift-jis, ...
      #encoding: plain

      # Input type: log (default, reads every line of the file) or stdin.
      # The different types cannot be mixed in one prospector.
      input_type: log

      # Drop lines matching any of these regular expressions.
      #exclude_lines: ["^DBG"]

      # Export only lines matching any of these regular expressions (default:
      # all lines). include_lines is applied before exclude_lines.
      #include_lines: ["^ERR", "^WARN"]

      # Drop files matching any of these regular expressions.
      #exclude_files: [".gz$"]

      # Optional additional fields attached to every event, e.g. "level: debug",
      # useful for later filtering and grouping. By default they are nested
      # under a "fields" sub-dictionary, i.e. "fields": {"level": "debug"}.
      #fields:
      #  level: debug
      #  review: 1

      # If true, custom fields become top-level fields ("level": "debug")
      # instead of living under "fields", and may overwrite Filebeat's own
      # fields on name conflicts.
      #fields_under_root: false

      # Ignore files that were last modified longer ago than this (e.g. 2h, 5m).
      #ignore_older: 0

      # Close the file handler for files not modified for this long. Default 1h.
      #close_older: 1h

      # Value of the 'type' field; for the Elasticsearch output this is the
      # document type, handy for classifying logs. Default: log
      #document_type: log

      # How often Filebeat checks the prospector paths for new or changed
      # files. 0s means as often as possible (higher CPU usage). Default: 10s
      #scan_frequency: 10s

      # Buffer size each harvester uses when reading a file.
      #harvester_buffer_size: 16384

      # Maximum bytes of a single log event; anything beyond is discarded and
      # not sent. Default 10MB; mainly relevant for large multiline messages.
      #max_bytes: 10485760

      # Multiline settings, for messages spanning several lines (e.g. Java
      # stack traces or C line continuation).
      #multiline:
        # Pattern marking the first line of a multiline message, here lines
        # starting with "[".
        #pattern: ^\[
        # Whether to negate the pattern.
        #negate: false
        # Merge matched lines with the previous ("after") or following
        # ("before") content into one event.
        #match: after
        # Maximum number of lines merged into one event (default 500);
        # additional lines are discarded.
        #max_lines: 500
        # Flush the pending multiline event after this timeout even if no new
        # pattern was seen. Default 5s.
        #timeout: 5s

      # If true, start reading new files at the end instead of the beginning.
      # Combined with log rotation, the first entries of a new file may be skipped.
      #tail_files: false

      # After reaching EOF, wait this long before checking the file again for
      # new lines. Default 1s (near real-time); reset whenever a new line appears.
      #backoff: 1s

      # Upper bound on the backoff interval, regardless of backoff_factor.
      # Default 10s: in the worst case a new line is picked up within 10s.
      #max_backoff: 10s

      # How fast the backoff grows; the backoff is multiplied by this factor
      # until max_backoff is reached, and 1 disables backoff. With the defaults
      # the file is checked after 1s, then the interval grows towards 10s.
      #backoff_factor: 2

      # Close a file as soon as its name changes. Recommended on Windows only;
      # it can lose data on rotated files because reading restarts at the end.
      # Prefer lowering ignore_older to release files faster.
      #force_close_files: false

    # Additional prospector, e.g. reading from standard input.
    #-
    #  input_type: stdin

  # General Filebeat options

  # Spooler size: when this many events are buffered they are flushed to the
  # output, regardless of idle_timeout.
  #spool_size: 2048

  # Enable the async publisher pipeline (experimental).
  #publish_async: false

  # Flush the spooler after this long even if spool_size was not reached.
  #idle_timeout: 5s

  # Registry file recording how far each log file has been read. Defaults to
  # the current working directory, so changing the working directory makes
  # indexing start from the beginning again.
  #registry_file: .filebeat

  # Directory of additional prospector configuration files (full path, *.yml).
  # Only the prospector part of those files is processed; global options such
  # as spool_size are ignored. Must be different from the main config directory.
  #config_dir:

############################# Libbeat Config ################################
# Base config used by all beats for libbeat features (output, shipper, logging).

############################# Output ########################################
# Configure what outputs to use when sending the data collected by the beat.
# Multiple outputs may be used.
output:
  ### Elasticsearch as output (the default: Filebeat ships straight to
  ### Elasticsearch; comment this out if you want to ship to Redis first and
  ### then on to Elasticsearch, for example)
  elasticsearch:
    # Array of hosts to connect to. Scheme and port default to http and 9200;
    # with an additional path the scheme is required, e.g. http://localhost:9200/path.
    # IPv6 addresses: https://[2001:db8::1]:9200
    hosts: ["localhost:9200"]

    # Optional protocol and basic auth credentials.
    #protocol: "https"
    #username: "admin"
    #password: "s3cr3t"

    # Number of workers per Elasticsearch host.
    #worker: 1

    # Optional index name; default "filebeat", generating [filebeat-]YYYY.MM.DD keys.
    #index: "filebeat"

    # Template used to set the mapping in Elasticsearch. Template loading is
    # disabled by default; adjust these settings to load your own or overwrite
    # an existing one.
    #template:
      #name: "filebeat"
      #path: "filebeat.template.json"
      #overwrite: false

    # Optional HTTP path and proxy.
    #path: "/elasticsearch"
    #proxy_url: http://proxy:3128

    # Retries per index operation before events are dropped. Default 3.
    #max_retries: 3

    # Maximum events per bulk API request. Default 50.
    #bulk_max_size: 50

    # HTTP request timeout and the wait between two bulk requests (a new bulk
    # request is sent earlier if bulk_max_size is reached).
    #timeout: 90
    #flush_interval: 1

    # Topology options (only meaningful for Packetbeat).
    #save_topology: false
    #topology_expire: 15

    # TLS configuration (off by default).
    #tls:
      #certificate_authorities: ["/etc/pki/root/ca.pem"]
      #certificate: "/etc/pki/client/cert.pem"
      #certificate_key: "/etc/pki/client/cert.key"
      # insecure: true disables server verification; use for testing only.
      #insecure: true
      #cipher_suites: []
      #curve_types: []
      #min_version: 1.0
      #max_version: 1.2

  ### Logstash as output
  #logstash:
    #hosts: ["localhost:5044"]
    #worker: 1
    # Maximum events per batch (default 2048) and gzip compression level.
    #bulk_max_size: 2048
    #compression_level: 3
    # Load balance events between the Logstash hosts.
    #loadbalance: true
    # Optional index name; the default depends on the beat (filebeat for Filebeat).
    #index: filebeat
    # Optional TLS (off by default, same options as for Elasticsearch above).
    #tls:
      #certificate_authorities: ["/etc/pki/root/ca.pem"]
      #certificate: "/etc/pki/client/cert.pem"
      #certificate_key: "/etc/pki/client/cert.key"
      #insecure: true
      #cipher_suites: []
      #curve_types: []

  ### File as output
  #file:
    # Directory to save the generated files in (mandatory) and the base file
    # name (default filebeat, producing filebeat, filebeat.1, filebeat.2, ...).
    #path: "/tmp/filebeat"
    #filename: filebeat
    # Rotate after this many kilobytes (default 10 MB) and keep this many files
    # (default 7; the oldest is deleted first).
    #rotate_every_kb: 10000
    #number_of_files: 7

  ### Console output
  #console:
    # Pretty print json events.
    #pretty: false

############################# Shipper #######################################
shipper:
  # Name of the shipper publishing the data; defaults to the hostname. Used to
  # group all transactions from a single shipper in the web interface.
  #name:

  # Tags included with every published transaction, to group servers by
  # logical properties.
  #tags: ["service-X", "web-tier"]

  # Ignore transactions created by the server the shipper runs on; useful to
  # remove duplicates when shippers are installed on multiple servers.
  #ignore_outgoing: true

  # How often shippers publish their IPs to the topology map, and how long the
  # published IPs live (expire must be higher than refresh; defaults 10s/15s).
  #refresh_topology_freq: 10
  #topology_expire: 15

  # Internal queue size for single events in the processing pipeline.
  #queue_size: 1000

  # Local GeoIP database support; disabled if no paths are configured.
  #geoip:
    #paths:
    #  - "/usr/share/GeoIP/GeoLiteCity.dat"
    #  - "/usr/local/var/GeoIP/GeoLiteCity.dat"

############################# Logging #######################################
# Log output options: syslog, file, stderr. On Windows the default is file,
# on all other systems syslog. In development use debug or info level; in
# production use error. File logging requires to_files: true.
logging:
  # Send all logging output to syslog (default true except on Windows);
  # inspect with tail -f /var/log/messages.
  #to_syslog: true

  # Write all logging output to rotating files; Beats rotate automatically
  # when rotateeverybytes is reached. Must be true to enable file logging.
  #to_files: false

  files:
    # Log directory and file name.
    #path: /var/log/mybeat
    #name: mybeat

    # Rotate when the file reaches this size (default 10 MB).
    rotateeverybytes: 10485760 # = 10MB

    # Number of rotated files to keep, oldest deleted first. Default 7, range 2 to 1024.
    #keepfiles: 7

  # Enable debug output for selected components ("*" for all; others include
  # beat, publish, service; multiple selectors can be chained).
  #selectors: []

  # Log level: critical, error, warning, info, debug. Default error.
  #level: error
The next post in this series covers Kibana: IT学习笔记--日志收集系统EFK之Kibana.
References:
日志搜集系统从ELK到EFK;
efk日志系统搭建;
Filebeat中文指南;
Elasticsearch 快速开始;
开始使用Filebeat;
Logstash 性能及其替代方案;