Found 27 matching posts.
2023-08-25
linuxea: OpenObserve HA local single-cluster mode
ha默认就不支持本地存储了,集群模式下openobseve会运行多个节点,每个节点都是无状态的,数据存储在对象存储中,元数据在etcd中,因此理论上openobseve可以随时进行水平扩容组件如下:router:处理数据写入和页面查询,作为路由etcd: 存储用户信息,函数,规则,元数据等s3: 数据本身querier: 数据查询ingester: 数据没有在被写入到s3中之前,数据会进行临时通过预写来确保数据不会丢失,这类似于prometheus的walcompactor: 合并小文件到大文件,以及数据保留时间要配置集群模式,我们需要一个 对象存储,awk的s3,阿里的oss,或者本地的minio,还需要部署一个etcd作为元数据的存储,并且为ingester数据提供一个pvc,因为openobseve是运行在k8s上etcd我们将etcd运行在外部k8s之外的外部节点version: '2' services: oo_etcd: container_name: oo_etcd #image: 'docker.io/bitnami/etcd/3.5.8-debian-11-r4' image: uhub.service.ucloud.cn/marksugar-k8s/etcd:3.5.8-debian-11-r4 #network_mode: host restart: always environment: - ALLOW_NONE_AUTHENTICATION=yes - ETCD_ADVERTISE_CLIENT_URLS=http://0.0.0.0:2379 #- ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379 #- ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380 - ETCD_DATA_DIR=/bitnami/etcd/data volumes: - /etc/localtime:/etc/localtime:ro # 时区2 - /data/etcd/date:/bitnami/etcd # chown -R 777 /data/etcd/date/ ports: - 2379:2379 - 2380:2380 logging: driver: "json-file" options: max-size: "50M" mem_limit: 2048mpvc需要一个安装好的storageClass,我这里使用的是nfs-subdir-external-provisioner创建的nfs-latestminio部署一个单机版本的minio进行测试即可version: '2' services: oo_minio: container_name: oo_minio image: "uhub.service.ucloud.cn/marksugar-k8s/minio:RELEASE.2023-02-10T18-48-39Z" volumes: - /etc/localtime:/etc/localtime:ro # 时区2 - /docker/minio/data:/data command: server --console-address ':9001' /data environment: - MINIO_ACCESS_KEY=admin #管理后台用户名 - MINIO_SECRET_KEY=admin1234 #管理后台密码,最小8个字符 ports: - 9000:9000 # api 端口 - 9001:9001 # 控制台端口 logging: driver: "json-file" options: max-size: "50M" mem_limit: 2048m healthcheck: test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"] interval: 30s timeout: 20s retries: 3启动后创建一个名为openobserve的桶安装openObserve我们仍然使用helm进行安装helm repo add openobserve https://charts.openobserve.ai helm repo update kubectl create ns openobserve对values.yaml定制的内容如下,latest.yaml:image: repository: uhub.service.ucloud.cn/marksugar-k8s/openobserve pullPolicy: IfNotPresent # Overrides the image tag whose default is the chart appVersion. 
tag: "latest" # 副本数 replicaCount: ingester: 1 querier: 1 router: 1 alertmanager: 1 compactor: 1 ingester: persistence: enabled: true size: 10Gi storageClass: "nfs-latest" # NFS的storageClass accessModes: - ReadWriteOnce # Credentials for authentication # 账号密码 auth: ZO_ROOT_USER_EMAIL: "root@example.com" ZO_ROOT_USER_PASSWORD: "abc123" # s3地址 ZO_S3_ACCESS_KEY: "admin" ZO_S3_SECRET_KEY: "admin1234" etcd: enabled: false # if true then etcd will be deployed as part of openobserve externalUrl: "172.16.100.47:2379" config: # ZO_ETCD_ADDR: "172.16.100.47:2379" # etcd地址 # ZO_HTTP_ADDR: "172.16.100.47:2379" ZO_DATA_DIR: "./data/" #数据目录 # 开启minio ZO_LOCAL_MODE_STORAGE: s3 ZO_S3_SERVER_URL: http://172.16.100.47:9000 ZO_S3_REGION_NAME: local ZO_S3_ACCESS_KEY: admin ZO_S3_SECRET_KEY: admin1234 ZO_S3_BUCKET_NAME: openobserve ZO_S3_BUCKET_PREFIX: openobserve ZO_S3_PROVIDER: minio ZO_TELEMETRY: "false" # 禁用匿名 ZO_WAL_MEMORY_MODE_ENABLED: "false" # 内存模式 ZO_WAL_LINE_MODE_ENABLED: "true" # wal写入模式 #ZO_S3_FEATURE_FORCE_PATH_STYLE: "true" # 数据没有在被写入到s3中之前,数据会进行临时通过预写来确保数据不会丢失,这类似于prometheus的wal resources: ingester: {} querier: {} compactor: {} router: {} alertmanager: {} autoscaling: ingester: enabled: false minReplicas: 1 maxReplicas: 100 targetCPUUtilizationPercentage: 80 # targetMemoryUtilizationPercentage: 80 querier: enabled: false minReplicas: 1 maxReplicas: 100 targetCPUUtilizationPercentage: 80 # targetMemoryUtilizationPercentage: 80 router: enabled: false minReplicas: 1 maxReplicas: 100 targetCPUUtilizationPercentage: 80 # targetMemoryUtilizationPercentage: 80 compactor: enabled: false minReplicas: 1 maxReplicas: 100 targetCPUUtilizationPercentage: 80 # targetMemoryUtilizationPercentage: 80指定本地minio,桶名称,认证信息等;指定etcd地址;为ingester指定sc; 而后安装 helm upgrade --install openobserve -f latest.yaml --namespace openobserve openobserve/openobserve如下[root@master-01 ~/openObserve]# helm upgrade --install openobserve -f latest.yaml --namespace openobserve openobserve/openobserve Release "openobserve" does not exist. Installing it now. NAME: openobserve LAST DEPLOYED: Sun Aug 20 18:04:31 2023 NAMESPACE: openobserve STATUS: deployed REVISION: 1 TEST SUITE: None NOTES: 1. 
Get the application URL by running these commands: kubectl --namespace openobserve port-forward svc/openobserve-openobserve-router 5080:5080 [root@master-01 ~/openObserve]# kubectl -n openobserve get pod NAME READY STATUS RESTARTS AGE openobserve-alertmanager-6f486d5df5-krtxm 1/1 Running 0 53s openobserve-compactor-98ccf664c-v9mkb 1/1 Running 0 53s openobserve-ingester-0 1/1 Running 0 53s openobserve-querier-695cf4fcc9-854z8 1/1 Running 0 53s openobserve-router-65b68b4899-j9hs7 1/1 Running 0 53s [root@master-01 ~/openObserve]# kubectl -n openobserve get pvc NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE data-openobserve-ingester-0 Bound pvc-5d86b642-4464-4b3e-950a-d5e0b4461c27 10Gi RWO nfs-latest 2m47s而后配置一个Ingress指向openobserve-routerapiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: openobserve-ui namespace: openobserve labels: app: openobserve annotations: # kubernetes.io/ingress.class: nginx cert-manager.io/issuer: letsencrypt kubernetes.io/tls-acme: "true" nginx.ingress.kubernetes.io/enable-cors: "true" nginx.ingress.kubernetes.io/connection-proxy-header: keep-alive nginx.ingress.kubernetes.io/proxy-connect-timeout: '600' nginx.ingress.kubernetes.io/proxy-send-timeout: '600' nginx.ingress.kubernetes.io/proxy-read-timeout: '600' nginx.ingress.kubernetes.io/proxy-body-size: 32m spec: ingressClassName: nginx rules: - host: openobserve.test.com http: paths: - path: / pathType: ImplementationSpecific backend: service: name: openobserve-router port: number: 5080添加本地hosts后打开此时是没有任何数据的测试我们手动写入测试数据[root@master-01 ~/openObserve]# curl http://openobserve.test.com/api/linuxea/0820/_json -i -u 'root@example.com:abc123' -d '[{"author":"marksugar","name":"www.linuxea.com"}]' HTTP/1.1 200 OK Date: Sun, 20 Aug 2023 11:02:08 GMT Content-Type: application/json Content-Length: 65 Connection: keep-alive Vary: Accept-Encoding vary: accept-encoding Access-Control-Allow-Origin: * Access-Control-Allow-Credentials: true Access-Control-Allow-Methods: GET, PUT, POST, DELETE, PATCH, OPTIONS Access-Control-Allow-Headers: DNT,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization Access-Control-Max-Age: 1728000 {"code":200,"status":[{"name":"0820","successful":1,"failed":0}]}数据插入同时,在NFS的本地磁盘也会写入[root@Node-172_16_100_49 ~]# cat /data/nfs-share/openobserve/data-openobserve-ingester-0/wal/files/linuxea/logs/0820/0_2023_08_20_13_2c624affe8540b70_7099015230658842624DKMpVA.json {"_timestamp":1692537124314778,"author":"marksugar","name":"www.linuxea.com"}在minio内的数据也进行写入minio中存储的数据无法查看,因为元数据在etcd中。
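To verify that the cluster can serve the record back (the Parquet objects in MinIO are not meant to be read directly), one option is to query the router's search API. The sketch below is based on OpenObserve's documented /api/{org}/_search endpoint and SQL payload, so treat the exact payload shape as an assumption to check against your version; the organization (linuxea), stream (0820), hostname and credentials are the ones used above, and the microsecond time range is only illustrative.

# query the 0820 stream through the router; time range is in microseconds
curl -s -u 'root@example.com:abc123' \
  -H 'Content-Type: application/json' \
  http://openobserve.test.com/api/linuxea/_search \
  -d '{
    "query": {
      "sql": "SELECT * FROM \"0820\" WHERE author = '\''marksugar'\''",
      "start_time": 1692537000000000,
      "end_time": 1692623400000000,
      "from": 0,
      "size": 10
    }
  }'

If the response contains the test document in its hits, both the ingest path (router → ingester → MinIO) and the query path (router → querier) are working.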
August 25, 2023
211 reads
0 comments
0 likes
2023-08-20
linuxea: OpenObserve single-node mode and query syntax
OpenObserve声称可以比Elasticsearch 它⼤约可以节省 140 倍的存储成本,同时由Rust开发的可观测性平台(⽇志、指标、追踪),它可以进行日志搜索,基于sql查询语句和搜索的日志关键字的上下周围数据,高压缩比的存储,身份验证和多租户,支持S3,miniio的高可用和集群,并且兼容elasticsearch的摄取,搜索,聚合api,计划报警和实时报警等功能。如果只是对日志搜索引擎感兴趣,相比于Elasticsearch的和zincSearch,OpenObserve更轻量,它不依赖于数据索引,数据被压缩后存储或者使用parquet列格式存储到对象存储中,尽管如此,在分区和缓存等技术的加持下,速度也不会太慢,并且在聚合查询数据情况下,OpenObserve的速度要比es快的多。OpenObserve的节点是无状态的,因此在水平扩展中,无需担心数据的复制损坏,他的运维工作和成本比Elasticsearch要低得多。并且OpenObserve内置的图像界面,单节点无需使用其他组件,仅仅OpenObserve就完成了存储和查询。同时OpenObserve作为prometheus的远程存储和查看,但是对于查询语句并非全部支持,因此,我们只进行日志收集的处理,其他不进行测试。单节点sled单节点和本地磁盘单节点使用本地磁盘模式,也是默认的方式,对于简单使用和测试在官方数据中,每天可以处理超过2T的数据我们可以简单的理解为sled是存储的元数据的地方,本地磁盘存储的数据。sled单机和对象存储数据存放在对象存储后,高可用的问题交给了对象存储提供商,并且openobseve使用列式存储的方式,并且进行分区,这样的情况下就规避了一定部分因为对象存储与openobseve之间的网络延迟导致的网络问题etcd和对象存储除此之外,进一步来说,元数据也需要进行有个妥善的地方存储,因此,元数据存储的etcd,meta数据存储在本地或者s3来保证数据的安全性单机安装在k8s的用户可以参考官方的deploy文件, 而使用docker-compose我们需要指定三个环境变量,分别是数据的存储目录,用户名和密码 - ZO_DATA_DIR=/data - ZO_ROOT_USER_EMAIL=root@example.com - ZO_ROOT_USER_PASSWORD=Complexpass#123如下version: "2.2" services: openobserve: container_name: openobserve restart: always image: public.ecr.aws/zinclabs/openobserve:latest ports: - "5080:5080" volumes: - /etc/localtime:/etc/localtime:ro # 时区2 - /data/openobserve:/data environment: - ZO_DATA_DIR=/data - ZO_ROOT_USER_EMAIL=root@example.com - ZO_ROOT_USER_PASSWORD=Complexpass#123 logging: driver: "json-file" options: max-size: "100M" mem_limit: 4096m而后我们直接打开映射的5080端口当前的版本中多语言通过机器翻译,因此并不准确loglog支持curl,filebeat,fluentbit,fluentd, vector等,并且这些提供了一定的示例curlcurl在低版本中,我们需要-d指定文件最快,如果你的版本在7.82,那么可以使用--json指定,参考官方文档1.创建一个linuxea的组,组下创建一个名为0819的分区,在0819中写入'[{"author":"marksugar"}]'[root@Node-172_16_100_151 /data/openObserve]# curl http://172.16.100.151:5080/api/linuxea/0819/_json -i -u 'root@example.com:Complexpass#123' -d '[{"author":"marksugar"}]' HTTP/1.1 200 OK content-length: 65 content-type: application/json vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers date: Sat, 19 Aug 2023 07:43:15 GMT {"code":200,"status":[{"name":"0819","successful":1,"failed":0}]}而后创建一组数据,内容如下cat > linuxea.json << LOF [{ "name": "linuxea", "web": "www.linuxea.com", "time": "2023-08-19", "log": "2023-08-18 09:04:01 Info Super Saiyan , this is a normal phenomenon", "info": "this is test" }] LOF接着也添加到linuxea/0819curl http://172.16.100.151:5080/api/linuxea/0819/_json -i -u 'root@example.com:Complexpass#123' --data-binary "@linuxea.json"如:[root@Node-172_16_100_151 /data/openObserve]# cat > linuxea.json << LOF > [{ > "name": "linuxea", > "web": "www.linuxea.com", > "time": "2023-08-19", > "log": "2023-08-18 09:04:01 Info Super Saiyan , this is a normal phenomenon", > "info": "this is test" > }] > LOF [root@Node-172_16_100_151 /data/openObserve]# curl http://172.16.100.151:5080/api/linuxea/0819/_json -i -u 'root@example.com:Complexpass#123' --data-binary "@linuxea.json" HTTP/1.1 200 OK content-length: 65 content-type: application/json vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers date: Sat, 19 Aug 2023 07:46:37 GMT {"code":200,"status":[{"name":"0819","successful":1,"failed":0}]}3, 在linuxea组下创建一个0820的分区写入不同的数据curl http://172.16.100.151:5080/api/linuxea/0820/_json -i -u 'root@example.com:Complexpass#123' -d '[{"author":"marksugar","name":"www.linuxea.com"}]'如下:[root@Node-172_16_100_151 /data/openObserve]# curl http://172.16.100.151:5080/api/linuxea/0820/_json -i -u 'root@example.com:Complexpass#123' -d '[{"author":"marksugar","name":"www.linuxea.com"}]' HTTP/1.1 200 OK content-length: 65 content-type: application/json 
vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers date: Sat, 19 Aug 2023 07:47:45 GMT {"code":200,"status":[{"name":"0820","successful":1,"failed":0}]}接着 ,我们回到界面查看数据的插入情况08190820查询默认情况下,如果将要查询的数据的字段是msg, message, log, logs,可以使用match_all('error'),match_all('error')是说匹配所有的error,但前提是msg, message, log, logs的字段内如果不是msg, message, log, logs的,比如是body,我们可以使用str_match(body, 'error')当然,这也可以使用str_match(log, 'error')来进行查询其他更多用法,参考example-queries我们重组数据进行查询如果此时,以下字段中的log中包含了我们需要的数据,那么我们就可以使用match_allcat > linuxea.json << LOF [{ "name": "linuxea", "web": "www.linuxea.com", "time": "2023-08-19", "log": "2023-08-18 09:04:01 Info Super Saiyan , this is a normal phenomenon linuxea", "info": "this is test", "author":"marksugar", }] LOF比如,我们查询的关键字是linuxea如果我们要查询的字段是marksugar,就不能使用match_all('marksugar'),因为match_all只能在是msg, message, log, logs中默认使用,正确的语法是:str_match(author, 'marksugar'),并且str_match的查询比match_all要快而对于其他的,可以直接使用key:value来, 比如: name='linuxea'对于这条查询,也可以使用sql语句的方式 name='linuxea' 等于 SELECT * FROM "0819" WHERE name='linuxea'参考基于k8s上loggie/vector/openobserve日志收集https://openobserve.ai/docs/example-queries/https://everything.curl.dev/http/post/jsonhttps://openobserve.ai/docs/ingestion/logs/curl/
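As noted above, curl 7.82+ ships a --json option that sets the POST method and JSON headers in one go. Assuming such a curl version (the org, stream, and credentials are the ones from the examples above), the first ingestion call can be shortened to:

curl -i -u 'root@example.com:Complexpass#123' \
  --json '[{"author":"marksugar"}]' \
  http://172.16.100.151:5080/api/linuxea/0819/_json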
August 20, 2023
263 reads
0 comments
0 likes
2023-08-19
linuxea: Log collection on Kubernetes with Loggie, Vector, and OpenObserve
在上次的日志收集组件变化中简单的介绍了新方案,通常要么基于K8s收集容器的标准输出,要么收集文件。我们尝试使用最新的方式进行配置日志收集的组合进行测试,如下:但是,在开始之前,我们需要部署kafka,zookeeper和kowl1.kafka修改kafka的ip地址version: "2" services: zookeeper: container_name: zookeeper image: uhub.service.ucloud.cn/marksugar-k8s/zookeeper:latest container_name: zookeeper restart: always ports: - '2182:2181' environment: - ALLOW_ANONYMOUS_LOGIN=yes logging: driver: "json-file" options: max-size: "100M" mem_limit: 2048m kafka: hostname: 172.16.100.151 image: uhub.service.ucloud.cn/marksugar-k8s/kafka:2.8.1 container_name: kafka user: root restart: always ports: - '9092:9092' volumes: - "/data/log/kafka:/bitnami/kafka" # chmod 777 -R /data/kafka environment: - KAFKA_BROKER_ID=1 - KAFKA_LISTENERS=PLAINTEXT://:9092 - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://172.16.100.151:9092 - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 - ALLOW_PLAINTEXT_LISTENER=yes depends_on: - zookeeper logging: driver: "json-file" options: max-size: "100M" mem_limit: 2048m kowl: container_name: kowl # network_mode: host restart: always # image: quay.io/cloudhut/kowl:v1.5.0 image: uhub.service.ucloud.cn/marksugar-k8s/kowl:v1.5.0 restart: on-failure hostname: kowl ports: - "8081:8080" environment: KAFKA_BROKERS: 172.16.100.151:9092 volumes: - /etc/localtime:/etc/localtime:ro # 时区2 depends_on: - kafka logging: driver: "json-file" options: max-size: "100M" mem_limit: 2048m2.loggie接着参考官网helm-chart下载,而后解压,配置loggie的用例VERSION=v1.4.0 helm pull https://github.com/loggie-io/installation/releases/download/$VERSION/loggie-$VERSION.tgz && tar xvzf loggie-$VERSION.tgz根据官网的配置示例进行修改,而后得到一个如下的latest.yaml,我们关键需要定义资源配额,加速后镜像地址,外部挂载容器的实际目录image: uhub.service.ucloud.cn/marksugar-k8s/loggie:v1.4.0 resources: limits: cpu: 2 memory: 2Gi requests: cpu: 100m memory: 100Mi extraArgs: {} # log.level: debug # log.jsonFormat: true extraVolumeMounts: - mountPath: /var/log/pods name: podlogs - mountPath: /var/lib/docker/containers name: dockercontainers - mountPath: /var/lib/kubelet/pods name: kubelet extraVolumes: - hostPath: path: /var/log/pods type: DirectoryOrCreate name: podlogs - hostPath: # path: /var/lib/docker/containers path: /data/containerd # containerd的实际目录 type: DirectoryOrCreate name: dockercontainers - hostPath: path: /var/lib/kubelet/pods type: DirectoryOrCreate name: kubelet extraEnvs: {} timezone: Asia/Shanghai ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/ nodeSelector: {} ## Affinity for pod assignment ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity affinity: {} # podAntiAffinity: # requiredDuringSchedulingIgnoredDuringExecution: # - labelSelector: # matchExpressions: # - key: app # operator: In # values: # - loggie # topologyKey: "kubernetes.io/hostname" ## Tolerations for pod assignment ## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/ tolerations: [] # - effect: NoExecute # operator: Exists # - effect: NoSchedule # operator: Exists updateStrategy: type: RollingUpdate ## Agent mode, ignored when aggregator.enabled is true config: loggie: reload: enabled: true period: 10s monitor: logger: period: 30s enabled: true listeners: filesource: period: 10s filewatcher: period: 5m reload: period: 10s sink: period: 10s queue: period: 10s pipeline: period: 10s discovery: enabled: true kubernetes: # Choose: docker or containerd containerRuntime: containerd # Collect log files inside the container from the root filesystem of the container, no need to mount the volume rootFsCollectionEnabled: false # Automatically parse and convert 
the wrapped container standard output format into the original log content parseStdout: false # If set to true, it means that the pipeline configuration generated does not contain specific Pod paths and meta information, # and these data will be dynamically obtained by the file source, thereby reducing the number of configuration changes and reloads. dynamicContainerLog: false # Automatically add fields when selector.type is pod in logconfig/clusterlogconfig typePodFields: logconfig: "${_k8s.logconfig}" namespace: "${_k8s.pod.namespace}" nodename: "${_k8s.node.name}" podname: "${_k8s.pod.name}" containername: "${_k8s.pod.container.name}" http: enabled: true port: 9196 ## Aggregator mode, by default is disabled aggregator: enabled: false replicas: 2 config: loggie: reload: enabled: true period: 10s monitor: logger: period: 30s enabled: true listeners: reload: period: 10s sink: period: 10s discovery: enabled: true kubernetes: cluster: aggregator http: enabled: true port: 9196 servicePorts: - name: monitor port: 9196 targetPort: 9196 # - name: gprc # port: 6066 # targetPort: 6066 serviceMonitor: enabled: false ## Scrape interval. If not set, the Prometheus default scrape interval is used. interval: 30s relabelings: [] metricRelabelings: []而后调试并安装helm install loggie -f latest.yaml -nloggie --create-namespace --dry-run ./ helm install loggie -f latest.yaml -nloggie --create-namespace ./默认情况下会以ds的方式进行部署,也就是每个Node节点安装一个。[root@master-01 ~/loggie-io]# kubectl -n loggie get pod NAME READY STATUS RESTARTS AGE loggie-42rcs 1/1 Running 0 15d loggie-56sz8 1/1 Running 0 15d loggie-jnzrc 1/1 Running 0 15d loggie-k5xqj 1/1 Running 0 15d loggie-v84wf 1/1 Running 0 14d2.1 配置收集在配置收集日志之前,我们先创建一个pod,加入此时有一组pod,他的标签是app: linuxea,在kustomize中表现如下:commonLabels: app: linuxea而后开始loggie的配置。在loggie的配置可以大致理解为局部配置和全局配置,如果没有特别的要求,默认的全局配置是够用,倘若不够我们需要局部声明不同的配置信息。1,此时创建一个sink上游是kafka,ip地址是172.16.100.151:9092,我们输入类型,地址,即将创建的topic的名称apiVersion: loggie.io/v1beta1 kind: Sink metadata: name: default-kafka spec: sink: | type: kafka brokers: ["172.16.100.151:9092"] topic: "pod-${fields.environment}-${fields.topic}"but,如果这是一个加密的,你需要配置如下apiVersion: loggie.io/v1beta1 kind: Sink metadata: name: default-kafka spec: sink: | type: kafka brokers: ["172.16.100.151:9092"] topic: "pod-${fields.environment}-${fields.topic}" sasl: type: scram userName: 用户名 password: 密码 algorithm: sha2562,而在LogConfig使用的是标签来关联那些pod的日志将会被收集到,如下 labelSelector: app: linuxea # 对应deployment的标签 标记有app: linuxea标签的pod均被收集3,而这些pod的日志的路径paths是pod中标准输出stdout,如果是文件目录这里应该填写对应的地址和正则匹配4,接着配置一个fields来描述资源,key:value fields: topic: "java-demo" environment: "dev"而这个自定义的描述被sink中的环境变量所提取,既:topic: "pod-${fields.environment}-${fields.topic}"5,在interceptors中我们进行了限流,这意味着每秒最多只能处理 interceptors: | - type: rateLimit qps: 900006,最后使用sinkRef关联创建的sink: sinkRef: default-kafka完整的yaml如下:apiVersion: loggie.io/v1beta1 kind: Sink metadata: name: default-kafka spec: sink: | type: kafka brokers: ["172.16.100.151:9092"] topic: "pod-${fields.environment}-${fields.topic}" --- apiVersion: loggie.io/v1beta1 kind: LogConfig metadata: name: java-demo namespace: linuxea-dev spec: selector: type: pod labelSelector: app: linuxea # 对应deployment的标签 pipeline: sources: | - type: file name: production-java-demo paths: - stdout ignoreOlder: 12h workerCount: 128 fields: topic: "java-demo" environment: "dev" interceptors: | - type: rateLimit qps: 90000 - type: transformer actions: - action: jsonDecode(body) sinkRef: default-kafka interceptorRef: default创建完成[root@master-01 ~/loggie-io]# kubectl -n loggie get sink NAME 
AGE default-kafka 15d [root@master-01 ~/loggie-io]# kubectl -n linuxea-dev get LogConfig NAME POD SELECTOR AGE java-demo {"app":"linuxea"} 15d日志写入后,到kafka查看的日志格式如下:{ "fields":{ "containername":"java-demo" "environment":"dev" "logconfig":"java-demo" "namespace":"linuxea-dev" "nodename":"172.16.100.83" "podname":"production-java-demo-5cf5b97645-4xh89" "topic":"java-demo" } "body":"2023-08-15T22:10:22.773955049+08:00 stdout F 2023-08-15 22:10:22.773 INFO 7 --- [ main] com.example.demo.DemoApplication : Started DemoApplication in 1.492 seconds (JVM running for ..." }3.openobserve我们需要安装openobserve,日志将会被消费到openobserve,安装openobserve在172.16.100.151的节点上version: "2.2" services: openobserve: container_name: openobserve restart: always image: public.ecr.aws/zinclabs/openobserve:latest ports: - "5080:5080" volumes: - /etc/localtime:/etc/localtime:ro # 时区2 - /data/openobserve:/data environment: - ZO_DATA_DIR=/data - ZO_ROOT_USER_EMAIL=root@example.com - ZO_ROOT_USER_PASSWORD=Complexpass#123 logging: driver: "json-file" options: max-size: "100M" mem_limit: 4096m接着我们就可以消费kafka后,将日志用vector写入到172.16.100.151上的openobserve了4.vectorvector作为替代logstash的角色,在此处的作用是消费kafka中的数据此时,我们需要配置在github的vector的releases页面下载安装包,我直接下载的rpmhttps://github.com/vectordotdev/vector/releases/download/v0.31.0/vector-0.31.0-1.x86_64.rpm安装完之后,我们需要创建一个配置文件vector.toml。格式非常简单,如下:mv /etc/vector/vector.toml /etc/vector/vector.toml-bak cat > /etc/vector/vector.toml << EOF [api] enabled = true address = "0.0.0.0:8686" [sources.kafka151] type = "kafka" bootstrap_servers = "172.16.100.151:9092" group_id = "consumer-group-name" topics = [ "pod-dev-java-demo" ] [sources.kafka151.decoding] codec = "json" [sinks.openobserve] type = "http" inputs = [ "kafka151" ] uri = "http://172.16.100.151:5080/api/pod-dev-java-demo/default/_json" method = "post" auth.strategy = "basic" auth.user = "root@example.com" auth.password = "Complexpass#123" compression = "gzip" encoding.codec = "json" encoding.timestamp_format = "rfc3339" healthcheck.enabled = false EOFbut,如果kafka加密了的话,我们需要添加额外的sasl配置[sources.kafka151] type = "kafka" bootstrap_servers = "172.16.100.151:9092" group_id = "consumer-group-name" topics = [ "pod-dev-java-demo" ] sasl.enabled = true sasl.mechanism = "SCRAM-SHA-256" sasl.password = "密码" sasl.username = "用户名" [sources.kafka151.decoding] codec = "json"对于日志的内容的处理,可以借助https://playground.vrl.dev/将上述文件替换到/etc/vector/vector.toml而后,启动systemctl start vector systemctl enable vector注意:uri = "http://172.16.100.151:5080/api/pod-dev-java-demo/default/_json",我们可以理解成http://172.16.100.151:5080/api/[group]/[items]/_json,如果在一个项目组的多个项目,我们可以通过这种方式进行归类回到openobserve查看而后点击explore查看日志回到logs查看5.openobsever搜索此时我的日志字段如下{ "fields":{ "podname":"production-java-demo-5cf5b97645-9ws4w" "topic":"java-demo" "containername":"java-demo" "environment":"dev" "logconfig":"java-demo" "namespace":"linuxea-dev" "nodename":"172.16.100.83" } "body":"2023-08-15T23:19:33.032689346+08:00 stdout F 2023-08-15 23:19:33.032 INFO 7 --- [ main] com.example.demo.DemoApplication : Started DemoApplication in 1.469 seconds (JVM running for ..." }如果我想搜索的内容是body中包含DemoApplication的内容,语法如下str_match(body, 'DemoApplication')默认情况下,只有msg,meesage,logs才会被全局匹配,对于不是这些字段的,我们需要使用str_match,如果匹配的字段是body的,包含DemoApplication的日志,可以使用如下命令str_match(body, 'DemoApplication')现在,一个可以替代传统ELK的日志方案就完成了。
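When the Loggie → Kafka → Vector → OpenObserve chain misbehaves, it helps to check the Kafka leg on its own before digging into Vector. Besides the kowl UI deployed above, a command-line consumer works; the sketch below assumes kcat (formerly kafkacat) is installed on a host that can reach the broker, and uses the broker address and topic name configured in this post.

# print the last five messages on the topic and exit
kcat -b 172.16.100.151:9092 -t pod-dev-java-demo -C -o -5 -e

If nothing comes back, the problem is upstream in Loggie or the LogConfig selector; if messages appear here but never reach OpenObserve, look at the Vector sink instead.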
August 19, 2023
348 reads
0 comments
0 likes
2023-08-11
linuxea: The quiet evolution of log collection
A short history of log collection

Viewing logs and alerting on them are the two core reasons for collecting logs in the first place; typically 99% of log lines are useless unless they feed aggregation or period-over-period analysis. In the traditional ELK stack, both Logstash and Elasticsearch are resource-hungry, and at large scale consuming Kafka data promptly is not an easy job.

Observability

Most applications today are distributed or built as microservices. Microservice architecture lets developers build and ship faster, but as services keep multiplying we understand less and less about how they are actually running. OpenTelemetry is one of the tools meant to address this: as microservices spread, the relationships between a service and its dependencies are exposed in an observable way, so that both development and operations regain visibility into the system. To get there, the system itself must be observable. Observability describes how well you understand what is happening in a system: is it running or has it stopped, do users perceive it getting faster or slower, how do you define KPI and SLA targets, and what is the worst state you are willing to accept. You want to be able to answer these questions, pinpoint the fault, and ideally respond and resolve it before the service is interrupted. In terms of vocabulary, observability breaks down into event logs, distributed traces, and aggregated metrics. With that in mind, some recent moves in the open-source world start to make sense: the Nightingale monitoring project released V6 at the end of July 2023 and pivoted to building an observability platform. Having to track runtime state at all times makes some waste unavoidable, and the cloud vendors sell expensive observability platforms for exactly that: Alibaba Cloud's commercial ARMS and Tencent's commercial counterpart. So although today's topic starts from the evolution of logging, it inevitably ends up tied to event logs (logs), traces, and metrics.

Where ELK started

The earliest ELK collection pipeline looked like the diagram below. Later, Graylog became another common choice. As containers took over, Logstash proved too heavy as a collection agent; Fluentd and Fluent Bit, with their plugin model, are far lighter. For log alerting, the options were essentially Logstash or an Elasticsearch plugin. They all share the same problem, though: if you want to collect only some logs rather than everything, neither Logstash nor Fluentd/Fluent Bit makes it easy; you end up writing filter rules or labels, and the configuration quickly grows to a hundred lines or so.

An interlude in between was Alibaba's open-source log-pilot. log-pilot no longer collected from every pod: you opted pods in by passing environment variables, the same way logs were collected on Alibaba's early platform. It did not last; log-pilot abruptly stopped being updated and received no new features, and in the short term things drifted back to Fluentd. During this period Shimo (石墨) released clickvisual, which swaps Elasticsearch for ClickHouse, so that with the same resources its cluster outperforms an ES cluster, and ClickHouse makes it easy to expire old data with TTLs. clickvisual, however, was an internal product that was open-sourced afterwards; its UI never won a wide audience and its community is fairly quiet, although the maintainers are active.

That was not the end of it. Two years after log-pilot went dormant, Alibaba launched a new open-source project, ilogtail, but ilogtail seems to share log-pilot's fate: the community side always feels a step behind, whether in patches, PR merges, or issue responses, leaving bystanders with the impression that it is still a KPI-driven project. Around the same time, Loggie (loggie-io) quietly appeared. Loggie is NetEase's log collection agent and, like clickvisual, began as an internal product that was later open-sourced. The clickvisual and Loggie maintainers are comparatively active, so they have more users; Loggie successfully filled the gap log-pilot left behind and offers more functionality besides. The topology at this point looks like the diagram below.

Through all of this, the one component that had not been replaced was Logstash, the veteran and most critical link in log processing, with practically every feature you could need. Since Vector from Datadog appeared, though, Logstash has a plausible replacement. Vector is written in Rust and uses far fewer resources than the Java-based Logstash; it can take over Logstash's collection, relay, filtering, and processing duties and can replace it almost entirely.

And the story still is not over: VictoriaLogs is on the way, and on GitHub OpenObserve gathered 6K stars in less than a year. It squares up against Elasticsearch and Kibana because it can replace both at once, and it claims to cut log storage costs by roughly 140x compared with Elasticsearch. It supports logs, metrics, and traces (OpenTelemetry), S3-backed clustering, alerting and querying, SQL and PromQL. Perhaps thanks to its Parquet storage, OpenObserve claims a single node can handle more than 2 TB per day: on a Mac M2 it processes about 31 MB/s, i.e. 1.8 GB per minute, or 2.6 TB per day. The last time numbers were pitched like this was VictoriaMetrics storage versus TimescaleDB. Both OpenObserve and VictoriaMetrics use stateless storage layers; they are not the same thing, but both scale horizontally, and that makes the direction of travel all the more obvious.
August 11, 2023
341 reads
0 comments
0 likes
2023-08-08
linuxea: Debugging log alerts with Vector and Alertmanager
日志告警一直都是一个无法回避的问题,无论是在什么时候,能够掌握程序日志的报错信息是有利于早期发现并定位问题。而在过去,常用手段可以通过logstash的if判断进行正则匹配,或者通过第三方工具读取ES,再或者通过grafan来进行触发而在阿里云或者腾讯云中同样也具备日志过滤,并且自带多级处理。而在传统的ELK中,fluentd也是可以承担这个任务,而在新兴的开源软件中,以上逐渐被慢慢剥离。取而代之的是阿里的ilogtail, 网易的 loggie-io,以及Datadog公司的vector。vector是由rust编写,在处理和消费速度上优于logstash,我将会分享如何通过vector调试vector处理日志关键字触发告警。在logstash上是可以支持重复日志计数和沉默的,而vector只负责过滤和转发,因此alertmanager可以承担这一个功能开始之前,我们需要了解alertmanager是如何接受告警的:alertmanager安装alertmanager提供一个config.yml的示例mkdir /data/alertmanager -p cat > /data/alertmanager/config.yml << EOF global: resolve_timeout: 5m route: group_by: ['alertname', 'instance'] group_wait: 30s group_interval: 5m repeat_interval: 24h receiver: email routes: - receiver: 'webhooke' group_by: ['alertname', 'instance'] group_wait: 30s group_interval: 5m repeat_interval: 24h match: severity: 'critical' - receiver: 'webhookw' group_by: ['alertname', 'instance'] group_wait: 30s group_interval: 5m repeat_interval: 24h match: severity: '~(warning)$' receivers: - name: 'webhookw' webhook_configs: - send_resolved: true url: 'http://webhook-dingtalk:8060/dingtalk/webhookw/send' - name: 'webhooke' webhook_configs: - send_resolved: true url: 'http://webhook-dingtalk:8060/dingtalk/webhooke/send' inhibit_rules: - source_match: alertname: node_host_lost,PodMemoryUsage severity: 'critical' target_match: severity: 'warning' equal: ['ltype'] EOFdocker-composeversion: "2.2" services: kafka: container_name: alertmanager restart: always image: registry.cn-zhangjiakou.aliyuncs.com/marksugar-k8s/alertmanager:v0.24.0 ports: - "9093:9093" volumes: - /etc/localtime:/etc/localtime:ro # 时区2 - /data/alertmanager/config.yml:/etc/alertmanager/config.yml # chmod 777 -R /data/kafka environment: - ALLOW_PLAINTEXT_LISTENER=yes logging: driver: "json-file" options: max-size: "100M" mem_limit: 4096m想要发送到alertmanager,我们需要符合的格式,如下[ { "labels": { "alertname": "name1", "dev": "sda1", "instance": "example3", "severity": "warning" } } ]如下alerts1='[ { "labels": { "alertname": "name1", "dev": "sda1", "instance": "example3", "severity": "warning" } } ]' curl -XPOST -d"$alerts1" http://172.16.100.151:9093/api/v1/alerts返回success[root@master-01 /var/log]# curl -XPOST -d"$alerts1" http://172.16.100.151:9093/api/v1/alerts {"status":"success"}可以在界面查看vectoralertmanager了解之后,我们按照官方的配置拿到如下信息,并且进行调试:配置说明[sources.filetest] : 数据来源[transforms.ftest]: 数据处理[transforms.remap_alert_udev]: ramap数据,相当于此前logstash的grok,比grok功能强大condition = "match!(.message, r'.*WebApplicationContext*.')" 过滤包含WebApplicationContext的关键字的日志而后将日志格式为json,重新组合为alertmanager的数据格式source = """ . = parse_json!(.message) . = [ { "labels": { "alertname": .fields.podname, "namespace": .fields.namespace, "environment": .fields.environment, "podname": .fields.podname, "nodename": .fields.nodename, "topic": .fields.topic, "body": .body, "severity": "critical" } } ] """用于调试打印[sinks.sink0] inputs = ["remap_alert_*"] target = "stdout" type = "console" [sinks.sink0.encoding] codec = "json"用于发送alertmanager[sinks.alertmanager] type = "http" inputs = ["remap_alert_*"] uri = "http://172.16.100.151:9093/api/v1/alerts" compression = "none" encoding.codec = "json" acknowledgements.enabled = truevector.toml最终如下[api] enabled = true address = "0.0.0.0:8686" [sources.filetest] type = "file" include = ["/var/log/test.log"] [transforms.ftest] type = "filter" inputs = ["filetest"] condition = "match!(.message, r'.*WebApplicationContext*.')" [transforms.remap_alert_udev] type = "remap" inputs = ["ftest"] source = """ . = parse_json!(.message) . 
= [ { "labels": { "alertname": .fields.podname, "namespace": .fields.namespace, "environment": .fields.environment, "podname": .fields.podname, "nodename": .fields.nodename, "topic": .fields.topic, "body": .body, "severity": "critical" } } ] """ [sinks.sink0] inputs = ["remap_alert_*"] target = "stdout" type = "console" [sinks.sink0.encoding] codec = "json" [sinks.alertmanager] type = "http" inputs = ["remap_alert_*"] uri = "http://172.16.100.151:9093/api/v1/alerts" compression = "none" encoding.codec = "json" acknowledgements.enabled = true对于其他的日志格式处理,参考https://playground.vrl.dev/启动 vector[root@master-01 ~/vector]# vector -c vector.toml 2023-08-05T06:42:30.336918Z INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=info,rdkafka=info,buffers=info,lapin=info,kube=info" 2023-08-05T06:42:30.337720Z INFO vector::app: Loading configs. paths=["vector.toml"] 2023-08-05T06:42:30.355841Z INFO vector::topology::running: Running healthchecks. 2023-08-05T06:42:30.355886Z INFO vector::topology::builder: Healthcheck passed. 2023-08-05T06:42:30.355907Z INFO vector::topology::builder: Healthcheck passed. 2023-08-05T06:42:30.355930Z INFO vector: Vector has started. debug="false" version="0.31.0" arch="x86_64" revision="0f13b22 2023-07-06 13:52:34.591204470" 2023-08-05T06:42:30.355940Z INFO source{component_kind="source" component_id=filetest component_type=file component_name=filetest}: vector::sources::file: Starting file server. include=["/var/log/test.log"] exclude=[] 2023-08-05T06:42:30.356284Z INFO source{component_kind="source" component_id=filetest component_type=file component_name=filetest}:file_server: file_source::checkpointer: Loaded checkpoint data. 2023-08-05T06:42:30.356411Z INFO source{component_kind="source" component_id=filetest component_type=file component_name=filetest}:file_server: vector::internal_events::file::source: Resuming to watch file. file=/var/log/test.log file_position=4068 2023-08-05T06:42:30.356959Z INFO vector::internal_events::api: API server running. address=0.0.0.0:8686 playground=http://0.0.0.0:8686/playground手动 追加一条信息[root@master-01 ~]# echo '{"body":"2023-08-02T00:18:34.866228161+08:00 stdouts.b.w.embedded.tomcat.TomcatWebServer WebApplicationContext","fields":{"containername":"java-demo","environment":"dev","logconfig":"java-demo","namespace":"linuxea-dev","nodename":"172.16.100.83","podname":"production-java-demo-5cf5b97645-tsmxx","topic":"java-demo"}}' >> /var/log/test.log如果没有问题,这里 将会将日志打印到console,并且会发送到alertmanager[root@master-01 ~/vector]# vector -c vector.toml 2023-08-05T06:42:30.336918Z INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=info,rdkafka=info,buffers=info,lapin=info,kube=info" 2023-08-05T06:42:30.337720Z INFO vector::app: Loading configs. paths=["vector.toml"] 2023-08-05T06:42:30.355841Z INFO vector::topology::running: Running healthchecks. 2023-08-05T06:42:30.355886Z INFO vector::topology::builder: Healthcheck passed. 2023-08-05T06:42:30.355907Z INFO vector::topology::builder: Healthcheck passed. 2023-08-05T06:42:30.355930Z INFO vector: Vector has started. debug="false" version="0.31.0" arch="x86_64" revision="0f13b22 2023-07-06 13:52:34.591204470" 2023-08-05T06:42:30.355940Z INFO source{component_kind="source" component_id=filetest component_type=file component_name=filetest}: vector::sources::file: Starting file server. 
include=["/var/log/test.log"] exclude=[] 2023-08-05T06:42:30.356284Z INFO source{component_kind="source" component_id=filetest component_type=file component_name=filetest}:file_server: file_source::checkpointer: Loaded checkpoint data. 2023-08-05T06:42:30.356411Z INFO source{component_kind="source" component_id=filetest component_type=file component_name=filetest}:file_server: vector::internal_events::file::source: Resuming to watch file. file=/var/log/test.log file_position=4068 2023-08-05T06:42:30.356959Z INFO vector::internal_events::api: API server running. address=0.0.0.0:8686 playground=http://0.0.0.0:8686/playground {"labels":{"alertname":"production-java-demo-5cf5b97645-tsmxx","body":"2023-08-02T00:18:34.866228161+08:00 stdouts.b.w.embedded.tomcat.TomcatWebServer WebApplicationContext","environment":"dev","namespace":"linuxea-dev","nodename":"172.16.100.83","podname":"production-java-demo-5cf5b97645-tsmxx","severity":"critical","topic":"java-demo"}}alertmanager已经收到一个匹配到的日志警报已经被发送到alertmanager,接着你可以用它发往任何地方。
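Besides watching Vector's console sink, the receiving side can be checked directly. The same v1 alerts endpoint used for the manual test above also answers GET requests, so a quick read-back shows whether the alert generated from the log line is active in Alertmanager. This is a sketch: jq is optional and only used for pretty-printing, and newer Alertmanager releases may point you to the v2 API instead.

# list the labels of all alerts currently held by Alertmanager
curl -s http://172.16.100.151:9093/api/v1/alerts | jq '.data[].labels'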
August 8, 2023
293 reads
0 comments
0 likes
2018-08-16
linuxea: Logstash 6 and Filebeat 6 configuration notes
开始配置filebeat,在这之前,你或许需要了解下之前的配置结构[ELK6.3.2安装与配置[跨网络转发思路]](https://www.linuxea.com/1889.html),我又将配置优化了下。仅仅因为我一个目录下有多个nginx日志。配置filebeat之前使用过用一个个日志来做单个的日志过滤,现在使用*.log匹配所有以log结尾的日志在发送到redis中在配置filebeat中将/data/wwwlogs/的所有以.log结尾的文件都会被收集到%{[fields.list_id]的变量名称中,在这个示例中是100_nginx_access,output到redis,key名称则是100_nginx_access,这其中包含error日志[root@linuxea-0702-DTNode01 ~]# cat /etc/filebeat/filebeat.yml filebeat.prospectors: - type: log enabled: true paths: - /data/wwwlogs/*.log fields: list_id: 172_nginx_access exclude_files: - ^access - ^error - \.gz$ filebeat.config.modules: path: ${path.config}/modules.d/*.yml reload.enabled: false setup.template.settings: index.number_of_shards: 3 output.redis: hosts: ["47.90.33.131:6379"] password: "OTdmOWI4ZTM4NTY1M2M4OTZh" db: 2 timeout: 5 key: "%{[fields.list_id]:unknow}"排除文件可以这样exclude_files: ["/var/wwwlogs/error.log"]为了提升性能,redis关闭持久存储save "" #save 900 1 #save 300 10 #save 60 10000 appendonly no aof-rewrite-incremental-fsync nologstash配置文件假如你也是rpm安装的logstash的话,那就巧了,我也是在logstash中修pipeline.workers的线程数和ouput的线程数以及batch.size,线程数可以和内核数量持平,如果是单独运行logstash,可以设置稍大些。配置文件过滤后就是这样[root@linuxea-VM-Node117 /etc/logstash]# cat logstash.yml node.name: node1 path.data: /data/logstash/data #path.config: *.yml log.level: info path.logs: /data/logstash/logs pipeline.workers: 16 pipeline.output.workers: 16 pipeline.batch.size: 10000 pipeline.batch.delay: 10pipelines 配置文件pipelines文件中包含了所有的日志配置文件,也就是管道存放的位置和启动的workers[root@linuxea-VM-Node117 /etc/logstash]# cat pipelines.yml # This file is where you define your pipelines. You can define multiple. # For more information on multiple pipelines, see the documentation: # https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html - pipeline.id: 172_nginx_access pipeline.workers: 1 path.config: "/etc/logstash/conf.d/172_nginx_access.conf" - pipeline.id: 76_nginx_access pipeline.workers: 1 path.config: "/etc/logstash/conf.d/76_nginx_access.conf"jvm.optionsjvm.options配置文件中修改xms的起始大小和最大的大小,视配置而定-Xms4g -Xmx7g文件目录树:[root@linuxea-VM-Node117 /etc/logstash]# tree ./ ./ |-- conf.d | |-- 172_nginx_access.conf | `-- 76_nginx_access.conf |-- GeoLite2-City.mmdb |-- jvm.options |-- log4j2.properties |-- logstash.yml |-- patterns.d | |-- nginx | |-- nginx2 | `-- nginx_error |-- pipelines.yml `-- startup.options 2 directories, 20 filesnginx配置文件在conf.d目录下存放是单个配置文件,他可以存放多个。单个大致这样的input { redis { host => "47.31.21.369" port => "6379" key => "172_nginx_access" data_type => "list" password => "OTdmOM4OTZh" threads => "5" db => "2" } } filter { if [fields][list_id] == "172_nginx_access" { grok { patterns_dir => [ "/etc/logstash/patterns.d/" ] match => { "message" => "%{NGINXACCESS}" } match => { "message" => "%{NGINXACCESS_B}" } match => { "message" => "%{NGINXACCESS_ERROR}" } match => { "message" => "%{NGINXACCESS_ERROR2}" } overwrite => [ "message" ] remove_tag => ["_grokparsefailure"] timeout_millis => "0" } geoip { source => "clent_ip" target => "geoip" database => "/etc/logstash/GeoLite2-City.mmdb" } useragent { source => "User_Agent" target => "userAgent" } urldecode { all_fields => true } mutate { gsub => ["User_Agent","[\"]",""] #将user_agent中的 " 换成空 convert => [ "response","integer" ] convert => [ "body_bytes_sent","integer" ] convert => [ "bytes_sent","integer" ] convert => [ "upstream_response_time","float" ] convert => [ "upstream_status","integer" ] convert => [ "request_time","float" ] convert => [ "port","integer" ] } date { match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ] } } } output { if [fields][list_id] == 
"172_nginx_access" { elasticsearch { hosts => ["10.10.240.113:9200","10.10.240.114:9200"] index => "logstash-172_nginx_access-%{+YYYY.MM.dd}" user => "elastic" password => "dtopsadmin" } } stdout {codec => rubydebug} }其中: match字段的文件位置和在/etc/logstash/patterns.d/ patterns_dir => [ "/etc/logstash/patterns.d/" ] match => { "message" => "%{NGINXACCESS}" } match => { "message" => "%{NGINXACCESS_B}" } match => { "message" => "%{NGINXACCESS_ERROR}" } match => { "message" => "%{NGINXACCESS_ERROR2}" }nginx日志grok字段[root@linuxea-VM-Node117 /etc/logstash]# cat patterns.d/nginx NGUSERNAME [a-zA-Z\.\@\-\+_%]+ NGUSER %{NGUSERNAME} NGINXACCESS %{IP:clent_ip} (?:-|%{USER:ident}) \[%{HTTPDATE:log_date}\] \"%{WORD:http_verb} (?:%{PATH:baseurl}\?%{NOTSPACE:params}(?: HTTP/%{NUMBER:http_version})?|%{DATA:raw_http_request})\" (%{IPORHOST:url_domain}|%{URIHOST:ur_domain}|-)\[(%{BASE16FLOAT:request_time}|-)\] %{NOTSPACE:request_body} %{QS:referrer_rul} %{GREEDYDATA:User_Agent} \[%{GREEDYDATA:ssl_protocol}\] \[(?:%{GREEDYDATA:ssl_cipher}|-)\]\[%{NUMBER:time_duration}\] \[%{NUMBER:http_status_code}\] \[(%{BASE10NUM:upstream_status}|-)\] \[(%{NUMBER:upstream_response_time}|-)\] \[(%{URIHOST:upstream_addr}|-)\] [root@linuxea-VM-Node117 /etc/logstash]# 由于使用了4层,nginx日志被报错在编译时候的日志格式,也做了grok[root@linuxea-VM-Node117 /etc/logstash]# cat patterns.d/nginx2 NGUSERNAME [a-zA-Z\.\@\-\+_%]+ NGUSER %{NGUSERNAME} NGINXACCESS_B %{IPORHOST:clientip} (?:-|(%{WORD}.%{WORD})) (?:-|%{USER:ident}) \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:http_status_code} %{NOTSPACE:request_body} "%{GREEDYDATA:User_Agent}" [root@linuxea-VM-Node117 /etc/logstash]# nginx错误日志的grok[root@linuxea-VM-Node117 /etc/logstash]# cat patterns.d/nginx_error NGUSERNAME [a-zA-Z\.\@\-\+_%]+ NGUSER %{NGUSERNAME} NGINXACCESS_ERROR (?<time>\d{4}/\d{2}/\d{2}\s{1,}\d{2}:\d{2}:\d{2})\s{1,}\[%{DATA:err_severity}\]\s{1,}(%{NUMBER:pid:int}#%{NUMBER}:\s{1,}\*%{NUMBER}|\*%{NUMBER}) %{DATA:err_message}(?:,\s{1,}client:\s{1,}(?<client_ip>%{IP}|%{HOSTNAME}))(?:,\s{1,}server:\s{1,}%{IPORHOST:server})(?:, request: %{QS:request})?(?:, host: %{QS:client_ip})?(?:, referrer: \"%{URI:referrer})? NGINXACCESS_ERROR2 (?<time>\d{4}/\d{2}/\d{2}\s{1,}\d{2}:\d{2}:\d{2})\s{1,}\[%{DATA:err_severity}\]\s{1,}%{GREEDYDATA:err_message} [root@linuxea-VM-Node117 /etc/logstash]#
August 16, 2018
4,982 reads
0 comments
0 likes
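One addition to the Filebeat and Logstash setup above: before reloading either side with a changed pipeline, both tools can validate their own configuration. The commands below assume the RPM layouts used in this post (Filebeat config under /etc/filebeat, the Logstash binary under /usr/share/logstash); adjust the paths if yours differ.

# check the Filebeat configuration syntax
filebeat test config -c /etc/filebeat/filebeat.yml

# parse a single Logstash pipeline file and exit without starting it
/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/172_nginx_access.conf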
2018-08-08
linuxea: Logstash 6.3.2 with Redis + Filebeat examples (part 3)
在之前的一篇中提到使用redis作为转发思路在前面两篇中写的都是elk的安装,这篇叙述在6.3.2中的一些filebeat收集日志和处理的问题,以nginx为例,后面的可能会有,也可能不会有filebeat安装和配置filebeat会将日志发送到reids,在这期间包含几个配置技巧,在配置文件出会有一些说明下载和安装[root@linuxea-VM_Node-113 ~]# wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.3.2-x86_64.rpm -O $PWD/filebeat-6.3.2-x86_64.rpm [root@linuxea-VM_Node_113 ~]# yum localinstall $PWD/filebeat-6.3.2-x86_64.rpm -y启动[root@linuxea-VM_Node-113 /etc/filebeat/modules.d]# systemctl start filebeat.service 查看日志[root@linuxea-VM_Node-113 /etc/filebeat/modules.d]# tail -f /var/log/filebeat/filebeat 2018-08-03T03:13:32.716-0400 INFO pipeline/module.go:81 Beat name: linuxea-VM-Node43_241_158_113.cluster.com 2018-08-03T03:13:32.717-0400 INFO instance/beat.go:315 filebeat start running. 2018-08-03T03:13:32.717-0400 INFO [monitoring] log/log.go:97 Starting metrics logging every 30s 2018-08-03T03:13:32.717-0400 INFO registrar/registrar.go:80 No registry file found under: /var/lib/filebeat/registry. Creating a new registry file. 2018-08-03T03:13:32.745-0400 INFO registrar/registrar.go:117 Loading registrar data from /var/lib/filebeat/registry 2018-08-03T03:13:32.745-0400 INFO registrar/registrar.go:124 States Loaded from registrar: 0 2018-08-03T03:13:32.745-0400 INFO crawler/crawler.go:48 Loading Inputs: 1 2018-08-03T03:13:32.745-0400 INFO crawler/crawler.go:82 Loading and starting Inputs completed. Enabled inputs: 0 2018-08-03T03:13:32.746-0400 INFO cfgfile/reload.go:122 Config reloader started 2018-08-03T03:13:32.746-0400 INFO cfgfile/reload.go:214 Loading of config files completed. 2018-08-03T03:14:02.719-0400 INFO [monitoring] log/log.go:124 Non-zero metrics in the last 30s配置文件在此配中paths下的是写日志的路径,可以使用通配符,但是如果你使用通配符后就意味着目录下的日志写在一个fields的id中,这个id会传到redis中,在传递到logstash中,最终以一个id的形式传递到kibana当然,这里测试用两个来玩,如下filebeat.prospectors: - type: log enabled: true paths: - /data/wwwlogs/1015.log fields: list_id: 113_1015_nginx_access - input_type: log paths: - /data/wwwlogs/1023.log fields: list_id: 113_1023_nginx_access filebeat.config.modules: path: ${path.config}/modules.d/*.yml reload.enabled: false setup.template.settings: index.number_of_shards: 3 output.redis: hosts: ["IP:PORT"] password: "OTdmOWI4ZTM4NTY1M2M4OTZh" db: 2 timeout: 5 key: "%{[fields.list_id]:unknow}"在output中的key: "%{[fields.list_id]:unknow}"意思是如果[fields.list_id]有值就匹配,如果没有就unknow,最终传递给redis中redis安装在我意淫的这套里面,redis用来转发数据的,他可以说集群也可以说单点,取决于数据量的大小按照我以往的骚操作,redis当然要用docker来跑,运行一下命令进行安装curl -Lks4 https://raw.githubusercontent.com/LinuxEA-Mark/docker-alpine-Redis/master/Sentinel/install_redis.sh|bash安装完成在/data/rds下有一个docker-compose.yaml文件,如下:[root@iZ /data/rds]# cat docker-compose.yaml version: '2' services: redis: build: context: https://raw.githubusercontent.com/LinuxEA-Mark/docker-alpine-Redis/master/Sentinel/Dockerfile container_name: redis restart: always network_mode: "host" privileged: true environment: - REQUIREPASSWD=OTdmOWI4ZTM4NTY1M2M4OTZh - MASTERAUTHPAD=OTdmOWI4ZTM4NTY1M2M4OTZh volumes: - /etc/localtime:/etc/localtime:ro - /data/redis-data:/data/redis:Z - /data/logs:/data/logsredis查看写入情况[root@iZ /etc/logstash/conf.d]# redis-cli -h 127.0.0.1 -a OTdmOWI4ZTM4NTY1M2M4OTZh 127.0.0.1:6379> select 2 OK 127.0.0.1:6379[2]> keys * 1) "113_1015_nginx_access" 2) "113_1023_nginx_access" 127.0.0.1:6379[2]> lrange 113_1023_nginx_access 0 -1 1) 
"{\"@timestamp\":\"2018-08-04T04:36:26.075Z\",\"@metadata\":{\"beat\":\"\",\"type\":\"doc\",\"version\":\"6.3.2\"},\"beat\":{\"name\":\"linuxea-VM-Node43_13.cluster.com\",\"hostname\":\"linuxea-VM-Node43_23.cluster.com\",\"version\":\"6.3.2\"},\"host\":{\"name\":\"linuxea-VM-Node43_23.cluster.com\"},\"offset\":863464,\"message\":\"IP - [\xe\xe9\x9797\xb4:0.005 [200] [200] \xe5\x9b4:[0.005] \\\"IP:51023\\\"\",\"source\":\"/data/wwwlogs/1023.log\",\"fields\":{\"list_id\":\"113_1023_nginx_access\"}}"logstash安装和配置logstash在内网进行安装和配置,用来抓取公网redis的数据,抓到本地后发送es,在到看kibana[root@linuxea-VM-Node117 ~]# curl -Lk https://artifacts.elastic.co/downloads/logstash/logstash-6.3.2.tar.gz|tar xz -C /usr/local && useradd elk && cd /usr/local/ && ln -s logstash-6.3.2 logstash && mkdir /data/logstash/{db,logs} -p && chown -R elk.elk /data/logstash/ /usr/local/logstash-6.3.2 && cd logstash/config/ && mv logstash.yml logstash.yml.bak 配置文件在这个配置文件之前下载ip库,在地图中会用到,稍后配置到配置文件准备工作安装GeoLite2-City[root@linuxea-VM-Node117 ~]# curl -Lk http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz|tar xz -C /usr/local/logstash-6.3.2/config/在之前5.5版本也做过nginx的格式化,直接参考groknginx log_format准备log_format upstream2 '$proxy_add_x_forwarded_for $remote_user [$time_local] "$request" $http_host' '[$body_bytes_sent] $request_body "$http_referer" "$http_user_agent" [$ssl_protocol] [$ssl_cipher]' '[$request_time] [$status] [$upstream_status] [$upstream_response_time] [$upstream_addr]';nginx patterns准备,将日志和patterns可以放在kibana grok检查,也可以在grokdebug试试,不过6.3.2的两个结果并不相同[root@linuxea-VM-Node117 /usr/local/logstash-6.3.2/config]# cat patterns.d/nginx NGUSERNAME [a-zA-Z\.\@\-\+_%]+ NGUSER %{NGUSERNAME} NGINXACCESS %{IP:clent_ip} (?:-|%{USER:ident}) \[%{HTTPDATE:log_date}\] \"%{WORD:http_verb} (?:%{PATH:baseurl}\?%{NOTSPACE:params}(?: HTTP/%{NUMBER:http_version})?|%{DATA:raw_http_request})\" (%{IPORHOST:url_domain}|%{URIHOST:ur_domain}|-)\[(%{BASE16FLOAT:request_time}|-)\] %{NOTSPACE:request_body} %{QS:referrer_rul} %{GREEDYDATA:User_Agent} \[%{GREEDYDATA:ssl_protocol}\] \[(?:%{GREEDYDATA:ssl_cipher}|-)\]\[%{NUMBER:time_duration}\] \[%{NUMBER:http_status_code}\] \[(%{BASE10NUM:upstream_status}|-)\] \[(%{NUMBER:upstream_response_time}|-)\] \[(%{URIHOST:upstream_addr}|-)\]配置文件如下:在input中的key写的是reids中的key其中在filebeat的 key是"%{[fields.list_id]:unknow}",这里进行匹配[fields.list_id],在其中表现的是if [fields][list_id] 如果等于113_1015_nginx_access,匹配成功则进行处理grok部分是nginx的patternsgeoip中的database需要指明,source到clent_ip对useragent也进行处理ooutput中需要填写 用户和密码以便于链接到es,当然如果你没有破解或者使用正版,你是不能使用验证的,但是你可以参考x-pack的破解input { redis { host => "47" port => "6379" key => "113_1015_nginx_access" data_type => "list" password => "I4ZTM4NTY1M2M4OTZh" threads => "5" db => "2" } } filter { if [fields][list_id] == "113_1023_nginx_access" { grok { patterns_dir => [ "/usr/local/logstash-6.3.2/config/patterns.d/" ] match => { "message" => "%{NGINXACCESS}" } overwrite => [ "message" ] } geoip { source => "clent_ip" target => "geoip" database => "/usr/local/logstash-6.3.2/config/GeoLite2-City.mmdb" } useragent { source => "User_Agent" target => "userAgent" } urldecode { all_fields => true } mutate { gsub => ["User_Agent","[\"]",""] #将user_agent中的 " 换成空 convert => [ "response","integer" ] convert => [ "body_bytes_sent","integer" ] convert => [ "bytes_sent","integer" ] convert => [ "upstream_response_time","float" ] convert => [ "upstream_status","integer" ] convert => [ "request_time","float" ] convert => [ "port","integer" ] } date { match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ] } } } output { if 
[fields][list_id] == "113_1023_nginx_access" { elasticsearch { hosts => ["10.10.240.113:9200","10.10.240.114:9200"] index => "logstash-113_1023_nginx_access-%{+YYYY.MM.dd}" user => "elastic" password => "linuxea" } } stdout {codec => rubydebug} }json但是也不是很骚,于是这次加上json,像这样log_format json '{"@timestamp":"$time_iso8601",' '"clent_ip":"$proxy_add_x_forwarded_for",' '"user-agent":"$http_user_agent",' '"host":"$server_name",' '"status":"$status",' '"method":"$request_method",' '"domain":"$host",' '"domain2":"$http_host",' '"url":"$request_uri",' '"url2":"$uri",' '"args":"$args",' '"referer":"$http_referer",' '"ssl-type":"$ssl_protocol",' '"ssl-key":"$ssl_cipher",' '"body_bytes_sent":"$body_bytes_sent",' '"request_length":"$request_length",' '"request_body":"$request_body",' '"responsetime":"$request_time",' '"upstreamname":"$upstream_http_name",' '"upstreamaddr":"$upstream_addr",' '"upstreamresptime":"$upstream_response_time",' '"upstreamstatus":"$upstream_status"}';在nginx.conf中添加后,在主机段进行修改,但是这样一来,你日志的可读性就低了。但是,你的lostash性能会提升,因为logstash不会处理grok,直接将收集的日子转发到es这里需要说明的是,我并没有使用json,是因为他不能将useragent处理好,我并没有找到可行的方式,如果你知道,你可以告诉我但是,你可以这样。比如说使用*.log输入所有到redis,一直到kibana,然后通过kibana来做分组显示启动:nohup sudo -u elk /usr/local/logstash-6.3.2/bin/logstash -f ./conf.d/*.yml >./nohup.out 2>&1 &如果不出意外,你会在kibana中看到以logstash-113_1023_nginx_access-%{+YYYY.MM.dd}的索引
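Since Redis only buffers events between Filebeat and Logstash in this design, the length of each list is a handy health signal: if it keeps growing, Logstash is not consuming fast enough. A quick check with the same redis-cli connection details used above (db 2 and the keys created by Filebeat):

redis-cli -h 127.0.0.1 -a OTdmOWI4ZTM4NTY1M2M4OTZh -n 2 llen 113_1015_nginx_access
redis-cli -h 127.0.0.1 -a OTdmOWI4ZTM4NTY1M2M4OTZh -n 2 llen 113_1023_nginx_access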
August 8, 2018
3,442 reads
0 comments
0 likes