Installing Fluentd

Fluentd is a tool for piping web log streams into HDFS. Although it is a third-party tool outside the Hadoop ecosystem, it is easier to use than Flume, the ecosystem's own option.

Reference documentation: http://docs.fluentd.org/articles/quickstart

1. Download and install

> curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh
> /etc/init.d/td-agent start

2. Enable WebHDFS

For the WebHDFS REST API reference, see http://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

> cd /home/bigdata/hadoop-2.6.4
> vi etc/hadoop/hdfs-site.xml

Add the following configuration:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
<property>
  <name>dfs.support.broken.append</name>
  <value>true</value>
</property>
> sbin/stop-dfs.sh
> sbin/start-dfs.sh
> bin/hdfs dfs -chmod 777 /tmp        #otherwise uploads will fail with permission errors
> curl -i  "http://localhost:50070/webhdfs/v1/?op=LISTSTATUS"        #verify that WebHDFS is working
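The curl test above works because WebHDFS maps HDFS operations onto plain HTTP requests of the form `http://<namenode>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. As a quick illustration, here is a small Python sketch (a hypothetical helper, not part of Hadoop or Fluentd) that builds such URLs; `APPEND` is the kind of operation the webhdfs output plugin relies on, which is why `dfs.support.append` is enabled above:

```python
# Hypothetical helper for building WebHDFS REST URLs; illustrates the
# URL scheme only, it does not issue any HTTP requests itself.
from urllib.parse import urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS URL for the given HDFS path and operation."""
    query = urlencode({"op": op, **params})
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, path, query)


# The same request issued by the curl test above:
print(webhdfs_url("localhost", 50070, "/", "LISTSTATUS"))
# → http://localhost:50070/webhdfs/v1/?op=LISTSTATUS

# Appending to the log file that Fluentd will write in step 3:
print(webhdfs_url("localhost", 50070, "/tmp/access.log", "APPEND"))
# → http://localhost:50070/webhdfs/v1/tmp/access.log?op=APPEND
```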

3. Configure Fluentd to ship the nginx log to HDFS

> vi /etc/td-agent/td-agent.conf

Add the following configuration:

<source>
  type tail
  format /^(?<remote_addr>[^ ]*) - - \[(?<time_local>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<status>[^ ]*) (?<size>[^ ]*) "(?<refer>[^ ]*)" "(?<agent>[^\"]*)" "(?<forward>[^ ]*)"$/
  path /var/log/nginx/access.log
  tag nginx.access
</source>
<match nginx.access>
  type webhdfs
  host localhost
  port 50070
  path /tmp/access.log
  flush_interval 10s
</match>
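Before restarting td-agent it is worth checking that the `format` regex actually matches your log lines. Below is a minimal sanity check in Python, using the same pattern as the config rewritten with Python's `(?P<name>...)` named-group syntax; the sample log line is made up for illustration:

```python
# Sanity-check the nginx access-log regex from the Fluentd config.
import re

NGINX_FORMAT = re.compile(
    r'^(?P<remote_addr>[^ ]*) - - \[(?P<time_local>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^"]*?)(?: +\S*)?)?" '
    r'(?P<status>[^ ]*) (?P<size>[^ ]*) "(?P<refer>[^ ]*)" '
    r'"(?P<agent>[^"]*)" "(?P<forward>[^ ]*)"$'
)

# A made-up sample line, in nginx's combined format plus a trailing
# X-Forwarded-For field (matching the "forward" group):
line = ('127.0.0.1 - - [16/Aug/2016:10:00:00 +0800] '
        '"GET /index.html HTTP/1.1" 200 612 "-" "curl/7.29.0" "-"')

m = NGINX_FORMAT.match(line)
assert m is not None
print(m.group("remote_addr"), m.group("method"),
      m.group("path"), m.group("status"))
# → 127.0.0.1 GET /index.html 200
```

If the match comes back `None` for your real log lines, adjust the regex in `td-agent.conf` before going further; a mismatched format silently drops records.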
> chmod 777 /var/log/nginx/access.log    #give Fluentd read access to the log
> /etc/init.d/td-agent restart    #then hit http://{serverip}/ so that nginx generates some access log entries
> tail -f /var/log/td-agent/td-agent.log    #watch Fluentd's own service log to confirm it is working
> cd /home/bigdata/hadoop-2.6.4
> bin/hdfs dfs -tail /tmp/access.log    #confirm the nginx log has been pushed to HDFS; you can also browse the file at http://{serverip}:50070