Installing Fluentd
Fluentd is a tool for streaming web server logs into HDFS. Although it is a third-party project outside the Hadoop ecosystem, it is easier to use than Flume, the ecosystem's own log collector.
Reference: http://docs.fluentd.org/articles/quickstart
1. Download and install
> curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh
> /etc/init.d/td-agent start
2. Enable WebHDFS
The WebHDFS REST API is documented at http://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
> cd /home/bigdata/hadoop-2.6.4
> vi etc/hadoop/hdfs-site.xml
Add the following configuration:
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
<property>
  <name>dfs.support.broken.append</name>
  <value>true</value>
</property>
> sbin/stop-dfs.sh
> sbin/start-dfs.sh
> bin/hdfs dfs -chmod 777 /tmp #otherwise uploads will fail with permission errors
> curl -i "http://localhost:50070/webhdfs/v1/?op=LISTSTATUS" #verify that WebHDFS is working
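Before wiring up Fluentd, it helps to see what the plugin will be doing over this API. A minimal sketch, assuming the namenode answers on localhost:50070; the `webhdfs_url` helper is hypothetical, just for composing request URLs:

```shell
# Hypothetical helper for composing WebHDFS request URLs
NN="localhost:50070"
webhdfs_url() {
  # $1 = HDFS path, $2 = operation (plus any extra query parameters)
  echo "http://${NN}/webhdfs/v1${1}?op=${2}"
}

webhdfs_url /tmp LISTSTATUS   # → http://localhost:50070/webhdfs/v1/tmp?op=LISTSTATUS

# Against a live cluster, CREATE is a two-step protocol: the namenode
# replies with a 307 redirect to a datanode, and curl -L follows the
# redirect with the file body (uncomment to run):
# curl -i -L -X PUT -T local.txt "$(webhdfs_url /tmp/local.txt 'CREATE&overwrite=true')"
# curl -i -L -X POST -T more.txt "$(webhdfs_url /tmp/local.txt APPEND)"
```

The Fluentd webhdfs output plugin issues exactly these kinds of CREATE/APPEND calls under the hood, which is why `dfs.support.append` is enabled above.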
3. Configure Fluentd to push nginx logs to HDFS
> vi /etc/td-agent/td-agent.conf
Configure as follows:
<source>
  type tail
  format /^(?<remote_addr>[^ ]*) - - \[(?<time_local>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<status>[^ ]*) (?<size>[^ ]*) "(?<refer>[^ ]*)" "(?<agent>[^\"]*)" "(?<forward>[^ ]*)"$/
  path /var/log/nginx/access.log
  tag nginx.access
</source>
<match nginx.access>
  type webhdfs
  host localhost
  port 50070
  path /tmp/access.log
  flush_interval 10s
</match>
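The tail source's format regex is easy to get subtly wrong, so it is worth checking it against a real access-log line before starting the agent. A quick local test, assuming GNU grep with PCRE support (`grep -P`); the sample line is fabricated:

```shell
# A fabricated line in nginx's default combined-style log format
line='192.168.1.10 - - [12/Mar/2016:10:30:00 +0800] "GET /index.html HTTP/1.1" 200 612 "-" "curl/7.29.0" "-"'

# The same pattern the tail source will apply (PCRE named groups need grep -P)
echo "$line" | grep -qP '^(?<remote_addr>[^ ]*) - - \[(?<time_local>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^"]*?)(?: +\S*)?)?" (?<status>[^ ]*) (?<size>[^ ]*) "(?<refer>[^ ]*)" "(?<agent>[^"]*)" "(?<forward>[^ ]*)"$' \
  && echo "regex matches" \
  || echo "regex does NOT match"
```

With the sample line above this prints `regex matches`; if it prints `regex does NOT match` for one of your own log lines, adjust the pattern before touching td-agent.conf.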
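With a fixed path, everything lands in one ever-growing /tmp/access.log. The webhdfs output plugin also accepts strftime placeholders in path, so a time-partitioned layout is a small change. A sketch; the exact placeholder support is an assumption, so check the fluent-plugin-webhdfs README for your version:

```
<match nginx.access>
  type webhdfs
  host localhost
  port 50070
  path /tmp/access.log.%Y%m%d_%H.log
  flush_interval 10s
</match>
```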
> chmod 777 /var/log/nginx/access.log #give Fluentd read access to the log
> /etc/init.d/td-agent restart #then visit http://{serverip}/ to make nginx generate access-log entries
> tail -f /var/log/td-agent/td-agent.log #watch Fluentd's own service log to confirm it is working
> cd /home/bigdata/hadoop-2.6.4
> bin/hdfs dfs -tail /tmp/access.log #check that the nginx log has been pushed to HDFS; you can also browse the file in the web UI at http://{serverip}:50070