Friday, August 16, 2013

A simple tutorial on how to setup Apache flume, HDFS, Oozie and Hive (1)


A simple tutorial on how to setup Apache flume, HDFS, Oozie and Hive (2)

In this tutorial we will use Apache Flume, HDFS, Oozie and Hive to design a data pipeline that will enable us to analyze Twitter data. The control flow of the data pipeline is shown below:



Use Flume to get data from Twitter:

Apache Flume is a data ingestion system for HDFS. A Flume data flow is configured by defining its endpoints, called sources and sinks. Since we use tweets as sample data, each tweet is an event in this tutorial. The source (the Twitter Streaming API) produces events, and the sink writes the events out to a location. Between the source and the sink there is a channel: the source sends data to the sink through the channel.
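
As a rough sketch of that wiring, the snippet below writes a wiring-only properties fragment to a scratch file and prints it. The component names (TwitterAgent, Twitter, MemChannel, HDFS) are the ones that show up in the agent logs in this post; the scratch file is only for illustration and is not your real flume.conf.

```shell
# Write a wiring-only sketch to a scratch file (not the real flume.conf).
wiring="$(mktemp)"
cat > "$wiring" <<'EOF'
# Declare the components of the agent named TwitterAgent.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# A source can feed several channels, so its key is plural.
TwitterAgent.sources.Twitter.channels = MemChannel
# A sink drains exactly one channel, so its key is singular.
TwitterAgent.sinks.HDFS.channel = MemChannel
EOF

# Print the fragment without its comment lines.
grep -v '^#' "$wiring"
```

Note the plural "channels" on the source and the singular "channel" on the sink; mixing them up is an easy mistake to make.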

Before we start configuring the Flume agent, I assume you have CDH4 installed, specifically Hadoop, Flume, Oozie, and Hive.

Install flume:
I installed my Flume agent through Cloudera Manager. If you want to install Flume from packages, you need to install:
  • flume-ng — Everything you need to run Flume
  • flume-ng-agent — Handles starting and stopping the Flume agent as a service
  • flume-ng-doc — Flume documentation

Install Flume, Flume agent and Flume doc on Ubuntu and other Debian systems:
# apt-get install flume-ng
# apt-get install flume-ng-agent
# apt-get install flume-ng-doc


Install Flume, Flume agent and Flume doc on Red Hat-compatible systems:
# yum install flume-ng
# yum install flume-ng-agent
# yum install flume-ng-doc

Configure flume:
1. Download the pre-built custom Flume source JAR from here: flume-sources-1.0-SNAPSHOT.jar
2. Add the JAR to the Flume classpath:
$ sudo cp /etc/flume-ng/conf/flume-env.sh.template /etc/flume-ng/conf/flume-env.sh
Edit the flume-env.sh file and uncomment the FLUME_CLASSPATH line, and enter the path to the JAR. If adding multiple paths, separate them with a colon.
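
A minimal sketch of what the edited line could look like, assuming the JAR was saved in your home directory. The second path is purely hypothetical and is there only to show the colon separator; the join_classpath helper is my own, not part of Flume.

```shell
# Join any number of JAR paths with ':' as FLUME_CLASSPATH expects.
# The function body runs in a subshell so the IFS change does not leak out.
join_classpath() (
  IFS=':'
  printf '%s\n' "$*"
)

# First path: where this post assumes the downloaded JAR lives.
# Second path: hypothetical, only to illustrate multiple entries.
FLUME_CLASSPATH="$(join_classpath \
  "$HOME/flume-sources-1.0-SNAPSHOT.jar" \
  /opt/extra/other-source.jar)"
export FLUME_CLASSPATH
echo "$FLUME_CLASSPATH"
```

In flume-env.sh itself a plain assignment such as FLUME_CLASSPATH="/path/one.jar:/path/two.jar" is enough; the helper only makes the separator explicit.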

Note: if you use Cloudera Manager, you need to make the modification in CM, because Cloudera Manager uses instance services, and /etc/flume/conf is just a symlink to a service.

Add classpath in CM:
"Services" -> "flume1" -> "Configuration" -> "Agent(Default)" -> "Advanced" -> "Java Configuration Options for Flume Agent", add:
--classpath /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar

From the command line (assuming flume-sources-1.0-SNAPSHOT.jar is in your home directory, ~):
$ sudo cp ~/flume-sources-1.0-SNAPSHOT.jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/flume-ng/lib/


3. Set the Flume agent name to TwitterAgent in /etc/default/flume-ng-agent, or in CM, "Services" -> "flume1" -> "Configuration" -> "Agent(Default)" -> "Agent Name".
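
For package installs, this can be sketched as below. I am assuming the variable is called FLUME_AGENT_NAME, as in the CDH4 packaging I have seen; check your own /etc/default/flume-ng-agent before relying on it. The set_agent_name helper is hypothetical, and the example runs against a scratch file because the real one needs root.

```shell
# set_agent_name FILE NAME: leave exactly one FLUME_AGENT_NAME=NAME line in FILE.
set_agent_name() {
  file="$1"
  name="$2"
  touch "$file"
  # Drop any existing assignment (grep exits non-zero if nothing is left).
  grep -v '^FLUME_AGENT_NAME=' "$file" > "$file.tmp" || true
  echo "FLUME_AGENT_NAME=$name" >> "$file.tmp"
  mv "$file.tmp" "$file"
}

# Try it on a scratch copy first.
scratch="$(mktemp)"
echo 'FLUME_AGENT_NAME=agent' > "$scratch"
set_agent_name "$scratch" TwitterAgent
cat "$scratch"
```

The name has to match the prefix used in flume.conf (TwitterAgent), otherwise the agent starts with no sources, channels, or sinks.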

4. Modify the provided Flume configuration and copy it to /etc/flume-ng/conf
Download flume.conf
Note: The relevant information is available on the Details page for your Twitter app. Fill in the consumer key, consumer secret, access token, and access token secret. The keywords parameter accepts a comma-separated list of keywords to use to filter tweets and collect a relevant set of data. If the parameter is not defined, the Twitter Sample API will be used to collect a sample of the entire Twitter Firehose.

For CM, go to "Services" -> "flume1" -> "Configuration" -> "Agent(Default)" and change the "Configuration File" there.

You need to create a Twitter app to have the consumer key, consumer secret, access token, and access token secret. (Twitter dev API)
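
A sketch of how the Twitter source section might look once filled in. The property names (consumerKey, consumerSecret, accessToken, accessTokenSecret, keywords) are my recollection of what the Cloudera TwitterSource reads, so treat them as assumptions and compare them with the flume.conf you downloaded. The YOUR_... values and the keyword list are placeholders.

```shell
# Scratch copy of the Twitter source section; put real values in flume.conf.
source_conf="$(mktemp)"
cat > "$source_conf" <<'EOF'
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = YOUR_CONSUMER_KEY
TwitterAgent.sources.Twitter.consumerSecret = YOUR_CONSUMER_SECRET
TwitterAgent.sources.Twitter.accessToken = YOUR_ACCESS_TOKEN
TwitterAgent.sources.Twitter.accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET
# Comma-separated filter terms; these values are only examples.
TwitterAgent.sources.Twitter.keywords = hadoop, big data, hive
EOF

# Count the lines that still hold a placeholder value.
unfilled() {
  grep -c '= YOUR_' "$1"
}
echo "placeholders left: $(unfilled "$source_conf")"
```

Running a check like unfilled against your real file before restarting the agent is a cheap way to catch a credential you forgot to paste.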

It is important to set the correct "TwitterAgent.sinks.HDFS.hdfs.path" in your flume.conf file, for example "hdfs://your_name_node:8020/user/flume/tweets/%Y/%m/%d/%H/". Otherwise you might get the following error:
HDFS IO error

java.net.ConnectException: Call From hadoop1/192.168.1.11 to hadoop1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:782)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:729)
at org.apache.hadoop.ipc.Client.call(Client.java:1229)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy15.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at com.sun.proxy.$Proxy15.create(Unknown Source)
at ....
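
In the trace above the agent tried hadoop1:8020, while the working log further down shows hadoop2:8020, so it is worth confirming which host the sink path really names. The sketch below pulls the host and port out of an hdfs:// URI. The namenode_of helper is hypothetical, and the core-site.xml location is an assumption about where your client configuration lives.

```shell
# Print "host port" for the NameNode named in an hdfs:// URI.
namenode_of() {
  printf '%s\n' "$1" | sed -n 's|^hdfs://\([^:/]*\):\([0-9][0-9]*\)/.*$|\1 \2|p'
}

sink_path='hdfs://hadoop2:8020/user/flume/tweets/%Y/%m/%d/%H/'
namenode_of "$sink_path"

# On a cluster node, compare with the default file system Hadoop is configured for.
# /etc/hadoop/conf is an assumed location; adjust it for your installation.
if [ -r /etc/hadoop/conf/core-site.xml ]; then
  grep -A1 'fs.defaultFS' /etc/hadoop/conf/core-site.xml
fi
```

If the host printed by namenode_of differs from the one in fs.defaultFS, "Connection refused" is the likely result.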


6. Restart your flume agent.

7. Check your Flume agent log file; you should see messages like the following:

11:03:28.317 AM  INFO  org.apache.flume.sink.hdfs.BucketWriter  
Creating hdfs://hadoop2:8020/user/flume/tweets/2013/08/16/11//FlumeData.1376665408231.tmp
11:03:59.782 AM INFO org.apache.flume.sink.hdfs.BucketWriter 
Renaming hdfs://hadoop2:8020/user/flume/tweets/2013/08/16/11/FlumeData.1376665408231.tmp to hdfs://hadoop2:8020/user/flume/tweets/2013/08/16/11/FlumeData.1376665408231
11:04:00.367 AM INFO org.apache.flume.sink.hdfs.BucketWriter 
Creating hdfs://hadoop2:8020/user/flume/tweets/2013/08/16/11//FlumeData.1376665408232.tmp
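
The same thing can be checked from the command line. This sketch builds the directory for the current hour from the %Y/%m/%d/%H pattern used in the sink path and lists it if the hadoop command is available. It assumes your shell and the Flume agent use the same time zone; tweets_dir is a hypothetical helper.

```shell
# Directory the sink should be writing to for the current hour.
tweets_dir() {
  echo "/user/flume/tweets/$(date +%Y/%m/%d/%H)"
}

echo "expecting new files under $(tweets_dir)"

# Only attempt the listing on a machine that has the Hadoop client installed.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -ls "$(tweets_dir)"
fi
```

Files ending in .tmp are still being written; they are renamed once the sink rolls them, as the log lines above show.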


8. Also, in the Hue web UI, you will see the data coming in:

Note: If you are getting the following twitter error:

1:08:39.817 AM ERROR org.apache.flume.lifecycle.LifecycleSupervisor
Unable to start EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} } - Exception follows.
java.lang.NoSuchMethodError: twitter4j.FilterQuery.setIncludeEntities(Z)Ltwitter4j/FilterQuery;
at com.cloudera.flume.source.TwitterSource.start(TwitterSource.java:139)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
1:08:39.826 AM WARN org.apache.flume.lifecycle.LifecycleSupervisor
Component EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:STOP} } stopped, since it could not besuccessfully started due to missing dependencies

You should check the version of your twitter4j-*.jar files, such as "twitter4j-core-3.0.3.jar", "twitter4j-media-support-3.0.3.jar", and "twitter4j-stream-3.0.3.jar". The error comes from the twitter4j JAR files; if 3.x doesn't work for you, try a lower version of twitter4j, such as 2.6.6.
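
A NoSuchMethodError generally means the class loaded at runtime is not the version the calling code was compiled against, so a first step is to see which twitter4j versions sit side by side. The sketch below lists the distinct versions found in a directory. Both helper functions are hypothetical, and the demonstration uses empty files in a scratch directory with deliberately mixed version numbers.

```shell
# Version string from a JAR file name, e.g. twitter4j-core-3.0.3.jar -> 3.0.3.
jar_version() {
  name="$(basename "$1" .jar)"
  echo "${name##*-}"
}

# Distinct twitter4j versions found in a directory, one per line.
twitter4j_versions() {
  for jar in "$1"/twitter4j-*.jar; do
    # Skip the unexpanded pattern when nothing matches.
    [ -e "$jar" ] || continue
    jar_version "$jar"
  done | sort -u
}

# Demonstration on a scratch directory with deliberately mixed versions.
lib="$(mktemp -d)"
touch "$lib/twitter4j-core-3.0.3.jar" "$lib/twitter4j-stream-2.6.6.jar"
twitter4j_versions "$lib"
```

Point twitter4j_versions at your Flume lib directory (for the parcel install in this post, /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/flume-ng/lib). More than one line of output means mixed versions are on the classpath.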

14 comments:

orakanggo said...

Hi, I always get the error below:

2013-12-23 16:07:36,537 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} } - Exception follows.
java.lang.NoSuchMethodError: twitter4j.FilterQuery.setIncludeEntities(Z)Ltwitter4j/FilterQuery;
at com.cloudera.flume.source.TwitterSource.start(TwitterSource.java:139)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


please help

Tony Xu said...

Are you using Cloudera Manager + CDH? Also, can you confirm your CM and CDH versions (mine are Cloudera Standard 4.6.3 and CDH 4.3.1)? What's the version of your JDK? I am using jdk1.7.x.

Another thing you might want to check is that FilterQuery.class exists in flume-sources-1.0-SNAPSHOT.jar.

You can use the following command:
/usr/lib/jvm/jdk1.7.0/bin# ./jar tvf /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar | grep FilterQuery
4451 Tue Nov 13 10:06:42 EST 2012 twitter4j/FilterQuery.class

stephane elias said...

Hello, I am on Cloudera. And I use CM. I followed your tutorial and this is my flume log:

1:08:39.720 AM INFO org.apache.flume.node.Application

Starting Sink HDFS

1:08:39.721 AM INFO org.apache.flume.node.Application

Starting Source Twitter

1:08:39.725 AM INFO org.apache.flume.instrumentation.MonitoredCounterGroup

Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.

1:08:39.725 AM INFO org.apache.flume.instrumentation.MonitoredCounterGroup

Component type: SINK, name: HDFS started

1:08:39.817 AM ERROR org.apache.flume.lifecycle.LifecycleSupervisor

Unable to start EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} } - Exception follows.
java.lang.NoSuchMethodError: twitter4j.FilterQuery.setIncludeEntities(Z)Ltwitter4j/FilterQuery;
at com.cloudera.flume.source.TwitterSource.start(TwitterSource.java:139)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

1:08:39.826 AM WARN org.apache.flume.lifecycle.LifecycleSupervisor

Component EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:STOP} } stopped, since it could not besuccessfully started due to missing dependencies



Do you have any clue on why I get the error?
Ps: I have CDH4

stephane elias said...

Hello, I have CDH 4 and I use Cloudera Manager. I followed the steps but here is my flume log:
1:08:39.720 AM INFO org.apache.flume.node.Application

Starting Sink HDFS

1:08:39.721 AM INFO org.apache.flume.node.Application

Starting Source Twitter

1:08:39.725 AM INFO org.apache.flume.instrumentation.MonitoredCounterGroup

Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.

1:08:39.725 AM INFO org.apache.flume.instrumentation.MonitoredCounterGroup

Component type: SINK, name: HDFS started

1:08:39.817 AM ERROR org.apache.flume.lifecycle.LifecycleSupervisor

Unable to start EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} } - Exception follows.
java.lang.NoSuchMethodError: twitter4j.FilterQuery.setIncludeEntities(Z)Ltwitter4j/FilterQuery;
at com.cloudera.flume.source.TwitterSource.start(TwitterSource.java:139)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

1:08:39.826 AM WARN org.apache.flume.lifecycle.LifecycleSupervisor

Component EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:STOP} } stopped, since it could not besuccessfully started due to missing dependencies

do you have any clue on how to solve this?

Tony Xu said...

Hi Stephane, can you check the version of your twitter4j jar files? Looks like the conflicts are from twitter4j*.jar files. Are you using 3.X version of twitter4j files?

vishal singh said...

Hi Tony, I am using CDH4 4.6. I followed the steps in your blog and I am facing the following error:
Unhandled error
java.lang.NoSuchMethodError: twitter4j.conf.Configuration.isStallWarningsEnabled()Z
at twitter4j.TwitterStreamImpl.<init>(TwitterStreamImpl.java:60)
at twitter4j.TwitterStreamFactory.<init>(TwitterStreamFactory.java:40)
at com.cloudera.flume.source.TwitterSource.<init>(TwitterSource.java:64)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at java.lang.Class.newInstance0(Class.java:355)
at java.lang.Class.newInstance(Class.java:308)
at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:42)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:327)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

vishal singh said...

Hi Tony,
I am using CDH4 4.6 and trying to follow your method to configure Flume with the help of Cloudera Manager. I am getting the following error.
.810 PM INFO org.apache.flume.node.PollingPropertiesFileConfigurationProvider
Configuration provider starting
9:46:17.937 PM INFO org.apache.flume.node.PollingPropertiesFileConfigurationProvider
Reloading configuration file:/var/run/cloudera-scm-agent/process/154-flume-AGENT/flume.conf
9:46:17.954 PM INFO org.apache.flume.conf.FlumeConfiguration
Processing:HDFS
9:46:17.956 PM INFO org.apache.flume.conf.FlumeConfiguration
Processing:HDFS
9:46:17.957 PM INFO org.apache.flume.conf.FlumeConfiguration
Processing:HDFS
9:46:17.957 PM INFO org.apache.flume.conf.FlumeConfiguration
Added sinks: HDFS Agent: TwitterAgent
9:46:17.957 PM INFO org.apache.flume.conf.FlumeConfiguration
Processing:HDFS
9:46:17.957 PM INFO org.apache.flume.conf.FlumeConfiguration
Processing:HDFS
9:46:17.957 PM INFO org.apache.flume.conf.FlumeConfiguration
Processing:HDFS
9:46:17.957 PM INFO org.apache.flume.conf.FlumeConfiguration
Processing:HDFS
9:46:17.957 PM INFO org.apache.flume.conf.FlumeConfiguration
Processing:HDFS
9:46:22.379 PM INFO org.apache.flume.conf.FlumeConfiguration
Post-validation flume configuration contains configuration for agents: [TwitterAgent]
9:46:22.380 PM INFO org.apache.flume.node.AbstractConfigurationProvider
Creating channels
9:46:22.412 PM INFO org.apache.flume.channel.DefaultChannelFactory
Creating instance of channel MemChannel type memory
9:46:22.436 PM INFO org.apache.flume.node.AbstractConfigurationProvider
Created channel MemChannel
9:46:22.437 PM INFO org.apache.flume.source.DefaultSourceFactory
Creating instance of source Twitter, type com.cloudera.flume.source.TwitterSource
9:46:25.338 PM ERROR org.apache.flume.node.PollingPropertiesFileConfigurationProvider
Unhandled error
java.lang.NoSuchMethodError: twitter4j.conf.Configuration.isStallWarningsEnabled()Z
at twitter4j.TwitterStreamImpl.<init>(TwitterStreamImpl.java:60)
at twitter4j.TwitterStreamFactory.<init>(TwitterStreamFactory.java:40)
at com.cloudera.flume.source.TwitterSource.<init>(TwitterSource.java:64)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at java.lang.Class.newInstance0(Class.java:355)
at java.lang.Class.newInstance(Class.java:308)
at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:42)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:327)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Tony Xu said...

Hi Vishal, have you checked your twitter jar file version? Did you try a newer version of twitter jar file?

Kumar Vaibhav said...

Dear Vishal,

I was also getting same error

Please follow this link:
http://ambracode.com/index/show/97774

Khushboo Tiwari said...

from where do i get a newer version of twitter jar file ?

Tony Xu said...

A quick google returns : http://twitter4j.org/en/