This repository has been archived by the owner on Jul 10, 2019. It is now read-only.

switch to new Hadoop API #18

Open
jnioche opened this issue Mar 30, 2011 · 3 comments

Comments

@jnioche
Member

jnioche commented Mar 30, 2011

No description provided.

@butlermh

See https://github.com/butlermh/behemoth/commit/7411aa9cbd0fd1bddd61545a9a503daff5d8dcf8

It turns out that updating to the new API is a bad idea: DistributedCache does not work with the new API, which breaks the SOLR, UIMA and GATE modules. See:
https://issues.apache.org/jira/browse/MAPREDUCE-898
http://lucene.472066.n3.nabble.com/Distributed-Cache-with-New-API-td722187.html

Also, for WARC in the IO module, I am not quite sure how to deal with MultiFileSplit. It has been replaced by CombineFileSplit, but that still implements the old InputSplit interface, whereas the new API uses a class called InputSplit. So it is not clear how this needs to change either.
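
To make the interface-versus-class mismatch concrete, here is a rough sketch of one possible way around it: an adapter that extends the class-style split and delegates to an interface-style one. Every type below (`OldStyleSplit`, `NewStyleSplit`, `CombinedSplit`, `OldToNewAdapter`) is a hypothetical stand-in written for illustration, not a real Hadoop class, and the checked exceptions that Hadoop's own split methods declare are left out. Whether this approach would actually work for the WARC input format is an open assumption.

```java
import java.util.Arrays;

// Hypothetical stand-ins only: none of these are the real Hadoop types.
public class SplitAdapterSketch {

    /** Stand-in for the old-API shape, where InputSplit is an interface. */
    interface OldStyleSplit {
        long getLength();
        String[] getLocations();
    }

    /** Stand-in for the new-API shape, where InputSplit is a class to extend. */
    static abstract class NewStyleSplit {
        public abstract long getLength();
        public abstract String[] getLocations();
    }

    /** Stand-in for a combined split that only implements the old-style interface. */
    static class CombinedSplit implements OldStyleSplit {
        private final long[] lengths;
        private final String[] hosts;

        CombinedSplit(long[] lengths, String[] hosts) {
            this.lengths = lengths.clone();
            this.hosts = hosts.clone();
        }

        @Override
        public long getLength() {
            // Total size is the sum of the sizes of the combined files.
            long total = 0L;
            for (long length : lengths) {
                total += length;
            }
            return total;
        }

        @Override
        public String[] getLocations() {
            return hosts.clone();
        }
    }

    /** Adapter: satisfies the class-based type by delegating to the interface-based one. */
    static class OldToNewAdapter extends NewStyleSplit {
        private final OldStyleSplit delegate;

        OldToNewAdapter(OldStyleSplit delegate) {
            this.delegate = delegate;
        }

        @Override
        public long getLength() {
            return delegate.getLength();
        }

        @Override
        public String[] getLocations() {
            return delegate.getLocations();
        }
    }

    public static void main(String[] args) {
        NewStyleSplit split = new OldToNewAdapter(
                new CombinedSplit(new long[] {100L, 200L}, new String[] {"host-a"}));
        System.out.println(split.getLength());
        System.out.println(Arrays.toString(split.getLocations()));
    }
}
```

The point of the sketch is only that a type which implements the interface cannot be passed where the class is expected, so some wrapper or a rewrite of the split is needed.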

@butlermh

butlermh commented Jun 6, 2011

In the end, I did manage to find a way of doing this, except for WARC - see https://github.com/butlermh/behemoth/commit/97150bd579ae74eefacae85422937698f2c72445
