Provide hooks for custom processing #1003

Open
janhoy opened this issue Sep 4, 2020 · 0 comments · May be fixed by #1004
Labels
feature_request for feature request


janhoy commented Sep 4, 2020

Is your feature request related to a problem? Please describe.

I want to use FSCrawler for more complex processing before documents are indexed into Elasticsearch (ES).

Describe the solution you'd like

Currently the "pipeline" is hardcoded in Java:

find new file -> OCR/parse with Tika -> index in ES

Instead, provide a ProcessingPipeline plugin that users can replace with their own implementation, e.g. MyCustomPipeline. This pipeline plugin would have a simple interface and ship with a default implementation that works exactly like today's hardcoded pipeline:

public interface ProcessingPipeline {
    public SomeContext processFile(SomeContext ctx);
}
public class DefaultProcessingPipeline implements ProcessingPipeline {
    @Override
    public SomeContext processFile(SomeContext ctx) {
        // Default implementation goes here, i.e. the same steps as today's hardcoded pipeline
        ctx.setBodyText(parseTika(ctx));
        ctx.setEsDoc(createElasticDoc(ctx));
        return ctx;
    }

    protected SomeContext createElasticDoc(SomeContext ctx) ...
}

Users can then provide their own processing logic by extending the default and overriding only what they need:

public class CustomProcessingPipeline extends DefaultProcessingPipeline {
    @Override
    public SomeContext processFile(SomeContext ctx) {
        // Custom implementation: override what you need
        return ctx;
    }
}
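
To make the idea more concrete, here is a minimal, self-contained sketch of how such a plugin could hang together. Everything beyond the ProcessingPipeline, DefaultProcessingPipeline and CustomProcessingPipeline names is an assumption for illustration only: the fields of SomeContext, the parseText placeholder standing in for the Tika step, the word_count field, and the load helper that instantiates a pipeline from a class name (as a settings entry might supply it). None of this is existing FSCrawler API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PipelineSketch {

    /** Hypothetical stand-in for SomeContext: carries one file through the pipeline. */
    static class SomeContext {
        final String fileName;
        final String rawContent;
        String bodyText;
        Map<String, Object> esDoc;

        SomeContext(String fileName, String rawContent) {
            this.fileName = fileName;
            this.rawContent = rawContent;
        }
    }

    interface ProcessingPipeline {
        SomeContext processFile(SomeContext ctx);
    }

    static class DefaultProcessingPipeline implements ProcessingPipeline {
        @Override
        public SomeContext processFile(SomeContext ctx) {
            ctx.bodyText = parseText(ctx);
            ctx.esDoc = createElasticDoc(ctx);
            return ctx;
        }

        // Placeholder for the Tika parsing step (assumption): only trims the raw content.
        protected String parseText(SomeContext ctx) {
            return ctx.rawContent.trim();
        }

        // Builds the document that would be sent to Elasticsearch.
        protected Map<String, Object> createElasticDoc(SomeContext ctx) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("file", ctx.fileName);
            doc.put("content", ctx.bodyText);
            return doc;
        }
    }

    /** Custom pipeline: reuses the default steps and only enriches the document. */
    static class CustomProcessingPipeline extends DefaultProcessingPipeline {
        @Override
        protected Map<String, Object> createElasticDoc(SomeContext ctx) {
            Map<String, Object> doc = super.createElasticDoc(ctx);
            int words = ctx.bodyText.isEmpty() ? 0 : ctx.bodyText.split("\\s+").length;
            doc.put("word_count", words);
            return doc;
        }
    }

    /** Hypothetical loader: instantiates a pipeline from a configured class name. */
    static ProcessingPipeline load(String className) throws ReflectiveOperationException {
        return Class.forName(className)
                .asSubclass(ProcessingPipeline.class)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        ProcessingPipeline pipeline = load(CustomProcessingPipeline.class.getName());
        SomeContext ctx = pipeline.processFile(
                new SomeContext("notes.txt", "  hello custom pipeline  "));
        System.out.println(ctx.esDoc);
        // prints {file=notes.txt, content=hello custom pipeline, word_count=3}
    }
}
```

In this sketch the custom class overrides a single protected step rather than all of processFile, so the default flow stays in one place. Loading by class name is just one possible way to let users swap in their implementation without forking.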

Describe alternatives you've considered

Forking the project, or building something from scratch.

janhoy added the feature_request for feature request label Sep 4, 2020
janhoy added a commit to cominvent/fscrawler that referenced this issue Sep 4, 2020