
[BitSail][Connector] Support ElasticSearch Source connector. #146

Open
BlockLiu opened this issue Nov 9, 2022 · 6 comments · May be fixed by #336
Labels: difficulty-medium (Medium difficulty to fix this issue), help wanted (Extra attention is needed)

Comments

@BlockLiu (Collaborator) commented Nov 9, 2022

Is your feature request related to a problem? Please describe

Support ElasticSearch reader.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

@garyli1019 added the help wanted (Extra attention is needed) and difficulty-medium (Medium difficulty to fix this issue) labels on Nov 9, 2022
@liuxiaocs7 (Contributor) commented

I'd like to try this; please assign it to me, thanks.

@BlockLiu (Collaborator, Author) commented

Nice, please take your time :D

@liuxiaocs7 (Contributor) commented Jan 17, 2023

In PR #336

I use the scroll API to implement the paging query.

Each index is currently treated as a single split; later we could perhaps use the slice parameter to break it down further, as described in this link.

The job conf looks like this:

{
  "job": {
    "reader": {
      "class": "com.bytedance.bitsail.connector.elasticsearch.source.ElasticsearchSource",
      "es_hosts": ["http://localhost:1234"],
      "es_index": "test1, test2, test3",
      "scroll_size": 3,
      "scroll_time": "1m",
      "columns": [
        {
          "index": 0,
          "name": "id",
          "type": "integer"
        },
        {
          "index": 1,
          "name": "text_type",
          "type": "text"
        },
        {
          "index": 2,
          "name": "keyword_type",
          "type": "keyword"
        },
        {
          "index": 3,
          "name": "long_type",
          "type": "long"
        },
        {
          "index": 4,
          "name": "date_type",
          "type": "date"
        }
      ]
    }
  }
}

@BlockLiu, do you think this is OK? Thanks.
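For reference, the scroll-based paging in the conf above maps onto two Elasticsearch REST calls. A sketch (index name and values are illustrative, taken from the example conf):

```
# Initial search: open a scroll context kept alive for 1m ("scroll_time"),
# returning up to 3 hits per batch ("scroll_size"):
POST /test1/_search?scroll=1m
{
  "size": 3,
  "query": { "match_all": {} }
}

# Each response carries a _scroll_id; the next page is fetched with:
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}
```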

@BlockLiu (Collaborator, Author) commented

> In PR #336, I use the scroll API to implement the paging query. […]

I think it's a good idea.
And from the note below, I think we can first get the shard count and then build the slices.
[Screenshot, 2023-01-29 11:48:26: the note referred to above]

@BlockLiu (Collaborator, Author) commented

Namely, we can use the shard count as the slice number.
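For context, a sliced scroll splits one scroll into independently consumable slices, so each slice can become its own split. Using the shard count as the number of slices, a sketch might look like this (index name illustrative; `slice.max` set to an assumed shard count of 2):

```
# The shard count can be read from the index settings
# ("index.number_of_shards" in the response):
GET /test1/_settings

# One scroll request per slice id, with 0 <= id < max:
POST /test1/_search?scroll=1m
{
  "slice": { "id": 0, "max": 2 },
  "query": { "match_all": {} }
}
```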

@liuxiaocs7 (Contributor) commented

> Namely, we can use shard count as slice number.

Thank you for your suggestion; I will continue working on it.


3 participants