Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues when getting all records. #491

Open
bruno-trindade opened this issue Feb 11, 2021 · 9 comments
Open

Performance issues when getting all records. #491

bruno-trindade opened this issue Feb 11, 2021 · 9 comments

Comments

@bruno-trindade
Copy link

Hello, we're new to using Dynamoid, and DynamoDB in general. I'm hoping someone can explain why there is a large difference in latency between these calls:

MyDynamoidModel.where(the_hash_key: key).all.to_a

-VS-

@aws_ddb_resouce_table.query(
  table_name: "dynamoid_my_dynamoid_models",
  key_condition_expression: "the_hash_key = :the_hash_key",
  expression_attribute_values: {
    ":the_hash_key" => the_hash_key
   }
)

The first version runs in about 28s. The second one about 1.5s. Having to call to_a is a little weird, maybe there's a preferred way?

@andrykonchin
Copy link
Member

andrykonchin commented Feb 11, 2021

I would say there is only one valid case when this situation can happen. Query result is paginated with 1 MB size limit (documentation). #where method loads lazily by demand all the pages. I assume that #query does not.

So please check result of each call and compare number of returned records. If they equal - we have performance issue in Dynamoid. Otherwise it's expected result.

@bruno-trindade
Copy link
Author

I would say there is only one valid case when this situation can happen. Query result is paginated with 1 MB size limit (documentation). #where method loads lazily by demand all the pages. I assume that #query does not.

So please check result of each call and compare number of returned records. If they equal - we have performance issue in Dynamoid. Otherwise it's expected result.

The number of returned records is exactly the same, a total of 1113

@andrykonchin
Copy link
Member

andrykonchin commented Feb 11, 2021

Well, I would like to check logs and compare API requests in both cases to ensure the performance issue is related to Dynamoid internals and not related to generated API requests to DynamoDB.

Could you please provide logged requests to DynamoDB for both cases?

For Dynamoid you can enable logging this way:

Dynamoid::Config.logger.level = :debug

@andrykonchin
Copy link
Member

I have another idea what could happen. If a hash key name in a table is not default id then it should be specified explicitly in a model class:

table key: :the_hash_key

Dynamoid doesn't analyze a DynamoDB table schema (maybe we have to change this) and relies on a schema declared explicitly in code. If the_hash_key name isn't recognized as a hash key in the expression where(the_hash_key: key) the Scan operation will be used. If hash key is recognized - then will be used more efficient Query operation.

It will be clear when we have logged API requests to DynamoDB.

@bruno-trindade
Copy link
Author

bruno-trindade commented Feb 12, 2021

I'm not seeing any log output even after adding that configuration. But maybe it's helpful to note that I have a range key as well as a hash key, and my hash key is also declared as a field which maybe it doesn't need to be - could this be the problem? (I'll keep trying for some log output.)

table capacity_mode:      :provisioned,
          key:                :the_hash_key,
          read_capacity:      350,
          write_capacity:     100
range :the_sort_key,  :string
field :the_hash_key, :integer

@bruno-trindade
Copy link
Author

Confirming that removing that unneeded field declaration seems to fix the problem, and now it runs blazing fast. This is obviously a bug in my code, but I wonder if there are other dummies like me who might benefit from the docs saying you shouldn't declare the field for the hash key or something. :)

@bruno-trindade
Copy link
Author

bruno-trindade commented Feb 12, 2021

Actually, without that I'm getting this error:

Aws::DynamoDB::Errors::ValidationException: One or more parameter values were invalid: Condition parameter type does not match schema type

Edit
I think here's a clue. The field I had was field :the_hash_key, :integer. Now that I removed the field, I can use find to locate the record, and I can see that the attributes before typecast have the hash ID as a decimal, but after it's a string:

@attributes=
  {:created_at=>Mon, 18 Jan 2021 09:12:30 +0000,
   :updated_at=>Mon, 18 Jan 2021 09:12:30 +0000,
   :the_hash_key=>"209963.0",
  },
   @attributes_before_type_cast=
  {:created_at=>Mon, 18 Jan 2021 09:12:30 +0000,
   :updated_at=>Mon, 18 Jan 2021 09:12:30 +0000,
   :the_hash_key=>0.209963e6,
}

Is there any way I can drop the field :user_id and not lose my ability to query via where?

@bruno-trindade
Copy link
Author

Yeah so removing the field just means that Dynamoid now just assumes the key is of the "string" type, which causes the mismatch with DynamoDB. My initial tests were flawed, so I think this long query lag is still a problem. I'll continue to try to get log output.

@andrykonchin
Copy link
Member

andrykonchin commented Feb 12, 2021

Regarding logging. This code snippet

class MyDynamoidModel
  include Dynamoid::Document

  table capacity_mode:      :provisioned,
            key:                :the_hash_key,
            read_capacity:      350,
            write_capacity:     100
  range :the_sort_key,  :string
  field :the_hash_key, :integer
end

Dynamoid.logger.level = :debug
MyDynamoidModel.create_table
MyDynamoidModel.where(the_hash_key: 1).all.to_a

# => []

produces the following logs:

D, [2021-02-12T23:28:58.087077 #43894] DEBUG -- : [Aws::DynamoDB::Client 200 0.007128 0 retries] list_tables(exclusive_start_table_name:nil)

D, [2021-02-12T23:28:58.087569 #43894] DEBUG -- : (18.69 ms) LIST TABLES
D, [2021-02-12T23:28:58.087618 #43894] DEBUG -- : (19.08 ms) CACHE TABLES
I, [2021-02-12T23:28:58.087669 #43894]  INFO -- : Creating dynamoid_mydynamoidmodels table. This could take a while.
D, [2021-02-12T23:28:58.110728 #43894] DEBUG -- : [Aws::DynamoDB::Client 200 0.022555 0 retries] create_table(table_name:"dynamoid_mydynamoidmodels",key_schema:[{attribute_name:"the_hash_key",key_type:"HASH"},{attribute_name:"the_sort_key",key_type:"RANGE"}],attribute_definitions:[{attribute_name:"the_hash_key",attribute_type:"N"},{attribute_name:"the_sort_key",attribute_type:"S"}],billing_mode:"PROVISIONED",provisioned_throughput:{read_capacity_units:350,write_capacity_units:100})

D, [2021-02-12T23:28:58.110923 #43894] DEBUG -- : (23.25 ms) CREATE TABLE
D, [2021-02-12T23:28:58.119548 #43894] DEBUG -- : [Aws::DynamoDB::Client 200 0.008031 0 retries] describe_table(table_name:"dynamoid_mydynamoidmodels")

D, [2021-02-12T23:28:58.135503 #43894] DEBUG -- : [Aws::DynamoDB::Client 200 0.01518 0 retries] query(consistent_read:false,scan_index_forward:true,table_name:"dynamoid_mydynamoidmodels",key_conditions:{"the_hash_key"=>{comparison_operator:"EQ",attribute_value_list:[{n:"1"}]}},query_filter:{},attributes_to_get:nil)

I assume that AWS client log formatter could not be set so please try to set any from the list described here #459.

Anyway, you can just provide complete code snippets to reproduce the issue - both for Dynamoid (with model class declaration) and AWS SDK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants