-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add integer 32 & 64 types #800
Conversation
Flow PHP - BenchmarksResults of the benchmarks from this PR are compared with the results from 1.x branch. Extractors+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| AvroExtractorBench | bench_extract_10k | 1 | 3 | 34.748mb +0.02% | 1.154s +0.29% | ±0.39% -25.13% |
| CSVExtractorBench | bench_extract_10k | 1 | 3 | 4.605mb +0.08% | 300.840ms -0.91% | ±0.17% -92.61% |
| JsonExtractorBench | bench_extract_10k | 1 | 3 | 4.772mb +0.13% | 1.388s -0.66% | ±0.18% -81.60% |
| ParquetExtractorBench | bench_extract_10k | 1 | 3 | 239.476mb +0.00% | 1.575s -0.61% | ±0.50% +91.90% |
| TextExtractorBench | bench_extract_10k | 1 | 3 | 4.558mb +0.01% | 24.261ms +0.16% | ±1.62% -44.37% |
| XmlExtractorBench | bench_extract_10k | 1 | 3 | 4.558mb +0.01% | 416.016ms +2.28% | ±1.43% +233.90% |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
Transformers+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1 | 3 | 110.249mb +0.01% | 62.523ms -3.33% | ±1.27% -42.01% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| AvroLoaderBench | bench_load_10k | 1 | 3 | 94.728mb +0.01% | 485.995ms +9.52% | ±1.47% +212.00% |
| CSVLoaderBench | bench_load_10k | 1 | 3 | 54.711mb +0.01% | 71.594ms +1.43% | ±0.37% -71.09% |
| JsonLoaderBench | bench_load_10k | 1 | 3 | 105.311mb +0.01% | 53.975ms -2.95% | ±0.80% +28.03% |
| ParquetLoaderBench | bench_load_10k | 1 | 3 | 320.783mb +0.00% | 1.457s +1.31% | ±0.52% +5.30% |
| TextLoaderBench | bench_load_10k | 1 | 3 | 17.589mb +0.02% | 40.720ms +0.24% | ±0.21% -82.79% |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
Building Blocks+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
| RowsBench | bench_chunk_10_on_10k | 2 | 3 | 76.296mb +0.01% | 2.242ms -6.30% | ±3.13% -10.20% |
| RowsBench | bench_diff_left_1k_on_10k | 2 | 3 | 96.086mb +0.01% | 181.941ms +0.19% | ±2.67% +200.48% |
| RowsBench | bench_diff_right_1k_on_10k | 2 | 3 | 74.612mb +0.01% | 18.019ms -0.85% | ±0.22% -92.99% |
| RowsBench | bench_drop_1k_on_10k | 2 | 3 | 75.434mb +0.01% | 1.728ms +2.65% | ±3.89% +95.73% |
| RowsBench | bench_drop_right_1k_on_10k | 2 | 3 | 75.434mb +0.01% | 1.695ms -2.02% | ±2.53% +13.07% |
| RowsBench | bench_entries_on_10k | 2 | 3 | 74.648mb +0.01% | 2.620ms +0.59% | ±1.46% +10.56% |
| RowsBench | bench_filter_on_10k | 2 | 3 | 75.177mb +0.01% | 14.116ms +0.94% | ±1.20% +41.02% |
| RowsBench | bench_find_on_10k | 2 | 3 | 75.176mb +0.01% | 14.504ms +4.31% | ±1.15% -30.73% |
| RowsBench | bench_find_one_on_10k | 10 | 3 | 73.079mb +0.01% | 1.794μs -5.28% | ±2.67% +5.66% |
| RowsBench | bench_first_on_10k | 10 | 3 | 73.079mb +0.01% | 0.400μs 0.00% | ±0.00% 0.00% |
| RowsBench | bench_flat_map_on_1k | 2 | 3 | 86.638mb +0.01% | 12.345ms -6.78% | ±2.30% -32.14% |
| RowsBench | bench_map_on_10k | 2 | 3 | 115.995mb +0.01% | 62.161ms -6.90% | ±2.13% +238.89% |
| RowsBench | bench_merge_1k_on_10k | 2 | 3 | 75.697mb +0.01% | 1.923ms +1.27% | ±0.41% -81.11% |
| RowsBench | bench_partition_by_on_10k | 2 | 3 | 77.965mb +0.01% | 33.076ms -0.98% | ±0.96% +2590.90% |
| RowsBench | bench_remove_on_10k | 2 | 3 | 77.798mb +0.01% | 4.524ms +2.71% | ±1.06% +4.02% |
| RowsBench | bench_sort_asc_on_1k | 2 | 3 | 73.222mb +0.01% | 38.607ms +1.37% | ±0.23% -64.07% |
| RowsBench | bench_sort_by_on_1k | 2 | 3 | 73.223mb +0.01% | 38.801ms -1.83% | ±0.36% -69.05% |
| RowsBench | bench_sort_desc_on_1k | 2 | 3 | 73.222mb +0.01% | 38.816ms +0.70% | ±0.36% +4.71% |
| RowsBench | bench_sort_entries_on_1k | 2 | 3 | 75.522mb +0.01% | 7.420ms +2.40% | ±0.25% -4.26% |
| RowsBench | bench_sort_on_1k | 2 | 3 | 73.079mb +0.01% | 28.819ms -0.04% | ±0.67% -32.68% |
| RowsBench | bench_take_1k_on_10k | 10 | 3 | 73.079mb +0.01% | 13.342μs -0.58% | ±2.23% +139.02% |
| RowsBench | bench_take_right_1k_on_10k | 10 | 3 | 73.079mb +0.01% | 15.400μs -1.78% | ±0.00% -100.00% |
| RowsBench | bench_unique_on_1k | 2 | 3 | 96.087mb +0.01% | 185.643ms +1.41% | ±1.20% +176.53% |
| NativeEntryFactoryBench | bench_entry_factory | 1 | 3 | 115.842mb +0.01% | 772.500ms +1.47% | ±0.87% -32.64% |
| NativeEntryFactoryBench | bench_entry_factory | 1 | 3 | 59.560mb +0.01% | 391.210ms +0.26% | ±1.21% +153.42% |
| NativeEntryFactoryBench | bench_entry_factory | 1 | 3 | 14.682mb +0.05% | 79.806ms +0.23% | ±0.39% -64.43% |
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
|
int32/int64 should be also represented by schema definition |
Schema\Definition::integer('id', $nullable = false), | ||
Schema\Definition::string('name', $nullable = true), | ||
Schema\Definition::boolean('active', $nullable = false), | ||
Schema\Definition::integer('id', ScalarType::integer32()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we need to decide here if we prefer to use \Definition::integer()
or \Definition::integer32()
- my problem with passing Type as a second parameter is that we need to validate it, which pretty much defeats the purpose of strongly typed arguments.
Alternatively, we can just drop all static factory methods (string/integer/boolean) and just use the constructor, however, that looks a bit less user-friendly.
Additionally, I would put type into all Definition types, String, Boolean, Integer, we can also unify the metadata name to be something like:
METADATA_PHP_TYPE
which would always hold the instance of Type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not sure either how to decide about default that's why it's detecting inside what's max, but that's "magic", so I would go with BC break and force to choose either 32 or 64 on the user side but internally use 64 (similarly as it's now ie in parquet schema converter), with that I would also add check and throw exception when 64 is used when available is max 32.
So it would be: Entry::integer64()
/Entry::integer32()
, Definition::integer64()
etc
Other than that I would save the helpers as DX is good there. Type for scalar is saved too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about it over night, and I can't figure out any good use case for the end users to decide if they want to use int64 vs int63, yeah, one requires less space but with current storage prices I'm not sure how relevant that is.
Originally I was planning to use that in order to know how to map Flow Int into Parquet Int and maybe we just keep it this way. Let me think more about it, I'm closing this PR for now, will get back to it
Change Log
Added
Fixed
Changed
Removed
Deprecated
Security
Description
Closes #771