Make pipelines aware of a timezone configuration #249
Why? 📖
While Spark's TimestampType timezone is controlled by the spark.sql.session.timeZone configuration option, Python's datetime objects have their timezone controlled by the system's timezone (when they don't have a fixed tz suffix). This means some transformations can have their timestamps converted in different ways when running on different systems.

An example of possible irregular results happens when we automatically set the start_date of AggregatedFeatureSets (here). Spark and the system can sometimes have different timezones, so a timestamp coming from the Spark dataframe can change when it is collected into plain Python as a datetime object, generating a start_date different than expected.

What? 🔧
This PR proposes a timezone configuration that every pipeline is aware of and that is applied identically to Spark and to the system. This timezone is configurable.
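To make the idea concrete, here is a minimal sketch of what aligning both timezones could look like. It is not the code in this PR: `apply_timezone` and `local_wall_clock` are hypothetical names used only for illustration, and `time.tzset()` is available on Unix only. The demo uses POSIX-style TZ strings (`UTC0`, `EST5`) so that it does not depend on the system's timezone database.

```python
import os
import time
from datetime import datetime


def apply_timezone(spark, tz="UTC"):
    """Hypothetical helper: give Spark and the Python process the same timezone."""
    # Spark side: the session timezone used for TimestampType values.
    spark.conf.set("spark.sql.session.timeZone", tz)
    # Python side: naive datetime conversions follow the process's TZ variable.
    os.environ["TZ"] = tz
    time.tzset()  # Unix only: re-read TZ for the current process


def local_wall_clock(epoch_seconds, tz):
    """Return the naive local datetime of an instant under a given system TZ."""
    os.environ["TZ"] = tz
    time.tzset()
    return datetime.fromtimestamp(epoch_seconds)


# The same instant yields different naive datetimes depending on the system
# timezone, which is the kind of mismatch that can shift a collected start_date.
utc_view = local_wall_clock(0, "UTC0")
est_view = local_wall_clock(0, "EST5")
mismatch = utc_view != est_view
```

With a real session, the assumed usage would be `apply_timezone(spark, "UTC")` once, before any pipeline runs, so that timestamps collected from a dataframe and datetimes built in Python agree.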
Type of change
Please delete options that are not relevant.
How was everything tested? 📏
TODO.
Checklist
bug, enhancement, feature, and review.
Attention Points ⚠️
Replace me for what the reviewer will need to pay attention to in the PR or just to cover any concerns after the merge.