
Fix bugs in several calendars with new continuity test #4904

Merged
merged 23 commits into from
May 23, 2024

Conversation

sffc
Member

@sffc sffc commented May 15, 2024

Fixes #2703
Fixes #4914
Mostly fixes #2713

I wrote a test that checks for invariants on calendar behavior. In addition to ISO round-trip, it tests simple day-based calendar arithmetic, which is supported in all calendars, and it exercises a lot of code paths that aren't otherwise covered, such as month length, year length, and leap year status, which unveiled bugs in multiple calendars (Observational Islamic, Saudi Islamic, Hebrew, Chinese, Coptic, and Ethiopian).

There are several loosely related changes in this PR, so I made an effort to keep it reviewable commit by commit.
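For readers who want a feel for what such a continuity test checks, here is a minimal sketch. It assumes a hypothetical toy calendar (twelve 30-day months plus a 5-day thirteenth month, no leap years) and hypothetical helper names; it is not the test added in this PR, only an illustration of the round-trip and day-by-day invariants described above.

```rust
/// Hypothetical toy calendar date: 12 months of 30 days plus a 5-day 13th month.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ToyDate {
    year: i64,
    month: u8,
    day: u8,
}

const DAYS_PER_YEAR: i64 = 365;

/// Converts a running day number (may be negative) to a toy date.
fn from_day_number(n: i64) -> ToyDate {
    let year = n.div_euclid(DAYS_PER_YEAR);
    let day_of_year = n.rem_euclid(DAYS_PER_YEAR); // 0..=364
    ToyDate {
        year,
        month: (day_of_year / 30) as u8 + 1,
        day: (day_of_year % 30) as u8 + 1,
    }
}

/// Inverse of `from_day_number`.
fn to_day_number(d: ToyDate) -> i64 {
    d.year * DAYS_PER_YEAR + (d.month as i64 - 1) * 30 + (d.day as i64 - 1)
}

fn days_in_month(d: ToyDate) -> u8 {
    if d.month == 13 { 5 } else { 30 }
}

fn months_in_year(_d: ToyDate) -> u8 {
    13
}

/// Walks `len` consecutive days starting at `start` and checks that each day
/// round-trips and follows its predecessor without gaps or repeats.
fn check_continuity(start: i64, len: i64) -> Result<(), String> {
    let mut prev = from_day_number(start);
    for n in (start + 1)..(start + len) {
        let cur = from_day_number(n);
        if to_day_number(cur) != n {
            return Err(format!("round trip failed at day {n}"));
        }
        let ok = if cur.year == prev.year && cur.month == prev.month {
            // Same month: the day advances by exactly one.
            cur.day == prev.day + 1
        } else if cur.year == prev.year {
            // Month boundary: previous date was the last day of its month.
            cur.month == prev.month + 1 && cur.day == 1 && prev.day == days_in_month(prev)
        } else {
            // Year boundary: previous date was the last day of the last month.
            cur.year == prev.year + 1
                && cur.month == 1
                && cur.day == 1
                && prev.month == months_in_year(prev)
                && prev.day == days_in_month(prev)
        };
        if !ok {
            return Err(format!("discontinuity between {prev:?} and {cur:?}"));
        }
        prev = cur;
    }
    Ok(())
}
```

Because the month-boundary and year-boundary branches call `days_in_month` and `months_in_year`, a walk like `check_continuity(-1000, 3000)` exercises exactly the kind of code paths (month length, year length) that a plain ISO round-trip would miss.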

@@ -269,7 +269,18 @@ pub(crate) fn midnight<C: ChineseBased>(moment: Moment) -> Moment {
pub(crate) fn new_year_in_sui<C: ChineseBased>(prior_solstice: RataDie) -> (RataDie, RataDie) {
// s1 is prior_solstice
// Using 370 here since solstices are ~365 days apart
// Both solstices should fall in December
Member Author

@sffc sffc May 15, 2024


This assertion fails for years before about 20,000 BCE. Is this WAI? I'm not sure which factor influences this more:

  1. Is this Gregorian drift? (some web sites tell me that it drifts by a day every 3216 years)
  2. Or is this an artifact of floating point error in astronomical calculations?
  3. Or is this sidereal drift?

This matters because it breaks our invariant that Chinese new year lands on a day that is a positive offset relative to January 19 of the related ISO year. I moved it back to January 1 in this PR, which makes it work for several more millennia, but it still breaks with sufficiently large negative years.
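As a rough sanity check on the "one day every ~3216 years" figure, the drift rate can be estimated from the difference between the mean Gregorian year and the tropical year. The sketch below assumes a mean tropical year of 365.24219 days, which is a value I am supplying and not one taken from this thread; the exact result depends on which tropical-year figure is used.

```rust
/// Mean length of the Gregorian year: 97 leap days every 400 years.
const GREGORIAN_MEAN_YEAR: f64 = 365.0 + 97.0 / 400.0; // 365.2425 days

/// Approximate mean tropical year in days (assumed value, not from the thread).
const TROPICAL_YEAR: f64 = 365.24219;

/// Number of years it takes for the two year lengths to diverge by one full day.
fn years_per_day_of_drift() -> f64 {
    1.0 / (GREGORIAN_MEAN_YEAR - TROPICAL_YEAR)
}
```

With these inputs the result lands in the low 3000s, the same order of magnitude as the number quoted above. It says nothing about floating point error or about how well the astronomical model extrapolates, so it only addresses the first of the three questions.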

Member

This is WAI but we could make the assertion stop failing outside that range, or clamp the Chinese calendar.

AIUI the Gregorian calendar accounts for the precession of the equinoxes (the biggest factor here). There are some other smaller factors that lead to the 3216 number, but the thing is that at that scale the planetary bodies are not that predictable. Basically I'm not convinced there's a sensible way to actually make any predictions about dates that far off for something like this.

Member

But I think this is a combination of Gregorian drift and floating point error.

@sffc
Member Author

sffc commented May 15, 2024

Tests are failing because I need to regen Chinese cached calendar data (does this break semver or is this new in 1.5?), but I want to wait until we discuss the above question.

@Manishearth
Member

I disagree with the commit changing the bounds of the Chinese calendar. Firstly, the calendar is not that old (even the thing about January 20 in 30 AD is dubious, but that at least works with the math).

More importantly, the way the Chinese new year is defined ("such that the winter solstice falls in the eleventh month") mathematically gives you a narrow range of ~30 days in which it may occur. I don't have the math at hand but when implementing this optimization I did the math and.

It also matches Aslaksen's The Mathematics of the Chinese Calendar.

This is a researched and mathematically accurate set of bounds and I do not think we should get rid of it because floating point error is breaking tests for date ranges that are irrelevant for the calendar.

@Manishearth
Member

It's no longer an optimization since we have unused space there but I'd still prefer to keep it around, if the computer is producing dates outside of that range the computer is probably wrong.

@sffc
Member Author

sffc commented May 21, 2024

Please see my comment above regarding the Chinese calendar drift.

@Manishearth
Member

Ah, I was reviewing this on a phone and couldn't see all the comments. I am still leaning towards considering those as cases we shouldn't try to fix. I'm somewhat open to bumping up the new year offset range to allow for more dates; it doesn't cost us anything. We should still document this well and have assertions that between 31 AD and maybe 2000 years from now, the new year falls in the narrower range specified, citing Reingold and Aslaksen.

@sffc
Member Author

sffc commented May 21, 2024

According to Wolfram Alpha (query):

| Winter Solstice Time | Winter Solstice Date |
| --- | --- |
| 2:41 pm LMT | Saturday, November 24, 20000 BC (extrapolated Gregorian calendar) |
| 11:27 am LMT | Saturday, November 28, 19000 BC (extrapolated Gregorian calendar) |
| 4:45 am LMT | Saturday, December 1, 18000 BC (extrapolated Gregorian calendar) |
| 4:27 pm LMT | Friday, December 4, 17000 BC (extrapolated Gregorian calendar) |
| 8:57 pm LMT | Thursday, December 6, 16000 BC (extrapolated Gregorian calendar) |
| 5:09 pm LMT | Wednesday, December 9, 15000 BC (extrapolated Gregorian calendar) |
| 4:16 am LMT | Tuesday, December 11, 14000 BC (extrapolated Gregorian calendar) |
| 6:30 am LMT | Sunday, December 13, 13000 BC (extrapolated Gregorian calendar) |
| 12:16 am LMT | Friday, December 14, 12000 BC (extrapolated Gregorian calendar) |
| 10:50 am LMT | Tuesday, December 15, 11000 BC (extrapolated Gregorian calendar) |
| 3:31 pm LMT | Saturday, December 15, 10000 BC (extrapolated Gregorian calendar) |
| 4:09 pm LMT | Wednesday, December 16, 9000 BC (extrapolated Gregorian calendar) |
| 2:22 pm LMT | Sunday, December 16, 8000 BC (extrapolated Gregorian calendar) |
| 11:41 am LMT | Thursday, December 17, 7000 BC (extrapolated Gregorian calendar) |
| 9:18 am LMT | Monday, December 17, 6000 BC (extrapolated Gregorian calendar) |
| 7:58 am LMT | Friday, December 18, 5000 BC (extrapolated Gregorian calendar) |
| 8:15 am LMT | Tuesday, December 18, 4000 BC (extrapolated Gregorian calendar) |
| 9:47 am LMT | Saturday, December 19, 3000 BC (extrapolated Gregorian calendar) |
| 12:16 pm LMT | Wednesday, December 19, 2000 BC (extrapolated Gregorian calendar) |
| 2:50 pm LMT | Sunday, December 20, 1000 BC (extrapolated Gregorian calendar) |
| 4:21 pm LMT | Thursday, December 20, 1 AD (extrapolated Gregorian calendar) |
| 9:54 am LMT | Sunday, December 21, 1000 AD (extrapolated Gregorian calendar) |
| 5:21 am PST | Thursday, December 21, 2000 |
| 8:18 pm PST | Sunday, December 21, 3000 |
| 5:21 am PST | Thursday, December 21, 4000 |
| 7:44 am PST | Sunday, December 21, 5000 |
| 3:02 am PST | Wednesday, December 20, 6000 |
| 2:59 pm PST | Friday, December 19, 7000 |
| 7:57 pm PST | Sunday, December 17, 8000 |
| 6:36 pm PST | Tuesday, December 16, 9000 |
| 12:01 pm PST | Thursday, December 14, 10000 |
| 1:28 am PST | Saturday, December 13, 11000 |
| 12:14 pm PST | Sunday, December 10, 12000 |
| 9:42 pm PST | Monday, December 8, 13000 |
| 6:41 am PST | Wednesday, December 6, 14000 |
| 3:53 pm PST | Thursday, December 4, 15000 |
| 1:10 am PST | Saturday, December 2, 16000 |
| 10:11 am PST | Sunday, November 30, 17000 |
| 5:55 pm PST | Monday, November 27, 18000 |
| 11:13 pm PST | Tuesday, November 25, 19000 |
| 12:47 am PST | Thursday, November 23, 20000 |

@sffc
Member Author

sffc commented May 21, 2024

The problem is that everything currently ends up calculating a PackedChineseBasedYearInfo, which has this requirement about the allowed start dates in January/February, but it's trivial to break that requirement, as shown above.

Should we use a different non-packed intermediate type without so many constraints?

@sffc
Member Author

sffc commented May 21, 2024

The range of dates Temporal supports is 100 million days on either side of January 1, 1970 (to about 273000 AD).
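To put that range in years, a back-of-the-envelope conversion is enough. The sketch below divides the day range by the mean Gregorian year length; the constant and function names are mine and purely illustrative.

```rust
/// Days on either side of the epoch, as stated above.
const TEMPORAL_DAY_RANGE: f64 = 100_000_000.0;

/// Mean length of the Gregorian year in days (97 leap days per 400 years).
const GREGORIAN_MEAN_YEAR: f64 = 365.2425;

/// Approximate number of Gregorian years covered on each side of the epoch.
fn temporal_range_in_years() -> f64 {
    TEMPORAL_DAY_RANGE / GREGORIAN_MEAN_YEAR
}
```

That works out to a bit under 274,000 years in each direction, which is far beyond the few tens of millennia over which the solstice stays in December in the table above.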

@Manishearth
Member

> According to Wolfram Alpha (query):

I understand this. I don't think that affects what I said: These dates are beyond precise predictability and it is not meaningful to talk about them in this way. There is probably going to be a shift in the solstice of that form, but I don't consider those dates as important for the Chinese calendar in the first place.

> Should we use a different non-packed intermediate type without so many constraints?

No, we should not introduce a separate code path. I am somewhat open to relaxing the constraints provided it doesn't have a perf impact (and has the strong assertions for the actual date ranges we care about), but we should not introduce multiple codepaths in calendar code (once a Date has been constructed) since that makes it so much easier for things to subtly break.

@sffc
Member Author

sffc commented May 22, 2024

> > According to Wolfram Alpha (query):
>
> I understand this. I don't think that affects what I said: These dates are beyond precise predictability and it is not meaningful to talk about them in this way. There is probably going to be a shift in the solstice of that form, but I don't consider those dates as important for the Chinese calendar in the first place.

I do care about making things work for all dates in the range supported by Temporal. They don't have to be perfectly accurate, but the data model should support them.

Even if I increase the date offset to 64 days, we only get to about 20,000 AD/BC which is still only about 10% of the range required by Temporal.

> > Should we use a different non-packed intermediate type without so many constraints?
>
> No, we should not introduce a separate code path. I am somewhat open to relaxing the constraints provided it doesn't have a perf impact (and has the strong assertions for the actual date ranges we care about), but we should not introduce multiple codepaths in calendar code (once a Date has been constructed) since that makes it so much easier for things to subtly break.

I don't want to introduce another code path, but currently we use PackedChineseBasedYearInfo in the DateInner's ChineseBasedYearInfo. I was proposing that we could instead use the packed year info in the data provider but then expand its fields into ChineseBasedYearInfo, increasing its stack size a little bit but removing the constraint on the new years offset, and for calculated data we go directly into the unpacked struct.
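To make the trade-off concrete, here is a sketch of the packed/unpacked split being proposed. The bit layout, field names, and type names are all hypothetical and chosen for illustration; they are not the actual layout of PackedChineseBasedYearInfo or ChineseBasedYearInfo.

```rust
/// Hypothetical unpacked form: larger on the stack, but the new year offset
/// is a plain signed integer with no range restriction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct UnpackedYearInfo {
    /// Whether each of up to 13 months has 30 days (otherwise 29).
    month_is_long: [bool; 13],
    /// 1-based index of the leap month, if the year has one.
    leap_month: Option<u8>,
    /// Days between a reference ISO date and the new year; may be negative.
    new_year_offset: i16,
}

/// Hypothetical packed form: 13 bits of month lengths, 4 bits of leap month,
/// 6 bits of new year offset, stored in the low 23 bits of a u32.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PackedYearInfo(u32);

const OFFSET_BITS: u32 = 6;
const MAX_OFFSET: i16 = (1 << OFFSET_BITS) - 1; // 63

impl PackedYearInfo {
    /// Packs the year info, or returns None when the offset does not fit.
    fn pack(info: &UnpackedYearInfo) -> Option<Self> {
        if info.new_year_offset < 0 || info.new_year_offset > MAX_OFFSET {
            return None;
        }
        let mut bits: u32 = 0;
        for (i, &long) in info.month_is_long.iter().enumerate() {
            if long {
                bits |= 1 << i;
            }
        }
        // 0 in the leap-month field means "no leap month".
        bits |= (info.leap_month.unwrap_or(0) as u32) << 13;
        bits |= (info.new_year_offset as u32) << 17;
        Some(PackedYearInfo(bits))
    }

    /// Expands the packed bits back into the unconstrained struct.
    fn unpack(self) -> UnpackedYearInfo {
        let mut month_is_long = [false; 13];
        for (i, slot) in month_is_long.iter_mut().enumerate() {
            *slot = (self.0 >> i) & 1 == 1;
        }
        let leap = ((self.0 >> 13) & 0xF) as u8;
        UnpackedYearInfo {
            month_is_long,
            leap_month: if leap == 0 { None } else { Some(leap) },
            new_year_offset: ((self.0 >> 17) & 0x3F) as i16,
        }
    }
}
```

In this sketch the data provider would store `PackedYearInfo`, while calculated years whose offset falls outside `0..=63` (for example a negative offset once the solstice drifts into November) simply cannot be packed and would have to live in the unpacked struct. The cost is the larger stack size of the unpacked form.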

@sffc
Member Author

sffc commented May 22, 2024

Brainstorming a few options:

  1. Keep the invariants as-is, debug-panic when violated, and fall back to January 19 as the new year, which may or may not be a new moon. (current behavior on main)
  2. Keep the invariants as-is, debug-panic when violated, and hack the Winter Solstice to always land on December 20/21, so that the new year continues to be on a new moon.
  3. Use an unpacked date inner type without the constraints, and use whatever garbage we get from the solstice function to calculate the new year. Note that it's possible to end up with a negative offset from January 1 when the solstice starts drifting into November or earlier.
  4. Fall back to Proleptic Gregorian dates and eras when outside of a max supported range that we define for the Chinese calendar, such as 1000 BC to 5000 AD (±3000 years). -- we already do this in Japanese and could do it in other calendars as well.
  5. Allow offsets up to 64 days within PackedChineseBasedYearInfo with January 1 as the reference, debug-panic if out of range, and fall back to January 1 as the new year. (what is currently in this PR)

@Manishearth
Member

Right, I understand your proposal, I don't want to increase the stack size of the Date type here.

I think the status quo with an explicit clamp for the solstice and no panic when the years are out of "bounds" is my ideal choice, and I don't think it breaks the data model. However I think expanding to Jan 1 is acceptable too.

In general I am fine with restricting our assertions to year ranges we consider relevant.

Member

@Manishearth Manishearth left a comment

I think I prefer the previous solution to this: I was expecting we could bind things without tweaking the algorithm too much. Let's stick to that, but not remove any asserts (just tweak them to only fire for modern year ranges).

let following_solstice = winter_solstice_on_or_before::<C>(prior_solstice + 370); // s2
debug_assert_eq!(
Member

issue: keep these asserts for modern year ranges (±2000 years?)

Member Author

The asserts are obsolete if I use the bind functions.

@sffc sffc requested a review from robertbastian as a code owner May 22, 2024 18:33
@sffc sffc requested a review from Manishearth May 22, 2024 19:26
@sffc
Member Author

sffc commented May 22, 2024

Okay, as demonstrated in #4929, increasing the size of the offset above 32 bits is unavoidable.

Given the increased bits, I've now implemented a solution that pins the winter solstice to December 20-23 and allows the new year to land as early as January 19, which is the earliest the second New Moon could occur after December 20. It's not actually possible for a real-life Chinese New Year to land on January 19, because if the Winter Solstice were December 20 and there was a 29-day month starting on December 21, either it or the month after it would be a leap month. However, since we pin the winter solstice, it is possible that there are no winter leap months, and therefore January 19 ends up being a possible output of the algorithm.

This solution is still better than the original solution in this PR because it works for a longer range of years and should theoretically be able to work for an arbitrarily long range.
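The pinning and the January 19 arithmetic can be sketched as follows. The helper names are hypothetical and this is not the code in the PR; it only illustrates clamping a computed solstice into December 20 to 23 and why the earliest resulting new year is January 19.

```rust
/// Hypothetical helper: clamps an ISO (month, day) solstice estimate into the
/// window December 20..=23. Anything earlier in the year pins to December 20;
/// anything later in December pins to December 23.
fn pin_solstice(iso_month: u8, iso_day: u8) -> (u8, u8) {
    if iso_month < 12 || iso_day < 20 {
        (12, 20)
    } else if iso_day > 23 {
        (12, 23)
    } else {
        (12, iso_day)
    }
}

/// Day of January on which the earliest possible new year falls under the pin:
/// solstice on December 20, first new moon the next day (December 21),
/// followed by a 29-day month.
fn earliest_new_year_day_in_january() -> u8 {
    let first_new_moon_in_december: u8 = 21;
    let short_month_length: u8 = 29;
    let days_in_december: u8 = 31;
    first_new_moon_in_december + short_month_length - days_in_december
}
```

Under these assumptions, a computed solstice of November 24 (as in the 20000 BC row of the table) is treated as December 20, and the earliest second new moon after that date is January 19.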

@sffc sffc removed the request for review from robertbastian May 22, 2024 19:34
@sffc
Member Author

sffc commented May 22, 2024

I already pulled out some of the changes, but if desired, I can split this PR up further.

Member

@Manishearth Manishearth left a comment

Regarding the chinese fix: I'm okay both with the current solution and the previous one of using more unused PackedYearInfo bits (provided no stack or data size increases) to set FIRST_NY to 1 and not do any clamping. I'm not sure which I prefer, I think I actually slightly prefer the original approach given that Gregorian drift is real, but either is fine.

@@ -221,31 +221,34 @@ impl IslamicTabular {
#[derive(Copy, Clone, Debug, Hash, PartialEq, Eq, PartialOrd, Ord)]
pub(crate) struct IslamicYearInfo {
packed_data: PackedIslamicYearInfo,
/// Is the previous year 355 days (short = 354)
prev_year_long: bool,
prev_year_length: u16,
Member

issue: this was an attempt to keep stack size small, and similar to the Chinese thing I'm not sure what we gain from breaking the calendrical rule that the calendar is 354 or 355 days (at least in Chinese case it is in part due to Gregorian drift, here this is an actual facet of the calendar's definition)

What years is this failing for?

Member Author

It fails for various years in the islamic-observational calendar in the range under test. I think it is always too short (353 days). If you want to keep the bits small, I can compress this down to a 3- or 4-value enum.

Member

that would be fine. However, can you post the actual years it fails for?

Member

and ideally, I would like to have that assertion stick around for the other calendars

Member Author

2024-05-22T23:12:39.835Z WARN  [calendrical_calculations::islamic] (ObservationalIslamic) Found year -269 AH with length 353
2024-05-22T23:12:40.852Z WARN  [calendrical_calculations::islamic] (ObservationalIslamic) Found year -40 AH with length 353
2024-05-22T23:12:41.775Z WARN  [calendrical_calculations::islamic] (ObservationalIslamic) Found year 168 AH with length 353
2024-05-22T23:12:43.965Z WARN  [calendrical_calculations::islamic] (ObservationalIslamic) Found year 670 AH with length 353
2024-05-22T23:12:44.374Z WARN  [calendrical_calculations::islamic] (ObservationalIslamic) Found year 763 AH with length 353
2024-05-22T23:12:45.868Z WARN  [calendrical_calculations::islamic] (ObservationalIslamic) Found year 1107 AH with length 353
2024-05-22T23:12:49.495Z WARN  [calendrical_calculations::islamic] (SaudiIslamic) Found year 454 AH with length 353
2024-05-22T23:12:52.285Z WARN  [calendrical_calculations::islamic] (SaudiIslamic) Found year 669 AH with length 353
2024-05-22T23:12:55.263Z WARN  [calendrical_calculations::islamic] (SaudiIslamic) Found year 899 AH with length 353

Member Author

I pushed code that retains the assertion in calendrical_calculations, and icu_calendar is fine with all three year lengths.
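For illustration, a three-value year-length enum along these lines could look like the sketch below. The type and method names are hypothetical, not the ones in the PR; the point is only that the three observed lengths fit in a single byte while anything else is rejected.

```rust
/// Hypothetical compact year length: the two lengths the calendar defines,
/// plus the 353-day edge case that the astronomical calculation produces.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum YearLength {
    L355,
    L354,
    /// Should not occur per the calendar's definition, but is tolerated.
    L353,
}

impl YearLength {
    /// Maps a computed day count to a year length, or None if it is not
    /// one of the three tolerated values.
    fn from_days(days: i64) -> Option<Self> {
        match days {
            355 => Some(Self::L355),
            354 => Some(Self::L354),
            353 => Some(Self::L353),
            _ => None,
        }
    }

    /// Returns the year length in days.
    fn days(self) -> u16 {
        match self {
            Self::L355 => 355,
            Self::L354 => 354,
            Self::L353 => 353,
        }
    }
}
```

A fieldless enum with three variants occupies one byte, so this keeps the stack size close to the original `bool` while still representing the 353-day years listed in the log above.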

Member

Hmm, thanks. These years are in practical range for these calendars, so we should ideally figure out what's wrong with the math, but until then it's fine to hack it this way.

Member Author

Opened #4930 and I'll add references in the code

if new_rd - prev_rd == 30 {
lengths[11] = true;
});
// To maintain invariants for calendar arithmetic, if astronomy finds
Member

praise: thanks, i was considering a technique like this

@sffc sffc requested a review from Manishearth May 22, 2024 23:23
enum IslamicYearLength {
L355,
L354,
L353,
Member

nit: document this as an edge case that shouldn't actually occur (and list the positive years/cals for which it does occur)

Member Author

Done

};
if iso_month < 12 || iso_day < 20 {
#[cfg(feature = "logging")]
log::trace!("({}) Solstice out of bounds: {solstice:?}", C::DEBUG_NAME);
Member

nit: we should upgrade this to a debug assertion for iso_years that are ~±2000years.

Member Author

Done
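A year-gated debug assertion of the kind requested could be sketched like this. The function name and the year bounds are assumptions for illustration (the thread only says roughly ±2000 years), and the December 20 to 23 window is the one described earlier in the conversation.

```rust
/// Hypothetical bounds for the ISO years in which the assertion should fire.
const ASSERT_MIN_ISO_YEAR: i32 = 0;
const ASSERT_MAX_ISO_YEAR: i32 = 4000;

/// Returns whether the solstice falls in December 20..=23. In debug builds,
/// an out-of-window solstice panics only when the ISO year is inside the
/// range considered relevant; outside that range it is silently reported.
fn solstice_in_expected_window(iso_year: i32, iso_month: u8, iso_day: u8) -> bool {
    let in_window = iso_month == 12 && (20..=23).contains(&iso_day);
    let year_is_relevant =
        (ASSERT_MIN_ISO_YEAR..=ASSERT_MAX_ISO_YEAR).contains(&iso_year);
    debug_assert!(
        in_window || !year_is_relevant,
        "solstice out of bounds in ISO year {iso_year}: month {iso_month}, day {iso_day}"
    );
    in_window
}
```

With this shape, a November solstice computed for a year like -20000 only returns `false`, so the caller can fall back to pinning, while the same result for a modern year would trip the assertion in debug builds.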

@sffc sffc requested a review from Manishearth May 23, 2024 02:17
Member

@Manishearth Manishearth left a comment

I kind of feel that the original Jan 1 changes were better, at this point (sorry, I've gotten turned around by the original arguments and the data) but let's first land this.

@sffc sffc merged commit d938156 into unicode-org:main May 23, 2024
30 checks passed
@sffc sffc deleted the negative-tests branch May 24, 2024 07:52
@sffc
Member Author

sffc commented May 24, 2024

Branch with individual commits archived at https://github.com/sffc/omnicu/tree/archive/negative-tests
