Skip to content
This repository was archived by the owner on Apr 14, 2021. It is now read-only.

Fix 2-digit date formatting and repeated month symbol for ja/zh #145

Closed
wants to merge 1 commit into from

Conversation

ianhk
Copy link

@ianhk ianhk commented Nov 27, 2015

This is a significant change, though fixes the issues for us in key languages so that formats we use are the same as native Intl on FF and Chrome (ignoring the comma difference for one language in Chrome). Tests pass.

Fix issues in intl.js polyfill that meant 2-digit years, days and months were formatted incorrectly, plus the month symbol within Japanese and Chinese date strings was repeated.

  • change the interpretation of CLDR data so that matching a format uses the 'availableFormats skeleton', and formatting an object uses the format string. Previously both tasks used (an adjusted) format string.
  • once a format has been located, allow a requested format to be used if it is similar to the located format. e.g. numbers can be formatted as 2-digit, but short can't. Previously, once the requested format had been used to locate a nearest match, only the format in that string would be used.

…ths were formatted incorrectly, plus the month symbol within Japanese and Chinese date strings was repeated.

- change the interpretation of CLDR data so that matching a format uses the 'availableFormats skeleton', and formatting an object uses the format string. Previously both tasks used the format string.
- once a format has been located, allow a requested format to be used if it is similar to the located format. e.g.  numbers can be formatted as 2-digit, but short can't.  Previously, once the requested format had been used to locate a nearest match, only the format in that string would be used.
@caridy
Copy link
Collaborator

caridy commented Nov 27, 2015

hey @ianhk, thanks a lot for putting some time on this. Few notes:

  • can you give me a quick overview of the solution? it is hard to infer it from the code changes since this is a complex topic.
  • spec'd code cannot be changed. I noticed that you made some changes in core.js, unfortunately, if the code has spec references, and steps, it means that we can't change it.

I expect that the solution comes from the data side, and the data that we generate for internal consumption.

@ianhk
Copy link
Author

ianhk commented Nov 27, 2015

Hi @caridy,

Sorry for being cryptic!

I had two issues - both of which overlap some open issues.
a) Intl.DateTimeFormat("{year:'2-digit', month:'2-digit', day:'2-digit'}") came out as m/d/yy for en-US. i.e. not honouring 2-digit.
Most regions we need have a similar problem - en-UK comes out as dd/mm/yyyy - a 4 digit year.
Native Intl for Chrome/FF and IE11 (not IE11 on W7) get it right with 2-digit values for day/month and year.
b) Formatting a short month for Japanese / Chinese would result in a double month character. One from the CLDR matched date format string, and one from the CLDR short month names.

I took a look at the spec - http://unicode.org/reports/tr35/tr35-dates.html#availableFormats_appendItems

My interpretation of the spec suggests the availableFormats skeleton (the key) should be used for best-fit matching and the format (the value) should be used for building the string. And that the format string should not be taken literally - so that a request for a 2-digit year, but an entry that says 'y' in the format string can be formatted as 2-digits.

Currently the polyfill uses the format string for matching (after a bit of tweaking in expandFormat()) and for formatting. And that once located the format string is used without regard to the desired format specified when matching.

The change is two part (github makes it look worse than I think it is).

src/cldr.js
Change createDateTimeFormat() so that it generates a set of date/time options for both the skeleton (I've called it match) and the format string (called format). The tweaks that expandFormat() was making to the format string are no longer required (and had a comment that indicated an area of doubt).

src/core.js
Change the calculateScore() function to use the match options (from cldr.js) to calculate best-fit.
Once found, return the format options (from cldr.js), but allow options to be replaced by those on the original API call - if compatible. i.e. format string may contain d, but I can also format that as dd / 2-digit.

I'm not sure if it changes the spec - although I believe this is closer to the desired behaviour.

Both my issues are fixed by this change.

The double symbol issue is fixed because I request a short month and now match on the skeleton. From ja.json:

availableFormats: {
    "yMMMd": "y年M月d日",
}

Here the format string indicates month should be a number followed by the month symbol. My request for a short month is ignored as it's not compatible with the CLDR format.

Hope this makes sense. (and understand this is a significant change)

@caridy
Copy link
Collaborator

caridy commented Nov 27, 2015

ok, that makes sense. Ideally, we can work out the details by producing the right data structure rather than making any changes in the algo from spec. The way I see it, we can modify the way we compute the data structure (which today I'm using the right hang side operand from CLDR, IIRC), by changing that we should be good because the algo computes the best match based on the structure, and output the formatted token based on the pattern and pattern12, which make easy to modify the input from cldr.

I will look into the details on monday.

@caridy
Copy link
Collaborator

caridy commented Nov 27, 2015

I took a look at the spec - http://unicode.org/reports/tr35/tr35-dates.html#availableFormats_appendItems

My interpretation of the spec suggests the availableFormats skeleton (the key) should be used for best-fit matching and the format (the value) should be used for building the string. And that the format string should not be taken literally - so that a request for a 2-digit year, but an entry that says 'y' in the format string can be formatted as 2-digits.

btw, if this is true, then this was completely my fault, an oversight on my part when interpreting the unicode information, and we should be able to quickly fix this without touching the 402 spec.

@ianhk
Copy link
Author

ianhk commented Nov 27, 2015

Thanks for your speedy reply.

It may not be clear from the diffs, but I haven't touched the format string parsing or matching algorithm. Parsing is now called twice, for left (skeleton) and right (format) hand side and places the results into the same rules array. Matching is against the left, but the return value from the match is based on the right. (if any of that makes any sense...)

@caridy
Copy link
Collaborator

caridy commented Nov 28, 2015

I wonder why not just matching against the left hand side, and picking the pattern from the right, which implies keeping the exact same data structure we use today, just different values since we are computing everything based on the right hand side today.

@ianhk
Copy link
Author

ianhk commented Nov 28, 2015

I think we need parsed options (day/month/year etc formats) from both sides. Left hand for matching, right for formatting. Hence storing both sides in the same rule.
The same data structure you had has been duplicated for left and right. The only attribute in those structures that is redundant is the left hand pattern/pattern12.

@caridy
Copy link
Collaborator

caridy commented Dec 5, 2015

I'm taking a crack at this one, will have something on monday.

@caridy
Copy link
Collaborator

caridy commented Dec 7, 2015

#146

@caridy
Copy link
Collaborator

caridy commented Feb 11, 2016

#146 is now merged, and released as 1.1.0. @ianhk thanks a lot for the ground work here.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants