2010-10-10

Representing Date and Time in computer programs. Part 1 (theory)

Disclaimer

This post is mostly plain theory. If you feel comfortable with various calendars and date & time formats, you can safely skip it.

Background

People started to measure time... Stop. If you're like me, you don't have time to read such crap. I will try to present just the important information.
According to Wikipedia:
Artifacts from the Palaeolithic suggest that the moon was used to calculate time as early as 12,000, and possibly even 30,000 BC. Lunar calendars were among the first to appear, either 12 or 13 lunar months (either 354 or 384 days).
Another article says:
The Julian calendar, a reform of the Roman calendar, was introduced by Julius Caesar in 46 BC, and came into force in 45 BC (709 ab urbe condita).
Finally:
The Gregorian calendar, also known as (...) the Christian calendar, is the internationally accepted civil calendar. It was introduced by Pope Gregory XIII, after whom the calendar was named, by a decree signed on 24 February 1582, a papal bull known by its opening words Inter gravissimas.
Why did I quote this? Because I am trying to make a point. Throughout history, people have used different calendars:
  • based on Moon phases or Solar year (or both, actually)
  • forced by political rulers
  • dependent on religious beliefs
In other words, measuring time and representing it were strictly tied to culture.
And you know what? It turns out they still are.

Date formats
10/10/10
Q: What date does this represent?
A: That one was obvious, wasn't it? I meant Sunday, October 10, 2010 A.D.

However, the following date is not so obvious:
10/11/12
Is it October 11, 2012? Well, yes, if you live in the United States. However, if you live in Great Britain, it actually represents November 10, 2012. And if you live in Japan, it surely must be November 12, 2010. Since I live in Poland, these are simply three unrelated integers.
Tip: People interpret date formats according to their cultural background.
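To make the ambiguity concrete, here is a minimal sketch, assuming Java 8+ and its java.time API (which did not exist when this post was written). It parses the same string with three explicit field orders; the class and method names are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class AmbiguousShortDate {
    // Parses text with an explicit field order. For the two-letter "yy" field,
    // DateTimeFormatter maps the parsed value into the years 2000-2099.
    static LocalDate parseAs(String text, String pattern) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        String text = "10/11/12";
        // Same characters, three different days depending on the reader's convention.
        System.out.println("US (month/day/year):    " + parseAs(text, "MM/dd/yy"));
        System.out.println("UK (day/month/year):    " + parseAs(text, "dd/MM/yy"));
        System.out.println("Japan (year/month/day): " + parseAs(text, "yy/MM/dd"));
    }
}
```

It prints 2012-10-11, 2012-11-10 and 2010-11-12. The parser is happy with all three, which is exactly the problem: nothing in the string says which reading was meant.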
Q: Will representing the year with four digits help?
A: Not so much.

Take a look at this date:
11/12/2010
Is it any better? Well, at least you can easily guess which year it refers to. However, the order of month and day is still open to interpretation. What about other short date formats? Here are some examples:
08.10.2010 г. (Bulgaria)
2010/10/8 (Taiwan)
8.10.2010 (Czech Republic)
08-10-2010 (Denmark)
08.10.2010 (Germany)
2010. 10. 08. (Hungary)
2010-10-08 (Korea)
8-10-2010 (The Netherlands)
10/8/2010 (Kenya)
8/10/2010 (Australia)
08-10-10 (Bangladesh)
2010-10-8 Uyghur (People's Republic of China)
As you can see, we people tend to be very creative when it comes to formatting dates. We create our own formats instead of adopting a single one for all of mankind.
Tip: Software should display dates formatted the way the current user expects.
I will explain in detail how to do that in the future, so stay tuned.
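As a small preview, here is a sketch assuming Java 8+ and java.time. Instead of hard-coding a pattern, it asks the formatter for the locale's own SHORT date style and uses the same formatter to parse the value back. The exact output depends on the JDK's locale data; with the CLDR data used by default since JDK 9, you should see something like "10/8/10" for the US and "08.10.10" for Germany.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

class LocalizedShortDate {
    // Builds a formatter from the locale's SHORT date style rather than a fixed pattern.
    private static DateTimeFormatter shortFormatter(Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT).withLocale(locale);
    }

    static String format(LocalDate date, Locale locale) {
        return shortFormatter(locale).format(date);
    }

    // Parses user input with the same locale-specific style that was used for display.
    static LocalDate parse(String text, Locale locale) {
        return LocalDate.parse(text, shortFormatter(locale));
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2010, 10, 8);
        for (Locale locale : new Locale[] {Locale.US, Locale.GERMANY, Locale.KOREA}) {
            System.out.println(locale + ": " + format(date, locale));
        }
    }
}
```

In a real application the Locale would come from the user's settings (browser, OS, profile) rather than a hard-coded array.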

Q: So maybe I could use a long date format to avoid misinterpretation?
A: Of course you can, but let me show you some examples:
9/شوال/1431 Arabic (Saudi Arabia)
08 Октомври 2010 г. Bulgarian (Bulgaria)
divendres, 8 / octubre / 2010 Catalan (Catalan)
2010年10月8日 Chinese (Taiwan)
8. října 2010 Czech (Czech Republic)
8. oktober 2010 Danish (Denmark)
Freitag, 8. Oktober 2010 German (Germany)
Each of these dates clearly has only one interpretation. (Un)fortunately, they need to be expressed in the user's language.
Tip: No matter which date format (short, medium, long) you decide to use in your application, it should respect the user's locale settings.
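The same idea applies to the long format. This sketch assumes Java 8+ and java.time and lets the locale decide the spelled-out month name, the word order and the punctuation. The actual strings come from the JDK's locale data, so only the German month name is relied on below.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

class LocalizedLongDate {
    // LONG style spells the month out in the user's language; the locale's data
    // also decides the order of day, month and year and the separators.
    static String format(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG).withLocale(locale).format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2010, 10, 8);
        String[] tags = {"cs-CZ", "da-DK", "de-DE", "zh-TW"};
        for (String tag : tags) {
            System.out.println(tag + ": " + format(date, Locale.forLanguageTag(tag)));
        }
    }
}
```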
If you happen to understand Czech, you will notice something. This is apparently specific to Slavic languages, and I don't know whether it holds true for other language groups: the month name is expressed in the genitive case, as opposed to the nominative case ("8. října 2010", while the month on its own is "říjen").
Tip: Never assume that the target language has the same grammatical properties as English.
Unfortunately, this is exactly the assumption the JDK designers made, so if you happen to use standard Java to format dates, avoid the long date format.
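The complaint above concerns the JDK as of 2010. For comparison, here is a minimal sketch assuming Java 8+, where java.time's DateTimeFormatter distinguishes the format form of a month name (pattern letter MMMM, used inside a full date) from the standalone form (pattern letter LLLL). With the CLDR locale data used by default since JDK 9, Czech should give "8. října 2010" and "říjen", but treat those strings as an expectation to check on your JDK rather than a guarantee.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class MonthNameForms {
    static final Locale CZECH = Locale.forLanguageTag("cs-CZ");

    // "MMMM": the month name as it appears inside a date (genitive in Czech).
    static String inDate(LocalDate date) {
        return DateTimeFormatter.ofPattern("d. MMMM yyyy", CZECH).format(date);
    }

    // "LLLL": the standalone month name, e.g. for a calendar header (nominative in Czech).
    static String standalone(LocalDate date) {
        return DateTimeFormatter.ofPattern("LLLL", CZECH).format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2010, 10, 8);
        System.out.println("In a date:  " + inDate(date));
        System.out.println("Standalone: " + standalone(date));
    }
}
```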

Time formats

As for time formats, we people still have plenty of room for improvement:
09:08 م (Saudi Arabia)
下午 09:08 (Taiwan)
9:08 μμ (Greece)
9:08 PM (United States)
21:08 (France)
오후 9:08 (Korea)
9:08.MD (Albania)
09:08 ب.ظ (Iran)
ਸ਼ਾਮ 09:08 (India)
09:08 ܒ.ܛ (Syria)
PM 9:08 (Singapore)
Not so many differences after all. It seems that mankind has developed only two kinds of time format:
  • 12-hour time format (the AM/PM symbol and its placement differ)
  • 24-hour time format
In both cases, leading zeros may or may not be displayed, depending on the culture. Although the 24-hour format seems to be recognizable by everyone, many people have strong preferences.
Tip: Format time with respect to the current user's locale settings.
That way it will be easier for the user to understand.
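Here is a minimal sketch of that tip, assuming Java 8+ and java.time. It formats 21:08 using each locale's SHORT time style, so the locale data picks the 12- or 24-hour clock, the AM/PM marker and its position. The exact strings depend on the JDK's locale data; for example, newer JDKs put a narrow no-break space before "PM" in the US format.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

class LocalizedTime {
    // SHORT time style: hour and minute, laid out the way the locale expects.
    static String format(LocalTime time, Locale locale) {
        return DateTimeFormatter.ofLocalizedTime(FormatStyle.SHORT).withLocale(locale).format(time);
    }

    public static void main(String[] args) {
        LocalTime time = LocalTime.of(21, 8);
        Locale[] locales = {Locale.US, Locale.FRANCE, Locale.KOREA, Locale.forLanguageTag("el-GR")};
        for (Locale locale : locales) {
            System.out.println(locale + ": " + format(time, locale));
        }
    }
}
```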

Time zones and related issues

Up until now, we have assumed that we know which time zone our date & time value lives in. Under that assumption, the following date-time string refers to exactly one point in time, the same all over the world:
2010-05-20 19:54
Well, not necessarily. The real problem here is that we have no reference to the actual time zone. Therefore, people living in India will interpret it differently than people living in New Zealand, and differently again than people living in Central Europe. These differences range from 4 hours and 30 minutes to 11 hours and 45 minutes.
Tip: Present date & time in the user's local time zone. If that is not possible, add a reference to the actual time zone.
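To see how far apart those interpretations are, here is a minimal sketch assuming Java 8+ and java.time. It attaches three different zones to the same zone-less wall-clock value and compares the resulting instants. The zone IDs are standard IANA time zone database names; the helper name is made up.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

class WallClockVsInstant {
    // Interprets a zone-less wall-clock value in the given zone and returns the absolute instant.
    static Instant inZone(LocalDateTime wallClock, String zoneId) {
        return wallClock.atZone(ZoneId.of(zoneId)).toInstant();
    }

    public static void main(String[] args) {
        LocalDateTime wallClock = LocalDateTime.of(2010, 5, 20, 19, 54);
        Instant india = inZone(wallClock, "Asia/Kolkata");
        for (String zone : new String[] {"Asia/Kolkata", "Europe/Warsaw", "Pacific/Auckland"}) {
            Instant instant = inZone(wallClock, zone);
            // Positive: that reading happens later than the Indian one; negative: earlier.
            System.out.println(zone + " -> " + instant + " (" + Duration.between(india, instant) + " vs. India)");
        }
    }
}
```

In May 2010 the Warsaw reading is 3 hours 30 minutes later than the Indian one, and the Auckland reading is 6 hours 30 minutes earlier. Three readers, three different moments.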
This leads us to another issue:
1978-12-31 09:31 (Central European Standard Time)
How comprehensible is that? It depends. If you happen to live in California, it may be totally incomprehensible unless you know that the usual difference between your time zone and CET is 9 hours. Most of us don't memorize such values, and even if we do, the name of a city is easier to remember than the name of a time zone.
Tip: Avoid using standard time zone names. Use the UTC offset and a short list of cities instead.
Written that way, it is much more readable:
1978-12-31 09:31 (UTC+01:00) Sarajevo, Skopje, Warsaw, Zagreb
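Producing such a label is straightforward. This is a minimal sketch assuming Java 8+ and java.time. The offset comes from the time zone rules for that particular moment. The city list is a hypothetical, hard-coded string passed in by the caller, because the JDK has no such lookup; a real application would need its own table.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class OffsetLabel {
    // "xxx" prints the offset as +HH:MM (and +00:00 for UTC itself).
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm '(UTC'xxx')'");

    // Hypothetical helper: the caller supplies the city list for the user's zone.
    static String label(ZonedDateTime dateTime, String cities) {
        return FORMAT.format(dateTime) + " " + cities;
    }

    public static void main(String[] args) {
        ZonedDateTime dateTime = ZonedDateTime.of(
                LocalDateTime.of(1978, 12, 31, 9, 31), ZoneId.of("Europe/Warsaw"));
        System.out.println(label(dateTime, "Sarajevo, Skopje, Warsaw, Zagreb"));
    }
}
```

This prints "1978-12-31 09:31 (UTC+01:00) Sarajevo, Skopje, Warsaw, Zagreb". In July the same zone would print UTC+02:00, which is why the offset should be computed per date rather than stored as a constant.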
Q: What the heck is UTC, and what do we need it for?
A: UTC (Coordinated Universal Time) is the reference time standard: a time given in UTC specifies exactly one, unique point in time. We need it in order to avoid misrepresenting time. If you're in doubt, take a look at this time:
2010-03-28 02:11 (UTC+01:00) Sarajevo, Skopje, Warsaw, Zagreb
Looks good, right? The only problem is that it does not exist. This is related to Daylight Saving Time: some local times are simply invalid in a given time zone, and some occur twice. Without UTC there would be no way to resolve the ambiguity of this date:
2010-10-31 02:15 (UTC+01:00) Sarajevo, Skopje, Warsaw, Zagreb
This date refers to one of two points in time:
2010-10-31 00:15 (UTC)
or
2010-10-31 01:15 (UTC)
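That is because on this particular date we switch the clocks back from 03:00 to 02:00 here in Europe, so the hour between 02:00 and 03:00 happens twice: first in summer time (UTC+02:00), then in standard time (UTC+01:00).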
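Both cases can be detected before they cause trouble. This is a minimal sketch assuming Java 8+ and java.time. ZoneRules.getValidOffsets returns the offsets a local date-time can have in a zone. An empty list means the time falls into a gap and does not exist. Two entries mean it falls into an overlap and happens twice.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;
import java.util.List;

class DstCheck {
    static final ZoneRules WARSAW = ZoneId.of("Europe/Warsaw").getRules();

    // 0 offsets: gap (the local time never happens)
    // 1 offset:  normal, unambiguous local time
    // 2 offsets: overlap (the local time happens twice)
    static List<ZoneOffset> validOffsets(LocalDateTime local) {
        return WARSAW.getValidOffsets(local);
    }

    public static void main(String[] args) {
        LocalDateTime missing = LocalDateTime.of(2010, 3, 28, 2, 11);
        LocalDateTime twice = LocalDateTime.of(2010, 10, 31, 2, 15);
        System.out.println(missing + " valid offsets: " + validOffsets(missing));
        for (ZoneOffset offset : validOffsets(twice)) {
            // Each candidate offset pins the same wall-clock time to a different UTC instant.
            System.out.println(twice + " at " + offset + " = " + twice.toInstant(offset));
        }
    }
}
```

For the October date the two candidates map to 2010-10-31T00:15:00Z and 2010-10-31T01:15:00Z, matching the two UTC times listed above. Which one is right cannot be decided from the local value alone.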
Tip: To avoid ambiguity, always instantiate and store Date & Time objects in UTC. Convert them to the user's local time before displaying them.
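A minimal sketch of this tip, assuming Java 8+ and java.time. The stored value is an Instant, which is always a point on the UTC time line. The user's zone is applied only when the value is formatted for display. The class name and the display pattern are illustrative choices.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

class StoreUtcDisplayLocal {
    // Display pattern includes the offset so the reader can tell the two 02:15s apart.
    private static final DateTimeFormatter DISPLAY = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm xxx");

    // The zone is applied at the last moment, when rendering for a particular user.
    static String display(Instant stored, ZoneId userZone) {
        return DISPLAY.format(stored.atZone(userZone));
    }

    public static void main(String[] args) {
        Instant first = Instant.parse("2010-10-31T00:15:00Z");
        Instant second = first.plusSeconds(3600);
        ZoneId warsaw = ZoneId.of("Europe/Warsaw");
        ZoneId losAngeles = ZoneId.of("America/Los_Angeles");
        System.out.println(display(first, warsaw) + " / " + display(first, losAngeles));
        System.out.println(display(second, warsaw) + " / " + display(second, losAngeles));
    }
}
```

In Warsaw both stored values display as 02:15, one at +02:00 and one at +01:00. Because the instants themselves were stored, no information is lost, and a user in Los Angeles simply sees 17:15 and 18:15 on October 30.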
Now it's time to talk about interchangeability.

ISO 8601, serializing and exchanging date & time values

You have probably noticed that I used a specific date format in my examples. This has something to do with ISO 8601, an international standard that describes interchangeable date & time formats. It thoroughly describes how dates, times, periods and durations should be formatted in order to make them easily exchangeable.
Tip: If you need to serialize date & time values to strings in order to store them or exchange them via the network, always use one of the ISO 8601 formats.
Here is an example of a valid ISO 8601 timestamp:
 2010-10-09T11:22Z
This points to Saturday, October 9, 2010 11:22 UTC.
Tip: Allegedly, YYYY-MM-DDThh:mmZ is the most widely recognized pattern for interchanging date & time values. If you cannot use strongly typed DateTime objects to store or exchange information, use this format instead.
I will explain it further in future posts, however I cannot do that without specific code examples.
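Until then, here is a minimal sketch assuming Java 8+ and java.time. It parses the timestamp above with OffsetDateTime.parse, which accepts ISO 8601 strings without seconds. It then writes the value back in the minute-precision YYYY-MM-DDThh:mmZ form. The class and method names are made up for illustration.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class Iso8601Exchange {
    // "X" prints "Z" for a zero offset; formatting an Instant needs the explicit UTC zone.
    // Note: this pattern drops seconds, so only use it for minute-precision values.
    static final DateTimeFormatter MINUTE_PRECISION =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mmX").withZone(ZoneOffset.UTC);

    // Accepts any ISO 8601 offset date-time, e.g. "2010-10-09T11:22Z" or "2010-10-09T13:22+02:00".
    static Instant parse(String text) {
        return OffsetDateTime.parse(text).toInstant();
    }

    static String serialize(Instant instant) {
        return MINUTE_PRECISION.format(instant);
    }

    public static void main(String[] args) {
        Instant instant = parse("2010-10-09T11:22Z");
        System.out.println(serialize(instant));                                  // minute precision
        System.out.println(instant);                                             // Instant.toString, always with seconds
        System.out.println(instant.atZone(ZoneOffset.UTC).getDayOfWeek());       // day of week in UTC
    }
}
```

This prints 2010-10-09T11:22Z, 2010-10-09T11:22:00Z and SATURDAY. The second line is also valid ISO 8601, only with seconds included. When full precision matters, Instant.toString() and Instant.parse() are the simpler round trip.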

Calendars

So far, we have assumed that there is only one calendar, the Gregorian one. Unfortunately, that assumption is not correct. Have you noticed something strange about this
 9/شوال/1431 Arabic (Saudi Arabia)
example?

The year looks strange, doesn't it? That's because the default calendar in Saudi Arabia is the Islamic calendar.
Tip: Do not assume that the Gregorian calendar is the default for the entire planet. Always present date & time values in accordance with the user's local calendar.
There are also a few other countries that default to non-Gregorian calendars. One of them is Thailand, which defaults to the Thai solar calendar (essentially the Gregorian calendar, except that years are counted in the Buddhist Era, so 2010 CE is 2553 BE). Another is Israel, which defaults to the Hebrew calendar (it may not actually be Israel's official calendar, but that's what you get by default when you install the Hebrew version of Windows 2003, so it is what the user expects to see). The Hebrew calendar brings a few problems of its own. Apart from a totally different year number (according to Wikipedia, Hebrew year 5771 has just begun), a Hebrew year can have 12 or 13 months (and a correspondingly different number of days).
Tip: Never assume that a year has 12 months. It doesn't hold true for all calendars.
I could elaborate further on week numbering, where the year starts and so on, but that seems fairly obvious.
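For the calendars the JDK ships with, here is a minimal sketch assuming Java 8+ and java.time.chrono. It converts one date into the Hijrah (Umm al-Qura) and Thai Buddhist calendars and asks each calendar how many months the year has instead of hard-coding 12. java.time has no Hebrew calendar; for that you would need another library (ICU4J's HebrewCalendar, for example), which this sketch does not use.

```java
import java.time.LocalDate;
import java.time.chrono.ChronoLocalDate;
import java.time.chrono.HijrahDate;
import java.time.chrono.ThaiBuddhistDate;
import java.time.temporal.ChronoField;

class OtherCalendars {
    // Asks the calendar for the valid month range of this date's year instead of assuming 12.
    static long monthsInYear(ChronoLocalDate date) {
        return date.range(ChronoField.MONTH_OF_YEAR).getMaximum();
    }

    public static void main(String[] args) {
        LocalDate iso = LocalDate.of(2010, 10, 8);
        ChronoLocalDate[] dates = {iso, HijrahDate.from(iso), ThaiBuddhistDate.from(iso)};
        for (ChronoLocalDate date : dates) {
            System.out.println(date.getChronology().getId()
                    + ": year " + date.get(ChronoField.YEAR_OF_ERA)
                    + ", months in year: " + monthsInYear(date));
        }
    }
}
```

The same day is year 2010 in the ISO calendar, 1431 in the Hijrah calendar and 2553 in the Thai Buddhist calendar. All three happen to report 12 months, but the point of asking the calendar is that code written this way would not break on a 13-month Hebrew year.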

Pretty time

Q: What's pretty time?
A: Nothing, actually. I named this section after a Java library which allegedly allows for "pretty" timestamp formatting.

Sometimes, instead of this:
2010-09-09 11:44
you want this:
1 month ago
or
3 minutes ago
or
in 10 minutes
or
next year
or
in January
Et cetera.
This is nowhere near an easy task. I have already mentioned target language properties, and there is more than one problem here. Apart from declension (e.g. "in January" would be "w styczniu" in Polish, which uses the locative case), there is the problem of plural forms. In English it is quite easy:
1 minute ago
2 minutes ago
5 minutes ago
However, if you translate it into Polish (or many other languages for that matter):
1 minutę temu
2 minuty temu
5 minut temu
Have you noticed something strange? There is more than one plural form of word "minute" when translated into Polish.
Tip: Never assume the number of plural forms the target language could have.
That's a strong statement, isn't it? Well, it's just because we (the programmers) are not linguists, and therefore we should not make any assumptions. Fortunately, avoiding them is possible.
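To show what "more than one plural form" means in code, here is a minimal sketch in Java. java.time has no plural-selection API, so this hard-codes the CLDR plural rule for Polish integers (one: n = 1; few: n mod 10 in 2..4 and n mod 100 not in 12..14; many: everything else). It is an illustration only. A real application should take rules for every locale from a library such as ICU4J's PluralRules instead of hand-writing them. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

class PolishMinutesAgo {
    // Hand-written CLDR plural selection for non-negative Polish integers:
    //   one:  n == 1                                   -> singular form
    //   few:  n % 10 in 2..4 and n % 100 not in 12..14 -> "minuty"
    //   many: everything else (0, 5-21, 25-31, ...)    -> "minut"
    static String minutesAgo(long n) {
        long mod10 = n % 10;
        long mod100 = n % 100;
        String noun;
        if (n == 1) {
            noun = "minut\u0119";
        } else if (mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14)) {
            noun = "minuty";
        } else {
            noun = "minut";
        }
        return n + " " + noun + " temu";
    }

    // Relative "pretty" timestamp: whole minutes elapsed between two instants.
    static String since(Instant then, Instant now) {
        return minutesAgo(Duration.between(then, now).toMinutes());
    }

    public static void main(String[] args) {
        for (long n : new long[] {1, 2, 5, 12, 21, 22, 25}) {
            System.out.println(minutesAgo(n));
        }
    }
}
```

Note that 21 takes the same form as 5 ("21 minut temu"), while 22 takes the same form as 2 ("22 minuty temu"). English has no such distinction, which is why an "if n == 1" check is not enough.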

Summary

In this post I wrote about cultural differences in date and time representations. I tried to be thorough and concise at the same time.
Due to these cultural implications, parsing and formatting date & time values is nowhere near a simple problem. No wonder almost nobody gets it right (unfortunately, Blogspot is one example: it gave me plenty of choices for how my articles and comments should be timestamped, but none of them was correct; timestamps should be formatted according to the browser's locale).

What's next?

In the future, I will try to explain how to correctly parse and format Date & Time values using Java, C#/.NET and C++. There are tons of i18n issues built into these languages (or their supporting libraries), so stay tuned if you're interested.