Converting to/from Unix Timestamp in C#

Unix Timestamp Conversion before .NET 4.6

Until now, you had to implement conversions to/from Unix time yourself. That actually isn’t hard to do. By definition, Unix time is the number of seconds since 1st January 1970, 00:00:00 UTC. Thus we can convert from a local DateTime to Unix time as follows:

var dateTime = new DateTime(2015, 5, 24, 10, 2, 0, DateTimeKind.Local);
var epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
var unixDateTime = (dateTime.ToUniversalTime() - epoch).TotalSeconds;

My first attempt at converting back to a local DateTime was the following (it is incorrect; see the update after the code):

var timeSpan = TimeSpan.FromSeconds(unixDateTime);
var localDateTime = new DateTime(timeSpan.Ticks).ToLocalTime();

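Update 7th May 2016: This approach gets most of the date right, but the year is wrong. The DateTime(long ticks) constructor counts ticks from 1st January 0001 rather than from the Unix epoch, so the result lands 1969 years too early.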

So please use the following conversion instead:

var timeSpan = TimeSpan.FromSeconds(unixDateTime);
var localDateTime = epoch.Add(timeSpan).ToLocalTime();
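
The two conversions above can be wrapped into a pair of helpers. This is only a minimal sketch: the class name LegacyUnixTime and its method names are made up for illustration, and it assumes you want whole seconds rather than the fractional value that TotalSeconds returns.

```csharp
using System;

public static class LegacyUnixTime
{
    // The Unix epoch: 1st January 1970, 00:00:00 UTC.
    private static readonly DateTime Epoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Converts a DateTime to whole seconds since the epoch.
    // ToUniversalTime() treats a DateTime of Unspecified kind as local time.
    public static long ToUnixSeconds(DateTime dateTime)
    {
        return (long)Math.Floor((dateTime.ToUniversalTime() - Epoch).TotalSeconds);
    }

    // Converts seconds since the epoch back to a UTC DateTime.
    // Adding to the epoch (instead of building a DateTime from raw ticks)
    // is what keeps the year correct.
    public static DateTime FromUnixSeconds(long seconds)
    {
        return Epoch.AddSeconds(seconds);
    }
}
```

Call ToLocalTime() on the result of FromUnixSeconds() if you need local time rather than UTC.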

Unix Timestamp Conversion in .NET 4.6

New methods have been added to support converting DateTime to or from Unix time. The following APIs have been added to DateTimeOffset:

static DateTimeOffset FromUnixTimeSeconds(long seconds)
static DateTimeOffset FromUnixTimeMilliseconds(long milliseconds)
long ToUnixTimeSeconds()
long ToUnixTimeMilliseconds()
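
To get a feel for these four methods, here is a small sketch that uses fixed sample values. The class and field names are invented for this example. It also illustrates that the offset does not affect the result, since Unix time always refers to an instant in UTC.

```csharp
using System;

public static class UnixApiExample
{
    // 24th May 2015, 10:02:00 UTC.
    public static readonly DateTimeOffset SampleUtc =
        new DateTimeOffset(2015, 5, 24, 10, 2, 0, TimeSpan.Zero);

    // The same instant expressed with a +02:00 offset (12:02:00 on the clock).
    public static readonly DateTimeOffset SamplePlusTwo =
        new DateTimeOffset(2015, 5, 24, 12, 2, 0, TimeSpan.FromHours(2));

    public static void Run()
    {
        Console.WriteLine(SampleUtc.ToUnixTimeSeconds());               // 1432461720
        Console.WriteLine(SamplePlusTwo.ToUnixTimeSeconds());           // 1432461720
        Console.WriteLine(SampleUtc.ToUnixTimeMilliseconds());          // 1432461720000
        Console.WriteLine(DateTimeOffset.FromUnixTimeSeconds(0).Year);  // 1970
    }
}
```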

So .NET 4.6 gives us some new methods, but to use them, you’ll first have to convert from DateTime to DateTimeOffset.

First, make sure your project is targeting .NET Framework 4.6 or later.

You can then use the new methods:

var dateTime = new DateTime(2015, 5, 24, 10, 2, 0, DateTimeKind.Local);
var dateTimeOffset = new DateTimeOffset(dateTime);
var unixDateTime = dateTimeOffset.ToUnixTimeSeconds();

…and to change back…

var localDateTimeOffset = DateTimeOffset.FromUnixTimeSeconds(unixDateTime)
    .DateTime.ToLocalTime();
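
If you do this often, the round trip through DateTimeOffset can be hidden behind a few helpers. The following is a sketch under my own naming (UnixTimeHelpers and its methods are hypothetical, not part of .NET). It relies on the LocalDateTime and UtcDateTime properties of DateTimeOffset, which return a DateTime with the corresponding Kind already set.

```csharp
using System;

public static class UnixTimeHelpers
{
    // DateTime -> Unix seconds via DateTimeOffset.
    // The DateTimeOffset(DateTime) constructor uses a zero offset for
    // DateTimeKind.Utc, and the local offset for Local or Unspecified.
    public static long ToUnixSeconds(DateTime dateTime)
    {
        return new DateTimeOffset(dateTime).ToUnixTimeSeconds();
    }

    // Unix seconds -> DateTime in the local time zone (Kind = Local).
    public static DateTime ToLocalDateTime(long unixSeconds)
    {
        return DateTimeOffset.FromUnixTimeSeconds(unixSeconds).LocalDateTime;
    }

    // Unix seconds -> DateTime in UTC (Kind = Utc).
    public static DateTime ToUtcDateTime(long unixSeconds)
    {
        return DateTimeOffset.FromUnixTimeSeconds(unixSeconds).UtcDateTime;
    }
}
```

Be careful with a DateTime whose Kind is Unspecified: it is interpreted as local time, which may not be what you intended.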

Diacritic-insensitive search in C#

When you’re dealing with multiple languages, searching for text can be a little tricky. Using normal string comparison techniques, a search for “Malmo” will not match “Malmö”. Technically it shouldn’t, because the characters are actually different, but it’s a great usability feature to allow people to search for text regardless of diacritics (accents and such).

The Normalization Method

The first idea I had was to strip off the diacritics and simply compare the simplified version of both the query and the text being searched. Using the same example, “Malmö” would become “Malmo” in the text, and so the query would match, since RemoveDiacritics(query) == RemoveDiacritics(text).

The RemoveDiacritics() method is defined in this Stack Overflow answer:

static string RemoveDiacritics(string text) 
{
    var normalizedString = text.Normalize(NormalizationForm.FormD);
    var stringBuilder = new StringBuilder();

    foreach (var c in normalizedString)
    {
        var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
        if (unicodeCategory != UnicodeCategory.NonSpacingMark)
        {
            stringBuilder.Append(c);
        }
    }

    return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
}

As I pointed out in my follow-up question, this approach doesn’t work very well for search. If we run a simple test using the same words from my question…

var words = new List<string>()
{
    "Malmö",
    "München",
    "Åge",
    "Strømsgodset",
    "Kulħadd"
};
var simplifiedWords = words.Select(word => RemoveDiacritics(word)).ToList();

…you’ll notice that it works for accents that decompose into a base letter plus a separate combining mark (as in ö, ü and Å), but not for characters such as ø and ħ, where the stroke is part of the letter itself and Unicode defines no decomposition. Below is the output I got in the Immediate Window (since the Console can’t handle some of the characters with the default encoding):

simplifiedWords
Count = 5
    [0]: "Malmo"
    [1]: "Munchen"
    [2]: "Age"
    [3]: "Strømsgodset"
    [4]: "Kulħadd"

Apart from this, normalization offers no way to simplify ligature-like characters such as æ into the graphically similar ae.

This all makes sense, because technically æ and ae are different characters, as are ħ and h. But from a user’s perspective, it feels pretty natural to be able to interchange them when searching.
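
One way to patch these gaps while still stripping characters is to apply a small hand-written substitution table on top of the normalization step. The sketch below is my own assumption for illustration and is not part of the Stack Overflow answer: the DiacriticFolding class, its Fold() method and the (deliberately incomplete) table are all hypothetical, and a real table would need many more entries.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class DiacriticFolding
{
    // Hypothetical, incomplete table of characters that Unicode
    // normalization leaves untouched.
    private static readonly Dictionary<char, string> Substitutions =
        new Dictionary<char, string>
        {
            { '\u00F8', "o" },  { '\u00D8', "O" },   // ø, Ø
            { '\u0127', "h" },  { '\u0126', "H" },   // ħ, Ħ
            { '\u00E6', "ae" }, { '\u00C6', "AE" }   // æ, Æ
        };

    public static string Fold(string text)
    {
        var builder = new StringBuilder(text.Length);

        foreach (var c in text.Normalize(NormalizationForm.FormD))
        {
            // Drop the combining marks produced by the decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
            {
                continue;
            }

            // Replace characters from the table; keep everything else as-is.
            string replacement;
            if (Substitutions.TryGetValue(c, out replacement))
            {
                builder.Append(replacement);
            }
            else
            {
                builder.Append(c);
            }
        }

        return builder.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this table, Fold("Strømsgodset") gives "Stromsgodset" and Fold("Kulħadd") gives "Kulhadd", but any character missing from the table still slips through unchanged.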

The Collation Method

The answer to my question shows that it is actually pretty easy to do diacritic-insensitive search in C#, without any stripping at all. You only need to pass CompareOptions.IgnoreNonSpace to the culture-aware string comparison methods. Here’s an example from that same answer:

int ix = CultureInfo.CurrentCulture.CompareInfo.IndexOf(
    "Ad aeternitatem", 
    "æter", 
    CompareOptions.IgnoreNonSpace); // 3

Here’s the same thing applied to one of my original examples:

int ix = CultureInfo.CurrentCulture.CompareInfo.IndexOf(
    "Kulħadd",
    "hadd",
    CompareOptions.IgnoreNonSpace); // returns 3

This other answer shows string.Compare() being used instead, with the same flag:

string s1 = "hello";
string s2 = "héllo";

if (String.Compare(s1, s2, CultureInfo.CurrentCulture,
    CompareOptions.IgnoreNonSpace) == 0)
{
    // both strings are equal
}

In either case, combine the flag with CompareOptions.IgnoreCase (using the | operator) to make the comparison case-insensitive as well.
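
Putting the two flags together, a search helper might look like the sketch below. The InsensitiveSearch class and its method names are hypothetical. The culture is passed in explicitly because the outcome depends on that culture's collation rules and on the collation data available on the platform, so treat results for less common characters as something to verify on your own system.

```csharp
using System;
using System.Globalization;

public static class InsensitiveSearch
{
    // Ignore both diacritics (non-spacing marks) and case.
    private const CompareOptions Options =
        CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase;

    // Returns true if query occurs anywhere in text.
    public static bool Contains(string text, string query, CultureInfo culture)
    {
        return culture.CompareInfo.IndexOf(text, query, Options) >= 0;
    }

    // Returns true if the two strings are equal under the same options.
    public static bool AreEqual(string a, string b, CultureInfo culture)
    {
        return string.Compare(a, b, culture, Options) == 0;
    }
}
```

For example, InsensitiveSearch.Contains("Malmö FF", "malmo", CultureInfo.InvariantCulture) should report a match.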


Gigi Labs

Monthly Archives: May 2015

Converting to/from Unix Timestamp in C#


Diacritic-insensitive search in C#
