How do we handle timestamp formatting errors on large data imports? [closed]

Question:

We have a large database with millions of rows of data from ship AIS pings.

We were recently formatting and importing the data from our holding relation, ping_raw, into a ping relation.

During our last attempt the query failed with: ERROR: invalid value "2.01" for "YYYY", SQL state: 22007. We are trying to import a text string (ping_raw.datetime_pos) into a timestamp with time zone field, ping.dt_pos, using the to_timestamp() function (in PostgreSQL 9.3).
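For reference, the import looks roughly like this (a sketch; the exact INSERT is paraphrased, but the table and column names are from our schema):

    -- Paraphrased import: parse each raw text value against the
    -- expected format and write it into the typed column.
    INSERT INTO ping (dt_pos)
    SELECT to_timestamp(datetime_pos, 'YYYY-MM-DD HH24:MI:SS')
    FROM   ping_raw;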

Last thing: we searched for any offending values with SELECT * FROM ping_raw WHERE datetime_pos LIKE '%2.0%', which returned 0 rows. As far as we can tell, all the text values in ping_raw are formatted YYYY-MM-DD HH24:MI:SS.
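A stricter check along the same lines (a sketch; the regular expression encodes the format we expect):

    -- List any rows whose text does not match the expected
    -- YYYY-MM-DD HH24:MI:SS shape; these are the values that
    -- to_timestamp() would reject.
    SELECT datetime_pos
    FROM   ping_raw
    WHERE  datetime_pos !~ '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$';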

Why is this error coming up? How can we successfully convert the text values in ping_raw.datetime_pos into the timestamp with time zone column ping.dt_pos?

Answer:

We figured it out.

Some of our timestamp text data was imported in scientific notation (e.g. 2.0160924E+13). This threw an error, as the . in those values isn't covered by the template format we passed to to_timestamp().
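The failure is easy to reproduce (a sketch): the parser grabs the first four characters for the year field, gets "2.01", and can't read it as a number.

    -- Reproducing the error: YYYY consumes "2.01", which is not a
    -- valid year.
    SELECT to_timestamp('2.0160924E+13', 'YYYY-MM-DD HH24:MI:SS');
    -- ERROR:  invalid value "2.01" for "YYYY"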

Our solution was to pull the datetime from another column that contained more or less duplicate data.

The point is: the to_timestamp() function in PostgreSQL is very particular and requires incoming data to match the format you define. If you're having a similar issue, run a few tests to figure out where your data doesn't line up with what the function expects.
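A defensive version of the import (a sketch, reusing our table and column names) parses only rows that match the expected shape and leaves the rest NULL so they can be triaged separately:

    -- Convert only well-formed values; malformed ones become NULL
    -- instead of aborting the whole import.
    INSERT INTO ping (dt_pos)
    SELECT CASE
             WHEN datetime_pos ~ '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$'
             THEN to_timestamp(datetime_pos, 'YYYY-MM-DD HH24:MI:SS')
           END
    FROM   ping_raw;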

We also saw hints that Postgres may eventually support a conversion from scientific notation here. We couldn't find anything in 9.3 that does, but it seems to be on their list of things to implement.
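That said, the numeric type already accepts scientific notation, so one workaround (a sketch, assuming the E+13 values expand to a YYYYMMDDHH24MISS digit string) is to route the text through numeric first:

    -- Expand the scientific notation to plain digits via numeric,
    -- then parse those digits as a timestamp.
    SELECT to_timestamp(
             to_char('2.0160924E+13'::numeric, 'FM99999999999999'),
             'YYYYMMDDHH24MISS'
           );
    -- 2016-09-24 00:00:00 in the session time zone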
