Question :
I’ve got a postgres 8.4 database with a few mangled characters in one of the columns on a table. For example, the word “don’t” gets displayed within psql
as:
donâu0080u0099t
I want to be able to dig out these odd characters and replace them with simple '
or "
alternatives. However, I’m having a bit of a job getting either ~
or LIKE
to be able to pull the correct rows in, let alone be able to replace them.
The database is UTF8 and the client_encoding
is also set to UTF8. I’ve tried selecting u
string and x
strings, but can’t seem to get anything reliable. For example, all of the following queries return 0 rows or ERROR: invalid byte sequence for encoding "UTF8": 0x80
select col from tab where col like '%u0080u0099%';
select col from tab where col ~'u0080u0099';
select col from tab where col like '%x8099%';
select col from tab where col ~'x8099';
Any ideas?
Answer :
If what you see is donâu0080u0099t
then those are literal backslashes in your table, not escape sequences.
Try to double the backslashes to get literal ones:
select col from tab where col ~~ E'%\u0080%';
And use the proper E''
syntax for PostgreSQL 8.4 with that.