Special characters displayed wrongly #794

Closed
grasdk opened this issue Dec 1, 2023 · 3 comments

grasdk commented Dec 1, 2023

Describe the bug

The Danish letters Ææ, Øø and Åå, the German ü, and probably other non-ASCII characters are displayed incorrectly in the user interface when they come from "Region Person Display Name" or "Region Name"; I am not sure which of the two is actually read. The metadata was written by DigiKam 8.1.0 and, as far as I can tell, is stored as UTF-8.

See the screenshot and the attached photo (borrowed from Wikimedia).

exiftool output:

$ exiftool -codedcharacterset 2023-08-27-120000-example.jpg
Coded Character Set             : UTF8
$ exiftool -Region* -Keywords -XP* 2023-08-27-120000-example.jpg
Region Person Display Name      : Person Carrying ChildInYellowDress, Pærsøn Åkessün Æñtestå, RedHairedPerson SittingOnBench
Region Rectangle                : 0.611242, 0.453747, 0.0214017, 0.0391972, 0.226011, 0.500157, 0.019285, 0.0319849, 0.365945, 0.486359, 0.00470367, 0.00940734
Region Applied To Dimensions H  : 3189
Region Applied To Dimensions Unit: pixel
Region Applied To Dimensions W  : 4252
Region Area H                   : 0.0391972, 0.0319849, 0.00940734
Region Area Unit                : normalized, normalized, normalized
Region Area W                   : 0.0214017, 0.019285, 0.00470367
Region Area X                   : 0.621943, 0.235654, 0.368297
Region Area Y                   : 0.473346, 0.516149, 0.491063
Region Name                     : Person Carrying ChildInYellowDress, Pærsøn Åkessün Æñtestå, RedHairedPerson SittingOnBench
Region Type                     : Face, Face, Face
Keywords                        : Holiday, RedHairedPerson SittingOnBench, Person Carrying ChildInYellowDress, Pærsøn Åkessün Æñtestå
XP Keywords                     : Holiday;RedHairedPerson SittingOnBench;Person Carrying ChildInYellowDress;Pærsøn Åkessün Æñtestå

Photo/video (optional) that causes the bug

2023-08-27-120000-example

Screenshot

image

Note that the Keywords or XP Keywords are displayed correctly.

Used app version:

  • docker:latest
@bpatrik bpatrik added the bug label Dec 2, 2023
@bpatrik bpatrik added this to the Next (probably v2.5) milestone Dec 2, 2023

grasdk commented Dec 3, 2023

Did some further testing. I saved more person metadata to the photo using "Tag That Photo". With this, PiGallery displays the names correctly.

2023-08-27-120000-example-ttp

Once the data is rewritten by exiftool, PiGallery displays it wrongly. This happens with both the exiftool Windows executable and the Ubuntu version under WSL.

WSL (ubuntu)

$ cp 2023-08-27-120000-example-ttp.jpg 2023-08-27-120000-example-ttp-exifcopy.jpg
$ exiftool -all= -tagsfromfile @ -all:all -IPTC:All -XMP:All -ColorSpaceTags -F -codedcharacterset=utf8 2023-08-27-120000-example-ttp-exifcopy.jpg

2023-08-27-120000-example-ttp-exifcopy

cmd.exe (windows 10)

>copy 2023-08-27-120000-example-ttp.jpg 2023-08-27-120000-example-ttp-exifwincopy.jpg
>exiftool -all= -tagsfromfile @ -all:all -IPTC:All -XMP:All -ColorSpaceTags -F -codedcharacterset=utf8 2023-08-27-120000-example-ttp-exifwincopy.jpg

2023-08-27-120000-example-ttp-exifwincopy

When sorting and comparing the exif data as displayed by exiftool, there are no differences.

This is confusing, because I think "Tag That Photo" uses exiftool under the hood.


grasdk commented Dec 13, 2023

I had the chance to play around a bit.

Converting the variable "name" in line 487 of MetadataLoader.ts from ASCII to UTF-8 seems to fix the problem, at least as seen in the log. Without this conversion, the same wrong characters show up in the log as in the UI.
https://github.com/bpatrik/pigallery2/blob/3489f1d55ad4b7a5e83149887c665f7a5beddef0/src/backend/model/fileaccess/MetadataLoader.ts#L487C18-L487C18

				Logger.info(LOG_TAG, 'name:                                     ' + name);
				Logger.info(LOG_TAG, 'name converted from ascii to utf-8:       ' + Buffer.from(name, 'ascii').toString('utf-8'));
				Logger.info(LOG_TAG, 'name converted from ascii to utf-8 twice: ' + Buffer.from(Buffer.from(name, 'ascii').toString('utf-8'), 'ascii').toString('utf-8'));

the output is:
image

So it could be that the library that reads the metadata assumes the value is ASCII-encoded, which is why the conversion works. According to https://exiftool.org/TagNames/MWG.html, the EXIF specification defines these string values as ASCII, whereas the MWG recommends storing them as UTF-8, and exiftool follows the MWG recommendation. This may be the cause of the assumed ASCII format:

Contrary to the EXIF specification, the MWG recommends that EXIF "ASCII" string values be stored as UTF-8. To honour this, the exiftool application sets the default internal EXIF string encoding to "UTF8" when the MWG module is loaded, but via the API this must be done manually by setting the CharsetEXIF option.
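
A minimal sketch of the conversion tried in the log statements above, assuming the garbling comes from UTF-8 bytes being decoded one byte per character. In Node.js, `Buffer.from(str, 'ascii')` encodes a string the same way as `'latin1'`, so the sketch uses `'latin1'` explicitly. The helper name `repairLatin1Mojibake` is hypothetical and not part of pigallery2, and this is not necessarily how the problem should be fixed in the loader.

```typescript
// Hypothetical helper (not part of pigallery2): undo text whose UTF-8 bytes
// were decoded as one character per byte (Latin-1 style).
function repairLatin1Mojibake(value: string): string {
  // A byte-per-character decode can only produce code units <= 0xFF,
  // so anything above that means the string was not garbled this way.
  for (let i = 0; i < value.length; i++) {
    if (value.charCodeAt(i) > 0xff) {
      return value;
    }
  }
  // Turn each character back into a single byte, then decode the bytes as UTF-8.
  const repaired = Buffer.from(value, 'latin1').toString('utf-8');
  // U+FFFD marks an invalid UTF-8 sequence: the input was already correct text
  // (for example a properly decoded "Pærsøn"), so return it unchanged.
  return repaired.includes('\uFFFD') ? value : repaired;
}

// Usage: simulate the assumed failure mode, then repair it.
const stored = 'Pærsøn Åkessün Æñtestå';
const garbled = Buffer.from(stored, 'utf-8').toString('latin1');
console.log(repairLatin1Mojibake(garbled) === stored); // true
```

The U+FFFD check is there so that names which were already decoded correctly, such as those written by "Tag That Photo", pass through unchanged instead of being converted a second time.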

I'm not yet comfortable enough with the code to suggest a solution and create a pull request with a correction, but I wanted to share my findings.

grasdk added a commit to grasdk/pigallery2 that referenced this issue Feb 5, 2024
bpatrik added a commit that referenced this issue Feb 8, 2024
@grasdk grasdk closed this as completed Mar 21, 2024

grasdk commented Mar 21, 2024

Fixed with #826
