Zotero PDF reader character value shifting problem

Problem: In the Zotero PDF reader, the code values of some Korean characters are shifted. As a result, the wrong characters are copied into an item note by the 'Add Item from Annotations' function.
For example, the first line of the abstract in the PDF file at the link below demonstrates the problem.
http://dx.doi.org/10.5762/KAIS.2014.15.11.6922

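I tried to narrow it down by converting a line containing shifted characters into other representations. Each shifted character differs from the correct one as follows:
In UTF-8 code units, the middle byte is lower by 4
In UTF-16 code units, the value is lower by 0x100
In decimal, the value is lower by 256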
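
The three differences above can be reproduced with a short Python sketch. This is only an illustration of the arithmetic, not a fix; the name describe_shift is made up for this example, and the character pairs are the ones taken from the sample line in this post.

```python
# Pairs of (correct, shifted) characters taken from the sample line.
pairs = [("현", "프"), ("전", "위"), ("적", "윁")]


def describe_shift(good: str, bad: str) -> dict:
    """Compare two single characters in the three representations used here."""
    good_utf8 = good.encode("utf-8")
    bad_utf8 = bad.encode("utf-8")
    return {
        # Difference of the code points in decimal.
        "code_point_delta": ord(bad) - ord(good),
        # Byte-by-byte difference of the UTF-8 encodings.
        "utf8_byte_deltas": [b - g for g, b in zip(good_utf8, bad_utf8)],
        # These characters are in the BMP, so one UTF-16 code unit each.
        "utf16": (f"{ord(good):04X}", f"{ord(bad):04X}"),
    }


for good, bad in pairs:
    print(good, bad, describe_shift(good, bad))
```

For every pair the code point difference is -256 (0x100), which shows up as -4 in the middle UTF-8 byte because that byte carries bits 6 to 11 of the code point.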

This is all I have found so far. I have no idea what causes the shift or how to fix it.
Converting the PDF file to PDF/A, PDF/X, or PDF/E with Adobe Acrobat Pro fixes the problem, but it introduces misplaced spaces instead.

Format of each block below
Line 1: text copied from another PDF reader (correct)
Line 2: text copied from the Zotero PDF reader (incorrect)
Line 3: the original characters
Line 4: the corresponding shifted characters

Text
현대 과학기술과 컴퓨터공학의 발전은 정보통신 분야의 비약적 발전과
프대 과학기술과 컴퓨터공학의 발위은 정보통신 분야의 비약윁 발위과
현 | 전 | 적 | 전
프 | 위 | 윁 | 위

UTF-8 code units
ED 98 84 EB 8C 80 20 EA B3 BC ED 95 99 EA B8 B0 EC 88 A0 EA B3 BC 20 EC BB B4 ED 93 A8 ED 84 B0 EA B3 B5 ED 95 99 EC 9D 98 20 EB B0 9C EC A0 84 EC 9D 80 20 EC A0 95 EB B3 B4 ED 86 B5 EC 8B A0 20 EB B6 84 EC 95 BC EC 9D 98 20 EB B9 84 EC 95 BD EC A0 81 20 EB B0 9C EC A0 84 EA B3 BC
ED 94 84 EB 8C 80 20 EA B3 BC ED 95 99 EA B8 B0 EC 88 A0 EA B3 BC 20 EC BB B4 ED 93 A8 ED 84 B0 EA B3 B5 ED 95 99 EC 9D 98 20 EB B0 9C EC 9C 84 EC 9D 80 20 EC A0 95 EB B3 B4 ED 86 B5 EC 8B A0 20 EB B6 84 EC 95 BC EC 9D 98 20 EB B9 84 EC 95 BD EC 9C 81 20 EB B0 9C EC 9C 84 EA B3 BC
ED 98 84 | EC A0 84 | EC A0 81 | EC A0 84
ED 94 84 | EC 9C 84 | EC 9C 81 | EC 9C 84

UTF-16 code units
D604 B300 0020 ACFC D559 AE30 C220 ACFC 0020 CEF4 D4E8 D130 ACF5 D559 C758 0020 BC1C C804 C740 0020 C815 BCF4 D1B5 C2E0 0020 BD84 C57C C758 0020 BE44 C57D C801 0020 BC1C C804 ACFC
D504 B300 0020 ACFC D559 AE30 C220 ACFC 0020 CEF4 D4E8 D130 ACF5 D559 C758 0020 BC1C C704 C740 0020 C815 BCF4 D1B5 C2E0 0020 BD84 C57C C758 0020 BE44 C57D C701 0020 BC1C C704 ACFC
D604 | C804 | C801 | C804
D504 | C704 | C701 | C704

Decimal
54788 45824 32 44284 54617 44592 49696 44284 32 52980 54504 53552 44277 54617 51032 32 48156 51204 51008 32 51221 48372 53685 49888 32 48516 50556 51032 32 48708 50557 51201 32 48156 51204 44284
54532 45824 32 44284 54617 44592 49696 44284 32 52980 54504 53552 44277 54617 51032 32 48156 50948 51008 32 51221 48372 53685 49888 32 48516 50556 51032 32 48708 50557 50945 32 48156 50948 44284
54788 | 51204 | 51201 | 51204
54532 | 50948 | 50945 | 50948
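
To locate the shifted characters without comparing the dumps by eye, the two copied lines can be diffed position by position. The following is a minimal sketch that assumes both lines have the same length and differ only by substituted characters, which holds for this sample; find_shifts is a hypothetical helper name.

```python
import unicodedata

# Line 1: copied from another PDF reader. Line 2: copied from the Zotero PDF reader.
good_line = "현대 과학기술과 컴퓨터공학의 발전은 정보통신 분야의 비약적 발전과"
bad_line = "프대 과학기술과 컴퓨터공학의 발위은 정보통신 분야의 비약윁 발위과"


def find_shifts(good: str, bad: str) -> list:
    """Return (index, correct char, shifted char, code point delta) per mismatch."""
    # Normalize to NFC so each Hangul syllable is a single code point.
    good = unicodedata.normalize("NFC", good)
    bad = unicodedata.normalize("NFC", bad)
    return [
        (i, g, b, ord(b) - ord(g))
        for i, (g, b) in enumerate(zip(good, bad))
        if g != b
    ]


for index, g, b, delta in find_shifts(good_line, bad_line):
    print(f"position {index}: {g} (U+{ord(g):04X}) -> {b} (U+{ord(b):04X}), delta {delta}")
```

On this sample, only four positions differ, and each one has the same delta of -256.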
  • This is fixed in a newer version of PDF.js (the library underlying the Zotero PDF reader), but we'll upgrade only after releasing Zotero 7.