Problems with text in beta annotation extraction
The new annotation function produces text that is not as readable as the output of the old ZotFile extraction tool.
* Importantly, the new function does not handle line breaks well.
From the Zotfile extraction:
Chapter 4 A burgeoning credit economy (note on p.104)
"In the 1920s, outstanding debt again doubled, increasing from $3.3 billion in 1920 to over $7.6 billion in 1929. About half as much debt was outstanding in 1933 ($3.9 billion), after which time debt again grew rapidly, doubling in just six years." (Olney 1991:104)
"Debt as a percentage of income increased only gradually before World War I, as seen in figure 4.1, but then doubled in the 1920s." (Olney 1991:109)
"Cash purchase of many major durable goods commanded a large share of the average household's disposable income. In the 1920s, a Chevrolet cost about 20 percent of annual household income, and a Chrysler could cost over 60" (Olney 1991:113)
From the new function:
(Olney, 1991, p. 104) Chapter 4 A burgeoning credit economy
“In the 1920s, outstanding debt again doubled, increas- 0 C\l 0 ,.... m @ @ @ @ @ g g ,.... ,.... ,.... much debt was outstanding in 1933 ($3 .9 billion), after which time debt .,...; Q) O> O> O> >- ,.... ,.... ,.... ,.... ,.... ,.... ,.... ,.... ,.... ,.... ,.... again grew rapidly, doubling in just six years.” (Olney, 1991, p. 104)
In the 1920s, outstanding debt again doubled, increas-
ing from $3.3 billion in 1920 to over $7.6 billion in 1929. About half as
much debt was outstanding in 1933 ($3.9 billion), after which time debt
again grew rapidly, doubling in just six years.
“1 Debt as a percent::::> .2 O>(;:t> (/) <13 t:: ·- ~ c: > 0 al .s age of income increased only gradually before World War I, as seen in o_ EE (.) <13 "ffi y; . -- figure 4.1, but then doubled in the 1920s.” (Olney, 1991, p. 109)
. 1 Debt as a percent-
age of income increased only gradually before World War I, as seen in
figure 4.1, but then doubled in the 1920s.
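For illustration, the clean-up needed to turn the hard-wrapped lines above (e.g. "increas-" / "ing", "percent-" / "age") into running text like ZotFile's can be sketched in Python. This is a minimal heuristic under stated assumptions, not Zotero's or ZotFile's actual code: the function name `join_wrapped_lines` is hypothetical, and it assumes that a hyphen at the end of a line, directly after a letter, marks a word split by the PDF's line wrapping.

```python
import re


def join_wrapped_lines(text: str) -> str:
    """Join hard-wrapped PDF lines into a single paragraph (heuristic sketch).

    Assumption: a line ending in a hyphen right after a letter is a word that
    was split across lines ("increas-" + "ing" -> "increasing"). Every other
    line break becomes a single space. This will also merge genuine hyphenated
    compounds that happen to break at their hyphen.
    """
    # Drop "letter + hyphen + line break" so the two halves of the word join.
    text = re.sub(r"(?<=[A-Za-z])-\s*\n\s*", "", text)
    # Collapse all remaining whitespace runs (including newlines) to one space.
    return re.sub(r"\s+", " ", text).strip()


sample = "doubled, increas-\ning from $3.3 billion in 1920. About half as\nmuch debt"
print(join_wrapped_lines(sample))
```

Mid-line hyphens such as "growth-the" are left alone, because only a hyphen followed by a line break is removed.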
Using PDFxchange and Zotfile:
"our government is destroying two vital instruments of that growth-the system of contract rights and the large corporation." (Jensen and Meckling 1978:31)
"The courts have often taken the lead in revoking private rights, especially in the civil rights arena" (Jensen and Meckling 1978:31)
Using PDFxchange and the built-in extractor:
Annotations(2/4/2022, 9:14:00 AM)
“our government is destroying two vital instruments of that growth-the system of contract rights and the large corporation.” our government is destroying two vital instruments of that growth-the system of contract rights and the large corporation.
“The courts have often taken the lead in revoking private rights, especially in the civil rights arena” The courts have often taken the lead in revoking private rights, especially in the civil rights arena
What's an example of the line break issue?
The breaks seem to be in the duplicated comment text. See below:
Annotations(2/4/2022, 9:29:30 AM)
“If a foreign creditor is so kind as to wait his time and buy the bullion as it comes into the country, he may be paid without troubling the Bank or distressing the money market. The German Government has recently been so kind”
If a foreign creditor
is so kind as to wait his time and buy the bullion
as it comes into the country, he may be paid
without troubling the Bank or distressing the
money market. The German Government has
recently been so kind
I'm guessing ZotFile automatically ignores comments that match the highlighted text, perhaps after normalizing whitespace (the output above suggests it does something like that). We've encountered this before with other PDF readers, so we've created a ticket to do something similar when parsing annotations.
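A minimal sketch of that kind of check, in Python, might look like the following. It is an assumption about the approach, not ZotFile's confirmed behaviour or Zotero's implementation: the names `normalize` and `comment_to_keep` are hypothetical, and "matching" here means exact equality after Unicode normalization, rejoining line-end hyphenation, collapsing whitespace, and stripping surrounding quotes.

```python
import re
import unicodedata
from typing import Optional


def normalize(text: str) -> str:
    """Reduce annotation text to a comparable form (hypothetical helper)."""
    # Fold Unicode compatibility forms (e.g. ligatures such as "ﬁ").
    text = unicodedata.normalize("NFKC", text)
    # Rejoin words split by a hyphen at the end of a line.
    text = re.sub(r"(?<=[A-Za-z])-\s*\n\s*", "", text)
    # Collapse remaining whitespace, including line breaks, to single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Ignore straight or curly quotes wrapped around the whole text.
    return text.strip("\"“”'‘’ ")


def comment_to_keep(highlight: str, comment: Optional[str]) -> Optional[str]:
    """Return the comment, or None if it is empty or just repeats the highlight."""
    if not comment:
        return None
    if normalize(comment) == normalize(highlight):
        return None
    return comment
```

An exact comparison after normalization keeps any comment that adds even a few words of its own, at the cost of not catching copies that differ in other small ways (for example, OCR noise).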