Figure 3 | Source Code for Biology and Medicine

Figure 3

From: Layout-aware text extraction from full-text PDF of scientific articles

Text flow interruptions. Image A in the figure above is a snippet of text that PDF2Text extracted from the corresponding PDF file (shown in image B). The red arrows on the extracted text mark breaks in text flow that PDF2Text introduces because it cannot discount formatting embellishments such as footers. Our evaluation of text extraction accuracy quantifies the effect of such flow interruptions on the quality of the output text produced by both PDF2Text and LA-PDFText.