Figure 3 | Source Code for Biology and Medicine

From: Layout-aware text extraction from full-text PDF of scientific articles

Figure 3

Text flow interruptions. Image A is a snippet of text extracted by PDF2Text from the corresponding PDF file, shown in image B. The red arrows on the extracted text mark a break in text flow that PDF2Text introduces because it cannot discount formatting embellishments such as footers. Our evaluation of text extraction accuracy quantifies the effect of such flow interruptions on the quality of the output text produced by both PDF2Text and LA-PDFText.
