Figure 3
From: Layout-aware text extraction from full-text PDF of scientific articles

Text Flow Interruptions. The image (A) in the figure above is a snippet of text extracted from the corresponding PDF file (shown in image B) by PDF2Text. The red arrows on the extracted text mark a break in text flow generated by PDF2Text owing to its inability to discount formatting embellishments like footers. Our evaluation of text extraction accuracy quantifies the effect of such flow-interruption on the quality of the output text produced by both PDF2Text and LA-PDFText.