Figure 2 | Source Code for Biology and Medicine

Figure 2

From: Layout-aware text extraction from full-text PDF of scientific articles


Flexibility of the block identification algorithm. The image on the left shows page 2 of the Nature editorial section in Volume 466, Issue 7303, a page containing two distinct articles. The image on the right is an example of the debug output generated by LA-PDFText. Our block detection algorithm identifies the text blocks in the right column of the page as distinct blocks, which allows the subsequent block classification step to apply rules that treat these blocks as parts of different articles.
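The behaviour in the caption, where two stacked text regions in one column come out as separate blocks, can be illustrated with a generic proximity-based grouping. The sketch below is not LA-PDFText's implementation. It assumes each text chunk is an axis-aligned bounding box and that two chunks belong to the same block when the gap between them is at most a fixed `max_gap`. The names `Box`, `group_blocks`, and `max_gap`, as well as the coordinates, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box of a text chunk (page units, y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float


def _close(a: Box, b: Box, max_gap: float) -> bool:
    # Gap along each axis; 0 when the boxes overlap on that axis.
    dx = max(0.0, max(a.x0, b.x0) - min(a.x1, b.x1))
    dy = max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))
    return dx <= max_gap and dy <= max_gap


def group_blocks(boxes: list[Box], max_gap: float) -> list[list[int]]:
    """Group chunk indices into blocks. Two chunks share a block if a chain of
    pairwise 'close' chunks connects them (union-find over all pairs)."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if _close(boxes[i], boxes[j], max_gap):
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical page: one left-column block, and a right column that holds
# two articles separated by a large vertical gap.
boxes = [
    Box(50, 100, 250, 110),   # left column, line 1
    Box(50, 112, 250, 122),   # left column, line 2
    Box(300, 100, 500, 110),  # right column, article A
    Box(300, 112, 500, 122),
    Box(300, 200, 500, 210),  # right column, article B
    Box(300, 212, 500, 222),
]
print(group_blocks(boxes, max_gap=5.0))  # [[0, 1], [2, 3], [4, 5]]
```

With a small threshold, the right column splits into two blocks at the gap between the articles. A very large threshold would merge the whole page into one block. A fixed `max_gap` is the simplest assumption here. A real system would more plausibly derive the threshold from the page's own line spacing.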
