Franco Pasut Blog

Posts

Showing posts with the label PDF

A Python program to copy text from various PDFs and collect it into a single document in Markdown language.

May 01, 2024

1. Subject of this article. 2. Python source analysis. 3. The full source code in Python.

1. Subject of this article.

The goal is to write a simple program that collects the text contained in various PDFs generated directly from word processing programs and inserts the fragments into a single Markdown document, separating them with second-level headings that correspond to the names of the source documents. The "hands-on" solution is to copy the text from the individual documents, one by one, and paste it into a second document. Alternatively, you could build a simple Python application that does all the work automatically, saving an amount of time directly proportional to the number of documents to be processed. The script, generated with the help of Copilot (because I am not a programmer), requires the installation of the PyPDF2 library. Warning: this article does NOT discuss optical character recognition (OCR). For that topic, yo...
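The workflow described in this excerpt can be sketched roughly as follows. This is not the script from the article (its source is not shown here), only a minimal illustration that assumes a recent PyPDF2 release providing `PdfReader`; the function names, the `collected.md` output file and the choice of the file stem as heading are hypothetical.

```python
from pathlib import Path


def extract_pdf_text(pdf_path):
    """Return the text of every page of one PDF, separated by blank lines."""
    # Imported here so that build_markdown() can be used without PyPDF2.
    # Assumption: a PyPDF2 version that exposes PdfReader.
    from PyPDF2 import PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() can yield nothing for pages without a text layer.
    return "\n\n".join((page.extract_text() or "").strip() for page in reader.pages)


def build_markdown(fragments):
    """Join (source_name, text) pairs under second-level Markdown headings."""
    sections = [f"## {name}\n\n{text.strip()}\n" for name, text in fragments]
    return "\n".join(sections)


def collect(folder, output="collected.md"):
    """Hypothetical driver: process every PDF in a folder, sorted by name."""
    pdfs = sorted(Path(folder).glob("*.pdf"))
    markdown = build_markdown((p.stem, extract_pdf_text(p)) for p in pdfs)
    Path(output).write_text(markdown, encoding="utf-8")
    return markdown
```

Calling `collect("my_pdfs")` would write one `## name` section per source file. Keeping the Markdown assembly separate from the extraction step makes it easy to check the output format without any PDF at hand.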

Reducing the size of single or multiple PDF documents in GNU/Linux Bash and Python

March 11, 2024

Abstract: Compression of PDF documents is a useful technique to reduce the space occupied by these files and facilitate their transmission and storage. In this article, starting from a page devoted to compressing single PDFs, I present two methods for compressing multiple PDF documents. The reference page, "Linux shell script to reduce PDF file size" (simple verification required to enter), lets you operate on single PDFs with Bash commands in the GNU/Linux terminal. Building on it, I extended the procedure to operate on multiple PDFs. At the end I present a simple Python application with a graphical interface. I admit that I asked for some help from ChatGPT and Copilot. Table of Contents: 1. The necessary condition. 2. The reference script for size reduction of individual PDFs 2.1. Script analysis and usage. 3. Derived script to operate on multiple PDFs. ...
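As an illustration of extending a single-file procedure to many files, the sketch below loops over a folder and calls an external compressor once per PDF. The excerpt does not name the tool used by the reference script, so Ghostscript (`gs`) and its `pdfwrite` device are an assumption here, as are the function names and the `_small` suffix.

```python
import subprocess
from pathlib import Path


def gs_command(src, dst, quality="ebook"):
    """Build a Ghostscript command line that rewrites src into a smaller dst.

    Assumption: Ghostscript is the compressor; quality is one of its
    PDFSETTINGS presets (screen, ebook, printer, prepress).
    """
    return [
        "gs", "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{quality}", "-dNOPAUSE", "-dQUIET", "-dBATCH",
        f"-sOutputFile={dst}", str(src),
    ]


def compress_folder(folder, suffix="_small"):
    """Compress every PDF in folder, writing name_small.pdf next to each one."""
    for src in sorted(Path(folder).glob("*.pdf")):
        if src.stem.endswith(suffix):
            continue  # skip files produced by a previous run
        dst = src.with_name(f"{src.stem}{suffix}.pdf")
        subprocess.run(gs_command(src, dst), check=True)
```

Writing to a new file instead of overwriting the original keeps the source intact if the compressed result turns out to be unsatisfactory.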

About LaTeX, standalone, PDF and PNG

March 12, 2023

1 LaTeX and the document format 2 Exporting documents from PDF to PNG. 3 Exporting images from PDF to PNG. 4 Examples

1 LaTeX and the document format

As, of course, you all already know (🙂), LaTeX is a language dedicated to the typesetting of documents with state-of-the-art quality. It is, therefore, unnecessary to note that the basics of LaTeX can be learned in about 30 minutes, as illustrated on this page. For the purposes of this article, it is sufficient to recall that sources in LaTeX are composed of a preliminary part, also called the preamble, and the document with the contents. The format of documents generated by LaTeX is defined at the beginning of the preamble with the command \documentclass. The (probably) most widely used classes, such as article, report and book, produce documents in formats with predetermined sizes, specified by specific options, such as A4 (...

Text documents: from PDF to vector images

May 06, 2022

Subject of this article

Recently I needed to convert some documents from PDF format, containing text generated by LaTeX in GNU/Linux operating systems, into vector images. Leaving online conversion services aside, I found three interesting solutions: two in command-line mode (pdf2svg and pdftocairo) and one, very famous, in graphical mode (Inkscape). In this article I report my evaluations, highlighting some differences deriving from the source of the PDF documents and the behaviour of three Linux distributions.

Some interesting references on the subject:
- Exporting .png or .jpg files directly from LaTeX code. Possible?
- LaTeX/Export To Other Formats

pdf2svg

It's a command-line program, very easy to use, reliable and fast. The following is the command scheme: pdf2svg <in file.pdf> <out file.svg> [<page no>] You can specify the number of the page to be exported. Ideal for quick and direct o...
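The command scheme above can also be driven from a script. The following is a small sketch, assuming `pdf2svg` is installed and on the PATH; the two function names are hypothetical helpers, not part of the tool.

```python
import subprocess


def pdf2svg_command(pdf_path, svg_path, page=None):
    """Build the argument list: pdf2svg <in file.pdf> <out file.svg> [<page no>]."""
    cmd = ["pdf2svg", str(pdf_path), str(svg_path)]
    if page is not None:
        cmd.append(str(page))  # optional page number
    return cmd


def export_page(pdf_path, svg_path, page=None):
    """Run pdf2svg and raise CalledProcessError if the conversion fails."""
    subprocess.run(pdf2svg_command(pdf_path, svg_path, page), check=True)
```

For example, `export_page("doc.pdf", "doc-2.svg", page=2)` would request only the second page of a hypothetical `doc.pdf`.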

LaTeX inverse search with SumatraPDF

April 15, 2018

In this article I explain how inverse search (or inverse synchronisation, if you prefer) can work on a Windows operating system between a LaTeX source document, written in a command-line editor, and a PDF reader.