Posts

Powerful OCR system under GNU/Linux for PDF documents managed from command line and with refinement by Vim.

1 Introduction. 2 The installation of components. 3 OCR of PDF documents with “tesseract”: description of steps. 4 The single steps. 5 Everything in one command! 6 And now: Vim with RegEx. 7 In Conclusion.

1 Introduction. The idea came from reading this article about optical character recognition (OCR) of images and PDFs in the GNU/Linux environment, managed from the command line. The PDF documents in question are, of course, those scanned from a paper original, not those saved directly in digital format; for the latter, no OCR is needed. The article is very well written and the end result is very good. I wondered whether it would be possible to combine all the steps into a single text command. In this article I report my solution. I then added some RegEx steps in Vim to reformat the raw result of the optical recognition. Again, I tried to combine several separate formula...

Notes on resolving differences between two documents with the built-in resources of Vim and Emacs

1 What are we talking about? 2 The test documents. 3 Vim and the vimdiff function. 4 The “vimdiff” interface. 5 Emacs and the “ediff” function. 6 How do you use ediff? 7 The “ediff” Interface. 8 Summary table.

1 What are we talking about? In this article, I report my practical experience of using the built-in resources of Vim and Emacs to resolve and undo differences between two documents. I have occasionally updated two documents in such a “messy” way that I could no longer remember which updates to keep in each. Both Vim and Emacs offer very simple and effective built-in tools for checking and resolving such differences. All operations were performed in a GNU/Linux environment, specifically Ubuntu and Arch. I do not mention any additional plugins in the article, although they exist, and I do not address version control with git. I offer no value judgment on the two editors, just a concise exposition of t...

Emacs, portable self-installing configuration with “use-package” and “straight”

1 Emacs, use-package and straight in short. 2 What is configuration exportability? 3 Beyond exportability: “straight”. 4 Configuring “straight”. 5 Configuration of “use-package”. 6 Little example with little analysis. 7 List of applications in my configuration file. 8 And now?

1 Emacs, use-package and straight in short. Emacs uses GNU ELPA (Emacs Lisp Package Archive) as its default archive for installing and updating packages. MELPA (Milkypostman’s Emacs Lisp Package Archive), on the other hand, is an unofficial archive with many additional packages; it must be added by following the directions on this page. The original methods for installing packages are as follows: M-x package-install <RET> package-name <RET>; or M-x package-list-packages <RET>, followed by i on the desired packages and x to actually install the selected packages. To update packages you open the package-list above...

About LaTeX, standalone, PDF and PNG

1 LaTeX and the document format. 2 Exporting documents from PDF to PNG. 3 Exporting images from PDF to PNG. 4 Examples.

1 LaTeX and the document format. As, of course, you all already know (😃), LaTeX is a language dedicated to typesetting documents with state-of-the-art quality. It is hardly necessary to note that the basics of LaTeX can be learned in about 30 minutes, as illustrated on this page. For the purposes of this article, it is enough to recall that LaTeX sources consist of a preliminary part, also called the preamble, and the document with the contents. The format of the documents LaTeX generates is defined at the beginning of the preamble with the command \documentclass. The (probably) most widely used classes, such as article, report and book, produce documents in formats with predetermined sizes, set by class options, such as A4 (...

A study on the export and import of musical scores between LilyPond and MuseScore via MusicXML

1 About LilyPond and MuseScore. 2 Is it possible to convert score formats directly between both applications? 3 First attempt: from LilyPond to MuseScore with python-ly. 4 Second attempt: from LilyPond to MuseScore via MIDI files. 5 Third attempt: from LilyPond to MuseScore via optical PDF score recognition. 6 Fourth attempt: from LilyPond to MuseScore via original Audiveris. 7 Summary.

1 About LilyPond and MuseScore. LilyPond and MuseScore are the two most interesting open source projects dedicated to music notation. LilyPond is a markup language, compatible with LaTeX (see my article on LaTeX and LilyPond for writing text and music), that allows you to write musical scores of high graphical quality using text characters. Among other things, this makes it usable by AI systems, such as ChatGPT, to write auto-generated scores. MuseScore has...

Vim and the Markdown preview

1 Preamble. 2 Previewing Vim in Markdown with the “Livedown” plugin. 3 Previewing Vim in Markdown with the “Vim Markdown Preview” plugin. 4 Simple solution without plugins.

1 Preamble. Vim is an excellent editor for Markdown, both for its own “native” features and for the possibility of adding specific functions via “plugins.” Unless you use the “manual” solution described at the end of this article, previewing documents requires a dedicated “plugin.” Until a few years ago, my favorite was “iamcco/markdown-preview.vim” because it provided a true real-time live effect while typing text, even before periodic saving. Unfortunately, that project has been abandoned since February 2020. However, there are interesting alternatives, which are the subject of these notes. The operating system mainly used for this article is Arch Linux, but there are indications for other GNU/Linux distributions as well. 2 Previewing...

Place side-by-side or overlapping images in GIMP with automatic Container adjustment

1 What is GIMP? 2 Side-by-side and overlapping images. 3 Canvas (Container) larger than the overall image. 4 Overall image larger than the Canvas (Container). 5 Exporting the result.

1 What is GIMP? GIMP is an excellent cross-platform image editor, available for Windows, macOS and any GNU/Linux distribution. It is full-featured software, strictly open source and free. It is perfect for creating a “collage” of overlapping or side-by-side images, even of different sizes, and exporting the result as a single image in .jpg or .png format.

2 Side-by-side and overlapping images. I have repeatedly needed to “blend” several images side by side or overlapping. The usual advice online for doing this in GIMP is to enlarge the “Canvas” by hand to make room for each image after the first. For example: after opening a 100x100 px image in GIMP, to place another one of the same size next to it, it is usually advised to create an add...

Converting documents from the .tex format of LaTeX to the .docx format of MS Word

Table of Contents: 1. Preface on LaTeX and MS Word. 2. Conversion through htlatex. 3. Conversion through Pandoc. 4. Document export examples.

1. Preface on LaTeX and MS Word. I have been using LaTeX for about thirty years to write documents of all kinds: court documents, reports, research, projects and more. Once past the challenging initial learning curve, one is very unlikely to go back to "traditional" word processing systems: the extraordinary typographic quality and considerable time savings become indispensable. Sometimes, however, it is necessary to share material written in LaTeX with friends and colleagues who have not yet had the opportunity to appreciate it. In such cases, the text must be converted into a format that word processing software can read. I am referring mainly to the .docx format, typical of MS Word but also readable by other similar systems, such as LibreOffice Writer, which I used for this...