random, infrequent posts about R and programming.

mediocre economist.
aspiring data scientist.
ultimate player.

## Cropping a Multiple Page PDF Using Command Line (macOS)

### pdfcrop command

I learned about pdfcrop in a Stack Overflow post. pdfcrop is a tool that can crop multiple-page PDFs (not to be confused with multiple PDFs). Discovering it was memorable enough that I thought it warranted a post.

### Installation

The full version of MacTeX comes with a command line tool called pdfcrop. See if you have it by typing `which pdfcrop`; if it is installed, this prints a path such as `/Library/TeX/texbin/pdfcrop`. If you don’t have it, it can be installed from TeX Live:
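Separately from installation, here is a minimal sketch of what the cropping step itself might look like once pdfcrop is on your PATH. The helper name `crop_pdf`, the file names, and the 5bp margin are all assumptions made for illustration; the only pdfcrop features used are its `--margins` option and its `input.pdf output.pdf` argument order.

```shell
# Hypothetical helper: crop every page of a multi-page PDF with pdfcrop.
# File names and the 5bp margin are illustrative assumptions.
crop_pdf() {
    input="$1"
    # If no output name is given, mirror pdfcrop's default "<input>-crop.pdf".
    output="${2:-${input%.pdf}-crop.pdf}"

    # Bail out early if pdfcrop is not available.
    if ! command -v pdfcrop >/dev/null 2>&1; then
        echo "pdfcrop not found on PATH" >&2
        return 127
    fi

    # --margins takes "<left> <top> <right> <bottom>" in bp units.
    pdfcrop --margins "5 5 5 5" "$input" "$output"
}

# Usage (assumes slides.pdf exists in the current directory):
# crop_pdf slides.pdf slides-cropped.pdf
```

Because pdfcrop processes every page of the input, a single call covers the whole multi-page document.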

## xmltools package to help convert XML data to tidy data frames

I created a new, small package called xmltools that helps simplify the process of converting XML data into tidy data frames. It has not yet been tested on a ton of XML files so it may have some bugs. I also have not created any tests. But, at least for me, it helps drastically cut down on the code I have to write to get the data I want from an XML file.

## Thinking in highcharter - How to build any Highcharts plot in R

RStudio’s Mine Çetinkaya-Rundel had a post about the highcharter package, a wrapper for the Highcharts JavaScript library that lets you create super sweet interactive charts in R. Joshua Kunst’s highcharter package has become my go-to plotting package once I reach the production phase and know I will be using HTML.

## RMarkdown (.Rmd) to MS Word (.docx) aka rmarkdown2docx

This post is just a copy of the README.md file for the repo https://github.com/dantonnoriega/rmarkdown2docx. But it’s got everything you need to get your R Markdown file (.Rmd) to a clean, useful MS Word file (.docx).

### Description

This set of scripts helps convert the output of Rmd files to docx files. It does so by creating a clean HTML file, then opening, converting, and saving the HTML to docx using AppleScript and Microsoft Word.

## Converting Unsearchable PDF Files (aka PDF scans) to Raw Text Using Command Line Tools convert and tesseract

Often, one gets a PDF file that is a scan of a book or text, which cannot be searched (boo!). A good (but not perfect) solution is to use Optical Character Recognition (OCR) to convert the PDF to a text file and search that instead. Here is my solution.

### Requirements

Command line tools:

- convert
- tesseract

I installed both using Homebrew. I’m using Mac OS X 10.
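To give a feel for the general idea, here is a minimal sketch of a PDF-to-text pipeline built from those two tools. It is not necessarily the exact set of commands used in the full post: the helper name `pdf_to_text`, the 300 dpi density, the intermediate TIFF, and the file names are all assumptions. It also assumes `convert` is the ImageMagick command, which needs Ghostscript installed to read PDFs.

```shell
# Hypothetical helper: OCR a scanned PDF into a plain-text file.
# Assumes ImageMagick's `convert` and `tesseract` are on PATH;
# the 300 dpi density and file names are illustrative choices.
pdf_to_text() {
    pdf="$1"
    base="${pdf%.pdf}"

    # Check that both tools are available before doing any work.
    for tool in convert tesseract; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool not found on PATH" >&2
            return 127
        fi
    done

    # Rasterize the PDF pages into a TIFF at a resolution suited to OCR.
    convert -density 300 "$pdf" -depth 8 "$base.tiff" || return 1

    # tesseract appends .txt to the output base name it is given.
    tesseract "$base.tiff" "$base"
}

# Usage (assumes scan.pdf exists in the current directory):
# pdf_to_text scan.pdf && grep -i "keyword" scan.txt
```

The resulting text file can then be searched with grep or any editor. OCR quality depends heavily on the scan, so expect some misread characters.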