random, infrequent posts about R and programming.

mediocre economist.
aspiring data scientist.
ultimate player.

pdfcrop command
I learned about pdfcrop from a Stack Overflow post. pdfcrop is a tool that crops multi-page PDFs (not to be confused with multiple PDFs). Discovering it was memorable enough that I thought it warranted a post.
Installation
The full version of MacTeX comes with a command line tool called pdfcrop. See if you have it by typing:
$ which pdfcrop
/Library/TeX/texbin/pdfcrop
If you don’t have it, it can be installed from TeX Live:
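
This sketch does not cover installation; it only wraps the PATH check above and a basic crop call in one function. The function name `crop_pdf`, the file names, and the default margin of 0 are assumptions for illustration, not something from the post.

```shell
#!/bin/sh
# Sketch: crop every page of a PDF with pdfcrop, if it is installed.
# crop_pdf, the file names, and the default margin are hypothetical.

crop_pdf() {
    # $1 = input PDF, $2 = output PDF, $3 = extra margin in bp (optional)
    margins="${3:-0}"

    # Same idea as `which pdfcrop`, but portable across POSIX shells.
    if ! command -v pdfcrop >/dev/null 2>&1; then
        echo "pdfcrop not found on PATH" >&2
        return 127
    fi

    # A single number after --margins is applied to all four sides.
    pdfcrop --margins "$margins" "$1" "$2"
}

# Example (hypothetical file names):
# crop_pdf scan.pdf scan-cropped.pdf 5
```

Returning 127 when the tool is missing mirrors the shell's own "command not found" status, so a calling script can treat both cases the same way.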

I created a new, small package called xmltools that helps simplify the process of converting XML data into tidy data frames. It has not yet been tested on a ton of XML files so it may have some bugs. I also have not created any tests. But, at least for me, it helps drastically cut down on the code I have to write to get the data I want from an XML file.

RStudio’s Mine Çetinkaya-Rundel had a post about the highcharter package, a wrapper for the Highcharts JavaScript library that lets you create super sweet interactive charts in R. Joshua Kunst’s highcharter package has become my go-to plotting package once I reach the production phase and know I will be using HTML.

Often, one gets a PDF file that is a scan of a book or text, which cannot be searched (boo!). A good (but not perfect) solution is to use Optical Character Recognition (OCR) to convert the PDF to a txt file and search that instead. Here is my solution.
Requirements
Command line tools:
convert
tesseract
I installed both using Homebrew. I’m using Mac OS X 10.
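
The excerpt stops before the actual commands, so what follows is only a rough sketch of how the two tools could be chained, not necessarily the post's own solution. The function name `pdf_to_txt`, the 300 dpi density, and the file names are assumptions.

```shell
#!/bin/sh
# Sketch: OCR a scanned PDF into a searchable text file.
# Uses the two tools named above: convert (ImageMagick) and tesseract.
# pdf_to_txt, the 300 dpi density, and the file names are hypothetical.

pdf_to_txt() {
    pdf="$1"
    txt="$2"
    workdir=$(mktemp -d) || return 1

    # Rasterize each PDF page to its own PNG (page-0000.png, page-0001.png, ...).
    convert -density 300 "$pdf" "$workdir/page-%04d.png" || {
        rm -rf "$workdir"
        return 1
    }

    # OCR each page; tesseract appends ".txt" to the output base it is given.
    for img in "$workdir"/page-*.png; do
        tesseract "$img" "${img%.png}" >/dev/null 2>&1 || {
            rm -rf "$workdir"
            return 1
        }
    done

    # Zero-padded names keep the glob in page order.
    cat "$workdir"/page-*.txt > "$txt"
    rm -rf "$workdir"
}

# Example (hypothetical file names):
# pdf_to_txt book.pdf book.txt
# grep -n -i "keyword" book.txt
```

Working page by page keeps each tesseract call small, and the resulting txt file can be searched with any ordinary tool such as grep.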

Using highlight.js
Need to install the desired syntax from https://highlightjs.org/download/
Extract the file highlight.pack.js
Replace the file in themes/hyde-y/static/highlight.pack.js
Then put the proper language name on the code chunk, e.g. {r} for print("hey") or shell for less some_file.txt
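
The replace step can be scripted roughly like this. It is a minimal sketch that assumes highlight.pack.js has already been extracted from the download; the function name `install_highlight_pack`, the site-root argument, and the `.bak` backup are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: swap the theme's highlight.pack.js for a freshly downloaded one.
# install_highlight_pack and the .bak backup are hypothetical additions.

install_highlight_pack() {
    pack="$1"          # path to the extracted highlight.pack.js
    site="${2:-.}"     # site root (assumed to be the current directory)
    dest="$site/themes/hyde-y/static/highlight.pack.js"

    if [ ! -f "$pack" ]; then
        echo "no such file: $pack" >&2
        return 1
    fi

    # Keep a copy of the theme's bundled file before overwriting it.
    if [ -f "$dest" ]; then
        cp "$dest" "$dest.bak" || return 1
    fi

    cp "$pack" "$dest"
}

# Example (hypothetical download location):
# install_highlight_pack ~/Downloads/highlight/highlight.pack.js .
```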