Saturday, January 18, 2014

Converting plots to data

It is a problem that occurs every so often in applied work: you have a plot, but you want the data. There are at least two programs that can help here: PlotDigitizer and Engauge Digitizer. I got both on my openSUSE machine. Both are also available for Windows; for Mac there are only older versions of Engauge.

I tried these programs on a relatively simple problem. I saw a plot in a book and wanted to calculate that line myself. So I took my camera, photographed the plot and got to work.



Engauge Digitizer

Engauge has been around for quite a while. It has many features, but looks a bit outdated. It was not able to import my original photo (2992*2992 pixels, 694 KB), but had no problems after I resized it to 500*500 pixels (55.9 KB).
It is clearly the program that can handle the more exotic plots. For me, however, it is not intuitive. For instance, it took me quite some time to figure out how to export the results. Initially I copy-pasted the results into a spreadsheet; later I managed to create a .csv file after all. Engauge comes with a manual, so everything can be resolved. Engauge can also detect points automatically. To use that, it is probably best to crop the figure as much as possible, because Engauge has no qualms about finding "points" in text, black blobs, axis labels and such. Automatic detection would probably work better on a colored plot, and there are some settings to guide it.

PlotDigitizer

PlotDigitizer looks much more modern. It had no problems with the large photo, except that it could not scale the photo down enough to fit on the screen. The interface lets you add, remove and move points manually. You can also trace a line on screen, and it will add the points it detects along it. PlotDigitizer exports to .xml; it is also possible to copy-paste the results. While I see the advantage of a file that includes documentation, it would also be nice to be able to get the data out of the file directly.

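The file I got needed some extra processing before I had a data.frame:
library(XML)
# Parse the PlotDigitizer export and turn the XML tree into a nested R list
mytree <- xmlTreeParse('test12.xml')
mylist <- xmlToList(mytree)
# In this file the first three entries are not data points; keep the rest
mylist2 <- mylist[4:length(mylist)]
# Each point is a named character vector (including dx and dy);
# stack them into a character matrix with one row per point
mydf <- do.call(rbind, mylist2)
convert <- data.frame(x = as.numeric(mydf[, 'dx']),
                      y = as.numeric(mydf[, 'dy']))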
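Behind the scenes, a digitizer has to map pixel positions to data values, which is what marking known points on the axes is for. For linear axes this is just a straight-line rescaling per axis. Below is a minimal sketch in R, assuming linear (not logarithmic) axes and two reference ticks per axis; the function pixel_to_data and all pixel and tick values are made up for illustration and are not taken from either program.

```r
# Hypothetical helper: map pixel coordinates to data coordinates on a
# linear axis, given two reference points whose pixel position (p_ref)
# and data value (v_ref) are both known, e.g. two labelled tick marks.
pixel_to_data <- function(p, p_ref, v_ref) {
  # data units per pixel (negative when the axis runs against image direction)
  slope <- (v_ref[2] - v_ref[1]) / (p_ref[2] - p_ref[1])
  v_ref[1] + (p - p_ref[1]) * slope
}

# Made-up calibration: x tick 0 at pixel 50, x tick 10 at pixel 450;
# y tick 0 at pixel 460, y tick 100 at pixel 60 (image y grows downwards).
px <- c(50, 250, 450)
py <- c(460, 260, 60)
digitized <- data.frame(
  x = pixel_to_data(px, p_ref = c(50, 450), v_ref = c(0, 10)),
  y = pixel_to_data(py, p_ref = c(460, 60), v_ref = c(0, 100))
)
print(digitized)  # x: 0, 5, 10; y: 0, 50, 100
```

With this made-up calibration one pixel corresponds to 0.025 units on x and 0.25 units on y, which gives a rough idea of how precise clicked points can be.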

Conclusion

The programs complement each other. Engauge is great for automated extraction and complex plots; however, it is not so easy for occasional use. PlotDigitizer is easy to use and great if you want to select your points manually.

7 comments:

  1. Correct the spelling of "Converting" in the title.

  2. I have had good luck with WebPlotDigitizer: http://arohatgi.info/WebPlotDigitizer/

  3. There are other quite good data extractors out there. My favorite is DataThief, http://datathief.org/ . Don't waste your time slogging through R code to do what a dedicated tool does much better and faster.

  4. I've been looking into these programs. You're right, it's a common problem. Engauge is open source (C++), which may help anyone who needs to adapt the algorithms for specific scenarios. The interesting thing is that it seems simple at first (scatterplots of a few solid circles are indeed easier), but the more you think about how to code it to deal with a range of images, the harder it gets!

  5. Neat tools for a common problem. Just wanted to add that one should always think about the errors introduced by digitizing, and about how reliable the numbers one gets through these tools are, not to mention the original (often unseen or unreported) errors in the data themselves.
    Thanks for the post.