Re: [dev] sple - A simple PDF links/emails extractor.

From: Jason Woofenden <jason_AT_jasonwoof.com>
Date: Wed, 6 May 2015 23:19:04 -0400

Hi Hypsurus,

I hope you're having fun coding. Don't let me detract from that.
But if you just need to extract links from PDFs, you can do so with
existing tools, e.g.:

pdftohtml -stdout foo.pdf | sed -ne 's/\(^\|\n\)\n\([^\n]*\)\n[^\n]*/\1\2/gp; t; s/href="\([^"]\+\)"/\n\n\1\n/g; D'

Sorry if that sed thing is more complex than it needs to be. I'm
just learning the other sed commands besides s///.

The extra complexity with the "\n"s is to handle multiple links on
the same line.
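
For comparison, here is a rough sketch of a simpler route to the same
one-link-per-line output. It swaps the newline juggling for grep's -o
option, which prints every match on its own line, so several links on one
input line need no special handling. This assumes a grep that supports -o
(GNU and BSD grep do; it is not in POSIX). The function name extract_links
and the sample HTML are made up for illustration, and the printf line only
stands in for real pdftohtml output.

```shell
# Hypothetical helper: print each href value found on stdin, one per line.
# Assumes grep supports -o (print only the matching part of each line).
extract_links() {
    grep -o 'href="[^"][^"]*"' | sed -e 's/^href="//' -e 's/"$//'
}

# Sample input with two links on one line, standing in for the output of
# `pdftohtml -stdout foo.pdf`.
printf '%s\n' '<a href="http://a.example/">a</a> <a href="mailto:b@example.org">b</a>' \
    | extract_links
```

With a real file this would presumably be run as
"pdftohtml -stdout foo.pdf | extract_links".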

-- 
Jason
Received on Thu May 07 2015 - 05:19:04 CEST
