pdf-parser.py

This post is part of a series categorized as “Wiki” that contains basic how-to information. The intent is to build a reference repository for myself, but if anyone else can benefit from it, I’m happy to share the knowledge!

  • OS: Linux/Windows
  • Description: Examine structure of PDF and look at its contents
Helpful Options:
 -a   display stats for the PDF
 -d   dump stream contents to a file (takes a filename)
 -f   pass stream through its filters; supported filters:
        FlateDecode
        ASCIIHexDecode
        ASCII85Decode
        LZWDecode
        RunLengthDecode
 -H   hash objects
 -o   select object by ID
 -s   search for string in objects (streams are not searched)
      --searchstream search for string in streams
 -w   raw output for data and filters
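
To make the -f option more concrete, here is a minimal sketch of what three of the filters listed above do to stream data. This is not pdf-parser.py's own code. It uses only the Python standard library, and the function names (flate_decode, ascii_hex_decode, ascii85_decode, apply_filters) are made up for illustration. LZWDecode and RunLengthDecode are left out to keep it short.

```python
import base64
import zlib


def flate_decode(data: bytes) -> bytes:
    """FlateDecode: zlib/deflate decompression."""
    return zlib.decompress(data)


def ascii_hex_decode(data: bytes) -> bytes:
    """ASCIIHexDecode: hex digits, whitespace ignored, '>' ends the data."""
    text = data.decode("ascii").split(">", 1)[0]
    digits = "".join(text.split())
    if len(digits) % 2:
        digits += "0"  # an odd trailing digit is treated as if followed by 0
    return bytes.fromhex(digits)


def ascii85_decode(data: bytes) -> bytes:
    """ASCII85Decode: Adobe flavour of base-85, terminated by '~>'."""
    return base64.a85decode(data.strip(), adobe=True)


# Hypothetical lookup table: PDF filter name -> decoder sketch above.
DECODERS = {
    "FlateDecode": flate_decode,
    "ASCIIHexDecode": ascii_hex_decode,
    "ASCII85Decode": ascii85_decode,
}


def apply_filters(data: bytes, filter_names: list[str]) -> bytes:
    """Apply the decoders in the order the filters are listed."""
    for name in filter_names:
        data = DECODERS[name](data)
    return data


# Build a sample stream: compress first, then ASCII85-encode the result.
raw = b"BT /F1 12 Tf (Hello) Tj ET"
encoded = base64.a85encode(zlib.compress(raw), adobe=True)

# Decoding undoes the layers in the listed order.
decoded = apply_filters(encoded, ["ASCII85Decode", "FlateDecode"])
print(decoded)
```

A stream that declares several filters is decoded by applying them one after another in the listed order, which is why the sample above undoes ASCII85 before inflating.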