Copying and pasting gives different text - Extracting transactions from encrypted bank statements
Select the text in your PDF bank statement, copy it, then paste it into Notepad. If the pasted text is gibberish or shows different characters from the ones in the PDF, read on.
When reading searchable PDF documents, StatementReader uses the embedded text. In some statements, however, that text has either been embedded with an obscure font or been encrypted to make copying and pasting difficult.
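The copy-and-paste check above can also be approximated in code. The following is a minimal sketch in Python and is not part of StatementReader: the function name `looks_garbled`, the keyword list and the amount pattern are all assumptions made for illustration, and you would adjust them to words and number formats you can actually see on your statement.

```python
import re

# Hypothetical keywords: replace with words visible on your own statement.
EXPECTED_WORDS = ("balance", "date", "account")

# Assumed amount format, e.g. 12.50 or 1,234.56.
AMOUNT = re.compile(r"\d{1,3}(?:,\d{3})*\.\d{2}")


def looks_garbled(text, expected_words=EXPECTED_WORDS):
    """Heuristic: treat pasted text as garbled unless it contains at
    least one expected keyword and one amount-like number."""
    lowered = text.lower()
    has_word = any(word in lowered for word in expected_words)
    has_amount = AMOUNT.search(text) is not None
    return not (has_word and has_amount)
```

Paste the text copied from the PDF into a string and pass it to `looks_garbled`. A result of `True` suggests the embedded text is unusable and the OCR route described here is worth trying. This is only a rough heuristic, so a statement with unusual wording or number formats may be flagged incorrectly.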
To overcome this, you can use a standalone OCR engine so that the text is read by OCR rather than taken from the embedded text. Alternatively, tick the ‘bypass PDF encryption’ box above the ‘Go’ button in StatementReader. Note that if you have already run the job in StatementReader, you will first have to right-click on it and select ‘remove OCR cache’ before running it again.
Here are the steps you can use:
1. Select the matching bank template.
2. Select your input searchable PDF document using the ‘browse’ button.
3. Untick ‘Parse PDF’ above the ‘Go’ button (this will use our external OCR server by default; you can check this from the Options -> Advanced options -> Engine window). Also tick ‘bypass PDF encryption’.
4. Click ‘Go’.
For any other support, you can call us on +44 (0)20 3287 8283 or send us a message.