Remove duplicate lines from text-based files

Discussion in 'Scripting' started by Thomas Dubreuil, Jun 8, 2019.

  1. Thomas Dubreuil

    Thomas Dubreuil MDL Senior Member

    Aug 29, 2017
    260
    392
    10
    #1 Thomas Dubreuil, Jun 8, 2019
    Last edited: Jun 13, 2019 at 12:07
  2. Thomas Dubreuil

    I have now made two different scripts. They are much faster (with the correct regex) and support Unicode encoding.
    (I still don't know what happens with a file that has no BOM, but it should work...)
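
    For anyone curious about the BOM question, here is a minimal Python sketch of BOM-based encoding detection. It is not how the batch scripts or JREPL handle encodings; the function name `guess_encoding` and the UTF-8 fallback for BOM-less files are assumptions made for illustration.

```python
import codecs

# BOM signatures mapped to Python codecs that consume the BOM while decoding.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw: bytes, fallback: str = "utf-8") -> str:
    """Return a codec name based on the BOM, or `fallback` if there is none."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    # No BOM: this is only a guess (assumption), the bytes could be anything.
    return fallback


data = codecs.BOM_UTF16_LE + "héllo".encode("utf-16-le")
print(guess_encoding(data))            # codec picked from the BOM
print(data.decode(guess_encoding(data)))
```

    Without a BOM there is nothing reliable to detect, so a tool has to fall back on a default, which is probably why BOM-less files are the uncertain case.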

    Unlike simple batch scripts, this is a very robust solution: it supports any strange character and is relatively fast.
    It is based on the JREPL script from dbenham (thanks to him for the utility, and also for sharing the correct regex syntax).

    https://github.com/Thdub/Batch_Scripts/blob/master/Utilities/RemoveDuplicateLines.bat
    Removes duplicate lines, blank lines, and lines containing only whitespace from any text-based file.
    Keeps the last occurrence of each duplicated line.
    Usage: just drag and drop your file onto this script.
    Supports files with ASCII, UTF-8 and Unicode character encoding.

    https://github.com/Thdub/Batch_Scripts/blob/master/Utilities/Remove_duplicate_and_blank_lines.bat
    Removes duplicate lines from any text-based file, while preserving blank lines.
    Usage: just drag and drop your file onto this script.
    Keeps the last occurrence of each duplicated line.
    Supports files with ASCII, UTF-8 and Unicode character encoding.
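
    To make the "keeps last occurrence" behaviour concrete, here is a rough Python sketch of the same idea. It is not the batch/JREPL implementation; the function name `dedupe_keep_last` and the `drop_blank` switch are made up for illustration, and it assumes "preserving blank lines" means blank lines are never treated as duplicates of each other.

```python
def dedupe_keep_last(lines, drop_blank=False):
    """Remove duplicate lines, keeping the LAST occurrence and the original order."""
    seen = set()
    kept = []
    # Walk backwards so the first time we meet a line is its last occurrence.
    for line in reversed(lines):
        if not line.strip():
            # Blank or whitespace-only line: drop it or keep it untouched.
            if not drop_blank:
                kept.append(line)
            continue
        if line not in seen:
            seen.add(line)
            kept.append(line)
    kept.reverse()  # restore the original top-to-bottom order
    return kept


sample = ["a", "b", "", "a", "c", "", "b"]
print(dedupe_keep_last(sample))                   # blank lines preserved
print(dedupe_keep_last(sample, drop_blank=True))  # blank lines removed too
```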
     
  3. ohenry

    ohenry MDL Novice

    Aug 10, 2009
    32
    15
    0
    Not to rain on your parade or anything, but why not just use "uniq"? If you're running Windows, you really should have the Cygwin utilities anyway.
     
  4. Thomas Dubreuil

    #4 Thomas Dubreuil, Jun 16, 2019 at 00:50
    Last edited: Jun 16, 2019 at 01:09
    (OP)
    @ohenry Because uniq only removes adjacent duplicate lines, so it only works on sorted files/lists ;) .
    If your lines are not ordered, you have to run "sort" first, which means losing the initial line ordering.
    These scripts keep the initial line ordering, whether the lines are sorted or not.
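
    A small Python illustration of the difference, as a sketch only: `uniq_adjacent` mimics the adjacent-only behaviour of uniq, and `dedupe_keep_last` mimics what the scripts aim for. Both function names are made up for this example.

```python
from itertools import groupby


def uniq_adjacent(lines):
    """Collapse runs of identical neighbouring lines, like uniq does."""
    return [key for key, _ in groupby(lines)]


def dedupe_keep_last(lines):
    """Drop earlier duplicates anywhere in the list, keeping the original order."""
    # dict.fromkeys keeps first-seen order, so build it from the reversed list.
    return list(dict.fromkeys(reversed(lines)))[::-1]


lines = ["c", "a", "c", "c", "b"]
print(uniq_adjacent(lines))          # the non-adjacent "c" survives
print(uniq_adjacent(sorted(lines)))  # all duplicates gone, but order is lost
print(dedupe_keep_last(lines))       # duplicates gone, order kept
```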
     
  5. rayleigh_otter

    rayleigh_otter MDL Senior Member

    Aug 8, 2018
    429
    370
    10
    Thomas, your remove duplicate and blank lines bat, just drag a file onto it and it deletes duplicates?
    Bloody hell :eek:, talk about great timing. Pray it works. :)
     
  6. rayleigh_otter

    I've got the outputs of 6 different privacy tools that I need to separate into HKLM and HKCU, then check for duplicates. ;)
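
    A rough Python sketch of that kind of split, under a big assumption: every line names its hive (HKLM/HKEY_LOCAL_MACHINE or HKCU/HKEY_CURRENT_USER). Real .reg files put values on separate lines below the key header, so this would not work on them as-is. The function name `split_by_hive` and the sample lines are made up.

```python
def split_by_hive(lines):
    """Sort lines into HKLM, HKCU and 'other' buckets based on the hive name."""
    buckets = {"HKLM": [], "HKCU": [], "other": []}
    for line in lines:
        upper = line.upper()
        if "HKLM" in upper or "HKEY_LOCAL_MACHINE" in upper:
            buckets["HKLM"].append(line)
        elif "HKCU" in upper or "HKEY_CURRENT_USER" in upper:
            buckets["HKCU"].append(line)
        else:
            buckets["other"].append(line)
    return buckets


# Hypothetical sample lines, one registry path per line.
sample = [
    r"HKLM\Software\Example\A",
    r"HKCU\Software\Example\B",
    r"HKEY_LOCAL_MACHINE\Software\Example\C",
    "not a registry line",
]
for hive, found in split_by_hive(sample).items():
    print(hive, len(found))
```

    Each bucket can then be written to its own file and checked for duplicates separately.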
     
  7. Thomas Dubreuil

    #7 Thomas Dubreuil, Jun 16, 2019 at 08:37
    Last edited: Jun 16, 2019 at 08:48
    (OP)
    That's the idea, simple drag and drop.
    It works with most kinds of encoding and with strange characters, and it preserves line ordering while keeping the last occurrence of each duplicated line... works fine with .reg ;)

    PS: We can also do that in Notepad++ with a regex (regular expression) search: search for ^(.*?)$\s+?^(?=.*^\1$) with ". matches newline" enabled (the lookahead has to reach later lines) and replace with nothing. But for me drag and drop is sometimes simpler.
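
    To see what that regex does outside Notepad++, here is a small Python sketch. It assumes Python's `re` behaves like Notepad++'s regex engine for this pattern, which should be checked before relying on it; `re.DOTALL` plays the role of ". matches newline", and the text is assumed to use plain \n line endings. The function name `remove_earlier_duplicates` is made up.

```python
import re

# Match a whole line plus the whitespace after it, but only if an identical
# line appears again somewhere later in the text (checked by the lookahead).
DUPLICATE_LINE = re.compile(r"^(.*?)$\s+?^(?=.*^\1$)", re.MULTILINE | re.DOTALL)


def remove_earlier_duplicates(text: str) -> str:
    """Delete every line that is repeated later, so the last occurrence stays."""
    return DUPLICATE_LINE.sub("", text)


print(remove_earlier_duplicates("a\nb\na\nc\n"))
```

    The backtracking lookahead rescans the rest of the text for every line, so this approach is likely to get slow on large files.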

    What is cool about the Notepad++ search is that you can also "mark" lines, then cut/copy/paste the marked lines...
    The Compare plugin for Notepad++ is also a must-have (be sure to update to the latest v2 plugin version, because the vertical scrolling "lock" was broken due to a plugin incompatibility with the latest Notepad++).
     
  8. rayleigh_otter

    I use Notepad2-mod; I've never had to do this before. Saw this thread a day or two ago, glad I bookmarked it. :)
     
  9. rayleigh_otter

    Created a copy of the file, ran it through your bat, and compared both files' properties. It does something alright :) :worthy: