Better "Search for duplicates"

Brought to you by: macteam

#30 Better "Search for duplicates"

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2003-11-05

Created: 2003-11-05

Creator: Anonymous

Private: No

1. Scan a selected volume/HD folder for files/folders
that duplicate something already in the entire collection.
(That could be useful when preparing an MP3 CD
to be burned, to avoid burning duplicates.)
2. A smarter search algorithm (I mean: John
Smith == Jhon Smith == joHn Smith == Jòhn Smith ==
Jihn Smith)
Here is my idea of how to do that:
A. Convert accented letters: ò->o, è->e, ...
B. Convert to lowercase
C. Ignore anything but letters and numbers
D. Count how many times each letter occurs
E. Decide whether two names look alike based on
a "maximum different characters" parameter
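
Steps A to E above could be sketched roughly as follows in Python. This is only an illustration of the idea, not how the application works: the function names (`letter_counts`, `char_difference`, `looks_alike`) and the `max_diff` parameter are made up for this sketch, and accent removal is done via Unicode decomposition, which is one possible way to implement step A.

```python
import unicodedata
from collections import Counter


def letter_counts(name: str) -> Counter:
    """Steps A-D: strip accents, lowercase, keep letters/digits, count."""
    # A. Decompose accented letters and drop the combining marks (ò -> o).
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # B. Lowercase.
    lowered = stripped.lower()
    # C. Ignore anything but letters and numbers.
    kept = (ch for ch in lowered if ch.isalnum())
    # D. Count how many times each character occurs.
    return Counter(kept)


def char_difference(a: str, b: str) -> int:
    """Number of characters present in one name but not the other."""
    ca, cb = letter_counts(a), letter_counts(b)
    # Counter subtraction keeps only positive counts, so this adds up
    # the surplus on each side (e.g. one extra 'o' plus one extra 'i' = 2).
    return sum((ca - cb).values()) + sum((cb - ca).values())


def looks_alike(a: str, b: str, max_diff: int = 0) -> bool:
    """Step E: match if the difference is within the allowed maximum."""
    return char_difference(a, b) <= max_diff
```

One thing to keep in mind: because only letter counts are compared, letter order is ignored, so any two names that are anagrams of each other will also match.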

Example:
1. John Smith -> Converting: John Smith -> Lowercase: john smith -> Ignoring: johnsmith -> Counting characters: h(2) i(1) j(1) m(1) n(1) o(1) s(1) t(1)
2. Jhon Smith -> Converting: Jhon Smith -> Lowercase: jhon smith -> Ignoring: jhonsmith -> Counting characters: h(2) i(1) j(1) m(1) n(1) o(1) s(1) t(1)
3. joHn Smith -> Converting: joHn Smith -> Lowercase: john smith -> Ignoring: johnsmith -> Counting characters: h(2) i(1) j(1) m(1) n(1) o(1) s(1) t(1)
4. Jòhn Smith -> Converting: John Smith -> Lowercase: john smith -> Ignoring: johnsmith -> Counting characters: h(2) i(1) j(1) m(1) n(1) o(1) s(1) t(1)
5. Jihn Smith -> Converting: Jihn Smith -> Lowercase: jihn smith -> Ignoring: jihnsmith -> Counting characters: h(2) i(2) j(1) m(1) n(1) s(1) t(1)

Comparing results, we find that:
A. 1, 2, 3 & 4 match exactly
B. 5 matches 1, 2, 3 & 4 with a 2-character
difference (i++ & o--), and it could easily be
found out that 'o' has been replaced by 'i'.

Thanks.
Bye
