Fantastic Unix Forums  

Windows freeware unique sort technique for large text files (hosts)

  #21 (permalink)  
08-17-2008
Anand Hariharan
Re: Windows freeware unique sort technique for large text files (hosts)

On Sun, 17 Aug 2008 18:46:33 +0200, "B. R. 'BeAr' Ederson"
<br.ederson@expires-2008-08-31.arcornews.de> wrote:

(...)
>
>> The only manual change needed was to move this line back to the top:
>> 127.0.0.1 localhost # this needs to be the first line for some reason
>>
>> I'm digging for the sort command that only sorts from the second line
>> down but haven't found it yet.

>
> The following command line should contain all commands in a one liner:
>
> sed "/127\.0\.0\.1/d" hosts | tr '[A-Z]' '[a-z]' | sort -u | sed "1i127.0.0.1 localhost" > hosts
>


Bad idea. My guess of how the OP is using the hosts file is to set the
IP address of known malicious sites as 127.0.0.1. You'd at least want
to extend your sed's search expression with '[[:space:]]*localhost'
before deleting *ALL* lines that contain 127.0.0.1.
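
To make the difference concrete, here is a minimal sketch that runs both patterns against a made-up hosts file in a scratch directory. The file contents and the output file names are assumptions for illustration only, and the narrower pattern is just one way to pin the match to the localhost entry.

```shell
# Work in a scratch directory so the real hosts file is never touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: one localhost line plus two blocklist entries.
cat > hosts <<'EOF'
127.0.0.1 localhost # keep first
127.0.0.1 ads.example.com
127.0.0.1 tracker.example.net
EOF

# Unanchored pattern from the quoted one-liner: it matches every line here.
sed '/127\.0\.0\.1/d' hosts > all_deleted.txt

# Narrower patterns: optional leading whitespace, the address, whitespace,
# the name "localhost", then either end of line or a trailing comment.
sed -e '/^[[:space:]]*127\.0\.0\.1[[:space:]][[:space:]]*localhost[[:space:]]*$/d' \
    -e '/^[[:space:]]*127\.0\.0\.1[[:space:]][[:space:]]*localhost[[:space:]]*#/d' \
    hosts > blocklist_kept.txt

cat blocklist_kept.txt
```

With the unanchored pattern the result is an empty file; with the narrower one only the localhost line goes and the two blocklist entries survive.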

sort has a -f option, so the tr is not required.



(...)
> Although the above should work fine, it usually is better to create a
> hosts.new file first and rename it afterwards. But that's up to you.
>


That's actually excellent advice, so much so that it should have been
reflected in your command line. Had the OP used your command line above,
he'd have lost all entries that corresponded to malicious web sites in
his hosts file.
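
There is a second reason to prefer a hosts.new file: when a pipeline ends in "> hosts", the shell truncates that file while setting the pipeline up, so the first sed will usually find it already empty. A rough sketch of the write-then-rename approach follows. The sample data and the hosts.bak name are assumptions, and tr is given POSIX character classes instead of the ranges used in the thread.

```shell
# Scratch directory with a made-up hosts file; nothing real is modified.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > hosts <<'EOF'
127.0.0.1 localhost
127.0.0.1 Tracker.Example.net
127.0.0.1 ads.example.com
127.0.0.1 tracker.example.net
EOF

# Build the result under a different name; "hosts" is only read here.
{
    # Put the localhost entry back as the first line.
    printf '127.0.0.1 localhost\n'
    # Drop existing localhost lines, fold case, sort and remove duplicates.
    sed '/^[[:space:]]*127\.0\.0\.1[[:space:]][[:space:]]*localhost[[:space:]]*$/d' hosts |
        tr '[:upper:]' '[:lower:]' |
        LC_ALL=C sort -u
} > hosts.new

# Replace the original only if the new file is non-empty, keeping a backup.
if [ -s hosts.new ]; then
    cp hosts hosts.bak
    mv hosts.new hosts
fi

cat hosts
```

If anything in the pipeline goes wrong, the original file is still there under its own name or as hosts.bak.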


> There are other ways to do the above. I settled with deleting lines
> containing the localhost (127.0.0.1) entries instead of just preserving
> the first line, because the merging of several hosts files may result in
> more than one localhost line...
>


I guess that explains your deleting lines containing 127.0.0.1, but
there are a number of web sites out there (one even given elsewhere in
this thread) that provide HOSTS files that redirect all requests to
known malicious sites to 127.0.0.1.


> HTH.
> BeAr


  #22 (permalink)  
08-17-2008
B. R. 'BeAr' Ederson
Re: Windows freeware unique sort technique for large text files (hosts)

On Sun, 17 Aug 2008 12:07:23 -0700 (PDT), Anand Hariharan wrote:

>> The following command line should contain all commands in a one liner:
>>
>> sed "/127\.0\.0\.1/d" hosts | tr '[A-Z]' '[a-z]' | sort -u | sed "1i127.0.0.1 localhost" > hosts


> Bad idea. My guess of how the OP is using the hosts file is to set the
> IP address of known malicious sites as 127.0.0.1. You'd at least want
> to extend your sed's search expression with '[[:space:]]*localhost' before
> deleting *ALL* lines that contain 127.0.0.1.


You are absolutely right. :-( Actually, it should have been:

sed "/^127\.0\.0\.1/d"...

I thought about adding a filter for possible leading whitespace, since
some hosts files are formatted this way. But the already long command
line got a bit too unreadable. While deleting the whitespace class
operator I must have killed the leading caret by accident... :-(

> sort has a -f option, so the tr is not required.


I used tr, because it already had been suggested in this thread and
will produce nicer looking output. The mixed case result of the sort
process using the -f option is probably harder to look through, if
the need arises.
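
For anyone weighing the two approaches, a small sketch of what each produces. The input file and output names are made up for illustration. Which capitalisation survives with -f depends on the sort implementation, so no particular winner is assumed here.

```shell
# Scratch directory with a made-up list containing a mixed-case duplicate.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > names.txt <<'EOF'
127.0.0.1 Ads.Example.com
127.0.0.1 ads.example.com
127.0.0.1 tracker.example.net
EOF

# Variant 1: fold everything to lower case, then sort and drop duplicates.
tr '[:upper:]' '[:lower:]' < names.txt | LC_ALL=C sort -u > with_tr.txt

# Variant 2: let sort compare case-insensitively; the line that survives
# keeps whatever capitalisation it had in the input.
LC_ALL=C sort -f -u names.txt > with_f.txt

wc -l with_tr.txt with_f.txt
```

Both variants end up with two lines; only the tr variant guarantees uniformly lower-case output.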

> Had the OP used your above command line, he'd have lost all entries that
> corresponded to malicious web-sites in his hosts file.


At least she would have a more manageable hosts file size. ;-)

BeAr
--
===========================================================================
= What do you mean with: "Perfection is always an illusion"?              =
===============================================================--(Oops!)===
  #23 (permalink)  
08-18-2008
Maxwell Lol
Re: Windows freeware unique sort technique for large text files (hosts)

Donita Luddington <doniludd@sbcglobal.net> writes:

> Do you know if sort can be told to sort all but the first line?


Usually people extract the line, and put it back.
Here is a simple way to do this.

head -n 1 hosts >hosts.head
sed '1d' hosts | sort <options> >hosts.rest
mv hosts hosts.backup
cat hosts.head hosts.rest >hosts
rm hosts.head hosts.rest
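
The same extract-and-put-back idea can be written without the two intermediate files. This is only a sketch on made-up data: the sample entries and the choice of sort -u are assumptions, since the sort options above were left open.

```shell
# Scratch directory with a made-up hosts file; nothing real is modified.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > hosts <<'EOF'
127.0.0.1 localhost
127.0.0.1 zeta.example.org
127.0.0.1 alpha.example.org
127.0.0.1 zeta.example.org
EOF

# Print line 1 untouched, then the sorted, de-duplicated remainder.
{
    sed -n '1p' hosts
    sed '1d' hosts | LC_ALL=C sort -u
} > hosts.new

# Keep a copy of the original, then move the new file into place.
cp hosts hosts.backup
mv hosts.new hosts

cat hosts
```

The file is read twice, which is harmless for a hosts file and avoids relying on how much input head consumes when it shares a stream with sort.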

  #24 (permalink)  
08-19-2008
Sashi
Re: Windows freeware unique sort technique for large text files (hosts)

On Aug 17, 10:13 am, Donita Luddington <donil...@sbcglobal.net> wrote:

> What I did was add native Win32 port of the UnixUtils at http://unxutils.sourceforge.net to my WinXP laptop.
>


If you enjoyed doing this, I STRONGLY recommend installing Cygwin on
your PC. Since I discovered Cygwin a few years back, it's one of the
first packages that I install on any Windows machine that I need to
work with.
Almost makes me forget that I'm on Windows!