| Unix Questions General Questions About Unix. |
| On Sat, 2 Aug 2008 13:25:30 -0700, Donita Luddington wrote:
>> There must be a way to uniquify a file from within vi freeware on windows.
>
> I found these pointers for removing duplicate lines in vi
> http://rayninfo.co.uk/vimtips.html
> :%s/^\(.*\)\n\1$/\1/ : delete duplicate lines
>
> http://www.vim.org/tips/tip.php?tip_id=305
> :%s/^\(.*\)\n\1/\1$/ delete duplicate lines
>
> But, executed in vim 7.1 on Windows, this syntax returns an error.

Try this:

    :%s/^\([^\n]*\)\n\1$/\1/

Please note that you have to sort the file *beforehand*! The above will only remove *consecutive* duplicate lines. And if you don't have any consecutive duplicate lines, you *will* get an "error" (Pattern not found).

BeAr
--
===========================================================================
= What do you mean with: "Perfection is always an illusion"? =
===============================================================--(Oops!)=== |
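The same distinction between consecutive and non-consecutive duplicates shows up outside the editor with the standard `uniq` and `sort` tools. What follows is only a minimal sketch, assuming a POSIX shell with `sort` and `uniq` on the PATH (for example from a Unix toolset ported to Windows). The function names and the file name `sample.txt` are made up for illustration and do not come from the thread.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not from the thread.

# Collapse only *adjacent* duplicate lines, like the :%s command above.
dedupe_adjacent() {
    uniq "$1"
}

# Sort first so that all duplicates become adjacent, then collapse them.
dedupe_sorted() {
    sort "$1" | uniq
}

# Small demonstration file with non-adjacent duplicates.
printf '%s\n' 'b.example' 'a.example' 'b.example' 'b.example' > sample.txt

dedupe_adjacent sample.txt   # b.example still appears twice
dedupe_sorted sample.txt     # each name appears once
```

Without the sort step, the `b.example` that comes before `a.example` survives next to the collapsed pair after it, which is why sorting beforehand matters.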
| |||
| Donita Luddington has brought this to us:
> Is there a way, using windows freeware, to sort unique a huge hosts file?

Here is some info that may be of interest.

Hosts File
http://home.comcast.net/~SupportCD/XPMyths.html

Myth - "Special AntiSpyware Hosts Files are necessary to prevent Spyware infections."

Reality - "Using Special AntiSpyware Hosts Files are a waste of time and leads to a false sense of security. Any Malware/Spyware can easily modify the Hosts File at will, even if it is set to Read-only. It is impossible to "lock-down" a Hosts File unless you are running as a limited user which makes using it in this case irrelevant anyway. Various Malware/Spyware uses the Hosts File to redirect your Web Browser to other sites. They can also redirect Windows to use a Hosts File that has nothing to do with the one you keep updating.

The Hosts file is an archaic part of networking setups that was originally meant to be used on a LAN and was the legacy way to look up Domain Names on the ARPANET. It tells a PC the fixed numeric address of the internal server(s) so the PC doesn't have to go looking for them through all possible addresses. It can save time when "discovering" a LAN. I don't consider 1970's ARPANET technology useful against modern Malware/Spyware.

When cleaning Malware/Spyware from a PC, it is much easier to check a clean Hosts File than one filled with thousands of lines of addresses. Considering how easily a Hosts File can be exploited, redirected and potentially block good sites, it is strongly recommended NOT to waste time using Special Hosts Files. Especially when proper Malware/Spyware protection can be achieved by simply using these steps, all without ever using a Hosts File." |
| |||
| On Aug 2, 2:00 pm, Donita Luddington <donil...@sbcglobal.net> wrote:
> Is there a way, using windows freeware, to sort unique a huge hosts file?
>
> I've concatenated all the freeware windows hosts files I can find into a
> single huge fifty-thousand line C:\Windows\System\Drivers\Etc\hosts file
> but the resulting hosts file is so huge, replete with duplicates, that it's
> slowing down windows browsing.
>
> I would like to pare the hosts file to remove duplicates. How?
>
> I tried sorting with windows vim 7.1 freeware but I can't get the unique
> sort option to work inside of vim. What am I doing wrong?
>
> Here is a vim 7.1 command that works inside the huge hosts file:
>   :%!sort (this sorts the huge windows hosts file just fine)
>
> This vim 7.1 sort unique command should work but it does not:
>   :%!sort -u (this is supposed to sort uniquely)
>
> The syntax is:
> <esc>:    (begin a windows vim 7.1 command)
> !sort -u  (run the following command "sort -u" inside of vim freeware)
>
> When I run "<esc>:!sort -u" inside of vim, it pares the hosts file down to
> a single (empty) line.
>
> Is there another free way to sort uniquely a large windows text file?

For the best info on HOSTS files and managing them, I have found this site to be very useful:
http://www.mvps.org/winhelp2002/hosts.htm

Not only do they publish a very capable HOSTS file, they also list free and non-free software that will allow you to manage your HOSTS file. There are also several other tips and tricks there that I find useful.

The Carnie |
| |||
| Donita Luddington wrote:
> Is there a way, using windows freeware, to sort unique a huge hosts file?
>
> I've concatenated all the freeware windows hosts files I can find into a
> single huge fifty-thousand line C:\Windows\System\Drivers\Etc\hosts file
> but the resulting hosts file is so huge, replete with duplicates, that
> it's slowing down windows browsing.
>
> I would like to pare the hosts file to remove duplicates. How?
>
> I tried sorting with windows vim 7.1 freeware but I can't get the unique
> sort option to work inside of vim. What am I doing wrong?
>
> Here is a vim 7.1 command that works inside the huge hosts file:
>   :%!sort (this sorts the huge windows hosts file just fine)
>
> This vim 7.1 sort unique command should work but it does not:
>   :%!sort -u (this is supposed to sort uniquely)
>
> The syntax is:
> <esc>:    (begin a windows vim 7.1 command)
> !sort -u  (run the following command "sort -u" inside of vim freeware)
>
> When I run "<esc>:!sort -u" inside of vim, it pares the hosts file down to
> a single (empty) line.
>
> Is there another free way to sort uniquely a large windows text file?

Unduplicate, downloadable from http://adriancarter.homestead.com/, might be able to do it, depending on how wide the lines in your file are.

You mention a 50,000 line file; to test Unduplicate just now I created ~65,500 random lines in Excel, with a high degree of duplication, then copied them to the clipboard. Unduplicate reduced them to about 9900 unique values in less than 10 seconds. I then created the same data but in ~130,000 lines of a text file, and it didn't take much longer. I'm away from my development setup at present, using an old, slow, early XP system with 250 MB of memory.

The reason I can't give exact timings is that Unduplicate gives no signal after it has done its thing with the clipboard. That is a weakness I intend to remedy as soon as I get back home.

But it will probably work for you - you just have to select all in an editor, copy, click on the Unduplicate tray icon, wait a while, then paste.

--
beerwolf |
| |||
| On Sat, 2 Aug 2008 11:00:26 -0700, Donita Luddington <doniludd@sbcglobal.net> wrote:
> Is there a way, using windows freeware, to sort unique a huge hosts file?
>
> I've concatonated all the freeware windows hosts files I can find into a
> single huge fifty-thousand line C:\Windows\System\Drivers\Etc\hosts file
> but the resulting hosts file is so huge, replete with duplicates, that it's
> slowing down windows browsing.

I suspect that even with removing all the duplicates you'll still end up with a file that's a tad big for the usual hosts lookup implementation. Likely, each lookup will end up reading the entire file line-by-line until the first hit or end-of-file, whichever comes first.

I think you may need to look at a better solution; firefox with adblock for example. I assume but have not verified whether adblock's lookup is faster, mind. I do know that abusing the hosts file for keeping huge blacklists is more likely to hurt than to help, and not just in slowness.

> I would like to pare the hosts file to remove duplicates. How?

The easy way for someone with unix experience is to run it through sort, then uniq. Various editors (emacs, vi(m), probably more) can do it too. Various ways for obtaining a unix toolset have already been mentioned.

There is a freeware windows implementation available of the programming (scripting, really) language ``awk''[awk]. The installation consists of fetching a single executable and putting it somewhere convenient, then running it with the appropriate arguments (program to execute or file where the program to execute resides, input files, perhaps output redirection).

Implementing sort in it would be a bit involved, but an in-place ``uniq'' that doesn't need sorting turns out to be easy. In a dos-box, run:

    awk '!_[$0]++' inputfile > outputfile

On unix shells you may need to escape the !, but I don't think you need to on a windows command line, though I'm not sure just how it handles quoting.

This is a bit of a hack in that it is nigh-on unreadable for a beginner, so let me reassure you that it is entirely possible to write very readable awk programs. It has been deployed with success as a language for non-programmers, in fact.

[awk] http://plan9.bell-labs.com/cm/cs/awkbook/
which links to http://plan9.bell-labs.com/cm/cs/who/bwk/awk95.exe

--
j p d (at) d s b (dot) t u d e l f t (dot) n l .
This message was originally posted on Usenet in plain text. Any other representation, additions, or changes do not have my consent and may be a violation of international copyright law. |
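For readers put off by the terse one-liner, the same idea can be spelled out more readably. This is a sketch only, assuming a POSIX shell and any standard awk. The function name `dedupe_keep_order` and the array name `seen` are illustrative and are not taken from the post.

```shell
#!/bin/sh
# Print each distinct line once, in order of first appearance.
# Same effect as: awk '!_[$0]++' inputfile
dedupe_keep_order() {
    awk '
        {
            # seen[$0] is 0 (empty) the first time a line appears.
            if (seen[$0] == 0) {
                print $0
            }
            seen[$0]++
        }
    ' "$1"
}

# Usage (file names are placeholders):
#   dedupe_keep_order inputfile > outputfile
```

The array holds one entry per distinct line, so memory use grows with the number of unique lines rather than with the size of the file, and no sorting pass is needed.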
| |||
| Hi Guys,

By way of update, I followed Bear's and others' original advice and was able to sort the now fifty-thousand line hosts file in about a second or two on Windows.

What I did was add the native Win32 port of the UnixUtils at http://unxutils.sourceforge.net to my WinXP laptop. This created c:\bin and c:\usr and, more specifically, C:\usr\local\wbin\sort.exe

Thanks to you, this more powerful sort, containing the "unique" and "output" options (-u and -o), is part of my Windows command-line repertoire.

It wasn't at first obvious (to me), but Wikipedia helped with syntax:
http://en.wikipedia.org/wiki/Sort_(Unix)

For others, here's the command to pare down the hosts file with sort after you've combined all those hosts files you can find on the Internet:

Start->Run->cmd

    type c:\windows\system32\drivers\etc\hosts | c:\usr\local\wbin\sort.exe -u -o c:\windows\system32\drivers\etc\hosts

The only manual change needed was to move this line back to the top:

    127.0.0.1 localhost # this needs to be the first line for some reason

Do you know if sort can be told to sort all but the first line? It would be nice if the sort command could sort from line 2 to the end so that the extra step of moving the localhost line wasn't needed. |
| |||
| On Sat, 2 Aug 2008 22:27:18 +0200, B. R. 'BeAr' Ederson wrote:
> (Better set up a dedicated UnxUtils directory with
> entry in the search path, though.)

Thanks Bear! Your unxutils advice worked beautifully.

Start->Run->cmd

    type c:\windows\system32\drivers\etc\hosts | c:\usr\local\wbin\sort.exe -u -o c:\windows\system32\drivers\etc\hosts

The only manual change needed was to move this line back to the top:

    127.0.0.1 localhost # this needs to be the first line for some reason

I'm digging for the sort command that only sorts from the second line down but haven't found it yet. |
| |||
| On Sun, 17 Aug 2008 08:16:14 -0700, Donita Luddington <doniludd@sbcglobal.net> wrote:
(...)
>
> Start->Run->cmd
> type c:\windows\system32\drivers\etc\hosts | c:\usr\local\wbin\sort.exe
> -u -o c:\windows\system32\drivers\etc\hosts
>

That qualifies for a UUOC (well, 'type' in this case).

> The only manual change needed was to move this line back to the top:
> 127.0.0.1 localhost # this needs to be the first line for some reason
>
> I'm digging for the sort command that only sorts from the second line
> down but haven't found it yet.

Am guessing there must be some variant/clone of 'sed' included in UnxUtils. If not, since you are so keen on calling sort from within vim, you can simply do

    :2,$! C:\usr\local\wbin\sort -u

from within a vim session that is editing your hosts file.

- Anand |
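The "leave line 1 alone, sort the rest" idea can also be tried outside vim. The following is a sketch rather than a tested recipe for UnxUtils: it assumes a POSIX shell with `head`, `tail`, and `sort` available, and `sort_keep_first` is a hypothetical name chosen for illustration.

```shell
#!/bin/sh
# Keep the first line in place; sort and de-duplicate lines 2 to the end.
sort_keep_first() {
    head -n 1 "$1"
    tail -n +2 "$1" | sort -u
}

# Usage (placeholder names): write to a new file, then replace the original.
#   sort_keep_first hosts > hosts.new
```

Writing to a separate file avoids reading and overwriting the same file in one command.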
| |||
| On Sun, 17 Aug 2008 08:16:14 -0700, Donita Luddington wrote:
> On Sat, 2 Aug 2008 22:27:18 +0200, B. R. 'BeAr' Ederson wrote:
>
>> (Better set up a dedicated UnxUtils directory with
>> entry in the search path, though.)
>
> Thanks Bear!

You're welcome. :-) Besides, it is BeAr, not Bear. ;-)

> The only manual change needed was to move this line back to the top:
> 127.0.0.1 localhost # this needs to be the first line for some reason
>
> I'm digging for the sort command that only sorts from the second line down
> but haven't found it yet.

The following command line should contain all commands in a one liner:

    sed "/127\.0\.0\.1/d" hosts | tr '[A-Z]' '[a-z]' | sort -u | sed "1i127.0.0.1 localhost" > hosts

If UnxUtils are not part of the PATH search string, all utilities need to be called with their fully qualified names. The "hosts" entries have to be substituted with the full name including directory components, if the command is not executed from the directory containing that hosts file. (Which would be easier...)

Although the above should work fine, it usually is better to create a hosts.new file first and rename it afterwards. But that's up to you.

There are other ways to do the above. I settled on deleting lines containing the localhost (127.0.0.1) entries instead of just preserving the first line, because the merging of several hosts files may result in more than one localhost line...

HTH.
BeAr
--
===========================================================================
= What do you mean with: "Perfection is always an illusion"? =
===============================================================--(Oops!)=== |
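The "write hosts.new first, rename afterwards" variant might look roughly like the sketch below. It assumes a POSIX shell with `tr`, `grep`, and `sort`; the function name `rebuild_hosts` is hypothetical. One deliberate difference from the one-liner in the post: this sketch removes only lines that map 127.0.0.1 to `localhost`, on the assumption that a merged hosts file may also use 127.0.0.1 for its block entries, which a blanket delete would discard.

```shell
#!/bin/sh
# Rebuild a hosts-style file: one localhost line first, then the remaining
# entries lowercased, sorted, and de-duplicated.
rebuild_hosts() {
    src=$1
    dst=$2
    {
        printf '%s\n' '127.0.0.1 localhost'
        # Assumption: only "127.0.0.1 localhost" lines are dropped here,
        # not every line that mentions 127.0.0.1.
        tr 'A-Z' 'a-z' < "$src" |
            grep -v -E '^127\.0\.0\.1[[:space:]]+localhost([[:space:]]|#|$)' |
            sort -u
    } > "$dst"
}

# Usage (placeholder names); rename only after checking the result:
#   rebuild_hosts hosts hosts.new && mv hosts.new hosts
```

Because the output goes to a different file than the input, the input is never truncated while it is still being read.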
| |||
| On Sun, 17 Aug 2008 18:19:56 +0200 (CEST), Anand Hariharan wrote:
>> The only manual change needed was to move this line back to the top:
>> 127.0.0.1 localhost # this needs to be the first line for some reason
>>
>> I'm digging for the sort command that only sorts from the second line
>> down but haven't found it yet.
>
> Am guessing there must be some variant/clone of 'sed' included in
> UnxUtils.

There is. ;-)

BeAr
--
===========================================================================
= What do you mean with: "Perfection is always an illusion"? =
===============================================================--(Oops!)=== |
| Tags |
| freeware, hosts, large, sort, technique, text, unique, windows |