| Unix Shell Programming Post here for discussing in comp.unix.shell newsgroup. |
| |||
I am trying to find an easy and fast way to compare two files, each with several thousand lines - only one column and spit out what is unique only to one of the files.

So, compare file A and file B, and only lines that are unique to file A are spit out to a new file. comm and diff / sort and uniq do not work because in this case the two files will have non-adjacent lines.

Any help is GREATLY appreciated. Thank you in advance!

-TT
| |||
On Wed, 15 Aug 2007 13:53:32 -0000, tntelle@yahoo.com <tntelle@yahoo.com> wrote:
>
> Thank you all ---
> When i attempt awk 'NR==FNR{seen[$0]++; next} !seen[$0]' B A - i get
> awk: syntax error near line 1
> awk: bailing out near line 1
>

You are probably using Solaris Old Broken Awk(tm). Use nawk or
/usr/xpg4/bin/awk.

--
BOFH excuse #331:

those damn raccoons!
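One way to act on that advice without hard-coding a path is to probe for a usable awk first. The following is only a sketch: the variable name AWK, the search order, and the sample files are assumptions made for illustration, and `command -v` and `mktemp -d` may be missing from very old shells.

```shell
#!/bin/sh
# Pick the first awk found, preferring the POSIX one that Solaris ships
# as /usr/xpg4/bin/awk. AWK is a made-up variable name for this sketch.
AWK=
for candidate in /usr/xpg4/bin/awk nawk awk; do
    if command -v "$candidate" >/dev/null 2>&1; then
        AWK=$candidate
        break
    fi
done

# Hypothetical sample data, made up for illustration.
dir=$(mktemp -d)
printf '%s\n' one two three > "$dir/A"
printf '%s\n' two > "$dir/B"

if [ -n "$AWK" ]; then
    # Same one-liner as in the quoted post: lines of A that do not occur in B.
    result=$("$AWK" 'NR==FNR{seen[$0]++; next} !seen[$0]' "$dir/B" "$dir/A")
    printf '%s\n' "$result"
else
    echo "no awk found" >&2
fi
```

On a system where the plain awk comes first in the list and is the old Solaris one, the original syntax error would come back, which is why the xpg4 path is tried first here.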
| |||
<tntelle@yahoo.com> wrote in message news:1187142111.587829.251580@q3g2000prf.googlegroups.com...
> I am trying to find an easy and fast way to compare two files, each
> with several thousand lines - only one column and spit out what is
> unique only to one of the files.
> So, compare file A and file B, and only lines that are unique to file A
> are spit out to a new file. comm and diff / sort and uniq do not
> work because in this case the two files will have non-adjacent lines.
>

cat A A B | sort | uniq -u

awk 'FNR==NR{Seen[$0]++} FNR!=NR && !Seen[$0]' A B

I am not sure what you mean by "non-adjacent lines".
And note that the two solutions above give different results for
lines that appear more than once in B: it is not clear what you want.
Surely solutions based on diff or comm will work if you first sort
A and B?

--
John.
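To make the remark about repeated lines concrete, here is a small sketch that runs both commands on made-up data and then the sort-plus-comm route. The file contents, the temporary directory, and the variable names are assumptions for illustration; `mktemp -d` is not available everywhere.

```shell
#!/bin/sh
# Hypothetical sample data; "date" appears twice in B.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/A"
printf '%s\n' banana date date elderberry > "$dir/B"

# sort/uniq route: every line of A is fed in twice, so uniq -u drops it.
# A line that occurs more than once in B is dropped as well.
uniq_out=$(cat "$dir/A" "$dir/A" "$dir/B" | LC_ALL=C sort | uniq -u)

# awk route with the files given as A B: remember the lines of A,
# then print each line of B that was not remembered, repeats included.
awk_out=$(awk 'FNR==NR{Seen[$0]++} FNR!=NR && !Seen[$0]' "$dir/A" "$dir/B")

# comm route: comm needs sorted input; -23 hides the lines unique to the
# second file and the lines common to both, leaving lines found only in A.
LC_ALL=C sort "$dir/A" > "$dir/A.sorted"
LC_ALL=C sort "$dir/B" > "$dir/B.sorted"
comm_out=$(LC_ALL=C comm -23 "$dir/A.sorted" "$dir/B.sorted")

printf 'uniq -u:\n%s\n\nawk:\n%s\n\ncomm -23:\n%s\n' "$uniq_out" "$awk_out" "$comm_out"
```

With this data the uniq pipeline reports only elderberry, the awk command reports date twice plus elderberry, and comm -23 reports apple and cherry. Both of the first two commands, as written with the files in the order A B, report lines that are only in B.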
| |||
John L wrote:
> <tntelle@yahoo.com> wrote in message news:1187142111.587829.251580@q3g2000prf.googlegroups.com...
> > I am trying to find an easy and fast way to compare two files, each
> > with several thousand lines - only one column and spit out what is
> > unique only to one of the files.
> > So, compare file A and file B, and only lines that are unique to file A
> > are spit out to a new file. comm and diff / sort and uniq do not
> > work because in this case the two files will have non-adjacent lines.
> >
>
> cat A A B | sort | uniq -u
> awk 'FNR==NR{Seen[$0]++} FNR!=NR && !Seen[$0]' A B
>
> I am not sure what you mean by "non-adjacent lines".
> And note that the two solutions above give different results for
> lines that appear more than once in B: it is not clear what you want.
> Surely solutions based on diff or comm will work if you first sort
> A and B?
>
> --
> John.

Since he wants the lines that are in A but not in B,
I think the order of the files should be reversed.

awk 'NR==FNR{seen[$0]++; next} !seen[$0]' B A

Another way:

awk 'NR==FNR{seen[$0]; next} !($0 in seen)' B A

One Ruby solution:

ruby -e 'def lines;gets(nil).split("\n") end; puts lines - lines' A B
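For anyone reading the second awk one-liner for the first time, here it is spelled out with comments and with the result written to a new file, as the original question asked. This is a sketch on made-up data: the file contents and the output name only_in_A are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sample data; the lines are deliberately not in sorted order.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry banana > "$dir/A"
printf '%s\n' date banana elderberry > "$dir/B"

awk '
    # NR counts lines across all input, FNR restarts for each file,
    # so NR == FNR is true only while the first file (B) is being read.
    NR == FNR { seen[$0]; next }   # merely referencing seen[$0] creates the key

    # Reached only for the second file (A): print the line when its
    # text is not a key in seen. The "in" test does not create entries.
    !($0 in seen)
' "$dir/B" "$dir/A" > "$dir/only_in_A"

cat "$dir/only_in_A"
```

The output keeps the order of A and needs no sorting. One caveat: if B is empty, NR == FNR stays true while A is read, so nothing is printed.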
| |||
On Aug 15, 2:37 am, William James <w_a_x_...@yahoo.com> wrote:
> John L wrote:
> > <tnte...@yahoo.com> wrote in message news:1187142111.587829.251580@q3g2000prf.googlegroups.com...
> > > I am trying to find an easy and fast way to compare two files, each
> > > with several thousand lines - only one column and spit out what is
> > > unique only to one of the files.
> > > So, compare file A and file B, and only lines that are unique to file A
> > > are spit out to a new file. comm and diff / sort and uniq do not
> > > work because in this case the two files will have non-adjacent lines.
>
> > cat A A B | sort | uniq -u
> > awk 'FNR==NR{Seen[$0]++} FNR!=NR && !Seen[$0]' A B
>
> > I am not sure what you mean by "non-adjacent lines".
> > And note that the two solutions above give different results for
> > lines that appear more than once in B: it is not clear what you want.
> > Surely solutions based on diff or comm will work if you first sort
> > A and B?
>
> > --
> > John.
>
> Since he wants the lines that are in A but not in B,
> I think the order of the files should be reversed.
>
> awk 'NR==FNR{seen[$0]++; next} !seen[$0]' B A
>
> Another way:
>
> awk 'NR==FNR{seen[$0]; next} !($0 in seen)' B A
>
> One Ruby solution:
>
> ruby -e 'def lines;gets(nil).split("\n") end; puts lines - lines' A B

Thank you all ---
When i attempt awk 'NR==FNR{seen[$0]++; next} !seen[$0]' B A - i get

awk: syntax error near line 1
awk: bailing out near line 1

=/
| |||
On 2007-08-15, tntelle@yahoo.com wrote:
> I am trying to find an easy and fast way to compare two files, each
> with several thousand lines - only one column and spit out what is
> unique only to one of the files.
> So, compare file A and file B, and only lines that are unique to file A
> are spit out to a new file. comm and diff / sort and uniq do not
> work because in this case the two files will have non-adjacent lines.

grep -Fvf FileB FileA

--
Chris F.A. Johnson, author <http://cfaj.freeshell.org/shell/>
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence
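A note on the grep approach: with -F alone, each line of FileB is matched as a substring, so a short entry can also knock out longer lines that contain it. If whole-line comparison is what is wanted, adding -x may help. A sketch on made-up data (the user names below are assumptions for illustration):

```shell
#!/bin/sh
# Hypothetical sample data: "ann" is a substring of "anna".
dir=$(mktemp -d)
printf '%s\n' ann anna bob > "$dir/FileA"
printf '%s\n' ann > "$dir/FileB"

# -F: treat patterns as fixed strings, -f: read them from FileB,
# -v: print the lines of FileA that match none of them.
# A pattern may match anywhere in the line, so "ann" also removes "anna".
substring_out=$(grep -Fvf "$dir/FileB" "$dir/FileA")

# -x: a pattern must match the whole line, so only the exact "ann" is removed.
whole_line_out=$(grep -Fxvf "$dir/FileB" "$dir/FileA")

printf 'without -x:\n%s\n\nwith -x:\n%s\n' "$substring_out" "$whole_line_out"
```

Here the first command prints only bob, while the second prints anna and bob.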
| |||
On Aug 15, 10:05 am, Bill Marcum <marcumb...@bellsouth.net> wrote:
> On Wed, 15 Aug 2007 13:53:32 -0000, tnte...@yahoo.com <tnte...@yahoo.com> wrote:
>
> > Thank you all ---
> > When i attempt awk 'NR==FNR{seen[$0]++; next} !seen[$0]' B A - i get
> > awk: syntax error near line 1
> > awk: bailing out near line 1
>
> You are probably using Solaris Old Broken Awk(tm). Use nawk or
> /usr/xpg4/bin/awk.
>
> --
> BOFH excuse #331:
>
> those damn raccoons!

haha -- okay, got that down BUT now.... (more raccoons)

bash-2.03$ /usr/xpg4/bin/awk 'NR==FNR{seen[$0]++; next} !seen[$0]' ./all.ldap.users ./all.oracle.users
input file "./all.ldap.users"input file "./all.oracle.users"bash-2.03$

it just says that, doesn't do anything more = *(