Fantastic Unix Forums  

Unix Shell Programming Post here for discussing in comp.unix.shell newsgroup.

Comparing two text files with non-adjacent lines for unique entries

  #1 (permalink)  
Old 06-27-2008
tntelle@yahoo.com
Guest
 
Posts: n/a
Default Comparing two text files with non-adjacent lines for unique entries

I am trying to find an easy and fast way to compare two files, each
with several thousand lines in a single column, and print what is
unique to only one of the files.
That is, compare file A and file B, and write only the lines that are
unique to file A to a new file. comm, diff, and sort | uniq do not
work because in this case the two files will have non-adjacent lines.

Any help is GREATLY appreciated. Thank you in advance!
-TT

  #2 (permalink)  
Old 06-27-2008
Bill Marcum
Guest
 
Posts: n/a
Default Re: Comparing two text files with non-adjacent lines for unique entries

On Wed, 15 Aug 2007 13:53:32 -0000, tntelle@yahoo.com
<tntelle@yahoo.com> wrote:
> Thank you all ---
> When I attempt awk 'NR==FNR{seen[$0]++; next} !seen[$0]' B A - I get
> awk: syntax error near line 1
> awk: bailing out near line 1
>

You are probably using Solaris Old Broken Awk(tm). Use nawk or
/usr/xpg4/bin/awk.

--
BOFH excuse #331:

those damn raccoons!
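
A minimal sketch of the advice above, for scripts that have to run on more than one system: try the two commands Bill names and fall back to plain awk if neither is found. The variable names (AWK, candidate, smoke) are made up for illustration, and it assumes a shell that supports `command -v`.

```shell
# Prefer the awks named above; fall back to whatever "awk" is on PATH.
AWK=awk
for candidate in /usr/xpg4/bin/awk nawk; do
    if command -v "$candidate" >/dev/null 2>&1; then
        AWK=$candidate
        break
    fi
done

# Smoke test: print each distinct input line once, using the chosen awk.
smoke=$(printf 'x\nx\ny\n' | "$AWK" '!seen[$0]++')
printf 'using %s\n%s\n' "$AWK" "$smoke"
```

On a system with none of the Solaris paths this simply keeps using awk.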
  #4 (permalink)  
Old 06-27-2008
John L
Guest
 
Posts: n/a
Default Re: Comparing two text files with non-adjacent lines for unique entries


<tntelle@yahoo.com> wrote in message news:1187142111.587829.251580@q3g2000prf.googlegroups.com...
> I am trying to find an easy and fast way to compare two files, each
> with several thousand lines - only one column and spit out what is
> unique only to one of the files.
> So, compare file A and file B, and only lines that are unique to file A
> are spit out to a new file.. comm and diff / sort and uniq do not
> work because in this case the two files will have non-adjacent lines.
>


cat A A B | sort |uniq -u
awk 'FNR==NR{Seen[$0]++} FNR!=NR && !Seen[$0]' A B

I am not sure what you mean by "non-adjacent lines".
And note that the two solutions above give different results for
lines that appear more than once in B: it is not clear what you want.
Surely solutions based on diff or comm will work if you first sort
A and B?

--
John.
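
A sketch of the sort-then-comm approach John asks about. `comm -23` prints the lines found only in its first input, provided both inputs are sorted. The sample contents and the variable name only_in_a are made up for illustration.

```shell
# Hypothetical sample data; matching lines are deliberately not adjacent.
tmp=$(mktemp -d)
printf '%s\n' carol alice dave bob > "$tmp/A"
printf '%s\n' bob erin alice > "$tmp/B"

# comm requires sorted input, so sort each file first.
sort "$tmp/A" > "$tmp/A.sorted"
sort "$tmp/B" > "$tmp/B.sorted"

# -2 drops lines found only in B, -3 drops lines common to both,
# which leaves the lines found only in A.
only_in_a=$(comm -23 "$tmp/A.sorted" "$tmp/B.sorted")
printf '%s\n' "$only_in_a"

rm -rf "$tmp"
```

With this data it prints carol and dave. The output comes back in sorted order, not in the original order of A.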


  #5 (permalink)  
Old 06-27-2008
William James
Guest
 
Posts: n/a
Default Re: Comparing two text files with non-adjacent lines for unique entries


John L wrote:
> <tntelle@yahoo.com> wrote in message news:1187142111.587829.251580@q3g2000prf.googlegroups.com...
> > I am trying to find an easy and fast way to compare two files, each
> > with several thousand lines - only one column and spit out what is
> > unique only to one of the files.
> > So, compare file A and file B, and only lines that are unique to file A
> > are spit out to a new file.. comm and diff / sort and uniq do not
> > work because in this case the two files will have non-adjacent lines.
> >
>
> cat A A B | sort |uniq -u
> awk 'FNR==NR{Seen[$0]++} FNR!=NR && !Seen[$0]' A B
>
> I am not sure what you mean by "non-adjacent lines".
> And note that the two solutions above give different results for
> lines that appear more than once in B: it is not clear what you want.
> Surely solutions based on diff or comm will work if you first sort
> A and B?
>
> --
> John.


Since he wants the lines that are in A but not in B,
I think the order of the files should be reversed.

awk 'NR==FNR{seen[$0]++; next} !seen[$0]' B A

Another way:

awk 'NR==FNR{seen[$0]; next} !($0 in seen)' B A

One Ruby solution:

ruby -e 'def lines;gets(nil).split("\n") end; puts lines - lines' A B
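
A small worked run of the second awk form, as a sketch with made-up sample data (the file contents and the variable only_in_a are assumptions). It shows why B is given first: its lines have to be recorded before A is filtered.

```shell
# Hypothetical sample data; carol appears twice in A.
tmp=$(mktemp -d)
printf '%s\n' carol alice dave bob carol > "$tmp/A"
printf '%s\n' bob erin alice > "$tmp/B"

# While NR==FNR awk is still reading the first file (B), so each line is
# stored as a key in seen[] and "next" skips the second pattern.
# For the second file (A), a line is printed only if it is not a key in seen[].
only_in_a=$(awk 'NR==FNR { seen[$0]; next } !($0 in seen)' "$tmp/B" "$tmp/A")
printf '%s\n' "$only_in_a"

rm -rf "$tmp"
```

This prints carol, dave, carol: the order of A is preserved and repeated lines in A are kept. One caveat: if B is empty, NR==FNR is also true for every line of A, so nothing is printed.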

  #6 (permalink)  
Old 06-27-2008
tntelle@yahoo.com
Guest
 
Posts: n/a
Default Re: Comparing two text files with non-adjacent lines for unique entries

On Aug 15, 2:37 am, William James <w_a_x_...@yahoo.com> wrote:
> John L wrote:
> > <tnte...@yahoo.com> wrote in message news:1187142111.587829.251580@q3g2000prf.googlegroups.com...
> > > I am trying to find an easy and fast way to compare two files, each
> > > with several thousand lines - only one column and spit out what is
> > > unique only to one of the files.
> > > So, compare file A and file B, and only lines that are unique to file A
> > > are spit out to a new file.. comm and diff / sort and uniq do not
> > > work because in this case the two files will have non-adjacent lines.
> >
> > cat A A B | sort |uniq -u
> > awk 'FNR==NR{Seen[$0]++} FNR!=NR && !Seen[$0]' A B
> >
> > I am not sure what you mean by "non-adjacent lines".
> > And note that the two solutions above give different results for
> > lines that appear more than once in B: it is not clear what you want.
> > Surely solutions based on diff or comm will work if you first sort
> > A and B?
> >
> > --
> > John.
>
> Since he wants the lines that are in A but not in B,
> I think the order of the files should be reversed.
>
> awk 'NR==FNR{seen[$0]++; next} !seen[$0]' B A
>
> Another way:
>
> awk 'NR==FNR{seen[$0]; next} !($0 in seen)' B A
>
> One Ruby solution:
>
> ruby -e 'def lines;gets(nil).split("\n") end; puts lines - lines' A B


Thank you all.
When I attempt awk 'NR==FNR{seen[$0]++; next} !seen[$0]' B A, I get:
awk: syntax error near line 1
awk: bailing out near line 1

=/

  #7 (permalink)  
Old 06-27-2008
Chris F.A. Johnson
Guest
 
Posts: n/a
Default Re: Comparing two text files with non-adjacent lines for unique entries

On 2007-08-15, tntelle@yahoo.com wrote:
> I am trying to find an easy and fast way to compare two files, each
> with several thousand lines - only one column and spit out what is
> unique only to one of the files.
> So, compare file A and file B, and only lines that are unique to file A
> are spit out to a new file.. comm and diff / sort and uniq do not
> work because in this case the two files will have non-adjacent lines.


grep -Fvf FileB FileA

--
Chris F.A. Johnson, author <http://cfaj.freeshell.org/shell/>
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence
  #8 (permalink)  
Old 06-27-2008
tntelle@yahoo.com
Guest
 
Posts: n/a
Default Re: Comparing two text files with non-adjacent lines for unique entries

On Aug 15, 10:05 am, Bill Marcum <marcumb...@bellsouth.net> wrote:
> On Wed, 15 Aug 2007 13:53:32 -0000, tnte...@yahoo.com <tnte...@yahoo.com> wrote:
>
> > Thank you all ---
> > When I attempt awk 'NR==FNR{seen[$0]++; next} !seen[$0]' B A - I get
> > awk: syntax error near line 1
> > awk: bailing out near line 1
>
> You are probably using Solaris Old Broken Awk(tm). Use nawk or
> /usr/xpg4/bin/awk.
>
> --
> BOFH excuse #331:
>
> those damn raccoons!


haha
-- okay, got that down, BUT now... (more raccoons)

bash-2.03$ /usr/xpg4/bin/awk 'NR==FNR{seen[$0]++; next} !seen[$0]' ./all.ldap.users ./all.oracle.users
input file "./all.ldap.users"input file "./all.oracle.users"bash-2.03$

It just says that and doesn't do anything more.
= *(
