*BSD News Article 11133

Received: by minnie.vk1xwt.ampr.org with NNTP
	id AA1291 ; Tue, 23 Feb 93 14:34:04 EST
Xref: sserve comp.unix.bsd:11186 comp.unix.misc:6437 comp.unix.shell:7748 comp.unix.questions:31241 comp.lang.perl:13545
Newsgroups: comp.unix.bsd,comp.unix.misc,comp.unix.shell,comp.unix.questions,comp.lang.perl
Path: sserve!manuel.anu.edu.au!munnari.oz.au!cs.mu.OZ.AU!koonda.acci.com.au!ggr
From: ggr@koonda.acci.com.au (Greg Rose)
Subject: Re: Splitting a file - HELP WANTED
Message-ID: <9304710.18886@mulga.cs.mu.OZ.AU>
Sender: news@cs.mu.OZ.AU
Organization: Australian Computing and Communications Institute
References: <C2B89x.IF6@inews.Intel.COM> <1993Feb12.151748.20280@news.eng.convex.com>
Date: Mon, 15 Feb 1993 23:08:10 GMT
Lines: 27

>From the keyboard of vdalvi@mcd.intel.com (Vishram Dalvi ~):
>:	Here I am again with another problem. I have two files A and B,
>:which contain names of some persons. File B is the subset of file A i.e.,
>:	I want to split file A into two parts C and D - one which matches 
>:the names in file B and other which doesn't - so that:
>:	What will be the easiest way to do this using awk ? How do I handle
>:multiple files in awk ? 

It turns out that there is a tool written explicitly for this task --
it is called 'comm' and is very standard (old) -- it was used in the
original spell program.

Given two (sorted) files, comm produces three columns of output:
lines only in the first file, lines only in the second file, and
lines in both. (The fourth column, lines not in either file, is
omitted as it gets a bit large :-)
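
For illustration, here is a small sketch of what that output looks
like. The names in the two files are invented, and I am assuming
mktemp -d is available for a scratch directory (not every system has
it). LC_ALL=C is set so that comm's idea of sorted order is plain
byte order.

```shell
# Assumption: mktemp -d exists; it gives a scratch directory to work in.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Plain byte-order collation, so "sorted" means the same thing everywhere.
export LC_ALL=C

# Two small files, already sorted. The names are made up for this example.
printf '%s\n' alice bob carol > A
printf '%s\n' bob carol dave > B

# Column 1 (no indent):  lines only in A
# Column 2 (one tab):    lines only in B
# Column 3 (two tabs):   lines in both
out=$(comm A B)
printf '%s\n' "$out"
```

Here alice comes out unindented (only in A), dave behind one tab
(only in B), and bob and carol behind two tabs (in both).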

Since people typically want just one of the columns, comm takes the
flags -1, -2 and -3 (which can be combined, as in -12) to name the
columns to leave out. So
  comm -12 A B >C
  comm -23 A B >D
yields the correct answer as specified above (C gets the lines in
both files, D gets the lines only in A), although as someone
pointed out, C and B will be identical if B truly is a subset of A.
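
If the files are not already sorted, they need sorting first. What
follows is only a sketch of the whole job under some assumptions: the
file contents and the A.sorted/B.sorted names are invented, mktemp -d
is assumed to exist, and LC_ALL=C makes sort and comm agree on the
ordering. Note that D is the lines found only in A, which is column
1, so columns 2 and 3 are the ones to suppress.

```shell
# Assumption: mktemp -d exists; it gives a scratch directory to work in.
dir=$(mktemp -d)
cd "$dir" || exit 1

# sort and comm must agree on collation, so pin both to byte order.
export LC_ALL=C

# Unsorted sample data (made up); B is a subset of A.
printf '%s\n' dave alice carol bob > A
printf '%s\n' carol alice > B

# comm expects sorted input. A.sorted and B.sorted are hypothetical names.
sort A > A.sorted
sort B > B.sorted

# C: lines in both files (suppress columns 1 and 2).
comm -12 A.sorted B.sorted > C

# D: lines only in A (suppress columns 2 and 3).
comm -23 A.sorted B.sorted > D
```

With this sample data C holds alice and carol, and D holds bob and
dave.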
--
Greg Rose                 Australian Computing and Communications Institute
ggr@acci.com.au                                              +61 18 174 842
`Use of the standard phrase "HIJACKED" may be inadvisable' -- CAA