Received: by minnie.vk1xwt.ampr.org with NNTP id AA1291 ; Tue, 23 Feb 93 14:34:04 EST
Xref: sserve comp.unix.bsd:11186 comp.unix.misc:6437 comp.unix.shell:7748 comp.unix.questions:31241 comp.lang.perl:13545
Newsgroups: comp.unix.bsd,comp.unix.misc,comp.unix.shell,comp.unix.questions,comp.lang.perl
Path: sserve!manuel.anu.edu.au!munnari.oz.au!cs.mu.OZ.AU!koonda.acci.com.au!ggr
From: ggr@koonda.acci.com.au (Greg Rose)
Subject: Re: Splitting a file - HELP WANTED
Message-ID: <9304710.18886@mulga.cs.mu.OZ.AU>
Sender: news@cs.mu.OZ.AU
Organization: Australian Computing and Communications Institute
References: <C2B89x.IF6@inews.Intel.COM> <1993Feb12.151748.20280@news.eng.convex.com>
Date: Mon, 15 Feb 1993 23:08:10 GMT
Lines: 27

From the keyboard of vdalvi@mcd.intel.com (Vishram Dalvi ~):
>: Here I am again with another problem. I have two files A and B,
>:which contain names of some persons. File B is the subset of file A i.e.,
>: I want to split file A into two parts C and D - one which matches
>:the names in file B and other which doesn't - so that:
>: What will be the easiest way to do this using awk ? How do I handle
>:multiple files in awk ?

It turns out that there is a tool written explicitly for this task -- it
is called 'comm' and is very standard (old) -- it was used in the
original spell program. Given two (sorted) files, comm produces
three-column output: lines only in the first file, lines only in the
second file, and lines in both. (The fourth column, lines not in either
file, is omitted as it gets a bit large :-) Since people typically use
this tool to find out just one of the columns, it takes the flags -1, -2
and -3 (which can be combined, as in -12) to specify columns to leave
out. So

	comm -12 A B >C
	comm -23 A B >D

yields the correct answer as specified above: the first command keeps
only the lines found in both files, and the second keeps only the lines
found in A but not in B. As someone pointed out, C and B will be
identical if B truly is a subset of A.
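For anyone who wants to try this out, here is a minimal sketch of the whole procedure, including the sort step that comm depends on. The sample names, the scratch directory and the A.sorted/B.sorted file names are all made up for illustration, and mktemp -d is assumed to be available on your system. Suppressing columns 1 and 2 leaves the lines common to both files; suppressing columns 2 and 3 leaves the lines found only in the first file.

```shell
#!/bin/sh
# Sketch only: the names and file contents below are invented sample data.
set -eu

# Use one fixed collation order so that sort and comm agree with each other.
LC_ALL=C; export LC_ALL

# Work in a scratch directory (assumes mktemp -d exists on this system).
workdir=$(mktemp -d)
cd "$workdir"

# A holds all the names; B is a subset of A. Both are deliberately unsorted.
printf '%s\n' 'Carol' 'Alice' 'Dave' 'Bob' > A
printf '%s\n' 'Dave' 'Alice' > B

# comm expects both of its inputs to be sorted.
sort A > A.sorted
sort B > B.sorted

# -12 drops columns 1 and 2, leaving the lines present in both files.
comm -12 A.sorted B.sorted > C

# -23 drops columns 2 and 3, leaving the lines present only in A.
comm -23 A.sorted B.sorted > D

echo "C (names in both A and B):"
cat C
echo "D (names only in A):"
cat D
```

With this sample data, C ends up holding Alice and Dave, and D holds Bob and Carol. If the original line order of A matters, comm is not the right fit, since it only works on sorted input.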
-- 
Greg Rose          Australian Computing and Communications Institute
ggr@acci.com.au    +61 18 174 842
`Use of the standard phrase "HIJACKED" may be inadvisable' -- CAA