http://grokbase.com/t/hbase/user/125ya2cxxs/scan-addfamily-vs-familyfilter-equal
http://stackoverflow.com/questions/7256100/scan-with-filter-using-hbase-shell
Just to add on.
The Javadoc for FamilyFilter clearly says: "If an already known column family is looked for, use {@link org.apache.hadoop.hbase.client.Get#addFamily(byte[])} directly rather than a filter." So addFamily should be better.

Regards
Ram

-----Original Message-----
From: Anoop Sam John
Sent: Thursday, May 31, 2012 11:49 AM
To: user@hbase.apache.org
Subject: RE: Scan addFamily vs FamilyFilter(EQUAL, ...)

Hi,

As per my understanding of the Scan code, in your scenario, where you want to scan only some CFs (not all), you should go with Scan#addFamily. FamilyFilter achieves the same result, but there is a difference in performance.

When you specify the CFs in the Scan, scanners are created only for those Stores. For the other CFs there won't be any scanners, so those stores are not scanned (the HFile data is not fetched).

If instead you use FamilyFilter and do not specify any particular column families (using Scan#addFamily), all the stores get scanned and data gets fetched from their HFiles. Only afterwards are the KVs you need (as per your FamilyFilter) included in the Result, while the others are dropped. So I feel there will be a performance difference.

Correct me if I am wrong, please... @Stack
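To make the store-level argument above concrete, here is a minimal sketch in plain Java. It is a toy model only, not HBase code: the class StoreScanModel, its methods, and the "stores read" counter are all hypothetical names invented for illustration, and the model assumes the behaviour Anoop describes (families set on the Scan limit which stores are opened, while a family filter alone sees cells from every store and discards the unwanted ones). Whether the difference is noticeable on a real cluster is a separate question; Stack's reply in this thread suggests it may not be.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of a region: one "store" (a list of cells) per column family. Not HBase code. */
final class StoreScanModel {

    /** What a scan returned, plus how many stores it had to read to get there. */
    static final class ScanOutcome {
        final List<String> cells = new ArrayList<>();
        int storesRead = 0;
    }

    private final Map<String, List<String>> stores = new TreeMap<>();

    void put(String family, String cell) {
        stores.computeIfAbsent(family, f -> new ArrayList<>()).add(cell);
    }

    /** Models families set on the scan: only the requested stores are opened. */
    ScanOutcome scanWithFamilies(Set<String> families) {
        ScanOutcome out = new ScanOutcome();
        for (String family : families) {
            List<String> store = stores.get(family);
            if (store == null) {
                continue; // unknown family: nothing to open
            }
            out.storesRead++;
            out.cells.addAll(store);
        }
        return out;
    }

    /** Models a family filter with no families set: every store is read, then filtered. */
    ScanOutcome scanWithFamilyFilter(String wantedFamily) {
        ScanOutcome out = new ScanOutcome();
        for (Map.Entry<String, List<String>> store : stores.entrySet()) {
            out.storesRead++; // the store is read before the filter can reject its cells
            if (store.getKey().equals(wantedFamily)) {
                out.cells.addAll(store.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        StoreScanModel region = new StoreScanModel();
        region.put("cf1", "row1/cf1:a=1");
        region.put("cf2", "row1/cf2:b=2");
        region.put("cf3", "row1/cf3:c=3");

        ScanOutcome direct = region.scanWithFamilies(Collections.singleton("cf1"));
        ScanOutcome filtered = region.scanWithFamilyFilter("cf1");

        System.out.println("addFamily-style:    " + direct.cells + ", stores read = " + direct.storesRead);
        System.out.println("FamilyFilter-style: " + filtered.cells + ", stores read = " + filtered.storesRead);
    }
}
```

Both scans return the same cells; in this model the only difference is the number of stores touched (1 versus 3), which is the cost Anoop is pointing at.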
One thing I ran into when using Scan.addFamily / Scan.addColumn is that those two methods overwrite each other.

The Scan#addColumn Javadoc clearly mentions this overwriting... So this seems to be intentional, correct?

-Anoop-

________________________________________
From: saint.ack@gmail.com [saint.ack@gmail.com] on behalf of Stack [stack@duboce.net]
Sent: Wednesday, May 30, 2012 11:13 PM
To: user@hbase.apache.org
Subject: Re: Scan addFamily vs FamilyFilter(EQUAL, ...)

On Wed, May 30, 2012 at 9:59 AM, Kevin wrote:
> I am curious and trying to learn which method is best when wanting to limit
> a scan to a particular column or column family. The Scan class carries a
> Filter instance and a TreeMap of the family map, and I am unsure how they
> get carried through to the server-side functionality. In terms of
> performance, is there any difference between doing Scan.addFamily(x) and
> Scan.setFilter(new FamilyFilter(CompareFilter.CompareOp.EQUAL, x))?

There is probably no noticeable difference in performance, but Scan#addFamily is the more natural way of expressing column family scoping.

St.Ack
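On the Scan.addFamily / Scan.addColumn overwrite point raised earlier in the thread, the following is a small sketch of how a family map of the kind Kevin mentions could produce that behaviour. It is a toy model under stated assumptions, not the HBase source: FamilyMapModel is a hypothetical class, it uses String keys where HBase uses byte[], and it assumes that a null qualifier set stands for "the whole family".

```java
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of a scan's family map: family -> qualifiers, where null means "whole family". */
final class FamilyMapModel {

    final TreeMap<String, NavigableSet<String>> familyMap = new TreeMap<>();

    /** Select the whole family; any qualifiers added earlier for it are discarded. */
    FamilyMapModel addFamily(String family) {
        familyMap.put(family, null);
        return this;
    }

    /** Select one column; an earlier whole-family selection is replaced by a qualifier set. */
    FamilyMapModel addColumn(String family, String qualifier) {
        NavigableSet<String> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            qualifiers = new TreeSet<>();
            familyMap.put(family, qualifiers);
        }
        qualifiers.add(qualifier);
        return this;
    }

    public static void main(String[] args) {
        // addColumn first, then addFamily: the column restriction is lost.
        FamilyMapModel widened = new FamilyMapModel().addColumn("cf", "q1").addFamily("cf");
        System.out.println(widened.familyMap); // prints {cf=null}

        // addFamily first, then addColumn: the scan is narrowed to that one column.
        FamilyMapModel narrowed = new FamilyMapModel().addFamily("cf").addColumn("cf", "q1");
        System.out.println(narrowed.familyMap); // prints {cf=[q1]}
    }
}
```

In this model the last call for a given family wins, so the order of addFamily and addColumn calls decides whether the scan covers the whole family or just the named columns.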