xargs has a parallel mode, -P, that makes good use of today's multi-core CPUs. When grepping a large code base for a keyword, the search is much quicker if all the cores are put to work. Of course, the overall speed depends not only on parallel grepping but also on the speed of the storage.
On a 4-core computer, you can easily speed up grepping by piping find into 4 parallel grep processes, each handling 10 files at a time, starting from the current folder:
find . -type f | xargs -P 4 -n 10 grep -Hn "what_to_grep"
Make it a shell function and you're good to go:
function ppg {
find . -type f | xargs -P 4 -n 10 grep --color=always -Hn "$1"
}
~/build_trees/android-current/kernel$ ppg ehci_resume
./drivers/usb/host/ehci-spear.c:53: ehci_resume(hcd, false);
./drivers/usb/host/ehci-pci.c:365: if (ehci_resume(hcd, hibernated) != 0)
./drivers/usb/host/ehci-msm.c:175: ehci_resume(hcd, false);
....
(Of course, you can alternatively use a search tool that indexes the code.)
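One caveat with the pipeline above: xargs splits its input on whitespace, so a path containing a space or a quote is passed to grep in pieces. A minimal sketch of a safer variant follows. It assumes a find and xargs that support -print0 and -0 (GNU and BSD both do), and it assumes nproc from GNU coreutils for detecting the core count. The name ppg0 is made up for this example, and coloring is left out to keep it short.

```shell
# Hypothetical variant of ppg; the name ppg0 is not from the original post.
ppg0() {
    local ncpu
    # nproc prints the number of available cores; fall back to 4 if it is missing.
    ncpu=$(nproc 2>/dev/null || echo 4)
    # -print0 and -0 pass NUL-terminated names, so spaces and quotes in paths survive.
    # "--" keeps a pattern that starts with "-" from being read as a grep option.
    find . -type f -print0 | xargs -0 -P "$ncpu" -n 10 grep -Hn -- "$1"
}
```

It is called the same way as ppg, for example ppg0 ehci_resume.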
Now what if you want to see the progress? If you are reaching for parallel grep, scanning every file probably takes a long time.
A good tool for this, pv, is already available. However, for pv to show a meaningful percentage and ETA, you have to tell it how large your data set is. In other words, you have to count how many files there are to grep.
To recursively count the files under a folder, a typical and straightforward way is find with wc -l; the find part is identical to what is fed into xargs in the first example:
find . -type f | wc -l
For instance, the frameworks/ folder of an Android tree contains about 30k files, and counting them takes about 5s on my notebook:
$ time find ~/trees/android/frameworks -type f | wc -l
30503
real 0m4.950s
Combining everything, we get another function:
function ppgv {
find . -type f | pv -cN GREP -i 0.5 -l -s $(find . -type f | wc -l) | xargs -P 4 -n 10 grep --color=always -Hn "$1"
}
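In the pv options used here, -l counts lines instead of bytes (one line per file name), -s gives the expected total, -N labels the bar, -i sets the update interval in seconds, and -c enables cursor positioning. Note that ppgv walks the tree twice: once to count the files and once to feed grep. A possible variant, sketched under the assumption that mktemp is available and that no file name contains a newline, saves the list to a temporary file so the tree is traversed only once. The name ppgv1 is made up for this example, and the fallback to cat when pv is not installed is an addition, not part of the original function.

```shell
# Hypothetical variant of ppgv; the name ppgv1 is not from the original post.
ppgv1() {
    local list total
    list=$(mktemp) || return 1
    # Walk the tree once and keep the list for both counting and grepping.
    find . -type f > "$list"
    # The arithmetic expansion strips any padding that wc may print.
    total=$(( $(wc -l < "$list") ))
    if command -v pv >/dev/null 2>&1; then
        # One line per file: -l counts lines, -s tells pv the expected total.
        pv -cN GREP -i 0.5 -l -s "$total" < "$list"
    else
        # Assumption: without pv, pass the list through with no progress bar.
        cat "$list"
    fi | xargs -P 4 -n 10 grep --color=always -Hn -- "$1"
    rm -f "$list"
}
```

The trade-off is a temporary file holding one line per path, which should be small compared with the cost of a second find on a large tree.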
When searching a kernel folder for something that doesn't exist, the version with pv takes a little longer:
$ time ppg doesnt-exist1
real 0m1.168s
user 0m0.140s
sys 0m0.380s
$ time ppgv doesnt-exist2
GREP: 45.3k 0:00:01 [ 39k/s] [================>] 100%
real 0m1.339s
user 0m0.500s
sys 0m0.524s
We gain a progress bar and an ETA, and the extra cost is small: about 0.17s of wall-clock time in this run.