Issue
I have n files uploaded to Amazon S3, and I need to *search* those files for the ones whose contents contain a given string. I tried downloading each file from the S3 bucket, converting its input stream to a string, and then searching the content for the word, but once there are more than five or six files, this process takes a long time.
Is there any other way to do this? Please help, thanks in advance.
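For reference, the approach described above looks roughly like the following Java sketch. `open` is a hypothetical stand-in for whatever call your S3 client uses to hand back an object's `InputStream`, and the content is assumed to be UTF-8 text. Every search reads every byte of every file, which is why it slows down as the number of files grows.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class NaiveSearch {
    // Reads the whole object into memory and decodes it as UTF-8 (assumed encoding).
    static String readAll(InputStream in) {
        try (in) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Downloads every key in turn and keeps those whose text contains the term.
    // The cost grows with the total size of all files, on every single search.
    static List<String> search(List<String> keys, Function<String, InputStream> open, String term) {
        List<String> matches = new ArrayList<>();
        for (String key : keys) {
            if (readAll(open.apply(key)).contains(term)) {
                matches.add(key);
            }
        }
        return matches;
    }
}
```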
Solution
I am not familiar with Amazon S3, but the general way to deal with searching remote files is to use indexing, with the index itself being stored on the remote server. That way, each search uses the index to narrow things down to a relatively small number of candidate files, and only those are scanned directly to verify whether they really match. Depending on your search terms and the complexity of the pattern, it might even be possible to avoid the direct file scan altogether.
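A minimal in-memory sketch of such an index (an inverted index) in Java, assuming the files are plain text and that the index is updated each time a file is uploaded. The class name `InvertedIndex` and its tokenizer rule (lowercase, split on anything other than `a-z` and `0-9`) are illustrative choices, not part of any S3 API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndex {
    // token -> keys of the S3 objects whose contents contain that token
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Illustrative tokenizer: lowercase, then split on anything that is not a-z or 0-9.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Run once per file when it is uploaded, so searches never need to re-read every file.
    void add(String key, String content) {
        for (String token : tokenize(content)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(key);
        }
    }

    // Keys of the files that contain the given word; only these would need to be downloaded.
    Set<String> candidates(String word) {
        return Collections.unmodifiableSet(
                postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet()));
    }
}
```

The design choice here is to pay the tokenizing cost once per upload rather than once per search.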
That said, I do not know whether Amazon S3 has an indexing engine that you can use or whether there are supplemental libraries that do that for you, but the concept is simple enough that you should be able to get something working by yourself without too much work.
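One simple way to keep the index on the remote server is to serialize it into a single object stored next to the files. The sketch below encodes a token-to-keys map as UTF-8 lines of the form `token`, tab, then object keys separated by tabs. `IndexCodec` and this line format are hypothetical, and they assume neither tokens nor object keys contain tab or newline characters. Uploading and downloading the resulting bytes is left to whatever S3 client you already use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class IndexCodec {
    // Encodes token -> keys as lines "token\tkey1\tkey2...", sorted by token.
    // Assumes tokens and object keys never contain '\t' or '\n'.
    static byte[] encode(Map<String, Set<String>> postings) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(postings).entrySet()) {
            sb.append(e.getKey());
            for (String key : e.getValue()) {
                sb.append('\t').append(key);
            }
            sb.append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Parses the format produced by encode back into a token -> keys map.
    static Map<String, Set<String>> decode(byte[] bytes) {
        Map<String, Set<String>> postings = new HashMap<>();
        for (String line : new String(bytes, StandardCharsets.UTF_8).split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t");
            postings.put(parts[0], new TreeSet<>(Arrays.asList(parts).subList(1, parts.length)));
        }
        return postings;
    }
}
```

A client would fetch this one object, decode it, and then download only the files the index points to.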
EDIT:
Generally, what gets indexed is the set of tokens (words) that occur in each file. For example, if you want to search for "foo bar", the index will tell you which files contain "foo" and which contain "bar". The intersection of these two results is the set of files that contain both "foo" and "bar". You will then have to scan those files directly to select the ones (if any) where "foo" and "bar" appear right next to each other and in the right order.
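The two steps just described (intersect the per-word results, then scan only the surviving files for the exact phrase) can be sketched in Java as follows. It assumes the index is a `Map` from lowercase token to object keys, and `fetch` is a hypothetical stand-in for downloading one object's text from S3. The tokenizer is an illustrative rule that must match whatever was used when the index was built.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

class PhraseSearch {
    // Must match the tokenizer used when the index was built (illustrative rule).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Step 1: intersect the key sets of every word; only files containing all words survive.
    static Set<String> candidates(Map<String, Set<String>> index, List<String> words) {
        Set<String> result = null;
        for (String w : words) {
            Set<String> keys = index.getOrDefault(w, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(keys);
            } else {
                result.retainAll(keys);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    // Step 2: confirm the words occur next to each other, in order, in the downloaded text.
    static boolean containsPhrase(String content, List<String> words) {
        return Collections.indexOfSubList(tokenize(content), words) >= 0;
    }

    // fetch is a hypothetical stand-in for downloading one object's text from S3.
    static List<String> search(Map<String, Set<String>> index, String phrase,
                               Function<String, String> fetch) {
        List<String> words = tokenize(phrase);
        List<String> hits = new ArrayList<>();
        for (String key : candidates(index, words)) {
            if (containsPhrase(fetch.apply(key), words)) {
                hits.add(key);
            }
        }
        return hits;
    }
}
```

With an index where "foo" maps to a.txt, b.txt and c.txt, and "bar" maps to a.txt and b.txt, a search for "foo bar" downloads only a.txt and b.txt; c.txt is ruled out by the index and never fetched.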
In any case, the client would download far less data than it would by fetching and scanning every file, although the exact savings depend on how your files are structured and what your search patterns look like.
Answered By - thkala
Answer Checked By - Dawn Plyler (JavaFixing Volunteer)