Issue
I'm executing the following code with Jsoup:
Document parse = Jsoup.connect("http://www.google.com/movies?near=<MyCity>&sort=1&start=0")
        .followRedirects(true)
        .ignoreContentType(true)
        .timeout(12000)
        .userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
        .referrer("http://www.google.com")
        .execute()
        .parse();
Elements elements = parse.select(".movie_results .movie");
but when I inspect elements, it is clearly missing a lot of content. I'm trying to get the movie title and description from the page above.
What am I missing? Could this be related to missing request headers or cookies? Is there another library that could solve the problem?
I could reproduce the same problem by executing:
curl "http://www.google.com/movies?near=<MyCity>&sort=1&start=0" > page.html
ProTip
Just highlighting one of the comments: try.jsoup.org is a good place to start with Jsoup. It lets you test how a page is parsed and try out selectors in a very clean way.
Solution
After comparing my request with the one the browser sends (using Google Chrome Dev Tools), I found that some request headers were missing. The final code is similar to this:
Jsoup.connect(url)
        .followRedirects(true)
        .ignoreContentType(true)
        .timeout(12000) // optional
        .header("Accept-Language", "pt-BR,pt;q=0.8") // missing
        .header("Accept-Encoding", "gzip,deflate,sdch") // missing
        .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36") // changed: Chrome user agent instead of the Firefox one
        .referrer("http://www.google.com") // optional
        .execute()
        .parse();
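One way to spot this kind of gap is to copy the request headers from the Dev Tools Network tab and compare them with what the client sends. The following is only a minimal sketch using the Java standard library. The class HeaderDiff, the method missingHeaders, and the sample header values are hypothetical and are not part of Jsoup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Jsoup: lists the headers a browser sends
// that the scraping client does not.
final class HeaderDiff {

    // Returns the names of headers present in browserHeaders but absent from
    // clientHeaders. Header names are compared case-insensitively.
    static List<String> missingHeaders(Map<String, String> browserHeaders,
                                       Map<String, String> clientHeaders) {
        Map<String, String> client = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        client.putAll(clientHeaders);

        List<String> missing = new ArrayList<>();
        for (String name : browserHeaders.keySet()) {
            if (!client.containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative values only: copy the real ones from the Network tab.
        Map<String, String> browser = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        browser.put("Accept-Language", "pt-BR,pt;q=0.8");
        browser.put("Accept-Encoding", "gzip,deflate,sdch");
        browser.put("User-Agent", "Mozilla/5.0");
        browser.put("Referer", "http://www.google.com");

        // Headers the client already sets (assumed for this example).
        Map<String, String> client = Map.of(
                "user-agent", "Mozilla/5.0",
                "Referer", "http://www.google.com");

        // Prints: [Accept-Encoding, Accept-Language]
        System.out.println(missingHeaders(browser, client));
    }
}
```

Each name it reports is a candidate for a .header(name, value) call on the Jsoup connection. The sketch only checks which headers are present, so a header that is sent with a different value (such as the user agent here) still has to be compared by hand.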
Thanks for your answers!
Answered By - MatheusJardimB
Answer Checked By - Robin (JavaFixing Admin)