JSoup UserAgent, how to set it right?

You might try setting the referrer header as well:

    doc = Jsoup.connect("https://www.facebook.com/")
        .userAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
        .referrer("http://www.google.com")
        .get();
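To check which headers jsoup will actually send, you can build the connection and inspect its request before calling get(). The following is a minimal sketch, assuming jsoup is on the classpath. The class name BrowserLikeRequest, the helper prepare, and the example.com URL are made up for illustration; nothing is sent over the network until get() or execute() is called.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class BrowserLikeRequest {
    // User agent string taken from the answer above; any browser-like value works the same way.
    static final String USER_AGENT =
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6";
    static final String REFERRER = "http://www.google.com";

    // Hypothetical helper: configures the connection but does not send anything yet.
    static Connection prepare(String url) {
        return Jsoup.connect(url)
            .userAgent(USER_AGENT)   // stored as the "User-Agent" request header
            .referrer(REFERRER);     // stored as the "Referer" request header
    }

    public static void main(String[] args) {
        Connection conn = prepare("https://www.example.com/");
        // Inspect the headers that would go out with the request.
        System.out.println("User-Agent: " + conn.request().header("User-Agent"));
        System.out.println("Referer: " + conn.request().header("Referer"));
        // Calling conn.get() at this point would perform the actual HTTP request.
    }
}
```

Note that the HTTP header is spelled "Referer", while the jsoup method is named referrer.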

jsoup – strip all formatting and link tags, keep text only

With Jsoup:

    final String html = "<p> <span> foo </span> <em> bar <a> foobar </a> baz </em> </p>";
    Document doc = Jsoup.parse(html);
    System.out.println(doc.text());

Output:

    foo bar foobar baz

If you want only the text of the p tag, use this instead of doc.text():

    doc.select("p").text();

… or only the body:

    doc.body().text();

Linebreak:

    final String html = "<p><strong>Tarthatatlan biztonsági viszonyok</strong></p>" …

jsoup posting and cookie

When you log in to the site, it is probably setting an authorised session cookie that needs to be sent on subsequent requests to maintain the session. You can get the cookie like this:

    Connection.Response res = Jsoup.connect("http://www.example.com/login.php")
        .data("username", "myUsername", "password", "myPassword")
        .method(Method.POST)
        .execute();
    Document doc = res.parse();
    String sessionId = res.cookie("SESSIONID"); // you will need …
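Sending the cookie back on later requests could look roughly like the sketch below. It assumes jsoup is on the classpath and that the site's cookie is named SESSIONID, as in the snippet above. The class name, the helper methods, the account.php URL, and the "abc123" value are placeholders for illustration; in real use the value would come from res.cookie("SESSIONID"), or the whole map from res.cookies().

```java
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class SessionCookieReuse {
    // Attach a single session cookie to a follow-up request (nothing is sent yet).
    static Connection withSession(String url, String sessionId) {
        return Jsoup.connect(url).cookie("SESSIONID", sessionId);
    }

    // Alternative: forward every cookie the login response set, e.g. the map from res.cookies().
    static Connection withAllCookies(String url, Map<String, String> cookies) {
        return Jsoup.connect(url).cookies(cookies);
    }

    public static void main(String[] args) {
        // Placeholder session id; a real one is returned by the login response.
        Connection conn = withSession("http://www.example.com/account.php", "abc123");
        // Show the cookie that will accompany the request.
        System.out.println("SESSIONID = " + conn.request().cookie("SESSIONID"));
        // conn.get() would now fetch the page as the logged-in user.
    }
}
```

Forwarding the full cookie map is usually the safer choice when a site sets more than one cookie at login.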