Issue
I am developing a web scraper using JavaFX WebView. For scraping, I don't need images to be loaded. While a page is loading, WebKit spawns lots of UrlLoader threads, so I think disabling images would save a lot of system resources. Does anyone know how to disable automatic image loading in WebView?
Solution
Solution Approach
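Define your own protocol handler for http that filters out any request for content with an image MIME type (serving a blank image in its place), and install it through a custom URLStreamHandlerFactory: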
URL.setURLStreamHandlerFactory(new HandlerFactory());
Sample Code
import javafx.application.Application;
import javafx.scene.Scene;
import javafx.scene.layout.StackPane;
import javafx.scene.web.*;
import javafx.stage.Stage;
import java.io.IOException;
import java.net.*;

public class LynxView extends Application {
    // Transparent placeholder served in place of every filtered image.
    private static final String BLANK_IMAGE_LOC =
            "https://upload.wikimedia.org/wikipedia/commons/c/ce/Transparent.gif";
    public static final String WEBSITE_LOC =
            "http://fxexperience.com";
    public static final String IMAGE_MIME_TYPE_PREFIX =
            "image/";

    @Override
    public void start(Stage stage) throws Exception {
        WebView webView = new WebView();
        WebEngine engine = webView.getEngine();
        engine.load(WEBSITE_LOC);
        stage.setScene(new Scene(new StackPane(webView)));
        stage.show();
    }

    public static void main(String[] args) throws IOException {
        // Install the factory before anything opens a URL; it can only be set once per JVM.
        URL.setURLStreamHandlerFactory(new URLStreamHandlerFactory() {
            @Override
            public URLStreamHandler createURLStreamHandler(String protocol) {
                if ("http".equals(protocol)) {
                    // Internal JDK class: its package is not exported from java.base in Java 9+.
                    return new sun.net.www.protocol.http.Handler() {
                        @Override
                        protected URLConnection openConnection(URL url, Proxy proxy) throws IOException {
                            // Drop any query string and guess the type from the file name alone.
                            String[] fileParts = url.getFile().split("\\?");
                            String contentType = URLConnection.guessContentTypeFromName(fileParts[0]);
                            // Workaround: in Java 8, svg is missing from
                            // $JAVA_HOME/lib/content-types.properties, so
                            // guessContentTypeFromName does not recognize it.
                            if (fileParts[0].endsWith(".svg")) {
                                contentType = "image/svg+xml";
                            }
                            System.out.println(url.getFile() + " : " + contentType);
                            if (contentType != null && contentType.startsWith(IMAGE_MIME_TYPE_PREFIX)) {
                                // Serve the blank image instead of the real one.
                                return new URL(BLANK_IMAGE_LOC).openConnection();
                            } else {
                                return super.openConnection(url, proxy);
                            }
                        }
                    };
                }
                // Returning null makes the JVM use its default handler for every other protocol.
                return null;
            }
        });
        Application.launch();
    }
}
Sample Notes
The sample only probes the file name to determine the content type, not the input stream behind the URL. Probing the stream would more accurately determine whether the resource is actually an image, but it means opening the connection and reading from it, which is less efficient, so the solution trades accuracy for efficiency.
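To make that trade-off concrete, here is a minimal sketch contrasting the two checks using only the public URLConnection.guessContentTypeFromName and URLConnection.guessContentTypeFromStream methods. The class and method names (ContentTypeProbe, looksLikeImageByName, looksLikeImageByContent) and the example.com URLs are hypothetical, and the "response body" is a hard-coded GIF signature rather than a real download.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper comparing name-based and content-based image detection.
public class ContentTypeProbe {
    public static final String IMAGE_MIME_TYPE_PREFIX = "image/";

    // Cheap check, as in the sample: guess the type from the path alone.
    // Pass URL.getPath(), which already excludes the query string.
    public static boolean looksLikeImageByName(String path) {
        String contentType = URLConnection.guessContentTypeFromName(path);
        return contentType != null && contentType.startsWith(IMAGE_MIME_TYPE_PREFIX);
    }

    // More accurate check: sniff the leading bytes of the response body.
    // guessContentTypeFromStream needs mark/reset support; ByteArrayInputStream
    // has it, but a live network stream would need a BufferedInputStream wrapper.
    public static boolean looksLikeImageByContent(byte[] leadingBytes) {
        try {
            String contentType = URLConnection.guessContentTypeFromStream(
                    new ByteArrayInputStream(leadingBytes));
            return contentType != null && contentType.startsWith(IMAGE_MIME_TYPE_PREFIX);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
    }

    public static void main(String[] args) throws IOException {
        URL logo = URI.create("http://example.com/img/logo.png?v=2").toURL();
        URL avatar = URI.create("http://example.com/avatar?id=42").toURL();
        // The name check recognizes the .png path but has nothing to go on
        // for the extension-less one...
        System.out.println(looksLikeImageByName(logo.getPath()));
        System.out.println(looksLikeImageByName(avatar.getPath()));
        // ...whereas sniffing finds the GIF signature in the body bytes.
        byte[] body = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeImageByContent(body));
    }
}
```

In a real handler, the content check would need the first bytes of the live response, which is exactly the network round trip the name-based check avoids.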
The provided solution only handles locations served over the http protocol, not locations served over https.
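The provided solution uses the internal sun.net.www.protocol.http.Handler class. From Java 9 on, the java.base module does not export that package, so the sample may not compile or run there without extra flags such as --add-exports java.base/sun.net.www.protocol.http=ALL-UNNAMED.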
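The URLStreamHandlerFactory is a global setting for the JVM and can only be set once: a second call to URL.setURLStreamHandlerFactory throws an Error. Once set, it stays in effect for every java.net.URL connection in the JVM, not just the WebView (e.g. any image requested over http through a java.net.URL connection will be replaced).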
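Because the factory is set-once, code that might install it more than once (for example, a test suite) has to cope with that Error. A minimal sketch, assuming a hypothetical FactoryOnce helper and a pass-through factory that returns null for every protocol, so the JVM keeps its built-in handlers:

```java
import java.net.URL;
import java.net.URLStreamHandlerFactory;

// Hypothetical helper illustrating the set-once behavior of the factory.
public class FactoryOnce {
    // Returning null for every protocol means "use the JVM's default handler".
    public static final URLStreamHandlerFactory PASS_THROUGH = protocol -> null;

    // Tries to install the factory; returns false if one was already set in this JVM.
    public static boolean tryInstall(URLStreamHandlerFactory factory) {
        try {
            URL.setURLStreamHandlerFactory(factory);
            return true;
        } catch (Error alreadySet) {
            // URL.setURLStreamHandlerFactory throws java.lang.Error on any later call.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryInstall(PASS_THROUGH)); // true in a fresh JVM
        System.out.println(tryInstall(PASS_THROUGH)); // false: the factory is already set
    }
}
```

The same constraint is why the sample installs its factory in main, before Application.launch() and before WebView opens any URL.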
The sample solution returns a blank (transparent) image, which it downloads over the network. For efficiency, the image could instead be loaded as a resource from the classpath.
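As a variation on that idea, the placeholder can be served from memory rather than from the classpath, so no file lookup or network request is involved. This is a sketch under assumptions: BlankImageConnection is a hypothetical class, the embedded bytes are the commonly used 1x1 transparent GIF, and it has not been verified that WebView treats a plain, non-HTTP URLConnection exactly like the HTTPS connection the sample returns.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.Base64;

// Hypothetical connection that serves an embedded 1x1 transparent GIF.
public class BlankImageConnection extends URLConnection {
    // Embedded image bytes, so nothing has to be fetched to produce the placeholder.
    public static final byte[] BLANK_GIF = Base64.getDecoder().decode(
            "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7");

    public BlankImageConnection(URL url) {
        super(url); // keep the original URL so callers still see what they asked for
    }

    @Override
    public void connect() {
        connected = true; // nothing to open: the bytes are already in memory
    }

    @Override
    public InputStream getInputStream() {
        connect();
        return new ByteArrayInputStream(BLANK_GIF);
    }

    @Override
    public String getContentType() {
        return "image/gif";
    }

    @Override
    public int getContentLength() {
        return BLANK_GIF.length;
    }

    @Override
    public long getContentLengthLong() {
        return BLANK_GIF.length;
    }
}
```

Inside the sample's openConnection override, `return new URL(BLANK_IMAGE_LOC).openConnection();` would become `return new BlankImageConnection(url);`. The connection reports image/gif, so a caller that checks the content type sees an image without having to sniff the bytes.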
You could return a null connection rather than a connection to a blank image. If you do, the WebView code will start reporting NullPointerExceptions to the console, because it is not getting the URL connection it expects, and it will replace every image with an "x" icon to show that the image is missing. I wouldn't really recommend an approach that returns a null connection.
Answered By - jewelsea
Answer Checked By - Mildred Charles (JavaFixing Admin)