Friday, December 7, 2012

HTMLCleaner - a way to clean and format HTML files

HTMLCleaner is an open source HTML parser. It converts ill-formed HTML into well-formed XML and can strip comments and similar clutter along the way. Using HTMLCleaner, we can clean HTML files from the internet or from the local system and store the result in the local file system.

We can include HTMLCleaner in any Maven project using the dependency below:

<dependency>
 <groupId>net.sourceforge.htmlcleaner</groupId>
 <artifactId>htmlcleaner</artifactId>
 <version>2.2</version>
</dependency>

A sample command to perform the cleanup is shown below:
mvn exec:java -Dexec.mainClass="org.htmlcleaner.CommandLine" -Dexec.args="src=C:\\Programming\\WorkSpace\\tempTestIndex.html dest=C:\\Programming\\WorkSpace\\abc.html outputtype=compact omitcomments=true"

For a detailed list of the available options, refer to the link below:
http://htmlcleaner.sourceforge.net/commandlineuse.php
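
The same cleanup can also be done from Java code instead of the command line. Below is a minimal sketch, assuming the serializer API as I understand it from version 2.2 onwards (CompactXmlSerializer with getAsString); the class name HtmlCleanerSketch, the method name cleanToXml and the sample HTML string are made up for illustration. It mirrors the outputtype=compact and omitcomments=true options used above, so please check the method names against the javadoc of the version you actually use.

```java
import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.CompactXmlSerializer;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

import java.io.IOException;

// Hypothetical helper class, not part of HTMLCleaner itself.
class HtmlCleanerSketch {

    // Cleans an HTML string and returns it as compact, well-formed XML.
    static String cleanToXml(String html) throws IOException {
        HtmlCleaner cleaner = new HtmlCleaner();

        // Same effect as omitcomments=true on the command line.
        CleanerProperties props = cleaner.getProperties();
        props.setOmitComments(true);

        // Parse the (possibly ill-formed) HTML into a tag tree.
        TagNode root = cleaner.clean(html);

        // Same effect as outputtype=compact on the command line.
        return new CompactXmlSerializer(props).getAsString(root);
    }

    public static void main(String[] args) throws IOException {
        // Sample input with a comment, an unclosed <p> and an unclosed <br>.
        String messy = "<html><body><!-- draft note --><p>Hello<br>world</body></html>";
        System.out.println(cleanToXml(messy));
    }
}
```

HtmlCleaner.clean() also has overloads that take a java.io.File or a java.net.URL, which should cover the local file and internet cases mentioned earlier, and the serializer's writeToFile method can be used in place of getAsString to store the result on disk.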

Reference - http://htmlcleaner.sourceforge.net/index.php
