NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces.
A Java implementation of a flexible and extensible web spider engine.
Optional modules allow functionality to be added (searching dead links, testing the performance and scalability of a site, crea...