dbacl is a digramic Bayesian text classifier.
Given some text, it calculates the posterior
probabilities that the input resembles one of any
number of previously learned document collections.
It can be used to sort incoming email into
arbitrary categories such as spam, work, and play,
or simply to distinguish an English text from a
French text. It fully supports international
character sets, and uses sophisticated statistical
models based on the Maximum Entropy Principle.
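The posterior calculation described above can be illustrated with a toy example. The following is only a minimal sketch and is not dbacl's actual model: it assumes a plain word-count (unigram) model with add-one smoothing and equal priors, whereas dbacl itself uses digram-based maximum entropy models. All function names here (tokenize, learn, posteriors) are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def learn(documents):
    """Build a word-count model (one 'category') from a list of texts."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def log_likelihood(counts, vocab_size, tokens):
    """Log P(tokens | category) under a unigram model with add-one smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )


def posteriors(categories, text):
    """Return {name: P(category | text)}, assuming equal prior probabilities."""
    tokens = tokenize(text)
    # Shared vocabulary: every word seen in any category or in the input.
    vocab = set(tokens)
    for counts in categories.values():
        vocab.update(counts)
    scores = {
        name: log_likelihood(counts, len(vocab), tokens)
        for name, counts in categories.items()
    }
    # Normalize in log space (subtract the maximum) to avoid underflow.
    top = max(scores.values())
    weights = {name: math.exp(s - top) for name, s in scores.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}


# Two tiny, made-up document collections standing in for learned categories.
categories = {
    "english": learn(["the cat sat on the mat", "the dog ate the bone"]),
    "french": learn(["le chat est sur le tapis", "le chien mange un os"]),
}
result = posteriors(categories, "the dog sat on the mat")
for name, prob in sorted(result.items()):
    print(f"{name}: {prob:.4f}")
```

The posteriors sum to one across the learned categories, so the output can be read as the relative plausibility of each collection given the input text.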
Project release information and project resources. Note that this information comes from this project's Freecode.com page, and the downloads themselves may not be hosted on SourceForge.JP.
This is a hodge-podge of fixes and improvements. A
new hypex command, the TREC 2005 options files,
and an essay on chess are now in the tarball.
Several improvements to the parsing engine were
made, including a new -e char option and bugfixes.
Compilation problems on various architectures were
fixed, and libslang2 support was added.
This release includes various bugfixes and small usability improvements in the documentation and default switch handling. The major additions are support for the TREC spamjig and improved memory mapping for faster online learning.
This release adds a new MAP confidence score (-U, to complement the -X switch), some new scoring types in mailinspect, and a new parsing switch for trace headers in email (-T email:theaders). Category learning now accepts directory names as well as file names, and preliminary work was done on a new header mining tool (hmine). Category files are now written in 'portable' format by default.
Many bugs were discovered and fixed. A test suite
was added to prevent future regressions. It can be
run using make check. Memory management was
improved, making classification considerably
faster, and a putative confidence score is now
available via the -X switch. Some documentation
changes were made.