Goose - Article Extractor now open source, in the style of Flipboard/Instapaper

December 21, 2010

Today I'm releasing a project I worked on for http://gravity.com. It's an HTML article extractor à la Flipboard / Instapaper. It takes an article, runs some calculations on it, and gives you back an Article object containing the extracted article text as well as the main image that we think is relevant to the article.

https://github.com/jiminoc/goose/wiki
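
To make the "article in, Article object out" shape concrete, here is a minimal Python sketch. It is not Goose's code or API: the `Article` class, the `extract_article` function, and the `min_words` parameter are hypothetical names made up for illustration, and the word-count filter is a toy stand-in for whatever calculations Goose actually runs. It only shows the idea of returning the extracted text together with one candidate image.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional


@dataclass
class Article:
    """Hypothetical result type for this sketch; not Goose's actual class."""
    text: str
    top_image: Optional[str]


class _ParagraphCollector(HTMLParser):
    """Collects the text of each <p> element and the src of each <img>."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: List[str] = []
        self.images: List[str] = []
        self._buffer: Optional[List[str]] = None  # None while outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Join the pieces and collapse runs of whitespace.
            self.paragraphs.append(" ".join("".join(self._buffer).split()))
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def extract_article(html: str, min_words: int = 5) -> Article:
    """Toy heuristic (an assumption, not Goose's method): keep paragraphs
    with at least `min_words` words and take the first image on the page."""
    collector = _ParagraphCollector()
    collector.feed(html)
    collector.close()
    body = [p for p in collector.paragraphs if len(p.split()) >= min_words]
    top_image = collector.images[0] if collector.images else None
    return Article(text="\n\n".join(body), top_image=top_image)


sample_html = """
<html><body>
  <p>Home | About</p>
  <img src="/images/lead.jpg">
  <p>Goose takes a page and returns the main text of the article.</p>
</body></html>
"""

article = extract_article(sample_html)
print(article.text)
print(article.top_image)
```

In this sketch the short navigation paragraph is dropped and the longer sentence is kept, which is the general flavor of separating article text from page chrome. A real extractor needs far more careful scoring than a word count.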

The goal is to create an open source article extractor for use with open source applications, crawlers, or academic NLP initiatives.

See more at the new blog: http://jimplush.com
