path: root/pyaggr3g470r/crawler.py
Commit message | Author | Age
* Run the asyncio loop in a try block. | Cédric Bonhomme | 2015-05-27
* Try to fix a problem with Python 3.4.3 and test 'ensure_future' with Python d... | Cédric Bonhomme | 2015-05-27
* No need to set feed.enabled in the asyncio crawler. Just to increase the coun... | Cédric Bonhomme | 2015-05-27
* Full text search with Whoosh has been removed. | Cédric Bonhomme | 2015-04-22
* Removed debug print. | Cédric Bonhomme | 2015-04-08
* The minimum error count is now specified in the configuration file. | Cédric Bonhomme | 2015-04-08
* Better handling of the error logging in the crawler. | Cédric Bonhomme | 2015-03-08
* Disable the feed when more than 2 errors (test). | Cédric Bonhomme | 2015-03-05
* Minor update to the 'feed' template. | Cédric Bonhomme | 2015-03-05
* Take advantage of some new fields of the Feed objects. | Cédric Bonhomme | 2015-03-05
* Test with the old crawler (temporary during the transition). | Cédric Bonhomme | 2015-03-04
* unused import. | Cédric Bonhomme | 2015-02-22
* Indexation is now restored. | Cédric Bonhomme | 2015-02-22
* bug fix... | Cédric Bonhomme | 2015-02-22
* Prevents BeautifulSoup4 from adding extra <html><body> tags to the soup with ... | Cédric Bonhomme | 2015-02-22
* This test will be used for some weeks in order to avoid duplicates with the n... | Cédric Bonhomme | 2015-02-19
* It is now useless to test the value of article.date at this point. | Cédric Bonhomme | 2015-02-19
* Alembic is magic! | Cédric Bonhomme | 2015-02-18
* Minor changes in the crawler (test of asyncio.async). | Cédric Bonhomme | 2015-02-12
* Time to sleep. | Cédric Bonhomme | 2015-02-11
* Some minor improvements concerning the parsing of the article publication date. | Cédric Bonhomme | 2015-02-11
* In the case it is not possible to resolve the URL of an article we just ignor... | Cédric Bonhomme | 2015-02-11
* Oh my god. | Cédric Bonhomme | 2015-02-11
* Fixed another bug in the new crawler... | Cédric Bonhomme | 2015-02-11
* bug when the list of feeds to fetch is empty | Cédric Bonhomme | 2015-02-09
* Misc improvements for the crawler. A semaphore is used to limit the number of... | Cédric Bonhomme | 2015-02-08
* Fetch all feeds of the list (not only the first 20 feeds). | Cédric Bonhomme | 2015-02-04
* Get the feeds with aiohttp. | Cédric Bonhomme | 2015-02-04
* Test if we effectively have retrieved some articles. | Cédric Bonhomme | 2015-01-21
* clean_url is now working with Python3 | Cédric Bonhomme | 2015-01-21
* Misc fixes to the crawler. | Cédric Bonhomme | 2015-01-21
* Added link to examples. | Cédric Bonhomme | 2015-01-21
* First implementation with asyncio (not really async for the moment). | Cédric Bonhomme | 2015-01-21
* Updated years of copyright. | Cédric Bonhomme | 2015-01-03
* Updated comment. | Cédric Bonhomme | 2015-01-02
* Hack: Re-add sslwrap to Python 2.7.9. | Cédric Bonhomme | 2015-01-02
* Import urllib.request for Python 3. | Cédric Bonhomme | 2015-01-02
* Notification functions and functions to send emails are now in separate files. | Cédric Bonhomme | 2014-11-20
* Updated some comments. | Cédric Bonhomme | 2014-11-19
* When the title is not found. | Cédric Bonhomme | 2014-11-09
* Log 'bozo' exception. | Cédric Bonhomme | 2014-11-08
* Configuration variables have been updated. | Cédric Bonhomme | 2014-08-18
* minor changes. | Cédric Bonhomme | 2014-07-13
* Timeout of 5 seconds for all sockets. | Cédric Bonhomme | 2014-07-13
* Update headers tags. | Cédric Bonhomme | 2014-07-13
* Performance improvement for the crawler (database insertion step). | Cédric Bonhomme | 2014-07-13
* Minor improvements for the crawler. | Cédric Bonhomme | 2014-07-13
* if the crawler is not able to get the link of the article, continue. | Cédric Bonhomme | 2014-06-21
* fixes #7 | Cédric Bonhomme | 2014-06-10
* making pyagregator runnable by apache | François Schmidts | 2014-06-09
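Several entries in this log describe crawler behaviour only tersely: running the asyncio loop in a try block, disabling a feed after more than 2 errors, and moving the error threshold into the configuration file. The sketch below illustrates those ideas under stated assumptions. It is not pyaggr3g470r's code: the `Feed` class, `ERROR_THRESHOLD`, `fetch`, `crawl_one`, `feeds_to_fetch`, `crawl` and `run` are hypothetical names. Fetching is simulated instead of using aiohttp, and the threshold is checked when feeds are selected rather than by setting `feed.enabled` inside the crawler. That placement is an assumption, not what any particular commit did.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical threshold; the log only says the value comes from the configuration file.
ERROR_THRESHOLD = 2


@dataclass
class Feed:
    # Hypothetical feed model, not the project's actual class.
    url: str
    error_count: int = 0


async def fetch(feed: Feed) -> str:
    # Simulated network call: any URL containing "bad" fails.
    await asyncio.sleep(0)
    if "bad" in feed.url:
        raise ConnectionError(feed.url)
    return "<rss/>"


async def crawl_one(feed: Feed) -> None:
    try:
        await fetch(feed)
    except ConnectionError:
        # Only the counter changes here; the crawler does not disable the feed itself.
        feed.error_count += 1


def feeds_to_fetch(feeds: list[Feed]) -> list[Feed]:
    # Skip feeds that have failed more than ERROR_THRESHOLD times.
    return [f for f in feeds if f.error_count <= ERROR_THRESHOLD]


async def crawl(feeds: list[Feed]) -> None:
    tasks = [asyncio.ensure_future(crawl_one(f)) for f in feeds_to_fetch(feeds)]
    # asyncio.wait() raises ValueError when given an empty collection, so guard it.
    if tasks:
        await asyncio.wait(tasks)


def run(feeds: list[Feed]) -> None:
    loop = asyncio.new_event_loop()
    try:
        loop.run_until_complete(crawl(feeds))
    finally:
        # Close the loop even if the crawl raises.
        loop.close()
```

Calling `run()` four times on a list holding one working feed and one failing feed leaves the failing feed with an error count of 3. It is then excluded by `feeds_to_fetch()` on later runs. Calling `run([])` completes without error because the empty task list is guarded.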
bgstack15