path: root/pyaggr3g470r/crawler.py
Commit message (Author, Age)
* It is recommended to close the loop at the end of the process (https://groups.google.com/forum/#!topic/python-tulip/8bRLexUzeU4). (Cédric Bonhomme, 2015-06-02)
* Run the asyncio loop in a try block. (Cédric Bonhomme, 2015-05-27)
* Try to fix a problem with Python 3.4.3 and test 'ensure_future' with Python dev (3.5). (Cédric Bonhomme, 2015-05-27)
* No need to set feed.enabled in the asyncio crawler. Just increase the counter. (Cédric Bonhomme, 2015-05-27)
* Full text search with Whoosh has been removed. (Cédric Bonhomme, 2015-04-22)
* Removed debug print. (Cédric Bonhomme, 2015-04-08)
* The minimum error count is now specified in the configuration file. (Cédric Bonhomme, 2015-04-08)
* Better handling of the error logging in the crawler. (Cédric Bonhomme, 2015-03-08)
* Disable the feed after more than 2 errors (test). (Cédric Bonhomme, 2015-03-05)
* Minor update to the 'feed' template. (Cédric Bonhomme, 2015-03-05)
* Take advantage of some new fields of the Feed objects. (Cédric Bonhomme, 2015-03-05)
* Test with the old crawler (temporary during the transition). (Cédric Bonhomme, 2015-03-04)
* unused import. (Cédric Bonhomme, 2015-02-22)
* Indexation is now restored. (Cédric Bonhomme, 2015-02-22)
* bug fix... (Cédric Bonhomme, 2015-02-22)
* Prevents BeautifulSoup4 from adding extra <html><body> tags to the soup with the lxml parser. (Cédric Bonhomme, 2015-02-22)
* This test will be used for some weeks in order to avoid duplicates with the new article id (entry_id). (Cédric Bonhomme, 2015-02-19)
* It is now useless to test the value of article.date at this point. (Cédric Bonhomme, 2015-02-19)
* Alembic is magic! (Cédric Bonhomme, 2015-02-18)
* Minor changes in the crawler (test of asyncio.async). (Cédric Bonhomme, 2015-02-12)
* Time to sleep. (Cédric Bonhomme, 2015-02-11)
* Some minor improvements concerning the parsing of the article publication date. (Cédric Bonhomme, 2015-02-11)
* If it is not possible to resolve the URL of an article, we just ignore the problem and continue. (Cédric Bonhomme, 2015-02-11)
* Oh my god. (Cédric Bonhomme, 2015-02-11)
* Fixed another bug in the new crawler... (Cédric Bonhomme, 2015-02-11)
* bug when the list of feeds to fetch is empty (Cédric Bonhomme, 2015-02-09)
* Misc improvements for the crawler. A semaphore is used to limit the number of simultaneous connections. (Cédric Bonhomme, 2015-02-08)
* Fetch all feeds of the list (not only the first 20 feeds). (Cédric Bonhomme, 2015-02-04)
* Get the feeds with aiohttp. (Cédric Bonhomme, 2015-02-04)
* Test whether we have actually retrieved some articles. (Cédric Bonhomme, 2015-01-21)
* clean_url is now working with Python3 (Cédric Bonhomme, 2015-01-21)
* Misc fixes to the crawler. (Cédric Bonhomme, 2015-01-21)
* Added link to examples. (Cédric Bonhomme, 2015-01-21)
* First implementation with asyncio (not really async for the moment). (Cédric Bonhomme, 2015-01-21)
* Updated copyright years. (Cédric Bonhomme, 2015-01-03)
* Updated comment. (Cédric Bonhomme, 2015-01-02)
* Hack: Re-add sslwrap to Python 2.7.9. (Cédric Bonhomme, 2015-01-02)
* Import urllib.request for Python 3. (Cédric Bonhomme, 2015-01-02)
* Notification functions and functions to send emails are now in separate files. (Cédric Bonhomme, 2014-11-20)
* Updated some comments. (Cédric Bonhomme, 2014-11-19)
* When the title is not found. (Cédric Bonhomme, 2014-11-09)
* Log 'bozo' exception. (Cédric Bonhomme, 2014-11-08)
* Configuration variables have been updated. (Cédric Bonhomme, 2014-08-18)
* minor changes. (Cédric Bonhomme, 2014-07-13)
* Timeout of 5 seconds for all sockets. (Cédric Bonhomme, 2014-07-13)
* Update headers tags. (Cédric Bonhomme, 2014-07-13)
* Performance improvement for the crawler (database insertion step). (Cédric Bonhomme, 2014-07-13)
* Minor improvements for the crawler. (Cédric Bonhomme, 2014-07-13)
* if the crawler is not able to get the link of the article, continue. (Cédric Bonhomme, 2014-06-21)
* fixes #7 (Cédric Bonhomme, 2014-06-10)