path: root/pyaggr3g470r/crawler.py
Commit message | Author | Date
* Test with the old crawler (temporary during the transition). | Cédric Bonhomme | 2015-03-04
* Unused import. | Cédric Bonhomme | 2015-02-22
* Indexation is now restored. | Cédric Bonhomme | 2015-02-22
* Bug fix... | Cédric Bonhomme | 2015-02-22
* Prevents BeautifulSoup4 from adding extra <html><body> tags to the soup with the lxml parser. | Cédric Bonhomme | 2015-02-22
* This test will be used for some weeks in order to avoid duplicates with the new article id (entry_id). | Cédric Bonhomme | 2015-02-19
* It is now useless to test the value of article.date at this point. | Cédric Bonhomme | 2015-02-19
* Alembic is magic! | Cédric Bonhomme | 2015-02-18
* Minor changes in the crawler (test of asyncio.async). | Cédric Bonhomme | 2015-02-12
* Time to sleep. | Cédric Bonhomme | 2015-02-11
* Some minor improvements concerning the parsing of the article publication date. | Cédric Bonhomme | 2015-02-11
* If it is not possible to resolve the URL of an article, we just ignore the problem and continue. | Cédric Bonhomme | 2015-02-11
* Oh my god. | Cédric Bonhomme | 2015-02-11
* Fixed another bug in the new crawler... | Cédric Bonhomme | 2015-02-11
* Bug when the list of feeds to fetch is empty. | Cédric Bonhomme | 2015-02-09
* Misc improvements for the crawler. A semaphore is used to limit the number of simultaneous connections. | Cédric Bonhomme | 2015-02-08
* Fetch all feeds of the list (not only the first 20 feeds). | Cédric Bonhomme | 2015-02-04
* Get the feeds with aiohttp. | Cédric Bonhomme | 2015-02-04
* Test whether we actually retrieved some articles. | Cédric Bonhomme | 2015-01-21
* clean_url is now working with Python 3. | Cédric Bonhomme | 2015-01-21
* Misc fixes to the crawler. | Cédric Bonhomme | 2015-01-21
* Added link to examples. | Cédric Bonhomme | 2015-01-21
* First implementation with asyncio (not really async for the moment). | Cédric Bonhomme | 2015-01-21
* Updated copyright years. | Cédric Bonhomme | 2015-01-03
* Updated comment. | Cédric Bonhomme | 2015-01-02
* Hack: Re-add sslwrap to Python 2.7.9. | Cédric Bonhomme | 2015-01-02
* Import urllib.request for Python 3. | Cédric Bonhomme | 2015-01-02
* Notification functions and email-sending functions are now in separate files. | Cédric Bonhomme | 2014-11-20
* Updated some comments. | Cédric Bonhomme | 2014-11-19
* When the title is not found. | Cédric Bonhomme | 2014-11-09
* Log 'bozo' exception. | Cédric Bonhomme | 2014-11-08
* Configuration variables have been updated. | Cédric Bonhomme | 2014-08-18
* Minor changes. | Cédric Bonhomme | 2014-07-13
* Timeout of 5 seconds for all sockets. | Cédric Bonhomme | 2014-07-13
* Update header tags. | Cédric Bonhomme | 2014-07-13
* Performance improvement for the crawler (database insertion step). | Cédric Bonhomme | 2014-07-13
* Minor improvements for the crawler. | Cédric Bonhomme | 2014-07-13
* If the crawler is not able to get the link of the article, continue. | Cédric Bonhomme | 2014-06-21
* Fixes #7. | Cédric Bonhomme | 2014-06-10
* Making pyaggr3g470r runnable by Apache | François Schmidts | 2014-06-09
  * adding bootstrap module for basic import
  * redoing logging (config, proper use of the logging module)
  * making the secret part of the config (a random one wouldn't work with Apache, since it uses different instances of Python)
  * making the server entry point not execute the application if it is just imported
  * not writing a file for the OPML when we can read it from memory
* Supporting feeds without a date or with an ill-formatted date. | François Schmidts | 2014-06-08
* Removed unused variable. | Cédric Bonhomme | 2014-05-03
* Keep the original title. | Cédric Bonhomme | 2014-05-03
* Using lxml parser instead of html.parser, fixes #4. | Cédric Bonhomme | 2014-05-03
* Better to send email without Flask-Mail. | Cédric Bonhomme | 2014-04-27
* Improved code readability. | Cédric Bonhomme | 2014-04-27
* Cleaned code. | Cédric Bonhomme | 2014-04-27
* Separate indexes by user. | Cédric Bonhomme | 2014-04-23
* Auto-indexation of new articles (not on Heroku). | Cédric Bonhomme | 2014-04-23
* Updated comments and log messages. | Cédric Bonhomme | 2014-04-13