cache
Caching utilities

feupy.cache.cache
A persistent dictionary-like object whose contents are structured as follows:
{url0: (timeout0, html0), url1: (timeout1, html1), …}
where url is a string, timeout is an int or a float (the “due by” date, in seconds since the epoch), and html is a string.
Type: shelve.DbfilenameShelf or None (initially None; see load_cache())
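To make the entry layout concrete, here is a minimal sketch that builds a throwaway shelf with the same {url: (timeout, html)} shape using the standard shelve module. The is_valid helper is hypothetical. It assumes an entry counts as valid while the current time is earlier than its timeout, and feupy's own check may differ.

```python
import os
import shelve
import tempfile
import time


def is_valid(entry, now=None):
    """Hypothetical check: a (timeout, html) entry is valid until its timeout."""
    timeout, _html = entry
    if now is None:
        now = time.time()
    return now < timeout


with tempfile.TemporaryDirectory() as tmp:
    # A throwaway shelf with the same layout as feupy.cache.cache.
    with shelve.open(os.path.join(tmp, "demo_cache"), flag="c") as db:
        # Entry due one hour from now: still valid.
        db["https://example.com/page"] = (time.time() + 3600, "<html>...</html>")
        fresh = is_valid(db["https://example.com/page"])
        # Entry that was due one second ago: timed out.
        db["https://example.com/old"] = (time.time() - 1, "<html></html>")
        stale = is_valid(db["https://example.com/old"])

print(fresh, stale)  # True False
```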

feupy.cache.load_cache(flag='c', path=None)
Loads the cache from disk and stores it in the variable cache. If cache is not None, the function does nothing.
Parameters:
- flag (str, optional) – The flag used to open the cache database, see https://docs.python.org/3/library/dbm.html#dbm.open (the default 'c' opens it for reading and writing, creating it if it does not exist)
- path (str or None, optional) – The path of the directory where the cache is stored. Defaults to the directory containing this module
Note
Unless you intend to call load_cache() with non-default arguments, you don’t have to call this function. The other functions in this module check whether the cache has been loaded and will load it for you.
Example:
    from feupy import cache
    cache.load_cache()
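The “load once, then do nothing” behaviour described above is a common lazy-initialisation pattern. A minimal sketch, assuming a module-level variable named cache and a shelf file named "cache" inside the target directory; the file name and the cached_urls helper are hypothetical, not part of feupy's API.

```python
import os
import shelve
import shutil
import tempfile

# Module-level handle, None until load_cache() runs (mirrors feupy.cache.cache).
cache = None


def load_cache(flag="c", path=None):
    """Open the shelf once; later calls do nothing while cache is not None."""
    global cache
    if cache is not None:
        return
    if path is None:
        # Assumption: default to the directory containing this script.
        path = os.path.dirname(os.path.abspath(__file__))
    # Hypothetical file name "cache" inside the chosen directory.
    cache = shelve.open(os.path.join(path, "cache"), flag=flag)


def cached_urls():
    """Hypothetical helper showing the on-demand load done by other functions."""
    load_cache()  # no-op if the cache is already loaded
    return list(cache.keys())


tmp_dir = tempfile.mkdtemp()
load_cache(path=tmp_dir)
first_handle = cache
load_cache(path=tmp_dir)  # second call is a no-op
same_handle = cache is first_handle
urls = cached_urls()
print(same_handle, urls)  # True []
cache.close()
shutil.rmtree(tmp_dir)
```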

feupy.cache.get_html(url, params={}, use_cache=True)
More or less functionally equivalent to requests.get(url, params).text, with the added benefit of a persistent cache with customizable html treatment and timeouts depending on the url. If the result is already in the cache and is still valid, the function returns the cached value instead of making a web request.
Parameters:
- url (str) – The url of the html to be fetched
- params (dict, optional) – The query portion of the url, should you want to include a query
- use_cache (bool, optional) – If True, the cache is checked for the url. If the url is not among the cache keys or its entry has timed out, the function fetches the html from the web, removes scripts and styles from it, stores it in the cache, and returns it. If False, the cache is not checked
Returns: The html of the requested page, as a str
Note
The curricular units’ pages, along with the students’ and teachers’ pages, are modified to reduce their memory footprint.
Note
If you know beforehand that you are going to make a large number of requests, consider calling get_html_async() first to populate the cache.
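The cache-or-fetch flow of get_html() can be sketched as follows. This is an illustration under stated assumptions, not feupy's implementation: a plain dict stands in for the shelf, a hypothetical fetch callable replaces requests.get(url, params).text so no network is needed, ttl is a hypothetical fixed lifetime (feupy chooses timeouts per url), the cache key is assumed to be the url with its encoded query, and script/style removal uses a naive regex rather than whatever feupy actually does.

```python
import re
import time
from urllib.parse import urlencode

# Hypothetical in-memory stand-in for the persistent shelf:
# url -> (timeout, html), timeout in seconds since the epoch.
cache = {}

# Naive regex stripping for illustration only; not feupy's html treatment.
_SCRIPT_STYLE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)


def strip_scripts_and_styles(html):
    """Remove <script> and <style> elements (regex sketch, not a real parser)."""
    return _SCRIPT_STYLE.sub("", html)


def get_html(url, params=None, use_cache=True, *, fetch, ttl=3600):
    """Return a valid cached html, or fetch, clean, store and return it.

    fetch(full_url) -> str is a hypothetical stand-in for
    requests.get(url, params).text; ttl is a hypothetical fixed lifetime.
    """
    # Assumption: the cache key is the url with its encoded query appended.
    key = url + ("?" + urlencode(params) if params else "")
    now = time.time()
    if use_cache and key in cache:
        timeout, html = cache[key]
        if now < timeout:
            return html  # cache hit: no web request
    html = strip_scripts_and_styles(fetch(key))
    # Assumption: the fresh result is stored even when use_cache is False.
    cache[key] = (now + ttl, html)
    return html


calls = []


def fake_fetch(full_url):
    calls.append(full_url)
    return "<html><script>track()</script><p>hi</p></html>"


first = get_html("https://example.com/page", {"id": 1}, fetch=fake_fetch)
second = get_html("https://example.com/page", {"id": 1}, fetch=fake_fetch)
print(first, len(calls))  # <html><p>hi</p></html> 1
```

The second call is served from the dict, so fake_fetch runs only once.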

feupy.cache.reset()
Eliminates all entries from the cache.
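Because the cache is a shelve.DbfilenameShelf, which behaves like a mutable mapping, emptying it can be as simple as calling clear(). A minimal sketch on a throwaway shelf; whether feupy.cache.reset() does exactly this internally is an assumption.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with shelve.open(os.path.join(tmp, "demo_cache"), flag="c") as db:
        db["https://example.com/a"] = (0.0, "<html>a</html>")
        db["https://example.com/b"] = (0.0, "<html>b</html>")
        before = len(db)
        # Shelf is a MutableMapping, so clear() deletes every entry.
        db.clear()
        after = len(db)

print(before, after)  # 2 0
```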

feupy.cache.remove_invalid_entries(urls=None)
Removes the cache entries in urls that have timed out.
Parameters:
- urls (iterable(str) or None, optional) – The urls to be checked. If left as None, every url in the cache is checked
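A minimal sketch of the pruning step, assuming an entry has timed out once its timeout is at or before the current time and that urls missing from the cache are simply skipped. The explicit cache and now parameters are hypothetical additions that make the sketch testable without feupy.

```python
import time


def remove_invalid_entries(cache, urls=None, now=None):
    """Delete timed-out (timeout, html) entries from a url-keyed mapping."""
    if now is None:
        now = time.time()
    # Snapshot the keys so the mapping is not mutated while iterating over it.
    candidates = list(cache.keys()) if urls is None else list(urls)
    for url in candidates:
        entry = cache.get(url)
        # Assumption: skip unknown urls; drop entries whose due date has passed.
        if entry is not None and entry[0] <= now:
            del cache[url]


demo = {
    "https://example.com/fresh": (2000.0, "<html>fresh</html>"),
    "https://example.com/stale": (500.0, "<html>stale</html>"),
}
remove_invalid_entries(demo, now=1000.0)
print(sorted(demo))  # ['https://example.com/fresh']
```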

feupy.cache.get_html_async(urls, n_workers=10, use_cache=True)
get_html(), but asynchronous, give or take. Takes a list (or any iterable) of urls and returns a corresponding generator of htmls. The htmls have their scripts and styles removed and are stored in the cache.
Parameters:
- urls (iterable(str)) – The urls to be accessed
- n_workers (int, optional) – The number of workers
- use_cache (bool, optional) – If True, attempts to use the cache; otherwise the pages are fetched from sigarra
Returns: A generator of str, one html per url
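“get_html(), but async, give or take” together with an n_workers parameter suggests concurrent fetching rather than asyncio. One common way to get that behaviour is a thread pool, so the sketch below uses concurrent.futures.ThreadPoolExecutor. Whether feupy uses threads is an assumption, and fetch is a hypothetical stand-in for get_html so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def get_html_async(urls, fetch, n_workers=10):
    """Fetch every url concurrently and return a generator of results.

    fetch(url) -> str is a hypothetical stand-in for get_html(url).
    Results are yielded in the same order as urls.
    """
    pool = ThreadPoolExecutor(max_workers=n_workers)
    # Executor.map submits all tasks up front and returns a lazy generator.
    results = pool.map(fetch, urls)
    # Release the pool once pending tasks finish; this call does not block.
    pool.shutdown(wait=False)
    return results


def fake_fetch(url):
    return f"<html>{url}</html>"


pages = list(get_html_async(["a", "b", "c"], fake_fetch, n_workers=2))
print(pages)  # ['<html>a</html>', '<html>b</html>', '<html>c</html>']
```

Consuming the generator (for example with list()) waits for every fetch to finish, which is what you want when pre-populating the cache before a burst of get_html() calls.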