cache

Caching utilities

feupy.cache.cache

A persistent dictionary-like object whose values are structured in the following way:

{
    url0 : (timeout0, html0),
    url1 : (timeout1, html1),
    …
}

Here, url is a string, timeout is an int or a float (the “due by date”, expressed as seconds since the epoch), and html is a string.

Type:	`shelve.DbfilenameShelf`, or None initially (see `load_cache()`)
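To make the layout concrete, here is a minimal sketch of how an entry's validity could be checked. It uses a plain dict in place of the shelf; `example_cache`, `is_valid`, and the example URLs are hypothetical names introduced for illustration, and treating an entry as valid only while its timeout is strictly in the future is an assumption.

```python
import time

# Hypothetical stand-in for the persistent cache: a plain dict with the same
# {url: (timeout, html)} layout. The URLs are made up for illustration.
_start = time.time()
example_cache = {
    "https://example.com/fresh": (_start + 3600, "<html>fresh</html>"),
    "https://example.com/stale": (_start - 1, "<html>stale</html>"),
}


def is_valid(cache, url, now=None):
    """Return True if url is cached and its due-by date has not passed."""
    if url not in cache:
        return False
    timeout, _html = cache[url]
    current = time.time() if now is None else now
    # Assumption: an entry expires as soon as the clock reaches its timeout.
    return timeout > current
```

Because a shelf is dictionary-like, the same check would apply unchanged to a `shelve` object.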

feupy.cache.load_cache(flag='c', path=None)

Loads the cache from disk and stores it in the variable cache. If cache is not None (that is, it has already been loaded), the function does nothing.

Parameters:

flag (`str`, optional) – The flag parameter; see https://docs.python.org/3/library/dbm.html#dbm.open
path (`str` or `None`, optional) – The path of the directory where the cache is stored. It defaults to this file’s folder path

Note

Unless you intend to call load_cache() with non-default arguments, you don’t have to call this function. The other functions in this module check whether or not the cache has been loaded and will load the cache for you.

Example:

from feupy import cache
cache.load_cache()

feupy.cache.get_html(url, params={}, use_cache=True)

Roughly equivalent to requests.get(url, params).text, with the added benefit of a persistent cache whose html treatment and timeouts can be customized depending on the url. If the result is already in the cache and is still valid, the function returns the cached value instead of making a web request.

Parameters:

url (str) – The url of the html to be fetched
params (dict, optional) – The query portion of the url, should you want to include a query
use_cache (bool, optional) – If this value is set to True, the cache will be checked for the url. If the url is not found in the cache keys or has timed out, the function will get the html from the web, remove scripts and styles from the html, store it in cache, and finally return the html. Otherwise, if it’s set to False, the cache will not be checked

Returns:

The html of the requested page, as a string

Note

The curricular units’ pages, along with the students’ and teachers’ htmls, are modified to reduce their memory footprint.

Note

If you know beforehand that you are going to make a large number of requests, you should probably call get_html_async() first to populate the cache.
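The lookup described for use_cache can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `get_html_sketch`, the injected `fetch` callable, and the fixed `lifetime` are hypothetical, the script and style removal is omitted, and skipping the store when use_cache is False is an assumption the documentation does not confirm.

```python
import time


def get_html_sketch(url, cache, fetch, use_cache=True, lifetime=3600.0):
    """Return the html for url, consulting cache first when use_cache is True.

    cache    -- dict-like mapping url -> (timeout, html)
    fetch    -- hypothetical callable url -> html, standing in for the web request
    lifetime -- hypothetical number of seconds a fresh entry stays valid
    """
    if use_cache and url in cache:
        timeout, html = cache[url]
        if timeout > time.time():
            return html  # still valid: no web request is made
    html = fetch(url)
    # The documented function also strips scripts and styles here; omitted.
    if use_cache:
        # Assumption: nothing is stored when the cache is bypassed.
        cache[url] = (time.time() + lifetime, html)
    return html
```

Passing the fetcher in as an argument keeps the sketch runnable without network access; in practice it would wrap a call such as requests.get(url, params).text.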

feupy.cache.reset()

Eliminates all entries from the cache.

feupy.cache.remove_invalid_entries(urls=None)

Removes all the cache entries in urls that have timed out.

Parameters:	urls (`iterable(str)` or `None`, optional) – The urls to be checked. If this argument is left as None, all urls in the cache will be checked
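A minimal sketch of this pruning step on a dict-like cache is shown below. The name `remove_invalid_entries_sketch` and the `now` parameter are hypothetical, and silently skipping urls that are not in the cache is an assumption.

```python
import time


def remove_invalid_entries_sketch(cache, urls=None, now=None):
    """Delete entries whose timeout has passed; check every url if urls is None."""
    current = time.time() if now is None else now
    # Snapshot the keys so that entries can be deleted while looping.
    candidates = list(cache.keys()) if urls is None else list(urls)
    for url in candidates:
        # Assumption: urls that are not cached are ignored.
        if url in cache and cache[url][0] <= current:
            del cache[url]
```

For example, with entries timing out at 100, 300 and 50 seconds and `now=200`, only the entry with timeout 300 survives a full sweep.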

feupy.cache.get_html_async(urls, n_workers=10, use_cache=True)

Roughly an asynchronous version of get_html().

Takes a list (or any iterable) of urls and returns a corresponding generator of htmls. The htmls have their scripts and styles removed and are stored in cache.

Parameters:

urls (iterable(str)) – The urls to be accessed
n_workers (`int`, optional) – The number of workers
use_cache (`bool`, optional) – Attempts to use the cache if True; otherwise it will fetch from sigarra

Returns:

A generator of str
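One way to obtain a generator of htmls from a pool of workers is sketched below with a thread pool. The documentation does not say which concurrency mechanism the library uses, so the thread pool is an assumption, as are the name `get_html_concurrent` and the injected `fetch` callable; caching is left out for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def get_html_concurrent(urls, fetch, n_workers=10):
    """Yield one html per url, in input order, fetched by a pool of threads.

    fetch is a hypothetical callable url -> html standing in for the request.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        # executor.map runs fetch concurrently but yields results in the
        # same order as urls.
        yield from executor.map(fetch, urls)
```

Since this is a generator function, no work starts until the result is first iterated, for instance with `list(...)` or a `for` loop.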
