cupage

cupage checks web pages and displays changes from the last run that match given criteria. Its original purpose was to check web pages for new software releases, but it is easily configurable and can be used for other purposes.

It is written in Python, and requires v2.6 or later. cupage is released under the GPL v3.

Background

I had been looking for a better way to help me keep on top of software releases for the projects I’m interested in, be that either personally or for things we use at work.

Some projects have Atom feeds, some have mailing lists just for release updates, some post updates on sites like freshmeat and some have no useful update watching mechanism at all. Tracking all these resources is annoying and a simple unified solution would be much more workable.

cupage is that solution, at least for my purposes. Maybe it could be for you too!

Database

With a local, unified tool we would instantly gain easy access to the updates database for use from other tools and applications.

JSON was chosen as it is simple to read and write, especially so from Python using the json module [1].

The database is a simple serialisation of the cupage.Sites object, which is a container for cupage.Site objects. Only persistent data from cupage.Site objects that cannot be regenerated from the configuration file is stored in the database, namely the time stamp of the last check and the current matches.

matches is an array, and contains the string matches of previous cupage runs.

checked is the offset in seconds from the Unix epoch that the site was last checked. It is normally a float, but may be null.

An example database file could be:

{
    "geany-plugins": {
        "matches": [
            "geany-plugins-0.17.1.tar.bz2",
            "geany-plugins-0.17.1.tar.gz",
            "geany-plugins-0.17.tar.bz2",
            "geany-plugins-0.17.tar.gz",
            "geany-plugins-0.18.tar.bz2",
            "geany-plugins-0.18.tar.gz"
        ],
        "checked": 1256677592.0
    },
    "interlude": {
        "matches": [
            "interlude-1.0.tar.gz"
        ],
        "checked": null
    }
}
[1] Pickle was used in versions prior to 0.3.0. The switch was made as Pickle provided no benefits over JSON, and had some significant drawbacks, including the lack of support for reading it from other languages.
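Because the database is plain JSON, other tools can read it with nothing more than the standard library. The following is a minimal sketch, assuming the format shown above; the inlined sample data and the helper name last_checked are illustrative and not part of cupage.

```python
import json
from datetime import datetime, timezone

# Sample data in the format shown above, inlined so the sketch is self-contained.
DATABASE = """
{
    "geany-plugins": {
        "matches": ["geany-plugins-0.18.tar.bz2", "geany-plugins-0.18.tar.gz"],
        "checked": 1256677592.0
    },
    "interlude": {
        "matches": ["interlude-1.0.tar.gz"],
        "checked": null
    }
}
"""


def last_checked(entry):
    """Return the last check time as a UTC datetime, or None if never checked.

    Hypothetical helper for illustration, not part of cupage.
    """
    checked = entry["checked"]
    if checked is None:
        return None
    return datetime.fromtimestamp(checked, tz=timezone.utc)


sites = json.loads(DATABASE)
for name in sorted(sites):
    print(name, len(sites[name]["matches"]), last_checked(sites[name]))
```

The null value for checked maps to None, so callers need to handle sites that have never been checked.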

Configuration

cupage stores its configuration in ~/.cupage.conf by default, although you can specify a different location with the -f (--config) command line option.

The configuration file is an INI format file, with a section for each site definition. The section header is the site’s name, which will be displayed in the update output and can be used to select individual sites to check on the command line. Each section consists of a set of name=value option pairs.

An example configuration file is below:

[pep8]
site = pypi
match_type = tar
[pydelicious]
site = google code
match_type = zip
[pyisbn]
url = http://www.jnrowe.ukfsn.org/_downloads/
select = pre > a
match_type = tar
frequency = 6m
[upoints]
url = http://www.jnrowe.ukfsn.org/_downloads/
select = pre > a
match_type = tar
[fruity]
site = vim-script
script = 1871
[cupage]
site = github
user = JNRowe
frequency = 1m

Site definitions can either be specified entirely manually, or with the help of the built-in site matchers (see site option for available options).
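Since the configuration is ordinary INI, it can be inspected from other scripts. The sketch below uses Python 3's configparser module (named ConfigParser on Python 2) on a shortened copy of the example above; it only illustrates the file layout and is not how cupage itself loads its configuration.

```python
import configparser

# Shortened copy of the example configuration, inlined for a self-contained sketch.
CONFIG = """
[pep8]
site = pypi
match_type = tar

[pyisbn]
url = http://www.jnrowe.ukfsn.org/_downloads/
select = pre > a
match_type = tar
frequency = 6m
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG)

# Each section name is a site name; each section holds name=value options.
for name in parser.sections():
    print(name, dict(parser[name]))
```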

frequency option

The frequency option allows you to set a minimum time between checks for specific sites within the configuration file.

The format is <value> <units> where value can be an integer or float, and units must be one of the entries from the table below:

Unit  Purpose
h     Hours
d     Days
w     Week
m     Month, which is defined as 28 days
y     Year, which is defined as 13 m units
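To make the unit definitions concrete, here is a minimal sketch that converts a frequency string into a timedelta using the table above. The function name frequency_to_timedelta and the exact parsing rules (optional whitespace, a single unit letter) are assumptions for illustration; cupage's own parser is cupage.utils.parse_timedelta and may behave differently.

```python
import re
from datetime import timedelta

# Hours per unit, following the table: a month is 28 days, a year is 13 months.
HOURS_PER_UNIT = {
    "h": 1,
    "d": 24,
    "w": 24 * 7,
    "m": 24 * 28,
    "y": 24 * 28 * 13,
}

# A number (integer or float), optional whitespace, then a single unit letter.
FREQUENCY_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([hdwmy])\s*$")


def frequency_to_timedelta(text):
    """Convert a string such as '6m' or '1.5 h' into a timedelta (hypothetical helper)."""
    match = FREQUENCY_RE.match(text)
    if match is None:
        raise ValueError("invalid frequency: %r" % text)
    value, unit = match.groups()
    return timedelta(hours=float(value) * HOURS_PER_UNIT[unit])


print(frequency_to_timedelta("6m"))
```

Under these definitions a year is 364 days, since it is 13 months of 28 days each.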

match option

If match_type is re then match must be a valid regular expression that will be used to match within the selected elements. For most common uses a prebuilt match_type already exists (see match_type option), and re should only be used as a last resort.

The Python re module is used, and any functionality allowed by the module is available in the match option (with the notable exception of the verbose syntax).

match_type option

The match_type value, if used, must be one of the following:

Match type  Purpose
gem         to match rubygems archives
re          to define custom regular expressions
tar         to match gzip/bzip2/xz compressed tar archives (default)
zip         to match zip archives

The match_type values simply select a predefined regular expression to use. The base match is <name>-[\d\.]+([_-](pre|rc)[\d]+)?\.<type>, where <name> is the section name and <type> is the value of match_type for this section.
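As an illustration of the base match, the sketch below fills in <name> and <type> and applies the result to some sample link text. The helper base_match is hypothetical (cupage's own equivalent is Site.package_re), and the tar suffix fragment in the last lines is an assumption about how compressed tarballs might be covered, not the project's actual pattern.

```python
import re


def base_match(name, type_pattern):
    """Compile the documented base match for a site (hypothetical helper).

    type_pattern is a regular expression fragment standing in for <type>.
    """
    return re.compile(
        r"%s-[\d\.]+([_-](pre|rc)[\d]+)?\.%s" % (re.escape(name), type_pattern)
    )


# Sample link text; only the versioned file names should match.
LINKS = "interlude-0.9.zip interlude-1.0-rc1.zip interlude-1.0.zip interlude-latest.zip"

matcher = base_match("interlude", "zip")
found = [m.group(0) for m in matcher.finditer(LINKS)]
print(found)

# Assumed fragment for compressed tarballs, for illustration only.
tar_matcher = base_match("geany-plugins", r"tar\.(?:gz|bz2|xz)")
print(bool(tar_matcher.search("geany-plugins-0.18.tar.bz2")))
```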

select option

The select option, if used, must be a valid CSS or XPath selector depending on the value of selector (see selector option). Unless specified, CSS (Cascading Style Sheets) is the default selector type.

selector option

The selector option, if used, must be one of the following:

Selector  Purpose
css       To select elements within the page using CSS selectors (default)
xpath     To select elements within the page using XPath selectors
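To show what a selector picks out, the sketch below applies an XPath expression roughly equivalent to the CSS selector pre > a from the earlier configuration example. It uses the limited XPath support in the standard library's xml.etree.ElementTree on a small, well-formed sample page; the page content and version numbers are invented, and this is not the selection machinery cupage itself uses.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample page; real pages are rarely this tidy.
PAGE = """<html><body>
<pre><a href="pyisbn-0.5.tar.gz">pyisbn-0.5.tar.gz</a>
<a href="upoints-0.11.tar.gz">upoints-0.11.tar.gz</a></pre>
<p><a href="about.html">About</a></p>
</body></html>"""

root = ET.fromstring(PAGE)

# ".//pre/a" selects <a> elements that are direct children of a <pre>,
# which is what the CSS selector "pre > a" expresses.
links = [a.text for a in root.findall(".//pre/a")]
print(links)
```

Only the links inside the pre element are selected; the About link is ignored before any match_type pattern is applied.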

site option

The site option, if used, must be one of the following, hopefully self-explanatory, values:

Site         Added   Required options
cpan         v0.4.0
debian       v0.3.0
failpad      v0.5.0
github       v0.3.1  user (GitHub user name)
google code  v0.1.0
hackage      v0.1.0
pypi         v0.1.0
vim-script   v0.3.0  script (script id on the vim website)

site options are simply shortcuts that are provided to reduce duplication in the configuration file. They define the values necessary to check for updates on the given site.

url option

The url value is the location of the page to be checked for updates. If used, it must be a valid FTP/HTTP/HTTPS address.
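A quick way to sanity-check a url value before adding it to the configuration is to inspect its scheme. This is a minimal sketch using urllib.parse; the helper is_supported_url is hypothetical, and whether cupage validates addresses in this way is not stated here.

```python
from urllib.parse import urlsplit

# Schemes named in the url option description.
SUPPORTED_SCHEMES = {"ftp", "http", "https"}


def is_supported_url(url):
    """Return True for FTP/HTTP/HTTPS addresses that include a host (hypothetical helper)."""
    parts = urlsplit(url)
    return parts.scheme in SUPPORTED_SCHEMES and bool(parts.netloc)


print(is_supported_url("http://www.jnrowe.ukfsn.org/_downloads/"))
print(is_supported_url("file:///tmp/index.html"))
```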

Usage

cupage is run from the command prompt, and displays updates on stdout.

Options

--version

show program’s version number and exit

-h, --help

show this help message and exit

-v, --verbose

produce verbose output

-q, --quiet

output only matches and errors

Commands

add - add definition to config file

-f <file>, --config <file>

configuration file to read

-s <site>, --site <site>

site helper to use

-u <url>, --url <url>

site url to check

-t <type>, --match-type <type>

pre-defined regular expression to use

-m <regex>, --match <regex>

regular expression to use with --match-type=re

-q <frequency>, --frequency <frequency>

update check frequency

-x <selector>, --select <selector>

content selector

--selector <type>

selector method to use

check - check sites for updates

-f <file>, --config <file>

configuration file to read

-d <file>, --database <file>

database to store page data to. Default based on --config value, for example --config my_conf will result in a default setting of --database my_conf.db.

See Database for details of the database format.

-c <dir>, --cache <dir>

directory to store page cache

This can, and in fact should be, shared between all cupage uses.

--no-write

don’t update cache or database

--force

ignore frequency checks

-t <n>, --timeout=<n>

timeout for network operations

list - list definitions from config file

-f <file>, --config <file>

configuration file to read

-m <regex>, --match <regex>

match sites using regular expression

list-sites - list supported site values

remove - remove site from config

-f <file>, --config <file>

configuration file to read
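The command layout above, with global options followed by a command that takes its own options, maps naturally onto argparse subcommands. The sketch below models only the global flags and the check and list-sites commands as documented; it is an illustration of the layout, not cupage's actual parser, and the function name build_parser is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring part of the documented interface (illustrative only)."""
    parser = argparse.ArgumentParser(prog="cupage")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="produce verbose output")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="output only matches and errors")

    commands = parser.add_subparsers(dest="command")

    check = commands.add_parser("check", help="check sites for updates")
    check.add_argument("-f", "--config", metavar="<file>",
                       help="configuration file to read")
    check.add_argument("-d", "--database", metavar="<file>",
                       help="database to store page data to")
    check.add_argument("-c", "--cache", metavar="<dir>",
                       help="directory to store page cache")
    check.add_argument("--no-write", action="store_true",
                       help="don't update cache or database")
    check.add_argument("--force", action="store_true",
                       help="ignore frequency checks")
    check.add_argument("-t", "--timeout", type=int, metavar="<n>",
                       help="timeout for network operations")

    commands.add_parser("list-sites", help="list supported site values")
    return parser


args = build_parser().parse_args(["-v", "check", "--force", "--timeout=30"])
print(args.command, args.force, args.timeout)
```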

cupage.py

Check for updates on web pages

Author: James Rowe <jnrowe@gmail.com>
Date: 2010-01-23
Copyright: GPL v3
Manual section: 1
Manual group: Networking

SYNOPSIS

cupage.py [option]... <command>

DESCRIPTION

cupage checks web pages and displays changes from the last run that match given criteria. Its original purpose was to check web pages for new software releases, but it is easily configurable and can be used for other purposes.

OPTIONS

--version      show program’s version number and exit
-h, --help     show this help message and exit
-v, --verbose  produce verbose output
-q, --quiet    output only matches and errors

COMMANDS

add

Add definition to config file

-f <file>, --config <file>
 configuration file to read
-s <site>, --site <site>
 site helper to use
-u <url>, --url <url>
 site url to check
-t <type>, --match-type <type>
 pre-defined regular expression to use
-m <regex>, --match <regex>
 regular expression to use with --match-type=re
-q <frequency>, --frequency <frequency>
 update check frequency
-x <selector>, --select <selector>
 content selector
--selector <type>
 selector method to use

check

Check sites for updates

-f <file>, --config <file>
 configuration file to read
-d <file>, --database <file>
 database to store page data to. Default based on the --config value, for example --config my_conf will result in a default setting of --database my_conf.db.

 See Database for details of the database format.
-c <dir>, --cache <dir>
 directory to store page cache

 This can, and in fact should be, shared between all cupage uses.
--no-write
 don’t update cache or database
--force
 ignore frequency checks
-t <n>, --timeout=<n>
 timeout for network operations

list

List definitions from config file

-f <file>, --config <file>
 configuration file to read
-m <regex>, --match <regex>
 match sites using regular expression

list-sites

List supported site values

remove

Remove site from config

-f <file>, --config <file>
 configuration file to read

CONFIGURATION FILE

The configuration file, by default ~/.cupage.conf, is a simple INI format file, with sections defining sites to check. For example:

[spill]
url = http://www.rpcurnow.force9.co.uk/spill/index.html
select = p a
[rails]
site = vim-script
script = 1567

With the above configuration file the site named spill will be checked at http://www.rpcurnow.force9.co.uk/spill/index.html, and elements matching the CSS selector p a will be scanned for tarballs. The site named rails will be checked using the vim-script site matcher, which requires only a script value to check for updates in the scripts section of http://www.vim.org.

Various site matchers are available; see the output of cupage.py list-sites.

BUGS

None known.

AUTHOR

Written by James Rowe

RESOURCES

Home page: http://github.com/JNRowe/cupage

COPYING

Copyright © 2009-2013 James Rowe.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Frequently Asked Questions

Ask them, and perhaps they’ll become frequent enough to be added here ;)

API documentation

Note

The documentation in this section is aimed at people wishing to contribute to cupage, and can be skipped if you are simply using the tool from the command line.

Site

cupage.SITES = {}
Site specific configuration data
cupage.USER_AGENT = 'cupage/0.8.2 +https://github.com/JNRowe/cupage/'

User agent to use for HTTP requests

class cupage.Site(name, url, match_func='default', options=None, frequency=None, robots=True, checked=None, matches=None)

Simple object for representing a web site.

check(cache=None, timeout=None, force=False, no_write=False)

Check site for updates.

Parameters:
  • cache (str) – httplib2.Http cache location
  • timeout (int) – Timeout value for httplib2.Http
  • force (bool) – Ignore configured check frequency
  • no_write (bool) – Do not write to cache, useful for testing
find_default_matches(content, charset)

Extract matches from content.

Parameters:
  • content (str) – Content to search
  • charset (str) – Character set for content
find_github_matches(content, charset)

Extract matches from GitHub content.

Parameters:
  • content (str) – Content to search
  • charset (str) – Character set for content
find_hackage_matches(content, charset)

Extract matches from hackage content.

Parameters:
  • content (str) – Content to search
  • charset (str) – Character set for content
find_sourceforge_matches(content, charset)

Extract matches from sourceforge content.

Parameters:
  • content (str) – Content to search
  • charset (str) – Character set for content
static package_re(name, ext, verbose=False)

Generate a compiled re for the package.

Parameters:
  • name (str) – File name to check for
  • ext (str) – File extension to check
  • verbose (bool) – Whether to enable re.VERBOSE
static parse(name, options, data)

Parse data generated by Sites.loader.

Parameters:
  • name (str) – Site name from config file
  • options (dict) – Site options from config file
  • data – Stored data from database file
state

Return Site state for database storage.

class cupage.Sites

Site bundle wrapper.

load(config_file, database=None)

Read sites from a user’s config file and database.

Parameters:
  • config_file (str) – Config file to read
  • database (str) – Database file to read
save(database)

Save Sites to the user’s database.

Parameters: database (str) – Database file to write

Examples

Reading stored configuration
>>> from cupage import Sites
>>> sites = Sites()
>>> sites.load('support/cupage.conf', 'support/cupage.db')
>>> sites[0].frequency
360000
Writing updates
>>> sites.save('support/cupage.db')

Command line

cupage.cmdline.USAGE = '%(prog)s checks web pages and displays changes from the last run that match\na given criteria. Its original purpose was to check web pages for new software\nreleases, but it is easily configurable and can be used for other purposes.'

Command line help string, for use with argparse

cupage.cmdline.main()

Main script handler.

cupage.cmdline.add(verbose, config, site, url, match_type, match, frequency, select, selector, name)
cupage.cmdline.check(verbose, config, database, cache, no_write, force, timeout, pages)
cupage.cmdline.list_conf(verbose, config, database, match, pages)
cupage.cmdline.list_sites(verbose)
cupage.cmdline.remove(verbose, config, pages)

Examples

Parse command line options
>>> options, args = process_command_line()

Utilities

class cupage.utils.CupageEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, encoding='utf-8', default=None)

Custom JSON encoding for supporting datetime objects.

default(obj)

Handle datetime objects when encoding as JSON.

This simply falls through to default() if obj has no isoformat method.

Parameters: obj – Object to encode
cupage.utils.json_to_datetime(obj)

Parse checked datetimes from cupage databases.

See: json.JSONDecoder
Parameters: obj – Object to decode
cupage.utils.parse_timedelta(delta)

Parse human readable frequency.

Parameters: delta (str) – Frequency to parse
cupage.utils.sort_packages(packages)

Order package list according to version number.

Parameters: packages (list) – Packages to sort
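Version ordering differs from plain string ordering, which is why a dedicated sort is useful. The sketch below shows one way such an ordering could work, by comparing the numeric components of the first version-like run in each name; the helper version_key is hypothetical and is not the implementation of sort_packages.

```python
import re

# First run of dot-separated numbers that follows a hyphen, e.g. "0.17.1".
VERSION_RE = re.compile(r"-(\d+(?:\.\d+)*)")


def version_key(package):
    """Return the version as a tuple of integers, or () if none is found."""
    match = VERSION_RE.search(package)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(1).split("."))


packages = [
    "geany-plugins-0.17.1.tar.gz",
    "geany-plugins-0.17.tar.gz",
    "geany-plugins-0.18.tar.gz",
]
ordered = sorted(packages, key=version_key)
print(ordered)
```

A plain string sort would place 0.17.1 before 0.17, as the example database shows; comparing integer tuples puts 0.17 first.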
cupage.utils.robots_test(http, url, name, user_agent='*')

Check whether a given URL is blocked by robots.txt.

Parameters:
  • http (httplib2.Http) – Object to use for requests
  • url (str) – URL to check
  • name – Site name being checked
  • user_agent (str) – User agent to check in robots.txt
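The kind of check robots_test performs can be sketched with the standard library's urllib.robotparser. The example below parses an inlined robots.txt rather than fetching one, so it runs without network access; the helper is_blocked, the sample rules and the example.com URLs are all illustrative, and the real function's behaviour may differ.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content, inlined so no network request is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_blocked(robots_txt, url, user_agent="*"):
    """Return True if robots_txt forbids user_agent from fetching url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


print(is_blocked(ROBOTS_TXT, "http://example.com/private/index.html"))
print(is_blocked(ROBOTS_TXT, "http://example.com/downloads/"))
```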

The following three functions are defined for purely cosmetic reasons, as they make the calling points easier to read.

cupage.utils.success(text)

Format a success message with colour, if possible.

Parameters: text (str) – Text to format
cupage.utils.fail(text)

Format a failure message with colour, if possible.

Parameters: text (str) – Text to format
cupage.utils.warn(text)

Format a warning message with colour, if possible.

Parameters: text (str) – Text to format

Examples

Output formatting
>>> from cupage.utils import fail, success
>>> success('well done!')
u'\x1b[38;5;10mwell done!\x1b[m\x1b(B'
>>> fail('unlucky!')
u'\x1b[38;5;9munlucky!\x1b[m\x1b(B'
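The behaviour described for CupageEncoder.default, encoding objects that have an isoformat method and falling through otherwise, can be sketched with a small json.JSONEncoder subclass. The class name DateTimeEncoder is hypothetical, and this is an illustration of the technique rather than the project's code.

```python
import json
from datetime import datetime


class DateTimeEncoder(json.JSONEncoder):
    """Encode datetime-like objects as ISO 8601 strings (illustrative sketch)."""

    def default(self, obj):
        # Objects with an isoformat method (datetime, date, time) become strings.
        if hasattr(obj, "isoformat"):
            return obj.isoformat()
        # Anything else falls through to the base class, which raises TypeError.
        return super().default(obj)


encoded = json.dumps({"checked": datetime(2009, 10, 27, 21, 6, 32)}, cls=DateTimeEncoder)
print(encoded)
```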

Release HOWTO

Test

In the general case tests can be run via nose2:

$ nose2 -vv tests

When preparing a release it is important to check that cupage works with all currently supported Python versions, and that the documentation is correct.

Prepare release

With the tests passing, perform the following steps:

  • Update the version data in cupage/_version.py
  • Update NEWS.rst, if there are any user visible changes
  • Commit the release notes and version changes
  • Create a signed tag for the release
  • Push the changes, including the new tag, to the GitHub repository

Update PyPI

Create and upload the new release tarballs to PyPI:

$ ./setup.py sdist --formats=bztar,gztar register upload --sign

Fetch the uploaded tarballs, and check for errors.

You should also perform test installations from PyPI, to check the experience cupage users will have.

Appendix

class httplib2.Http

Instance of Http from httplib2
