Add A Post-Scan Plugin

Scan plugins in scancode-toolkit

Many scancode features are built-in plugins that ship with the scancode-toolkit source code. They are usually enabled via scancode-toolkit CLI options and are grouped by plugin type.

Here are the major types of plugins:

  1. Pre-scan plugins (scancode_pre_scan in entry points)

    These plugins run before the main scanning steps. They usually filter input files or classify files, and the main scan plugins depend on their results. The base plugin class to be extended is PreScanPlugin at /src/plugincode/pre_scan.py.

  2. Scan plugins (scancode_scan in entry points)

    These are the scancode plugins that scan files for useful information such as licenses, copyrights, packages and others. They run with multiprocessing for speed because they work on a per-file basis, but they can also have post-processing steps that run afterwards and have access to all the per-file scan results. The base plugin class to be extended is ScanPlugin at /src/plugincode/scan.py.

  3. Post-scan plugins (scancode_post_scan in entry points)

    These are mainly data processing, summarizing and reporting plugins that depend on all the results of the scan plugins. They add new codebase-level or file-level attributes, and may also remove or modify data as required for consolidation or summarization. The base plugin class to be extended is PostScanPlugin at /src/plugincode/post_scan.py.

  4. Output plugins (scancode_output in entry points)

    All supported output options in scancode-toolkit are plugins, and multiple output options can be selected at once. These plugins convert, process and write the scan data in a specific file format as the output of the scanning procedures. The base plugin class to be extended is OutputPlugin at /src/plugincode/output.py.

  5. Output Filter Plugins (scancode_output_filter in entry points)

    There are also output filter plugins, which apply filters that modify the outputs. These filters can be based on whether resources had any detections, on ignorables present in licenses, and on other criteria. The base plugin class to be extended is OutputFilterPlugin at /src/plugincode/output_filter.py.

  6. Location Provider Plugins

    These plugins provide the locations of pre-built binary libraries and utilities that are packaged for use in scancode-toolkit. The base plugin class to be extended is LocationProviderPlugin at /src/plugincode/location_provider.py.
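The entry-point group names listed above can be inspected from Python. The following is a minimal sketch, assuming Python 3.8 or later and using only the standard library importlib.metadata module. The helper name list_plugins is hypothetical, and the result is simply an empty list for any group that has no installed plugins. No group is included for location provider plugins because this page does not name one.

```python
from importlib.metadata import entry_points

# Entry-point group names as listed in the plugin types above.
PLUGIN_GROUPS = [
    "scancode_pre_scan",
    "scancode_scan",
    "scancode_post_scan",
    "scancode_output",
    "scancode_output_filter",
]


def list_plugins(group):
    """Return the sorted names of installed entry points in ``group``.

    ``list_plugins`` is a hypothetical helper, not part of scancode-toolkit.
    """
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Python 3.10+ exposes a selectable interface.
        selected = all_entry_points.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = all_entry_points.get(group, [])
    return sorted(entry_point.name for entry_point in selected)


if __name__ == "__main__":
    for group in PLUGIN_GROUPS:
        print(group, list_plugins(group))
```

Run in an environment where scancode-toolkit is installed, this should print the plugin names registered under each group; elsewhere it prints empty lists.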

Built-In vs. Optional Installation

Built-In

Some post-scan plugins are installed when ScanCode itself is installed, and they are specified at [options.entry_points] in the setup.cfg file. For example, the License Policy Plugin is a built-in plugin, whose code is located here:

https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/plugin_license_policy.py

These plugins do not require any additional installation steps and can be used as soon as ScanCode is up and running.
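To make the [options.entry_points] reference more concrete, here is a minimal sketch that reads such a section with the standard library configparser module. The SAMPLE_CFG text is an illustrative fragment written for this example, not a copy of the real setup.cfg; the entry name and class name in it are assumptions and should be checked against the actual file. The helper name read_entry_points is hypothetical.

```python
import configparser

# Illustrative fragment only (an assumption); check the real setup.cfg
# in scancode-toolkit for the actual entries.
SAMPLE_CFG = """
[options.entry_points]
scancode_post_scan =
    license-policy = licensedcode.plugin_license_policy:LicensePolicy
"""


def read_entry_points(cfg_text, group):
    """Return a {name: target} dict for ``group`` found in ``cfg_text``."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw_value = parser["options.entry_points"].get(group, fallback="")
    entries = {}
    for line in raw_value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each non-empty line has the form "name = module.path:ClassName".
        name, _, target = line.partition("=")
        entries[name.strip()] = target.strip()
    return entries


if __name__ == "__main__":
    print(read_entry_points(SAMPLE_CFG, "scancode_post_scan"))
```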

Optional

ScanCode is also designed to use post-scan plugins that are installed separately from ScanCode itself. The code for this sort of plugin is located here:

https://github.com/nexB/scancode-plugins

This wiki page will focus on optional post-scan plugins.

Example Post-Scan Plugin: Hello ScanCode

To illustrate the creation of a simple post-scan plugin, we’ll create a hypothetical plugin named Hello ScanCode, which will print Hello ScanCode! in your terminal after you’ve run a scan. Your command will look something like this:

scancode -i -n 2 <path to target codebase> --hello --json <path to JSON output file>

We’ll start by creating three folders:

  1. Top-level folder – /scancode-hello/

  2. 2nd-level folder – /src/

  3. 3rd-level folder – /hello_scancode/
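The three nested folders can also be created with a few lines of Python. This is only a sketch using pathlib from the standard library; the helper name create_plugin_skeleton is hypothetical, and the example writes into a temporary directory rather than into a real checkout of the scancode-plugins repository.

```python
import tempfile
from pathlib import Path


def create_plugin_skeleton(parent_dir):
    """Create scancode-hello/src/hello_scancode under ``parent_dir``.

    Returns the path of the innermost (3rd-level) folder.
    """
    package_dir = Path(parent_dir) / "scancode-hello" / "src" / "hello_scancode"
    # parents=True creates the top-level and 2nd-level folders as needed.
    package_dir.mkdir(parents=True, exist_ok=True)
    return package_dir


if __name__ == "__main__":
    # A temporary directory stands in for the real parent folder.
    with tempfile.TemporaryDirectory() as scratch_dir:
        created = create_plugin_skeleton(scratch_dir)
        print(created.relative_to(scratch_dir))
```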

1. Top-level folder – /scancode-hello/

  • In the scancode-plugins repository, in the misc directory, add a folder with a relevant name, e.g., scancode-hello. This folder will hold all of your plugin code.

  • Inside the /scancode-hello/ folder you’ll need to add a folder named src and 7 files. /src/ – This folder will contain your primary Python code and is discussed in more detail in the following section.

The 7 files are:

  1. .gitignore – See, e.g., /scancode-ignore-binaries/.gitignore

/build/
/dist/

  2. apache-2.0.LICENSE – See, e.g., /scancode-ignore-binaries/apache-2.0.LICENSE

  3. MANIFEST.in

graft src

include setup.py
include setup.cfg
include .gitignore
include README.md
include MANIFEST.in
include NOTICE
include apache-2.0.LICENSE

global-exclude *.py[co] __pycache__ *.*~

  4. NOTICE – See, e.g., /scancode-ignore-binaries/NOTICE

  5. README.md

  6. setup.cfg

[metadata]
license_file = NOTICE

[bdist_wheel]
universal = 1

[aliases]
release = clean --all  bdist_wheel

  7. setup.py – This is an example of what our setup.py file would look like:

#!/usr/bin/env python
# -*- encoding: utf-8 -*-

from __future__ import absolute_import
from __future__ import print_function

from glob import glob
from os.path import basename
from os.path import join
from os.path import splitext

from setuptools import find_packages
from setuptools import setup


desc = '''A ScanCode post-scan plugin to illustrate the creation of a simple post-scan plugin.'''

setup(
    name='scancode-hello',
    version='1.0.0',
    license='Apache-2.0 with ScanCode acknowledgment',
    description=desc,
    long_description=desc,
    author='nexB',
    author_email='info@aboutcode.org',
    url='https://github.com/nexB/scancode-plugins/blob/main/misc/scancode-hello/',
    packages=find_packages('src'),
    package_dir={'': 'src'},
    py_modules=[splitext(basename(path))[0] for path in glob('src/*.py')],
    include_package_data=True,
    zip_safe=False,
    classifiers=[
        # complete classifier list: http://pypi.python.org/pypi?%3Aaction=list_classifiers
        'Development Status :: 4 - Beta',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: Apache Software License',
        'Programming Language :: Python',
        'Programming Language :: Python :: 3',
        'Topic :: Utilities',
    ],
    keywords=[
        'scancode', 'plugin', 'post-scan'
    ],
    install_requires=[
        'scancode-toolkit',
    ],
    entry_points={
        'scancode_post_scan': [
            'hello = hello_scancode.hello_scancode:SayHello',
        ],
    }
)

2. 2nd-level folder – /src/

  1. Add an __init__.py file inside the src folder. This file can be empty, and is used to indicate that the folder should be treated as a Python package directory.

  2. Add a folder that will contain our primary code – we’ll name the folder hello_scancode. If you look at the example of the setup.py file above, you’ll see this line in the entry_points section:

'hello = hello_scancode.hello_scancode:SayHello',

  • hello is the name under which the plugin is registered; in this example it matches the --hello command flag.

  • The first hello_scancode is the name of the folder we just created.

  • The second hello_scancode is the name of the .py file containing our code (discussed in the next section).

  • SayHello is the name of the PostScanPlugin class we create in that file (see sample code below).
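The four parts of the entry-point line described above can be separated programmatically. The following is a small sketch in plain Python; parse_entry_point is a hypothetical helper written for this illustration and is not part of setuptools or scancode-toolkit.

```python
def parse_entry_point(spec):
    """Split 'name = package.module:ClassName' into its named parts."""
    name, _, target = spec.partition("=")
    module_path, _, class_name = target.strip().partition(":")
    # The last dotted component is the .py file; the rest is the folder.
    package, _, module = module_path.rpartition(".")
    return {
        "name": name.strip(),
        "package": package,
        "module": module,
        "class_name": class_name,
    }


if __name__ == "__main__":
    parts = parse_entry_point("hello = hello_scancode.hello_scancode:SayHello")
    for key, value in parts.items():
        print(f"{key}: {value}")
```

Here package corresponds to the hello_scancode folder, module to the hello_scancode.py file, and class_name to the plugin class.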

3. 3rd-level folder – /hello_scancode/

  1. Add an __init__.py file inside the hello_scancode folder. As noted above, this file can be empty.

  2. Add a hello_scancode.py file.

Imports

from plugincode.post_scan import PostScanPlugin
from plugincode.post_scan import post_scan_impl
from scancode import CommandLineOption
from scancode import POST_SCAN_GROUP

Create a PostScanPlugin class

The PostScanPlugin class (see PostScanPlugin code) inherits from the CodebasePlugin class (see CodebasePlugin code), which inherits from the BasePlugin class (see BasePlugin code).

@post_scan_impl
class SayHello(PostScanPlugin):
    """
    Illustrate a simple "Hello World" post-scan plugin.
    """

    options = [
        CommandLineOption(
            ('--hello',),
            is_flag=True,
            default=False,
            help='Generate a simple "Hello ScanCode" greeting in the terminal.',
            help_group=POST_SCAN_GROUP,
        )
    ]

    def is_enabled(self, hello, **kwargs):
        return hello

    def process_codebase(self, codebase, hello, **kwargs):
        """
        Say hello.
        """
        if not self.is_enabled(hello):
            return

        print('Hello ScanCode!')
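To see how the is_enabled check gates process_codebase without installing ScanCode, the control flow can be mimicked with a stand-in class. This is only a sketch: SayHelloSketch does not inherit from the real PostScanPlugin, run_plugin is a hypothetical driver, and passing the command-line options as keyword arguments is an assumption made to match the method signatures shown above.

```python
import contextlib
import io


class SayHelloSketch:
    """Stand-in with the same two methods; NOT the real PostScanPlugin."""

    def is_enabled(self, hello, **kwargs):
        return hello

    def process_codebase(self, codebase, hello, **kwargs):
        # Do nothing unless the (simulated) --hello flag was given.
        if not self.is_enabled(hello):
            return
        print('Hello ScanCode!')


def run_plugin(plugin, **cli_options):
    """Call the plugin with simulated CLI options and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # No real codebase object is needed for this sketch.
        plugin.process_codebase(codebase=None, **cli_options)
    return buffer.getvalue()


if __name__ == "__main__":
    print(repr(run_plugin(SayHelloSketch(), hello=True)))
    print(repr(run_plugin(SayHelloSketch(), hello=False)))
```

Options that the plugin does not name explicitly are absorbed by **kwargs, which is why both methods accept it.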

Load the plugin

  • To load and use the plugin in the normal course, navigate to the plugin’s root folder (in this example: /plugins/scancode-hello/) and run pip install . (don’t forget the final .).

  • If you’re developing and want to test your work, save your edits and run pip install -e . from the same folder.

More-complex examples

This Hello ScanCode example is quite simple. For more-complex structures and functionality, take a look at the other post-scan plugins for guidance and ideas.

One good example is the License Policy post-scan plugin. This plugin is installed when ScanCode is installed and consequently is not located in the /plugins/ directory used for manually-installed post-scan plugins. Its code can be found at /scancode-toolkit/src/licensedcode/plugin_license_policy.py. It illustrates how a plugin can analyze the results of a ScanCode scan using external data files and add the results of that analysis as a new field in the ScanCode JSON output file.
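The general idea of annotating scan results from external data can be sketched in a few lines. This is not the License Policy plugin's code: the POLICIES mapping, the license_keys and policy_labels field names, and the apply_policy helper are all hypothetical, chosen only to illustrate the pattern of reading per-file results and adding a new field.

```python
import copy

# Hypothetical policy data; a real plugin would load this from an external file.
POLICIES = {
    "mit": "Approved License",
    "gpl-3.0": "Restricted License",
}


def apply_policy(scan_results, policies):
    """Return a copy of ``scan_results`` with a policy field on each file."""
    annotated = copy.deepcopy(scan_results)
    for file_entry in annotated["files"]:
        # "license_keys" is an assumed field name for this sketch.
        keys = file_entry.get("license_keys", [])
        labels = {policies[key] for key in keys if key in policies}
        # Add the analysis result as a new field on the file entry.
        file_entry["policy_labels"] = sorted(labels)
    return annotated


if __name__ == "__main__":
    results = {
        "files": [
            {"path": "a.py", "license_keys": ["mit"]},
            {"path": "b.c", "license_keys": ["gpl-3.0", "unknown"]},
            {"path": "README", "license_keys": []},
        ]
    }
    for entry in apply_policy(results, POLICIES)["files"]:
        print(entry["path"], entry["policy_labels"])
```

Working on a deep copy keeps the original results untouched, which makes the sketch easy to test; a real post-scan plugin would instead update the codebase it is given.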