Google Summer of Code 2017 - Final report
Project: Plugin architecture for ScanCode
Yash D. Saraf yashdsaraf@gmail.com
This project’s purpose was to create a decoupled plugin architecture for ScanCode such that it can handle plugins at different stages of a scan and can be coupled at runtime. These stages were,
1. Format :
In this stage, the plugins are supposed to run after the scanning is done and post-scan
plugins are called. These plugins could be used for:
converting the scanned output to the given format (say csv, json, etc.)
HOWTO
Here, a plugin needs to add an entry in the scancode_output_writers
entry point in the following
format : '<format> = <module>:<function>'
.
<format>
is the format name which will be used as the command line option name (e.gcsv
orjson
).<module>
is a python module which implements theoutput
hook specification.<function>
is the function to which the scan output will be passed if this plugin is called.
The <format>
name will be automatically added to the --format
command line option and
(if called) the scanned data will be passed to the plugin.
2. Post-scan :
In this stage, the plugins are supposed to run after the scanning is done. Some uses for these plugins were:
summarization of scan outputs
e.g A post-scan plugin for marking
is_source
to true for directories with ~90% of source files.simplification of scan outputs
e.g The
--only-findings
option to return files or directories with findings for the requested scans. Files and directories without findings are omitted (not considering basic file information as findings)).
This option already existed, I just ported it to a post-scan plugin.
HOWTO
Here, a plugin needs to add an entry in the scancode_post_scan
entry point in the following
format '<name> = <module>:<function>'
<name>
is the command line option name (e.g only-findings).<module>
is a python module which implements thepost_scan
hook specification.<function>
is the function to which the scanned files will be passed if this plugin is called
The command line option for this plugin will be automatically created using the <function>
‘s
doctring as its help text and (if called) the scanned files will be passed to the plugin.
3. Pre-scan :
In this stage, the plugins are supposed to run before the scan starts. So the potential uses for these types of plugins were to:
ignore files based on a given pattern (glob)
ignore files based on their info i.e size, type etc.
extract archives before scanning
HOWTO
Here, a plugin needs to add an entry in the scancode_pre_scan
entry point in the following
format : '<name> = <module>:<class>'
<name>
is the command line option name (e.g ignore ).<module>
is a python module which implements thepre_scan
hook specification.<class>
is the class which is instantiated and its appropriate method is invoked if this plugin is called. This needs to extend theplugincode.pre_scan.PreScanPlugin
class.
The command line option for this plugin will be automatically created using the <class>
‘s
doctring as its help text. Since there isn’t a single spot where pre-scan
plugins can be
plugged in, more methods to PreScanPlugin
class can be added which can represent different
hooks, say to add or delete a scan there might be a method called process_scan
.
If a plugin’s option is passed by the user, then the <class>
is instantiated with the user
input and its appropriate aforementioned methods are called.
4. Scan (proper):
In this stage, the plugins are supposed to run before the scan starts and after the
pre-scan
plugins are called. These plugins would have been used for
adding or deleting scans
adding dependency scans (whose data could be used in other scans)
No development has been done for this stage, but it will be quite similar to pre-scan
.
5. Other work:
Here, the goal was to add command line options to pre-defined groups such that they are displayed
in their respective groups when scancode -h
or scancode --help
is called. This helped to
better visually represent the command line options and determine more easily what context they
belong to.
Add a Resource class to hold all scanned info
* Ongoing
*
Here, the goal was to create a Resource
class, such that it holds all the scanned data for a
resource (i.e a file or a directory). This class would go on to eventually encapsulate the caching
logic entirely. For now, it just holds the info
and path
of a resource.
6. What’s left?
Pre-scan plugin for archive extractions
Scan (proper) plugins
More complex post-scan plugins
Support plugins written in languages other than python
Additionally, all my commits can be found here.