Option lists are two-column lists of command-line options and descriptions,
documenting a program’s options. For example:
-c, --copyright
Scan <input> for copyrights.
Sub-Options:
--consolidate
-l, --license
Scan <input> for licenses.
Sub-Options:
--license-references
--license-text
--license-text-diagnostics
--license-diagnostics
--license-url-template TEXT
--license-score INT
--license-clarity-score
--consolidate
--unknown-licenses
-p, --package
Scan <input> for packages.
Sub-Options:
--consolidate
--system-package
Scan <input> for installed system package
databases.
--package-only
Scan <input> for system and application packages,
only for package metadata, without license/copyright
detection and package assembly.
-e, --email
Scan <input> for emails.
Sub-Options:
--max-email INT
-u, --url
Scan <input> for urls.
Sub-Options:
--max-url INT
-i, --info
Scan for and include information such as:
Size,
Type,
Date,
Programming language,
sha1 and md5 hashes,
binary/text/archive/media/source/script flags
Additional options through more CLI options
Sub-Options:
--mark-source
Note
Unlike previous 2.x versions, -c, -l, and -p are not enabled by default. If any combination of
these options is used, ScanCode performs only those specific tasks, and not the others.
For example, scancode -l scans only for licenses, and doesn’t scan for copyrights, packages,
general information, emails, or urls. The only notable exception: a --package scan also
includes license information for package manifests and top-level packages, which is derived
regardless of whether the --license option is used.
Note
These options, i.e. -c, -l, -p, -e, -u, and -i, can be combined into a single flag. For example,
instead of scancode -c -i -p, you can write scancode -cip and the result will be the same.
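As a minimal sketch of how the short scan options combine, the Python helper below assembles a scancode invocation as an argument list. The helper name build_scancode_command and the paths are hypothetical. The --json-pp output option exists in ScanCode but is not described in this section, so treat its use here as an assumption and check scancode --help for your version.

```python
import shlex

# Short scan options documented above (-c, -l, -p, -e, -u, -i).
SHORT_SCAN_OPTIONS = set("clpeui")


def build_scancode_command(short_options, input_path, output_path):
    """Return the argument list for a scancode run (hypothetical helper).

    short_options is a string such as "cip"; the letters are merged into
    one combined flag ("-cip"), which is equivalent to "-c -i -p".
    """
    if not short_options:
        raise ValueError("this helper expects at least one option letter")
    unknown = set(short_options) - SHORT_SCAN_OPTIONS
    if unknown:
        raise ValueError(f"unsupported short options: {sorted(unknown)}")
    # Assumption: pretty-printed JSON output via --json-pp.
    return ["scancode", f"-{short_options}", "--json-pp", output_path, input_path]


command = build_scancode_command("cip", "samples/", "results.json")
print(shlex.join(command))
```

Passing the resulting list to subprocess.run(command, check=True) would launch the scan, provided ScanCode is installed and available on the PATH.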
--generated
Classify automatically generated code files with a flag.
--max-email INT
Report only up to INT emails found in a
file. Use 0 for no limit. [Default: 50]
Sub-Option of: --email
--max-url INT
Report only up to INT urls found in a
file. Use 0 for no limit. [Default: 50]
Sub-Option of: --url
--license-score INTEGER
Do not return license matches with scores lower than this score.
A number between 0 and 100. [Default: 0]
Here, a bigger number means a better match, i.e., setting a higher license score
translates to a higher threshold (with an equal or smaller number of matches).
Sub-Option of: --license
--license-text
Include the matched text for the detected licenses in the output report.
Sub-Option of: --license
Sub-Options:
--license-text-diagnostics
--license-url-template TEXT
Set the template URL used for the license reference URLs.
The --copyright option detects copyright statements in files.
It adds the following resource-level attributes:
copyrights: This is a list of data mappings, each with the following attributes: copyright
containing the whole copyright value, with start_line and end_line containing
the line numbers in the file where this copyright value was detected.
holders: This is a list of data mappings, each with the following attributes: holder
containing the whole copyright holder value, with start_line and end_line
containing the line numbers in the file where this holder value was detected.
authors: This is a list of data mappings, each with the following attributes: author
containing the whole author value, with start_line and end_line
containing the line numbers in the file where this author value was detected.
Example:
#
# Copyright (c) 2010 Patrick McHardy All rights reserved.
# Authors: Patrick McHardy <kaber@trash.net>
The above lines, when scanned for copyrights, generate the following results for the discussed attributes:
{
  "copyrights": [
    {
      "copyright": "Copyright (c) 2010 Patrick McHardy",
      "start_line": 2,
      "end_line": 2
    }
  ],
  "holders": [
    {
      "holder": "Patrick McHardy",
      "start_line": 2,
      "end_line": 2
    }
  ],
  "authors": [
    {
      "author": "Patrick McHardy <kaber@trash.net>",
      "start_line": 3,
      "end_line": 3
    }
  ]
}
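The attributes described above can be read back from a scan result with a few lines of Python. The following is a minimal sketch that assumes the result fragment shown in the example; the function name summarize_copyrights is hypothetical and not part of ScanCode.

```python
import json

# Fragment of a scan result, using the attribute names described above.
scan_fragment = """
{
  "copyrights": [
    {"copyright": "Copyright (c) 2010 Patrick McHardy", "start_line": 2, "end_line": 2}
  ],
  "holders": [
    {"holder": "Patrick McHardy", "start_line": 2, "end_line": 2}
  ],
  "authors": [
    {"author": "Patrick McHardy <kaber@trash.net>", "start_line": 3, "end_line": 3}
  ]
}
"""

# Each list holds mappings keyed by the singular form of the list name.
ATTRIBUTES = (("copyrights", "copyright"), ("holders", "holder"), ("authors", "author"))


def summarize_copyrights(resource):
    """Return (kind, value, start_line, end_line) tuples for one resource."""
    rows = []
    for list_name, key in ATTRIBUTES:
        for entry in resource.get(list_name, []):
            rows.append((key, entry[key], entry["start_line"], entry["end_line"]))
    return rows


rows = summarize_copyrights(json.loads(scan_fragment))
for kind, value, start, end in rows:
    print(f"{kind:<9} lines {start}-{end}: {value}")
```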
The --license option detects various kinds of license texts, notices, tags, references
and other specialized license declarations like the SPDX license identifier in files.
It adds the following attributes to the file data:
license_detections: This is a mapping of license detection data with the license
expression, detection log and license matches. Each license match contains the
license expression for the match, the score, more details about the detected license
and the detected rule, and optionally the matched text.
license_clues: This is a list of license matches, same as matches in
license_detections. These are mere license clues and not perfect detections.
detected_license_expression: This is a scancode license expression string.
detected_license_expression_spdx: This is the SPDX version of
detected_license_expression.
percentage_of_license_text: This has a percentage number which denotes what percentage
of the resource scanned has legalese words.
Example:
License: Apache-2.0
If we run license detection (with --license-text) on the above text we get the following
result for the resource attributes added by the license detection:
The --package option detects various package manifests, lockfiles and package-like
data, and then assembles codebase-level packages and dependencies from the
package data detected in files. It also tags files that are part of these packages.
It adds the following attributes to the file data:
package_data: This is a mapping of package data parsed and retrieved from
the file, with the fields for the package URL, license detections, copyrights,
dependencies, and the various URLs.
for_packages: This is a list of strings pointing to the packages that the
file is a part of. Each string is a package URL with a UUID as a qualifier.
It adds the following attributes to the top level of the results:
packages: This is a mapping of package data with all the attributes
present in the file-level package_data, plus the following extra attributes:
package_uid, datafile_paths and datasource_ids.
dependencies: This is a mapping of dependency data from all the lockfiles
or package manifests in the scan.
Example:
The following scan result was generated from scanning a package manifest:
{"dependencies":[{"purl":"pkg:bower/get-size","extracted_requirement":"~1.2.2","scope":"dependencies","is_runtime":true,"is_optional":false,"is_pinned":false,"is_direct":true,"resolved_package":{},"extra_data":{},"dependency_uid":"pkg:bower/get-size?uuid=fixed-uid-done-for-testing-5642512d1758","for_package_uid":"pkg:bower/blue-leaf?uuid=fixed-uid-done-for-testing-5642512d1758","datafile_path":"bower.json","datasource_id":"bower_json"}],"packages":[{"type":"bower","namespace":null,"name":"blue-leaf","version":null,"qualifiers":{},"subpath":null,"primary_language":null,"description":"Physics-like animations for pretty particles","release_date":null,"parties":[{"type":null,"role":"author","name":"Betty Beta <bbeta@example.com>","email":null,"url":null}],"keywords":["motion","physics","particles"],"homepage_url":null,"download_url":null,"size":null,"sha1":null,"md5":null,"sha256":null,"sha512":null,"bug_tracking_url":null,"code_view_url":null,"vcs_url":null,"copyright":null,"declared_license_expression":"mit","declared_license_expression_spdx":"MIT","license_detections":[{"license_expression":"mit","matches":[{"score":100.0,"start_line":1,"end_line":1,"matched_length":1,"match_coverage":100.0,"matcher":"1-spdx-id","license_expression":"mit","rule_identifier":"spdx-license-identifier: mit","rule_url":null,"rule_relevance":100,"matched_text":"MIT"}],"identifier":"apache_2_0-ec759abc-ea5a-2a38-793e-312340e080c0"}],"other_license_expression":null,"other_license_expression_spdx":null,"other_license_detections":[],"extracted_license_statement":"MIT","notice_text":null,"source_packages":[],"extra_data":{},"repository_homepage_url":null,"repository_download_url":null,"api_data_url":null,"package_uid":"pkg:bower/blue-leaf?uuid=fixed-uid-done-for-testing-5642512d1758","datafile_paths":["bower.json"],"datasource_ids":["bower_json"],"purl":"pkg:bower/blue-leaf"}],"files":[{"path":"bower.json","type":"file","package_data":[{"type":"bower","namespace":null,"name":"blue-leaf","version":null,"qualifiers":{},"subpath":null,"primary_language":null,"description":"Physics-like animations for pretty particles","release_date":null,"parties":[{"type":null,"role":"author","name":"Betty Beta <bbeta@example.com>","email":null,"url":null}],"keywords":["motion","physics","particles"],"homepage_url":null,"download_url":null,"size":null,"sha1":null,"md5":null,"sha256":null,"sha512":null,"bug_tracking_url":null,"code_view_url":null,"vcs_url":null,"copyright":null,"declared_license_expression":"mit","declared_license_expression_spdx":"MIT","license_detections":[{"license_expression":"mit","matches":[{"score":100.0,"start_line":1,"end_line":1,"matched_length":1,"match_coverage":100.0,"matcher":"1-spdx-id","license_expression":"mit","rule_identifier":"spdx-license-identifier: mit","rule_url":null,"rule_relevance":100,"matched_text":"MIT"}],"identifier":"apache_2_0-ec759abc-ea5a-2a38-793e-312340e080c0"}],"other_license_expression":null,"other_license_expression_spdx":null,"other_license_detections":[],"extracted_license_statement":"MIT","notice_text":null,"source_packages":[],"file_references":[],"extra_data":{},"dependencies":[{"purl":"pkg:bower/get-size","extracted_requirement":"~1.2.2","scope":"dependencies","is_runtime":true,"is_optional":false,"is_pinned":false,"is_direct":true,"resolved_package":{},"extra_data":{}}],"repository_homepage_url":null,"repository_download_url":null,"api_data_url":null,"datasource_id":"bower_json","purl":"pkg:bower/blue-leaf"}],"for_packages":["pkg:bower/blue-leaf?uuid=fixed-uid-done-for-testing-5642512d1758"],"scan_errors":[]}]}
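The relationship between the top-level packages and dependencies and the file-level for_packages attribute can be followed through the package_uid values. The sketch below uses a trimmed-down copy of the result above, keeping only fields that appear in it; the function name index_by_package is hypothetical, and this is one possible way to walk the data rather than a ScanCode API.

```python
PACKAGE_UID = "pkg:bower/blue-leaf?uuid=fixed-uid-done-for-testing-5642512d1758"

# Trimmed-down scan result using only fields shown in the example above.
scan_result = {
    "dependencies": [
        {
            "purl": "pkg:bower/get-size",
            "extracted_requirement": "~1.2.2",
            "for_package_uid": PACKAGE_UID,
            "datafile_path": "bower.json",
        }
    ],
    "packages": [
        {
            "purl": "pkg:bower/blue-leaf",
            "declared_license_expression": "mit",
            "package_uid": PACKAGE_UID,
            "datafile_paths": ["bower.json"],
        }
    ],
    "files": [{"path": "bower.json", "for_packages": [PACKAGE_UID]}],
}


def index_by_package(result):
    """Map each package_uid to its purl, license, dependencies and files."""
    index = {
        package["package_uid"]: {
            "purl": package["purl"],
            "license": package["declared_license_expression"],
            "dependencies": [],
            "files": [],
        }
        for package in result.get("packages", [])
    }
    # Top-level dependencies point back to their package via for_package_uid.
    for dependency in result.get("dependencies", []):
        owner = index.get(dependency["for_package_uid"])
        if owner is not None:
            owner["dependencies"].append(dependency["purl"])
    # Files list the package_uid values of the packages they belong to.
    for file_data in result.get("files", []):
        for package_uid in file_data.get("for_packages", []):
            if package_uid in index:
                index[package_uid]["files"].append(file_data["path"])
    return index


package_index = index_by_package(scan_result)
for package_uid, details in package_index.items():
    print(package_uid, details)
```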
The --info option obtains miscellaneous information about the file being
scanned such as mime/filetype, checksums, programming language, and various
boolean flags.
It adds the following attributes to the file data:
date: last modified date of the file.
sha1, md5 and sha256: file checksums of various algorithms.
mime_type and file_type: basic file type and mime type/subtype
information obtained from libmagic.
programming_language: programming language based on extensions.
is_binary, is_text, is_archive, is_media, is_source,
and is_script: various boolean flags with misc. information about the file.
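To make the checksum attributes concrete, the following sketch computes the same three kinds of digest with Python's standard hashlib module. It is an independent illustration, not ScanCode's own code, and the function name file_checksums is hypothetical.

```python
import hashlib


def file_checksums(content):
    """Return sha1, md5 and sha256 hex digests for a bytes payload."""
    return {
        "sha1": hashlib.sha1(content).hexdigest(),
        "md5": hashlib.md5(content).hexdigest(),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


# Digests of an empty file; for a real file, pass the bytes read with
# open(path, "rb").read().
checksums = file_checksums(b"")
for name, digest in checksums.items():
    print(f"{name}: {digest}")
```

Comparing digests computed this way against the sha1, md5 and sha256 values in a scan result is one way to check whether a file has changed since it was scanned.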
The --email option detects and reports email addresses present in scanned files.
It adds the emails attribute to the file data with the following attributes:
email with the actual email that was present in the file, start_line and
end_line to be able to locate where the email was detected in the file.
The --url option detects and reports URLs present in scanned files.
It adds the urls attribute to the file data with the following attributes:
url with the actual URL that was present in the file, start_line and
end_line to be able to locate where the URL was detected in the file.
The option --max-email is a sub-option of and requires the option --email.
If individual scanned files contain a lot of emails (i.e., email lists) that are unnecessary
and clutter the scan results, the --max-email option can be used to report emails only up to
a limit in each file.
Some important INTEGER values of the --max-email INTEGER option:
0 - No limit, all emails found in a file are reported.
50 - Default value.
The option --max-url is a sub-option of and requires the option --url.
If individual scanned files contain a lot of links to other websites (i.e., url lists) that
are unnecessary and clutter the scan results, the --max-url option can be used to report
urls only up to a limit in each file.
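The effect of the --max-email and --max-url limits can be sketched as a simple cap on a per-file list, where 0 means no limit. This is an illustration only: the function name limit_findings is hypothetical, the sample urls are made up, and keeping the first findings in file order is an assumption rather than documented behavior.

```python
def limit_findings(findings, max_count):
    """Keep at most max_count findings from one file; 0 means no limit."""
    if max_count < 0:
        raise ValueError("max_count must be 0 or a positive integer")
    if max_count == 0:
        return list(findings)
    # Assumption: the first findings in file order are the ones kept.
    return list(findings[:max_count])


# Made-up findings shaped like the urls attribute described above.
urls = [
    {"url": f"https://example.com/page/{n}", "start_line": n, "end_line": n}
    for n in range(1, 6)
]

print(len(limit_findings(urls, 2)))   # capped at 2
print(len(limit_findings(urls, 0)))   # no limit, all 5 kept
print(len(limit_findings(urls, 50)))  # limit above the count, all 5 kept
```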
Some important INTEGER values of the --max-url INTEGER option:
0 - No limit, all urls found in a file are reported.
50 - Default value.
The option --license-score is a sub-option of and requires the option --license.
License matching strictness, i.e., how closely a text must match a license to be reported in
a scan, can be modified by using the --license-score option.
Some important INTEGER values of the --license-score INTEGER option:
0 - Default and lowest value; all matches are reported.
100 - Highest value; only matches with a score of 100 are reported.
Here, a bigger number means a better match, i.e., setting a higher license score translates to a
higher threshold for matching licenses (with an equal or smaller number of license matches).
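The threshold behavior can be illustrated with a small filter over match scores. This is a sketch of the documented rule (matches with scores lower than the given score are not returned), not ScanCode's implementation; the function name filter_by_score is hypothetical, and the sample scores and rule identifiers are taken from the --license-diagnostics example in this document.

```python
def filter_by_score(matches, min_score=0):
    """Drop license matches whose score is lower than min_score (0 to 100)."""
    if not 0 <= min_score <= 100:
        raise ValueError("min_score must be a number between 0 and 100")
    return [match for match in matches if match["score"] >= min_score]


matches = [
    {"rule_identifier": "lead-in_unknown_30.RULE", "score": 100.0},
    {"rule_identifier": "spdx_license_id_wtfpl_for_wtfpl-2.0.RULE", "score": 50.0},
    {"rule_identifier": "wtfpl-2.0_27.RULE", "score": 100.0},
    {"rule_identifier": "mit_64.RULE", "score": 100.0},
]

# The default threshold of 0 keeps every match.
print(len(filter_by_score(matches)))
# A threshold of 100 drops the match that scored 50.
print([m["rule_identifier"] for m in filter_by_score(matches, 100)])
```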
Here are the license results when the integer value is set to 100, versus the default value of 0.
This is visualized using ScanCode Workbench in the License Info Dashboard.
An example matched text included in the results is as follows:
"matched_text":"This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software. Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions: 1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required. 2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software. 3. This notice may not be removed or altered from any source distribution. Jean-loup Gailly Mark Adler jloup@gzip.org madler@alumni.caltech.edu"
The file in which this license was detected: samples/arch/zlib.tar.gz-extract/zlib-1.2.8/zlib.h
Running a scan on the samples directory with the --license-text --license-text-diagnostics
options causes the following difference in the scan result of the file
samples/JGroups/licenses/bouncycastle.txt.
Without Diagnostics:
"matched_text":"License Copyright (c) 2000 - 2006 The Legion Of The Bouncy Castle (http://www.bouncycastle.org) Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction
The option --license-diagnostics is a sub-option of and requires the option
--license.
When the --license-diagnostics option is used in a license scan, a detection_log
attribute is added to license detections. It contains diagnostics information
about the license detection post-processing steps that are used to create license
detections from license matches.
Consider the following text:
## License
All code, unless stated otherwise, is dual-licensed under
[`WTFPL`](http://www.wtfpl.net/txt/copying/) and
[`MIT`](https://opensource.org/licenses/MIT).
If we run a license scan with the --license-diagnostics option enabled,
we have the following license detection results:
{"path":"README.md","type":"file","detected_license_expression":"wtfpl-2.0 AND mit","detected_license_expression_spdx":"WTFPL AND MIT","license_detections":[{"license_expression":"wtfpl-2.0 AND mit","matches":[{"score":100.0,"start_line":43,"end_line":43,"matched_length":3,"match_coverage":100.0,"matcher":"2-aho","license_expression":"unknown-license-reference","rule_identifier":"lead-in_unknown_30.RULE","rule_relevance":100,"rule_url":"https://github.com/aboutcode-org/scancode-toolkit/tree/develop/src/licensedcode/data/rules/lead-in_unknown_30.RULE","matched_text":"dual-licensed under [`"},{"score":50.0,"start_line":43,"end_line":43,"matched_length":1,"match_coverage":100.0,"matcher":"2-aho","license_expression":"wtfpl-2.0","rule_identifier":"spdx_license_id_wtfpl_for_wtfpl-2.0.RULE","rule_relevance":50,"rule_url":"https://github.com/aboutcode-org/scancode-toolkit/tree/develop/src/licensedcode/data/rules/spdx_license_id_wtfpl_for_wtfpl-2.0.RULE","matched_text":"WTFPL"},{"score":100.0,"start_line":43,"end_line":43,"matched_length":3,"match_coverage":100.0,"matcher":"2-aho","license_expression":"wtfpl-2.0","rule_identifier":"wtfpl-2.0_27.RULE","rule_relevance":100,"rule_url":"https://github.com/aboutcode-org/scancode-toolkit/tree/develop/src/licensedcode/data/rules/wtfpl-2.0_27.RULE","matched_text":"www.wtfpl.net/"},{"score":100.0,"start_line":43,"end_line":43,"matched_length":6,"match_coverage":100.0,"matcher":"2-aho","license_expression":"mit","rule_identifier":"mit_64.RULE","rule_relevance":100,"rule_url":"https://github.com/aboutcode-org/scancode-toolkit/tree/develop/src/licensedcode/data/rules/mit_64.RULE","matched_text":"MIT`](https://opensource.org/licenses/MIT)."}],"detection_log":["unknown-intro-followed-by-match"],"identifier":"wtfpl_2_0_and_mit-e5642b07-705c-9730-80ab-f5ed0565be28"}],"license_clues":[],"percentage_of_license_text":8.18,"scan_errors":[]}
Here, the added diagnostics information "detection_log":["unknown-intro-followed-by-match"]
tells us that an unknown intro license match was followed by proper detections.
The unknown intro is therefore treated as an introduction to the licenses that
follow it, and the license for the detection is concluded from the license matches
that come after the unknown intro.
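The reasoning behind this detection_log entry can be sketched in a few lines of Python. This is a heavily simplified illustration of the idea, not ScanCode's actual post-processing, which handles many more cases; the function name expression_after_unknown_intro is hypothetical, and joining the remaining expressions with AND is an assumption that happens to reproduce the expression in the example above.

```python
UNKNOWN_INTRO = "unknown-license-reference"


def expression_after_unknown_intro(match_expressions):
    """Drop a leading unknown intro and AND the remaining unique expressions."""
    remaining = list(match_expressions)
    if remaining and remaining[0] == UNKNOWN_INTRO:
        # The intro only announces the licenses that follow, so it is dropped.
        remaining = remaining[1:]
    # dict.fromkeys removes duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(remaining))
    return " AND ".join(unique)


# License expressions of the four matches in the example above, in order.
match_expressions = ["unknown-license-reference", "wtfpl-2.0", "wtfpl-2.0", "mit"]
detected = expression_after_unknown_intro(match_expressions)
print(detected)
```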