Roadmap

This is a high-level list of what we are working on and what is completed.

This page is not updated regularly; see the milestones instead for up-to-date short-term and long-term roadmaps.

Legend

  • white_check_mark Completed

  • clock1030 In progress

  • white_large_square Planned, not started

Work in Progress

(see Completed features below)

Package manifest and dependency parsers

License Detection

  • white_check_mark support and detect license expressions (code in https://github.com/nexB/license-expression)

  • clock1030 support and detect composite licenses

  • white_check_mark support custom licenses

  • white_large_square move licenses data set to external separate repository

  • white_check_mark Improved unknown license detection

  • white_check_mark sync with external sources (DejaCode, SPDX, etc.)

Copyrights

  • white_check_mark speed up copyright detection

  • white_check_mark improved detected lines range

  • white_check_mark streamline grammar of copyright parser

  • white_check_mark normalize holders and authors for summarizing

  • white_check_mark normalize and streamline results data format

Core features

  • white_check_mark pre-scan filtering (ignore binaries, etc.)

  • white_check_mark pre/post/output plugins! (worked on as part of GSoC by @yadsharaf)

  • white_check_mark scan plugins (e.g. plugins that run a scan to collect data)

  • white_check_mark support Python 3 #295

  • clock1030 transparent archive extraction (as opposed to on-demand with extractcode)

  • white_large_square scancode.yml configuration file for exclusions, defaults, scan failure conditions, etc.

  • white_large_square support scan pipelines and rules to organize more complex scans

  • white_check_mark scan baselining, delta scan and failure conditions (such as license change, etc.) (spawned as its own DeltaCode project)

  • white_large_square deduplication and similarity detection to avoid re-scanning. For now, deduplication covers only identical files, which are scanned once.

  • clock1030 Improved logging, tracing and error diagnostics

  • white_check_mark native support for ABC Data (see AboutCode Data Structure (ABCD))

Classification, summarization and deduction

  • clock1030 File classification #426

  • white_check_mark summarize and aggregate data at the top level #377

Source code support (some parts will be spawned as their own tools)

Compiled code support (will be spawned as its own tool)

Data exchange

  • white_check_mark SPDX data conversion #338

Packaging

  • white_large_square simpler installation, automated installer

  • white_check_mark distro-friendly packaging

  • clock1030 unbundle and package as multiple libraries (commoncode, extractcode, etc.)

Documentation

  • white_large_square integration in a build/CI loop

  • white_large_square end to end guide to analyze a codebase

  • white_large_square hacking guides

  • white_large_square API doc when using ScanCode as a library

CI integration

  • white_large_square Plugins for CI (Jenkins, etc.)

  • white_large_square Integration for CI (Travis, AppVeyor, Drone, etc.)

Other work in progress

Package mining and matching

(Note that this will be a separate project.) Some code is in https://github.com/nexB/scancode-toolkit-contrib/

  • clock1030 exact matching

  • clock1030 attribute-based matching

  • clock1030 fuzzy matching

  • white_large_square peer-reviewed meta packages repo

  • white_large_square basic mining of package repositories

Other

  • white_large_square Crypto code detection

Completed features

Core scans

  • white_check_mark exact license detection

  • white_check_mark approximate license detection

  • white_check_mark copyright detection

  • white_check_mark file information (size, type, etc.)

  • white_check_mark URLs, emails, authors

Outputs and UI

  • white_check_mark JSON compact and pretty

  • white_check_mark plain HTML tables, also usable in a spreadsheet

  • white_check_mark fancy HTML ‘app’ with file tree navigation and scan result filtering, search and sorting

  • white_check_mark simple scan summary

  • white_check_mark SPDX output

Package and dependencies

  • white_check_mark common model for package data

  • white_check_mark basic support for common package formats

  • white_check_mark RPM package base

  • white_check_mark NuGet package base

  • white_check_mark Python package base

  • white_check_mark PHP Composer package support with dependencies

  • white_check_mark Java Maven POM package support with dependencies

  • white_check_mark npm package support with dependencies

Speed!

  • white_check_mark accelerate license detection indexing and scanning; include caching

  • white_check_mark scan using multiple processes to speed up overall scan

  • white_check_mark cache per-file scan to disk and stream final results

Other

  • white_check_mark archive extraction with extractcode

  • white_check_mark conversion of scan results to CSV

  • white_check_mark improved error handling, verbose and diagnostic output