🐜 Examples

📣 Overview

The examples directory in the pochi directory contains the following example programs. To write groovy script for the pochi, see the API document of pochi, especially, BirthmarkSystemHelper.

The groovy script of pochi defines the two variables as follows.

Also, we should know the birthmarking flow in the pochi.

Example programs

1️⃣ printing_args.groovy

Printing args to the stdout.

🍇 Source file

args.each(arg -> println(arg))

🍷 Output

$ pochi examples/printing_args.groovy a b c d e f
a
b
c
d
e
f

2️⃣ printing_pochi_info.groovy

printing information of pochi.

🍇 Source file

println("========== birthmark extractor names ==========")
println(pochi.extractorNames())

println("========== birthmark comparator names ==========")
println(pochi.comparatorNames())

println("========== pair matcher names ==========")
println(pochi.matcherNames())

config = pochi.config()
println("========== system library rules ==========")
println(config.rules().toArray())

println("========== properties ==========")
config.propertyStream()
     .each(item -> println(item))

🍷 Output

$ pochi examples/printing_pochi_info.groovy
========== birthmark extractor names ==========
[1-gram, 2-gram, 3-gram, 4-gram, 5-gram, 6-gram, fuc, uc, vom]
========== birthmark comparator names ==========
[Cosine, DiceIndex, EditDistance, JaccardIndex, SimpsonIndex]
========== pair matcher names ==========
[Guessed, RoundRobin, RoundRobinWithSamePair, Specified]
========== system library rules ==========
[PREFIX,java., PREFIX,javax., PREFIX,org.omg., PREFIX,org.ietf., PREFIX,org.w3c., PREFIX,org.xml.sax., PREFIX,org.apache.]
========== properties ==========
$ pochi examples/printing_pochi_info.groovy -C examples/sample_config.json
========== birthmark extractor names ==========
[1-gram, 2-gram, 3-gram, 4-gram, 5-gram, 6-gram, fuc, uc, vom]
========== birthmark comparator names ==========
[Cosine, DiceIndex, EditDistance, JaccardIndex, SimpsonIndex]
========== pair matcher names ==========
[Guessed, RoundRobin, RoundRobinWithSamePair, Specified]
========== system library rules ==========
[PREFIX,java., PREFIX,javax., PREFIX,org.omg., PREFIX,org.ietf., PREFIX,org.w3c., PREFIX,org.xml.sax., PREFIX,org.apache.]
========== properties ==========
key1: value1 # <--- sample_config.json adds this property key, and value pair.

3️⃣ extracting_birthmarks.groovy

Extracting specified type of birthmarks from given jar files.

🍇 Source file

// extract UC birthmarks from given jar files.

// extract birthmarks by given extractor from given file path.
def extract(path, extractor) {
    printf("extract uc birthmark from %s%n", path)
    // obtains DataSource object for extracting birthmarks.
    source = pochi.source(path)
    // returns Birthmarks object contains Birthmark objects.
    return extractor.extract(source)
}

// prints each Birthmark object.
def printBirthmarks(birthmarks) {
    birthmarks.each(b -> println(b))
}

// gets UC birthmark extractor.
extractor = pochi.extractor("uc")

Arrays.stream(args)
    .map(file -> extract(file, extractor)) // converts to Birthmarks
    .each(birthmarks -> printBirthmarks(birthmarks)) // print given birthmarks.

🍷 Output

$ pochi examples/extracting_birthmarks.groovy lib/vavr-match-0.10.3.jar
extract uc birthmark from lib/vavr-match-0.10.3.jar
io.vavr.match.annotation.Unapply,jar:file:///POCHI_HOME/lib/vavr-match-0.10.3.jar!/io/vavr/match/annotation/Unapply.class,uc,java.lang.Object,java.lang.annotation.Annotation
io.vavr.match.annotation.Patterns,jar:file:///POCHI_HOME/lib/vavr-match-0.10.3.jar!/io/vavr/match/annotation/Patterns.class,uc,java.lang.Object,java.lang.annotation.Annotation

4️⃣ filtering_source.groovy

pochi reads files as DataSource type variable. Then, we can apply the filter on DataSource type variable for ignoring the classes with the certain name.

🍇 Source file

def contains(entry) {
    return entry.isClass()
        && !entry.className().contains("pochi")
}

def filter(source) {
    // filtering DataSource entry by contains method.
    return source.filter(entry -> contains(entry))
}

def printSource(source) {
    source.stream()
        .each(entry -> println(entry.loadFrom()))
}

Arrays.stream(args)
    .map(arg -> pochi.source(arg))
    .map(source -> filter(source))
    .each(source -> printSource(source))

🍷 Output

$ pochi examples/filtering_source.groovy lib/pochi-core-2.5.0.jar
# Only the following line shows, because all of class contains `pochi` in the package name.
jar:file://POCHI_HOME/lib/pochi-core-2.5.0.jar!/module-info.class
$ pochi examples/filtering_source.groovy lib/vavr-match-0.10.3.jar # prints all classes included in the given jar.
jar:file://POCHI_HOME/lib/vavr-match-0.10.3.jar!/io/vavr/match/annotation/Unapply.class
jar:file://POCHI_HOME/lib/vavr-match-0.10.3.jar!/io/vavr/match/annotation/Patterns.class

5️⃣ comparing_birthmarks.groovy

This script calculates similarities among extracted birthmarks. Also, the script can filter by the similarity value.

🍇 Source file

import jp.cafebabe.birthmarks.comparators.Threshold;
import jp.cafebabe.birthmarks.entities.Birthmarks;

// extract birthmarks by given extractor from given file path.
def extract(path, extractor) {
    source = pochi.source(path)
    return extractor.extract(source)
}

// gets UC birthmark extractor.
extractor = pochi.extractor("uc")

birthmarks = Arrays.stream(args)
    .map(file -> extract(file, extractor))              // converts to Birthmarks
    .reduce(new Birthmarks(), (b1, b2) -> b1.merge(b2)) // merges to one Birthmarks object
// above script is the same as [3️⃣ extracting_birthmarks.groovy](#3-extracting_birthmarksgroovy).

comparator = pochi.comparator("JaccardIndex")
matcher = pochi.matcher("RoundRobin")

// default threshold (0.75)
threshold = Threshold.DEFAULT

matcher.match(birthmarks)
    .map(pair -> comparator.compare(pair))
    .filter(either -> either.isRight()) // ignoring the error.
    .map(either -> either.get())
    .filter(comparison -> comparison.isStolen(threshold)) // similarity > 0.75
    .forEach(comparison -> println(comparison))

🍷 Output

$ pochi examples/comparing_birthmarks.groovy distribution/target/lib/kunai2-1.0.0.jar
jp.cafebabe.pochi.kunai.sink.WarFileDataSink,jp.cafebabe.pochi.kunai.source.WarFileDataSource,1.0
jp.cafebabe.pochi.kunai.sink.WarFileDataSink,jp.cafebabe.pochi.kunai.source.factories.WarFileDataSourceFactory,0.75
jp.cafebabe.pochi.kunai.source.PlainFileDataSource,jp.cafebabe.pochi.kunai.source.ClassFileDataSource,1.0
jp.cafebabe.pochi.kunai.source.WarFileDataSource,jp.cafebabe.pochi.kunai.source.factories.WarFileDataSourceFactory,0.75
jp.cafebabe.pochi.kunai.source.factories.ClassFileDataSourceFactory,jp.cafebabe.pochi.kunai.source.factories.WarFileDataSourceFactory,0.8
jp.cafebabe.pochi.kunai.source.factories.ClassFileDataSourceFactory,jp.cafebabe.pochi.kunai.source.factories.PlainFileDataSourceFactory,0.8
jp.cafebabe.pochi.kunai.source.factories.ClassFileDataSourceFactory,jp.cafebabe.pochi.kunai.source.factories.DirectoryDataSourceFactory,0.8333333333333334
jp.cafebabe.pochi.kunai.source.factories.UnsupportedDataSourceException,jp.cafebabe.pochi.kunai.entries.ClassName,1.0
jp.cafebabe.pochi.kunai.entries.Entry,jp.cafebabe.pochi.kunai.entries.PathEntry,1.0

6️⃣ comparing_birthmarks_with_specified_pair_matcher.groovy

Comparing the birthmarks with the specified pairs. The pairs are specified by the given csv file, The pair list in the example is shown in example/sample_matching.csv.

🍇 Source file

// compare extracted birthmarks from given jar files with the specified_pair_matcher.

import jp.cafebabe.birthmarks.comparators.Threshold;
import jp.cafebabe.birthmarks.entities.Birthmarks;

// extract birthmarks by given extractor from given file path.
def extract(path, extractor) {
    source = pochi.source(path)
    return extractor.extract(source)
}

// gets UC birthmark extractor.
extractor = pochi.extractor("uc")

birthmarks = Arrays.stream(args)
    .map(file -> extract(file, extractor))              // converts to Birthmarks
    .reduce(new Birthmarks(), (b1, b2) -> b1.merge(b2)) // merges to one Birthmarks object

pochi.config().put("pair.list", "examples/sample_matching.csv") // specified pairs by csv file with the key "pair.list".
comparator = pochi.comparator("JaccardIndex")
matcher = pochi.matcher("Specified")

// default threshold (0.75)
threshold = Threshold.DEFAULT

matcher.match(birthmarks)
    .map(pair -> comparator.compare(pair))
    .filter(either -> either.isRight())
    .map(either -> either.get())
    // .filter(comparison -> comparison.isStolen(threshold)) //  no filtering
    .forEach(comparison -> println(comparison))

🍷 Output

$ pochi examples/comparing_birthmarks_with_specified_pair_matcher.groovy lib/kunai2-2.5.0.jar
jp.cafebabe.kunai.sink.WarFileDataSink,jp.cafebabe.kunai.source.factories.WarFileDataSourceFactory,0.75
jp.cafebabe.kunai.source.WarFileDataSource,jp.cafebabe.kunai.source.factories.WarFileDataSourceFactory,0.75
jp.cafebabe.kunai.source.factories.ClassFileDataSourceFactory,jp.cafebabe.kunai.source.factories.WarFileDataSourceFactory,0.8
jp.cafebabe.kunai.source.factories.ClassFileDataSourceFactory,jp.cafebabe.kunai.source.factories.PlainFileDataSourceFactory,0.8
jp.cafebabe.kunai.source.factories.ClassFileDataSourceFactory,jp.cafebabe.kunai.source.factories.DirectoryDataSourceFactory,1.0

7️⃣ registering_new_extractor.groovy

This sample registers the new birthmark extractor (7-gram based birthmark extractor).

🍇 Source file

import jp.cafebabe.pochi.birthmarks.kgram.KGramBasedExtractorBuilder

println("========= extractor ==========")
println(pochi.extractorNames())

// create 7-gram birthmark extractor.
sevenGram = new KGramBasedExtractorBuilder(7);
pochi.register(sevenGram)

println(pochi.extractorNames())

🍷 Output

========= extractor ==========
[6-gram, 4-gram, 2-gram, 5-gram, uc, 1-gram, 3-gram]         # <- before registration
[6-gram, 4-gram, 2-gram, 5-gram, uc, 1-gram, 7-gram, 3-gram] # <- after registration