Track file inputs and outputs 

Many sbt tasks depend on a collection of files. For example, the package task generates a jar file containing the resources and class files of a project, and the class files are themselves generated by the compile task. Starting with version 1.3.0, sbt provides a file management system that tracks the inputs and outputs of any task. A task can query which of its file dependencies have changed since it last completed, allowing it to incrementally re-build only the modified files. This system integrates with Triggered execution so that the file dependencies of a task are automatically monitored in a continuous build.

To illustrate the file tracking system, we construct a build.sbt that exercises all of its essential features. The example is a project that builds a shared library in C using gcc. This is done with two tasks: buildObjects, which compiles C source files to object files, and linkLibrary, which links the object files into a shared library. These can be defined with:

import java.nio.file.Path
val buildObjects = taskKey[Seq[Path]]("Compiles c files into object files.")
val linkLibrary = taskKey[Path]("Links objects into a shared library.")

The buildObjects task will depend on *.c source file inputs. The linkLibrary task depends on the output *.o object files generated by buildObjects. This creates a build pipeline: if none of the input sources to buildObjects are modified between calls to linkLibrary then neither compilation nor linking should occur. Conversely, when input source changes are detected, sbt should both generate new object files corresponding to the modified source files and link the shared library.

File inputs 

It is natural for a task to specify the inputs on which it depends. These are set with the fileInputs key, which has type: Seq[Glob] (see Globs). The fileInputs are specified as Seq[Glob] so that more than one search query may be provided, which may be necessary if sources are located in multiple directories or different file types are needed within the same task.
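
As a rough illustration, a task whose sources live in more than one directory can append several globs at once. The sketch below assumes the buildObjects key declared earlier; the vendor directory is a hypothetical second source location that does not appear in the original example.

```scala
import sbt.nio.Keys._

// Assumes `buildObjects` is the task key declared earlier in this build.sbt.
// The "vendor" directory is an illustrative assumption.
buildObjects / fileInputs ++= Seq(
  baseDirectory.value.toGlob / "src" / "*.c",
  baseDirectory.value.toGlob / "vendor" / "*.c"
)
```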

When the fileInputs key is set in a given scope, sbt automatically generates a task named allInputFiles for that scope that returns a Seq[Path] containing all of the files matching the fileInputs queries. For convenience, there is an extension method defined for Task[_] that translates foo.inputFiles to (foo / allInputFiles).value. We can use these to write a simple implementation of buildObjects:

import scala.sys.process._
import java.nio.file.{ Files, Path }
import sbt.nio._
import sbt.nio.Keys._

val buildObjects = taskKey[Seq[Path]]("Compiles c files into object files.")
buildObjects / fileInputs += baseDirectory.value.toGlob / "src" / "*.c"
buildObjects := {
  val outputDir = Files.createDirectories(streams.value.cacheDirectory.toPath)
  def outputPath(path: Path): Path =
    outputDir / path.getFileName.toString.replaceAll("\\.c$", ".o")
  val logger = streams.value.log
  buildObjects.inputFiles.map { path =>
    val output = outputPath(path)
    logger.info(s"Compiling $path to $output")
    Seq("gcc", "-c", path.toString, "-o", output.toString).!!
    output
  }
}

This implementation will gather all of the files with the .c extension in the src directory and shell out to gcc to compile each of them into the output directory.

sbt will automatically monitor any file matched by the globs specified by fileInputs. In this case, modifying any file with the .c extension in the src directory will trigger a build in a continuous build.

Incremental builds 

Every time that buildObjects is invoked from the sbt shell, it will re-compile all of the source files. This becomes expensive as the number of source files increases. In addition to fileInputs, sbt provides another API, inputFileChanges, that reports which source files have changed since the last time the task successfully completed. Using inputFileChanges, we can make the build above incremental:

import scala.sys.process._
import java.nio.file.{ Files, Path }
import sbt.nio._
import sbt.nio.Keys._

val buildObjects = taskKey[Seq[Path]]("Generate object files from c sources")
buildObjects / fileInputs += baseDirectory.value.toGlob / "src" / "*.c"
buildObjects := {
  val outputDir = Files.createDirectories(streams.value.cacheDirectory.toPath)
  val logger = streams.value.log
  def outputPath(path: Path): Path =
    outputDir / path.getFileName.toString.replaceAll("\\.c$", ".o")
  def compile(path: Path): Path = {
    val output = outputPath(path)
    logger.info(s"Compiling $path to $output")
    Seq("gcc", "-fPIC", "-std=gnu99", "-c", s"$path", "-o", s"$output").!!
    output
  }
  val sourceMap = buildObjects.inputFiles.view.map(p => outputPath(p) -> p).toMap
  val existingTargets = fileTreeView.value.list(outputDir.toGlob / **).flatMap { case (p, _) =>
    if (!sourceMap.contains(p)) {
      Files.deleteIfExists(p)
      None
    } else {
      Some(p)
    }
  }.toSet
  val changes = buildObjects.inputFileChanges
  val updatedPaths = (changes.created ++ changes.modified).toSet
  val needCompile = updatedPaths ++ sourceMap.filterKeys(!existingTargets(_)).values
  needCompile.foreach(compile)
  sourceMap.keys.toVector
}

The FileChangeReport makes it possible to write an incremental task without manually tracking the input files. It is a sealed trait implemented by three case classes:

  1. Changes: indicates that one or more source files have been modified.
  2. Unmodified: none of the source files have been modified since the last run.
  3. Fresh: there is no cache entry for the previous source file hashes.

It is sometimes convenient to pattern match on the result of the inputFileChanges:

foo.inputFileChanges match {
  case FileChanges(created, deleted, modified, unmodified)
    if created.nonEmpty || modified.nonEmpty =>
      build(created ++ modified)
      delete(deleted)
  case _ => // no changes
}

The input file report says nothing about the outputs. This is why the buildObjects implementation needs to check the target directory to see which outputs exist. In that example, there is a 1:1 mapping between inputs and outputs, but this need not be the case in general. An implementation of buildObjects may include header files in the fileInputs. These are not compiled themselves, but they may trigger re-compilation of one or more *.c source files.
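
One possible way to handle header files is sketched below as a plain Scala helper. It is a conservative assumption, not sbt's behavior: any changed header triggers recompilation of every C source, since the helper does not know which sources include which headers. The object and method names are hypothetical.

```scala
import java.nio.file.Path

// Hypothetical helper, not part of sbt: decides which C sources to recompile.
object HeaderAwareRecompile {
  private def hasExtension(path: Path, ext: String): Boolean =
    path.getFileName.toString.endsWith(ext)

  /** @param allInputs every tracked input file (sources and headers)
    * @param changed   inputs reported as created, modified or deleted
    * @return the C sources that should be recompiled
    */
  def sourcesToRecompile(allInputs: Seq[Path], changed: Seq[Path]): Seq[Path] = {
    val sources = allInputs.filter(hasExtension(_, ".c"))
    // Conservative assumption: a header change may affect every source.
    if (changed.exists(hasExtension(_, ".h"))) sources
    else sources.filter(source => changed.contains(source))
  }
}
```

Inside a task such as buildObjects, the helper could be called with buildObjects.inputFiles as allInputs and the created, modified and deleted paths from buildObjects.inputFileChanges as changed.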

Note that calling buildObjects.inputFileChanges also causes buildObjects / fileInputs to automatically be watched in a continuous build.

File outputs 

The outputs of a task are often best specified as the result of the task. In the example above, buildObjects is a Task returning a Seq[Path] containing the object files generated by compilation. sbt will automatically track the outputs of any task that returns one of the following result types: Path, Seq[Path], File or Seq[File]. We can use this to build on the buildObjects example to write a task that links the objects into a shared library:

val linkLibrary = taskKey[Path]("Links objects into a shared library.")
linkLibrary := {
  val outputDir = Files.createDirectories(streams.value.cacheDirectory.toPath)
  val logger = streams.value.log
  val isMac = scala.util.Properties.isMac
  val library = outputDir / s"mylib.${if (isMac) "dylib" else "so"}"
  val linkOpts = if (isMac) Seq("-dynamiclib") else Seq("-shared", "-fPIC")
  if (buildObjects.outputFileChanges.hasChanges || !Files.exists(library)) {
    logger.info(s"Linking $library")
    (Seq("gcc") ++ linkOpts ++ Seq("-o", s"$library") ++
      buildObjects.outputFiles.map(_.toString)).!!
  } else {
    logger.debug(s"Skipping linking of $library")
  }
  library
}

Here the tracking was simpler because linking a shared library is not incremental. Thus we have to rebuild if any of the outputs of buildObjects has changed or if the library doesn’t exist.

Similar to fileInputs, there is a fileOutputs key. This can be used as an alternative to returning the output files in the task when the outputs have a known pattern. For example, buildObjects could have been defined as:

val buildObjects = taskKey[Unit]("Compiles c files into object files.")
buildObjects / fileOutputs += target.value.toGlob / "objects" / ** / "*.o"

This can be useful when using an opaque external tool where the mapping of inputs to outputs is not known.

Like allInputFiles, there is an allOutputFiles task of return type Seq[Path] that is automatically generated for a task, foo, if the return type of foo is one of Seq[Path], Path, Seq[File] or File. It is also generated if foo / fileOutputs is specified. When fileOutputs is specified and the return type also represents a file or collection of files, the result of allOutputFiles is the distinct union of the files returned by the task and the files described by fileOutputs. Calling foo.outputFiles is syntactic sugar for (foo / allOutputFiles).value.
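
As a small sketch of the sugar described above, another task could read the tracked outputs of buildObjects. The listObjects task is hypothetical and is not part of sbt or of the original example; it assumes buildObjects is defined as shown earlier.

```scala
import sbt.nio.Keys._

// Hypothetical helper task; assumes `buildObjects` is defined as above.
val listObjects = taskKey[Unit]("Logs the output files tracked for buildObjects.")
listObjects := {
  val logger = streams.value.log
  // Sugar for (buildObjects / allOutputFiles).value
  buildObjects.outputFiles.foreach(path => logger.info(path.toString))
}
```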

Filters 

The fileInputs and fileOutputs can be filtered beyond what is specified by their Glob patterns. sbt provides four settings of type sbt.nio.file.PathFilter:

  1. fileInputIncludeFilter: only include file inputs that also match this filter.
  2. fileInputExcludeFilter: exclude any file inputs that also match this filter.
  3. fileOutputIncludeFilter: only include file outputs that also match this filter.
  4. fileOutputExcludeFilter: exclude any file outputs that also match this filter.

By default, sbt sets:

fileInputExcludeFilter := HiddenFileFilter.toNio || DirectoryFilter

Both fileInputIncludeFilter and fileOutputIncludeFilter are set to AllPassFilter.toNio. The fileOutputExcludeFilter is set to NothingFilter.toNio.

To exclude files with test in their name from buildObjects, write:

buildObjects / fileInputExcludeFilter := "*test*"

To preserve the previous excludes of hidden files and directories, write:

buildObjects / fileInputExcludeFilter :=
  (buildObjects / fileInputExcludeFilter).value || "*test*"

or

buildObjects / fileInputExcludeFilter ~= { ef => ef || "*test*" }

In most cases, it shouldn’t be necessary to set the fileInputIncludeFilter since path name filtering should be handled by fileInputs itself. It also shouldn’t commonly be necessary to filter the outputs.

Cleaning outputs 

sbt automatically generates an implementation of clean scoped to the task foo whenever it also generates the allOutputFiles task. Calling foo / clean will remove all of the files previously generated by foo. It will not re-evaluate foo. For example, calling buildObjects / clean will remove all of the object files generated by the previous call to buildObjects. The generated clean tasks are not transitive. Calling linkLibrary / clean will delete the shared library but will not delete the object files generated by buildObjects.

File change tracking 

For each input or output file tracked by sbt, there is an associated FileStamp. This can either be the last modified time of the file or a hash. By default, inputs are tracked using the hash and outputs are tracked using the last modified time. To change this, set the inputFileStamper or outputFileStamper:

val generateSources = taskKey[Seq[Path]]("Generates source files from json schema.")
generateSources / fileInputs += baseDirectory.value.toGlob / "schema" / ** / "*.json"
generateSources / outputFileStamper := FileStamper.Hash
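
To picture how stamps can turn into a change report, the following self-contained sketch compares the stamps from a previous run with the current ones. It is an illustrative model only: StampDiff, Stamp and Diff are hypothetical names, a stamp is reduced to an opaque string (a hash or a timestamp), and nothing here is claimed to match sbt's internal implementation.

```scala
import java.nio.file.Path

// Illustrative model only; these names are hypothetical and are not sbt's API.
object StampDiff {
  // A stamp is modelled as an opaque string, e.g. a content hash or a timestamp.
  type Stamp = String

  final case class Diff(
      created: Set[Path],
      deleted: Set[Path],
      modified: Set[Path],
      unmodified: Set[Path]
  ) {
    def hasChanges: Boolean =
      created.nonEmpty || deleted.nonEmpty || modified.nonEmpty
  }

  /** Classifies every path by comparing its previous and current stamp. */
  def diff(previous: Map[Path, Stamp], current: Map[Path, Stamp]): Diff = {
    val created = current.keySet -- previous.keySet
    val deleted = previous.keySet -- current.keySet
    val common = current.keySet intersect previous.keySet
    // Paths present in both runs are unmodified only if their stamps are equal.
    val (unmodified, modified) = common.partition(p => previous(p) == current(p))
    Diff(created, deleted, modified, unmodified)
  }
}
```

With hash stamps, rewriting a file with identical contents leaves it in the unmodified set, whereas with last modified stamps the same rewrite would typically move it to modified.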

Continuous build file monitoring 

In a continuous build ~bar of an arbitrary task bar, any call to foo.inputFiles or foo.inputFileChanges made within bar, for some task foo, causes all of the globs specified by foo / fileInputs to be monitored. Transitive file input dependencies are automatically monitored. For example, the ~linkLibrary continuous build command will monitor the *.c source files defined for buildObjects.

Input files will only trigger a re-build if their hash has changed. This behavior can be overridden with:

Global / watchForceTriggerOnAnyChange := true

Changes to file outputs, which are gathered with either foo.outputFiles or foo.outputFileChanges, do not trigger a re-build.

Partial pipeline evaluation / error handling 

The stamps for each file are tracked on a per-task basis. They are only updated if the incremental task itself succeeds. In the example above, this means that the current last modified times of the buildObjects outputs are stored for the linkLibrary task only when linkLibrary succeeds. As a result, buildObjects can be run many times between calls to linkLibrary, and linkLibrary will see the cumulative changes to the outputs of buildObjects.

If linkLibrary fails to complete, sbt will also skip updating the last modified times for the outputs of buildObjects corresponding to linkLibrary because it is impossible to know in general which files were successfully processed.
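
The commit-on-success behavior described above can be modelled with a small self-contained sketch. StampCache is a hypothetical class written for illustration, not an sbt type: it remembers the stamps seen by the last successful run and overwrites them only when the task body completes without throwing.

```scala
import scala.util.Try

// Illustrative model only; StampCache is a hypothetical name, not an sbt class.
final class StampCache[K, S] {
  // Stamps recorded by the last successful run.
  private var committed: Map[K, S] = Map.empty

  /** Runs `body` with the keys whose stamps differ from the last successful run.
    * The current stamps are stored only if `body` completes without throwing.
    */
  def run(current: Map[K, S])(body: Set[K] => Unit): Try[Unit] = {
    val changed = current.filter { case (k, s) => !committed.get(k).contains(s) }.keySet
    val result = Try(body(changed))
    if (result.isSuccess) committed = current
    result
  }
}
```

Because a failed run leaves the stored stamps untouched, the next run still sees every change made since the last success, which mirrors how linkLibrary sees the cumulative changes to the outputs of buildObjects.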