diff --git a/README.md b/README.md index 5d894467..55ae8b8a 100644 --- a/README.md +++ b/README.md @@ -7,18 +7,33 @@ [![unittest][action-img]][action-url] [![codecov][codecov-img]][codecov-url] -A **fast** Julia library for increasing the number of training -images by applying various transformations. - -Augmentor is a real-time image augmentation library designed to -render the process of artificial dataset enlargement more -convenient, less error prone, and easier to reproduce. It offers -the user the ability to build a *stochastic image-processing -pipeline* -- which we will also refer to as *augmentation -pipeline* -- using image operations as building blocks. For our -purposes, an augmentation pipeline can be understood as a -sequence of operations for which the parameters can (but need -not) be random variables. +**Augmentor.jl** is a *fast* Julia library designed to make the process of +image augmentation more convenient, less error-prone, and easier to reproduce. +It offers a simple way to build flexible **augmentation pipelines**. For our +purposes, an augmentation pipeline can be understood as a sequence of +operations for which the parameters can (but need not) be random variables. + +When augmenting, Augmentor.jl uses multiple heuristics to generate efficient +tailor-made code for the concrete user-specified augmentation pipeline. In +particular, Augmentor tries to avoid the need for any intermediate images and +aims to compute the output image directly from the input in one single pass. + +## Overview + +Augmentor.jl provides many augmentation operations such as rotations, flipping, +blurring, and more. See the +[documentation](https://evizero.github.io/Augmentor.jl/stable/operations/) for +the complete list of available operations. + +The package uses the `|>` operator to **compose** operations into a pipeline. + +Prepared pipelines are applied to images by calling one of the higher-level +functions: `augment`, `augment!`, or `augmentbatch!`. + +The full documentation is available at +[evizero.github.io/Augmentor.jl/](https://evizero.github.io/Augmentor.jl/). + +## Example ```julia julia> pl = ElasticDistortion(6, scale=0.3, border=true) |> @@ -38,44 +53,33 @@ julia> augment(img, pl) ![](https://evizero.github.io/Augmentor.jl/dev/mnist_preview.gif) -The Julia version of Augmentor is engineered specifically for -high performance applications. It makes use of multiple -heuristics to generate efficient tailor-made code for the -concrete user-specified augmentation pipeline. In particular -Augmentor tries to avoid the need for any intermediate images, -but instead aims to compute the output image directly from the -input in one single pass. - -**Augmentor.jl** is the [Julia](http://julialang.org) -implementation for Augmentor. The Python version of the same name -is available [here](https://github.com/mdbloice/Augmentor). +For more examples, see [the documentation](https://evizero.github.io/Augmentor.jl/). -## Package Overview +## Contributing -Augmentor.jl provides: +Contributions are greatly appreciated! -* predefined augmentation operations, e.g., `FlipX` -* `|>` operator to compose operations into a pipeline -* higher-lvel functions (`augment`, `augment!` and `augmentbatch!`) that works on a pipeline and image(s). - -Check the [documentation](https://evizero.github.io/Augmentor.jl/stable/operations/) for a full list of operations. +To report a potential **bug** or propose a **new feature**, please file a *new +issue*. *Pull requests* are always welcome. 
However, to improve the chances of the PR being
+accepted, it should generally refer to a specific issue that motivates it.

 ## Citing Augmentor

-If you use Augmentor for academic research and wish to cite it,
-please use the following paper.
+If you use Augmentor for academic research and wish to cite it, please use the
+following paper.

-Marcus D. Bloice, Christof Stocker, and Andreas Holzinger,
-*Augmentor: An Image Augmentation Library for Machine Learning*,
-arXiv preprint **arXiv:1708.04680**,
+Marcus D. Bloice, Christof Stocker, and Andreas Holzinger, *Augmentor: An Image
+Augmentation Library for Machine Learning*, arXiv preprint **arXiv:1708.04680**,
 <https://arxiv.org/abs/1708.04680>, 2017.

 ## Acknowledgments

-This package makes heavy use of the following packages in order
-to provide it's main functionality. To see at full list of
-utilized packages, please take a look at the [REQUIRE](./REQUIRE)
-file.
+This package is inspired by a Python library of the same name available at
+[github.com/mdbloice/Augmentor](https://github.com/mdbloice/Augmentor).
+
+To provide most of its operations, Augmentor.jl makes heavy use of many other
+packages. To name a few:

 - [FugroRoames/CoordinateTransformations.jl](https://github.com/FugroRoames/CoordinateTransformations.jl)
 - [JuliaImages/ImageTransformations.jl](https://github.com/JuliaImages/ImageTransformations.jl)
diff --git a/docs/Project.toml b/docs/Project.toml
index 82460b86..2306d1e6 100644
--- a/docs/Project.toml
+++ b/docs/Project.toml
@@ -3,12 +3,15 @@ Augmentor = "02898b10-1f73-11ea-317c-6393d7073e15"
 DemoCards = "311a05b2-6137-4a5a-b473-18580a3d38b5"
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
+Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
 ImageCore = "a09fc81d-aa75-5fe9-8630-4744c3626534"
 ImageDraw = "4381153b-2b60-58ae-a1ba-fd683676385f"
 ImageMagick = "6218d12a-5da1-5696-b52f-db25d2ecc6d1"
 ImageShow = "4e3cecfd-b093-5904-9786-8bbb286a6a31"
 Images = "916415d5-f1e6-5110-898d-aaa5f9f070e0"
+MLDataUtils = "cc2ba9b6-d476-5e6d-8eaf-a92d5412d41d"
 MLDatasets = "eb30cadb-4394-5ae3-aed4-317e484a6458"
+MappedArrays = "dbb5928d-eab1-5f90-85c2-b9b0edb7c900"
 MosaicViews = "e94cdb99-869f-56ef-bcf0-1ae2bcbe0389"
 OffsetArrays = "6fe1bfb0-de20-5000-8ca7-80f57d26f881"
 PaddedViews = "5432bcbf-9aad-5242-b902-cca2824c8663"
diff --git a/docs/examples/examples/assets/mnist_elastic.gif b/docs/examples/examples/assets/mnist_elastic.gif
deleted file mode 100644
index e119f20f..00000000
Binary files a/docs/examples/examples/assets/mnist_elastic.gif and /dev/null differ
diff --git a/docs/examples/examples/mnist_elastic.md b/docs/examples/examples/mnist_elastic.md
deleted file mode 100644
index 53b70c25..00000000
--- a/docs/examples/examples/mnist_elastic.md
+++ /dev/null
@@ -1,125 +0,0 @@
----
-title: Elastic distortion to MNIST images
-id: mnist_elastic
-cover: assets/mnist_elastic.gif
----
-
-
-In this example we are going to use Augmentor on the famous **MNIST database of handwritten
-digits** [^MNIST1998] to reproduce the elastic distortions discussed in [^SIMARD2003].
-
-It may be interesting to point out, that the way Augmentor implements distortions is a little
-different to how it is described by the authors of the paper. This is for a couple of reasons,
-most notably that we want the parameters for our deformations to be independent of the size of
-image it is applied on. As a consequence the parameter-numbers specified in the paper are not
-1-to-1 transferable to Augmentor.
- -If the effects are sensible for the dataset, then applying elastic distortions can be a really -effective way to improve the generalization ability of the network. That said, our implementation -of [`ElasticDistortion`](@ref) has a lot of possible parameters to choose from. To that end, we -will introduce a simple strategy for interactively exploring the parameter space on our dataset of -interest. - -## Loading the MNIST Trainingset - -In order to access and visualize the MNIST images we employ the help of two additional Julia -packages. In the interest of time and space we will not go into great detail about their -functionality. Feel free to click on their respective names to find out more information about the -utility they can provide. - -- [Images.jl](https://github.com/JuliaImages/Images.jl) will provide us with the necessary tools - for working with image data in Julia. - -- [MLDatasets.jl](https://github.com/JuliaML/MLDatasets.jl) has an MNIST submodule that offers a - convenience interface to read the MNIST database. - -The function `MNIST.traintensor` returns the MNIST training images corresponding to the given -indices as a multi-dimensional array. These images are stored in the native horizontal-major -memory layout as a single floating point array, where all values are scaled to be between 0.0 and -1.0. - -```@example mnist_elastic -using Images, MLDatasets -train_tensor = MNIST.traintensor() - -summary(train_tensor) -``` - -This horizontal-major format is the standard way of utilizing this dataset for training machine -learning models. In this tutorial, however, we are more interested in working with the MNIST -images as actual Julia images in vertical-major layout, and as black digits on white background. - -We can convert the "tensor" to a `Colorant` array using the provided function -`MNIST.convert2image`. This way, Julia knows we are dealing with image data and can tell -programming environments such as Juypter how to visualize it. If you are working in the terminal -you may want to use the package -[ImageInTerminal.jl](https://github.com/JuliaImages/ImageInTerminal.jl) - -```@example mnist_elastic -train_images = MNIST.convert2image(train_tensor) - -train_images[:,:,1] # show first image -``` - -## Visualizing the Effects - -Before applying an operation (or pipeline of operations) on some dataset to train a network, we -strongly recommend investing some time in selecting a decent set of hyper parameters for the -operation(s). A useful tool for tasks like this is the package -[Interact.jl](https://github.com/JuliaGizmos/Interact.jl). We will use this package to define a -number of widgets for controlling the parameters to our operation. - -Note that while the code below only focuses on configuring the parameters of a single operation, -specifically [`ElasticDistortion`](@ref), it could also be adapted to tweak a whole pipeline. Take -a look at the corresponding section in [High-level Interface](@ref pipeline) for more information -on how to define and use a pipeline. - -These two package will provide us with the capabilities to perform interactive visualisations in a -jupyter notebook -using Augmentor, Interact, Reactive - -The manipulate macro will turn the parameters of the -loop into interactive widgets. 
- -```julia -@manipulate for - unpaused = true, - ticks = fpswhen(signal(unpaused), 5.), - image_index = 1:100, - grid_size = 3:20, - scale = .1:.1:.5, - sigma = 1:5, - iterations = 1:6, - free_border = true op = ElasticDistortion(grid_size, grid_size, # equal width & height - sigma = sigma, - scale = scale, - iter = iterations, - border = free_border) - augment(train_images[:, :, image_index], op) -end -``` - -Executing the code above in a Juypter notebook will result -in the following interactive visualisation. You can now -use the sliders to investigate the effects that different -parameters have on the MNIST training images. - -!!! tip - You should always use your **training** set to do this - kind of visualisation (not the test test!). Otherwise - you are likely to achieve overly optimistic (i.e. biased) - results during training. - -![interact](https://user-images.githubusercontent.com/10854026/30867456-4afe0800-a2dc-11e7-90eb-800b6ea025d0.gif) - -Congratulations! With just a few simple lines of code, you -created a simple interactive tool to visualize your image -augmentation pipeline. Once you found a set of parameters that -you think are appropriate for your dataset you can go ahead -and train your model. - -## References - -[^MNIST1998]: LeCun, Yan, Corinna Cortes, Christopher J.C. Burges. ["The MNIST database of handwritten digits"](http://yann.lecun.com/exdb/mnist/) Website. 1998. - -[^SIMARD2003]: Simard, Patrice Y., David Steinkraus, and John C. Platt. ["Best practices for convolutional neural networks applied to visual document analysis."](https://www.microsoft.com/en-us/research/publication/best-practices-for-convolutional-neural-networks-applied-to-visual-document-analysis/) ICDAR. Vol. 3. 2003. diff --git a/docs/examples/index.md b/docs/examples/index.md deleted file mode 100644 index 70ea7bee..00000000 --- a/docs/examples/index.md +++ /dev/null @@ -1,12 +0,0 @@ -# [Tutorials](@id tutorials) - -Here we provide several tutorials that you may want to follow up and see how Augmentor is used in -practice. - -{{{democards}}} - -## References - -[^MNIST1998]: LeCun, Yan, Corinna Cortes, Christopher J.C. Burges. ["The MNIST database of handwritten digits"](http://yann.lecun.com/exdb/mnist/) Website. 1998. - -[^SIMARD2003]: Simard, Patrice Y., David Steinkraus, and John C. Platt. ["Best practices for convolutional neural networks applied to visual document analysis."](https://www.microsoft.com/en-us/research/publication/best-practices-for-convolutional-neural-networks-applied-to-visual-document-analysis/) ICDAR. Vol. 3. 2003. 
diff --git a/docs/make.jl b/docs/make.jl
index e6a5e0a5..d6e67416 100644
--- a/docs/make.jl
+++ b/docs/make.jl
@@ -15,47 +15,53 @@ ENV["DATADEPS_ALWAYS_ACCEPT"] = true # MLDatasets
 op_templates, op_theme = cardtheme("grid")
 operations, operations_cb = makedemos("operations", op_templates)
-examples_templates, examples_theme = cardtheme("list")
-examples, examples_cb = makedemos("examples", examples_templates)

 format = Documenter.HTML(edit_link = "master",
                          prettyurls = get(ENV, "CI", nothing) == "true",
                          assets = [
                              joinpath("assets", "favicon.ico"),
                              joinpath("assets", "style.css"),
-                             op_theme,
-                             examples_theme
+                             op_theme
                          ]
 )

+About = "Introduction" => "index.md"
+
+GettingStarted = "gettingstarted.md"
+
+UserGuide = "User's guide" => [
+    "interface.md",
+    operations
+    ]
+
+DevGuide = "Developer's guide" => [
+    "wrappers.md"
+    ]
+
+Examples = "Examples" => [
+    "examples/flux.md"
+    ]
+
+License = "License" => "license.md"
+
+PAGES = [
+    About,
+    GettingStarted,
+    UserGuide,
+    DevGuide,
+    Examples,
+    License
+    ]
+
 makedocs(
     modules = [Augmentor],
     sitename = "Augmentor.jl",
     authors = "Christof Stocker",
-    # linkcheck = true,
     format = format,
-    pages = [
-        "Home" => "index.md",
-        "gettingstarted.md",
-        "Introduction and Motivation" => [
-            "background.md",
-            "images.md",
-        ],
-        "User's Guide" => [
-            "interface.md",
-            operations
-        ],
-        "Developer's Guide" => [
-            "wrappers.md"
-        ],
-        "Tutorials" => examples,
-        hide("Indices" => "indices.md"),
-        "LICENSE.md",
-    ],
-    # doctest=:fix, # used to fix outdated doctest
+    checkdocs = :exports,
+    pages = PAGES
 )

 operations_cb()
-examples_cb()

 deploydocs(repo = "github.com/Evizero/Augmentor.jl.git")
diff --git a/docs/operations/misc/config.json b/docs/operations/misc/config.json
index 74737941..e564a3b9 100644
--- a/docs/operations/misc/config.json
+++ b/docs/operations/misc/config.json
@@ -1,6 +1,7 @@
 {
     "order":[
         "layout.jl",
-        "utilities.jl"
+        "utilities.jl",
+        "higherorder.jl"
     ]
 }
diff --git a/docs/operations/misc/higherorder.jl b/docs/operations/misc/higherorder.jl
new file mode 100644
index 00000000..a9acae0a
--- /dev/null
+++ b/docs/operations/misc/higherorder.jl
@@ -0,0 +1,28 @@
+# ---
+# title: Higher-order functions
+# description: a set of helper operations that allow applying any function
+# ---
+
+# These operations are useful for applying a transformation that is not
+# explicitly defined in Augmentor.
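+#
+# `MapFun` applies the given function to each pixel individually, while
+# `AggregateThenMapFun` first computes an aggregate over the entire image
+# (the mean, in the example below) and then maps a function of
+# `(pixel, aggregate)` over every pixel.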
+
+using Augmentor
+using Random
+using Statistics: mean
+
+Random.seed!(1337)
+
+DecreaseContrast = MapFun(pixel -> pixel / 2)
+IncreaseBrightness = AggregateThenMapFun(img -> mean(img),
+                                         (pixel, M) -> pixel + M / 5)
+
+img_in = testpattern(RGB, ratio=0.5)
+img_out = augment(img_in, DecreaseContrast |> IncreaseBrightness)
+
+# ## References
+
+#md # ```@docs
+#md # MapFun
+#md # AggregateThenMapFun
+#md # ```
+
diff --git a/docs/operations/misc/utilities.jl b/docs/operations/misc/utilities.jl
index e38d2a78..12d7ed03 100644
--- a/docs/operations/misc/utilities.jl
+++ b/docs/operations/misc/utilities.jl
@@ -1,7 +1,7 @@
 # ---
 # title: Composition utilities
 # cover: utilities.gif
-# description: a set of helper opeartions that may be useful when compositing more complex augmentation workflow
+# description: a set of helper operations that may be useful when composing more complex augmentation workflows
 # ---

 # Aside from "true" operations that specify some kind of transformation, there are also a couple of
diff --git a/docs/src/background.md b/docs/src/background.md
deleted file mode 100644
index 7e20a2f2..00000000
--- a/docs/src/background.md
+++ /dev/null
@@ -1,131 +0,0 @@
-# Background and Motivation
-
-In this section we will discuss the concept of image augmentation
-in general. In particular we will introduce some terminology and
-useful definitions.
-
-## What is Image Augmentation?
-
-The term *data augmentation* is commonly used to describe the
-process of repeatedly applying various transformations to some
-dataset, with the hope that the output (i.e. the newly generated
-observations) bias the model towards learning better features.
-Depending on the structure and semantics of the data, coming up
-with such transformations can be a challenge by itself.
-
-Images are a special class of data that exhibit some interesting
-properties in respect to their structure. For example the
-dimensions of an image (i.e. the pixel) exhibit a spatial
-relationship to each other. As such, a lot of commonly used
-augmentation strategies for image data revolve around affine
-transformations, such as translations or rotations. Because
-images are so popular and special case of data, they deserve
-their own sub-category of data augmentation, which we will
-unsurprisingly refer to as **image augmentation**.
-
-The general idea is the following: if we want our model to
-generalize well, then we should design the learning process in
-such a way as to bias the model into learning such
-transformation-[equivariant](https://en.wikipedia.org/wiki/Equivariant_map)
-properties. One way to do this is via
-the design of the model itself, which for example was idea behind
-convolutional neural networks. An orthogonal approach to bias the
-model to learn about this equivariance - and the focus of this
-package - is by using label-preserving transformations.
-
-## [Label-preserving Transformations](@id labelpreserving)
-
-Before attempting to train a model using some augmentation
-pipeline, it's a good idea to invest some time in deciding on an
-appropriate set of transformations to choose from. Some of these
-transformations also have parameters to tune, and we should also
-make sure that we settle on a decent set of values for those.
-
-What constitutes as "decent" depends on the dataset. In general
-we want the augmented images to be fairly dissimilar to the
-originals. However, we need to be careful that the augmented
-images still visually represent the same concept (and thus
-label).
If a pipeline only produces output images that have this -property we call this pipeline **label-preserving**. - -### [Example: MNIST Handwritten Digits](@id mnist) - -Consider the following example from the MNIST database of -handwritten digits [^MNIST1998]. Our input image clearly -represents its associated label "6". If we were to use the -transformation [`Rotate180`](@ref) in our augmentation pipeline -for this type of images, we could end up with the situation -depicted by the image on the right side. - -```@example -using Augmentor, MLDatasets -input_img = MNIST.convert2image(MNIST.traintensor(19)) -output_img = augment(input_img, Rotate180()) -using Images, FileIO; # hide -upsize(A) = repeat(A, inner=(4,4)); # hide -save(joinpath("assets","bg_mnist_in.png"), upsize(input_img)); # hide -save(joinpath("assets","bg_mnist_out.png"), upsize(output_img)); # hide -nothing # hide -``` - -Input (`input_img`) | Output (`output_img`) ----------------------------------|------------------------------------ -![input](assets/bg_mnist_in.png) | ![output](assets/bg_mnist_out.png) - -To a human, this newly transformed image clearly represents the -label "9", and not "6" like the original image did. In image -augmentation, however, the assumption is that the output of the -pipeline has the same label as the input. That means that in this -example we would tell our model that the correct answer for the -image on the right side is "6", which is clearly undesirable for -obvious reasons. - -Thus, for the MNIST dataset, the transformation -[`Rotate180`](@ref) is **not** label-preserving and should not be -used for augmentation. - -[^MNIST1998]: LeCun, Yan, Corinna Cortes, Christopher J.C. Burges. ["The MNIST database of handwritten digits"](http://yann.lecun.com/exdb/mnist/) Website. 1998. - -### Example: ISIC Skin Lesions - -On the other hand, the exact same transformation could very well -be label-preserving for other types of images. Let us take a look -at a different set of image data; this time from the medical -domain. - -The International Skin Imaging Collaboration [^ISIC] hosts a -large collection of publicly available and labeled skin lesion -images. A subset of that data was used in 2016's ISBI challenge -[^ISBI2016] where a subtask was lesion classification. - -Let's consider the following input image on the left side. It -shows a photo of a skin lesion that was taken from above. By -applying the [`Rotate180`](@ref) operation to the input image, we -end up with a transformed version shown on the right side. - -```@example -using Augmentor, ISICArchive -input_img = get(ImageThumbnailRequest(id = "5592ac599fc3c13155a57a85")) -output_img = augment(input_img, Rotate180()) -using FileIO; # hide -save(joinpath("assets","bg_isic_in.png"), input_img); # hide -save(joinpath("assets","bg_isic_out.png"), output_img); # hide -nothing # hide -``` - -Input (`input_img`) | Output (`output_img`) ---------------------------------|----------------------------------- -![input](assets/bg_isic_in.png) | ![output](assets/bg_isic_out.png) - -After looking at both images, one could argue that the -orientation of the camera is somewhat arbitrary as long as it -points to the lesion at an approximately orthogonal angle. Thus, -for the ISIC dataset, the transformation [`Rotate180`](@ref) -could be considered as label-preserving and very well be tried -for augmentation. Of course this does not guarantee that it will -improve training time or model accuracy, but the point is that it -is unlikely to hurt. 
-
-[^ISIC]: https://isic-archive.com/
-
-[^ISBI2016]: Gutman, David; Codella, Noel C. F.; Celebi, Emre; Helba, Brian; Marchetti, Michael; Mishra, Nabin; Halpern, Allan. "Skin Lesion Analysis toward Melanoma Detection: A Challenge at the International Symposium on Biomedical Imaging (ISBI) 2016, hosted by the International Skin Imaging Collaboration (ISIC)". eprint [arXiv:1605.01397](https://arxiv.org/abs/1605.01397). 2016.
diff --git a/docs/src/examples/flux.md b/docs/src/examples/flux.md
new file mode 100644
index 00000000..6cc99140
--- /dev/null
+++ b/docs/src/examples/flux.md
@@ -0,0 +1,191 @@
+# Integration with Flux.jl
+
+This example shows a way to use Augmentor to provide images for training
+[Flux.jl](https://github.com/FluxML/Flux.jl/) models. We will be using the
+[MNIST database of handwritten digits](http://yann.lecun.com/exdb/mnist/) as
+our input data.
+
+To skip all the talking and see the code, jump ahead to [Complete example](@ref
+flux_mnist_complete_example).
+
+## Ordinary training
+
+Let's first show how training looks without any augmentation.
+
+We are using the [MLDatasets.jl](https://github.com/JuliaML/MLDatasets.jl)
+package to conveniently access the MNIST dataset. To reduce the training time,
+we are working only with a subset of the data.
+
+After collecting the data, we divide them into batches using `batchview` from
+[MLDataUtils.jl](https://github.com/JuliaML/MLDataUtils.jl). We then create a
+model, pick a loss function and an optimizer, and start the training.
+
+```@example flux
+using Flux, MLDatasets, MLDataUtils
+
+n_instances = 32
+batch_size = 32
+n_epochs = 16
+
+# Flux requires a 4D numerical array in WHCN (width, height, channel, batch)
+# format, so we need to insert a dummy dimension to indicate `C=1` (gray image).
+X = Flux.unsqueeze(MNIST.traintensor(Float32, 1:n_instances), 3)
+y = Flux.onehotbatch(MNIST.trainlabels(1:n_instances), 0:9)
+
+# size(X) == (28, 28, 1, 32)
+# size(y) == (10, 32)
+@assert size(X) == (28, 28, 1, 32) # hide
+@assert size(y) == (10, 32) # hide
+
+# `data = batches[1]` means the first batch input:
+# - `data[1]` is a batch extracted from `X`
+# - `data[2]` is a batch extracted from `y`
+# We also apply `shuffleobs` to get a random batch view.
+batches = batchview(shuffleobs((X, y)), maxsize=batch_size)
+
+predict = Chain(Conv((3, 3), 1=>16, pad=(1, 1), relu),
+                MaxPool((2, 2)),
+                Conv((3, 3), 16=>32, pad=(1, 1), relu),
+                MaxPool((2, 2)),
+                Conv((3, 3), 32=>32, pad=(1, 1), relu),
+                MaxPool((2, 2)),
+                flatten,
+                Dense(288, 10))
+
+loss(X, y) = Flux.Losses.logitcrossentropy(predict(X), y)
+
+opt = Flux.Optimise.ADAM(0.001)
+
+for epoch in 1:n_epochs
+    Flux.train!(loss, params(predict), batches, opt)
+end
+
+nothing # hide
+```
+
+## Adding augmentation
+
+Augmentor aims to provide generic image augmentation support for any machine
+learning framework, not just deep learning. With the exception of grayscale
+images, Augmentor assumes every image is an array of `Colorant`. Without loss
+of generality, we use `Gray` images here so that the same pipeline also
+applies to `RGB` images.
+
+!!! warning "Use colorant arrays whenever you can"
+    If you pass a 3d numerical array, e.g., of size `(28, 28, 3)`, and
+    interpret it as an RGB array, you'll almost certainly get an incorrect
+    result from Augmentor. This is because Augmentor and the entire
+    JuliaImages ecosystem use `Array{RGB{Float32}, 2}` to represent an
+    `RGB` array.
+    Unless explicitly noted, `Array{Float32, 3}` will be interpreted as a 3D
+    grayscale image rather than as a color image. Just think of color
+    specifications like `Lab` or `HSV` and the ambiguity becomes clear.

+```@example flux
+using ImageCore
+
+X = Gray.(MNIST.traintensor(Float32, 1:n_instances))
+y = Flux.onehotbatch(MNIST.trainlabels(1:n_instances), 0:9)
+
+nothing # hide
+```
+
+Augmentation is given by an augmentation pipeline. Our pipeline is a
+composition of three operations:
+
+ 1. [`ElasticDistortion`](@ref) is the only image operation in this pipeline.
+ 2. [`SplitChannels`](@ref) splits the colorant array into a plain numerical
+    array so that deep learning frameworks are happy with the layout.
+ 3. [`PermuteDims`](@ref) permutes the dimensions of each image to match WHC.
+
+The operations are composed with the `|>` operator.
+
+```@example flux
+using Augmentor
+
+pl = ElasticDistortion(6, 6,
+                       sigma=4,
+                       scale=0.3,
+                       iter=3,
+                       border=true) |>
+     SplitChannels() |>
+     PermuteDims((2, 3, 1))
+```
+
+Next, we define two helper functions.
+
+```@example flux
+# Creates an output array for augmented images
+outbatch(X) = Array{Float32}(undef, (28, 28, 1, nobs(X)))
+# Takes a batch (images and targets) and augments the images
+augmentbatch((X, y)) = (augmentbatch!(outbatch(X), X, pl), y)
+
+nothing # hide
+```
+
+In many deep learning tasks, the augmentation is applied lazily during the
+data iteration. For this purpose, we wrap the batches with a [mapped
+array](https://github.com/JuliaArrays/MappedArrays.jl/) in order to augment
+each batch right before feeding it to the network.
+
+```@example flux
+using MappedArrays
+
+batches = batchview((X, y), maxsize=batch_size)
+batches = mappedarray(augmentbatch, batches)
+# eager alternative: augmentation happens when this line gets executed
+# batches = augmentbatch.(batches)
+
+# The output is already in the expected WHCN format
+# size(batches[1][1]) == (28, 28, 1, 32)
+# size(batches[1][2]) == (10, 32)
+@assert size(batches[1][1]) == (28, 28, 1, 32) # hide
+@assert size(batches[1][2]) == (10, 32) # hide
+
+nothing # hide
+```
+
+Iterating over `batches` will now produce augmented images. No other changes
+are required.
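+
+As a quick sanity check, one can materialize a single batch and inspect it (a
+minimal sketch; `first` triggers the lazy augmentation for the initial batch
+only, and the expected sizes assume the settings chosen above):
+
+```julia
+Xb, yb = first(batches)  # augmentation runs here, for this batch only
+size(Xb)                 # (28, 28, 1, 32), i.e., WHCN with batch_size = 32
+size(yb)                 # (10, 32), one-hot encoded targets
+```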
+ +## [Complete example](@id flux_mnist_complete_example) + +```@example +using Augmentor, Flux, ImageCore, MappedArrays, MLDatasets, MLDataUtils + +n_instances = 32 +batch_size = 32 +n_epochs = 16 + +X = Gray.(MNIST.traintensor(Float32, 1:n_instances)) +y = Flux.onehotbatch(MNIST.trainlabels(1:n_instances), 0:9) + +pl = ElasticDistortion(6, 6, + sigma=4, + scale=0.3, + iter=3, + border=true) |> + SplitChannels() |> + PermuteDims((2, 3, 1)) + +outbatch(X) = Array{Float32}(undef, (28, 28, 1, nobs(X))) +augmentbatch((X, y)) = (augmentbatch!(outbatch(X), X, pl), y) + +batches = mappedarray(augmentbatch, batchview((X, y), maxsize=batch_size)) + +predict = Chain(Conv((3, 3), 1=>16, pad=(1, 1), relu), + MaxPool((2,2)), + Conv((3, 3), 16=>32, pad=(1, 1), relu), + MaxPool((2,2)), + Conv((3, 3), 32=>32, pad=(1, 1), relu), + MaxPool((2, 2)), + flatten, + Dense(288, 10)) + +loss(X, y) = Flux.Losses.logitcrossentropy(predict(X), y) + +opt = Flux.Optimise.ADAM(0.001) + +for epoch in 1:n_epochs + Flux.train!(loss, params(predict), batches, opt) +end +``` diff --git a/docs/src/images.md b/docs/src/images.md deleted file mode 100644 index 1dbdf69f..00000000 --- a/docs/src/images.md +++ /dev/null @@ -1,365 +0,0 @@ -# Working with Images in Julia - -The [Julia language](https://julialang.org/) provides a rich -syntax as well as large set of highly-optimized functionality for -working with (multi-dimensional) arrays of what is known as "bit -types" or compositions of such. Because of this, the language -lends itself particularly well to the fairly simple idea of -treating images as just plain arrays. Even though this may sound -as a rather tedious low-level approach, Julia makes it possible -to still allow for powerful abstraction layers without the loss -of generality that usually comes with that. This is accomplished -with help of Julia's flexible type system and multiple dispatch -(both of which are beyond the scope of this tutorial). - -While the images-are-arrays-approach makes working with images in -Julia very performant, it has also been source of confusion to -new community members. This beginner's guide is an attempt to -provide a step-by-step overview of how pixel data is handled in -Julia. To get a more detailed explanation on some particular -concept involved, please take a look at the documentation of the -[JuliaImages](https://juliaimages.org/) ecosystem. - -## Multi-dimensional Arrays - -To wrap our heads around Julia's array-based treatment of images, -we first need to understand what Julia arrays are and how we can -work with them. - -!!! note - - This section is only intended provide a simplified and thus - partial overview of Julia's arrays capabilities in order to - gain some intuition about pixel data. For a more detailed - treatment of the topic please have a look at the [official - documentation](https://docs.julialang.org/en/latest/manual/arrays/) - -Whenever we work with an `Array` in which the elements are -bit-types (e.g. `Int64`, `Float32`, `UInt8`, etc), we can think -of the array as a continuous block of memory. This is useful for -many different reasons, such as cache locality and interacting -with external libraries. - -The same block of memory can be interpreted in a number of ways. -Consider the following example in which we allocate a vector -(i.e. a one dimensional array) of `UInt8` (i.e. bytes) with some -ordered example values ranging from 1 to 6. We will think of this -as our physical memory block, since it is a pretty close -representation. 
- -```jldoctest 1 -julia> memory = [0x1, 0x2, 0x3, 0x4, 0x5, 0x6] -6-element Vector{UInt8}: - 0x01 - 0x02 - 0x03 - 0x04 - 0x05 - 0x06 -``` - -The same block of memory could also be interpreted differently. -For example we could think of this as a matrix with 3 rows and 2 -columns instead (or even the other way around). The function -`reshape` allows us to do just that - -```jldoctest 1 -julia> A = reshape(memory, (3, 2)) -3×2 Matrix{UInt8}: - 0x01 0x04 - 0x02 0x05 - 0x03 0x06 -``` - -Note how we specified the number of rows first. This is because -the Julia language follows the [column-major -convention](https://docs.julialang.org/en/latest/manual/performance-tips/#Access-arrays-in-memory-order,-along-columns-1) -for multi dimensional arrays. What this means can be observed -when we compare our new matrix `A` with the initial vector -`memory` and look at the element layout. Both variables are using -the same underlying memory (i.e the value `0x01` is physically -stored right next to the value `0x02` in our example, while -`0x01` and `0x04` are quite far apart even though the matrix -interpretation makes it look like they are neighbors; which they -are not). - -!!! tip - - A quick and dirty way to check if two variables are - representing the same block of memory is by comparing the - output of `pointer(myvariable)`. Note, however, that - technically this only tells you where a variable starts in - memory and thus has its limitations. - -This idea can also be generalized for higher dimensions. For -example we can think of this as a 3D array as well. - -```jldoctest 1 -julia> reshape(memory, (3, 1, 2)) -3×1×2 Array{UInt8, 3}: -[:, :, 1] = - 0x01 - 0x02 - 0x03 - -[:, :, 2] = - 0x04 - 0x05 - 0x06 -``` - -If you take a closer look at the dimension sizes, you can see -that all we did in that example was add a new dimension of size -`1`, while not changing the other numbers. In fact we can add -any number of practically empty dimensions, otherwise known as -*singleton dimensions*. - -```jldoctest 1 -julia> reshape(memory, (3,1,1,1,2)) -3×1×1×1×2 Array{UInt8, 5}: -[:, :, 1, 1, 1] = - 0x01 - 0x02 - 0x03 - -[:, :, 1, 1, 2] = - 0x04 - 0x05 - 0x06 -``` - -This is a useful property to have when we are confronted with -greyscale datasets that do not have a color channel, yet we still -want to work with a library that expects the images to have one. - -## Vertical-Major vs Horizontal-Major - -There are a number of different conventions for how to store -image data into a binary format. The first question one has to -address is the order in which the image dimensions are -transcribed. - -We have seen before that Julia follows the column-major -convention for its arrays, which for images would lead to the -corresponding convention of being vertical-major. In the image -domain, however, it is fairly common to store the pixels in a -horizontal-major layout. In other words, horizontal-major means -that images are stored in memory (or file) one pixel row after -the other. - -In most cases, when working within the JuliaImages ecosystem, the -images should already be in the Julia-native column major layout. -If for some reason that is not the case there are two possible -ways to convert the image to that format. - -```jldoctest 1 -julia> At = collect(reshape(memory, (3,2))') # "row-major" layout -2×3 Matrix{UInt8}: - 0x01 0x02 0x03 - 0x04 0x05 0x06 -``` - -1. The first way to alter the pixel order is by using the - function `Base.permutedims`. 
In contrast to what we have seen - before, this function will allocate a new array and copy the - values in the appropriate manner. - - ```jldoctest 1 - julia> B = permutedims(At, (2,1)) - 3×2 Matrix{UInt8}: - 0x01 0x04 - 0x02 0x05 - 0x03 0x06 - ``` - -2. The second way is using `Base.PermutedDimsArray` which results in a lazy view that - does not allocate a new array but instead only computes the - correct values when queried. - - ```jldoctest 1 - julia> C = PermutedDimsArray(At, (2,1)) - 3×2 PermutedDimsArray(::Matrix{UInt8}, (2, 1)) with eltype UInt8: - 0x01 0x04 - 0x02 0x05 - 0x03 0x06 - ``` - -Either way, it is in general a good idea to make sure that the -array one is working with ends up in a column-major layout. - -## Reinterpreting Elements - -Up to this point, all we talked about was how to reinterpreting -or permuting the dimensional layout of some continuous memory -block. If you look at the examples above you will see that all -the arrays have elements of type `UInt8`, which just means that -each element is represented by a single byte in memory. - -Knowing all this, we can now take the idea a step further and -think about reinterpreting the element types of the array. Let us -consider our original vector `memory` again. - -```jldoctest 1 -julia> memory = [0x1, 0x2, 0x3, 0x4, 0x5, 0x6] -6-element Vector{UInt8}: - 0x01 - 0x02 - 0x03 - 0x04 - 0x05 - 0x06 -``` - -Note how each byte is thought of as an individual element. One -thing we could do instead, is think of this memory block as a -vector of 3 `UInt16` elements. - -```jldoctest 1 -julia> reinterpret(UInt16, memory) -3-element reinterpret(UInt16, ::Vector{UInt8}): - 0x0201 - 0x0403 - 0x0605 -``` - -Pay attention to where our original bytes ended up. In contrast -to just rearranging elements as we did before, we ended up with -significantly different element values. One may ask why it would -ever be practical to reinterpret a memory block like this. The -one word answer to this is **Colors**! As we will see in the -remainder of this tutorial, it turns out to be a very useful -thing to do when your arrays represent pixel data. - - -## Introduction to Color Models - -As we discussed before, there are a various number of conventions -on how to store pixel data into a binary format. That is not only -true for dimension priority, but also for color information. - -One way color information can differ is in the [color -model](https://en.wikipedia.org/wiki/Color_model) in which they -are described in. Two famous examples for color models are *RGB* -and *HSV*. They essentially define how colors are conceptually -made up in terms of some components. Additionally, one can decide -on how many bits to use to describe each color component. By -doing so one defines the available [color -depth](https://en.wikipedia.org/wiki/Color_depth). - -Before we look into using the actual implementation of Julia's -color models, let us prototype our own imperfect toy model in -order to get a better understanding of what is happening under -the hood. - -```@example 1 -# define our toy color model -struct MyRGB - r::UInt8 - b::UInt8 - g::UInt8 -end -``` - -Note how we defined our new toy color model as `struct`. Because -of this and the fact that all its components are bit types (in -this case `UInt8`), any instantiation of our new type will be -represented as a continuous block of memory as well. - -We can now apply our color model to our `memory` vector from -above, and interpret the underlying memory as a vector of to -`MyRGB` values instead. 
- -```julia-repl -julia> reinterpret(MyRGB, memory) -2-element Vector{MyRGB}: - MyRGB(0x01,0x02,0x03) - MyRGB(0x04,0x05,0x06) -``` - -Similar to the `UInt16` example, we now group neighboring bytes -into larger units (namely `MyRGB`). In contrast to the `UInt16` -example we are still able to access the individual components -underneath. This simple toy color model already allows us to do a -lot of useful things. We could define functions that work on -`MyRGB` values in a color-space appropriate fashion. We could -also define other color models and implement function to convert -between them. - -However, our little toy color model is not yet optimal. For -example it hard-codes a predefined color depth of 24 bit. We may -have use-cases where we need a richer color space. One thing we -could do to achieve that would be to introduce a new type in -similar fashion. Still, because they have a different range of -available numbers per channel (because they have a different -amount of bits per channel), we would have to write a lot of -specialized code to be able to appropriately handle all color -models and depth. - -Luckily, the creators of `ColorTypes.jl` went a with a more -generic strategy: Using parameterized types and **fixed point -numbers**. - -!!! tip - - If you are interested in how various color models are - actually designed and/or implemented in Julia, you can take a - look at the - [ColorTypes.jl](https://github.com/JuliaGraphics/ColorTypes.jl) - package. - -## Fixed Point Numbers - -The idea behind using fixed point numbers for each color -component is fairly simple. No matter how many bits a component -is made up of, we always want the largest possible value of the -component to be equal to `1.0` and the smallest possible value to -be equal to `0`. Of course, the amount of possible intermediate -numbers still depends on the number of underlying bits in the -memory, but that is not much of an issue. - -```jldoctest 1 -julia> using ImageCore; # ImageCore reexports FixedPointNumbers and Colors - -julia> reinterpret(N0f8, 0xFF) -1.0N0f8 - -julia> reinterpret(N0f16, 0xFFFF) -1.0N0f16 -``` - -Not only does this allow for simple conversion between different -color depths, it also allows us to implement generic algorithms, -that are completely agnostic to the utilized color depth. - -It is worth pointing out again, that we get all these goodies -without actually changing or copying the original memory block. -Remember how during this whole tutorial we have only changed the -interpretation of some underlying memory, and have not had the -need to copy any data so far. - -!!! tip - - For pixel data we are mainly interested in **unsigned** fixed - point numbers, but there are others too. Check out the - package - [FixedPointNumbers.jl](https://github.com/JuliaMath/FixedPointNumbers.jl) - for more information on fixed point numbers in general. - -Let us now leave our toy model behind and use the actual -implementation of `RGB` on our example vector `memory`. 
With the
-first command we will interpret our data as two pixels with 8 bit
-per color channel, and with the second command as a single pixel
-of 16 bit per color channel
-
-```jldoctest 1
-julia> reinterpret(RGB{N0f8}, memory)
-2-element reinterpret(RGB{N0f8}, ::Vector{UInt8}):
- RGB{N0f8}(0.004,0.008,0.012)
- RGB{N0f8}(0.016,0.02,0.024)
-
-julia> reinterpret(RGB{N0f16}, memory)
-1-element reinterpret(RGB{N0f16}, ::Vector{UInt8}):
- RGB{N0f16}(0.00783,0.01567,0.02351)
-```
-
-Note how the values are now interpreted as floating point numbers.
diff --git a/docs/src/index.md b/docs/src/index.md
index eb7dd356..3b04870b 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -3,7 +3,7 @@
 A **fast** library for increasing the number of training
 images by applying various transformations.

-# Augmentor.jl's documentation
+# Introduction

 Augmentor is a real-time image augmentation library designed to
 render the process of artificial dataset enlargement more
@@ -71,112 +71,90 @@ input in one single pass.

 For the Python version of Augmentor, you can find it
 [here](https://github.com/mdbloice/Augmentor)

-## Where to begin?
+## What is Image Augmentation?
+
+The term *data augmentation* is commonly used to describe the
+process of repeatedly applying various transformations to some
+dataset, with the hope that the output (i.e., the newly generated
+observations) biases the model towards learning better features.
+Depending on the structure and semantics of the data, coming up
+with such transformations can be a challenge by itself.
+
+Images are a special class of data that exhibit some interesting
+properties with respect to their structure. For example, the
+dimensions of an image (i.e., the pixels) exhibit a spatial
+relationship to each other. As such, a lot of commonly used
+augmentation strategies for image data revolve around affine
+transformations, such as translations or rotations. Because
+images are such a popular and special case of data, they deserve
+their own sub-category of data augmentation, which we will
+unsurprisingly refer to as **image augmentation**.
+
+The general idea is the following: if we want our model to
+generalize well, then we should design the learning process in
+such a way as to bias the model into learning such
+transformation-[equivariant](https://en.wikipedia.org/wiki/Equivariant_map)
+properties. One way to do this is via the design of the model
+itself, which, for example, was the idea behind convolutional
+neural networks. An orthogonal approach to bias the model to
+learn about this equivariance - and the focus of this package -
+is to use label-preserving transformations.
+
+## [Label-preserving Transformations](@id labelpreserving)
+
+Before attempting to train a model using some augmentation
+pipeline, it's a good idea to invest some time in deciding on an
+appropriate set of transformations to choose from. Some of these
+transformations also have parameters to tune, and we should also
+make sure that we settle on a decent set of values for those.
+
+What constitutes "decent" depends on the dataset. In general,
+we want the augmented images to be fairly dissimilar to the
+originals. However, we need to be careful that the augmented
+images still visually represent the same concept (and thus
+label). If a pipeline only produces output images that have this
+property, we call this pipeline **label-preserving**.
+
+Consider the following example from the [MNIST database of
+handwritten digits](http://yann.lecun.com/exdb/mnist/).
+Our input image clearly represents its associated label "6". If
+we were to use the transformation [`Rotate180`](@ref) in our
+augmentation pipeline for this type of image, we could end up
+with the situation depicted by the image on the right side.

-If this is the first time you consider using Augmentor.jl for
-your machine learning related experiments or packages, make sure
-to check out the "Getting Started" section. There we list the
-installation instructions and some simple hello world examples.
-
-```@contents
-Pages = ["gettingstarted.md"]
-Depth = 2
-```
-
-## Introduction and Motivation
-
-If you are new to image augmentation in general, or are simply
-interested in some background information, feel free to take a
-look at the following sections. There we discuss the concepts
-involved and outline the most important terms and definitions.
-
-```@contents
-Pages = ["background.md"]
-Depth = 2
-```
-
-In case you have not worked with image data in Julia before, feel
-free to browse the following documents for a crash course on how
-image data is represented in the Julia language, as well as how
-to visualize it. For more information on image processing in
-Julia, take a look at the documentation for the vast
-[`JuliaImages`](https://juliaimages.github.io/stable/) ecosystem.
-
-```@contents
-Pages = ["images.md"]
-Depth = 2
-```
-
-## User's Guide
-
-As the name suggests, Augmentor was designed with image
-augmentation for machine learning in mind. That said, the way the
-library is implemented allows it to also be used for efficient
-image processing outside the machine learning domain.
-
-The following section describes the high-level user interface in
-detail. In particular it focuses on how a (stochastic)
-image-processing pipeline can be defined and then be applied to
-an image (or a set of images). It also discusses how batch
-processing of multiple images can be performed in parallel using
-multi-threading.
-
-```@contents
-Pages = ["interface.md"]
-Depth = 2
-```
-
-We mentioned before that an augmentation pipeline is just a
-sequence of image operations. Augmentor ships with a number of
-predefined operations, which should be sufficient to describe the
-most commonly utilized augmentation strategies. Each operation is
-represented as its own unique type. The following section
-provides a complete list of all the exported operations and their
-documentation.
-
-```@contents
-Pages = ["operations.md"]
-Depth = 2
+```@eval
+using Augmentor, MLDatasets
+input_img = MNIST.convert2image(MNIST.traintensor(19))
+output_img = augment(input_img, Rotate180())
+using Images, FileIO; # hide
+upsize(A) = repeat(A, inner=(4,4)); # hide
+save(joinpath("assets","bg_mnist_in.png"), upsize(input_img)); # hide
+save(joinpath("assets","bg_mnist_out.png"), upsize(output_img)); # hide
+nothing # hide
 ```

-## Tutorials
-
-Just like an image can say more than a thousand words, a simple
-hands-on tutorial showing actual code can say more than many
-pages of formal documentation.
+Input (`input_img`) | Output (`output_img`)
+---------------------------------|------------------------------------
+![input](assets/bg_mnist_in.png) | ![output](assets/bg_mnist_out.png)

-The first step of devising a successful augmentation strategy is
-to identify an appropriate set of operations and parameters. What
-that means can vary widely, because the utility of each operation
-depends on the dataset at hand (see [label-preserving
-transformations](@ref labelpreserving) for an example). To that
-end, we will spend the first tutorial discussing a simple but
-useful approach to interactively explore and visualize the space
-of possible parameters.
+To a human, this newly transformed image clearly represents the
+label "9", and not "6" like the original image did. In image
+augmentation, however, the assumption is that the output of the
+pipeline has the same label as the input. That means that in this
+example we would tell our model that the correct answer for the
+image on the right side is "6", which is clearly undesirable for
+obvious reasons.

-```@contents
-Pages = [joinpath("generated", "mnist_elastic.md")]
-Depth = 2
-```
+Thus, for the MNIST dataset, the transformation
+[`Rotate180`](@ref) is **not** label-preserving and should not be
+used for augmentation.

-In the next tutorials we will take a close look at how we can
-actually use Augmentor in combination with popular deep learning
-frameworks. The first framework we will discuss will be
-[Knet](https://github.com/denizyuret/Knet.jl). In particular we
-will focus on adapting an already existing example to make use of
-a (quite complicated) augmentation pipeline. Furthermore, this
-tutorial will also serve to showcase the various ways that
-augmentation can influence the performance of your network.
-
-```@contents
-Pages = [joinpath("generated", "mnist_knet.md")]
-Depth = 2
-```
+## Working with images in Julia

-```@eval
-# Pages = [joinpath("generated", fname) for fname in readdir("generated") if splitext(fname)[2] == ".md"]
-# Depth = 2
-```
+Augmentor exists alongside other packages in the
+[JuliaImages](https://juliaimages.org/) ecosystem. To learn how images are
+treated in Julia, how pixels are represented, and more, read [the
+documentation](https://juliaimages.org/stable/tutorials/quickstart/).

 ## Citing Augmentor

@@ -187,9 +165,3 @@ Marcus D. Bloice, Christof Stocker, and Andreas Holzinger,
 *Augmentor: An Image Augmentation Library for Machine Learning*,
 arXiv preprint **arXiv:1708.04680**,
 <https://arxiv.org/abs/1708.04680>, 2017.
-
-## Indices
-
-```@contents
-Pages = ["indices.md"]
-```
diff --git a/docs/src/indices.md b/docs/src/indices.md
deleted file mode 100644
index 24f81d71..00000000
--- a/docs/src/indices.md
+++ /dev/null
@@ -1,11 +0,0 @@
-## Functions
-
-```@index
-Order = [:function]
-```
-
-## Types
-
-```@index
-Order = [:type]
-```
diff --git a/docs/src/interface.md b/docs/src/interface.md
index 15903ea8..c6945515 100644
--- a/docs/src/interface.md
+++ b/docs/src/interface.md
@@ -32,8 +32,6 @@ detail.
 Depending on the complexity of your problem, you may want
 to iterate between step `2.` and `3.` to identify an appropriate
 pipeline.
-Take a look at the [Elastic Distortions Tutorial](@ref mnist_elastic)
-for an example of how such an iterative process could look like.

 ## [Defining a Pipeline](@id pipeline)

@@ -197,6 +195,7 @@ function `augment`.

 ```@docs
 augment
+Augmentor.Mask
 ```

 We also provide a mutating version of `augment` that writes the
diff --git a/docs/src/LICENSE.md b/docs/src/license.md
similarity index 91%
rename from docs/src/LICENSE.md
rename to docs/src/license.md
index cc6d1c22..7120e279 100644
--- a/docs/src/LICENSE.md
+++ b/docs/src/license.md
@@ -1,4 +1,4 @@
-# LICENSE
+# License

 ```@eval
 using Markdown, Augmentor