DH2017 – Computer Vision in DH workshop (Papers – First Block)

Seven papers were selected by a review committee, and each author had 15 minutes to present during the workshop. The papers were divided into three thematic blocks:

First block: Research results using computer vision
Chair: Mark Williams (Dartmouth College)

1) Extracting Meaningful Data from Decomposing Bodies (Alison Langmead, Paul Rodriguez, Sandeep Puthanveetil Satheesan, and Alan Craig)

Abstract
Slides
Full Paper

This presentation is about Decomposing Bodies, a large-scale, lab-based digital humanities project housed in the Visual Media Workshop at the University of Pittsburgh, which examines the system of criminal identification introduced in France in the late 19th century by Alphonse Bertillon.

Each Bertillon card used a pre-established set of eleven anthropometrical measurements (such as height, length of the left foot, and width of the skull) as an index for other identifying information about each individual (such as the crime committed, their nationality, and a pair of photographs).

  • Data: Bertillon cards of American prisoners from Ohio.
  • Tool: OpenFace, a free and open-source face recognition library using deep neural networks.
  • Goal: an end-to-end system for extracting handwritten text and numbers from scanned Bertillon cards in a semi-automated fashion, plus the ability to browse the original data and generated metadata through a web interface.
  • Character recognition: handwritten digits are matched against the MNIST database (see the sketch after this list).
  • “Mechanical Turk: we need to talk about it”: consider Mechanical Turk only if the data is in the public domain and the task is easy.
  • Findings: humans deal very well with understanding discrepancies. We should not ask the computer to find these discrepancies for us; instead, we should build visualizations that allow us to visually compare images and identify the similarities and discrepancies.
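
As a rough illustration of the character-recognition step, here is a minimal sketch of an MNIST-trained digit classifier of the kind that could be applied to digit crops from the scanned cards. This is not the project's pipeline; scikit-learn and the preprocessing shown are assumptions:

```python
# Minimal sketch (not the Decomposing Bodies pipeline): train a digit
# classifier on MNIST, then apply it to 28x28 digit crops from card scans.
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# MNIST: 70,000 grayscale 28x28 handwritten digits, flattened to 784 values.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale pixel values to [0, 1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10_000, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

# A binarized 28x28 digit cropped from a card scan (hypothetical array
# `digit_crop`) would then be classified with:
# clf.predict(digit_crop.reshape(1, -1))
```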

2) Distant Viewing TV (Taylor Arnold and Lauren Tilton, University of Richmond)

Abstract
Slides

Distant Viewing TV applies computational methods to the study of television series, utilizing and developing cutting-edge techniques in computer vision to analyze moving image culture on a large scale.

Screenshots of the analysis of Bewitched
  • Code on GitHub
  • Both presenters are authors of Humanities Data in R
  • The project builds on libraries that extract low-level features (dlib, cvv and OpenCV), plus many papers that attempt to identify mid-level features. Still:
    • code is often nonexistent;
    • a prototype is not a library;
    • methods are not generalizable;
    • there is no interoperability.
  • Abstract features, such as genre and emotion, are new territory
Feature taxonomy
  • Pilot study: Bewitched (TV series)
  • Goal: measure character presence and position in the scene
  • Algorithm for shot detection (see the sketch after this list)
  • Algorithm for face detection
  • Video example
  • Next steps:
    • Audio features
    • Build a formal testing set
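
Since the slides only name the two algorithms, here is a minimal sketch of one standard way to combine them: shot boundaries from histogram differences between consecutive frames, then a face count at each cut. OpenCV, the Haar cascade, the threshold and the file name are all assumptions, not the project's code:

```python
# Hedged sketch of shot detection + face counting with OpenCV; one standard
# approach, not the Distant Viewing TV implementation.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def shots_and_faces(path, cut_threshold=0.5):
    """Yield (frame_index, n_faces) at each detected hard cut."""
    cap = cv2.VideoCapture(path)
    prev_hist, idx = None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        # A drop in correlation between consecutive frame histograms
        # suggests a hard cut (shot boundary).
        if prev_hist is not None and \
           cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < cut_threshold:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = face_cascade.detectMultiScale(gray, 1.1, 5)
            yield idx, len(faces)
        prev_hist, idx = hist, idx + 1
    cap.release()

for frame_idx, n_faces in shots_and_faces("episode.mp4"):  # hypothetical file
    print(f"cut at frame {frame_idx}: {n_faces} face(s) on screen")
```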

3) Match, compare, classify, annotate: computer vision tools for the modern humanist (Giles Bergel)

Abstract
Slides
The Printing Machine (Giles Bergel research blog)

This presentation related the University of Oxford’s Visual Geometry Group’s experience in making images computationally addressable for humanities research.

The Visual Geometry Group has built a number of systems for humanists, variously implementing (i) visual search, in which an image is made retrievable; (ii) comparison, which assists the discovery of similarity and difference; (iii) classification, which applies a descriptive vocabulary to images; and (iv) annotation, in which images are further described for both computational and offline analysis.

a) Main project: Seebibyte

  • Idea: Seebibyte (“Visual Search for the Era of Big Data”) is a large research project based in the Department of Engineering Science, University of Oxford. It is funded by the EPSRC (Engineering and Physical Sciences Research Council) and runs from 2015 to 2020.
  • Objectives: to carry out fundamental research to develop next-generation computer vision methods able to analyse, describe and search image and video content with human-like capabilities, and to transfer these methods to industry and to other academic disciplines (such as Archaeology, Art, Geology, Medicine, Plant Sciences and Zoology).
  • Demo: BBC News Search (Visual Search of BBC News)

Tool: VGG Image Classification Engine (VIC)

This is a technical demo of the large-scale on-the-fly web search technologies which are under development in the Oxford University Visual Geometry Group, using data provided by BBC R&D comprising over five years of prime-time news broadcasts from six channels. The demo consists of three different components, which can be used to query the dataset on-the-fly for three different query types: object search, image search and people search.

An item of interest can be specified at run time by a text query, and a discriminative classifier for that item is then learnt on-the-fly using images downloaded from Google Image search.
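
To make the on-the-fly step concrete, here is a minimal sketch under two assumptions: image features (e.g. CNN descriptors) are precomputed for the whole collection, and a fixed pool of negative examples is available. Names and shapes are illustrative, not VGG's API:

```python
# Sketch of on-the-fly search: train a discriminative classifier at query
# time from web-searched positives, then rank the target collection.
import numpy as np
from sklearn.linear_model import LogisticRegression

def on_the_fly_rank(query_feats, negative_pool, dataset_feats):
    """query_feats   : (n_pos, d) features of images from a web image search
       negative_pool : (n_neg, d) features of a fixed random negative set
       dataset_feats : (N, d) features of the collection (e.g. news frames)
       Returns indices of the collection, best match first."""
    X = np.vstack([query_feats, negative_pool])
    y = np.concatenate([np.ones(len(query_feats)),
                        np.zeros(len(negative_pool))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    scores = clf.decision_function(dataset_feats)
    return np.argsort(-scores)
```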

Approach: Image classification through machine learning.
Tool: VGG Image Classification Engine (VIC)

The objective of this research is to find objects in paintings by learning classifiers from photographs on the internet. A live demo allows a user to search a dataset of over 200,000 paintings for an object of their choosing (such as “baby”, “bird”, or “dog”) in a matter of seconds.

The system allows computers to recognize objects in images; what is distinctive about this work is that it also recovers the 2D outline of the object. Currently, the project has trained the model to recognize 20 classes. The demo allows users to test the algorithm on their own images.
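
Twenty classes suggests the standard Pascal VOC category set. As a hedged illustration of the idea (object labels plus their 2D outlines), not VGG's model, here is a sketch using per-pixel semantic segmentation with a pretrained torchvision network, whose 21 outputs are background plus the 20 VOC classes; the input file name is hypothetical:

```python
# Sketch: pretrained segmentation model whose per-pixel masks give both the
# object labels and their 2D outlines (background + 20 Pascal VOC classes).
import torch
from torchvision import models, transforms
from PIL import Image

VOC = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
       "car", "cat", "chair", "cow", "diningtable", "dog", "horse",
       "motorbike", "person", "pottedplant", "sheep", "sofa", "train",
       "tvmonitor"]

model = models.segmentation.fcn_resnet50(weights="DEFAULT").eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("painting.jpg").convert("RGB")  # hypothetical input
with torch.no_grad():
    out = model(preprocess(img).unsqueeze(0))["out"][0]
mask = out.argmax(0)  # per-pixel class labels; class boundaries = outlines
print("objects found:", {VOC[i] for i in mask.unique().tolist() if i != 0})
```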

b) Other projects

Approach: Image searching
Tool: VGG Image Search Engine (VISE)

Approach: Image annotation
Tool: VGG Image Annotator (VIA)
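
VIA stores annotations as plain JSON, which makes them usable for both computational and offline analysis. A minimal sketch of writing one rectangular region programmatically; the shape_attributes/region_attributes schema is assumed from VIA 2.x exports and should be checked against the tool's documentation:

```python
# Hedged sketch: a VIA-2-style annotation record written as JSON. Field
# names are assumptions based on VIA 2.x exports, not a verified schema.
import json

record = {
    "filename": "card_0001.jpg",  # hypothetical image
    "size": -1,                   # file size in bytes; -1 if unknown
    "regions": [{
        "shape_attributes": {"name": "rect",
                             "x": 120, "y": 80, "width": 200, "height": 260},
        "region_attributes": {"label": "portrait photograph"},
    }],
    "file_attributes": {},
}

with open("via_annotations.json", "w") as f:
    json.dump({record["filename"]: record}, f, indent=2)
```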