Feeding the Trees

Having just attended RootsTech 2017, I feel compelled to compare the state of genealogy with my previous observations and viewpoints, as reported last year in Evolution and Genealogy. What has changed, and in which direction? I will also make some concrete suggestions to the industry that could go a long way to averting the headlong demise of online genealogy.

Figure 1 – Compost frenzy.

This year’s Innovator Showdown semi-finalists presented products with the following functionality: photograph/image tagging and organisation, indexing, DNA triangulation, transcription, stories and memories, celebrity/friend tree matching, and newspaper research. That’s quite a broad range, and by itself it doesn’t reveal much about trends. Some of the products were specialised, while others offered insular functionality, divorced from complementary functionality elsewhere, a point that I also made last year. You would be forgiven for asking: why can’t I have that, together with that, and inside this?

The overall message of RootsTech was still about stories and memories, and I’m totally on board with this, but it is just the tip of a bigger requirement involving narrative. I applaud any change of focus away from raw data on trees to descriptive and audiovisual media that real people can relate to (allegedly allowing us to become heart specialists), but narrative (as favoured by humans but not by software designers) has many critical uses that were not addressed at the conference. More on this in a moment.

On the Wednesday (Feb. 8th), there was a session entitled “Industry Trends and Outlooks” with a panel that included Ben Bennett, Executive Vice President of International Business at Findmypast, and Craig Bott, co-founder, President and CEO of Grow Utah. Their comments were particularly enlightening about current thinking in the commercial sector.

Ben acknowledged that not everyone wants to build a tree (or at least not just a tree), and that companies needed to understand their “customer context”. He was making the point that there is a mass market (apparently 83M people in the US who are interested and willing to pay) that involves a broad range of skills and interests, so how do you engage it? He suggested that products needed differentiation, with functionality aimed at the requirements of their particular customer group. I’m sceptical of this suggestion since it could be interpreted as different skills and depths of work translating into functional differences rather than user-interface (UI) ones; does the fact that some people write or research better than others necessarily mean that they’re the only ones wanting to do it?

Ben also acknowledged that good ideas don’t just come from within companies, and that they [Findmypast] are looking externally and are willing to talk about new innovation. I believe this meant demonstrable products rather than written ideas, but it’s probably as close to outreach as we can expect, so I wholly welcome his comments.

Craig talked about new technology in the areas of OCR and handwriting recognition (functionality that we all want), but also went on to describe neural networks being applied to the identification of named entities and semantic links. What this means is being able to pick out personal names, places, dates, events, etc., from digitised text, and also the relationships between them: biological or social relationships between people, the origin or residence of someone, and the dates of vital and non-vital events. Well, I have to repeat something that I’ve said elsewhere: it’s people that perform genealogical research, not software. Highlighting named entities could be an aid to newspaper research, but the researcher would still be the one analysing the text, and doing so across multiple documents rather than one at a time.

My take on all this is that the large companies feel obligated to throw technology at genealogical (and historical) research, but the more fundamental issues of real research are not being addressed, or even acknowledged.

I make no secret of the fact that I dislike online family trees as they’re currently implemented. They do not capture history, they make it far too easy to connect the wrong dots, and they’re an inappropriate organisational structure (i.e. they should be simply a visualisation of lineage). I’ve justified these points in previous posts, but let me summarise some of their basic failings that really need tackling.

a) They are person-centric when it comes to entering data. For instance, in order to enter all the people in a given census household, it is nearly always necessary to start with each person in the tree and then add each so-called “fact”, and its associated source, to them. This is laborious because you really want to work from the census household rather than from the tree, and you frequently have to re-consult and re-describe the same document. If you want to attach an image of some document, say because you have a paper copy that’s not online at the current host site, then you’ll also be forced to attach it multiple times (hopefully not as independent copies).

b) When a source is added to a “fact”, it is a direct connection with nothing in between: no analytical commentary; no transcription; no justification for why it’s appropriate to the selected person; and no explanation as to why the name, or the date of birth implied by an age, might differ slightly from your conclusions. A consequence of this is that there’s no way to determine how someone reached a given conclusion.

c) There’s no obvious way to add material that relates to multiple people. Photographs and document images are the clearest examples, but the same problem applies to stories/memories, transcriptions, and any researched histories of your ancestors.

d) There’s no obvious concept of ownership in a unified family tree. While this is still controversial in some quarters, most users do want it. As I mentioned last year, certain contributions should be immutable, but which ones? A mere collection of “facts” can have no ownership (and cannot be copyrighted either), but authored works such as research articles and personal memories must have an owner.

e) There will always be multiple possible conclusions in unified trees; anyone disputing that needs to understand the concept of evidence better. Without controls there will be edit wars, and potentially the loss of valuable contributions, but what form should those controls take? Throwing complicated technology at this in order to support multiple versions of the “truth” isn’t necessarily the right solution; we need to take a step back and look at the dynamics of real research. Consider: what we’re doing isn’t always what we think we’re doing.

f) Copying is made too easy in online trees, whether from someone else’s tree or from material found elsewhere. In an ideal world it should not be necessary, but these trees offer no alternatives. Their lack of functionality may even force users to put certain material elsewhere, which in turn leads other users to feel they have to copy it rather than cite or link to it. All this means that errors, or even tentative conclusions from a researcher who hasn’t yet finished, will replicate like a virus. It also means that the provenance of a contribution is lost, and there can be no attribution to the original author, contributor, or owner.
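Failings (a) to (c) are essentially data-model problems. As a minimal sketch, assuming a simple Python representation, a source-centric alternative might record a census household once and link each person to it through an intermediate citation that carries the name as written, the researcher’s justification, and the author of the link. All class, field, and function names below are hypothetical illustrations, not STEMMA’s schema or any host site’s API, and the household itself is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """A document recorded once, e.g. a census household (hypothetical schema)."""
    source_id: str
    title: str
    transcription: str = ""
    image_refs: list[str] = field(default_factory=list)  # stored once, shared by all links


@dataclass
class Citation:
    """The 'something in between' a source and a person: the researcher's reasoning."""
    source: Source
    person_id: str
    recorded_name: str  # the name exactly as written in the source
    commentary: str     # why this entry is believed to be this person
    author: str         # who made the link, so attribution survives


@dataclass
class Person:
    person_id: str
    name: str


def link_household(source: Source, entries: list[tuple[Person, str, str]],
                   author: str) -> list[Citation]:
    """Link every member of one household to a single shared Source record."""
    return [
        Citation(source=source, person_id=p.person_id, recorded_name=recorded,
                 commentary=note, author=author)
        for p, recorded, note in entries
    ]


if __name__ == "__main__":
    census = Source("S1", "Hypothetical census household", image_refs=["scan-001.jpg"])
    john = Person("P1", "John Smith")
    mary = Person("P2", "Mary Smith")
    cites = link_household(
        census,
        [(john, "Jno. Smith", "Abbreviated forename; age consistent with baptism"),
         (mary, "Mary Smyth", "Spelling variant; same address as John")],
        author="researcher-1",
    )
    # Both citations point at the same Source object, so the image exists only once.
    print(all(c.source is census for c in cites), len(census.image_refs))  # prints: True 1
```

The point of the intermediate `Citation` record is that the household is described once, while each person-to-source link keeps its own explanation and attribution rather than being a bare connection.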

While I dislike trees,[1] I do acknowledge the investment that sites may have in that paradigm. So what can be done to address these failings, and help trees evolve to meet more of the requirements of that mass market?

The scheme I want to suggest to companies that host online family trees involves using separate layers. Back in Our Days of Future Passed — Part III, I explained how the STEMMA data model has two notional sub-models: conclusional and informational. The old GenTech [link inaccessible, including on the Web Archive; probably Genealogical Process Steps] data model also had separate sub-models, although its equivalent to informational was termed evidence. STEMMA purposely uses the term informational because its sub-model includes the information sources and the possible analysis of that information, irrespective of whether it contributes evidence relevant to some conclusion.

When information is cleanly separated from conclusions, the separation provides a natural basis for controlling changes to the corresponding contributions. Conclusions (which include names, dates, and relationships in the online tree) would be editable by anyone, whereas information (which includes personal stories and memories, photographs and images of documents, source analysis, research, and proof arguments) would be editable only by the respective contributor, or possibly by some registered agent such as another family member.
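The permission rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each contribution is tagged with its layer and its contributor; the names `Layer`, `Contribution`, `agents`, `can_edit`, and `edit` are hypothetical and are not part of STEMMA or of any host site’s API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    CONCLUSIONAL = "conclusional"    # names, dates, relationships in the shared tree
    INFORMATIONAL = "informational"  # stories, images, source analysis, proof arguments


@dataclass
class Contribution:
    layer: Layer
    contributor: str
    content: str
    agents: set[str] = field(default_factory=set)  # hypothetical registered agents


def can_edit(item: Contribution, user: str) -> bool:
    """Conclusions are open to everyone; information only to its contributor or agents."""
    if item.layer is Layer.CONCLUSIONAL:
        return True
    return user == item.contributor or user in item.agents


def edit(item: Contribution, user: str, new_content: str) -> None:
    """Apply an edit only if the layer-based rule allows it."""
    if not can_edit(item, user):
        raise PermissionError(
            f"{user} may not edit {item.layer.value} content owned by {item.contributor}")
    item.content = new_content


if __name__ == "__main__":
    birth = Contribution(Layer.CONCLUSIONAL, "alice", "b. 1850")
    memoir = Contribution(Layer.INFORMATIONAL, "alice", "Grandma's recollections",
                          agents={"bob"})
    print(can_edit(birth, "carol"))   # True
    print(can_edit(memoir, "carol"))  # False
    print(can_edit(memoir, "bob"))    # True
```

Keying the check on the layer, rather than on individual record types, means that any new kind of informational material would inherit the owner-only rule without further special cases.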