Managing Time

Overall challenges

Historical documents often contain different and not necessarily compatible ways to denote dates, ranging from specific numeric dates to more qualitative descriptions. Some of the descriptions are vague and cannot easily be transferred to a structured data format. Indicators for time in German historical documents may include the following:

Numeric Dates:

  • Day.Month.Year (e.g., 21.12.1820)
  • Day/Month/Year (e.g., 21/12/1820)
  • Month/Year (e.g., 12/1820)
  • Year (e.g., 1820)
  • Year range (e.g., 1820-1825)

Qualitative Descriptions:

Seasons:

  • “Frühling” (Spring)
  • “Sommer” (Summer)
  • “Herbst” (Autumn/Fall)
  • “Winter”

Names of months, often abbreviated and without years:

  • “Januar”, “Februar”, “März”, etc. (January, February, March)
  • “Sommermonate” (Summer months)

Time indicators specific to the ecclesiastical year:

  • “Fastenzeit” (Lent)
  • “Weihnachten” (Christmas)
  • Feast days of saints
  • Terms of office of bishops or popes

Time indicators concerning the academic year:

  • SS / “Sommersemester” (summer term)
  • WS / “Wintersemester” (winter term)

Narrative periodisation based on political events:

  • Zur Zeit des Kaisers (during the reign of the emperor)
  • Während des Krieges (during the war)
  • In der Ära der Industrialisierung (in the era of industrialization)

Relative time expressions in a narrative form:

  • Vorher (before)
  • Nachher (after)
  • Davor (prior to)
  • Danach (subsequently)
  • Am selben Tag (on the same day)
  • Einen Tag später (a day later)
  • Eine Woche zuvor (a week earlier)
  • In der Zwischenzeit (in the meantime)
  • Zu dieser Zeit (at that time)
  • Vor x Jahren (x years ago)
  • In x Jahren (in x years)
  • Zu einer unbestimmten Zeit (at an unspecified time)
  • In der Vergangenheit (in the past)
  • Im Laufe des Jahres (during the year)
  • Jetzt (now)
  • Gegenwärtig (presently)
  • Bald (soon)
  • Später (later)
  • Früher (earlier)
  • Heute (today)
  • Gestern (yesterday)
  • Morgen (tomorrow)
  • Letzte Woche (last week)
  • Nächsten Monat (next month)
  • Zwischenzeitlich (in the meantime)

Indefinite qualifiers for numerical dates:

  • Ungefähr YYYY (in approximately YYYY)
  • Noch YYYY (also still in YYYY)
  • Nicht vor YYYY (not before YYYY)
  • spätestens YYYY (in YYYY at the latest)
  • frühestens YYYY (in YYYY at the earliest)

Expressions of doubt or uncertainty:

  • ? (question mark and other symbols of uncertainty)
  • Unklar (unclear)
  • Unbekannt (unknown)
  • Unbestimmt (indefinite)

While this list is by no means comprehensive, it can, to a limited degree, support string matching and simple NLP operations for identifying exact dates, time frames and fuzzy dates. Where expressions refer back to dates mentioned earlier in the text and require an interpretation of information in context, AI / Large Language Models will probably be needed to automate date identification. At present, the most common workflow is to check those dates manually, which is time-consuming. In the DigiKAR project, student assistants at JGU Mainz and IEG Mainz have supported us in this task.
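As a rough illustration of the string-matching approach, the following Python sketch classifies the numeric formats and indefinite qualifiers listed above. It is not the project's code: the function name `classify_date_expression`, the pattern labels, and the qualifier labels are assumptions made for this example, and anything that needs context (relative or narrative expressions) is simply returned as unparsed.

```python
import re

# Patterns for the numeric formats listed above (labels are illustrative).
NUMERIC_PATTERNS = [
    ("day_month_year", re.compile(r"^(\d{1,2})[./](\d{1,2})[./](\d{4})$")),
    ("month_year", re.compile(r"^(\d{1,2})/(\d{4})$")),
    ("year_range", re.compile(r"^(\d{4})\s*-\s*(\d{4})$")),
    ("year", re.compile(r"^(\d{4})$")),
]

# Indefinite qualifiers from the list above, mapped to hypothetical labels.
QUALIFIERS = {
    "ungefähr": "approximate",
    "noch": "still",
    "nicht vor": "not_before",
    "spätestens": "at_latest",
    "frühestens": "at_earliest",
}


def classify_date_expression(text):
    """Return (qualifier, kind, numbers) for a single date expression."""
    cleaned = text.strip()
    lowered = cleaned.lower()
    qualifier = None
    # Strip a leading qualifier such as "Nicht vor" before matching the date.
    for prefix, label in QUALIFIERS.items():
        if lowered.startswith(prefix + " "):
            qualifier = label
            cleaned = cleaned[len(prefix):].strip()
            break
    for kind, pattern in NUMERIC_PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return qualifier, kind, tuple(int(g) for g in match.groups())
    # Relative or narrative expressions need context and are left for review.
    return qualifier, "unparsed", ()
```

An expression such as "Während des Krieges" falls through to `"unparsed"`, which marks it for manual checking or a context-aware model.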

Time modelling decisions in the Mainz workpackage

INFO

Focusing on the event-related date columns event_after_date, event_before_date, event_date, event_end, and event_start in a project with fuzzy and uncertain data had the following advantages:

  • By including event_after_date and event_before_date, the dataset can capture the uncertainty around the exact timing of events. If the precise date of an event is unknown but it’s known to have occurred after a certain date or before a certain date, these columns can represent that uncertainty.
  • The columns event_start and event_end provide information about the duration of events, which is crucial for understanding their impact and scope. This can help in analyzing the temporal extent of events and their relationship with other events.
  • Using the varchar data type for these columns allows for flexibility in representing uncertain or ambiguous date information. It accommodates various date formats, partial dates, and textual descriptions of dates, all of which are common in historical or fuzzy datasets.
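The following Python sketch shows one way to hold the five columns as text and expand a partial date into the earliest and latest day it could denote. The column names come from the text above; everything else is an assumption for illustration, in particular the class name `EventDates`, the helper `partial_date_bounds`, and the premise that partial dates are stored as `YYYY`, `YYYY-MM` or `YYYY-MM-DD` strings (the source does not specify the stored format).

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class EventDates:
    """The five event-related date columns, kept as text (varchar)."""
    event_date: Optional[str] = None
    event_start: Optional[str] = None
    event_end: Optional[str] = None
    event_after_date: Optional[str] = None
    event_before_date: Optional[str] = None


def partial_date_bounds(value: str) -> Tuple[date, date]:
    """Expand 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' (assumed format) to (earliest, latest)."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        # Year only: the whole year is possible.
        return date(parts[0], 1, 1), date(parts[0], 12, 31)
    if len(parts) == 2:
        # Year and month: the whole month is possible.
        last_day = calendar.monthrange(parts[0], parts[1])[1]
        return date(parts[0], parts[1], 1), date(parts[0], parts[1], last_day)
    if len(parts) == 3:
        day = date(parts[0], parts[1], parts[2])
        return day, day
    raise ValueError(f"Unsupported partial date: {value!r}")


record = EventDates(event_after_date="1736", event_before_date="1737-09")
earliest_possible = partial_date_bounds(record.event_after_date)[0]
latest_possible = partial_date_bounds(record.event_before_date)[1]
```

Keeping the stored value as text and deriving bounds only at analysis time preserves what the source actually says while still allowing date arithmetic.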

INFO

While focusing on these columns addresses certain forms of uncertainty, there are still some challenges that remain:

  • The level of granularity provided by these columns may not always be sufficient to capture the full extent of uncertainty. For example, if events are known to have occurred within a certain timeframe but the exact dates are unknown, representing this level of uncertainty may require additional techniques such as interval-based representations.
  • Uncertainty in the data itself, such as missing or inconsistent values in these columns, can still pose challenges.
  • While these columns provide information about the timing and duration of events, interpreting them and drawing meaningful inferences cannot be taken for granted. This requires careful consideration of context and potentially the integration of external sources of information, including the various historiographical and editorial comments in the factoid lists.

Overall, focusing on event-related date columns allows us to handle temporal uncertainty per archival factoid but may not be the ideal solution for all projects.
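The interval-based representation mentioned among the remaining challenges could be sketched as follows. This is a minimal Python example under stated assumptions, not part of the Mainz data model: the class `FuzzyInterval` and its method `may_overlap` are hypothetical names, and an open bound is represented by `None`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FuzzyInterval:
    """Window in which an event may have occurred; None means an open bound."""
    earliest: Optional[date] = None
    latest: Optional[date] = None

    def may_overlap(self, other: "FuzzyInterval") -> bool:
        """True if the two windows cannot be shown to be disjoint."""
        self_starts_in_time = (
            self.earliest is None
            or other.latest is None
            or self.earliest <= other.latest
        )
        other_starts_in_time = (
            other.earliest is None
            or self.latest is None
            or other.earliest <= self.latest
        )
        return self_starts_in_time and other_starts_in_time


# Known to fall within 1736-1737, exact dates unknown.
internship = FuzzyInterval(date(1736, 1, 1), date(1737, 12, 31))
# Only a lower bound is known ("not before June 1737").
appointment = FuzzyInterval(earliest=date(1737, 6, 1))
# Only an upper bound is known ("1735 at the latest").
studies = FuzzyInterval(latest=date(1735, 12, 31))
```

Here `internship.may_overlap(appointment)` is true and `internship.may_overlap(studies)` is false. The check is deliberately permissive: an open bound never rules out an overlap.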

Differentiating temporal information in the factoid model (WP3)

INFO

This part of the documentation is still missing.

Additional temporal information collected as free-text comments

Allowing a free-text entry for comments on date (un)certainty gave us the opportunity to express additional data problems beyond fuzzy dating and to highlight dates from primary and secondary sources whose validity we doubted. The disadvantage, however, was that it was difficult to regulate what data collectors wrote and to find clear conventions for identical issues. The first attempt to add free-text descriptions to our Jahns data set during data collection, for example, resulted in comments that were inconsistent, contained spelling mistakes, or were difficult for other team members to interpret when looking at the data later. We therefore attempted to normalise the free-text comments to a degree that will still permit some NLP or AI processing in the future. The original Jahns comments and the new normalised versions are in the table below.

Text | Occurrence
Alternativdatum | 1
Alternatives Startdatum | 1
Datierung inkonsistent, siehe Praktika | 1
exclude | 1
Datierung unklar | 2
Datierung unsicher | 19
Datierung unsicher und vage | 8
Datierung unsicher, “offenbar vor Erwerb des Lic. iur.” | 1
Datierung unsicher, Alternativdaten | 1
Datierung unsicher, Alternativdatum | 7
Datierung vage | 85
Datierung vage / Noch 1766 (Promotion) | 5
Datierung vage / Ort unsicher | 2
Datierung vage / Zeitdauer | 3
Datierung vage und unsicher | 1
Datierung vage, “1 1/2 J. Praktikant am RKG und am RHR” | 1
Datierung vage, “1 1/2 J. Praktikant am RKG und am RHR”, Station nach Wetzlar | 1
Datierung vage, “ca. 1737 ff”, “anfangs über 14 J. lang” | 1
Datierung vage, Zeitdauer | 1
Datierung vage: “Ende 1775/Anf. 1776” | 1
Einzelbeleg | 7
Einzelbeleg, Funktion wahreschinlich auch vor- und nachher | 2
Enddatum mglw. später; letzter Beleg | 8
Enddatum möglicherweise später | 3
Enddatum unklar, Nachweis noch für 1809/10 | 1
Enddatum unsicher | 5
Enddatum unsicher, Alternativdatum | 4
Enddatum unsicher, evtl. auch später | 1
Enddatum unsicher, letzter Beleg | 1
Enddatum vage | 4
Enddatum vage / Zeitdauer | 1
Enddatum vage und unsicher | 1
Enddatum vage, “bis ca. Anf. 1706” | 1
Enddatum vage, Nachweis noch für 1815 | 2
Enddatum vage. | 1
Enddatumg mglw. später, letzter Beleg | 4
frühestes Belegdatum = 1751 | 1
Funktionsausübung “daneben” | 1
Geheimer Rat | 2
Inauguraldisputation am 12.09.1736 | 1
n/a1737 | 1
punktueller Nachweis | 3
Reihenfolge, Datierung unklar; Enddatum taq | 3
saisonales Datum | 1
Stardatum vage | 1
Start- und Enddatum frühester bzw. spätester Beleg | 3
Start- und Enddatum mglw. früher bzw. später | 1
Start- und Enddatum mglw. früher bzw. später, punktueller Nachweis | 6
Start- und Enddatum mglw. früher bzw. später. Frühester/spätester Beleg | 1
Start- und Enddatum mglw. früher/später, frühester bzw. spätester Beleg | 1
Start- und Enddatum möglicherweise früher bzw. später, punktueller Nachweis | 2
Start- und Enddatum möglicherweise früher, punktueller Nachweis | 1
Start- und Enddatum vage | 2
Start- und Enddatum vage / Alternatives Startdatum | 2
Start- und Enddatum vage, möglicherweise früher bzw. später. Sicher nicht mehr 1751 | 1
Start- und Enddatum vage: "ca. 1753- ca. Anf. 1757; Zeitdauer vier Jahre | 1
Startdatierung vage | 2
Startdatum frühester Beleg, Enddatum vage. | 2
Startdatum Alternativdatum | 2
Startdatum eventuell früher | 3
Startdatum mglw. früher, frühester Beleg | 17
Startdatum mglw. früher, frühester Beleg / Alternativdatum | 2
Startdatum mglw. früher, frühester Beleg; Enddatum Alternativdatum | 2
Startdatum mglw. früher, frühester Beleg. | 4
Startdatum mglw. früher. | 1
Startdatum möglicherweise frühe, frühester Beleg | 1
Startdatum möglicherweise früher, frühester Beleg | 2
Startdatum möglicherweise früher. | 2
Startdatum unsicher | 4
Startdatum unsicher und vage. | 2
Startdatum unsicher, Alternativdatum | 6
Startdatum unsicher, erster Beleg. | 1
Startdatum unsicher, Immatrikulation erst 1724 sicher nachgewiesen | 1
Startdatum unsicher, punktueller | 2
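
Many of the comments listed above differ only in abbreviations, trailing full stops, or spelling. As a hedged illustration of how such variants might be collapsed before further processing, the Python sketch below expands a few abbreviations, fixes some of the misspellings visible in the list, and derives coarse tags. The replacement rules, the tag names, and the functions `normalise_comment` and `tag_comment` are assumptions for this example and do not reproduce the normalisation actually applied to the Jahns data.

```python
import re

# Illustrative rules only: abbreviations and misspellings seen in the list above.
REPLACEMENTS = [
    (r"\bmglw\.", "möglicherweise"),
    (r"\bevtl\.", "eventuell"),
    (r"\bEnddatumg\b", "Enddatum"),
    (r"\bStardatum\b", "Startdatum"),
    (r"\bwahreschinlich\b", "wahrscheinlich"),
]

# Hypothetical coarse tags, keyed on lower-cased substrings of the comment.
TAG_KEYWORDS = {
    "vague": "vage",
    "uncertain": "unsicher",
    "unclear": "unklar",
    "alternative_date": "alternativ",
    "possibly_earlier": "früher",
    "possibly_later": "später",
}


def normalise_comment(comment: str) -> str:
    """Expand abbreviations, fix known typos, collapse spaces, drop a final full stop."""
    text = comment.strip()
    for pattern, replacement in REPLACEMENTS:
        text = re.sub(pattern, replacement, text)
    text = re.sub(r"\s+", " ", text)
    return text.rstrip(".")


def tag_comment(comment: str) -> list:
    """Return the sorted list of coarse tags whose keyword occurs in the comment."""
    lowered = normalise_comment(comment).lower()
    return sorted(tag for tag, keyword in TAG_KEYWORDS.items() if keyword in lowered)
```

With these rules, "Startdatum mglw. früher." and "Startdatum möglicherweise früher" normalise to the same string. Quoted source passages and year references inside a comment are left untouched and would still need manual review.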

Managing fuzzy time in data analysis and data visualisation

INFO

This part of the documentation is still missing.