About the Course
Author: Sarah Oberbichler, Leibniz Institute of European History (IEG)
This course offers an introduction to Natural Language Processing (NLP) and its application in the humanities and cultural studies. Participants work with digitized newspaper collections from the German Digital Library and examine the topic of "Natural and Environmental Disasters in Media". Both theoretical foundations and practical applications of NLP methods are taught.
Course Content and Methodology:
- Practical Application: Students learn to apply NLP tools to specific research questions. The digitized newspaper collections of the German Digital Library are used as a data basis, and various analysis methods are employed.
- Thematic Focus: The course focuses on the examination of natural and environmental disasters in media. It analyzes how these events are presented and discussed in historical media reports.
- Interdisciplinary Approaches: The course explores how NLP technologies can open up new perspectives on cultural, historical, and social issues. It also reflects on how these methods complement and extend traditional humanities approaches.
Learning Objectives:
- Application of relevant Python packages for NLP tasks on one's own research data
- Preparation and structuring of large datasets for analysis
- Use of transformer models and large language models for NLP tasks with extensive data volumes
- Critical reflection on various methods (methodology critique)
- Writing a scientific paper on the research results
Course Schedule
Module 1: October 25, 2024 (10:00 AM to 11:30 AM)
Introduction to the topic, the course, and NLP
Introduction to Colab Notebooks
Python Crash Course 1
Module 2: November 8, 2024 (10:00 AM to 11:30 AM)
Python Crash Course 2
Introduction to NLP with spaCy, NLTK, and scikit-learn
Module 3: November 15, 2024 (10:00 AM to 11:30 AM)
The German Newspaper Portal: Introduction and API Usage
(Guests: Lisa Landes, Michael Büchner, and Stephanie Nitsche from the German National Library)
Module 4: November 22, 2024 (10:00 AM to 11:30 AM)
Transformer Models for Semantic Search
Module 5: December 6, 2024 (10:00 AM to 11:30 AM)
Large Language Models for Article Extraction and Post-OCR Correction
Module 6: January 10, 2025 (10:00 AM to 11:30 AM)
Named Entity Recognition and Text Classification
Module 7: January 24, 2025 (10:00 AM to 11:30 AM)
Individual Consultation Appointments
Modules and Workloads

Module 1: Introduction to the topic, the course, and NLP • Introduction to Colab Notebooks • Python Crash Course 1
Module 1 introduces the main topic of the course, gives an overview of NLP, and offers a crash course in Python using Colab Notebooks.
October 25, 2024

Module 2: Python Crash Course 2 • Introduction to NLP with spaCy, NLTK, and scikit-learn
In this module, we'll explore how to leverage Colab Notebooks for data access and become data detectives using basic NLP tasks.
November 8, 2024

Module 3: The German Newspaper Portal: Overview, API Usage, Data Lab
This module provides background information on the German Newspaper Portal, introduces its API, and offers insight into the Data Lab.
November 15, 2024

Module 4: Transformer Models for Semantic Search
In Module 4, we investigate the variety of transformer models available for NLP tasks, as well as the possibilities of semantic search for historical newspapers.
November 22, 2024

Module 5: Large Language Models for Article Extraction and Post-OCR Correction
In this module, we'll explore how open-access LLMs can be used for complex NLP tasks such as article extraction and post-OCR correction.
December 6, 2024

Module 6: Named Entity Recognition and Text Classification
In this module, we explore novel approaches to named entity recognition (NER), using the Data Lab APIs, and to text classification.
January 10, 2025
Literature
- Dobson, J. E. (2023). On reading and interpreting black box deep neural networks. International Journal of Digital Humanities, 5, 431–449. https://doi.org/10.1007/s42803-023-00075-w
- Khurana, D., Koli, A., Khatter, K. et al. (2023). Natural language processing: state of the art, current trends and challenges. Multimedia Tools and Applications, 82, 3713–3744. https://doi.org/10.1007/s11042-022-13428-4
- König, M. (2024, August 19). ChatGPT und Co. in den Geschichtswissenschaften – Grundlagen, Prompts und Praxisbeispiele. Digital Humanities am DHIP. Retrieved December 2, 2024, from https://doi.org/10.58079/126eo
- Navigli, R., Conia, S., & Ross, B. (2023). Biases in Large Language Models: Origins, Inventory, and Discussion. Journal of Data and Information Quality, 15(2), Article 10. https://doi.org/10.1145/3597307
- Sahoo, P., Singh, A. K., Saha, S., Jain, V., Mondal, S., & Chadha, A. (2024). A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. arXiv:2402.07927. https://doi.org/10.48550/arXiv.2402.07927