Recent news that the Library of Congress is attracting attention from artificial intelligence (AI) startups raises important questions about how cultural heritage is used in the digital age. It suggests that the role of public data is being redefined as AI rapidly evolves, and that the value of the knowledge held by libraries is being reevaluated.
The Importance of Public Domain Data and AI Development
Established in 1800, the Library of Congress is one of the largest libraries in the world and has preserved a vast collection of books and other materials. Notably, a large portion of its digitized data is not protected by copyright, making it an extremely attractive, free-to-use resource for AI companies. Large language models (LLMs), in particular, require an enormous and diverse array of text data for training, and the Library’s archives are well suited to that need.
One of the issues AI companies have faced thus far is copyright. There has been an increasing number of lawsuits as companies and creators object to their content being used for AI training without permission. In this context, public domain data poses little legal risk, making it a valuable resource for AI companies. It is only natural that the Library of Congress, with its wealth of public domain data, would attract such attention from AI companies.
The Gap Between AI and Historical Context
On the other hand, the question of how AI models interpret historical materials has also come to the forefront. AI trained on modern data may misinterpret historical context. In one example, an AI model mistook an old book held by a historical figure for a smartphone. Errors like this occur because models often default to modern context, which distorts their interpretation of the past.
This issue presents a major challenge when applying AI to historical materials. The risk of AI generating incorrect information, known as “hallucination,” can undermine trust in AI, especially where accuracy is crucial. For instance, misinformation introduced when producing legislative or policy-related data can have serious consequences. Recent tests demonstrated this risk when an AI model mistakenly treated Washington, D.C., as a “state” and generated incorrect information regarding China-related legislation.
Redefining the Library’s Role and the Future of Digitalization
Despite these risks, the Library of Congress is carefully considering the use of AI tools while planning to make more data available. This approach is rooted in the idea that the data held by libraries and federal institutions can play an enhanced role as the foundation of economic and technological innovation. As AI’s influence on the economy and society expands, digital archives will become a new frontier of knowledge and technology.
As demonstrated by the case of the Library of Congress, the digitalization of cultural heritage and knowledge will become increasingly important in the AI era. This is not just about providing data but also about exploring new ways of interpreting and utilizing knowledge through AI. How libraries coexist with AI technology and promote further digitalization will be a critical challenge moving forward. This challenge also directly relates to how we protect historical context while achieving technological innovation for the future.
Conclusion
While the Library of Congress is becoming an attractive resource for AI companies, caution is necessary when AI is applied to historical materials. The task ahead is to work out how AI’s advances will redefine the value of cultural heritage and how we can put past knowledge to use.