When we humans look at a safety data sheet (SDS), we immediately notice how it is structured and whether the document has any visible quality issues.
At Datalyxt, our systems are designed to mimic how humans perceive and interact with information, so we built these capabilities into SdbHub relatively early on. Our goal is to provide companies with a digital assistant through our AI solutions: SdbHub takes over the highly monotonous work of retrieving data from documents.
Importance of Artificial Intelligence (AI) for Safety Data Sheets
What exactly does it mean when we say an AI-equipped system imitates a human and outputs qualitative feedback when processing a safety data sheet?
To assess the quality of the information at a fundamental level, we first need to know what an SDS consists of:
The required content of a safety data sheet is described in detail in Annex II of the REACH Regulation. National requirements are explained in more detail in the Technical Rules for Hazardous Substances (TRGS) 220, “National aspects when compiling safety data sheets”, which we have already briefly summarized in another article. Accordingly, all SDSs should be structured in broadly the same way: 16 sections, version information, a revision date, and sections that are actually filled with information.
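This target structure lends itself to an automated completeness check. The following is a minimal Python sketch, not SdbHub's actual implementation. It assumes the SDS has already been converted to plain text with English headings of the form “SECTION n: …”. The function `check_sds_structure`, the `QualityReport` fields and the regular expressions for version and revision date are hypothetical and would need adapting to real documents and other languages.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns: they assume English headings such as "SECTION 1: Identification ..."
# and labels such as "Version: 2.1" and "Revision date: 01.03.2023".
SECTION_RE = re.compile(r"^[ \t]*SECTION[ \t]+(\d{1,2})\b", re.IGNORECASE | re.MULTILINE)
VERSION_RE = re.compile(r"\bVersion\s*:?\s*\d+(?:\.\d+)*", re.IGNORECASE)
DATE_RE = re.compile(r"\bRevision date\s*:?\s*\d{1,2}[./-]\d{1,2}[./-]\d{2,4}", re.IGNORECASE)

EXPECTED_SECTIONS = range(1, 17)  # REACH Annex II prescribes 16 sections


@dataclass
class QualityReport:
    missing_sections: list[int] = field(default_factory=list)
    empty_sections: list[int] = field(default_factory=list)
    has_version: bool = False
    has_revision_date: bool = False


def check_sds_structure(text: str) -> QualityReport:
    """Report missing or empty main sections and missing version/date information."""
    headings = list(SECTION_RE.finditer(text))
    bodies: dict[int, str] = {}
    for i, match in enumerate(headings):
        # A section body runs from the end of its heading line to the next heading.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(text)
        line_end = text.find("\n", match.end())
        start = end if line_end == -1 else min(line_end, end)
        bodies[int(match.group(1))] = text[start:end].strip()

    return QualityReport(
        missing_sections=[n for n in EXPECTED_SECTIONS if n not in bodies],
        empty_sections=sorted(n for n, body in bodies.items() if not body),
        has_version=VERSION_RE.search(text) is not None,
        has_revision_date=DATE_RE.search(text) is not None,
    )


sample = """Version: 2.1   Revision date: 01.03.2023
SECTION 1: Identification of the substance/mixture
1.1 Product identifier: Example cleaner
SECTION 2: Hazards identification
"""
report = check_sds_structure(sample)
print(report.empty_sections)  # [2]
```

In this made-up sample, sections 3 to 16 are reported as missing and section 2 as empty, while version and revision date are found. A production check would also need to handle subsection numbering, localized headings and the date formats actually seen in practice.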
In reality, however, things look different, both in content and in appearance. The following is an excerpt from our list of the issues we encounter:
- Two languages: Bilingual SDSs are common. This is not a problem in itself. But when the translation is not visually separated from the original text, extraction becomes a problem for humans and machines alike: the error rate rises and the quality of the extracted data suffers. SdbHub detects bilingual documents and notifies the user, but performs only a very limited extraction in such cases.
- Double columns: The purpose of multi-column SDSs is not obvious. Humans sometimes find them very difficult to read, and the AI struggles with them as well. In such cases, we therefore issue a notice that the document has multiple columns.
- Poor optical quality: Most SDSs are digital and therefore machine-processable, at least in principle. From time to time, however, we come across very poorly scanned SDSs that probably not even a forensic scientist, painstakingly reconstructing individual letters and digits under a microscope, could rescue. In this case, SdbHub returns its highest error level: “Document is unsuitable for automatic extraction”. SdbHub also asks the user whether the document is actually an SDS, since it is not uncommon for the wrong document to be uploaded by mistake. From this, users can decide on the next steps themselves.
- Missing sections: SdbHub performs an initial quality check on each SDS. Missing main sections are identified and reported.
- Missing version or date information: Besides main sections, SDSs can also lack version and date information. SdbHub’s initial quality check covers these details as well, and any gaps are identified and reported.
- Magic texts invisible to humans: If the text color matches the document’s background color, humans cannot see the text. A machine, however, does read it, which would lead to incorrect extraction results. SdbHub detects such cases and flags the “invisible” information.
- A document with several SDSs: Sometimes a single document contains multiple SDSs. Depending on the customer configuration, SdbHub performs the extraction and informs the customer that the document contains multiple SDSs.
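One plausible way to detect “magic texts” is to compare the color of each text span with the color behind it. The Python sketch below illustrates this idea under stated assumptions and is not SdbHub’s method. It assumes a PDF parser has already produced text spans with RGB colors (the `TextSpan` record is hypothetical). It uses the WCAG contrast ratio based on relative luminance, with an assumed threshold `min_ratio` below which a span is treated as invisible.

```python
from dataclasses import dataclass

RGB = tuple[int, int, int]  # sRGB colour, channels 0-255


@dataclass
class TextSpan:
    # Hypothetical record: a real PDF parser would supply text, fill colour and background.
    text: str
    color: RGB
    background: RGB


def _linearize(channel: int) -> float:
    # Convert an sRGB channel (0-255) to linear light using the sRGB transfer function.
    s = channel / 255
    return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a: RGB, b: RGB) -> float:
    # WCAG contrast ratio: 1.0 for identical colours, about 21 for black on white.
    la, lb = relative_luminance(a), relative_luminance(b)
    return (max(la, lb) + 0.05) / (min(la, lb) + 0.05)


def find_invisible_spans(spans: list[TextSpan], min_ratio: float = 1.2) -> list[TextSpan]:
    # min_ratio is an assumed threshold; it would have to be tuned on real documents.
    return [s for s in spans
            if s.text.strip() and contrast_ratio(s.color, s.background) < min_ratio]


spans = [
    TextSpan("Flash point: 23 °C", (0, 0, 0), (255, 255, 255)),
    TextSpan("Contains no hazardous substances", (255, 255, 255), (255, 255, 255)),
    TextSpan("Hidden note", (250, 250, 250), (255, 255, 255)),
]
for span in find_invisible_spans(spans):
    print("Possibly invisible text:", span.text)
```

A contrast threshold slightly above 1.0 catches near-identical colors, such as very light gray on white, and not only exact matches. Determining the actual background color behind a span (page color, filled shapes, images) is the harder part and is not covered by this sketch.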
These issues occur in fewer than 3% of cases, so they are rather rare. Nevertheless, SdbHub’s motto is to give users as much high-quality feedback as possible so that they can initiate the necessary process steps.