INLS 151 Lecture 10: Data and Database Searching

University of North Carolina - Chapel Hill
Information and Library Science
INLS 151

Data

• Structured data vs unstructured data
  o Structured data
    ▪ Information with a high degree of organization
    ▪ Easy to put into a relational database
    ▪ Search is simple and straightforward
    ▪ Structured data is easy to envision in terms of “tables”
      • Structured data typically allows numerical range queries and exact-match queries (for text)
        o e.g., Salary < 60000 AND Manager = Smith
  o Unstructured data
    ▪ Information with little to no organization
    ▪ Difficult or impossible to put into a relational database
    ▪ Search is complicated
    ▪ Typically refers to natural language/free text
    ▪ Email is a good example of unstructured data
      • Although email is indexed by date, time, sender, recipient, and/or subject, the body of each message is still unstructured
    ▪ Other examples include:
      • Books
      • Documents
      • Medical records
      • Social media posts
  o Relational Databases
    ▪ Structured data
    ▪ Designed to return search results that are exact answers
    ▪ Queries are built on a schema of structured fields
    ▪ We know the schema in advance, so the semantic correspondence between queries and data is clear
  o Information Retrieval Systems
    ▪ Unstructured/semi-structured data
    ▪ Designed to support full-text search over unstructured natural language
    ▪ The ranking mechanism is extremely important: results must be sorted by relevance to satisfy the user’s information need
    ▪ We get inexact, estimated answers
  o Controlled Vocabularies
    ▪ Different people use different terms for the same things
      • Due to culture, language, experience, background, and other factors
      • How do we make sure everyone can still understand one another and search for things in databases/retrieval systems?
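The structured-data query from these notes (Salary < 60000 AND Manager = Smith) combines a numerical range with an exact text match. The sketch below runs that query against a relational table using Python's built-in sqlite3 module and an in-memory database. The `employees` table, its columns, and its rows are hypothetical and exist only for illustration. The sketch also shows how literal an exact match is: with SQLite's default (BINARY) collation, 'smith' does not match 'Smith'.

```python
import sqlite3

# In-memory database with a hypothetical employees table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER, manager TEXT)")
conn.executemany(
    "INSERT INTO employees (name, salary, manager) VALUES (?, ?, ?)",
    [
        ("Alice", 55000, "Smith"),
        ("Bob", 72000, "Smith"),
        ("Carol", 48000, "Jones"),
        ("Dan", 59000, "Smith"),
    ],
)

# Numerical range on salary AND exact text match on manager.
rows = conn.execute(
    "SELECT name, salary FROM employees "
    "WHERE salary < ? AND manager = ? ORDER BY name",
    (60000, "Smith"),
).fetchall()
print(rows)  # [('Alice', 55000), ('Dan', 59000)]

# Exact means exact: with SQLite's default BINARY collation, '=' is case-sensitive.
misses = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE manager = ?", ("smith",)
).fetchone()[0]
print(misses)  # 0
```

Each row either satisfies the predicate or does not. There is no notion of a "partly relevant" employee, which is the key contrast with ranked retrieval.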
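The email example in these notes mixes both kinds of data. Header fields (sender, recipient, subject, date) behave like structured fields, but the body is free text. This minimal sketch uses Python's standard `email` package to separate the two parts. The message itself is invented for illustration. It shows that an exact lookup on a header succeeds, while a naive keyword test on the body misses the message's topic.

```python
from email import message_from_string

# Hypothetical message, invented for illustration.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Budget meeting
Date: Mon, 03 Mar 2025 10:00:00 -0500

Hi Bob, can we move the budget discussion to Thursday afternoon?
"""

msg = message_from_string(RAW_EMAIL)

# Structured part: named header fields that support exact lookups.
headers = {field: msg[field] for field in ("From", "To", "Subject", "Date")}

# Unstructured part: the body is just free text.
body = msg.get_payload()

# An exact match on a header field works reliably...
from_alice = headers["From"] == "alice@example.com"
# ...but a naive keyword test on the body misses the topic, because the
# writer said "discussion" rather than "meeting".
body_says_meeting = "meeting" in body.lower()

print(from_alice, body_says_meeting)  # True False
```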
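Information retrieval systems sort results by estimated relevance instead of returning exact matches. The sketch below illustrates this with a tiny TF-IDF scorer (term frequency × inverse document frequency). The notes do not name a specific ranking method; TF-IDF is simply one common, easy-to-follow weighting scheme, and production systems typically use more refined models. The documents and the query are invented. Note that a document matching only part of the query still receives a (lower) score.

```python
import math
import re
from collections import Counter

# Hypothetical mini-collection of free-text documents.
DOCS = {
    "d1": "Relational databases answer structured queries with exact matches.",
    "d2": "Information retrieval systems rank documents by estimated relevance.",
    "d3": "Search engines rank web pages; ranking quality decides relevance for users.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

TOKENS = {doc_id: tokenize(text) for doc_id, text in DOCS.items()}
N_DOCS = len(TOKENS)
# Document frequency: how many documents contain each term.
DF = Counter(term for tokens in TOKENS.values() for term in set(tokens))

def score(query, tokens):
    """Sum tf * idf over the query terms; partial matches still earn a score."""
    tf = Counter(tokens)
    total = 0.0
    for term in tokenize(query):
        if term in DF:
            idf = math.log(N_DOCS / DF[term])  # rarer terms weigh more
            total += tf[term] * idf
    return total

def rank(query):
    """Return (doc_id, score) pairs, best match first."""
    scored = [(doc_id, score(query, tokens)) for doc_id, tokens in TOKENS.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

results = rank("ranking relevance")
print([doc_id for doc_id, _ in results])  # ['d3', 'd2', 'd1']
```

Here d3 ranks first because it contains both query terms, including the rarer "ranking". d2 matches only "relevance", so it ranks lower, and d1 matches nothing. The output is an ordering by estimated relevance, not a yes/no answer.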

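One way to approach the controlled-vocabulary question is to map every variant term to a single preferred term. That mapping is then applied both when records are indexed and when a searcher's query is processed, so different words for the same concept lead to the same results. This is a minimal sketch under stated assumptions. The vocabulary (`PREFERRED_TERM`), the records, and the helper names are hypothetical. Real controlled vocabularies are curated by people and are far larger.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: every variant (entry term) points to
# one preferred term. The terms are invented for illustration.
PREFERRED_TERM = {
    "automobile": "cars",
    "auto": "cars",
    "car": "cars",
    "cars": "cars",
    "physician": "doctors",
    "doctor": "doctors",
    "md": "doctors",
}

def normalize(term):
    """Map a searcher's or cataloger's word to its preferred term, if one exists."""
    key = term.strip().lower()
    return PREFERRED_TERM.get(key, key)

# Hypothetical records, each tagged with whatever words its cataloger chose.
records = {
    "r1": ["Automobile", "insurance"],
    "r2": ["car", "repair"],
    "r3": ["physician", "directory"],
}

# Index every record under the preferred form of each of its terms.
index = defaultdict(set)
for record_id, terms in records.items():
    for term in terms:
        index[normalize(term)].add(record_id)

def search(term):
    """Look up records by the preferred form of the searcher's term."""
    return sorted(index.get(normalize(term), set()))

print(search("auto"))  # ['r1', 'r2']
print(search("MD"))    # ['r3']
```

A searcher who types "auto" finds records cataloged under "Automobile" and "car", even though none of them uses the word "auto".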