Friday, June 18, 2010

Natural Language


Introduction

All of us use search engines on the internet. Typically we type a keyword or several words, and the search engine comes back with a ‘search results’ page of links. Now imagine typing a query in natural language, the way we speak, and having the search engine come back with a specific summary of all the relevant information.

What is Natural Language Processing?

Natural-language processing (NLP) is an area of artificial intelligence research that attempts to reproduce the human interpretation of language. NLP methodologies and techniques assume that the patterns in grammar and the conceptual relationships between words in language can be articulated scientifically. The ultimate goal of NLP is to determine a system of symbols, relations, and conceptual information that can be used by computer logic to implement artificial language interpretation.

What’s happening in the field?

Companies are working on the theoretical issues of computational linguistics and developing technologies such as speech processing, machine translation, universal and application-specific dialog engines, information retrieval, text mining and hypertext databases, automatic text summarization, and natural language understanding and generation. One key objective is to provide advanced NLP software, covering multiple languages and modalities, for use in business applications. Another fundamental objective is to provide the sophisticated NLP technologies required to linguistically enable human-computer interfaces.

Applications

Natural language processing provides both theory and implementations for a range of applications. In fact, any application that utilizes text is a candidate for NLP. The most frequent applications utilizing NLP include the following:

  • Information Retrieval (IR) – Provides a list of potentially relevant documents in response to a user’s query
  • Information Extraction (IE) – Focuses on recognizing, tagging and extracting certain key elements of information (e.g. persons, companies, locations, organizations) from large collections of text into a structured representation. These extractions can then be used for a range of applications including question-answering, visualization, and data mining
  • Question-Answering – Provides the user with either just the text of the answer itself or answer-providing passages
  • Summarization – Reduces a larger text into a shorter, yet richly constituted, abbreviated narrative representation of the original document
  • Machine Translation – Various levels of NLP have been utilized in MT systems, ranging from the ‘word-based’ approach to applications that include higher levels of analysis
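
To make the first application in the list above more concrete, here is a minimal sketch of information retrieval in Python: it ranks a handful of documents against a free-text query using a simple TF-IDF score. The function names (tokenize, rank_documents), the sample documents and the exact weighting formula are illustrative assumptions, not something prescribed by this post or by any particular search engine. Real systems add much more, such as stemming, stop-word removal and large-scale indexing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_documents(query, documents):
    """Return (index, score) pairs for the documents, best match first.

    The score is the sum, over query terms found in the document,
    of term frequency times a smoothed inverse document frequency.
    This weighting is one common choice, assumed here for illustration.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = tokenize(query)
    scores = []
    for index, tokens in enumerate(tokenized):
        term_counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term in term_counts:
                tf = term_counts[term] / len(tokens)
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
                score += tf * idf
        scores.append((index, score))

    # Highest score first; ties keep their original document order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data, made up for this example.
documents = [
    "Machine translation converts text from one language to another.",
    "Search engines return a list of documents relevant to a query.",
    "Automatic summarization reduces a long text to a short one.",
]
ranking = rank_documents("which documents are relevant to my query", documents)
best_index, best_score = ranking[0]
print(documents[best_index])
```

Note that the query is typed the way a person would ask it, yet the sketch still only matches words; it has no understanding of the question. Closing that gap is what the higher levels of NLP analysis aim for.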

How will search engines make money?

Various business models emerge:

  • More specific and focused advertisements on the searched pages
  • Database of users based on age group, gender, location, preferences, etc. – A huge source of data that could be a gold mine for market and consumer research and for tracking people's evolving preferences
  • Search itself could become paid in certain cases
  • Bundle the text platform with voice recognition, streaming media, emotion understanding, biometrics, etc.
  • Search within emails – information from one cloud made available at a cost
  • Enterprise content management – more a B2B approach

Limitations

The greatest challenge to NLP is representing a sentence or group of concepts with absolute precision. The limitations of computer software and hardware make this challenge nearly insurmountable: the amount of data necessary to perform NLP at the human level requires memory space and processing capacity beyond even the most powerful computers.

Conclusion

While NLP is a relatively recent area of research and application compared to other information technology approaches, there have been enough successes to date to suggest that NLP-based information access technologies will remain a major area of research and development in information systems now and far into the future.
