Archive for February, 2012

Building Machine Learning Tools to Mine Unstructured Text

Friday, February 17th, 2012

This presentation describes how to build tools that find the meaning of unstructured text by generating knowledge representation graphs with NLP and ai-one’s Topic-Mapper API.
The prototype solution, called ai-Browser, is a generalized approach that can solve the following types of use cases:
  • Sentiment analysis of social media feeds
  • Evaluating electronic medical records for clinical decision support systems
  • Comparing news feeds
  • Electronic discovery for legal purposes
  • Automatically tagging documents
  • Building intelligent search agents
The source code for ai-Browser is available to developers to customize to meet specific requirements. For example:
  • Healthcare providers can use ai-Browser to analyze medical records by using ontologies and medical lexicons.
  • Social media marketing agencies can use ai-Browser to create personal profiles of customers by reading social media feeds.
  • Researchers can use ai-Browser to mine PubMed and other repositories.
Our goal is to get the source code and the API into the hands of commercial companies who want to tailor the application to solve specific problems.
The presentation is available for download from SlideShare.

Partnership to Create New Social Media Intelligence Tools

Thursday, February 16th, 2012

New Partnership Targets Creation of Social Media Intelligence Tools

Press Release

Tweet log

New tools will enable machine learning of Twitter feeds

La Jolla, CA | Zurich | Berlin, February 16, 2012 – ai-one inc. and Gnostech Inc. announced a partnership today to build new machine learning applications for the US government and military. The deal brings together two small firms that are well known for developing cutting-edge technologies. Gnostech specializes in simulation and modeling; Command, Control, Communications, Computers, Intelligence, Surveillance and Reconnaissance (C4ISR) systems; security engineering; and Information Assurance (IA) applications. The partnership with ai-one provides Gnostech with access to technology that enables computers to learn the meaning and context of data in a way that is similar to humans. Called “biologically inspired intelligence,” the technology is a new form of machine learning that is particularly useful for understanding complex, unstructured information – such as conversations in social media.

In the past month, the US government has issued six requests for companies to create solutions to help better understand Twitter, Facebook and other social media sources. These Broad Agency Announcements (BAAs) are formal requests from the government inviting companies to provide turn-key solutions. With more than 800 million people actively using Facebook and more than 100 million Twitter users, governments and intelligence agencies know that they need better ways to mine this data to get real-time information to protect national security.

“We now have more than 40 partners worldwide that are experimenting with our technology – but only 3 that specialize in US government applications,” said Tom Marsh, President of ai-one. “Gnostech is local, technically driven and well positioned to develop rapid prototypes using our technology.”

About Gnostech: Since 1981, Gnostech has provided technical and engineering services to the Department of Defense (DOD) and Department of Homeland Security (DHS). Gnostech has a proven reputation for engineering efficiency, systems innovation, and dedicated customer service.

Gnostech Inc. began as an engineering and consulting company in Warminster, PA with expertise in GPS simulations and software, initially supporting the US Navy at the Naval Air Development Center (NADC) in Warminster, PA. Today, Gnostech has grown from a few people to about 50 employees with a satellite office in San Diego, CA and engineering support staff in Norfolk, VA, Morristown, NJ and Philadelphia, PA. Gnostech’s technical expertise expands upon our GPS experience and extends into Mission Planning, Network Engineering, Information Assurance and Security Engineering.  www.gnostech.com

About ai-one inc.: ai-one provides an “API for building learning machines”.  Based in San Diego, Zurich and Berlin, ai-one’s software technology is an adaptive holosemantic data space with semiotic capabilities (“biologically inspired intelligence”).  The Topic-Mapper™ SDK for text enables developers to create intelligent applications that deliver better sense-making capabilities for semantic discovery, lightweight ontologies, knowledge collaboration, sentiment analysis, artificial intelligence and data mining.  www.ai-one.com

Mining Unstructured Text: A new machine learning approach

Monday, February 13th, 2012

We believe we have found a way to apply a new general-purpose machine learning technology to solve domain-specific problems by mining unstructured text. The solution addresses fundamental problems in knowledge management:

How to find information that is difficult to describe?

For example, you want to find a match between two people to fill an empty job position. What attributes do you use to represent a complex subject (like a person) to find the best fit?

What if the single best answer is hidden within a vast amount of unstructured text?

Let’s say you want to repurpose a drug – such as using the side-effect of a chemical to treat a disease using a newly discovered metabolic pathway. How would you search through the more than 21 million research articles in PubMed to find the best match from more than 2,000 known drug compounds?

What if the textual information is constantly changing?

What if you want to provide personalized marketing to a person based on what they are saying on Facebook, Twitter or LinkedIn?  To do this, you must understand the meaning of what they are saying. The most accurate approach is to have people read and interpret the conversations, because humans are fantastic at understanding the complexity of language. But to do this with a computer requires a different approach: Machines must learn like humans. They must understand how meaning evolves in a conversation, how to disambiguate, how to detect the single most important concepts, etc.

Big Data Means Big Opportunity

These are classic “Big Data” problems – and they are rampant. Finding a solution would change everything, from how we discover new drugs to what social media would tell us about ourselves.

There have been many attempts to find ways for machines to learn like a human. Artificial intelligence has made bold promises that have been consistently broken for more than 50 years. Yet, we still don’t have a universal approach for machines to learn and understand language like a human.

Growth of Websites

Now, more than ever, we need to find a new approach to mine unstructured text. As of February 2012, it is estimated that the Internet has more than 614 million websites. More than 1.8 zettabytes of information was created in 2011 – much of it unstructured text from our comments on websites, news articles, social media feeds… just about anything where people are communicating with language rather than numbers.

Unstructured text can’t be processed like structured data. Rather it requires an approach that enables knowledge representation in a form that can be processed by machines.

Knowledge representation is a rich field, and there has been tremendous effort and innovation – too much to describe here. However, we still live in a world where the overwhelming majority of people (including almost every CIO, developer and consumer) CANNOT find the information they seek with a simple query. Rather, the domain of data mining and text analytics is dominated by specialists who use tools that are very difficult to learn and very expensive to deploy (because they require highly skilled programmers).

We set out to create a new toolset that would be easy to use for almost any programmer to build data mining tools for unstructured text.

ai-browser: A prototype for human-machine collaboration

For the past several months, we have been working on a new approach for text analytics and data mining. The idea is to create a tool that enables human-machine collaboration to quickly mine unstructured data to find the single best answer.

We now have a working prototype, called ai-browser, that solves knowledge management and data mining problems involving unstructured text. It combines natural language processing (NLP) and pattern recognition technologies to generate a precise knowledge representation graph.  Our team selected OpenNLP because it is open-source and easy to use and customize. We used the Topic-Mapper API to detect patterns within the text after it was pre-processed to isolate parts of speech. The system also allows users to supply ontologies and/or reference documents to sharpen the results. The output is a graph that can be used in a number of ways with third-party products, such as:

  • Submission to search appliances like Google, Bing, Lucene, etc.
  • Analysis with modelling tools like Cytoscape, MATLAB, SAS, etc.
  • Enterprise systems for reporting, knowledge management and/or decision support

This graph makes it easy to ask questions like, “Find me something like _______!” and get a very tightly clustered group of results – rather than millions of hits.
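
As a rough illustration of a “find me something like this” query, the sketch below ranks documents by how much their graphs overlap with a query graph. It is only a stand-in: the source does not say how ai-browser scores similarity, so the Jaccard measure, the representation of a graph as a set of (keyword, association) edges, and the function names `jaccard` and `most_similar` are all assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query_edges, corpus, top_n=3):
    """Rank documents by edge overlap with the query graph.

    `corpus` maps a document name to its set of (keyword, association) edges.
    Assumption: edge overlap is a reasonable proxy for "something like this".
    """
    scored = [(jaccard(query_edges, edges), name) for name, edges in corpus.items()]
    # Highest score first; ties are broken alphabetically by document name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:top_n]]


if __name__ == "__main__":
    corpus = {
        "doc_a": {("aspirin", "fever"), ("aspirin", "blood")},
        "doc_b": {("aspirin", "fever")},
        "doc_c": {("brand", "twitter")},
    }
    query = {("aspirin", "fever"), ("aspirin", "blood")}
    for name, score in most_similar(query, corpus, top_n=2):
        print(name, round(score, 2))
```

A small `top_n` returns a short, tightly ranked list instead of every document that shares a term with the query.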

Even more impressive, ai-browser’s graph is a powerful tool that can be applied to a wide range of applications, such as:

  • Healthcare – clinical decision support systems to enable physicians to make better decisions by understanding all the relevant information held in electronic medical records (EMRs) – including emerging trends and relationships within the patient population.
  • Social media – detecting and tracking sentiments in conversations over time (such as Twitter) to understand how brands are perceived by customers.
  • Innovation management – discovering the relationships of information across disciplines to foster more productive collaboration and interdisciplinary discoveries.
  • Information comparison and confirmation – determine the similarities and differences between two different sources of content.
  • Human resources – sourcing and placement of the best candidate for a job based on previous work experience.

The intent of the ai-browser design is to provide a starting point for developers to build solutions to meet the specific needs of enterprise customers. For example, modifying the system enables solutions to the following use cases:

  • Help a physician determine if additional tests are necessary to confirm a diagnosis.
  • Determine how perceptions about a brand change through conversations on Twitter.
  • Find new uses for a drug by reviewing clinical studies published on PubMed and determining if there are relevant patent filings.
  • Identify stock market trading opportunities by comparing news feeds and SEC filings on a particular company or industry.
  • Find the best person for a job by searching the internet for someone who is “just like the person who had this job last year.”

Enterprise Data Mining: A far easier, lower cost approach.

Unlike other data mining approaches, ai-browser learns the meaning of documents by generating a lightweight ontology – a dynamic file that describes every relationship between every data element. It detects keywords and their association words, which provide context. The combination of a keyword and all of its association words can be thought of as a coordinate (x, y0→T), where x is the keyword and y0→T is the series of association words y0, y1, …, yT for that keyword. The collection of these coordinates creates a topology for the document: a graph G(V,E), where V is the set of vertices (or nodes), one for each keyword, and E is the set of edges representing the associations to each keyword.
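
The keyword/association structure can be sketched in a few lines of Python. This is a toy illustration, not the Topic-Mapper algorithm, which the source does not describe. It assumes that keywords are simply the most frequent non-stopword tokens and that association words are the tokens sharing a sentence with a keyword. Unlike the definition above, it also adds association words to V so that every edge has two endpoints. The names `tokenize`, `build_fingerprint` and `STOPWORDS` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on", "with", "that"}


def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def build_fingerprint(text, top_keywords=3):
    """Return (coordinates, vertices, edges) for a toy keyword/association graph.

    coordinates: keyword x -> list of association words y0..yT
    vertices:    every keyword and association word
    edges:       directed (keyword, association word) pairs
    """
    sentences = [tokenize(s) for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(w for sent in sentences for w in sent)
    keywords = [w for w, _ in counts.most_common(top_keywords)]

    # Count which words co-occur with each keyword in the same sentence.
    associations = defaultdict(Counter)
    for sent in sentences:
        for kw in keywords:
            if kw in sent:
                for w in sent:
                    if w != kw:
                        associations[kw][w] += 1

    coordinates = {kw: [w for w, _ in associations[kw].most_common()] for kw in keywords}
    vertices = set(coordinates) | {w for ws in coordinates.values() for w in ws}
    edges = {(kw, w) for kw, ws in coordinates.items() for w in ws}
    return coordinates, vertices, edges


if __name__ == "__main__":
    text = "Aspirin reduces fever. Aspirin thins blood. Fever follows infection."
    coords, V, E = build_fingerprint(text, top_keywords=2)
    for keyword, words in coords.items():
        print(keyword, "->", words)
    print(len(V), "vertices,", len(E), "edges")
```

Because every keyword-to-association pair is kept as an edge, a word that appears only once still shows up in the graph whenever it shares a sentence with a keyword.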

ai-fingerprint of Fox News Article

We call this graph the “ai-fingerprint.” It is a lossless knowledge representation model. It captures the meaning of the document by showing the context of words and the clustering of concepts. It is lossless because it captures every relationship in a directed graph – thereby revealing the significance of a word that may only appear once yet is central to the meaning of a large, complex textual data set.

ai-browser expresses ai-fingerprints in the XGMML format and delivers them via REST. This enables it to accommodate dynamic data, so the fingerprint can change as the underlying text changes (such as in text from social media feeds).
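
For readers unfamiliar with XGMML, the sketch below writes a small directed graph as an XGMML document using Python’s standard library. The exact elements and attributes that ai-browser emits are not given in the source. This example assumes only the basic `graph`, `node` and `edge` elements and the namespace URI commonly seen in XGMML files read by tools such as Cytoscape. The function name `to_xgmml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI commonly used by XGMML files (assumed here, not taken from the source).
XGMML_NS = "http://www.cs.rpi.edu/XGMML"


def to_xgmml(label, edges):
    """Serialize a list of directed (source, target) edges as an XGMML string."""
    ET.register_namespace("", XGMML_NS)  # emit XGMML as the default namespace
    graph = ET.Element(f"{{{XGMML_NS}}}graph", {"label": label, "directed": "1"})

    # Assign each distinct word a numeric node id in order of first appearance.
    node_ids = {}
    for source, target in edges:
        for name in (source, target):
            if name not in node_ids:
                node_ids[name] = str(len(node_ids) + 1)
                ET.SubElement(
                    graph,
                    f"{{{XGMML_NS}}}node",
                    {"id": node_ids[name], "label": name},
                )

    # One edge element per keyword-to-association pair.
    for source, target in edges:
        ET.SubElement(
            graph,
            f"{{{XGMML_NS}}}edge",
            {
                "source": node_ids[source],
                "target": node_ids[target],
                "label": f"{source} -> {target}",
            },
        )
    return ET.tostring(graph, encoding="unicode")


if __name__ == "__main__":
    print(to_xgmml("demo-fingerprint", [("aspirin", "fever"), ("aspirin", "blood")]))
```

Regenerating this document whenever the underlying text changes is one simple way a client could keep its copy of the graph current.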

Contact Olin Hyde to schedule a demo of ai-Browser. The source code is available to programmers to license and modify to solve specific problems.