Building Machine Learning Tools to Mine Unstructured Text

February 17th, 2012

This presentation describes how to build tools that find the meaning of unstructured text by using NLP and ai-one’s Topic-Mapper API to generate knowledge representation graphs automatically.
The prototype solution, called ai-Browser, is a generalized approach that can solve the following types of use cases:
  • Sentiment analysis of social media feeds
  • Evaluating electronic medical records for clinical decision support systems
  • Comparing news feeds
  • Electronic discovery for legal purposes
  • Automatically tagging documents
  • Building intelligent search agents
The source code for ai-Browser is available for developers to customize to meet specific requirements. For example:
  • Healthcare providers can use ai-Browser to analyze medical records by using ontologies and medical lexicons.
  • Social media marketing agencies can use ai-Browser to create personal profiles of customers by reading social media feeds.
  • Researchers can use ai-Browser to mine PubMed and other repositories.
Our goal is to get the source code and the API into the hands of commercial companies who want to tailor the application to solve specific problems.
The presentation is available for download from SlideShare.

Partnership to Create New Social Media Intelligence Tools

February 16th, 2012

New Partnership Targets Creation of Social Media Intelligence Tools

Press Release

Tweet log

New tools will enable machine learning of Twitter feeds

La Jolla, CA | Zurich | Berlin, February 16, 2012 – ai-one inc. and Gnostech Inc. announced a partnership today to build new machine learning applications for the US government and military. The deal brings together two small firms that are well known for developing cutting-edge technologies. Gnostech specializes in simulation and modeling; Command, Control, Communications, Computers, Intelligence, Surveillance and Reconnaissance (C4ISR) systems; security engineering; and Information Assurance (IA) applications. The partnership gives Gnostech access to ai-one technology that enables computers to learn the meaning and context of data in a way similar to humans. Called “biologically inspired intelligence,” the technology is a new form of machine learning that is particularly useful for understanding complex, unstructured information, such as conversations in social media.

In the past month, the US government has issued six requests for companies to create solutions that better understand Twitter, Facebook and other social media sources. These broad agency announcements (BAAs) are formal invitations from the Government for companies to provide turnkey solutions. With more than 800 million people actively using Facebook and more than 100 million Twitter users, governments and intelligence agencies know that they need better ways to mine this data for real-time information to protect national security.

“We now have more than 40 partners worldwide that are experimenting with our technology – but only 3 that specialize in US government applications,” said Tom Marsh, President of ai-one. “Gnostech is local, technically driven and well positioned to develop rapid prototypes using our technology.”

About Gnostech: Since 1981, Gnostech has provided technical and engineering services to the Department of Defense (DOD) and Department of Homeland Security (DHS). Gnostech has a proven reputation for engineering efficiency, systems innovation, and dedicated customer service.

Gnostech Inc. began as an engineering and consulting company in Warminster, PA with expertise in GPS simulations and software, initially supporting the US Navy at the Naval Air Development Center (NADC) in Warminster, PA. Today, Gnostech has grown from a few people to about 50 employees, with a satellite office in San Diego, CA and engineering support staff in Norfolk, VA, Morristown, NJ and Philadelphia, PA. Gnostech’s technical expertise builds on its GPS experience and extends into Mission Planning, Network Engineering, Information Assurance and Security Engineering. www.gnostech.com

About ai-one inc.: ai-one provides an “API for building learning machines.” The company is based in San Diego, Zurich and Berlin. Its software technology is an adaptive holosemantic data space with semiotic capabilities (“biologically inspired intelligence”). The Topic-Mapper™ SDK for text enables developers to create intelligent applications that deliver better sense-making capabilities for semantic discovery, lightweight ontologies, knowledge collaboration, sentiment analysis, artificial intelligence and data mining. www.ai-one.com

Mining Unstructured Text: A new machine learning approach

February 13th, 2012

We believe we have found a new way to apply a general-purpose machine learning technology to domain-specific problems by mining unstructured text. The solution addresses fundamental problems in knowledge management:

ai-browser is a tool for mining unstructured text

How do you find information that is difficult to describe?

For example, suppose you want to find a match between two people to fill an open job position. What attributes do you use to represent a complex subject (like a person) to find the best fit?

What if the single best answer is hidden within a vast amount of unstructured text?

Let’s say you want to repurpose a drug, such as using the side effect of a chemical to treat a disease through a newly discovered metabolic pathway. How would you search through more than 21 million research articles in PubMed to find the best match among more than 2,000 known drug compounds?

What if the textual information is constantly changing?

What if you want to provide personalized marketing to a person based on what they are saying on Facebook, Twitter or LinkedIn?  To do this, you must understand the meaning of what they are saying. The most accurate approach is to have people read and interpret the conversations because we are fantastic at understanding the complexity of language. But to do this with a computer requires a different approach: Machines must learn like humans. They must understand how meaning evolves in a conversation, how to disambiguate, how to detect the single most important concepts, etc.

Big Data Means Big Opportunity

These are classic “Big Data” problems – and they are rampant. Finding a solution would change everything: from how we discover new drugs to what social media would tell us about ourselves.

There have been many attempts to find ways for machines to learn like a human. Artificial intelligence has made bold promises that have been consistently broken for more than 50 years. Yet, we still don’t have a universal approach for machines to learn and understand language like a human.

Growth of Websites

Now, more than ever, we need to find a new approach to mine unstructured text. As of February 2012, the Internet is estimated to have more than 614 million websites. More than 1.8 zettabytes of information were created in 2011, much of it unstructured text from our comments on websites, news articles, social media feeds… just about anything where people communicate with language rather than numbers.

Unstructured text can’t be processed like structured data. Rather it requires an approach that enables knowledge representation in a form that can be processed by machines.

Knowledge representation is a rich field, and there has been tremendous effort and innovation, far too much to describe here. However, we still live in a world where the overwhelming majority of people (including almost every CIO, developer and consumer) CANNOT find the information they seek with a simple query. Instead, data mining and text analytics are dominated by specialists who use tools that are very difficult to learn and very expensive to deploy (because they require highly skilled programmers).

We set out to create a new toolset that would be easy to use for almost any programmer to build data mining tools for unstructured text.

ai-browser: A prototype for human-machine collaboration

For the past several months, we have been working on a new approach for text analytics and data mining. The idea is to create a tool that enables human-machine collaboration to quickly mine unstructured data to find the single best answer.

We now have a working prototype, called ai-browser, that solves knowledge management and data mining problems involving unstructured text. It combines natural language processing (NLP) and pattern recognition technologies to generate a precise knowledge representation graph. Our team selected OpenNLP because it is open source and easy to use and customize. We used the Topic-Mapper API to detect patterns within the text after it was pre-processed to isolate parts of speech. The system also allows users to apply ontologies and/or reference documents to sharpen the results. The output is a graph that can be used in a number of ways with 3rd-party products, such as:

  • Submission to search appliances like Google, Bing, Lucene, etc.
  • Analysis with modelling tools like Cytoscape, MATLAB, SAS, etc.
  • Enterprise systems for reporting, knowledge management and/or decision support

This graph makes it easy to ask questions like, “Find me something like _______!” and get a very tightly clustered group of results – rather than millions of hits.
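The following is a rough illustration of a “find me something like this” query. It assumes each document’s fingerprint has already been reduced to a set of keywords. The sample data and the names `jaccard` and `find_similar` are hypothetical. The code ranks documents by Jaccard overlap with the query and keeps only those above a threshold. This is a generic set-similarity stand-in, not ai-browser’s actual matching logic.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of keywords two fingerprints have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar(query: set[str], corpus: dict[str, set[str]],
                 threshold: float = 0.3) -> list[tuple[str, float]]:
    """Return documents whose keyword overlap with the query is at least
    `threshold`, best match first."""
    scored = [(doc_id, jaccard(query, keywords))
              for doc_id, keywords in corpus.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical keyword sets standing in for per-document fingerprints.
    corpus = {
        "article-a": {"drug", "pathway", "metabolic", "trial"},
        "article-b": {"twitter", "brand", "sentiment"},
        "article-c": {"drug", "pathway", "side-effect"},
    }
    print(find_similar({"drug", "pathway", "metabolic"}, corpus))
    # prints [('article-a', 0.75), ('article-c', 0.5)]
```

Raising `threshold` trades recall for a tighter cluster of results.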

Even more impressive, ai-browser’s graph is a powerful tool that can be applied to a wide range of applications, such as:

  • Healthcare – clinical decision support systems to enable physicians to make better decisions by understanding all the relevant information held in electronic medical records (EMRs) – including emerging trends and relationships within the patient population.
  • Social media – detecting and tracking sentiments in conversations over time (such as Twitter) to understand how brands are perceived by customers.
  • Innovation management – discovering the relationships of information across disciplines to foster more productive collaboration and interdisciplinary discoveries.
  • Information comparison and confirmation – determine the similarities and differences between two different sources of content.
  • Human resources – sourcing and placement of the best candidate for a job based on previous work experience.

The intent of the ai-browser design is to provide a starting point for developers to build solutions to meet the specific needs of enterprise customers. For example, modifying the system enables solutions to the following use cases:

  • Help a physician determine if additional tests are necessary to confirm a diagnosis.
  • Determine how perceptions about a brand change through conversations on Twitter.
  • Find new uses for a drug by reviewing clinical studies published on PubMed and determining if there are relevant patent filings.
  • Identify stock market trading opportunities by comparing news feeds and SEC filings on a particular company or industry.
  • Find the best person for a job by searching the internet for someone who is “just like the person who had this job last year.”

Enterprise Data Mining: A far easier, lower cost approach.

Unlike other data mining approaches, ai-browser learns the meaning of documents by generating a lightweight ontology: a dynamic file that describes every relationship between every data element. It detects keywords and the association words that provide their context. The combination of a keyword and all of its association words can be thought of as a coordinate (x, y_0…y_T), where x is the keyword and y_0…y_T is the series of association words for that specific keyword. The collection of these coordinates creates a topology for the document, G(V, E), where G is the graph, V is the set of vertices (or nodes) represented by the keywords, and E is the set of edges represented by the associations of each keyword.
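The coordinate-to-graph construction described above can be sketched in a few lines of Python. The sketch assumes the keywords and their association words have already been extracted; Topic-Mapper’s API is not shown. The name `build_fingerprint` is hypothetical. Association words are also added to V so that every edge has two endpoints.

```python
from __future__ import annotations

from collections import Counter


def build_fingerprint(
    coordinates: dict[str, list[str]],
) -> tuple[set[str], set[tuple[str, str]]]:
    """Build G(V, E) from (keyword, association words) coordinates.

    V holds every keyword and association word; E holds one directed
    edge (keyword -> association word) per association.
    """
    vertices: set[str] = set()
    edges: set[tuple[str, str]] = set()
    for keyword, associations in coordinates.items():
        vertices.add(keyword)
        for word in associations:
            vertices.add(word)
            edges.add((keyword, word))
    return vertices, edges


def out_degree(edges: set[tuple[str, str]]) -> Counter[str]:
    """Count how many associations each keyword has."""
    return Counter(source for source, _ in edges)


if __name__ == "__main__":
    # Hypothetical coordinates (x, y_0...y_T) for a short article.
    coords = {
        "drug": ["pathway", "side-effect"],
        "pathway": ["metabolic"],
    }
    V, E = build_fingerprint(coords)
    print(sorted(V))
    print(sorted(E))
    print(out_degree(E).most_common(1))  # the most connected keyword
```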

ai-fingerprint of Fox News Article

We call this graph the “ai-fingerprint.” It is a lossless knowledge representation model. It captures the meaning of the document by showing the context of words and the clustering of concepts. It is lossless because it captures every relationship in a directed graph – thereby revealing the significance of a word that may only appear once yet is central to the meaning of a large, complex textual data set.

ai-browser expresses ai-fingerprints in the XGMML format and delivers them through a REST interface. This enables it to accommodate dynamic data, so a fingerprint can change as the underlying text changes (as it does in text from social media feeds).
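Here is a minimal sketch of the XGMML output. It assumes the fingerprint is held as a set of words V and a set of directed (keyword, association) pairs E. The function name `to_xgmml`, the integer node ids and the edge labels are illustrative choices, and the REST layer is not shown. The namespace is the one commonly declared in XGMML files, for example those read and written by Cytoscape.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

XGMML_NS = "http://www.cs.rpi.edu/XGMML"


def to_xgmml(label: str, vertices: set[str], edges: set[tuple[str, str]]) -> str:
    """Serialize a directed graph as a minimal XGMML document."""
    # The default namespace is written as a plain attribute on the root element.
    graph = ET.Element("graph", {"xmlns": XGMML_NS, "label": label, "directed": "1"})
    ids: dict[str, str] = {}
    for index, name in enumerate(sorted(vertices)):
        ids[name] = str(index)
        ET.SubElement(graph, "node", {"id": ids[name], "label": name})
    for source, target in sorted(edges):
        ET.SubElement(graph, "edge", {
            "source": ids[source],
            "target": ids[target],
            "label": f"{source} -> {target}",
        })
    return ET.tostring(graph, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical two-word fingerprint.
    print(to_xgmml("demo", {"drug", "pathway"}, {("drug", "pathway")}))
```

Because the output is plain XML, a new snapshot can simply be regenerated and re-served whenever the underlying text changes.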

Contact Olin Hyde to schedule a demo of ai-Browser. The source code is available to programmers to license and modify to solve specific problems.

Self-Aware, Self-Defending Adaptive Network Appliance Software (SASDANAS)

January 12th, 2012

On November 29, 2011, our consulting partner Ariston Consulting submitted a proposal to the US Air Force to develop a new form of defense for cyber assets using machine learning for cyber awareness and resilience.  This proposal was partially developed by ai-one in an effort to bring the most advanced machine learning technologies to the Air Force at the lowest possible cost. 

Our proposal (below) was submitted in response to BAA Number AFRL-PK-11-0001 under the Rapid Innovation Funding program. It met all four operational criteria, yet it was rejected on January 6, 2012 due to our lack of prior history with the US Air Force. The Air Force simply preferred to do business with a company it knew rather than a new vendor.

However, on December 20, 2011 the Air Force released a request to build a system very similar to what we proposed to build below under the contract BAA-RIK-12-03. Both projects were issued by the Department of the Air Force, Air Force Materiel Command, AFRL – Rome Research Site, AFRL/Information Directorate, 26 Electronic Parkway, Rome, NY, 13441-4514.

We are not accusing the Air Force of any wrongdoing, nor is there any evidence that it copied and pasted our ideas into another BAA. On the contrary, the Air Force is a big place, and we are not the only people thinking of ways for networks to defend themselves using autonomic machine learning technologies. However, we believe our technology can be deployed at minimal cost compared to the budget provided in the BAA issued a month after we proposed a smaller, more rapid solution.

We think it is valuable to share this information with the public for several reasons:

  1. To publish our findings in a public forum to prevent any other party from obtaining a patent for cyber security applications or network defense applications using the approach described herein.
  2. To encourage major defense contractors to contact Ariston Consulting and to use ai-one’s biologically inspired intelligence in cyber security applications.
  3. To encourage the Air Force to consider reducing the budget allocated for BAA-RIK-12-03 by 90%. There is simply no business reason to spend 10 times what we proposed.

Title:     SASDANAS: A network that protects itself from cyber attacks.

BAA Number:  AFRL-PK-11-0001

Firm:         Ariston Consulting LLC

P.O. Box 1721

Sierra Vista, AZ 85636

http://www.aristonhq.com

Phone: (520) 378-6112

CAGE CODE: 61E85

Duration of Effort:         24 months

Estimated Cost of Effort:          $2,800,000

Self Certification of Applicant:   Service-Disabled Veteran-Owned Small Business (SDVOSB)

Air Force Need Area:  02. Cyberspace Superiority and Mission Assurance

Air Force Primary User:  24th Air Force Wing, San Antonio, TX

Programs/Platforms for Proposed Technology:

DoD-Reimbursed IR&D:  NO

Proposed Approach Relate to Prior DoD-Funded SBIR or STTR:  NO

Foreign Participants for Effort:  NO

Funded by DoD or Another Federal Agency: NO

Percentage of Effort

by Offerer:                    60%

by Others:                    40%

Preferred Funding Instrument:    Contract

Technical POC:     Jonathan Woodruff, CEO, Ariston Consulting

Phone: 520.378.6112

Email: jonathan.woodruff@aristonhq.com

Business POC:        Steve Mecham, COO, Ariston Consulting

Phone: 520.378.6112

Email: steve.mecham@aristonhq.com

Project Description/Objective:  SASDANAS: A network that protects itself from cyber attacks.

Ariston Consulting LLC proposes to develop a Self-Aware, Self-Defending Adaptive Network Appliance Software (SASDANAS) system that acts as an intelligent agent to monitor network activity, content and behavior to augment the capacity of human analysts to identify and counteract all forms of cyber threats.

Ariston Consulting is a Service-Disabled Veteran-Owned Small Business (SDVOSB) based in Sierra Vista, AZ, that provides advanced technology testing and engineering solutions. It has expertise and experience in providing non-personal scientific and engineering services to test Command, Control, Communications, Computers, Intelligence, Surveillance and Reconnaissance (C4ISR) systems in support of the US Air Force (USAF), US Army, and DISA.

SASDANAS is an intelligent agent that learns and understands the threat level posed by every byte-pattern across a network. The software system uses a new form of machine learning to monitor every detail of a network to identify and isolate cyber security threats – including malware, application hijacking, sabotage and illicit access, hacking and unauthorized use. It enables the Air Force to make all cyber assets self-aware, self-protecting and adaptive to any external or internal threat. The approach eliminates the opportunity for zero-day attacks because it detects all anomalous packet behavior and content. Furthermore, SASDANAS provides the Air Force with a first-mover advantage, as the system learns through use and thus becomes more intelligent over time.

SASDANAS is a 64-bit, multithreaded, massively parallel application that is deployable through a Representational State Transfer (REST) architecture. Each instance of SASDANAS may be deployed in series and/or in parallel. This architecture provides the USAF with the greatest degree of flexibility when deploying into field operations. It enables the USAF to use SASDANAS for either: a) a moving-window approach that reads every packet as it flows across the network; or b) identifying threats by capturing an image of the network topology at byte- or packet-level detail to understand the behavior and content of the network. Each instance of SASDANAS will have the capacity to understand up to 18 exabytes of data at a time. The speed of SASDANAS depends on available memory and processing capacity. When deployed in parallel, SASDANAS has the theoretical capacity to monitor the activity of the entire Internet.
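The proposal does not say where the 18-exabyte figure comes from. One plausible reading, offered here only as an assumption, is that it is the size of a 64-bit address space. 2^64 bytes is about 18.4 exabytes in decimal (SI) units, or exactly 16 exbibytes in binary units. The arithmetic:

```python
def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given pointer width."""
    return 2 ** address_bits


EXABYTE = 10 ** 18   # SI exabyte
EXBIBYTE = 2 ** 60   # binary exbibyte (EiB)

if __name__ == "__main__":
    total = address_space_bytes(64)
    print(f"{total / EXABYTE:.1f} EB")    # prints 18.4 EB
    print(f"{total / EXBIBYTE:.0f} EiB")  # prints 16 EiB
```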

Unlike current approaches to cyber security, SASDANAS uses a new technology called a HoloSemantic DataSpace (HSDS) to detect, classify and store every byte pattern. The HSDS is thus able to recognize every packet’s behavior and content to determine whether the byte pattern conforms to expectations or is anomalous and therefore subject to further scrutiny to determine if it is a threat. The HSDS is an adaptive, associative network that detects the relationship of every byte that is fed into the system. Thus, the HSDS is capable of identifying known threat patterns while concurrently identifying and isolating anomalous patterns that may signify a zero-day attack or non-compliant use of the network (e.g., sabotage).

The HSDS is a newly discovered form of neuronal network that mimics the neurophysiology of the neocortex. It is commercially trademarked as “biologically inspired intelligence” and operates similarly to a human brain. It learns autonomically by detecting byte patterns at the moment of stimulation. The HSDS stores each unique byte pattern only once, regardless of how many times it encounters that specific pattern. It registers and adjusts the semiotic value of each byte pattern each time it is stimulated, adjusting the size of the net automatically. It determines the semiotic value of each byte pattern along the following dimensions, each of which may have many values: time of stimulation, place of stimulation, syntax of surrounding byte patterns, and packet payload and addressing. Thus, the HSDS creates an n-dimensional representation of the semiotic value of every byte pattern, thereby capturing every detail within the complexity of the data.
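The following is a deliberately simple stand-in that makes “stores each unique byte pattern only once” and the moving-window reading concrete. It slides a fixed-width window over each payload and keeps one entry per distinct byte sequence, together with a stimulation count. It then reports what fraction of a new payload’s windows has never been seen. This is ordinary n-gram counting, not the HSDS. The class name, the 4-byte window and the novelty measure are all assumptions, and none of the semiotic dimensions (time, place, syntax, addressing) are modeled.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterator


class PatternStore:
    """Toy stand-in (not the HSDS): one entry per distinct byte window."""

    def __init__(self, window: int = 4) -> None:
        self.window = window  # assumed window width in bytes
        # Each distinct pattern is stored once; the value counts stimulations.
        self.counts: Counter[bytes] = Counter()

    def windows(self, payload: bytes) -> Iterator[bytes]:
        """Yield every fixed-width slice of the payload (a moving window)."""
        for start in range(len(payload) - self.window + 1):
            yield payload[start:start + self.window]

    def learn(self, payload: bytes) -> None:
        """Record the payload's windows, incrementing counts for known ones."""
        self.counts.update(self.windows(payload))

    def novel_fraction(self, payload: bytes) -> float:
        """Share of the payload's windows never seen during learning."""
        grams = list(self.windows(payload))
        if not grams:
            return 0.0
        unseen = sum(1 for gram in grams if gram not in self.counts)
        return unseen / len(grams)


if __name__ == "__main__":
    store = PatternStore(window=4)
    store.learn(b"GET /index.html HTTP/1.1")
    print(store.novel_fraction(b"GET /index.html HTTP/1.1"))  # prints 0.0
    print(store.novel_fraction(b"\x90\x90\x90\x90\x90"))       # prints 1.0
```

Learning the same payload twice raises the counts but adds no new entries, which mirrors the “store once, adjust on each stimulation” idea in the simplest possible way.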

The HSDS technology has been commercially available from ai-one inc. since June 2011. It is currently in use at Orange (France Telecom) and at more than 40 additional installation sites around the world. The commercial HSDS is offered in three versions: Topic-Mapper to analyze human languages, graphalizer to analyze sensor data, and Ultra-Match to analyze visual images. The technology has been used by the Federal Criminal Police Office of Germany (Bundeskriminalamt or BKA) to build a crime scene analysis tool for the Swiss Federal Department of Justice and Police (Eidgenössisches Justiz- und Polizeidepartement or EJPD). The commercial versions of HSDS have a technology readiness level (TRL) of 9. The TRL for the proposed customization of current HSDS COTS technology is 7. Ariston Consulting will license ai-one’s technology to create a new software application that meets the unique needs of protecting USAF cyber assets. The HSDS differs from current forms of neural networks, machine learning and artificial intelligence technologies in the following ways:

Transparency – HSDS generates a lightweight ontology (LWO) that adjusts dynamically with each passing byte (and/or packet). The LWO describes the relationship of every byte within the network. The LWO is machine generated, machine curated and accessible by humans.

Benefit: Humans can see how SASDANAS interprets the value and threat level of every packet.

Autonomic:  HSDS learns without any human intervention. It does not require any prior conditions or neighborhood functions. Rather, it automatically generates computational and data cells within the network as needed immediately upon network stimulation – just like the human brain.

Benefit: SASDANAS is objective and not subject to cognitive biases that may distort threat detection.

Speed, Accuracy, Sensitivity: HSDS captures every detail regardless of the degree of complexity. In incremental learning situations, the proposed 64-bit architecture is expected to be at least 10^5 times faster than latent Dirichlet allocation (LDA) or vector-space approaches such as cosine-similarity tf-idf.

Benefit: SASDANAS is very fast and accurate – even by neural net standards.

Trainability: The system can be trained and untrained by humans. It is aware of which patterns it has learned on its own and which patterns have been taught by humans.

Benefit: SASDANAS eliminates the risk of overtraining. It is flexible.

Compatible with Existing Technologies: The system is deployable using industry standard approaches as a cloud-based application.

Benefit: SASDANAS reduces the cost of maintaining and protecting cyber assets while extending their functionality.
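The speed comparison in this list refers to latent Dirichlet allocation and to tf-idf vector approaches. For readers unfamiliar with the latter, here is a textbook tf-idf plus cosine-similarity baseline. It is not code from the proposal, and its weighting scheme (term count divided by document length, times ln(N/df)) is one common variant among several. No performance figures are implied. The sketch does make one cost visible: the idf weights depend on the whole collection, so in this formulation adding a document means recomputing every vector.

```python
from __future__ import annotations

import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by (count / doc length) * ln(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Hypothetical tokenized documents.
    docs = [
        ["drug", "pathway", "trial"],
        ["drug", "pathway", "dose"],
        ["brand", "tweet"],
    ]
    vecs = tfidf_vectors(docs)
    print(round(cosine(vecs[0], vecs[1]), 3), cosine(vecs[0], vecs[2]))
```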

Ariston Consulting proposes to build SASDANAS as a software proof-of-concept for further development as a hardware solution called Self-Aware, Self-Defending Adaptive Network Appliance Chipsets (SASDANACS). Based on preliminary tests of the core commercial technology, Ariston estimates that the hardware version will operate at least 10,000 times faster than the software version. This speed, combined with an estimated capacity of 18 exabytes per instance, enables the hardware version to monitor and protect cyber assets at wire-speed and at Internet scale.

SASDANAS is deployable at any network layer (OSI layers 1 through 7) and is compatible with known specifications for Wireless Network After Next (WNAN) as described in unclassified DARPA and AFRL reports. Its architecture provides the AF with a wide range of deployment options.

Approach:

Ariston Consulting LLC will adapt commercial-off-the-shelf (COTS) HSDS software from ai-one inc. to build SASDANAS. Ariston Consulting has secured rights to license and modify technologies owned by ai-one inc. for the purpose of creating custom applications for agencies of the United States Government, including the Department of Defense.

Critical Need/JUPM Challenge Area Addressed:

02. Cyberspace Superiority and Mission Assurance

Benefits to the Warfighter:

Cyber security – Networks monitor and defend themselves.

Force leverage – SASDANAS drastically increases the analytical capacity of human analysts.

Morale – SASDANAS makes network security analysis and countermeasures more interesting by eliminating mundane tasks.

Funding/Cost:              $2,800,000.

Program Plan:

a)     Period of Performance:  Not more than 24 months from commencement of contract for Phase 1.

i)      Ariston Consulting shall report progress on technical design, engineering and prototype development every 30 days throughout the project.

b)    Schedule – Total of 24 months:

i)      Detailed technical specification including use and test cases:  3 months

ii)     Technical development of software using Agile methodology: 12 months

iii)    Software testing: 3 months

iv)    Software revisions: 3 months

v)     Preparation and submission of final technical report: 3 months

c)     Deliverables:

i)      Scientific and Technical Reports every three (3) months, Final Report at conclusion

ii)     Funds and Man-hour Expenditure Report every three (3) months, Final Report at conclusion

iii)    Contract Funds Status Report (CFSR)

iv)    Status Report

v)     Presentation Materials

vi)    Software: As proposed, on CD-ROM

d)    Metrics/Measure of Success:

i)      Ability to detect known malware compared to industry standard technology (e.g., McAfee).

ii)     Ability to detect unknown malware threats posed by the AFRL Red Team.

iii)    Ability to detect anomalous behavior of a packet within a network.

e)     Facilities/Equipment:

i)      All development will be completed at an Ariston Consulting-controlled Top Secret (TS) facility.

f)     Risk:

i)      Technical risk of SASDANAS is minimal as the technology currently is available for commercial use by ai-one inc. Ariston Consulting will mitigate risk by employing ai-one engineers to train Ariston staff, transfer knowledge and provide guidance based on commercial experience.

g)    Proposed Transition Plan:

i)      Technical data: Unlimited rights granted to USAF.

ii)     Non-commercial software (NCS): Unlimited rights granted for each instance of SASDANAS software sold to the US Government.

iii)    NCS Documentation: Unlimited rights granted to USAF.

iv)    Commercial computer software rights: Not applicable. SASDANAS will be a modified version of ai-one technology that will not be commercially available.

v)     There are no restrictions on the use of a licensed instance of SASDANAS for use within the United States Air Force. The Air Force may deploy SASDANAS at its own discretion, in any manner it so chooses.

vi)    SASDANAS’s application program interface (API) may be accessed by any entity authorized by the USAF.

h)     Other Key Participants:

i)      Commercial supplier of HSDS technology, software development kit and technical training:

ai-one inc. (a Delaware C-corporation)

Attn: Olin Hyde, Vice President

5711 La Jolla Blvd., La Jolla, CA 92037

Phone: 1-858-381-5897/Email: oh@ai-one.com

Use Case: Passenger Name List (PNL) for Secure Flight Program

December 14th, 2011

Case Study Summary:

The Passenger Name List application was developed by ai-one for one of the largest airline ground handling services companies in the world.

The PNL Matcher is being used by airlines at the JFK, FRA and ZRH airports to efficiently and accurately match a PNL (Passenger Name List) against the various suspect lists (no-fly lists) supplied by official sources such as the U.S. Department of Homeland Security (DHS) Secure Flight Program.

This application uses the core ai-one™ technology in a limited but very effective way.  The challenge in this area is the need to comply quickly with new U.S. DHS requirements to effectively screen passengers before boarding a flight.

The challenge for such an application arises when a ticketing agent creates a ticket for a passenger from a country that does not use the Western (Latin) alphabet and spells the name phonetically. Because different agents may spell the same name very differently, the software has to be intelligent enough to find suspect passengers and match them to the DHS list. Additionally, the phonetic use of characters varies from country to country, yet the matching must meet U.S. quality requirements.
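The PNL Matcher’s own matching logic is not described here. As a point of reference only, the classic American Soundex code shows the general idea behind phonetic keys. Names that sound alike map to the same short code, so variant spellings such as “Mohammed” and “Muhammad” collide. Soundex is designed for English names in Latin script, so it does not reflect the language-agnostic claim in the benefits list. `name_key` and `screen` are hypothetical helpers.

```python
from __future__ import annotations

# Soundex digit for each consonant; vowels, y, h and w carry no digit.
CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """American Soundex code: first letter plus three digits, zero-padded."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    result = [letters[0].upper()]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        # h and w do not separate equal codes; vowels do.
        if ch not in "hw":
            prev = code
    return "".join(result).ljust(4, "0")[:4]


def name_key(name: str) -> tuple[str, ...]:
    """Hypothetical phonetic key for a full name: one code per name part."""
    return tuple(soundex(part) for part in name.split())


def screen(passenger: str, watchlist: list[str]) -> list[str]:
    """Hypothetical check: watch-list entries whose key equals the passenger's."""
    key = name_key(passenger)
    return [entry for entry in watchlist if name_key(entry) == key]


if __name__ == "__main__":
    print(soundex("Robert"), soundex("Rupert"))  # prints R163 R163
    print(screen("Mohammed Ali", ["Muhammad Aly", "John Smith"]))  # prints ['Muhammad Aly']
```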

PNL Matcher for Swissport's Secure Flight Program

PNL Benefits:

  • Fast, very accurate response to a ticket agent
  • Phonetically spelled names can be matched in compliance with all regulations
  • Easy to add new lists
  • Works in any language or character set (language agnostic)

Topic:

Visualize all associations within written text

Kind:

Custom implementation of Topic-Mapper API

Status:

Solution installed and in productive use since its implementation in 2007
Partner:

Swissport International

Rico Barandun, Product Manager

Application areas:

  • Pattern Matching
  • Security

Target Industries:

  • Transportation
  • Homeland security
  • Law enforcement

ai-one Use Case: Enhance OCR of Credit Card Receipts using Machine Learning API

December 14th, 2011

OCR Correction using ai-one machine learning API


Use Case Summary:

The BON Matcher is an ai-one implementation that enables a leading Swiss retail chain to analyze all scanned credit card receipts.

After scanning, all credit card receipts are analyzed and matched against patterns using ai-one’s API.

Our solution corrects the errors of the optical character recognition (OCR) system when it fails to recognize 100% of the elements.

This was an early validation of our technology. It affirmed ai-one’s superiority over alternative artificial intelligence-based solutions as a much faster, higher-quality and less expensive solution. The retail chain saved substantial operating costs by automating this process and was able to reduce its workforce by 15 people.

The project was completed after 3 months of development and is still in use in more than 80 stores.

The feature of the technology used in this application is commonly used in document archiving systems where users need to search for documents that have been scanned with many character errors.
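The BON Matcher’s pattern matching is proprietary and is not shown here. The following is a much simpler illustration of post-processing OCR output. It assumes a known vocabulary of valid field values and an assumed table of common character confusions. Text fields are corrected with Python’s standard `difflib.get_close_matches`, and numeric fields with a character translation table. The vocabulary, the confusion table and the cutoff are illustrative choices, not values from the deployed system.

```python
import difflib

# Assumed table of letters that OCR engines often return in place of digits.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})


def fix_numeric(field: str) -> str:
    """Replace look-alike letters in a field that should contain only digits."""
    return field.translate(DIGIT_FIXES)


def fix_text(token: str, vocabulary: list[str], cutoff: float = 0.7) -> str:
    """Snap a token to its closest known value, or keep it if nothing is close."""
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


if __name__ == "__main__":
    card_types = ["VISA", "MASTERCARD", "AMEX"]  # hypothetical vocabulary
    print(fix_text("VlSA", card_types))          # prints VISA
    print(fix_text("MASTERCRAD", card_types))    # prints MASTERCARD
    print(fix_numeric("1O5.S0"))                 # prints 105.50
```

Keeping unmatched tokens unchanged, rather than forcing a match, leaves genuinely unknown values visible for review.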

Benefits:

  • Improved OCR performance from 80% to 98% in less than a week after implementation.
  • Enhanced OCR recognition through a separate, low-cost post-processing step
  • Faster data availability
  • Additional fraud detection possibilities

Deployment:

Custom software development

Status:

Solution in place. Successful since 2006 launch.

Partner:

Swiss Data Safe AG

Application areas:

  • OCR recognition
  • Numerical series matching
  • Data management / Archiving

Target Industries:

  • Information management
  • Retail

OCR Correction Workflow Using Machine Learning API

Big Data Just Got Smaller: New Approach to Find Information

November 15th, 2011

Press Release

For Immediate Release

ai-Fingerprint

ai-Fingerprint shows a graphical representation of the knowledge within a news article

San Diego, CA – Artificial intelligence vendor ai-one will unveil a new approach to graphically represent knowledge at the SuperData conference in San Diego on Wednesday, November 16, 2011. The discovery, named ai-Fingerprint, is a significant breakthrough because it allows computers to understand the meaning of language much like a person. Unlike other technologies, ai-Fingerprints compress knowledge in a way that can work on any kind of device, in any language, and show how clusters of information relate to each other. This enables almost any developer to use off-the-shelf and open-source tools to build systems like Apple’s Siri and IBM’s Watson.

Ondrej Florian, ai-one’s VP of Core Technology, invented ai-Fingerprints as a way to find information by comparing the differences, similarities and intersections of information on multiple websites. The approach is dynamic: the ai-Fingerprint transforms as the source information changes. For example, the shape of the fingerprint for a Twitter feed adapts as the conversation evolves. This lets someone see new information emerge and immediately understand its significance.
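ai-one has not published how ai-Fingerprints are compared. A minimal sketch, assuming each fingerprint can be reduced to a set of undirected keyword-to-keyword associations, shows what “differences, similarities and intersections” can mean in practice. The function names, the toy data and the use of the Jaccard index as the similarity score are assumptions for illustration only.

```python
def associations(*pairs: tuple[str, str]) -> set[frozenset]:
    """Build a toy fingerprint; frozenset makes ("a", "b") equal to ("b", "a")."""
    return {frozenset(pair) for pair in pairs}


def compare_fingerprints(a: set[frozenset], b: set[frozenset]) -> dict:
    """Report what two fingerprints share, where they differ, and a Jaccard score."""
    shared = a & b
    union = a | b
    return {
        "intersection": shared,
        "only_in_a": a - b,
        "only_in_b": b - a,
        # Jaccard index: shared associations divided by all associations seen.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


site_a = associations(("merger", "bank"), ("bank", "regulator"), ("merger", "shareholders"))
site_b = associations(("bank", "merger"), ("bank", "regulator"), ("regulator", "fine"))
report = compare_fingerprints(site_a, site_b)
print(report["jaccard"])  # 0.5
```

The set differences surface what one source says that the other does not, which is where new or anomalous information would show up.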

“The big idea is that we use artificial intelligence to identify clusters and show how each cluster relates to another,” said Florian. “Our approach enables computers to compare ai-Fingerprints across many documents to find hidden patterns and interesting relationships.”

The ai-Fingerprint is the collection of all the keywords and their associations identified by ai-one’s Topic-Mapper tool. Each keyword, together with its associations, acts as a coordinate, much like a point on a map. Combined, these keywords and associations form a graph that encapsulates the entire meaning of the document.
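Topic-Mapper’s internal association model is not described in this release. As a stand-in, the sketch below treats two keywords as associated when they appear in the same sentence and counts how often that happens, producing a keyword graph of the kind described above. The stopword list, the three-letter minimum and the regex sentence splitter are simplifying assumptions.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (shorter words are dropped anyway).
STOPWORDS = {"the", "and", "for", "with", "that", "this", "was"}


def build_fingerprint(text: str) -> dict[str, dict[str, int]]:
    """Map each keyword to the keywords it co-occurs with, weighted by sentence count."""
    graph: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for sentence in re.split(r"[.!?]+", text):
        # [^\W\d_]+ matches runs of Unicode letters, so non-English words survive.
        words = {
            w for w in (m.lower() for m in re.findall(r"[^\W\d_]+", sentence))
            if len(w) > 2 and w not in STOPWORDS
        }
        # Every pair of keywords in the same sentence counts as one association.
        for a, b in combinations(sorted(words), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {word: dict(neighbors) for word, neighbors in graph.items()}


fp = build_fingerprint(
    "The bank announced a merger. The merger worried the regulator. The regulator fined the bank."
)
print(fp["bank"])
```

Each entry of `fp` plays the role of one map coordinate: a keyword plus the weighted links to its neighbors.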

The real-world applications are impressive. “It solves a lot of so-called Big Data problems because the system learns by itself,” said Olin Hyde, who worked with Florian on the project. “ai-Fingerprints work with existing computer languages and standards. So it only took us about a week to create a generic tool, called BrainBrowser, to find relationships in complex texts – such as summarizing news articles, searching for a job, or identifying new uses for a drug.”

To build BrainBrowser, the team fed ai-Fingerprint results from Topic-Mapper into a natural language processing tool, OpenNLP, so that the computer could apply the rules of grammar to tag parts of speech, chunk phrases and classify words into categories (named-entity recognition). Topic-Mapper continuously updates the ai-Fingerprint so that the computer can understand how information changes over time, as it does in a human conversation.
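The release does not say how Topic-Mapper keeps a fingerprint current. One simple way to get a similar effect, assumed here purely for illustration, is a sliding window: associations from the newest message are added, and those from the oldest message are subtracted once the window is full. The class name, the window size and the keyword lists are hypothetical; in a real pipeline the keywords might come from an NLP step such as the OpenNLP tagging described above.

```python
from collections import Counter, deque
from itertools import combinations


class RollingFingerprint:
    """Keyword associations drawn from only the most recent `window` messages."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.messages: deque = deque()  # one list of association pairs per message
        self.edges: Counter = Counter()

    def add(self, keywords: list[str]) -> None:
        # Each unordered pair of distinct keywords in a message is one association.
        pairs = [frozenset(p) for p in combinations(sorted(set(keywords)), 2)]
        self.messages.append(pairs)
        self.edges.update(pairs)
        if len(self.messages) > self.window:
            # Forget the oldest message so the fingerprint tracks the live conversation.
            for pair in self.messages.popleft():
                self.edges[pair] -= 1
                if self.edges[pair] == 0:
                    del self.edges[pair]


feed = RollingFingerprint(window=2)
feed.add(["storm", "power"])
feed.add(["storm", "flood"])
feed.add(["flood", "rescue"])
# The storm/power association has aged out of the window.
print(sorted(tuple(sorted(edge)) for edge in feed.edges))  # [('flood', 'rescue'), ('flood', 'storm')]
```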

Next, the team built a little tool in Java that converted the output into a continuous data feed using an open-standard format called XGMML. This format shares the knowledge of a document as a network of words, sentences and relationships.
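The team’s Java converter is not published. The following Python sketch shows the general shape of an XGMML document: a `graph` element containing `node` elements, plus `edge` elements that refer to node ids, with extra data carried in `att` children. It writes a single snapshot rather than a continuous feed and includes only minimal attributes; Cytoscape may expect more (for example, graphics elements for styling). The function name and the `weight` attribute are assumptions.

```python
import xml.etree.ElementTree as ET

XGMML_NS = "http://www.cs.rpi.edu/XGMML"
ET.register_namespace("", XGMML_NS)  # serialize XGMML as the default namespace


def _tag(name: str) -> str:
    return f"{{{XGMML_NS}}}{name}"


def to_xgmml(label: str, edges: list[tuple[str, str, int]]) -> str:
    """Serialize weighted keyword associations as a minimal XGMML graph."""
    graph = ET.Element(_tag("graph"), {"label": label, "directed": "0"})
    node_ids: dict[str, str] = {}
    # Declare every keyword once as a node before writing any edges.
    for source, target, _ in edges:
        for word in (source, target):
            if word not in node_ids:
                node_ids[word] = str(len(node_ids) + 1)
                ET.SubElement(graph, _tag("node"), {"id": node_ids[word], "label": word})
    for source, target, weight in edges:
        edge = ET.SubElement(
            graph,
            _tag("edge"),
            {"source": node_ids[source], "target": node_ids[target], "label": f"{source} - {target}"},
        )
        # Attach the association strength as an XGMML attribute.
        ET.SubElement(edge, _tag("att"), {"name": "weight", "type": "integer", "value": str(weight)})
    return ET.tostring(graph, encoding="unicode")


print(to_xgmml("news-article", [("bank", "merger", 2), ("merger", "regulator", 1)]))
```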

Finally, they visualized the result with an open-source bioinformatics tool called Cytoscape to show differences and similarities and to identify anomalous information among documents. The result is a graphic representation of knowledge that can show clusters, extract summaries and compare many documents at the same time.
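Cytoscape has its own layout and clustering options, and the release does not say which ones the team used. As a simple illustration of how clusters can emerge from an association graph, the sketch below drops associations below a hypothetical weight threshold and returns the connected groups of keywords that remain, found by breadth-first search.

```python
from collections import defaultdict, deque


def clusters(edges: list[tuple[str, str, int]], min_weight: int = 1) -> list[set[str]]:
    """Group keywords into connected components, ignoring weak associations."""
    adjacency: dict[str, set[str]] = defaultdict(set)
    for a, b, weight in edges:
        if weight >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen: set[str] = set()
    groups: list[set[str]] = []
    for start in sorted(adjacency):  # sorted for deterministic output
        if start in seen:
            continue
        group: set[str] = set()
        queue = deque([start])
        seen.add(start)
        # Breadth-first search collects everything reachable from `start`.
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(group)
    return groups


edges = [("bank", "merger", 3), ("merger", "regulator", 2), ("storm", "flood", 4), ("bank", "storm", 1)]
print(clusters(edges, min_weight=2))
```

Raising `min_weight` splits the graph into smaller, tighter topics; lowering it merges them.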

The approach is easy for others to replicate with other technologies. “We used Topic-Mapper with Java, OpenNLP and Cytoscape,” said Florian, “But you could easily do this with Python, MATLAB and NLTK. Heck, you could throw a voice recognition tool on it, like Dragon or Nuance, and you can build an intelligent agent just like SIRI.”

ai-Fingerprint works in any language because Topic-Mapper looks only at byte-patterns. “The approach can give false positives if you don’t teach it the rules of language,” warned Florian, “but it is very accurate once it learns the grammar from an outside source of information – such as a natural language processing system or an external database.”
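Topic-Mapper’s byte-pattern processing is proprietary, so the sketch below is only an analogy: it compares texts by the overlapping byte trigrams of their UTF-8 encoding, which works the same way for German, Japanese or English because no tokenizer or dictionary is involved. The trigram size and the cosine measure are assumptions. It also hints at Florian’s caveat: shared byte patterns can make unrelated words look alike, which is where outside grammatical knowledge helps.

```python
import math
from collections import Counter


def byte_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping n-byte sequences in the UTF-8 encoding of `text`."""
    data = text.encode("utf-8")
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


station = byte_ngrams("Zürich Bahnhof")
print(cosine(station, byte_ngrams("Zürich Hauptbahnhof")))
print(cosine(station, byte_ngrams("Berlin")))
```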

ai-one’s engineering team sees ai-Fingerprints as a way to make it easier, faster and less expensive for their partners to develop intelligent systems. The team is now testing it for applications in advertising, financial analysis, medical research and search engine optimization (SEO).

“Our mission is to make powerful AI available to all developers. This is a big step in that direction,” said ai-one’s chief operating officer Tom Marsh. “We are eager to find academic and consulting partners who can build upon what we started.”

“BrainBrowser is just a minimum viable product (MVP) to prove the concept,” added Hyde. “The sky is the limit for those that want to build commercial applications. Just take the MVP code and customize it to your needs.”

A demo of the system can be seen at www.ai-one.com and on the semsys YouTube channel. ai-one intends to provide the source code for ai-Fingerprint as part of its Topic-Mapper software development kit.

Lead, Follow or Fail: AI and Your Business in 2012

October 20th, 2011

Press Release

San Diego CA | October 20, 2011 – Did you miss the wave? Artificial intelligence is transforming entire industries by finding value in big, complex data.

The San Diego Online Society (SANDIOS) will host a public seminar on Thursday November 17 on how artificial intelligence (AI) is being used by leading edge companies around the world.

Recent advances in AI technology make it easy to build machines that can learn like humans. Now almost any programmer can build systems like Apple’s SIRI and IBM Watson by combining off-the-shelf technologies. A leading vendor of machine learning technology, ai-one, will present case studies from a wide range of customers. The seminar will focus on showing practical ways businesses can use AI.

Topics that will be addressed:

  • What is AI & why everything you think you know about AI has changed
  • Business uses for ai-one technology
  • Demo of a cutting edge AI application
  • AI incubation models
  • How to succeed with building an AI business
  • AI Product strategy

The event will be hosted by Jones Day which specializes in intellectual property and business law.

Tickets are available online at: http://sandios-11-2011.eventbrite.com/

About ai-one inc.: ai-one provides an “API for building learning machines”. Based in San Diego, Zurich and Berlin, ai-one’s software technology is an adaptive holosemantic data space with semiotic capabilities (“biologically inspired intelligence”). The Topic-Mapper™ SDK for text enables developers to create artificial intelligence applications for semantic discovery, knowledge collaboration, sentiment analysis, and data mining.

Contact: Olin Hyde, Ph: 1-858-381-5897, email: oh@ai-one.com, web: www.ai-one.com

###