How many shoppers buy beer and diapers together?

(Data mining can help Singapore become a hub for e-commerce)

The Government is determined to establish Singapore as an international hub for e-commerce. But high-bandwidth circuits, lightning-fast servers and other infrastructure alone will not be enough. Singapore will succeed only if e-commerce hubbed here can offer better value than competing business models, whether based on paper, telephones, handshakes, or family ties. The answer may lie in the enabling technology called data mining.

Data mining transforms raw data, automatically or semi-automatically, into useful knowledge for decision-making by creatively combining techniques from databases, machine learning, artificial intelligence, statistics, data visualization and high-performance computing.

Data mining can provide a major competitive advantage for e-commerce. With transactions, whether supermarket purchases, stock trades, or insurance claims, conducted on-line, a wealth of information can be accumulated about consumer habits and preferences. Analysts can then search for important and profitable regularities, such as the tendency of stock prices to fall over weekends.

This is how it works. With rapid declines in the cost of computing and telecommunications, radical changes have been made in the way that we do business and even socialize. A huge pool of information, including business records, government statistics, and digital libraries, is just a few clicks away. How can businesses cope with this torrent of information?

Consider a typical supermarket chain of 10 outlets: with each outlet serving 400 customers a day, each buying 20 items, it records about 28 million item-level transactions a year. In the days of the cash register, these transactions would have left no trace. In today's e-commerce environment, the point-of-sale system captures the description, quantity, price, and purchase time of every item, and even the way the customer shops.
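
As a rough check of the supermarket figures above, the short Python sketch below multiplies out the daily volume. The number of trading days per year is not stated in the article and is an assumption here: 350 days reproduces the 28 million figure, while a full 365 days gives a slightly higher total.

```python
# Back-of-the-envelope check of the supermarket example.
# Figures taken from the article:
outlets = 10
customers_per_outlet_per_day = 400
items_per_customer = 20

# Item-level records captured across the whole chain in one day.
items_per_day = outlets * customers_per_outlet_per_day * items_per_customer


def items_per_year(trading_days: int) -> int:
    """Item-level records captured in a year with the given number of trading days."""
    return items_per_day * trading_days


# The trading-day counts below are assumptions; the article does not give one.
for days in (350, 365):
    print(f"{days} trading days: {items_per_year(days):,} item records")
```

Either way, the chain accumulates tens of millions of item-level records every year from its point-of-sale system.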
The captured point-of-sale data can be matched to the demographics of the customer and the outlet. How can management make effective use of this torrent of information? A few examples can provide some ideas.

{\bf Market basket analysis}: By analyzing the basket data of shopping transactions, we can discover patterns in how items are bought together. This information can support decisions on shelf-space allocation, store layout, product location, and promotion effectiveness. For example, analysis of supermarket records may reveal that customers often buy beer and diapers together. Knowing this, the supermarket management can shelve beer and diapers close to each other or market them together. One possible explanation for the pattern is this: a person who has a baby to look after is not likely to go anywhere, so he tends to buy beer without worrying about drinking and driving.

{\bf Finance}: On an average day, millions of transactions take place on the New York Stock Exchange. In trading of shares and other securities, a margin of a few cents can be leveraged over a million shares to generate a healthy profit of tens of thousands of dollars. No wonder that financial institutions are willing to pay for minute-by-minute trading records. Analyzing these records, researchers have discovered some amazing empirical regularities, for instance, a systematic tendency for share prices to fall between the Friday close and the Monday opening. Smart traders can exploit such regularities to earn huge profits.

{\bf Fraud detection}: Many systems developed for fraud detection are not publicised, for obvious reasons, but a few examples can be given. By analyzing past transactions, the United States Treasury's Financial Crimes Enforcement Network Artificial Intelligence System can identify financial transactions that may indicate money-laundering activity. AT\&T developed a system for detecting international calling fraud by displaying calling activity in a way that lets users see unusual patterns quickly.
The Clonedetector system, developed by GTE, uses customer profiles to detect cellular cloning fraud. If a particular customer suddenly starts calling in a very different way, a fraud alert is raised automatically.

In a recent survey by the US market research firm Gartner Group Inc, data mining was ranked as one of the top ten technologies to watch this year.

The value of data mining is even greater in the little-understood World Wide Web. Amazon.com and Yahoo! are making losses or only marginal profits, yet their market valuations are billions of dollars. One reason for their high value is the information that they collect. The most popular Web sites are receiving millions of hits a day. Mining the Web log files, coupled with demographic information about the visitors, will reveal a volume of information that makes supermarket records seem trivial. One can discover how different information-seekers browse Web pages; which pages they visit before placing an order; the preferences of travelers using an on-line travel booking service (transportation, destination, date of travel, hotel, service, and price); how much time a customer spends making a decision; the main factors behind a customer's purchase decision; and so on. The useful information that can be obtained will go far beyond simple correlations such as that between purchases of beer and diapers.

Understanding consumer preferences and purchasing behaviour must be combined with careful marketing analysis to create superior tactics and strategy. This ``managerial software'' will underpin Singapore's sustainable advantage as an international e-commerce hub. It will no longer be true that ``only half our advertising is effective, except that we don't know which half''.

Data mining can revolutionize social practices, too.
The Government has long been concerned about the low marriage rate and the tendency for highly educated men and women to marry late and have fewer children. It established the Social Development Unit to address these issues. The very same techniques used to analyze retail and financial data can be applied with equal effectiveness to social information: Which men would Ms Leela like? What type of women would attract Mr Chan? What types of men and women have the highest success rates? In other words, what are the matching characteristics? Answers to these questions are not explicitly stored in the database and cannot be retrieved using a pre-determined pattern. The matchmakers could apply data mining to the hundreds of thousands of successful and unsuccessful matches made here to determine matching characteristics and improve the success rate.

In short, data mining could make Singapore a regional hub for e-society as well as e-commerce.

--- The Sunday Times, Review section, p. 35, Oct 11, 1998, Singapore, by Wang Ke, Liu Bing, and Ivan Png, School of Computing, National University of Singapore